BioWikiNet: Multilingual Wikipedia taxonomic networks aligned with the GBIF backbone taxonomy

wikidata 2026-01-14

Sci Data. 2026 Jan 6. doi: 10.1038/s41597-025-06524-1. Online ahead of print.

ABSTRACT

BioWikiNet is a multilingual dataset describing biodiversity representation across 11 Wikipedia language editions selected for their global reach and relevance to biodiversity-rich regions. The dataset includes 1,266,215 taxonomic articles linked to 751,843 unique taxa from the GBIF Backbone Taxonomy, derived from January 2025 Wikipedia dumps and mapped via Wikidata identifiers. Each record contains article-level metadata (pageviews, edits, editors, creation dates, content metrics) combined with GBIF taxonomic classifications. The dataset provides 6,955,289 taxonomic hyperlinks connecting articles within and across language editions, along with three network-based indices (Species Connectivity Index, Core Index, and Excess Focus Index) that quantify the structural characteristics of taxonomic linkages. BioWikiNet enables transparent, reproducible analyses of biodiversity representation and editorial coverage across linguistic communities, serving as an open resource for biodiversity informatics, conservation culturomics, and multilingual knowledge equity research.
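
As a rough illustration of the structure described above, the following Python sketch models an article record and separates links that stay within one language edition from links that cross editions. The class name, field names, and toy data are hypothetical and are not taken from the released dataset, whose schema may differ. The three indices are not implemented because the abstract does not define them. Only the three headline counts come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Headline counts quoted in the abstract.
N_ARTICLES = 1_266_215
N_TAXA = 751_843
N_LINKS = 6_955_289


@dataclass(frozen=True)
class Article:
    # Hypothetical field names; the released schema may differ.
    lang: str      # Wikipedia language edition code, e.g. "en"
    title: str     # article title within that edition
    taxon_id: int  # placeholder taxon key, not a real GBIF identifier


def link_scope_counts(links):
    """Count links whose endpoints share a language edition vs. those that do not."""
    return Counter(
        "within" if src.lang == dst.lang else "across"
        for src, dst in links
    )


# Toy data for illustration only; not drawn from BioWikiNet.
a = Article("en", "Taxon A", 1)
b = Article("en", "Taxon B", 2)
c = Article("es", "Taxon A", 1)
toy_links = [(a, b), (a, c), (c, a)]

scope = link_scope_counts(toy_links)

# Simple ratios derived from the abstract's headline counts.
articles_per_taxon = N_ARTICLES / N_TAXA
links_per_article = N_LINKS / N_ARTICLES

print(dict(scope))
print(round(articles_per_taxon, 2), round(links_per_article, 2))
```

The ratios suggest that a taxon is covered by between one and two articles on average across the 11 editions, and that an article carries roughly five to six taxonomic links on average, although the abstract does not say how these are distributed.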

PMID:41495078 | DOI:10.1038/s41597-025-06524-1