Transformer-Based Multilabel NER Using Wikipedia Corpora in Multiple Languages
wikidata 2025-05-19
Summary:
The high cost of manual data labeling, together with privacy concerns, results in a considerable dearth of medical annotations in non-English texts. Recent work by Frank and Kramer [1] introduces an unsupervised approach for constructing ontology-annotated corpora from Wikipedia (https://www.wikidata.org) for German medical named entity recognition (NER). We evaluate the proposed approach across English, German, Spanish, and French for medication and diagnosis entity recognition. Our multilabel corpora yield notable improvements in...
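To illustrate what "multilabel" token annotation derived from an ontology can look like, here is a minimal sketch of generic dictionary (gazetteer) matching over a tokenized sentence. This is not the method of [1]. The gazetteer entries, the label names MEDICATION and DIAGNOSIS used as class identifiers, and the function `annotate` are hypothetical. A real pipeline would build the term list from ontology classes and would also need to handle tokenization, inflection, and ambiguous terms.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical gazetteer: lowercased token sequence -> set of entity labels.
# In practice such a list would be derived from ontology classes; these entries are illustrative only.
GAZETTEER: Dict[Tuple[str, ...], Set[str]] = {
    ("aspirin",): {"MEDICATION"},
    ("insulin",): {"MEDICATION"},
    ("diabetes", "mellitus"): {"DIAGNOSIS"},
}


def annotate(tokens: List[str], gazetteer: Dict[Tuple[str, ...], Set[str]]) -> List[Set[str]]:
    """Tag each token with a *set* of BIO labels from all gazetteer matches covering it.

    Using a set per token (instead of a single tag) lets overlapping matches or
    terms belonging to several classes contribute multiple labels to one token.
    """
    lowered = [t.lower() for t in tokens]
    tags: List[Set[str]] = [set() for _ in tokens]
    max_len = max((len(k) for k in gazetteer), default=0)
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            span = tuple(lowered[start:start + length])
            if len(span) < length:  # ran past the end of the sentence
                break
            for label in gazetteer.get(span, set()):
                tags[start].add(f"B-{label}")  # first token of the matched span
                for i in range(start + 1, start + length):
                    tags[i].add(f"I-{label}")  # remaining tokens of the span
    return tags


if __name__ == "__main__":
    sentence = ["Insulin", "treats", "diabetes", "mellitus"]
    for token, labels in zip(sentence, annotate(sentence, GAZETTEER)):
        print(token, sorted(labels))
```

For the example sentence, "Insulin" receives `B-MEDICATION`, "treats" receives no label, and "diabetes mellitus" is tagged `B-DIAGNOSIS` / `I-DIAGNOSIS`. Representing labels as sets keeps the output compatible with a multilabel token-classification head (one sigmoid per label) rather than a single softmax over tags.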