Transformer-based multilabel NER using Wikipedia corpora in multiple languages
The high cost of manual data labeling and privacy concerns have led to a considerable scarcity of medical annotations for non-English texts. Recent work by Frei and Kramer [1] introduces an unsupervised approach for constructing ontology-annotated corpora from Wikipedia (https://www.wikidata.org) for German medical NER. We evaluate the proposed approach across English, German, Spanish, and French for medication and diagnosis entity recognition. Our multilabel corpora yield notable improvements over the baseline in German medication detection under sparse annotations, with consistent performance across the other languages.
Author: | Yelyzaveta Ahapova, Johann Frei, Frank Kramer |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1223888 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/122388 |
ISBN: | 9781643685960 |
ISSN: | 0926-9630 |
ISSN: | 1879-8365 |
Parent Title (English): | Intelligent health systems – from technology to data and knowledge: proceedings of MIE 2025 |
Publisher: | IOS Press |
Place of publication: | Amsterdam |
Editor: | Elisavet Andrikopoulou, Parisis Gallos, Theodoros N. Arvanitis, Rosalynn Austin, Arriel Benis, Ronald Cornet, Panagiotis Chatzistergos, Alexander Dejaco, Linda Dusseljee-Peute, Alaa Mohasseb, Pantelis Natsiavas, Haythem Nakkas, Philip Scott |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2025 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2025/05/30 |
First Page: | 878 |
Last Page: | 879 |
Series: | Studies in Health Technology and Informatics ; 327 |
DOI: | https://doi.org/10.3233/shti250488 |
Institutes: | Fakultät für Angewandte Informatik |
Fakultät für Angewandte Informatik / Institut für Informatik | |
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für IT-Infrastrukturen für die Translationale Medizinische Forschung | |
Dewey Decimal Classification: | 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science |