Transformer-based multilabel NER using Wikipedia corpora in multiple languages

The high cost of manual labeling, together with privacy concerns, results in a considerable scarcity of annotated medical data for non-English texts. Recent work by Frei and Kramer [1] introduces an unsupervised approach for constructing an ontology-annotated corpus from Wikipedia (https://www.wikidata.org) for German medical NER. We evaluate this approach on English, German, Spanish, and French for medication and diagnosis entity recognition. Compared to the baseline, our multilabel corpora yield notable improvements in German medication detection under sparse annotations, and performance remains consistent across the other languages.
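The abstract does not describe the annotation procedure in detail. As a rough illustration of what "multilabel" can mean at the token level, the following sketch assigns each token a multi-hot vector over entity labels by matching phrases from a lexicon, so a single token may carry both a medication and a diagnosis label. This is generic dictionary-based labeling, not the authors' pipeline: `LABELS`, `annotate`, and the example lexicon are hypothetical stand-ins for labels derived from an ontology.

```python
from typing import Dict, List, Set

# Hypothetical label inventory; the paper targets medication and diagnosis entities.
LABELS = ["MEDICATION", "DIAGNOSIS"]


def annotate(tokens: List[str], lexicon: Dict[str, Set[str]]) -> List[List[int]]:
    """Return one multi-hot vector per token.

    A position is set to 1 if the token lies inside any case-insensitive
    lexicon match whose label set contains that label. Overlapping matches
    are allowed, which is what makes the labeling multilabel.
    """
    labels = [[0] * len(LABELS) for _ in tokens]
    lowered = [t.lower() for t in tokens]
    for surface, entity_labels in lexicon.items():
        phrase = surface.lower().split()
        n = len(phrase)
        # Slide the phrase over the token sequence and mark every exact match.
        for start in range(len(tokens) - n + 1):
            if lowered[start:start + n] == phrase:
                for label in entity_labels:
                    k = LABELS.index(label)
                    for i in range(start, start + n):
                        labels[i][k] = 1
    return labels


if __name__ == "__main__":
    tokens = ["Patient", "has", "aspirin", "allergy"]
    # Hypothetical lexicon: "aspirin" is a medication, "aspirin allergy" a diagnosis.
    lexicon = {"aspirin": {"MEDICATION"}, "aspirin allergy": {"DIAGNOSIS"}}
    print(annotate(tokens, lexicon))  # [[0, 0], [0, 0], [1, 1], [0, 1]]
```

Here "aspirin" receives both labels because it is a medication on its own and part of a diagnosis span. A single-label BIO scheme would have to drop one of the two, which is the kind of overlap a multilabel corpus can preserve.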

Metadata
Author: Yelyzaveta Ahapova, Johann Frei, Frank Kramer
URN: urn:nbn:de:bvb:384-opus4-1223888
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/122388
ISBN: 9781643685960
ISSN: 0926-9630
ISSN: 1879-8365
Parent Title (English): Intelligent health systems – from technology to data and knowledge: proceedings of MIE 2025
Publisher: IOS Press
Place of publication: Amsterdam
Editor: Elisavet Andrikopoulou, Parisis Gallos, Theodoros N. Arvanitis, Rosalynn Austin, Arriel Benis, Ronald Cornet, Panagiotis Chatzistergos, Alexander Dejaco, Linda Dusseljee-Peute, Alaa Mohasseb, Pantelis Natsiavas, Haythem Nakkas, Philip Scott
Type: Conference Proceeding
Language: English
Year of first publication: 2025
Publishing Institution: Universität Augsburg
Release Date: 2025/05/30
First Page: 878
Last Page: 879
Series: Studies in Health Technology and Informatics ; 327
DOI: https://doi.org/10.3233/shti250488
Institutes: Faculty of Applied Computer Science
Faculty of Applied Computer Science / Institute of Computer Science
Faculty of Applied Computer Science / Institute of Computer Science / Chair of IT Infrastructures for Translational Medical Research
Dewey Decimal Classification: 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence (German): CC-BY-NC 4.0: Creative Commons: Attribution - NonCommercial (with print on demand)