Cross- & multi-lingual medication detection: a transformer-based analysis

Raithel, Lisa; Frei, Johann; Thomas, Philippe; Roller, Roland; Zweigenbaum, Pierre; Möller, Sebastian; Kramer, Frank

doi:10.1186/s12911-025-03179-1

Extracting specific information, such as medication mentions, from large unstructured medical texts can be challenging, especially when no annotated corpus exists in the target language for training. To overcome this, leveraging existing machine learning models and datasets is essential, and since most pre-trained resources are in English, adopting multilingual approaches can help transfer knowledge between languages. In this work, we investigate the use of a multi-lingual transformer model in multi-lingual and cross-lingual settings to extract drug names from medical texts using named entity recognition in four European languages: German, English, French, and Spanish. We report the scores obtained by cross-lingual transfer with several published datasets after fine-tuning a multi-lingual model, aiming to provide empirical evidence of how the transfer of "medical" knowledge between languages can be expected to benefit various language pairs.
We further perform a qualitative error analysis and find that performance reaches competitive levels in all languages. However, erroneous prediction artifacts are introduced by annotation inconsistencies, differences in annotation guidelines and vague entity labels in general.…
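The abstract reports NER scores without naming the metric on this page. A common choice for drug-name NER is strict entity-level precision, recall and F1 over BIO-tagged token sequences, where a predicted entity counts only if its boundaries and type exactly match a gold entity. The following is a minimal Python sketch under that assumption; the label DRUG and the functions bio_to_spans and entity_f1 are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (e.g. hypothetical 'B-DRUG'/'I-DRUG'/'O') into entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    ent_type: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == ent_type
        if not continues:
            if start is not None:
                spans.add((start, i, ent_type))
                start, ent_type = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # An I- tag that does not continue the current entity starts a new one.
                start, ent_type = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall and F1: only exact span and type matches count."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # predicted spans that exactly match a gold span
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed or got wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["O", "B-DRUG", "I-DRUG", "O", "B-DRUG"]]
    pred = [["O", "B-DRUG", "O", "O", "B-DRUG"]]  # first drug name truncated by one token
    print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In a cross-lingual setting, the same computation would be repeated for each (training language, test language) pair, for example a model fine-tuned on English data and evaluated on German gold annotations. Under strict matching, a single boundary error such as the truncated drug name above counts as both a false positive and a false negative, so annotation-guideline differences in span boundaries can lower scores noticeably.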

Metadata
Author:	Lisa Raithel, Johann Frei, Philippe Thomas, Roland Roller, Pierre Zweigenbaum, Sebastian Möller, Frank Kramer
URN:	urn:nbn:de:bvb:384-opus4-1258055
Frontdoor URL	https://opus.bibliothek.uni-augsburg.de/opus4/125805
ISSN:	1472-6947
Parent Title (English):	BMC Medical Informatics and Decision Making
Publisher:	Springer Science and Business Media LLC
Place of publication:	Berlin
Type:	Article
Language:	English
Year of first Publication:	2025
Publishing Institution:	Universität Augsburg
Release Date:	2025/10/13
Volume:	25
Issue:	1
First Page:	359
DOI:	https://doi.org/10.1186/s12911-025-03179-1
Institutes:	Faculty of Applied Computer Science
	Faculty of Applied Computer Science / Institute of Computer Science
	Faculty of Applied Computer Science / Institute of Computer Science / Chair of IT Infrastructures for Translational Medical Research
Dewey Decimal Classification:	6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Licence:	CC BY 4.0: Creative Commons Attribution

Open Access