Combining ontologies and neural networks for analyzing historical language varieties: a case study in Middle Low German

In this paper, we describe experiments on the morphosyntactic annotation of historical language varieties, using the example of Middle Low German (MLG), the official language of the German Hanse during the Middle Ages and a dominant language around the Baltic Sea at the time. To the best of our knowledge, this is the first experiment in automatically producing morphosyntactic annotations for Middle Low German, and accordingly, no part-of-speech (POS) tagset is currently agreed upon. In our experiment, we illustrate how ontology-based specifications of projected annotations can be employed to circumvent this issue: instead of training and evaluating against a given tagset, we decompose it into independent features, each of which is predicted separately by a neural network. Using consistency constraints (axioms) from an ontology, the predicted feature probabilities are then decoded into a sound ontological representation. Using these representations, we can finally bootstrap a POS tagset capturing only those morphosyntactic features which could be reliably predicted. In this way, our approach is capable of optimizing precision and recall of morphosyntactic annotations simultaneously with bootstrapping a tagset, rather than performing iterative cycles.
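
The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the two axioms, and the probabilities are hypothetical, and brute-force enumeration stands in for whatever ontology reasoning the paper actually uses. The sketch only shows the general idea of turning independently predicted feature probabilities into an assignment that satisfies consistency constraints.

```python
from itertools import product
from math import prod

# Hypothetical feature inventory and axioms, for illustration only.
FEATURES = ["Noun", "Verb", "Finite", "Plural"]
DISJOINT = [("Noun", "Verb")]   # these two features cannot co-occur
IMPLIES = [("Finite", "Verb")]  # the first feature requires the second


def is_consistent(assignment):
    """Check a feature assignment (dict: feature -> bool) against the axioms."""
    if any(assignment[a] and assignment[b] for a, b in DISJOINT):
        return False
    if any(assignment[a] and not assignment[b] for a, b in IMPLIES):
        return False
    return True


def decode(probs):
    """Return the most probable consistent feature set.

    Features are treated as independent, so the score of an assignment is
    the product of p(feature) for active features and 1 - p(feature) for
    inactive ones. All 2^n assignments are enumerated (fine for a sketch).
    """
    best, best_score = None, -1.0
    for values in product([False, True], repeat=len(FEATURES)):
        assignment = dict(zip(FEATURES, values))
        if not is_consistent(assignment):
            continue
        score = prod(probs[f] if v else 1.0 - probs[f]
                     for f, v in assignment.items())
        if score > best_score:
            best, best_score = assignment, score
    return {f for f, v in best.items() if v}


# Hypothetical per-feature probabilities, as a network might output them.
probs = {"Noun": 0.55, "Verb": 0.6, "Finite": 0.7, "Plural": 0.8}
print(sorted(decode(probs)))  # ['Finite', 'Plural', 'Verb']
```

Thresholding each feature at 0.5 on its own would select both Noun and Verb here, which violates the disjointness axiom; decoding under the constraints instead yields a consistent set.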

Metadata
Author:Maria Sukhareva, Christian Chiarcos
URN:urn:nbn:de:bvb:384-opus4-1041369
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/104136
URL:https://aclanthology.org/L16-1234
Parent Title (English):Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, 2016, Portorož, Slovenia
Publisher:European Language Resources Association
Place of publication:Paris
Editor:Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Type:Conference Proceeding
Language:English
Year of first Publication:2016
Publishing Institution:Universität Augsburg
Release Date:2023/05/15
First Page:1471
Last Page:1480
Institutes:Philologisch-Historische Fakultät
Philologisch-Historische Fakultät / Angewandte Computerlinguistik
Philologisch-Historische Fakultät / Angewandte Computerlinguistik / Lehrstuhl für Angewandte Computerlinguistik (ACoLi)
Dewey Decimal Classification:4 Language / 40 Language / 400 Language
Licence (German):CC-BY-NC 4.0: Creative Commons: Attribution - NonCommercial (with Print on Demand)