Unifying morphology resources with OntoLex-Morph: a case study in German
The OntoLex vocabulary has become a widely used community standard for machine-readable lexical resources on the web. The primary motivation for using OntoLex rather than tool- or application-specific formalisms is to facilitate interoperability and information integration across different resources. One of its extensions currently under development is a module for representing morphology, OntoLex-Morph. In this paper, we show how OntoLex-Morph can be used to encode and integrate different types of morphological resources on a unified basis. Using German as the example, we demonstrate this for (a) a full-form dictionary with inflection information (Unimorph), (b) a dictionary of base forms and their derivations (UDer), (c) a dictionary of compounds (from GermaNet), and (d) the lexicon and inflection rules of a finite-state parser/generator (SMOR/Morphisto). These data are converted to OntoLex-Morph, their linguistic information is consolidated, and corresponding lexical entries are linked with each other. The main contribution of this paper is a discussion of the current state of OntoLex-Morph and its validation on different types of real-world resources for a single language. In the longer term, the successful application of OntoLex-Morph to such diverse data, along with the adjustments to the vocabulary identified in the process, will help establish interoperability among morphological resources as well as between them and classical lexical data such as dictionaries, WordNets, or thesauri.


| Author: | Christian Chiarcos, Christian Fäth, Maxim Ionov |
|---|---|
| URN: | urn:nbn:de:bvb:384-opus4-1038646 |
| Frontdoor URL | https://opus.bibliothek.uni-augsburg.de/opus4/103864 |
| URL: | https://aclanthology.org/volumes/2022.lrec-1/ |
| ISBN: | 979-10-95546-72-6 |
| Parent Title (English): | Proceedings of the 13th Language Resources and Evaluation Conference, 20-25 June 2022, Marseille, France |
| Publisher: | European Language Resources Association |
| Place of publication: | Paris |
| Editor: | Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis |
| Type: | Conference Proceeding |
| Language: | English |
| Date of Publication (online): | 2023/04/20 |
| Year of first Publication: | 2022 |
| Publishing Institution: | Universität Augsburg |
| Release Date: | 2023/05/16 |
| First Page: | 4842 |
| Last Page: | 4850 |
| Institutes: | Philologisch-Historische Fakultät |
| | Philologisch-Historische Fakultät / Angewandte Computerlinguistik |
| | Philologisch-Historische Fakultät / Angewandte Computerlinguistik / Lehrstuhl für Angewandte Computerlinguistik (ACoLi) |
| Dewey Decimal Classification: | 4 Language / 40 Language / 400 Language |
| Licence: | CC-BY-NC 4.0: Creative Commons: Attribution-NonCommercial |



