Unifying morphology resources with OntoLex-Morph: a case study in German
The OntoLex vocabulary has become a widely used community standard for machine-readable lexical resources on the web. The primary motivation to use OntoLex rather than tool- or application-specific formalisms is to facilitate interoperability and information integration across different resources. One of its extensions currently under development is a module for representing morphology, OntoLex-Morph. In this paper, we show how OntoLex-Morph can be used for the encoding and integration of different types of morphological resources on a unified basis. With German as the example, we demonstrate it for (a) a full-form dictionary with inflection information (Unimorph), (b) a dictionary of base forms and their derivations (UDer), (c) a dictionary of compounds (from GermaNet), and (d) lexicon and inflection rules of a finite-state parser/generator (SMOR/Morphisto).
These data are converted to OntoLex-Morph, their linguistic information is consolidated, and corresponding lexical entries are linked with each other. The main contribution of this paper is the discussion of the current state of OntoLex-Morph and its validation on different types of real-world resources for a single language. In the longer term, the successful application of OntoLex-Morph to such diverse data, along with the adjustments to the vocabulary observed in the process, will be a means to establish interoperability among morphological resources as well as between them and classical lexical data such as dictionaries, WordNets, or thesauri.
Author: | Christian Chiarcos, Christian Fäth, Maxim Ionov |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1038646 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/103864 |
URL: | https://aclanthology.org/volumes/2022.lrec-1/ |
ISBN: | 979-10-95546-72-6 |
Parent Title (English): | Proceedings of the 13th Language Resources and Evaluation Conference, 20-25 June 2022, Marseille, France |
Publisher: | European Language Resources Association |
Place of publication: | Paris |
Editor: | Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2022 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2023/05/16 |
First Page: | 4842 |
Last Page: | 4850 |
Institutes: | Philologisch-Historische Fakultät |
 | Philologisch-Historische Fakultät / Angewandte Computerlinguistik |
 | Philologisch-Historische Fakultät / Angewandte Computerlinguistik / Lehrstuhl für Angewandte Computerlinguistik (ACoLi) |
Dewey Decimal Classification: | 4 Language / 40 Language / 400 Language |
Licence (German): | CC-BY-NC 4.0: Creative Commons: Attribution - NonCommercial (with Print on Demand) |