An ontology for CoNLL-RDF: formal data structures for TSV formats in language technology
In language technology and the language sciences, tab-separated values (TSV) are a frequently used formalism for representing linguistically annotated natural language, often referred to as "CoNLL formats". A large number of such formats exist, and although they share a number of common features, they are not interoperable, because different pieces of information are encoded differently in these dialects.
CoNLL-RDF refers to a programming library and its associated data model, introduced to facilitate processing and transforming such TSV formats in a serialization-independent way. CoNLL-RDF represents CoNLL data by means of RDF graphs and SPARQL update operations, but so far without machine-readable semantics: annotation properties are created dynamically on the basis of a user-defined mapping from columns to labels. Current applications of CoNLL-RDF include linking between corpora and dictionaries [Mambrini and Passarotti, 2019] and knowledge graphs [Tamper et al., 2018], syntactic parsing of historical languages [Chiarcos et al., 2018; Chiarcos et al., 2018], the consolidation of syntactic and semantic annotations [Chiarcos and Fäth, 2019], a bridge between RDF corpora and a traditional corpus query language [Ionov et al., 2020], and language contact studies [Chiarcos et al., 2018].
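
To illustrate what a user-defined mapping from columns to labels could look like in practice, the following is a minimal sketch in plain Python. It is not the CoNLL-RDF library or its API: the function name `rows_to_triples`, the `ex:` prefix, the `ex:next` link and the subject naming scheme are all assumptions chosen for illustration. It only shows how each column label can be turned into a property name on the fly, one token per row.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(tsv: str, columns: List[str], sentence_id: str = "s1") -> List[Triple]:
    """Turn one TSV sentence into (subject, predicate, object) triples.

    Each column label becomes a property name. The `ex:` prefix and the
    `ex:next` link are illustrative placeholders, not CoNLL-RDF vocabulary.
    """
    triples: List[Triple] = []
    previous = None
    token_no = 0
    for line in tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        token_no += 1
        subject = f":{sentence_id}_{token_no}"
        # Pair each cell with the label the user assigned to its column.
        for label, value in zip(columns, line.split("\t")):
            if value != "_":  # "_" conventionally marks an empty cell
                triples.append((subject, f"ex:{label}", value))
        # Preserve token order with an explicit link between neighbours.
        if previous is not None:
            triples.append((previous, "ex:next", subject))
        previous = subject
    return triples


sample = "1\tDogs\tNOUN\n2\tbark\tVERB\n"
triples = rows_to_triples(sample, ["ID", "WORD", "POS"])
for s, p, o in triples:
    print(s, p, o)
```

Because the property names are derived purely from the labels passed in, two users who label the same column differently would obtain incompatible graphs, which is the gap that machine-readable semantics are meant to close.
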
We describe a novel extension of CoNLL-RDF that introduces a formal data model, formalized as an ontology. The ontology is a basis for linking RDF corpora with other Semantic Web resources, but more importantly, its application to transformation between different TSV formats is a major step towards interoperability between CoNLL formats.…
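
As a rough intuition for why shared column semantics help with transformation between TSV formats, the sketch below converts rows from one column layout to another by matching labels. It is a deliberately simplified stand-in, not the ontology-based method described in the paper: the function names `convert_row` and `convert`, and the two example layouts, are hypothetical, and a plain list of labels takes the place of a formal format description.

```python
from typing import List


def convert_row(row: List[str], source_cols: List[str], target_cols: List[str]) -> List[str]:
    """Reorder one row by column label; labels missing in the source become "_"."""
    by_label = dict(zip(source_cols, row))
    return [by_label.get(label, "_") for label in target_cols]


def convert(tsv: str, source_cols: List[str], target_cols: List[str]) -> str:
    """Convert every token row of a TSV string; blank and comment lines pass through."""
    out = []
    for line in tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            out.append(line)
            continue
        out.append("\t".join(convert_row(line.split("\t"), source_cols, target_cols)))
    return "\n".join(out)


# Hypothetical layouts: the target format expects an extra LEMMA column.
converted = convert(
    "1\tDogs\tNOUN\n2\tbark\tVERB",
    ["ID", "WORD", "POS"],
    ["ID", "WORD", "LEMMA", "POS"],
)
print(converted)
```

This only works when both layouts use the same label for the same kind of information, which is exactly what an agreed, machine-readable description of columns would have to guarantee.
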


| Author: | Christian Chiarcos, Maxim Ionov, Luis Glaser, Christian Fäth |
|---|---|
| URN: | urn:nbn:de:bvb:384-opus4-1040010 |
| Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/104001 |
| URL: | https://drops.dagstuhl.de/opus/portals/oasics/index.php?semnr=16205 |
| ISBN: | 978-3-95977-199-3 |
| ISSN: | 2190-6807 |
| Parent Title (English): | 3rd Conference on Language, Data and Knowledge (LDK 2021), September 1–3, 2021, Zaragoza, Spain |
| Publisher: | Schloss Dagstuhl – Leibniz-Zentrum für Informatik |
| Place of publication: | Saarbrücken |
| Editor: | Dagmar Gromann, Gilles Sérasset, Thierry Declerck, John P. McCrae, Jorge Gracia, Julia Bosque-Gil, Fernando Bobillo, Barbara Heinisch |
| Type: | Conference Proceeding |
| Language: | English |
| Date of Publication (online): | 2023/04/24 |
| Year of first Publication: | 2021 |
| Publishing Institution: | Universität Augsburg |
| Release Date: | 2023/05/16 |
| First Page: | 20:1 |
| Last Page: | 20:14 |
| Series: | OASIcs ; 93 |
| Institutes: | Philologisch-Historische Fakultät |
| | Philologisch-Historische Fakultät / Angewandte Computerlinguistik |
| | Philologisch-Historische Fakultät / Angewandte Computerlinguistik / Lehrstuhl für Angewandte Computerlinguistik (ACoLi) |
| Dewey Decimal Classification: | 4 Language / 40 Language / 400 Language |
| Licence: | CC-BY 4.0: Creative Commons: Attribution |



