Modelling collocations in OntoLex-FrAC
Following presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, workshop audience and the scientific community in preparation of the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and corpus-based collocation scores available from the web, and finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and its export to a tabular format, so that it can be easily processed in downstream applications.
Author: | Christian Chiarcos, Katerina Gkirtzou, Maxim Ionov, Besim Kabashi, Fahad Khan, Ciprian-Octavian Truică |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1038605 |
Frontdoor URL | https://opus.bibliothek.uni-augsburg.de/opus4/103860 |
URL: | https://aclanthology.org/2022.gwll-1.3 |
ISBN: | 979-10-95546-92-4 |
Parent Title (English): | Proceedings of Globalex Workshop on Linked Lexicography (GWLL 2022) within the 13th Language Resources and Evaluation Conference, 20-25 June 2022, Marseille, France |
Publisher: | European Language Resources Association |
Place of publication: | Paris |
Editor: | Ilan Kernerman, Simon Krek |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2022 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2023/05/03 |
First Page: | 10 |
Last Page: | 18 |
Institutes: | Philologisch-Historische Fakultät |
 | Philologisch-Historische Fakultät / Angewandte Computerlinguistik |
 | Philologisch-Historische Fakultät / Angewandte Computerlinguistik / Lehrstuhl für Angewandte Computerlinguistik (ACoLi) |
Dewey Decimal Classification: | 4 Language / 40 Language / 400 Language |
Licence: | CC-BY-NC 4.0: Creative Commons: Attribution - NonCommercial (with Print on Demand) |