Modelling collocations in OntoLex-FrAC

  • Following presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, workshop audience and the scientific community in preparation of the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and corpus-based collocation scores available from the web, and finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and its export to a tabularFollowing presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, workshop audience and the scientific community in preparation of the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and corpus-based collocation scores available from the web, and finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and its export to a tabular format, so that it can be easily processed in downstream applications.show moreshow less

Download full text files

Export metadata

Statistics

Number of document requests

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:Christian ChiarcosORCiDGND, Katerina Gkirtzou, Maxim Ionov, Besim Kabashi, Fahad Kha, Ciprian-Octavian Truică
URN:urn:nbn:de:bvb:384-opus4-1038605
Frontdoor URLhttps://opus.bibliothek.uni-augsburg.de/opus4/103860
URL:https://aclanthology.org/2022.gwll-1.3
ISBN:979-10-95546-92-4OPAC
Parent Title (English):Proceedings of Globalex Workshop on Linked Lexicography (GWLL 2022) within the 13th Language Resources and Evaluation Conference, 20-25 June 2022, Marseille, France
Publisher:European Language Resources Association
Place of publication:Paris
Editor:Ilan Kernerman, Simon Kre
Type:Conference Proceeding
Language:English
Year of first Publication:2022
Publishing Institution:Universität Augsburg
Release Date:2023/05/03
First Page:10
Last Page:18
Institutes:Philologisch-Historische Fakultät
Philologisch-Historische Fakultät / Angewandte Computerlinguistik
Philologisch-Historische Fakultät / Angewandte Computerlinguistik / Lehrstuhl für Angewandte Computerlinguistik (ACoLi)
Dewey Decimal Classification:4 Sprache / 40 Sprache / 400 Sprache
Licence (German):CC-BY-NC 4.0: Creative Commons: Namensnennung - Nicht kommerziell (mit Print on Demand)