Inducing discourse marker inventories from lexical knowledge graphs
- Discourse marker inventories are important tools for the development of both discourse parsers and corpora with discourse annotations. In this paper we explore the potential of massively multilingual lexical knowledge graphs to induce multilingual discourse marker lexicons using concept propagation methods as previously developed in the context of translation inference across dictionaries. Given one or multiple source languages with discourse marker inventories that define discourse relations as senses of potential discourse markers, as well as a large number of bilingual dictionaries that link them, directly or indirectly, with the target language, we specifically study to what extent discourse marker induction can benefit from the integration of information from different sources, what impact sense granularity has, and what limiting factors may need to be considered. Our study uses discourse marker inventories from nine European languages normalized against the discourse relation inventory of the Penn Discourse Treebank (PDTB), as well as three collections of machine-readable dictionaries with different characteristics, so that the interplay of a large number of factors can be studied.
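
To make the idea of concept propagation across dictionaries concrete, the following is a minimal illustrative sketch, not the method used in the paper. It assumes a toy graph of bilingual dictionary entries, treats translation links as symmetric, and counts how many seed entries with a known PDTB sense reach each target-language word within a bounded number of hops. The function name `propagate_senses`, the `max_hops` parameter, the support-count scoring, and the toy word pairs are all hypothetical.

```python
from collections import defaultdict, deque


def propagate_senses(edges, seeds, target_lang, max_hops=2):
    """Propagate sense labels from seed entries to target-language words.

    edges: iterable of ((lang, word), (lang, word)) translation pairs,
           treated as symmetric (an assumption of this sketch).
    seeds: dict mapping (lang, word) -> set of sense labels.
    Returns: dict mapping target word -> {sense: number of seed entries
             that reach the word within max_hops translation links}.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    support = defaultdict(lambda: defaultdict(int))
    for seed, senses in seeds.items():
        # Breadth-first search from each seed, bounded by max_hops.
        seen = {seed}
        queue = deque([(seed, 0)])
        while queue:
            node, dist = queue.popleft()
            if node[0] == target_lang and node != seed:
                for sense in senses:
                    support[node[1]][sense] += 1
            if dist < max_hops:
                for neighbour in graph[node]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append((neighbour, dist + 1))
    return {word: dict(counts) for word, counts in support.items()}


# Toy data (hypothetical): dictionary links and seed senses.
edges = [
    (("en", "but"), ("de", "aber")),
    (("fr", "mais"), ("de", "aber")),
    (("en", "but"), ("fr", "mais")),
    (("en", "because"), ("nl", "omdat")),
    (("nl", "omdat"), ("de", "weil")),
]
seeds = {
    ("en", "but"): {"Comparison.Contrast"},
    ("fr", "mais"): {"Comparison.Contrast"},
    ("en", "because"): {"Contingency.Cause"},
}
result = propagate_senses(edges, seeds, target_lang="de")
```

In this toy setting, "aber" is supported by two source languages, whereas "weil" is reached only indirectly through a pivot language. This mirrors, in a very simplified way, the abstract's questions about integrating several sources and about direct versus indirect dictionary links.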
Author: | Christian Chiarcos |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1039876 |
Frontdoor URL | https://opus.bibliothek.uni-augsburg.de/opus4/103987 |
URL: | https://aclanthology.org/2022.lrec-1.257 |
ISBN: | 979-10-95546-72-6 |
Parent Title (English): | Proceedings of the Thirteenth Language Resources and Evaluation Conference, 20-25 June 2022, Marseille, France |
Publisher: | European Language Resources Association |
Place of publication: | Paris |
Editor: | Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2022 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2023/05/16 |
First Page: | 2401 |
Last Page: | 2412 |
Institutes: | Philologisch-Historische Fakultät |
 | Philologisch-Historische Fakultät / Angewandte Computerlinguistik |
 | Philologisch-Historische Fakultät / Angewandte Computerlinguistik / Lehrstuhl für Angewandte Computerlinguistik (ACoLi) |
Dewey Decimal Classification: | 4 Language / 40 Language / 400 Language |
Licence (German): | CC-BY-NC 4.0: Creative Commons: Attribution - NonCommercial (with Print on Demand) |