Applying Occam's Razor to transformer-based dependency parsing: what works, what doesn't, and what is really necessary
The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aims of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A multi-task training setup outputting additional UD features may contort results. Taking these insights together, we propose a simple but widely applicable parser architecture and configuration, achieving new state-of-the-art results (in terms of LAS) for 10 out of 12 diverse languages.
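The record itself contains no code, but the abstract's central object, a graph-based dependency parser reading pre-trained transformer embeddings, is easy to sketch. Below is a minimal, illustrative PyTorch sketch of deep biaffine arc scoring (Dozat and Manning, 2017), the scoring scheme that graph-based parsers in the STEPS family typically build on. All class names, dimensions, and the XLM-R-sized embeddings are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class BiaffineArcScorer(nn.Module):
    """Illustrative sketch: score every (head, dependent) token pair biaffinely.

    Not the STEPS implementation; a generic graph-based parsing head
    over contextualized embeddings, following Dozat and Manning (2017).
    """

    def __init__(self, embed_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Separate MLPs specialize each token's contextualized embedding
        # into a "head" role representation and a "dependent" role representation.
        self.head_mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # Biaffine weight matrix; the +1 row adds a bias term on the head side.
        self.weight = nn.Parameter(torch.empty(hidden_dim + 1, hidden_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim), e.g. the output of XLM-R.
        h = self.head_mlp(embeddings)                 # (B, T, H)
        d = self.dep_mlp(embeddings)                  # (B, T, H)
        h = torch.cat([h, torch.ones_like(h[..., :1])], dim=-1)  # (B, T, H+1)
        # scores[b, i, j] = score of token i being the head of token j.
        return torch.einsum("bih,hk,bjk->bij", h, self.weight, d)


# Usage with hypothetical shapes (768 = XLM-R base hidden size):
scorer = BiaffineArcScorer(embed_dim=768)
emb = torch.randn(2, 10, 768)      # batch of 2 sentences, 10 tokens each
arc_scores = scorer(emb)           # (2, 10, 10) head-dependent arc scores
```

Decoding then typically selects the highest-scoring head per token, optionally subject to a tree constraint (e.g. Chu-Liu/Edmonds); dependency labels are usually scored by a second biaffine over the same role representations.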
Author: | Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1056345 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/105634 |
ISBN: | 978-1-954085-80-0 |
Parent Title (English): | Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021), August 6, 2021, Bangkok, Thailand (online) |
Publisher: | Association for Computational Linguistics |
Place of publication: | Stroudsburg, PA |
Editor: | Stephan Oepen, Kenji Sagae, Reut Tsarfaty, Gosse Bouma, Djamé Seddah, Daniel Zeman |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2021 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2023/07/10 |
First Page: | 131 |
Last Page: | 144 |
DOI: | https://doi.org/10.18653/v1/2021.iwpt-1.13 |
Institutes: | Fakultät für Angewandte Informatik; Fakultät für Angewandte Informatik / Institut für Informatik; Fakultät für Angewandte Informatik / Institut für Informatik / Professur für Sprachverstehen mit der Anwendung Digital Humanities |
Dewey Decimal Classification: | 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science |
Licence (German): | CC-BY 4.0: Creative Commons: Attribution (with Print on Demand) |