Applying Occam's Razor to transformer-based dependency parsing: what works, what doesn't, and what is really necessary
The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aims of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A multi-task training setup outputting additional UD features may contort results. Taking these insights together, we propose a simple but widely applicable parser architecture and configuration, achieving new state-of-the-art results (in terms of LAS) for 10 out of 12 diverse languages.
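The record itself contains no code, but the abstract's central object, a graph-based dependency parser reading pre-trained transformer embeddings, is easy to sketch. Below is a minimal, illustrative PyTorch sketch of deep biaffine arc scoring (Dozat and Manning, 2017), the scoring scheme that graph-based parsers in the STEPS family typically build on. All class names, dimensions, and the XLM-R-sized embeddings are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class BiaffineArcScorer(nn.Module):
    """Illustrative sketch: score every (head, dependent) token pair biaffinely.

    Not the STEPS implementation; a generic graph-based parsing head
    over contextualized embeddings, following Dozat and Manning (2017).
    """

    def __init__(self, embed_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Separate MLPs specialize each token's contextualized embedding
        # into a "head" role representation and a "dependent" role representation.
        self.head_mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # Biaffine weight matrix; the +1 row adds a bias term on the head side.
        self.weight = nn.Parameter(torch.empty(hidden_dim + 1, hidden_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim), e.g. the output of XLM-R.
        h = self.head_mlp(embeddings)                 # (B, T, H)
        d = self.dep_mlp(embeddings)                  # (B, T, H)
        h = torch.cat([h, torch.ones_like(h[..., :1])], dim=-1)  # (B, T, H+1)
        # scores[b, i, j] = score of token i being the head of token j.
        return torch.einsum("bih,hk,bjk->bij", h, self.weight, d)


# Usage with hypothetical shapes (768 = XLM-R base hidden size):
scorer = BiaffineArcScorer(embed_dim=768)
emb = torch.randn(2, 10, 768)      # batch of 2 sentences, 10 tokens each
arc_scores = scorer(emb)           # (2, 10, 10) head-dependent arc scores
```

Decoding then typically selects the highest-scoring head per token, optionally subject to a tree constraint (e.g. Chu-Liu/Edmonds); dependency labels are usually scored by a second biaffine over the same role representations.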
Author: | Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1056345 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/105634 |
ISBN: | 978-1-954085-80-0 |
Parent Title (English): | Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021), August 6, 2021, Bangkok, Thailand (online) |
Publisher: | Association for Computational Linguistics |
Place of publication: | Stroudsburg, PA |
Editor: | Stephan Oepen, Kenji Sagae, Reut Tsarfaty, Gosse Bouma, Djamé Seddah, Daniel Zeman |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2021 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2023/07/10 |
First Page: | 131 |
Last Page: | 144 |
DOI: | https://doi.org/10.18653/v1/2021.iwpt-1.13 |
Institutes: | Fakultät für Angewandte Informatik; Fakultät für Angewandte Informatik / Institut für Informatik; Fakultät für Angewandte Informatik / Institut für Informatik / Professur für Sprachverstehen mit der Anwendung Digital Humanities |
Dewey Decimal Classification: | 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science |
Licence (German): | CC-BY 4.0: Creative Commons: Attribution (with Print on Demand) |