Supertagging the long tail with tree-structured decoding of complex categories

  • Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories’ internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.

Download full text files

Export metadata

Statistics

Number of document requests

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:Jakob PrangeGND, Nathan Schneider, Vivek Srikumar
URN:urn:nbn:de:bvb:384-opus4-1178314
Frontdoor URLhttps://opus.bibliothek.uni-augsburg.de/opus4/117831
ISSN:2307-387XOPAC
Parent Title (English):Transactions of the Association for Computational Linguistics
Publisher:MIT Press
Place of publication:Cambridge, MA
Type:Article
Language:English
Year of first Publication:2021
Publishing Institution:Universität Augsburg
Release Date:2025/01/07
Volume:9
First Page:243
Last Page:260
DOI:https://doi.org/10.1162/tacl_a_00364
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Computerlinguistik
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):CC-BY 4.0: Creative Commons: Namensnennung (mit Print on Demand)