Annotating genericity: a survey, a scheme, and a corpus

Generics are linguistic expressions that make statements about or refer to kinds, or that report regularities of events. Non-generic expressions make statements about particular individuals or specific episodes. Generics are treated extensively in semantic theory (Krifka et al., 1995). In practice, it is often hard to decide whether a referring expression is generic or non-generic, and to date there is no data set that is both large and satisfactorily annotated. Such a data set would be valuable for building systems that automatically identify generic expressions, which would in turn facilitate knowledge extraction from natural language text. In this paper we take the next steps toward such an annotation endeavor.
Our contributions are: (1) we survey the most important previous projects annotating genericity, focusing on resources for English; (2) with a new agreement study, we identify problems in the annotation scheme of the largest currently available resource (ACE-2005); (3) we introduce a linguistically motivated annotation scheme for marking both clauses and their subjects with regard to their genericity; and (4) we present a corpus of MASC (Ide et al., 2010) and Wikipedia texts annotated according to our scheme, achieving substantial agreement.

Metadata
Author:Annemarie Friedrich, Alexis Palmer, Melissa Peate Sørensen, Manfred Pinkal
URN:urn:nbn:de:bvb:384-opus4-1056974
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/105697
ISBN:978-1-941643-47-1
Parent Title (English):Proceedings of The 9th Linguistic Annotation Workshop, June 5, 2015, Denver, Colorado
Publisher:Association for Computational Linguistics
Place of publication:Stroudsburg, PA
Editor:Adam Meyers, Ines Rehbein, Heike Zinsmeister
Type:Conference Proceeding
Language:English
Year of first Publication:2015
Publishing Institution:Universität Augsburg
Release Date:2023/07/10
First Page:21
Last Page:30
DOI:https://doi.org/10.3115/v1/w15-1603
Institutes:Faculty of Applied Computer Science
Faculty of Applied Computer Science / Institute of Computer Science
Faculty of Applied Computer Science / Institute of Computer Science / Chair of Language Understanding with Applications in Digital Humanities
Dewey Decimal Classification:0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence:Other open-access licence