Three real-world datasets and neural computational models for classification tasks in patent landscaping
- Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is as yet little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines, proposing a novel model that takes into account textual information from the patents’ full texts as well as embeddings created based on the patents’ CPC labels.
We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.
Author: | Subhash Pujari, Jannik Strötgen, Mark Giereth, Michael Gertz, Annemarie Friedrich |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1055629 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/105562 |
URL: | https://aclanthology.org/2022.emnlp-main.791 |
Parent Title (English): | Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7-11 December 2022, Abu Dhabi, United Arab Emirates |
Publisher: | Association for Computational Linguistics |
Place of publication: | Stroudsburg, PA |
Editor: | Yoav Goldberg, Zornitsa Kozareva, Yue Zhang |
Type: | Conference Proceeding |
Language: | English |
Date of Publication (online): | 2023/07/05 |
Year of first Publication: | 2022 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2023/07/10 |
First Page: | 11498 |
Last Page: | 11513 |
Institutes: | Fakultät für Angewandte Informatik |
 | Fakultät für Angewandte Informatik / Institut für Informatik |
 | Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Computerlinguistik |
Dewey Decimal Classification: | 0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science |