Three real-world datasets and neural computational models for classification tasks in patent landscaping
Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is yet little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines, proposing a novel model that takes into account textual information from the patents’ full texts as well as embeddings created based on the patents’ CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.


| Author: | Subhash Pujari, Jannik Strötgen, Mark Giereth, Michael Gertz, Annemarie Friedrich |
|---|---|
| URN: | urn:nbn:de:bvb:384-opus4-1055629 |
| Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/105562 |
| URL: | https://aclanthology.org/2022.emnlp-main.791 |
| Parent Title (English): | Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7-11 December 2022, Abu Dhabi, United Arab Emirates |
| Publisher: | Association for Computational Linguistics |
| Place of publication: | Stroudsburg, PA |
| Editor: | Yoav Goldberg, Zornitsa Kozareva, Yue Zhang |
| Type: | Conference Proceeding |
| Language: | English |
| Date of Publication (online): | 2023/07/05 |
| Year of first Publication: | 2022 |
| Publishing Institution: | Universität Augsburg |
| Release Date: | 2023/07/10 |
| First Page: | 11498 |
| Last Page: | 11513 |
| Institutes: | Fakultät für Angewandte Informatik |
| | Fakultät für Angewandte Informatik / Institut für Informatik |
| | Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Computerlinguistik |
| Dewey Decimal Classification: | 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing and computer science |
| Licence (German): | CC-BY 4.0: Creative Commons: Namensnennung (Attribution) |



