Three real-world datasets and neural computational models for classification tasks in patent landscaping

Pujari, Subhash; Strötgen, Jannik; Giereth, Mark; Gertz, Michael; Friedrich, Annemarie

Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is yet little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines, proposing a novel model that takes into account textual information from the patents’ full texts as well as embeddings created based on the patents’ CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.

Metadata
Author:	Subhash Pujari, Jannik Strötgen, Mark Giereth, Michael Gertz, Annemarie Friedrich
URN:	urn:nbn:de:bvb:384-opus4-1055629
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/105562
URL:	https://aclanthology.org/2022.emnlp-main.791
Parent Title (English):	Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7-11 December 2022, Abu Dhabi, United Arab Emirates
Publisher:	Association for Computational Linguistics
Place of publication:	Stroudsburg, PA
Editor:	Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Type:	Conference Proceeding
Language:	English
Date of Publication (online):	2023/07/05
Year of first Publication:	2022
Publishing Institution:	Universität Augsburg
Release Date:	2023/07/10
First Page:	11498
Last Page:	11513
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Computerlinguistik
Dewey Decimal Classification:	0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence:	CC-BY 4.0: Creative Commons: Attribution

Open Access
