The SOFC-Exp corpus and neural approaches to information extraction in the materials science domain

Friedrich, Annemarie; Adel, Heike; Tomazic, Federico; Hingerl, Johannes; Benteau, Renou; Marusczyk, Anika; Lange, Lukas

This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.

Metadata
Author:	Annemarie Friedrich, Heike Adel, Federico Tomazic, Johannes Hingerl, Renou Benteau, Anika Marusczyk, Lukas Lange
URN:	urn:nbn:de:bvb:384-opus4-1056692
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/105669
ISBN:	978-1-952148-25-5
Parent Title (English):	Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 5-10, 2020, online
Publisher:	Association for Computational Linguistics
Place of publication:	Stroudsburg, PA
Editor:	Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Type:	Conference Proceeding
Language:	English
Year of first Publication:	2020
Publishing Institution:	Universität Augsburg
Release Date:	2023/07/10
First Page:	1255
Last Page:	1268
DOI:	https://doi.org/10.18653/v1/2020.acl-main.116
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Professur für Sprachverstehen mit der Anwendung Digital Humanities
Dewey Decimal Classification:	0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence:	CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)

Open Access
