MuLMS: A multi-layer annotated text corpus for information extraction in the materials science domain

Keeping track of all relevant recent publications and experimental results for a research area is a challenging task. Prior work has demonstrated the efficacy of information extraction models in various scientific areas. Recently, several datasets have been released for the yet understudied materials science domain. However, these datasets focus on sub-problems such as parsing synthesis procedures or on subdomains, e.g., solid oxide fuel cells. In this resource paper, we present MuLMS, a new dataset of 50 open-access articles, spanning seven sub-domains of materials science. The corpus has been annotated by domain experts with several layers ranging from named entities over relations to frame structures. We present competitive neural models for all tasks and demonstrate that multi-task training with existing related resources leads to benefits.

Metadata
Author:Timo Pierre Schrader, Matteo Finco, Stefan Grünewald, Felix Hildebrand, Annemarie Friedrich
URN:urn:nbn:de:bvb:384-opus4-1267344
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/126734
ISBN:979-8-89176-020-2
Parent Title (English):Proceedings of the Second Workshop on Information Extraction from Scientific Publications (WIESP 2023), 1 November 2023, Bali, Indonesia
Publisher:Association for Computational Linguistics (ACL)
Place of publication:Stroudsburg, PA
Editor:Tirthankar Ghosal, Felix Grezes, Thomas Allen, Kelly Lockhart, Alberto Accomazzi, Sergi Blanco-Cuaresma
Type:Conference Proceeding
Language:English
Year of first Publication:2023
Publishing Institution:Universität Augsburg
Release Date:2025/12/05
First Page:84
Last Page:100
DOI:https://doi.org/10.18653/v1/2023.wiesp-1.11
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Computerlinguistik
Dewey Decimal Classification:0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing and computer science
Licence (German):CC-BY 4.0: Creative Commons: Attribution