Optimising data set creation in the cybersecurity landscape with a special focus on digital forensics: principles, characteristics, and use cases

Data sets (samples) in cybersecurity and digital forensics are important for research, training, and tool development. While the FAIR principles, data repositories and archives like Zenodo and NIST's Computer Forensic Reference Data Sets (CFReDS) enhance the accessibility and reusability of data sets, standardised practices for crafting and describing these data sets require further attention. This paper analyses the existing literature to identify the key data set (generation) characteristics, issues, desirable attributes, and use cases. Although our findings are generally applicable, i.e., to the cybersecurity domain, our special focus is on the digital forensics domain. We define principles and properties for cybersecurity-relevant data sets and their implications for the data creation process to maximise their quality, utility and applicability, taking into account specific data set use cases and data origin. We aim to guide data set creators in enhancing their data sets' value for the cybersecurity and digital forensics field.

Metadata
Author: Thomas Göbel, Frank Breitinger, Harald Baier
URN: urn:nbn:de:bvb:384-opus4-1177067
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/117706
ISSN: 2666-2825
Parent Title (English): Forensic Science International: Digital Investigation
Publisher: Elsevier BV
Place of publication: Amsterdam
Type: Article
Language: English
Year of first Publication: 2025
Publishing Institution: Universität Augsburg
Release Date: 2024/12/18
Volume: 52
First Page: 301882
DOI: https://doi.org/10.1016/j.fsidi.2025.301882
Institutes: Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Cybersicherheit
Dewey Decimal Classification: 0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence: CC-BY 4.0: Creative Commons: Attribution