Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis

Studiawan, Hudan; Breitinger, Frank; Scanlon, Mark

doi:10.1016/j.fsidi.2025.301982

Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis

Hudan Studiawan, Frank Breitinger, Mark Scanlon

Large language models (LLMs) have widespread adoption in many domains, including digital forensics. While prior research has largely centered on case studies and examples demonstrating how LLMs can assist forensic investigations, deeper explorations remain limited, i.e., a standardized approach for precise performance evaluations is lacking. Inspired by the NIST Computer Forensic Tool Testing Program, this paper proposes a standardized methodology to quantitatively evaluate the application of LLMs for digital forensic tasks, specifically in timeline analysis. The paper describes the components of the methodology, including the dataset, timeline generation, and ground truth development. In addition, the paper recommends the use of BLEU and ROUGE metrics for the quantitative evaluation of LLMs through case studies or tasks involving timeline analysis. Experimental results using ChatGPT demonstrate that the proposed methodology can effectively evaluate LLM-based forensic timelineLarge language models (LLMs) have widespread adoption in many domains, including digital forensics. While prior research has largely centered on case studies and examples demonstrating how LLMs can assist forensic investigations, deeper explorations remain limited, i.e., a standardized approach for precise performance evaluations is lacking. Inspired by the NIST Computer Forensic Tool Testing Program, this paper proposes a standardized methodology to quantitatively evaluate the application of LLMs for digital forensic tasks, specifically in timeline analysis. The paper describes the components of the methodology, including the dataset, timeline generation, and ground truth development. In addition, the paper recommends the use of BLEU and ROUGE metrics for the quantitative evaluation of LLMs through case studies or tasks involving timeline analysis. Experimental results using ChatGPT demonstrate that the proposed methodology can effectively evaluate LLM-based forensic timeline analysis. Finally, we discuss the limitations of applying LLMs to forensic timeline analysis.…

Metadaten
Author:	Hudan Studiawan, Frank Breitinger GND, Mark Scanlon
URN:	urn:nbn:de:bvb:384-opus4-1263286
Frontdoor URL	https://opus.bibliothek.uni-augsburg.de/opus4/126328
ISSN:	2666-2817OPAC
Parent Title (English):	Forensic Science International: Digital Investigation
Publisher:	Elsevier BV
Place of publication:	Amsterdam
Type:	Article
Language:	English
Year of first Publication:	2025
Publishing Institution:	Universität Augsburg
Release Date:	2025/11/14
Volume:	54
Issue:	Supplement
First Page:	301982
DOI:	https://doi.org/10.1016/j.fsidi.2025.301982
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Cybersicherheit
Dewey Decimal Classification:	0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):	CC-BY-NC-ND 4.0: Creative Commons: Namensnennung - Nicht kommerziell - Keine Bearbeitung

Open Access

Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis

Download full text files

Export metadata

Statistics

Additional Services