Automation is crucial for managing the increasing volume of digital evidence. However, the absence of a clear foundation comprising a definition, classification, and common terminology has led to a fragmented landscape in which diverse interpretations of automation coexist. This resembles the wild west: some consider keyword searches or file carving to be automation while others do not. We therefore reviewed automation literature (in digital forensics and other domains), conducted three practitioner interviews, and discussed the topic with domain experts from academia. On this basis, we propose a definition and then present several considerations concerning automation for digital forensics, e.g., what we classify as no/basic automation or full automation (autonomous). We conclude that such foundational discussions are required to promote and advance the discipline through a common understanding.
Generative AI models, especially Large Language Models (LLMs) such as ChatGPT or Llama, have advanced significantly, positioning them as valuable tools for digital forensics. While initial studies have explored the potential of ChatGPT in the context of investigations, the question of to what extent LLMs can assist the forensic report writing process remains open. To answer this question, this article first examines forensic reports with the goal of generalization (e.g., finding the 'average structure' of a report). We then evaluate the strengths and limitations of LLMs for generating the different parts of a forensic report using a case study. This work thus provides valuable insights into the automation of report writing, a critical facet of digital forensics investigations. We conclude that, combined with thorough proofreading and correction, LLMs may assist practitioners during the report writing process but at this point cannot replace them.
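The section-by-section drafting workflow the abstract describes can be sketched as follows. The section list below stands in for the 'average structure' the article derives, and the prompt wording is a hypothetical assumption, not taken from the paper; any LLM call would replace the returned prompt string.

```python
# Hypothetical sketch of per-section report drafting with an LLM.
# The section names approximate an 'average' report structure and are
# illustrative assumptions, as is the prompt template.
SECTIONS = [
    "Case overview",
    "Evidence examined",
    "Methodology",
    "Findings",
    "Conclusion",
]

def section_prompt(section: str, case_notes: str) -> str:
    """Build a drafting prompt for one report section.

    The resulting draft would still require thorough proofreading and
    correction by a practitioner, as the abstract concludes.
    """
    return (
        f"Draft the '{section}' section of a digital forensic report "
        f"based only on the following case notes:\n{case_notes}"
    )

# One prompt per section; the notes string is a placeholder.
prompts = [section_prompt(s, "<case notes placeholder>") for s in SECTIONS]
```

Generating each section separately, rather than the whole report at once, keeps every prompt focused on one structural unit and makes human review of each draft more tractable.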
The current state of automation in digital forensics remains insufficiently defined. While the complexity of automated tools and methods has evolved significantly (e.g., from basic parsers to the integration of advanced techniques), it remains challenging to pinpoint the field's overall progress or to compare methods. A first step towards a solution was the work 'Automation for digital forensics: Towards a definition for the community', which defines automation but does not categorize the various methods. This work aims to address this gap and presents a first classification model for automation in digital forensics. To this end, we analyzed automation classification schemes from other disciplines (e.g., cars) and assessed various model possibilities as well as characteristics. We conclude that a 2-dimensional model with the axes 'decision' and 'level of automation' is most appropriate and provide an overview table with examples.
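The 2-dimensional model could be represented programmatically along the lines of the sketch below. The axis names come from the abstract; the concrete values on each axis are hypothetical (only 'no/basic' and 'full (autonomous)' appear in these abstracts), and the example placements are illustrative rather than entries from the paper's overview table.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    """Axis 1: who makes the decision (values are assumptions)."""
    HUMAN = "human"
    SHARED = "shared"
    MACHINE = "machine"

class Level(Enum):
    """Axis 2: level of automation, echoing the no/basic vs.
    full (autonomous) range mentioned above (intermediate values assumed)."""
    NONE = 0
    BASIC = 1
    PARTIAL = 2
    FULL = 3

@dataclass(frozen=True)
class Method:
    """A forensic method placed in the 2-D grid (decision x level)."""
    name: str
    decision: Decision
    level: Level

def classify(method: Method) -> str:
    """Render a method's position on both axes."""
    return (
        f"{method.name}: decision={method.decision.value}, "
        f"level={method.level.name.lower()}"
    )

# Illustrative placements, not taken from the paper.
carving = Method("file carving", Decision.HUMAN, Level.BASIC)
triage = Method("autonomous triage", Decision.MACHINE, Level.FULL)
```

Keeping the two axes as separate enums makes it explicit that a method with a high level of automation may still leave the decision with the human examiner.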
Fine-tuning large language models for digital forensics: case study and general recommendations
(2025)
Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored. This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, in which we evaluate multiple fine-tuned models to assess their performance. The study concludes by sharing the lessons learned from the case study.
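One step any chat-summarization fine-tune needs is turning raw chat logs into training examples. The sketch below shows a minimal data-preparation helper in a common prompt/completion JSONL layout; the field names, message schema, and prompt template are assumptions for illustration and are not specified by the paper.

```python
# Hypothetical sketch: converting a chat log plus a reference summary into
# a fine-tuning example. Schema ({sender, text} messages) and the
# prompt/completion layout are assumptions, not from the paper.
import json

def to_example(chat: list[dict], summary: str) -> dict:
    """Build one prompt/completion training record for summarization."""
    transcript = "\n".join(f"{m['sender']}: {m['text']}" for m in chat)
    return {
        "prompt": f"Summarize the following chat:\n{transcript}\nSummary:",
        "completion": " " + summary.strip(),
    }

# Illustrative placeholder data, not real case material.
chat = [
    {"sender": "A", "text": "Did you move the files?"},
    {"sender": "B", "text": "Yes, to the USB drive last night."},
]
example = to_example(chat, "B confirms moving the files to a USB drive.")
jsonl_line = json.dumps(example)  # one line of a JSONL training file
```

Writing one JSON object per line lets the resulting file be streamed directly into most fine-tuning pipelines without loading the whole dataset into memory.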