Malware family classification via efficient Huffman features

O’Shaughnessy, Stephen; Breitinger, Frank

doi:10.1016/j.fsidi.2021.301192

As malware evolves and becomes more complex, researchers strive to develop detection and classification schemes that abstract away from the internal intricacies of binary code to represent malware without the need for architectural knowledge or invasive analysis procedures. Such approaches can reduce the complexities of feature generation and simplify the analysis process. In this paper, we present efficient Huffman features (eHf), a novel compression-based approach to feature construction, based on Huffman encoding, where malware features are represented in a compact format, without the need for intrusive reverse-engineering or dynamic analysis processes. We demonstrate the viability of eHf as a solution for classifying malware into their respective families on a large malware corpus of 15 k samples, indicative of the current threat landscape. We evaluate eHf against current compression-based alternatives and show that our method is comparable or superior for classification accuracy, while exhibiting considerably greater runtime efficiency. Finally, we demonstrate that eHf is resilient against code reordering obfuscation.
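
For readers unfamiliar with Huffman encoding, the following minimal sketch shows one generic way to turn a binary's raw bytes into a fixed-length vector of Huffman code lengths. It is an illustration only and is not the eHf construction from the paper, whose details are not given in the abstract. The function names `huffman_code_lengths` and `feature_vector`, the choice of single bytes as symbols, and the 256-entry layout are all assumptions made for this example.

```python
from __future__ import annotations

import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return the Huffman code length in bits for each byte value in data.

    Illustrative only: symbols are single bytes (an assumption, not the
    paper's definition of a feature).
    """
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs one bit to be encoded.
        return {next(iter(freq)): 1}

    # Heap entries are (weight, unique tie-breaker, symbols in the subtree).
    # Byte values are below 256, so merged nodes get tie-breakers from 256 up.
    heap = [(weight, sym, [sym]) for sym, weight in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = 256
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return lengths


def feature_vector(data: bytes) -> list[int]:
    """Hypothetical 256-entry vector: code length per byte value, 0 if absent."""
    lengths = huffman_code_lengths(data)
    return [lengths.get(value, 0) for value in range(256)]
```

In this sketch the vector depends only on byte counts, so reordering the bytes of the input leaves it unchanged. Whether the resilience to code reordering reported in the abstract arises in the same way is not stated there.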

Metadata
Author:	Stephen O’Shaughnessy, Frank Breitinger
URN:	urn:nbn:de:bvb:384-opus4-1175667
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/117566
ISSN:	2666-2817
Parent Title (English):	Forensic Science International: Digital Investigation
Publisher:	Elsevier BV
Type:	Article
Language:	English
Year of first Publication:	2021
Publishing Institution:	Universität Augsburg
Release Date:	2024/12/16
Volume:	37
Issue:	Supplement
First Page:	301192
DOI:	https://doi.org/10.1016/j.fsidi.2021.301192
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Cybersicherheit
Dewey Decimal Classification:	0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence:	CC-BY-NC-ND 4.0: Creative Commons: Attribution - NonCommercial - NoDerivatives

Open Access
