A fuzzy hashing approach based on random sequences and Hamming distance

Hash functions are well-known methods in computer science for mapping arbitrarily large inputs to bit strings of a fixed length that serve as unique fingerprints identifying the input. A key property of cryptographic hash functions is that even if only one bit of the input is changed, the output behaves pseudo-randomly, and therefore similar files cannot be identified. However, in computer forensics it is also necessary to find similar files (e.g. different versions of a file), which is why we need a similarity-preserving hash function, also called a fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea of 'rebuilding' an input as well as possible using a fixed set of randomly chosen byte sequences, called building blocks, of byte length l (e.g. l = 128).
The procedure is as follows: slide through the input byte by byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with a Hamming distance smaller than a certain threshold contributes to the file's bbHash. We discuss the advantages and disadvantages of bbHash compared with other fuzzy hashing approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.
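
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how matching building blocks are encoded into the final hash, so the digest below is simply the list of matching block indices in input order. The function names, the bit-level Hamming distance, the seeded generation of blocks, and all parameter values are assumptions made for illustration only.

```python
import random
from typing import List


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings (assumed bit-level)."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def make_building_blocks(count: int, length: int, seed: int = 0) -> List[bytes]:
    """Hypothetical helper: a fixed, reproducible set of random byte sequences."""
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(length)) for _ in range(count)]


def bbhash_sketch(data: bytes, blocks: List[bytes], threshold: int) -> List[int]:
    """Slide through the input byte by byte and record every building block
    whose Hamming distance to the current window is below the threshold.

    The output encoding (a list of block indices) is an assumption; the
    abstract only says that such blocks 'contribute' to the bbHash.
    """
    if not blocks:
        return []
    length = len(blocks[0])
    digest = []
    for offset in range(len(data) - length + 1):
        window = data[offset:offset + length]
        for index, block in enumerate(blocks):
            if hamming_distance(window, block) < threshold:
                digest.append(index)
    return digest


# Usage: embed one building block in an input and check that it is found.
blocks = make_building_blocks(count=16, length=8, seed=1)
sample = b"header" + blocks[3] + b"trailer"
digest = bbhash_sketch(sample, blocks, threshold=1)  # threshold=1 keeps exact matches only
```

The nested loop makes the cost proportional to input length times the number of building blocks, which is why the block count and length l matter in practice. Inputs shorter than l produce an empty digest in this sketch.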

Metadata
Author:Frank Breitinger, Harald Baier
URN:urn:nbn:de:bvb:384-opus4-1178205
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/117820
URL:https://commons.erau.edu/adfsl/2012/wednesday/15
Parent Title (English):7th Annual ADFSL Conference on Digital Forensics, Security and Law, Richmond, Virginia, USA, May 30-31, 2012
Publisher:Association of Digital Forensics, Security and Law (ADFSL)
Place of publication:Ponce Inlet, FL
Editor:Glenn S. Dardick
Type:Conference Proceeding
Language:English
Year of first Publication:2012
Publishing Institution:Universität Augsburg
Release Date:2025/01/07
First Page:89
Last Page:100
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Cybersicherheit
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing & computer science
Licence (German):CC-BY-NC-ND 4.0: Creative Commons: Attribution - NonCommercial - NoDerivatives (with Print on Demand)