Fbhash: a new similarity hashing scheme for digital forensics

Ghosh, M.; Sanadhya, S.K.; Singh, M.; White, D.R.; Chang, D.

DSpace Home
→
Research Publications
→
Year-2019
→
View Item

dc.contributor.author	Chang, D.
dc.contributor.author	Ghosh, M.
dc.contributor.author	Sanadhya, S.K.
dc.contributor.author	Singh, M.
dc.contributor.author	White, D.R.
dc.date.accessioned	2019-08-23T11:43:32Z
dc.date.available	2019-08-23T11:43:32Z
dc.date.issued	2019-08-23
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/1324
dc.description.abstract	With the rapid growth of the World Wide Web and Internet of Things, a huge amount of digital data is being produced every day. Digital forensics investigators face an uphill task when they have to manually screen through and examine tons of such data during a crime investigation. To expedite this process, several automated techniques have been proposed and are being used in practice. Among which tools based on Approximate Matching algorithms have gained prominence, e.g., ssdeep, sdhash, mvHash etc. These tools produce hash signatures for all the files to be examined, compute a similarity score and then compare it with a known reference set to filter out known good as well as bad files. In this way, exact as well as similar matches can be screened out. However, all of these schemes have been shown to be prone to active adversary attack, whereby an attacker, by making feasible changes in the content of the file, intelligently modifies the final hash signature produced to evade detection. Thus, an alternate hashing scheme is required which can resist this attack. In this work, we propose a new Approximate Matching scheme termed as - FbHash. We show that our scheme is secure against active attacks and detects similarity with 98% accuracy. We also provide a detailed comparative analysis with other existing schemes and show that our scheme has a 28% higher accuracy rate than other schemes for uncompressed file format (e.g., text files) and 50% higher accuracy rate for compressed file format (e.g., docx etc.). Our proposed scheme is able to correlate a fragment as small as 1% to the source file with 100% detection rate and able to detect commonality as small as 1% between two documents with appropriate similarity score. Further, our scheme also produces the least false negatives in comparison to other schemes. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.	en_US
dc.language.iso	en_US	en_US
dc.subject	Data fingerprinting	en_US
dc.subject	Similarity digests	en_US
dc.subject	Fuzzy hashing	en_US
dc.subject	TF-IDF	en_US
dc.subject	Cosine-similarity	en_US
dc.title	Fbhash: a new similarity hashing scheme for digital forensics	en_US
dc.type	Article	en_US