Please use this identifier to cite or link to this item: http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/1324
Title: Fbhash: a new similarity hashing scheme for digital forensics
Authors: Chang, D.
Ghosh, M.
Sanadhya, S.K.
Singh, M.
White, D.R.
Keywords: Data fingerprinting
Similarity digests
Fuzzy hashing
TF-IDF
Cosine-similarity
Issue Date: 23-Aug-2019
Abstract: With the rapid growth of the World Wide Web and Internet of Things, a huge amount of digital data is being produced every day. Digital forensics investigators face an uphill task when they have to manually screen through and examine tons of such data during a crime investigation. To expedite this process, several automated techniques have been proposed and are being used in practice. Among which tools based on Approximate Matching algorithms have gained prominence, e.g., ssdeep, sdhash, mvHash etc. These tools produce hash signatures for all the files to be examined, compute a similarity score and then compare it with a known reference set to filter out known good as well as bad files. In this way, exact as well as similar matches can be screened out. However, all of these schemes have been shown to be prone to active adversary attack, whereby an attacker, by making feasible changes in the content of the file, intelligently modifies the final hash signature produced to evade detection. Thus, an alternate hashing scheme is required which can resist this attack. In this work, we propose a new Approximate Matching scheme termed as - FbHash. We show that our scheme is secure against active attacks and detects similarity with 98% accuracy. We also provide a detailed comparative analysis with other existing schemes and show that our scheme has a 28% higher accuracy rate than other schemes for uncompressed file format (e.g., text files) and 50% higher accuracy rate for compressed file format (e.g., docx etc.). Our proposed scheme is able to correlate a fragment as small as 1% to the source file with 100% detection rate and able to detect commonality as small as 1% between two documents with appropriate similarity score. Further, our scheme also produces the least false negatives in comparison to other schemes. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.
URI: http://localhost:8080/xmlui/handle/123456789/1324
Appears in Collections:Year-2019

Files in This Item:
File Description SizeFormat 
Full text.pdf1.95 MBAdobe PDFView/Open    Request a copy


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.