Glitch in the matrix: A large scale benchmark for content driven audio–visual forgery detection and localization

Stefanov, K; Gedeon, T; Cai, Z; Dhall, A; Hayat, M; Ghosh, S

Please use this identifier to cite or link to this item: http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/4493

Full metadata record

DC Field	Value	Language
dc.contributor.author	Cai, Z	-
dc.contributor.author	Ghosh, S	-
dc.contributor.author	Dhall, A	-
dc.contributor.author	Gedeon, T	-
dc.contributor.author	Stefanov, K	-
dc.contributor.author	Hayat, M	-
dc.date.accessioned	2024-05-19T05:07:56Z	-
dc.date.available	2024-05-19T05:07:56Z	-
dc.date.issued	2024-05-19	-
dc.identifier.uri	http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/4493	-
dc.description.abstract	Abstract: Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes and are centered around the binary classification task of detecting whether a video is real or fake. This is because available benchmark datasets contain mostly visual-only modifications present in the entirety of the video. However, a sophisticated deepfake may include small segments of audio or audio–visual manipulations that can completely change the meaning of the video content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual and audio–visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations. We further improve the baseline method (i.e. BA-TFD+) by replacing the backbone with a Multiscale Vision Transformer and guiding the training process with contrastive, frame classification, boundary matching and multimodal boundary matching loss functions. The quantitative analysis demonstrates the superiority of BA-TFD+ on temporal forgery localization and deepfake detection tasks using several benchmark datasets including our newly proposed dataset. The dataset, models and code are available at https://github.com/ControlNet/LAV-DF.	en_US
dc.language.iso	en_US	en_US
dc.subject	Datasets	en_US
dc.subject	Deepfake	en_US
dc.subject	Localization	en_US
dc.subject	Detection	en_US
dc.title	Glitch in the matrix: A large scale benchmark for content driven audio–visual forgery detection and localization	en_US
dc.type	Article	en_US
Appears in Collections:	Year-2023

Files in This Item:

File	Description	Size	Format
Full Text.pdf		867.13 kB	Adobe PDF
