Please use this identifier to cite or link to this item:
http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/4493
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Cai, Z | - |
dc.contributor.author | Ghosh, S | - |
dc.contributor.author | Dhall, A | - |
dc.contributor.author | Gedeon, T | - |
dc.contributor.author | Stefanov, K | - |
dc.contributor.author | Hayat, M | - |
dc.date.accessioned | 2024-05-19T05:07:56Z | - |
dc.date.available | 2024-05-19T05:07:56Z | - |
dc.date.issued | 2024-05-19 | - |
dc.identifier.uri | http://dspace.iitrpr.ac.in:8080/xmlui/handle/123456789/4493 | - |
dc.description.abstract | Abstract: Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes and are centered around the binary classification task of detecting whether a video is real or fake. This is because available benchmark datasets contain mostly visual-only modifications present in the entirety of the video. However, a sophisticated deepfake may include small segments of audio or audio–visual manipulations that can completely change the meaning of the video content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual and audio–visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations. We further improve the baseline method (i.e. BA-TFD+) by replacing the backbone with a Multiscale Vision Transformer and guiding the training process with contrastive, frame classification, boundary matching and multimodal boundary matching loss functions. The quantitative analysis demonstrates the superiority of BA-TFD+ on temporal forgery localization and deepfake detection tasks using several benchmark datasets, including our newly proposed dataset. The dataset, models and code are available at https://github.com/ControlNet/LAV-DF. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Datasets | en_US |
dc.subject | Deepfake | en_US |
dc.subject | Localization | en_US |
dc.subject | Detection | en_US |
dc.title | Glitch in the matrix: A large scale benchmark for content driven audio–visual forgery detection and localization | en_US |
dc.type | Article | en_US |
Appears in Collections: | Year-2023 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Full Text.pdf | | 867.13 kB | Adobe PDF | View/Open Request a copy |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.