INSTITUTIONAL DIGITAL REPOSITORY

Not made for each other – Audio-Visual Dissonance-based deepfake detection and localization


dc.contributor.author Chugh, K.
dc.contributor.author Dhall, A.
dc.contributor.author Gupta, P.
dc.contributor.author Subramanian, R.
dc.date.accessioned 2021-07-03T11:38:57Z
dc.date.available 2021-07-03T11:38:57Z
dc.date.issued 2021-07-03
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/1978
dc.description.abstract We propose detection of deepfake videos based on the dissimilarity between the audio and visual modalities, termed the Modality Dissonance Score (MDS). We hypothesize that manipulation of either modality leads to disharmony between the two, e.g., loss of lip-sync and unnatural facial and lip movements. MDS is computed as an aggregate of dissimilarity scores between audio and visual segments in a video. Discriminative features are learnt for the audio and visual channels in a chunk-wise manner, employing a cross-entropy loss for the individual modalities and a contrastive loss that models inter-modality similarity. Extensive experiments on the DFDC and DeepFake-TIMIT datasets show that our approach outperforms the state-of-the-art by up to 7%. We also demonstrate temporal forgery localization, showing how our technique identifies the manipulated video segments. en_US
dc.language.iso en_US en_US
dc.subject Deepfake detection and localization en_US
dc.subject Neural networks en_US
dc.subject Modality dissonance en_US
dc.subject Contrastive loss en_US
dc.title Not made for each other – Audio-Visual Dissonance-based deepfake detection and localization en_US
dc.type Article en_US
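The abstract above describes the method at a high level: chunk-wise audio and visual embeddings trained with a per-modality cross-entropy loss plus a contrastive loss, with the video-level MDS obtained by aggregating chunk-wise dissimilarity scores. The following PyTorch code is a minimal sketch of that idea only; the toy fully connected encoders, embedding size, margin value, mean aggregation, chunk-level labels, and the localization threshold rule are all illustrative assumptions rather than the authors' actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ChunkEncoder(nn.Module):
    """Toy per-chunk encoder (an assumption, standing in for the paper's
    audio/visual streams): maps a flattened chunk to an embedding and a
    real/fake logit pair for the per-modality cross-entropy loss."""

    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )
        self.cls = nn.Linear(emb_dim, 2)  # real-vs-fake logits

    def forward(self, x):
        e = self.embed(x)
        return e, self.cls(e)


def mds_and_losses(audio_chunks, visual_chunks, labels,
                   audio_enc, visual_enc, margin=1.0):
    """audio_chunks, visual_chunks: (num_chunks, feat_dim) for one video.
    labels: (num_chunks,) long tensor, 0 = real, 1 = fake (chunk-level
    labels are an assumption here; margin is illustrative)."""
    ea, la = audio_enc(audio_chunks)
    ev, lv = visual_enc(visual_chunks)

    # Chunk-wise audio-visual dissimilarity (Euclidean distance).
    d = F.pairwise_distance(ea, ev)

    # Modality Dissonance Score: aggregate of the chunk-wise scores
    # (mean aggregation is an illustrative choice).
    mds = d.mean()

    # Contrastive loss: pull the two modalities together on real chunks,
    # push them at least `margin` apart on fake chunks.
    real = (labels == 0).float()
    contrastive = (real * d.pow(2) +
                   (1.0 - real) * F.relu(margin - d).pow(2)).mean()

    # Cross-entropy on each modality's real/fake logits.
    ce = F.cross_entropy(la, labels) + F.cross_entropy(lv, labels)

    return d, mds, contrastive + ce


def localize_forgery(chunk_distances, threshold):
    """Temporal localization: flag chunks whose audio-visual dissimilarity
    exceeds a threshold (this thresholding rule is an assumption)."""
    return (chunk_distances > threshold).nonzero(as_tuple=True)[0]

At test time a video would be flagged as fake when its MDS exceeds a threshold tuned on held-out data; the same chunk-wise distances provide the coarse temporal localization of manipulated segments that the abstract mentions.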

