ASL-MDFD: Adversarial Self-Supervised Learning for Generalizable GAN-Resilient Multimodal Deepfake Detection
DOI:
https://doi.org/10.63412/btg5gj03
Keywords:
Deepfake Detection, Generative Adversarial Networks, Self-Supervised Learning, Adversarial Training, Multimodal Fusion, Cross-Dataset Generalization
Abstract
The rise of hyper-realistic synthetic media generated by Generative Adversarial Networks (GANs) and diffusion models poses significant challenges to deepfake detection systems, particularly in cross-dataset and cross-GAN generalization. In this work, we propose ASL-MDFD: a novel framework that unifies adversarial training, Self-Supervised Learning (SSL), and multimodal fusion to detect deepfakes across diverse sources. Our approach leverages rotation prediction, patch-shuffling recovery, and contrastive audio-visual alignment as pretext tasks to learn intrinsic representations without relying heavily on labels. Simultaneously, adversarially perturbed examples generated with Projected Gradient Descent (PGD) simulate artifacts from unseen GANs, improving model robustness. The multimodal architecture integrates visual, audio, and temporal streams using cross-modal attention to detect inconsistencies in facial textures, voice artifacts, and motion dynamics. Evaluated across the FaceForensics++, DFDC, Celeb-DF, StyleGAN3, and StarGANv2 datasets, ASL-MDFD achieves state-of-the-art performance, including 92.3% AUC on Celeb-DF and 88.7% accuracy on StyleGAN3 fakes, significantly outperforming existing baselines. Our results demonstrate the effectiveness of combining SSL, adversarial resilience, and multimodal cues in building robust, generalizable deepfake detectors.
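To make the adversarial-training component of the abstract concrete, the following is a minimal PGD sketch in PyTorch showing how perturbed input frames could be generated to simulate artifacts from unseen GANs. It is an illustrative reconstruction, not the authors' implementation; the function name, the model interface, and the hyperparameters (eps, alpha, steps) are assumptions.

```python
# Hypothetical PGD sketch (PyTorch); illustrates the adversarial-perturbation
# step described in the abstract, not the paper's actual code.
import torch
import torch.nn.functional as F

def pgd_perturb(model, frames, labels, eps=8/255, alpha=2/255, steps=10):
    """Generate L-infinity PGD adversarial examples around `frames`."""
    x_adv = frames.clone().detach()
    # Random start inside the epsilon ball, clipped to valid pixel range.
    x_adv = torch.clamp(x_adv + torch.empty_like(x_adv).uniform_(-eps, eps), 0.0, 1.0)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), labels)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, frames - eps), frames + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```

Such perturbed frames would then be mixed into training batches so the detector learns features that remain discriminative under distribution shifts resembling unseen GAN artifacts.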
License
Copyright (c) 2025 International Journal of Global Innovations and Solutions

This work is licensed under a Creative Commons Attribution 4.0 International License.