A deepfake detection system built on Vision Transformers (ViT) using Facebook's Data-efficient Image Transformers (DeiT) architecture, fine-tuned for binary classification of real and fake images.
This repository contains the implementation and validation of a deepfake detection model based on the DeiT (Data-efficient Image Transformer) architecture. On its test data the model reaches 98.71% accuracy and 99.93% AUROC in distinguishing real from synthetically generated (deepfake) images, and it is additionally validated on three benchmark datasets.
Key Features:
- Fine-tuned `facebook/deit-base-patch16-224` Vision Transformer
- Multi-GPU and TPU training support
- Validated on multiple benchmark datasets (CelebDF-v2, FaceForensics++, OpenForensics)
- High accuracy (98.71%) and AUROC (99.93%) on test data
- Face detection and alignment preprocessing pipeline
The model was trained on the Deepfake and Real Images dataset, where it achieves the following scores on the test data:
| Metric | Score |
|---|---|
| Accuracy | 98.71% |
| Macro F1-Score | 98.71% |
| AUROC | 99.93% |
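For readers less familiar with these metrics, the sketch below shows how accuracy, macro F1, and AUROC can be computed from labels and scores. It is plain Python for illustration only: the labels and scores are made-up toy data rather than model outputs, treating Real (1) as the positive class for AUROC is an assumption, and the notebooks may compute the metrics differently (for example with scikit-learn).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def auroc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (0: Fake, 1: Real); scores are P(Real).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
scores = [0.1, 0.2, 0.6, 0.9, 0.8, 0.4, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"macro_f1={macro_f1(y_true, y_pred):.4f}")
print(f"auroc={auroc(y_true, scores):.4f}")
```

Macro F1 averages the per-class F1 scores without weighting by class size, so it penalizes a model that does well on only one of the two classes.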
The model has been extensively validated on multiple standard deepfake detection benchmarks:
- CelebDF-v2: Industry-standard celebrity deepfake dataset
- FaceForensics++: Large-scale forensics dataset with multiple manipulation techniques
- OpenForensics: Open-world deepfake detection benchmark
Detailed validation results and metrics are available in the respective validation notebooks.
Base Model: facebook/deit-base-patch16-224
DeiT (Data-efficient Image Transformer) is a Vision Transformer architecture that:
- Uses a transformer encoder architecture adapted for image classification
- Processes 224×224 images with 16×16 patches
- Employs distillation techniques for efficient training
- Contains approximately 86M parameters
Classification Head: Binary classifier (Real vs. Fake)
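The patch arithmetic behind these numbers can be checked in a few lines. This is a sketch of the token-count calculation only: the function names are illustrative, and the default of one extra token assumes a variant with a single class token (distilled DeiT variants add a second, distillation token).

```python
def vit_sequence_length(image_size=224, patch_size=16, extra_tokens=1):
    """Number of tokens the transformer encoder sees for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size   # 224 // 16 = 14
    num_patches = patches_per_side ** 2           # 14 * 14 = 196
    return num_patches + extra_tokens


def flattened_patch_dim(patch_size=16, channels=3):
    """Length of one RGB patch once flattened, before the linear projection."""
    return patch_size * patch_size * channels


print(vit_sequence_length())                 # patches plus one class token
print(vit_sequence_length(extra_tokens=2))   # with an additional distillation token
print(flattened_patch_dim())                 # values per flattened 16x16 RGB patch
```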
pip install torch torchvision
pip install transformers datasets
pip install scikit-learn imbalanced-learn
pip install facenet-pytorch opencv-python
pip install tqdm pandas numpy pillow
pip install torch_xla[tpu]
from transformers import AutoModelForImageClassification, AutoImageProcessor
from PIL import Image
# Load the model and processor
model = AutoModelForImageClassification.from_pretrained("sakshamkr1/deepfake-fb-deit-vit-224")
processor = AutoImageProcessor.from_pretrained("sakshamkr1/deepfake-fb-deit-vit-224")
# Load and preprocess image (convert to RGB so grayscale or RGBA files match the 3-channel input)
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
# Get prediction
outputs = model(**inputs)
logits = outputs.logits
predicted_class = logits.argmax(-1).item()
# 0: Fake, 1: Real
print(f"Prediction: {'Real' if predicted_class == 1 else 'Fake'}")
Use the DeitFake_complete.ipynb notebook, which includes:
- Data loading and preparation
- Data augmentation and preprocessing
- Model definition and training configuration
- Training with mixed precision
- Evaluation and metrics visualization
Use the DeitFake_retrain.ipynb notebook to fine-tune the model on your own dataset.
Three validation notebooks are provided:
- `DeitFake_Validation1_celebdf-v2.ipynb`: CelebDF-v2 validation
- `DeitFake_Validation2_FF++.ipynb`: FaceForensics++ validation
- `DeitFake_Validation3_OpenForensics_Official_TestDev.ipynb`: OpenForensics validation
DeitFake/
├── DeitFake_complete.ipynb                                    # Complete training pipeline
├── DeitFake_retrain.ipynb                                     # Retraining on custom data
├── DeitFake_Validation1_celebdf-v2.ipynb                      # CelebDF-v2 validation
├── DeitFake_Validation2_FF++.ipynb                            # FaceForensics++ validation
├── DeitFake_Validation3_OpenForensics_Official_TestDev.ipynb  # OpenForensics validation
└── README.md                                                  # This file
- Epochs: 5
- Learning Rate: 2e-5
- Batch Size: Configured for multi-GPU/TPU
- Weight Decay: 0.01
- Mixed Precision: Enabled (fp16)
- Optimizer: AdamW
- Random horizontal flipping
- Color jittering
- Random rotation
- Normalization with ImageNet statistics
- Random over-sampling applied to balance real and fake classes during training
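Random over-sampling duplicates randomly chosen minority-class samples until every class is as large as the largest one. The sketch below illustrates the idea in plain Python on toy file names. It is not the repository's code: the function name, the file names, and the seed are illustrative assumptions, and the actual notebooks may use a library implementation such as `RandomOverSampler` from `imbalanced-learn`.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes match the largest."""
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap to the largest class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: 5 fake (0) vs 2 real (1) images; file names are hypothetical.
samples = [f"img_{i}.jpg" for i in range(7)]
labels = [0, 0, 0, 0, 0, 1, 1]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Over-sampling should be applied to the training split only, so that duplicated images do not leak into validation or test data.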
If you use this work in your research, please cite the paper:
@article{KUMAR2026100734,
title = {DeiTFake: Deepfake detection model using DeiT multi-stage training},
journal = {Array},
pages = {100734},
year = {2026},
issn = {2590-0056},
doi = {https://doi.org/10.1016/j.array.2026.100734},
url = {https://www.sciencedirect.com/science/article/pii/S2590005626000573},
author = {Saksham Kumar and Ashish Singh and Srinivasarao Thota and Sunil Kumar Singh and Chandan Kumar},
keywords = {DeepFake detection, DeiT, Vision transformers, Transfer learning, Progressive training, OpenForensics},
abstract = {Deepfakes are major threats to the integrity of digital media. We propose DeiTFake, a DeiT-based transformer and a two-stage progressive training strategy with increasing augmentation complexity. The approach applies an initial transfer-learning phase with standard augmentations, followed by a fine-tuning phase using advanced affine and color-based augmentations. We use DeiT models pre-trained weights, providing a strong initialization for learning manipulation artifacts, increasing the robustness of the detection model. Trained on a face-cropped dataset derived from the OpenForensics dataset (190,335 images), DeiTFake achieves 98.71% accuracy after stage one and 99.22% accuracy with an AUROC of 99.97%, after stage two, achieving strong performance under the same face-level evaluation setting. We analyze augmentation impact and training schedules, and provide practical benchmarks for facial deepfake detection.}
}
- Model: Based on Facebook AI's DeiT (Data-efficient Image Transformers)
- Framework: Built with Hugging Face Transformers
- Training Dataset: Deepfake and Real Images on Kaggle
- Validation Datasets:
  - CelebDF-v2
  - FaceForensics++
  - OpenForensics
The trained model is available on Hugging Face Hub: sakshamkr1/deepfake-fb-deit-vit-224
- Research purposes related to deepfake detection
- Binary classification of images as Real or Fake
- Benchmarking deepfake detection algorithms
- Designed for face images; may not generalize to other types of deepfakes
- Performance may vary on deepfakes generated by methods not represented in training data
- Requires face detection preprocessing for optimal results
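Because the model expects face crops, a detector's bounding box is commonly expanded by a small margin and clamped to the image bounds before cropping. The helper below is a hypothetical sketch of that single step: the function name, the 20% margin, and the example box are assumptions, not values taken from this repository. The box itself would come from a face detector, such as one provided by `facenet-pytorch`.

```python
def expand_and_clamp_box(box, image_width, image_height, margin=0.2):
    """Grow an (x1, y1, x2, y2) box by `margin` of its size per side, clamped to the image."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    left = max(0, int(round(x1 - pad_x)))
    top = max(0, int(round(y1 - pad_y)))
    right = min(image_width, int(round(x2 + pad_x)))
    bottom = min(image_height, int(round(y2 + pad_y)))
    return left, top, right, bottom


# Hypothetical detector output on a 640x480 image.
crop_box = expand_and_clamp_box((100, 50, 300, 250), 640, 480)
print(crop_box)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the result can be passed as `image.crop(crop_box)` before handing the crop to the processor.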
This model should be used responsibly and ethically:
- Not for surveillance without proper consent
- Not for discriminatory purposes
- Consider privacy implications when processing personal images
- Be aware of potential biases in training data
This work is licensed under the Apache-2.0 license.
For questions or issues, please open an issue in this repository.
Note: This is a research project. Performance may vary depending on the nature and quality of input images and the specific deepfake generation techniques used.