DeitFake: Deepfake Detection using Vision Transformers

A deepfake detection system built on Vision Transformers (ViT) using Facebook's Data-efficient Image Transformers (DeiT) architecture, fine-tuned for binary classification of real and fake images.

🎯 Overview

This repository contains the implementation and validation of a deepfake detection model based on the DeiT (Data-efficient Image Transformer) architecture. The model distinguishes real images from synthetically generated (deepfake) ones, reaching 98.71% accuracy and 99.93% AUROC on its test data, and is additionally validated on the CelebDF-v2, FaceForensics++, and OpenForensics benchmarks.

Key Features:

Fine-tuned facebook/deit-base-patch16-224 Vision Transformer
Multi-GPU and TPU training support
Validated on multiple benchmark datasets (CelebDF-v2, FaceForensics++, OpenForensics)
High accuracy (98.71%) and AUROC (99.93%) on test data
Face detection and alignment preprocessing pipeline

📊 Model Performance

Training Dataset Performance

The model was trained on the Deepfake and Real Images dataset and scores as follows on its test data:

Metric	Score
Accuracy	98.71%
Macro F1-Score	98.71%
AUROC	99.93%
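For readers unfamiliar with the three metrics in the table, the following is a minimal standard-library sketch of how each is defined. It is not the evaluation code from the notebooks; the toy labels and scores are made up for illustration. In practice, scikit-learn (listed in the prerequisites) provides `accuracy_score`, `f1_score(average="macro")`, and `roc_auc_score` in `sklearn.metrics` for the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def auroc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (0 = Fake, 1 = Real) and hypothetical "Real" probabilities.
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.10, 0.40, 0.60, 0.35, 0.80, 0.90]
y_pred = [int(s >= 0.5) for s in y_score]

print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), auroc(y_true, y_score))
```

Accuracy and macro F1 depend on the decision threshold (0.5 here), while AUROC is computed from the raw scores and is threshold-independent, which is why the two kinds of numbers can differ.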

Validation on Benchmark Datasets

The model has also been validated on three standard deepfake detection benchmarks:

CelebDF-v2: Industry-standard celebrity deepfake dataset
FaceForensics++: Large-scale forensics dataset with multiple manipulation techniques
OpenForensics: Open-world deepfake detection benchmark

Detailed validation results and metrics are available in the respective validation notebooks.

🏗️ Model Architecture

Base Model: facebook/deit-base-patch16-224

DeiT (Data-efficient Image Transformer) is a Vision Transformer architecture that:

Uses a transformer encoder architecture adapted for image classification
Processes 224×224 images with 16×16 patches
Employs distillation techniques for efficient training
Contains approximately 86M parameters
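The patch figures above imply a fixed sequence length for the transformer encoder. The short calculation below works this out; the only assumption beyond the stated 224×224 input and 16×16 patches is that images are RGB (three channels).

```python
IMAGE_SIZE = 224   # input resolution stated above
PATCH_SIZE = 16    # patch side length stated above
CHANNELS = 3       # assumption: RGB input

# Non-overlapping patches tile the image in a square grid.
patches_per_side = IMAGE_SIZE // PATCH_SIZE
num_patches = patches_per_side ** 2

# Each patch is flattened before the linear projection into the embedding space.
values_per_patch = PATCH_SIZE * PATCH_SIZE * CHANNELS

print(f"{patches_per_side} x {patches_per_side} grid -> {num_patches} patches")
print(f"{values_per_patch} raw values per patch before linear projection")
```

The encoder sees these patch embeddings plus any special tokens (such as a class token) that the architecture prepends.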

Classification Head: Binary classifier (Real vs. Fake)

🔧 Installation

Prerequisites

pip install torch torchvision
pip install transformers datasets
pip install scikit-learn imbalanced-learn
pip install facenet-pytorch opencv-python
pip install tqdm pandas numpy pillow

For TPU Support (Optional)

pip install 'torch_xla[tpu]'

🚀 Usage

Quick Start - Inference

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned model and its matching image processor
model = AutoModelForImageClassification.from_pretrained("sakshamkr1/deepfake-fb-deit-vit-224")
processor = AutoImageProcessor.from_pretrained("sakshamkr1/deepfake-fb-deit-vit-224")

# Load the image and force RGB so grayscale or RGBA files do not break preprocessing
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run the forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predicted_class = logits.argmax(-1).item()

# 0: Fake, 1: Real
print(f"Prediction: {'Real' if predicted_class == 1 else 'Fake'}")
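The snippet above reports only the winning class. If a confidence value is also useful, the logits can be turned into probabilities with a softmax. The sketch below shows the computation in plain Python on hypothetical logits, ordered as [Fake, Real] per the mapping above; with PyTorch tensors, `outputs.logits.softmax(-1)` performs the same operation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image, ordered as [Fake, Real].
example_logits = [-1.2, 2.3]
probs = softmax(example_logits)
label = "Real" if probs[1] >= probs[0] else "Fake"
print(f"{label} (p_fake={probs[0]:.3f}, p_real={probs[1]:.3f})")
```

Softmax probabilities from a fine-tuned classifier are not guaranteed to be well calibrated, so they are best read as a relative confidence rather than a true likelihood.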

Training from Scratch

Use the DeitFake_complete.ipynb notebook which includes:

Data loading and preparation
Data augmentation and preprocessing
Model definition and training configuration
Training with mixed precision
Evaluation and metrics visualization

Retraining on Custom Data

Use the DeitFake_retrain.ipynb notebook to fine-tune the model on your own dataset.

Validation on Benchmark Datasets

Three validation notebooks are provided:

DeitFake_Validation1_celebdf-v2.ipynb - CelebDF-v2 validation
DeitFake_Validation2_FF++.ipynb - FaceForensics++ validation
DeitFake_Validation3_OpenForensics_Official_TestDev.ipynb - OpenForensics validation

📁 Repository Structure

DeitFake/
├── DeitFake_complete.ipynb                    # Complete training pipeline
├── DeitFake_retrain.ipynb                     # Retraining on custom data
├── DeitFake_Validation1_celebdf-v2.ipynb      # CelebDF-v2 validation
├── DeitFake_Validation2_FF++.ipynb            # FaceForensics++ validation
├── DeitFake_Validation3_OpenForensics_Official_TestDev.ipynb  # OpenForensics validation
└── README.md                                   # This file

🔬 Training Details

Hyperparameters

Epochs: 5
Learning Rate: 2e-5
Batch Size: Configured for multi-GPU/TPU
Weight Decay: 0.01
Mixed Precision: Enabled (fp16)
Optimizer: AdamW
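As a rough illustration, the listed values can be collected into a configuration dictionary. The key names below follow `transformers.TrainingArguments`; that the notebooks use the Hugging Face `Trainer` at all is an assumption, so treat this as a sketch rather than the actual training setup.

```python
# Values taken from the hyperparameter list above; key names follow
# transformers.TrainingArguments, which is an assumption about the training setup.
training_config = {
    "num_train_epochs": 5,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "fp16": True,             # mixed precision
    "optim": "adamw_torch",   # AdamW
}

# Batch size is not stated in the README, so it is left out rather than guessed.
for key, value in training_config.items():
    print(f"{key} = {value}")
```

With Transformers installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_config)`, where `"out"` is a placeholder directory. Note that `fp16=True` generally requires a CUDA-capable GPU.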

Data Augmentation

Random horizontal flipping
Color jittering
Random rotation
Normalization with ImageNet statistics
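The augmentations listed above map naturally onto `torchvision.transforms`. The sketch below is one possible pipeline, not the one from the notebooks: the flip probability, jitter strengths, and rotation range are illustrative guesses, since the README does not state them. The `normalize_pixel` helper shows what ImageNet normalization does to a single pixel and runs without torchvision.

```python
# ImageNet channel statistics (RGB), as commonly used with torchvision models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def build_train_transforms():
    """Build a torchvision pipeline; all magnitudes below are illustrative guesses."""
    from torchvision import transforms  # lazy import so the helper above runs without it

    return transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.RandomRotation(degrees=10),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])


print(normalize_pixel(IMAGENET_MEAN))
```

Random transforms such as flipping, jittering, and rotation are normally applied only to the training split; evaluation typically uses just resizing and normalization.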

Class Balancing

Random over-sampling applied to balance real and fake classes during training
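Random over-sampling duplicates randomly chosen samples from the smaller class until both classes have the same count. The prerequisites include imbalanced-learn, whose `RandomOverSampler` provides this; whether the notebooks use it is an assumption. The standard-library sketch below shows the idea on hypothetical file names.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap to the majority-class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical file names: four fake images and two real ones.
paths = ["f1.jpg", "f2.jpg", "f3.jpg", "f4.jpg", "r1.jpg", "r2.jpg"]
labels = ["fake", "fake", "fake", "fake", "real", "real"]

balanced_paths, balanced_labels = random_oversample(paths, labels)
print(Counter(balanced_labels))
```

Over-sampling should be applied to the training split only; duplicating samples before the train/test split would leak copies of training images into the evaluation data.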

🎓 Citation

If you use this work in your research, please cite the paper:

@article{KUMAR2026100734,
title = {DeiTFake: Deepfake detection model using DeiT multi-stage training},
journal = {Array},
pages = {100734},
year = {2026},
issn = {2590-0056},
doi = {https://doi.org/10.1016/j.array.2026.100734},
url = {https://www.sciencedirect.com/science/article/pii/S2590005626000573},
author = {Saksham Kumar and Ashish Singh and Srinivasarao Thota and Sunil Kumar Singh and Chandan Kumar},
keywords = {DeepFake detection, DeiT, Vision transformers, Transfer learning, Progressive training, OpenForensics},
abstract = {Deepfakes are major threats to the integrity of digital media. We propose DeiTFake, a DeiT-based transformer and a two-stage progressive training strategy with increasing augmentation complexity. The approach applies an initial transfer-learning phase with standard augmentations, followed by a fine-tuning phase using advanced affine and color-based augmentations. We use DeiT models pre-trained weights, providing a strong initialization for learning manipulation artifacts, increasing the robustness of the detection model. Trained on a face-cropped dataset derived from the OpenForensics dataset (190,335 images), DeiTFake achieves 98.71% accuracy after stage one and 99.22% accuracy with an AUROC of 99.97%, after stage two, achieving strong performance under the same face-level evaluation setting. We analyze augmentation impact and training schedules, and provide practical benchmarks for facial deepfake detection.}
}

🙏 Acknowledgments

Model: Based on Facebook AI's DeiT (Data-efficient Image Transformers)
Framework: Built with Hugging Face Transformers
Training Dataset: Deepfake and Real Images on Kaggle
Validation Datasets:
- CelebDF-v2
- FaceForensics++
- OpenForensics

📝 Model Card

The trained model is available on Hugging Face Hub: sakshamkr1/deepfake-fb-deit-vit-224

Intended Uses

Research purposes related to deepfake detection
Binary classification of images as Real or Fake
Benchmarking deepfake detection algorithms

Limitations

Designed for face images; may not generalize to other types of deepfakes
Performance may vary on deepfakes generated by methods not represented in training data
Requires face detection preprocessing for optimal results
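Since the model expects face crops, a detection step is usually run before inference. The sketch below is one way to do this with the `MTCNN` detector from facenet-pytorch (listed in the prerequisites). It is not the repository's preprocessing pipeline: the function names, the 20% margin, and the choice of the largest face are all assumptions, and no alignment is performed.

```python
def expand_box(box, margin, width, height):
    """Grow an (x1, y1, x2, y2) box by `margin` of its size, clamped to the image."""
    x1, y1, x2, y2 = box
    dx, dy = (x2 - x1) * margin, (y2 - y1) * margin
    return (
        max(0, int(x1 - dx)),
        max(0, int(y1 - dy)),
        min(width, int(x2 + dx)),
        min(height, int(y2 + dy)),
    )


def crop_largest_face(image_path, margin=0.2):
    """Return the largest detected face as a PIL image, or None if no face is found."""
    from facenet_pytorch import MTCNN  # lazy import: needs facenet-pytorch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    boxes, _ = MTCNN(keep_all=True).detect(image)
    if boxes is None:
        return None
    # Pick the detection with the largest area.
    largest = max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    return image.crop(expand_box(largest, margin, *image.size))


print(expand_box((50, 40, 150, 160), margin=0.2, width=180, height=170))
```

The returned crop can be passed to the image processor in place of the full image. Images where `crop_largest_face` returns `None` need separate handling, for example by skipping them.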

Ethical Considerations

This model should be used responsibly and ethically:

Not for surveillance without proper consent
Not for discriminatory purposes
Consider privacy implications when processing personal images
Be aware of potential biases in training data

🔒 License

This work is licensed under the Apache-2.0 license.

📧 Contact

For questions or issues, please open an issue in this repository.

Note: This is a research project. Performance may vary depending on the nature and quality of input images and the specific deepfake generation techniques used.
