A deep learning-powered fatigue detection system that analyzes driver eye and mouth states in real time using CNN models and MediaPipe FaceMesh, served through an interactive Streamlit dashboard.
The Driver Drowsiness Detection System is an end-to-end computer vision pipeline designed to detect driver fatigue before it leads to accidents. It uses MediaPipe FaceMesh to extract precise eye and mouth regions from a driver image, then runs two independently trained CNN classifiers, one for eye state (open/closed) and one for mouth state (yawn/no_yawn), to compute a weighted fatigue score. The system was developed by comparing a custom CNN against a transfer-learning MobileNetV2 baseline, selecting the best-performing model for each region, and wrapping the full pipeline in a Streamlit dashboard with live fatigue history tracking.
- Started with a raw source directory containing four class folders: `open`, `closed`, `yawn`, and `no_yawn`
- Programmatically split images into two separate domain datasets: Eye_Dataset (`open`/`closed`) and Mouth_Dataset (`yawn`/`no_yawn`)
- Filtered for valid image extensions (`.jpg`, `.jpeg`, `.png`) during copy so that non-image files are skipped
- Applied a 70 / 15 / 15 split (train / val / test) to both Eye_Dataset and Mouth_Dataset using a custom `split_dataset()` function
- Used `random.shuffle()` before slicing to ensure class-balanced randomization
- Verified split counts using `count_images()` across both datasets and all splits
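The split step above can be sketched as follows. This is a minimal illustration, not the project's code: the function names come from the text, but the signatures, the fixed seed, and the use of a plain list of file names (instead of folders on disk) are assumptions made to keep the example self-contained.

```python
import random

def split_dataset(items, train_pct=70, val_pct=15, seed=42):
    """Shuffle a list of file names and slice it into train/val/test lists.

    Hypothetical signature: the project's split_dataset() works on folders.
    """
    items = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)  # seeded shuffle for a reproducible split
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],  # whatever remains goes to test
    }

def count_images(splits):
    """Return the number of files in each split."""
    return {name: len(files) for name, files in splits.items()}

# Illustrative file names for a single class folder
files = [f"open_{i:03d}.jpg" for i in range(100)]
splits = split_dataset(files)
counts = count_images(splits)
print(counts)
```

Running the split once per class folder keeps each class's 70 / 15 / 15 proportions intact in every split.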
- Fine-tuned a pretrained `MobileNetV2` (ImageNet weights) for both eye and mouth classification
- Froze all `model.features` parameters and replaced only `model.classifier[1]` with a task-specific linear layer
- Trained with Adam optimizer (`lr=0.0001`), `CrossEntropyLoss`, and augmentations: `RandomHorizontalFlip`, `RandomRotation(10)`
- Saved best checkpoint based on highest validation accuracy across 15 epochs
- Built a 4-block convolutional architecture: `Conv2d → ReLU → MaxPool2d` repeated with channel sizes `32 → 64 → 128 → 256`
- Classifier head: `Flatten → Linear(50176, 512) → ReLU → Dropout(0.5) → Linear(512, num_classes)`
- Trained from scratch with Adam optimizer (`lr=0.001`) and same augmentation pipeline
- Saved best `EyeCNN` to `cnn_eye_model.pth` and best `MouthCNN` to `cnn_mouth_model.pth`
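A sketch of a network with this layout is shown below. The layer order, channel sizes, and head dimensions follow the list above; the class name `SmallCNN`, the 3×3 kernels, and `padding=1` are assumptions, chosen because they make a 224×224 input flatten to exactly 50176 features.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Four Conv2d -> ReLU -> MaxPool2d blocks followed by a dense head."""

    def __init__(self, num_classes=2):
        super().__init__()
        channels = [3, 32, 64, 128, 256]
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                # Assumed: 3x3 kernel with padding=1 keeps height and width unchanged
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),  # halves height and width
            ]
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(50176, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN(num_classes=2).eval()
with torch.no_grad():
    logits = model(torch.zeros(1, 3, 224, 224))  # one dummy RGB image
print(logits.shape)
```

The same class could be instantiated twice, once for the eye classifier and once for the mouth classifier, since both tasks are binary.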
- Evaluated both MobileNetV2 and CNN on the held-out test set for each region model
- Generated confusion matrices (seaborn heatmap) and classification reports (precision, recall, F1-score) for both architectures
- Selected the custom CNN as the best model based on test accuracy and generalization
👁️ Eye State Detection
| Model | Validation Accuracy | Test Accuracy |
|---|---|---|
| MobileNetV2 | 94.50% | 90.83% |
| CNN ✅ | 97.71% | 96.33% |
👄 Mouth State Detection
| Model | Validation Accuracy | Test Accuracy |
|---|---|---|
| MobileNetV2 | 94.93% | 90.37% |
| CNN ✅ | 99.08% | 99.08% |
The custom CNN outperformed MobileNetV2 by 5.50 percentage points on eye test accuracy (96.33% vs. 90.83%) and 8.71 percentage points on mouth test accuracy (99.08% vs. 90.37%), making it the clear choice for deployment.
- Built a wide-layout Streamlit app (`app4.py`) with `@st.cache_resource` for efficient model loading
- Integrated MediaPipe FaceMesh (`static_image_mode=True`) to extract eye and mouth landmark crops from any uploaded driver image
- Implemented a dual-signal fatigue scoring formula: `score = eye_conf × 0.8 (if closed) + mouth_conf × 0.4 (if yawn)`, capped at 100
- Added session-state fatigue history (rolling 20 predictions) with a line chart progression curve and aggregate condition summary
Independently trained CNN classifiers for eye state (open/closed) and mouth state (yawn/no_yawn), enabling fine-grained, region-specific fatigue signals.
Uses 468-point facial landmark detection to precisely crop eye and mouth regions with configurable padding, ensuring consistent inputs regardless of face size or distance.
Combines eye closure confidence (×0.8) and yawn confidence (×0.4) into a single 0–100 fatigue score, giving higher weight to the more clinically significant signal (eye closure).
Maps raw scores to Alert (<30), Mild Fatigue (30–69), and Severe Fatigue (≥70) with color-coded Streamlit alerts for instant driver status visibility.
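The scoring rule and the three levels can be written down in a few lines. The sketch below assumes confidences are percentages in the 0 to 100 range and that the classifiers return the labels `"closed"` and `"yawn"`; the function signatures are illustrative and may differ from those in the app.

```python
def get_fatigue_score(eye_label, eye_conf, mouth_label, mouth_conf):
    """Weighted fatigue score in [0, 100]; confidences are percentages."""
    score = 0.0
    if eye_label == "closed":
        score += eye_conf * 0.8    # eye closure carries the larger weight
    if mouth_label == "yawn":
        score += mouth_conf * 0.4
    return min(score, 100.0)       # 0.8 + 0.4 > 1, so the sum can exceed 100

def get_fatigue_level(score):
    """Map a 0-100 score to one of the three alert levels."""
    if score < 30:
        return "Alert"
    if score < 70:
        return "Mild Fatigue"
    return "Severe Fatigue"

# Illustrative inputs, not measured results
examples = [
    ("open", 98.0, "no_yawn", 97.0),
    ("open", 98.0, "yawn", 90.0),
    ("closed", 95.0, "no_yawn", 97.0),
    ("closed", 100.0, "yawn", 100.0),
]
for eye_label, eye_conf, mouth_label, mouth_conf in examples:
    score = get_fatigue_score(eye_label, eye_conf, mouth_label, mouth_conf)
    print(eye_label, mouth_label, round(score, 1), get_fatigue_level(score))
```

Note that a yawn alone contributes at most 40 points, so under these weights it can reach Mild Fatigue but never Severe Fatigue without closed eyes.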
Tracks up to 20 consecutive predictions in session state, plots a line chart of fatigue history, and computes an overall average condition for longitudinal monitoring.
Full comparative training pipeline (pretrained MobileNetV2 vs. custom CNN) with confusion matrices and classification reports to justify model selection.
`@st.cache_resource` caches both CNN models after the first load, so Streamlit reruns reuse the in-memory models instead of re-reading the checkpoints from disk on every prediction.
Rounded image cards, structured column layouts, confidence progress bars, and centered image display create a clean, professional dashboard experience.
- `LEFT_EYE` uses landmarks `[33, 133, 160, 159, 158, 157, 173, 144, 145, 153]` with 15px padding
- `MOUTH` uses 22-point landmark polygon with 25px padding
- Both `extract_eye_crop()` and `extract_mouth_crop()` clamp coordinates to image bounds to prevent out-of-bounds slicing
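The padding and clamping logic can be illustrated without MediaPipe. In the sketch below, `landmark_bbox` is a hypothetical helper, and the landmarks are represented as a plain dictionary of normalized `(x, y)` pairs, which mirrors the normalized coordinates FaceMesh returns; the coordinate values are made up for the example.

```python
def landmark_bbox(landmarks, indices, width, height, padding):
    """Padded pixel bounding box around selected landmarks, clamped to the image.

    landmarks: mapping from landmark index to normalized (x, y) in [0, 1].
    Returns (x1, y1, x2, y2), suitable for slicing image[y1:y2, x1:x2].
    """
    xs = [landmarks[i][0] * width for i in indices]   # normalized x -> pixels
    ys = [landmarks[i][1] * height for i in indices]  # normalized y -> pixels
    x1 = max(int(min(xs)) - padding, 0)       # never below the left edge
    y1 = max(int(min(ys)) - padding, 0)       # never above the top edge
    x2 = min(int(max(xs)) + padding, width)   # never past the right edge
    y2 = min(int(max(ys)) + padding, height)  # never past the bottom edge
    return x1, y1, x2, y2

# Two illustrative eye-corner landmarks close to the left image border
points = {33: (0.01, 0.50), 133: (0.10, 0.52)}
box = landmark_bbox(points, [33, 133], width=640, height=480, padding=15)
print(box)  # (0, 225, 79, 264)
```

Without the clamp, the left edge in this example would be negative, and a negative start index would silently wrap around when slicing a NumPy array.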
- 4 convolutional blocks: `Conv2d(3→32)`, `Conv2d(32→64)`, `Conv2d(64→128)`, `Conv2d(128→256)`, each followed by `ReLU + MaxPool2d(2)`
- Classifier: `Flatten → Linear(50176, 512) → ReLU → Dropout(0.5) → Linear(512, num_classes)`
- `224×224` input with `(0.5, 0.5, 0.5)` mean/std normalization
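The two numbers in this list can be checked with simple arithmetic. The sketch assumes the convolutions preserve spatial size (for example 3×3 kernels with padding 1), so only the four pooling layers shrink the feature map; that detail is not stated in the text.

```python
# Spatial size after four MaxPool2d(2) layers, starting from a 224x224 input
side = 224
for _ in range(4):
    side //= 2          # 224 -> 112 -> 56 -> 28 -> 14

flat_features = 256 * side * side   # channels * height * width
print(side, flat_features)          # 14 50176

# Normalizing with mean 0.5 and std 0.5 maps pixel values from [0, 1] to [-1, 1]
def normalize(pixel, mean=0.5, std=0.5):
    return (pixel - mean) / std

print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0
```

This is why the first linear layer has 50176 inputs: changing the input resolution or the number of pooling layers would require changing that number as well.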
- Loaded pretrained ImageNet weights via `models.mobilenet_v2(weights="DEFAULT")`
- Feature extractor fully frozen (`requires_grad = False`)
- Custom head: `nn.Linear(model.last_channel, num_classes)` replacing the default classifier
- Trained with `lr=0.0001` (lower than CNN to preserve pretrained features)
- `predict()` runs a single forward pass, applies `torch.softmax`, and returns `(class_name, confidence_pct)`
- Both eye and mouth confidence displayed as text metrics + `st.progress` bars
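To show what the softmax step produces, here is a dependency-free sketch of turning raw logits into a `(class_name, confidence_pct)` pair. The app uses `torch.softmax`; the pure-Python `softmax` and the helper name `to_prediction` below are stand-ins for illustration, and the logits are made-up values.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits, class_names):
    """Return the most likely class and its probability as a percentage."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    return class_names[best], probs[best] * 100.0

label, confidence = to_prediction([0.2, 2.4], ["open", "closed"])
print(label, round(confidence, 2))
```

Because the confidence is a percentage, it can feed the fatigue formula directly and only needs dividing by 100 where a 0 to 1 fraction is expected, as with a progress bar.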
- `fatigue_level_to_number()` maps `Alert → 0`, `Mild Fatigue → 1`, `Severe Fatigue → 2`
- Rolling history stored in `st.session_state.fatigue_history` (capped at 20)
- Average fatigue score thresholds: `<0.5 → Alert`, `0.5–1.5 → Mild`, `>1.5 → Severe`
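The rolling history and the averaged condition can be sketched as below. A `collections.deque` with `maxlen=20` stands in for the list kept in `st.session_state`, and `overall_condition` is a hypothetical name for the averaging step; only `fatigue_level_to_number` and the thresholds are taken from the text.

```python
from collections import deque

LEVEL_TO_NUMBER = {"Alert": 0, "Mild Fatigue": 1, "Severe Fatigue": 2}

def fatigue_level_to_number(level):
    return LEVEL_TO_NUMBER[level]

def overall_condition(history):
    """Average the numeric levels and map the mean back to a condition."""
    if not history:
        return None                 # nothing recorded yet
    avg = sum(history) / len(history)
    if avg < 0.5:
        return "Alert"
    if avg <= 1.5:
        return "Mild"
    return "Severe"

# deque(maxlen=20) drops the oldest entry once 20 predictions are stored
history = deque(maxlen=20)
for level in ["Alert"] * 18 + ["Severe Fatigue"] * 6:
    history.append(fatigue_level_to_number(level))

print(len(history), overall_condition(history))  # 20 Mild
```

In this run the first four Alert entries fall out of the window, leaving 14 zeros and 6 twos, so the mean is 0.6 and the overall condition is Mild.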
- `confusion_matrix` and `classification_report` from `sklearn.metrics`
- Heatmap rendered with `seaborn` (`cmap='Blues'`, annotated with counts)
- Evaluated independently for eye model and mouth model on their respective test splits
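For readers unfamiliar with these metrics, the sketch below computes a 2×2 confusion matrix and the precision, recall, and F1 values for one class by hand. The project relies on `sklearn.metrics` for this; `binary_report` is a hypothetical helper, and the labels are toy data, not results from the trained models.

```python
def binary_report(y_true, y_pred, positive):
    """Confusion counts plus precision, recall and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        # rows are true classes, columns are predicted classes
        "matrix": [[tp, fn], [fp, tn]],
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels for illustration only
y_true = ["closed", "closed", "closed", "open", "open", "open", "open", "closed"]
y_pred = ["closed", "closed", "open", "open", "open", "closed", "open", "closed"]
report = binary_report(y_true, y_pred, positive="closed")
print(report)
```

For a drowsiness detector, recall on the `closed` and `yawn` classes is the figure to watch, since a missed fatigue signal is costlier than a false alarm.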
| Library | Role |
|---|---|
| `streamlit` | Interactive web dashboard, file uploader, session state, metrics, charts |
| `torch` | Core deep learning framework, model training, inference |
| `torch.nn` | CNN architecture (Conv2d, Linear, Dropout, ReLU, MaxPool2d) |
| `torch.optim` | Adam optimizer |
| `torchvision.models` | Pretrained MobileNetV2 for transfer learning baseline |
| `torchvision.datasets` | ImageFolder for structured dataset loading |
| `torchvision.transforms` | Image augmentation and normalization pipeline |
| `numpy` | Array manipulation, image conversion |
| `pandas` | Fatigue history DataFrame for line chart rendering |
| `sklearn.metrics` | confusion_matrix, classification_report (precision, recall, F1) |
| `PIL` (Pillow) | Image loading, RGB conversion, transform preparation |
| `matplotlib` | Confusion matrix figure rendering |
| `seaborn` | Heatmap visualization for confusion matrices |
| `mediapipe` | FaceMesh 468-point landmark detection for eye/mouth region extraction |
| `os`, `shutil` | Dataset organization: folder creation and image copying |
| `random` | Shuffle before train/val/test split |
| `torch.utils.data.DataLoader` | Batched data loading with shuffle control |
Clone the repository, create and activate a virtual environment, then install the dependencies:

    git clone https://github.com/your-username/driver-drowsiness-detection.git
    cd driver-drowsiness-detection

    # Windows
    python -m venv venv
    venv\Scripts\activate

    # macOS / Linux
    python -m venv venv
    source venv/bin/activate

    pip install -r requirements.txt

Key libraries:

    torch torchvision
    streamlit
    mediapipe
    Pillow
    numpy
    pandas
    scikit-learn
    matplotlib
    seaborn

- Place raw class folders (`open/`, `closed/`, `yawn/`, `no_yawn/`) inside a `Total/` directory
- Run the dataset organization notebook cells to create `Eye_Dataset/` and `Mouth_Dataset/` with `train/val/test` splits
- Run the MobileNetV2 cells to generate baseline eye and mouth models
- Run the CNN cells to train `EyeCNN` → `cnn_eye_model.pth` and `MouthCNN` → `cnn_mouth_model.pth`
- Models are auto-saved on best validation accuracy

In `app4.py`, update the paths to your saved checkpoints:

    EYE_MODEL_PATH = "path/to/cnn_eye_model.pth"
    MOUTH_MODEL_PATH = "path/to/cnn_mouth_model.pth"

Then launch the dashboard:

    streamlit run app4.py

- Fleet Safety Monitoring: Deploy at logistics or trucking companies to flag drowsy drivers before long-haul trips begin.
- Ride-Sharing Driver Fatigue Checks: Integrate into driver apps to periodically verify alertness during extended shifts.
- Research Baseline: Use the MobileNetV2 vs. CNN comparison pipeline as a reproducible benchmark for eye/mouth state classification tasks.
- Driver Training Systems: Demonstrate the physiological markers of fatigue (eye closure, yawning) as part of safety training curricula.
- Embedded Vehicle Systems: Adapt the FaceMesh + CNN pipeline for real-time webcam inference in ADAS (Advanced Driver Assistance Systems).
- Insurance Telematics: Provide objective fatigue evidence for accident investigation and risk scoring.

- Real-Time Webcam Mode: Replace the static image uploader with OpenCV-based live video stream inference
- EAR / MAR Geometric Scoring: Complement CNN predictions with Eye Aspect Ratio and Mouth Aspect Ratio heuristics for ensemble confidence
- Grad-CAM Explainability: Overlay class activation maps on eye/mouth crops to visualize what the CNN learned
- Temporal Fatigue Modeling: Use an LSTM or sliding window over consecutive frames to detect sustained drowsiness patterns
- Audio Alerting: Trigger beep alerts via `playsound` when severe fatigue is detected
- Multi-Face Support: Extend FaceMesh to `max_num_faces > 1` for monitoring multiple drivers or passengers
- Mobile Deployment: Convert models to ONNX or TFLite for on-device inference on dashcam hardware
- Automated Reporting: Generate per-session PDF fatigue reports with timeline charts for fleet managers

System architecture:

    ┌─────────────────────────────────────────────────────┐
    │            Streamlit Dashboard (app4.py)            │
    │    File Uploader → Image Display → Reset Button     │
    └──────────────────────────┬──────────────────────────┘
                               │
                               ▼
    ┌─────────────────────────────────────────────────────┐
    │              MediaPipe FaceMesh Layer               │
    │  FaceMesh(static_image_mode=True, max_num_faces=1)  │
    │      468-point landmark detection on RGB image      │
    └──────────┬─────────────────────────────┬────────────┘
               │                             │
               ▼                             ▼
    ┌─────────────────────┐     ┌─────────────────────────┐
    │ extract_eye_crop()  │     │  extract_mouth_crop()   │
    │  LEFT_EYE [10 pts]  │     │ 22-point MOUTH polygon  │
    │    padding=15px     │     │      padding=25px       │
    └──────────┬──────────┘     └────────────┬────────────┘
               │                             │
               ▼                             ▼
    ┌─────────────────────┐     ┌─────────────────────────┐
    │       EyeCNN        │     │        MouthCNN         │
    │  cnn_eye_model.pth  │     │   cnn_mouth_model.pth   │
    │      classes:       │     │        classes:         │
    │   [open, closed]    │     │     [no_yawn, yawn]     │
    └──────────┬──────────┘     └────────────┬────────────┘
               │                             │
               └───────────────┬─────────────┘
                               │
                               ▼
    ┌─────────────────────────────────────────────────────┐
    │                 get_fatigue_score()                 │
    │      score = eye_conf × 0.8 + mouth_conf × 0.4      │
    │               (if closed)        (if yawn)          │
    │                    Capped at 100                    │
    └──────────────────────────┬──────────────────────────┘
                               │
                               ▼
    ┌─────────────────────────────────────────────────────┐
    │                 get_fatigue_level()                 │
    │              < 30  → 🟢 Alert                       │
    │              30–69 → 🟡 Mild Fatigue                │
    │              ≥ 70  → 🔴 Severe Fatigue              │
    └──────────────────────────┬──────────────────────────┘
                               │
                               ▼
    ┌─────────────────────────────────────────────────────┐
    │            Session State Fatigue History            │
    │     fatigue_history[] → rolling 20 predictions      │
    │   Line chart + Average Score + Overall Condition    │
    └─────────────────────────────────────────────────────┘

The Driver Drowsiness Detection System is a computer vision safety tool that combines geometric facial landmark extraction with custom deep learning classifiers to assess driver fatigue from static images. The pipeline begins with MediaPipe FaceMesh isolating anatomically precise eye and mouth regions, which are then fed independently into two 4-block CNN models, EyeCNN (classifying open/closed) and MouthCNN (classifying yawn/no_yawn), both trained from scratch on 224×224 normalized image inputs. A transfer-learning MobileNetV2 baseline was also trained and evaluated using confusion matrices and classification reports before the custom CNN was selected as the final architecture for its higher test accuracy on both tasks. Fatigue is quantified through a weighted scoring formula that combines eye closure and yawn confidence into a single 0–100 score, mapped to three alert levels, and tracked within a session (up to 20 predictions) via a Streamlit dashboard with a fatigue progression chart and aggregate condition summaries.
⭐ If you find this project useful, give it a star on GitHub and share your feedback!