BenYang12/predictEPL

⚽ predictEPL: Predicting Premier League Outcomes with Random Forest

This project builds a machine learning model to predict whether a Premier League team will win or not, using a Random Forest Classifier and hyperparameter tuning with cross-validation.

The project uses the EPL Matches dataset, which contains match-by-match team statistics across seasons, and demonstrates the full workflow: data preprocessing, feature engineering (rolling averages), and model evaluation with confusion matrices and classification metrics.

📊 Project Overview

Football outcomes depend on a variety of factors, including opponent strength, home advantage, and recent form. Automating match outcome prediction helps quantify these patterns and test hypotheses about team performance.

My project leverages a Random Forest Classifier to predict match results (win = 1, not win = 0) based on both static and time-dependent features.

βš™οΈ Technologies Used

Python 3.10+

NumPy

Pandas

Matplotlib

scikit-learn

🧠 Feature Engineering

To improve prediction accuracy, the model incorporates rolling averages: smoothed versions of recent match statistics (e.g., goals, shots, distance).

These features dampen the noise of any single match, allowing the model to recognize trends such as team momentum or fatigue effects.
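The rolling-average idea can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`team`, `date`, `gf`, `sh`), the window of 3 matches, and the toy data are all assumptions. Shifting by one match before averaging keeps the current match's own statistics out of its features.

```python
import pandas as pd


def add_rolling_averages(team_df, cols, window=3):
    """Add rolling-mean columns computed from one team's previous matches."""
    team_df = team_df.sort_values("date").copy()
    # shift(1) drops the current match from the window, so each row only
    # sees statistics from earlier matches (no leakage of the result).
    rolling = team_df[cols].shift(1).rolling(window).mean()
    for col in cols:
        team_df[f"{col}_rolling"] = rolling[col]
    # Rows without enough history have NaN features and are dropped.
    return team_df.dropna(subset=[f"{col}_rolling" for col in cols])


# Hypothetical toy data: two teams, four matches each.
matches = pd.DataFrame({
    "team": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime(
        ["2021-08-14", "2021-08-21", "2021-08-28", "2021-09-11"] * 2
    ),
    "gf": [1, 2, 3, 4, 0, 0, 3, 3],        # goals for (assumed column name)
    "sh": [10, 12, 14, 16, 8, 8, 11, 9],   # shots (assumed column name)
})

cols = ["gf", "sh"]
# Compute the features separately per team so one team's history never
# bleeds into another's.
result = pd.concat(
    [add_rolling_averages(group, cols) for _, group in matches.groupby("team")]
)
print(result[["team", "date", "gf_rolling", "sh_rolling"]])
```

With a window of 3, only each team's fourth match has enough history, so the toy result contains one row per team.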

🔧 Model Training

Model: RandomForestClassifier(n_estimators=50, min_samples_split=10)

Train/test split: based on match date (before vs after 2022-01-01)

Hyperparameter tuning: Performed using RandomizedSearchCV with 7-fold cross-validation over:

n_estimators ∈ [50, 500)

min_samples_split ∈ [1, 20)
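The training setup above might look roughly like the sketch below. It runs on synthetic data, and the feature names (`venue_code`, `gf_rolling`, `sh_rolling`), the target construction, and `n_iter=2` are assumptions chosen only to keep the example small. One deliberate difference: scikit-learn requires an integer `min_samples_split` of at least 2, so the sketch samples from [2, 20) rather than [1, 20).

```python
import numpy as np
import pandas as pd
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the match data (all columns are hypothetical).
rng = np.random.default_rng(0)
n = 240
matches = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=n, freq="3D"),
    "venue_code": rng.integers(0, 2, n),
    "gf_rolling": rng.normal(1.4, 0.6, n),
    "sh_rolling": rng.normal(12.0, 3.0, n),
})
signal = matches["gf_rolling"] + 0.5 * matches["venue_code"]
matches["target"] = ((signal + rng.normal(0, 0.7, n)) > 1.6).astype(int)

# Time-based split: train on matches before the cutoff, test on the rest.
cutoff = pd.Timestamp("2022-01-01")
train = matches[matches["date"] < cutoff]
test = matches[matches["date"] >= cutoff]
predictors = ["venue_code", "gf_rolling", "sh_rolling"]

# randint(low, high) samples integers from the half-open range [low, high).
param_distributions = {
    "n_estimators": randint(50, 500),
    "min_samples_split": randint(2, 20),  # scikit-learn rejects 1
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_distributions=param_distributions,
    n_iter=2,            # kept tiny for the sketch; a real search would use more
    cv=7,                # 7-fold cross-validation on the training set
    scoring="accuracy",
    random_state=1,
)
search.fit(train[predictors], train["target"])

test_accuracy = search.score(test[predictors], test["target"])
print(search.best_params_, round(test_accuracy, 3))
```

Splitting on the date rather than at random keeps future matches out of the training set, which is the point of the before/after 2022-01-01 cutoff.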

✅ Interpretation

The tuned Random Forest achieved 65% accuracy, a reasonable result given how complex and unpredictable football matches are.
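An accuracy figure is easier to interpret alongside a confusion matrix, precision and recall, and a majority-class baseline. The sketch below shows one way to compute these with scikit-learn. The labels are a hypothetical toy example (win = 1, not win = 0), not the project's predictions.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true outcomes and model predictions for ten matches.
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0])

# Rows are true classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # of predicted wins, share correct
recall = recall_score(y_true, y_pred)        # of actual wins, share found

# Accuracy of always predicting the more common class, for comparison.
win_rate = y_true.mean()
baseline = max(win_rate, 1 - win_rate)

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} baseline={baseline:.2f}")
```

Because wins are usually the minority class in this kind of data, comparing accuracy against the baseline shows how much the model adds over always predicting "not win".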

🚀 Results

Best hyperparameters: n_estimators ≈ 343, min_samples_split ≈ 13

Highest Accuracy: 0.65

Conclusion: The Random Forest Classifier performs reasonably well on EPL match data. It handles nonlinear relationships and mixed categorical features, and ensemble averaging helps limit overfitting.

🧠 Future Improvements

Incorporate expected goals (xG) or player-level features for deeper predictive context

Test advanced ensemble models (e.g., XGBoost, LightGBM)

Add feature-importance visualization to interpret drivers of victory

Extend prediction horizon

👤 Author

Benjamin Yang, University of North Carolina at Chapel Hill

📧 Email: yangbenjamin19@gmail.com

🔗 GitHub: BenYang12
