# predictEPL: Predicting Premier League Outcomes with Random Forest
This project builds a machine learning model to predict whether a Premier League team will win or not, using a Random Forest Classifier and hyperparameter tuning with cross-validation.
The project uses the EPL Matches dataset containing match-by-match team statistics across seasons, and demonstrates the full workflow: from data preprocessing and feature engineering (rolling averages) to model evaluation with confusion matrices and classification metrics.
## Project Overview
Football outcomes depend on a variety of factors: opponent strength, home advantage, and recent form. Automating match outcome prediction helps quantify these patterns and test hypotheses about team performance.
My project leverages a Random Forest Classifier to predict match results (win = 1, not win = 0) based on both static and time-dependent features.
## Technologies Used
- Python 3.10+
- NumPy
- Pandas
- Matplotlib
- scikit-learn
## Feature Engineering
To improve prediction accuracy, the model incorporates rolling averages: smoothed versions of recent match statistics (e.g., goals, shots, distance covered).
These features dampen single-match randomness, allowing the model to pick up trends such as team momentum or fatigue effects.
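A minimal sketch of how such rolling features can be built with pandas. The column names (`team`, `date`, and the stat columns passed in `cols`) are assumptions about the dataset's schema, not the project's actual code:

```python
import pandas as pd

def add_rolling_averages(df, cols, window=3):
    """Append per-team rolling means of recent-match statistics."""
    df = df.sort_values("date").copy()
    for c in cols:
        # shift(1) ensures only *previous* matches feed each feature,
        # so the current match's own stats never leak into its inputs
        df[f"{c}_rolling"] = (
            df.groupby("team")[c]
            .transform(lambda s: s.shift(1).rolling(window).mean())
        )
    # Early-season matches lack a full window of history; drop them
    return df.dropna(subset=[f"{c}_rolling" for c in cols])
```

Dropping the incomplete-window rows keeps the training set free of NaN features at the cost of a few matches per team per season.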
## Model Training
- Model: `RandomForestClassifier(n_estimators=50, min_samples_split=10)`
- Train/test split: based on match date (before vs. after 2022-01-01)
- Hyperparameter tuning: performed using `RandomizedSearchCV` with 7-fold cross-validation over:
  - `n_estimators` ∈ [50, 500)
  - `min_samples_split` ∈ [2, 20) (scikit-learn requires a value of at least 2)
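The training setup above can be sketched as follows. This is an illustrative reconstruction, not the repository's code: the column names (`date`, `target`) and the `predictors` list are placeholders for the dataset's actual fields.

```python
import pandas as pd
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

def train_model(matches, predictors, n_iter=20):
    """Date-based split, then randomized hyperparameter search."""
    # Temporal split: train on matches before 2022, test on the rest,
    # so the model never sees the future during training
    train = matches[matches["date"] < "2022-01-01"]
    test = matches[matches["date"] >= "2022-01-01"]

    rf = RandomForestClassifier(n_estimators=50, min_samples_split=10,
                                random_state=1)
    search = RandomizedSearchCV(
        rf,
        param_distributions={
            "n_estimators": randint(50, 500),
            # scikit-learn requires min_samples_split >= 2
            "min_samples_split": randint(2, 20),
        },
        n_iter=n_iter,
        cv=7,  # 7-fold cross-validation on the training set
        random_state=1,
    )
    search.fit(train[predictors], train["target"])
    return search, test
```

A date-based split matters more here than a random one: shuffling matches across time would let the model train on results that postdate the matches it is tested on.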
## Interpretation
The tuned Random Forest achieved 65% accuracy, a solid result given the complex and unpredictable nature of football matches.
## Results
- Best hyperparameters: `n_estimators = 343`, `min_samples_split = 13`
- Highest accuracy: 0.65
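The evaluation step the project describes (confusion matrix plus classification metrics) can be sketched as below; the helper works with any fitted scikit-learn classifier and is an assumption about how the metrics were produced, not the project's exact code:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

def evaluate(model, X_test, y_test):
    """Report accuracy, confusion matrix, and per-class metrics."""
    preds = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, preds),
        # Rows: actual class (0 = not win, 1 = win); columns: predicted
        "confusion_matrix": confusion_matrix(y_test, preds),
        "report": classification_report(y_test, preds),
    }
```

The confusion matrix is worth inspecting alongside accuracy: with "not win" the majority class, per-class precision and recall reveal whether the 0.65 comes from genuinely identifying wins or from defaulting to the common outcome.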
Conclusion: the Random Forest Classifier performs robustly on EPL match data, handling nonlinear relationships and mixed categorical features while limiting overfitting through ensemble averaging.
## Future Improvements
- Incorporate expected goals (xG) or player-level features for deeper predictive context
- Test advanced ensemble models (e.g., XGBoost, LightGBM)
- Add feature-importance visualization to interpret drivers of victory
- Extend the prediction horizon
## Author
Benjamin Yang, University of North Carolina at Chapel Hill
- Email: yangbenjamin19@gmail.com
- GitHub: BenYang12