Reasonant/Project-From-Model-to-Production

MetroPT IsolationForest Simple App

This is a small, Streamlit-friendly project for training and serving an IsolationForest model on local MetroPT-3 data.

The project intentionally keeps the moving parts simple:

  • config.py holds shared paths, features, and thresholds.
  • setup_project.py checks folders, splits the dataset, and trains the first model if needed.
  • model.py trains, saves, loads, and predicts with an IsolationForest.
  • drift.py compares a current dataframe with the initial training dataframe.
  • api.py serves the saved model with FastAPI.
  • simulation.py runs the whole workflow and simulates one day, one month, or three months of sensor readings.
  • app/streamlit_app.py gives you a visual UI for the same flow.

Setup

Create a virtual environment and install dependencies (Windows PowerShell shown):

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

The project is built specifically for the MetroPT-3 dataset, supplied as a single CSV file. Put the file at:

dataset/MetroPT3.csv

If your file is somewhere else, edit DATASET_PATH in src/metropt_app/config.py.

The timestamp column name must be configured in src/metropt_app/config.py:

TIMESTAMP_COLUMN = "timestamp"

The project assumes the MetroPT-3 CSV contains the configured feature columns in FEATURES. It does not perform generic dataset-format detection, timestamp-column discovery, or missing-feature validation.
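Because the project does not validate the CSV itself, a quick pre-check before running setup can catch a missing column early. The sketch below is not part of the project. It uses only the standard library, and the helper name missing_columns is hypothetical; the required list would come from TIMESTAMP_COLUMN and FEATURES in config.py.

```python
import csv
from pathlib import Path


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        # Only the first row is read, so this stays cheap on a large file.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical usage, with values taken from src/metropt_app/config.py:
# required = [TIMESTAMP_COLUMN, *FEATURES]
# print(missing_columns("dataset/MetroPT3.csv", required))
```

An empty list means every configured column is present in the header. The check says nothing about the values in those columns.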

Run With Streamlit

streamlit run app/streamlit_app.py

Use the tabs in order:

  1. Setup: split the dataset and check files.
  2. Train: train and save the model.
  3. Predict: start the API or run a local prediction.
  4. Simulate: simulate one day, one month, or three months.
  5. Logs: inspect simulation summaries.

The Train tab also shows recent MLflow runs. From there you can start the MLflow UI, inspect model history, and restore a previously logged model into models/isolation_forest.joblib.

Run From The Command Line

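The commands in this section use Windows cmd.exe syntax (set PYTHONPATH=src). In PowerShell, use $env:PYTHONPATH = "src" instead.

Run a one-day local simulation: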

set PYTHONPATH=src
python -m metropt_app.simulation --mode day

Run a one-month local simulation:

set PYTHONPATH=src
python -m metropt_app.simulation --mode month

Run a three-month local simulation:

set PYTHONPATH=src
python -m metropt_app.simulation --mode three_months

Run known failure-period simulations:

set PYTHONPATH=src
python -m metropt_app.simulation --mode failure_2020_04_18
python -m metropt_app.simulation --mode failure_2020_05_29_30
python -m metropt_app.simulation --mode failure_2020_07_15

Start the API manually:

set PYTHONPATH=src
uvicorn metropt_app.api:app --host 127.0.0.1 --port 8000

Run a one-month simulation through an already running API:

set PYTHONPATH=src
python -m metropt_app.simulation --mode month --use-api

API docs will be at:

http://127.0.0.1:8000/docs
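This README does not list the API's endpoints, so one way to discover them from a script is to read the OpenAPI schema that FastAPI serves at /openapi.json by default. The following is a minimal sketch under that assumption (it will not work if api.py changes or disables the schema URL). The function names fetch_openapi and list_routes are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_openapi(base_url="http://127.0.0.1:8000", timeout=5.0):
    """Download the OpenAPI schema from a running API (assumes the default URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


def list_routes(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


# With the API running (see the uvicorn command above):
# for method, path in list_routes(fetch_openapi()):
#     print(method, path)
```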

MLflow Model Tracking

Every model training run is logged to MLflow under:

mlruns/

The project logs:

  • model parameters
  • training row count
  • selected feature list
  • training anomaly ratio
  • training score summary statistics
  • the trained sklearn model artifact

Start the MLflow UI with:

set PYTHONPATH=src
set MLFLOW_ALLOW_FILE_STORE=true
mlflow ui --backend-store-uri ./mlruns --port 5000

Older logged models can be restored from the Streamlit Train tab by selecting a run and clicking Restore selected MLflow model.

Drift Behavior

The simulation checks drift in non-overlapping windows of 21,600 rows. Each window is compared with the full first month of training data. A window is reported as drifted if at least one feature drifts.
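The windowing can be sketched as follows. This is an illustration, not the code in drift.py: the function names are hypothetical, the per-feature drift test itself is left out, and skipping a trailing partial window is an assumption the README does not confirm.

```python
WINDOW_ROWS = 21_600  # window size stated in this README


def window_bounds(total_rows, window_rows=WINDOW_ROWS):
    """Yield (start, stop) row offsets for consecutive non-overlapping windows.

    Assumption: a trailing window with fewer than window_rows rows is skipped.
    """
    for start in range(0, total_rows - window_rows + 1, window_rows):
        yield start, start + window_rows


def any_feature_drifted(per_feature_drift):
    """A window counts as drifted if at least one feature is flagged."""
    return any(per_feature_drift.values())


# Hypothetical usage on a dataframe-like object `current`:
# for start, stop in window_bounds(len(current)):
#     window = current[start:stop]
```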

Retraining is decided per calendar month: all drift checks during a simulated month feed into a single retrain decision. If at least one window in that month reports should_retrain, the model may retrain once at the end of the simulated month.

Retraining is blocked if any window in that month is marked as possible_failure, because failure-like data should not be learned as normal behavior. When retraining is allowed, the model trains only on rows from the last month that the current model predicted as normal.

Each drift window also logs its anomaly ratio. If the ratio exceeds 0.99 (more than 99% of the rows in the window are anomalies), the window is marked as possible_failure.
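The monthly decision described above can be summarized in a few lines. This is a minimal sketch of the stated rules, not the project's implementation: WindowResult, monthly_retrain_allowed, and normal_rows are hypothetical names, and the label convention (1 for normal, -1 for anomaly) is the one scikit-learn's IsolationForest.predict returns.

```python
from dataclasses import dataclass

FAILURE_RATIO = 0.99  # threshold stated in this README


@dataclass
class WindowResult:
    should_retrain: bool   # drift check asked for a retrain
    anomaly_ratio: float   # fraction of rows in the window flagged as anomalies

    @property
    def possible_failure(self):
        # Strictly greater than the threshold, as described above.
        return self.anomaly_ratio > FAILURE_RATIO


def monthly_retrain_allowed(windows):
    """One decision per month: some window wants a retrain and none looks like a failure."""
    wants_retrain = any(w.should_retrain for w in windows)
    blocked = any(w.possible_failure for w in windows)
    return wants_retrain and not blocked


def normal_rows(rows, predictions):
    """Keep only rows the current model labeled normal (1 = normal, -1 = anomaly)."""
    return [row for row, label in zip(rows, predictions) if label == 1]
```

Under these rules a single failure-like window vetoes the retrain for the whole month, even if every other window reports drift.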
