This is a small, Streamlit-friendly project for training and serving an IsolationForest model on local MetroPT-3 data.
The project intentionally keeps the moving parts simple:
- config.py holds shared paths, features, and thresholds.
- setup_project.py checks folders, splits the dataset, and trains the first model if needed.
- model.py trains, saves, loads, and predicts with an IsolationForest.
- drift.py compares a current dataframe with the initial training dataframe.
- api.py serves the saved model with FastAPI.
- simulation.py runs the whole workflow and simulates one day, one month, or three months of sensor readings.
- app/streamlit_app.py gives you a visual UI for the same flow.
Install dependencies:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
This project is intentionally built for the MetroPT-3 dataset as a CSV file. Put the file at:
dataset/MetroPT3.csv
If your file is somewhere else, edit DATASET_PATH in src/metropt_app/config.py.
The timestamp column name must be configured in src/metropt_app/config.py:
TIMESTAMP_COLUMN = "timestamp"
The project assumes the MetroPT-3 CSV contains the configured feature columns in FEATURES. It does not perform generic dataset-format detection, timestamp-column discovery, or missing-feature validation.
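Because the project does not validate columns itself, a quick manual check before the first run can save a confusing failure later. The following is a minimal sketch, not part of the project: the helper `missing_columns` is hypothetical, and the `FEATURES` values shown are illustrative stand-ins rather than the project's real configuration. It reads only the CSV header row using the standard library.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Illustrative stand-ins, not the project's real config values.
TIMESTAMP_COLUMN = "timestamp"
FEATURES = ["TP2", "Oil_temperature"]

# A tiny in-memory CSV stands in for dataset/MetroPT3.csv.
sample = io.StringIO("timestamp,TP2\n2020-02-01 00:00:00,-0.012\n")
missing = missing_columns(sample, [TIMESTAMP_COLUMN, *FEATURES])
print(missing)
```

For the real file, the same helper could be called on `open("dataset/MetroPT3.csv", newline="")` with the names imported from config.py; an empty list means every configured column is present.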
Start the Streamlit app:
streamlit run app/streamlit_app.py
Use the tabs in order:
- Setup: split the dataset and check files.
- Train: train and save the model.
- Predict: start the API or run a local prediction.
- Simulate: simulate one day, one month, or three months.
- Logs: inspect simulation summaries.
The Train tab also shows recent MLflow runs. You can start the MLflow UI, inspect model history, and restore a previous logged model into models/isolation_forest.joblib.
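Run a one-day local simulation (the set lines in this README use Command Prompt syntax; in PowerShell, use $env:PYTHONPATH = "src" instead):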
set PYTHONPATH=src
python -m metropt_app.simulation --mode day
Run a one-month local simulation:
set PYTHONPATH=src
python -m metropt_app.simulation --mode month
Run a three-month local simulation:
set PYTHONPATH=src
python -m metropt_app.simulation --mode three_months
Run known failure-period simulations:
set PYTHONPATH=src
python -m metropt_app.simulation --mode failure_2020_04_18
python -m metropt_app.simulation --mode failure_2020_05_29_30
python -m metropt_app.simulation --mode failure_2020_07_15
Start the API manually:
set PYTHONPATH=src
uvicorn metropt_app.api:app --host 127.0.0.1 --port 8000
Run a one-month simulation through an already running API:
set PYTHONPATH=src
python -m metropt_app.simulation --mode month --use-api
API docs will be at:
http://127.0.0.1:8000/docs
Every model training run is logged to MLflow under:
mlruns/
The project logs:
- model parameters
- training row count
- selected feature list
- training anomaly ratio
- training score summary statistics
- the trained sklearn model artifact
Start the MLflow UI with:
set PYTHONPATH=src
set MLFLOW_ALLOW_FILE_STORE=true
mlflow ui --backend-store-uri ./mlruns --port 5000
Older logged models can be restored from the Streamlit Train tab by selecting a run and clicking Restore selected MLflow model.
The simulation checks drift in non-overlapping windows of 21,600 rows. Each window is compared with the full first month of training data. A window is reported as drifted if any single feature drifts.
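The windowing described above can be sketched as follows. This is an illustration under stated assumptions, not the code in drift.py: the function names are hypothetical, the per-feature test (a mean shift measured in reference standard deviations) is a stand-in for whatever statistic the project actually uses, and skipping a short trailing remainder is also an assumption. Only the window size and the "any feature drifts" rule come from the text.

```python
from statistics import mean, stdev

WINDOW_ROWS = 21_600  # window size stated in the text


def iter_windows(rows, size=WINDOW_ROWS):
    """Yield consecutive non-overlapping windows; a short trailing remainder is skipped (assumption)."""
    for start in range(0, len(rows) - size + 1, size):
        yield rows[start:start + size]


def feature_drifted(reference, current, threshold=3.0):
    """Assumed test: drift if the window mean moves more than `threshold` reference standard deviations."""
    spread = stdev(reference) or 1e-12  # avoid dividing by zero for a constant feature
    return abs(mean(current) - mean(reference)) / spread > threshold


def window_drifted(reference_rows, window_rows, features):
    """A window counts as drifted if any single feature drifts."""
    return any(
        feature_drifted([r[f] for r in reference_rows], [r[f] for r in window_rows])
        for f in features
    )


# Small demo: rows are dicts, and a window size of 4 stands in for 21,600.
reference = [{"a": float(i % 5), "b": 1.0 + (i % 2)} for i in range(20)]
stable = reference[:8]
shifted = [{"a": r["a"] + 50.0, "b": r["b"]} for r in stable]

window_sizes = [len(w) for w in iter_windows(list(range(10)), size=4)]
stable_result = window_drifted(reference, stable, ["a", "b"])
shifted_result = window_drifted(reference, shifted, ["a", "b"])
print(window_sizes, stable_result, shifted_result)
```

In the demo only feature "a" is shifted, which is enough to mark the whole window as drifted.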
Retraining is calendar-month based: drift checks during a simulated month contribute to one monthly retrain decision. If at least one window in that month says should_retrain, the model may retrain once at the end of the simulated month.
Retraining is blocked if any window in that month is marked as possible_failure, because failure-like data should not be learned as normal behavior. When retraining is allowed, the model trains only on rows from the last month that the current model predicted as normal.
Each drift window also logs its anomaly ratio. If more than 99% of the rows in a window are anomalies (a ratio above 0.99), the window is marked as possible_failure.
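The retraining rules above can be summarized in a short sketch. It is illustrative only: the function names are hypothetical, and treating a drifted window as should_retrain is an assumption, since the text does not say how that flag is derived. The 0.99 threshold, the blocking rule, and the normal-rows filter follow the text; the labels 1 (normal) and -1 (anomaly) are the values returned by scikit-learn's IsolationForest.predict.

```python
FAILURE_RATIO = 0.99  # threshold stated in the text


def anomaly_ratio(predictions):
    """Share of rows labelled -1, the label IsolationForest.predict gives anomalies."""
    if not predictions:
        return 0.0
    return sum(1 for p in predictions if p == -1) / len(predictions)


def summarize_window(predictions, drifted):
    """Build the per-window flags used for the monthly decision."""
    ratio = anomaly_ratio(predictions)
    return {
        "anomaly_ratio": ratio,
        "possible_failure": ratio > FAILURE_RATIO,
        "should_retrain": drifted,  # assumption: drift alone requests a retrain
    }


def monthly_retrain_allowed(windows):
    """Retrain once per month if any window asks for it and none looks like a failure."""
    wants_retrain = any(w["should_retrain"] for w in windows)
    blocked = any(w["possible_failure"] for w in windows)
    return wants_retrain and not blocked


def normal_rows(rows, predictions):
    """Keep only the rows the current model predicted as normal (label 1)."""
    return [row for row, p in zip(rows, predictions) if p == 1]


healthy = summarize_window([1] * 95 + [-1] * 5, drifted=True)
failing = summarize_window([-1] * 100, drifted=True)
print(monthly_retrain_allowed([healthy]))
print(monthly_retrain_allowed([healthy, failing]))
```

A single failure-like window blocks retraining for the whole month, even when other windows report drift.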