This repository contains Python code and results for predictive modeling and survival analysis in disease risk assessment. It integrates several data sources, including metabolomics, polygenic risk scores (PRS), and phenotype data, to fit classification and survival models for different diseases. The IPython notebooks let you choose the model and the feature types included in training.
- Model Selection: Choose from XGBoost, AdaBoost, Logistic Regression, Lasso, Ridge, Elastic Net, Multi-Layer Perceptron, or Cox Regression.
- Omic-Specific Training: Include any combination of demographics, genomics, biomarkers, and metabolomics in the training data, with additional options for how to fuse models with the PRS.
- Feature Selection: Run feature importance analysis on any model, and train sparse models using the feature rankings from any model.
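
As a rough illustration of the omic-specific training and sparse-model options listed above, the sketch below selects training columns by feature group and keeps the top-ranked features from a set of importance scores. It is not taken from the notebooks: `FEATURE_GROUPS`, `select_columns`, `top_k_features`, the column names, and the scores are all hypothetical and stand in for whatever the pipeline actually uses.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from feature group to column names; the real
# column names used by the notebooks are not given in this README.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "demographics": ["age", "sex"],
    "genomics": ["prs"],
    "biomarkers": ["ldl", "hba1c"],
    "metabolomics": ["met_001", "met_002"],
}


def select_columns(groups: Sequence[str]) -> List[str]:
    """Return the columns for the requested feature groups, in request order."""
    unknown = [g for g in groups if g not in FEATURE_GROUPS]
    if unknown:
        raise ValueError(f"Unknown feature group(s): {unknown}")
    columns: List[str] = []
    for group in groups:
        columns.extend(FEATURE_GROUPS[group])
    return columns


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Rank features by absolute importance and keep the k highest."""
    ranked = sorted(importances, key=lambda name: abs(importances[name]), reverse=True)
    return ranked[:k]


# Train on demographics and metabolomics only.
columns = select_columns(["demographics", "metabolomics"])

# Illustrative importance scores, not results from the repository.
scores = {"age": 0.40, "sex": 0.05, "met_001": -0.30, "met_002": 0.10}
sparse_columns = top_k_features(scores, k=2)

print(columns)
print(sparse_columns)
```

Ranking by absolute value is one possible choice here, so that strongly negative coefficients from a linear model count as important. Tree-based importances are non-negative, in which case the absolute value has no effect.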
To use this pipeline, clone the repository and ensure you have the required Python packages installed.
- IPython environment
- Necessary Python libraries (installed at the start of each notebook)
Clone the repository to your local machine:
git clone [repository-url]

See the README for Disease Prediction and Survival Analysis for instructions on running the respective code.
