A portfolio project demonstrating an end-to-end analytics workflow for personal finance reporting, from raw transaction files to a structured Power BI dashboard.
This project demonstrates how raw financial transaction exports can be transformed into a structured analytical model using a reproducible Python pipeline and a Power BI star schema.
Workflow:
Raw CSV exports
→ Python ETL pipeline
→ Rule-based categorization
→ Clean combined dataset
→ Power BI star schema model
→ Budget analytics dashboards
The project is presented as a public portfolio case study for data analytics and BI roles.
Example financial overview dashboard built on the processed transaction dataset.
This project demonstrates an end-to-end analytics pipeline:
Raw transaction exports
↓
Python ETL pipeline
↓
Rule-based categorization
↓
Clean transaction dataset
↓
Power BI star schema model
↓
Budget analytics dashboards
This repository presents a simplified but realistic analytics workflow for personal finance reporting.
Included in this project:
• synthetic transaction datasets
• rule-based categorization examples
• Python data preparation pipeline
• documentation of the analytical model
• reproducible output datasets
• Power BI modeling guidance
The public repository intentionally excludes any real financial records or private source exports.
All examples are synthetic or sanitized to demonstrate the workflow without exposing personal data.
If you are reviewing this repository, a good path is:
- Review the analytics workflow described in the architecture documentation
- Inspect the synthetic datasets in data/sample/
- Review the rule-based categorization examples in rules/
- Run the Python pipeline from src/main.py
- Inspect the generated datasets and documented BI model
This reflects how the full analytics process moves from raw data to reporting.
- Python-based transaction ingestion and standardization
- Rule-based categorization workflow
- Recategorization support for continuous cleanup
- Star schema model for reporting
- Power BI dashboards for budget analysis and diagnostics
- Public-safe documentation and synthetic example data
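To illustrate the rule-based categorization and recategorization features listed above, here is a minimal sketch. The rule format (keyword and category pairs), the column names `description` and `category`, and the function names `categorize` and `apply_rules` are assumptions made for illustration; the actual files in rules/ and src/ may be organized differently.

```python
import pandas as pd

# Hypothetical rules as (keyword, category) pairs. The first matching rule wins.
RULES = [
    ("supermarket", "Groceries"),
    ("rent", "Housing"),
    ("railway", "Transport"),
]


def categorize(description: str, rules=RULES, default: str = "Uncategorized") -> str:
    """Return the category of the first rule whose keyword occurs in the description."""
    text = description.lower()
    for keyword, category in rules:
        if keyword in text:
            return category
    return default


def apply_rules(transactions: pd.DataFrame, rules=RULES) -> pd.DataFrame:
    """Return a copy of the transactions with a freshly computed `category` column."""
    result = transactions.copy()
    result["category"] = result["description"].map(lambda d: categorize(d, rules))
    return result


# Synthetic example data (not real transactions).
transactions = pd.DataFrame({
    "description": ["City Supermarket 123", "Monthly RENT payment", "Coffee corner"],
    "amount": [-54.20, -950.00, -3.80],
})
categorized = apply_rules(transactions)

# Recategorization: extend the rule list and rerun over the same data.
extended_rules = RULES + [("coffee", "Eating Out")]
recategorized = apply_rules(transactions, extended_rules)
```

Because the rules are plain data and `apply_rules` always recomputes the column from scratch, continuous cleanup in this sketch amounts to editing the rule list and rerunning the step.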
- Python
- Pandas
- CSV-based ingestion
- Power BI
- DAX
- Dimensional modeling
- Install Python 3.10 or newer
- Install dependencies with pip install -r requirements.txt
- Run the pipeline with python src/main.py
- Review generated outputs in the output/ folder
- Use the synthetic datasets and outputs to build the Power BI model
A minimal Python pipeline is included to demonstrate the data preparation workflow.
See:
docs/run_pipeline.md
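As a rough idea of what the data preparation step involves, the sketch below standardizes two differently shaped CSV exports into one combined dataset. It is not the code in src/main.py: the two export layouts, the loader names `load_bank_a` and `load_bank_b`, and the target columns `date`, `description`, `amount`, and `source` are all assumptions for illustration.

```python
import io

import pandas as pd

# Two synthetic exports with different layouts (hypothetical formats).
BANK_A = io.StringIO(
    "Date,Description,Amount\n"
    "2024-01-05,City Supermarket,-54.20\n"
    "2024-01-01,Salary,2500.00\n"
)
BANK_B = io.StringIO(
    "booking_date;text;debit;credit\n"
    "03.01.2024;Rent;950.00;\n"
)


def load_bank_a(source) -> pd.DataFrame:
    """Comma-separated export with ISO dates and a single signed amount column."""
    raw = pd.read_csv(source)
    return pd.DataFrame({
        "date": pd.to_datetime(raw["Date"], format="%Y-%m-%d"),
        "description": raw["Description"].str.strip(),
        "amount": raw["Amount"].astype(float),
        "source": "bank_a",
    })


def load_bank_b(source) -> pd.DataFrame:
    """Semicolon-separated export with day-first dates and split debit/credit columns."""
    raw = pd.read_csv(source, sep=";")
    # Merge debit and credit into one signed amount (expenses negative).
    amount = raw["credit"].fillna(0.0) - raw["debit"].fillna(0.0)
    return pd.DataFrame({
        "date": pd.to_datetime(raw["booking_date"], format="%d.%m.%Y"),
        "description": raw["text"].str.strip(),
        "amount": amount,
        "source": "bank_b",
    })


# Combine all standardized sources into one chronologically ordered dataset.
combined = (
    pd.concat([load_bank_a(BANK_A), load_bank_b(BANK_B)], ignore_index=True)
    .sort_values("date")
    .reset_index(drop=True)
)
```

Keeping one small loader per source format means every downstream step only ever sees the shared column layout.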
This public repository does not contain real personal financial data.
All public examples, screenshots, documentation, and sample datasets are either sanitized or synthetically generated. Real source exports, personal identifiers, bank references, counterparties, and transaction descriptions are excluded.
The analytics flow is structured as:
Raw CSV exports
→ Data standardization
→ Categorization rules
→ Combined transactions dataset
→ Star schema model
→ Budget reporting dashboards
See docs/architecture.md for more detail.
For a visual overview of the workflow see:
docs/architecture_diagram.md
Core analytical tables:
- FactTransactions
- DimDate
- DimCategory
- Budget
The model follows star schema principles for clean and scalable reporting.
See docs/data_model.md for full details.
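For readers less familiar with star schemas, the following sketch shows one way a categorized transaction table could be split into a fact table and two dimensions in pandas. Only the table names FactTransactions, DimDate, and DimCategory come from this project; the key columns (`DateKey`, `CategoryKey`), the other column names, and the sample rows are assumptions, and the Budget table is left out. The documented model in docs/data_model.md remains the reference.

```python
import pandas as pd

# Synthetic, already categorized transactions (hypothetical column names).
transactions = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-02-05"]),
    "description": ["Salary", "Rent", "City Supermarket"],
    "amount": [2500.00, -950.00, -54.20],
    "category": ["Income", "Housing", "Groceries"],
})

# DimDate: one row per calendar day across the transaction date range,
# so that days without transactions still exist for time-based reporting.
days = pd.date_range(transactions["date"].min(), transactions["date"].max(), freq="D")
dim_date = pd.DataFrame({
    "DateKey": days.strftime("%Y%m%d").astype(int),
    "Date": days,
    "Year": days.year,
    "Month": days.month,
})

# DimCategory: one row per distinct category with a surrogate key.
categories = sorted(transactions["category"].unique())
dim_category = pd.DataFrame({
    "CategoryKey": range(1, len(categories) + 1),
    "Category": categories,
})

# FactTransactions: measures plus foreign keys into the two dimensions.
fact = transactions.merge(
    dim_category, left_on="category", right_on="Category", how="left"
)
fact["DateKey"] = fact["date"].dt.strftime("%Y%m%d").astype(int)
fact_transactions = fact[["DateKey", "CategoryKey", "description", "amount"]]
```

In Power BI, the two key columns would then carry the one-to-many relationships from each dimension to the fact table.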
- Financial Overview
- Category Calibration
- Budget vs Actual
- Budget Diagnostics
Sanitized screenshots will be stored in the screenshots/ folder.
Example analytics outputs produced by the pipeline and Power BI model.
Measures are organized into the following groups:
- Budget Amount Raw
- Variance Raw
- Variance % Raw
- Budget Amount
- Variance
- Variance %
- Budget Consumed %
- Budget Remaining
- Budget Pace %
- Expenses YTD
- Budget YTD Correct
- Variance YTD
- Variance YTD %
- Forecast Year End
- Forecast vs Budget %
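The measures themselves are written in DAX and their exact definitions are not reproduced here. As a hedged illustration of the arithmetic that measures with these names typically involve, the Python sketch below uses assumed definitions: variance as actual minus budget, and a year-end forecast as a simple linear run rate. The function names and sign conventions are hypothetical and may not match the project's DAX.

```python
from datetime import date
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None when the denominator is zero (similar in spirit to DAX DIVIDE)."""
    return numerator / denominator if denominator else None


def budget_metrics(actual: float, budget: float) -> dict:
    """Assumed definitions for a single category and period."""
    variance = actual - budget  # negative means under budget in this convention
    return {
        "variance": variance,
        "variance_pct": safe_ratio(variance, budget),
        "budget_consumed_pct": safe_ratio(actual, budget),
        "budget_remaining": budget - actual,
    }


def forecast_year_end(expenses_ytd: float, as_of: date) -> float:
    """Linear run-rate forecast: scale year-to-date spend by the share of the year elapsed."""
    year_start = date(as_of.year, 1, 1)
    year_end = date(as_of.year, 12, 31)
    days_elapsed = (as_of - year_start).days + 1
    days_in_year = (year_end - year_start).days + 1
    return expenses_ytd * days_in_year / days_elapsed


# Example with synthetic numbers: 450 spent against a budget of 500.
metrics = budget_metrics(actual=450.0, budget=500.0)

# 1000 spent by 14 March 2023 (day 73 of 365) extrapolates to five times that amount.
forecast = forecast_year_end(1000.0, date(2023, 3, 14))
```

A linear run rate ignores seasonality, so it is only a baseline for comparison against the annual budget.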
personal-finance-analytics/
├ README.md
├ .gitignore
├ LICENSE
├ requirements.txt
├ data/
│ └ sample/
├ docs/
├ output/
├ rules/
├ screenshots/
├ src/
└ tests/
The public version of the project is designed to be reproducible using synthetic sample data and example rule files.
See docs/reproducibility.md for setup steps.
This project demonstrates:
- ETL design
- data cleaning
- rule-based categorization
- dimensional modeling
- DAX measure design
- dashboard design
- documentation discipline
- privacy-aware analytics delivery
This repository is designed to show practical analytics skills across the full workflow:
- ingestion of raw-style CSV inputs
- transformation and standardization in Python
- rule-based categorization design
- reproducible output generation
- dimensional modeling for BI
- Power BI-ready semantic structure
- privacy-safe public project delivery
A good review path for this project is:
- Read the project summary and architecture documents
- Inspect the synthetic sample datasets in data/sample/
- Review the example categorization rules in rules/
- Run the Python pipeline from src/main.py
- Inspect the generated outputs and documented Power BI workflow
This mirrors how the full analytics process is structured from raw data to reporting.
This project demonstrates practical experience with:
• Python data pipelines using Pandas and CSV ingestion
• Rule-based data categorization workflows
• Data cleaning and transformation
• Dimensional modeling using a star schema
• Power BI semantic modeling
• DAX measures for financial analytics
• Budget versus actual variance analysis
• Reproducible analytics workflows
• Privacy-safe data publishing
Possible future enhancements:
- stronger rule engine logic
- automated validation checks
- parameterized ingestion paths
- cloud deployment version
- expanded forecasting diagnostics
