|
1 | 1 | # SQL-Ball |
2 | 2 |
|
3 | | -SQL-Ball is a football analytics platform that transforms football match data into actionable insights through interactive visualizations and natural language queries. This project was developed as a content project for the **CodeCademy Mastering Generative AI for Developers Bootcamp** held from August to September 2025. |
| 3 | +Football analytics platform that converts natural language questions into SQL queries, processes European league match data, and visualises patterns and statistics. |
| 4 | + |
| 5 | +## What It Does |
| 6 | + |
| 7 | +SQL-Ball lets you: |
| 8 | +- Ask questions about football matches in plain English ("Show me upsets in the Premier League") |
| 9 | +- Get automated SQL query generation and execution |
| 10 | +- View interactive dashboards with historical trends, team performance, and statistical anomalies |
| 11 | +- Explore data across 22 European leagues with 7,600+ matches |
| 12 | + |
| 13 | +The backend uses LangChain RAG (Retrieval-Augmented Generation) to parse queries and retrieve relevant schema context. The frontend provides real-time visualisations via Chart.js. |
| 14 | + |
| 15 | +## Setup |
| 16 | + |
| 17 | +### Prerequisites |
| 18 | +- Node.js 18+ |
| 19 | +- Python 3.11+ |
| 20 | +- Supabase project (free tier works) |
| 21 | +- OpenAI or Anthropic API key (optional, for AI features) |
4 | 22 |
|
5 | | -<img width="1261" height="621" alt="image" src="https://github.com/user-attachments/assets/407966f8-e680-458a-aa7a-a3d66b7d1318" /> |
| 23 | +### Installation |
6 | 24 |
|
7 | | -## Overview |
| 25 | +**Frontend:** |
| 26 | +```bash |
| 27 | +npm install |
| 28 | +npm run dev |
| 29 | +``` |
8 | 30 |
|
9 | | -SQL-Ball allows users to: |
10 | | -- View and analyze historical match data. |
11 | | -- Filter data by league division (e.g., Premier League, La Liga, Bundesliga). |
12 | | -- Generate visualizations including trends, distribution charts, and performance radar charts. |
13 | | -- Execute natural language queries converted to SQL for retrieving football statistics. |
| 31 | +Runs on `http://localhost:5173` by default. |
14 | 32 |
|
15 | | -## Architecture & Use Cases |
| 33 | +**Backend:** |
| 34 | +```bash |
| 35 | +cd backend |
| 36 | +python -m venv .venv |
| 37 | +source .venv/bin/activate # Windows: .venv\Scripts\activate |
| 38 | +pip install -r requirements.txt |
| 39 | +python main.py |
| 40 | +``` |
16 | 41 |
|
17 | | -For detailed architectural decisions and design patterns used in this project, please refer to [ARCHITECTURE.md](ARCHITECTURE.md). |
| 42 | +Runs on `http://localhost:8000`. |
18 | 43 |
|
19 | | -**Key Use Cases:** |
20 | | -- **Data Filtering:** Filter match data based on league divisions using unique codes such as: |
21 | | - - **Premier League (E1)** |
22 | | - - **La Liga (SP1)** |
23 | | - - **Bundesliga (G1)** |
24 | | -- **Interactive Visualizations:** Generate charts for goals trends, result distributions, and team performance. |
25 | | -- **Natural Language Querying:** Allow users to enter plain-English queries that are converted into SQL to retrieve relevant insights. |
26 | | -- **Backend Integration:** Support backend data processing (Python-based API) and a frontend built with Svelte and Vite. |
| 44 | +**Environment Variables** |
27 | 45 |
|
28 | | -## Project Details |
29 | | - |
30 | | -- **Frontend:** Built using Svelte and Vite, with visualizations rendered via Chart.js components. |
31 | | -- **Backend:** Python scripts and APIs serve match data and handle data processing. |
32 | | -- **Deployment:** Designed to be deployed to Vercel with separate configuration for the backend. Sensitive files like `backend/.env` should be added to `.gitignore` to protect credentials. |
33 | | -- **Content Project:** This project was created as part of an intensive bootcamp to master generative AI applications in software development. |
34 | | - |
35 | | -## Acknowledgements |
36 | | - |
37 | | -Thank you for the incredible datasets, this would not be possible without - https://www.football-data.co.uk/data.php |
38 | | - |
39 | | -A special thanks to the CodeCademy Mastering Generative AI for Developers Bootcamp for inspiring this project and providing the learning environment that led to its creation. The course and this project in particular were a big learning curve for me and really enjoyed the experience. |
40 | | - |
41 | | -## Getting Started |
42 | | - |
43 | | -1. **Installation:** |
44 | | - - Clone the repository. |
45 | | - - Install dependencies using `npm install` for the frontend. |
46 | | - - Set up the backend by following the instructions in `backend/setup.sh`. |
47 | | - - Ensure you have proper environment variables set up (see `.env.example`). |
48 | | - |
49 | | -2. **Development:** |
50 | | - - Start the frontend by running `npm run dev`. |
51 | | - - For backend development, run the provided scripts in the `backend` directory. |
52 | | - |
53 | | -3. **Deployment:** |
54 | | - - The project is ready for deployment on Vercel. Make sure to configure environment variables on Vercel and exclude sensitive files (like `backend/.env`) from version control. |
| 46 | +Create `.env` files in both the root and `backend/` directories: |
| 47 | + |
| 48 | +``` |
| 49 | +# Frontend .env |
| 50 | +VITE_SUPABASE_URL=your_supabase_url |
| 51 | +VITE_SUPABASE_ANON_KEY=your_supabase_key |
| 52 | +VITE_OPENAI_API_KEY=your_openai_key # Optional |
| 53 | +``` |
| 54 | + |
| 55 | +``` |
| 56 | +# Backend .env |
| 57 | +VITE_SUPABASE_URL=your_supabase_url |
| 58 | +VITE_SUPABASE_ANON_KEY=your_supabase_key |
| 59 | +VITE_OPENAI_API_KEY=your_openai_key |
| 60 | +``` |
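
A missing variable is easier to diagnose at startup than mid-request. The sketch below checks the backend variables listed above. It assumes the two Supabase values are required and the OpenAI key is optional (the prerequisites describe the API key as optional); `missing_env_vars` is a hypothetical helper, not part of the project.

```python
import os

# Names taken from the backend .env example; which ones are mandatory is an assumption.
REQUIRED_VARS = ("VITE_SUPABASE_URL", "VITE_SUPABASE_ANON_KEY")
OPTIONAL_VARS = ("VITE_OPENAI_API_KEY",)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def ai_features_enabled(env=None):
    """Return True when at least one optional AI key is present."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in OPTIONAL_VARS)
```

Calling `missing_env_vars()` before the server starts and aborting on a non-empty result gives a clearer error than a failed Supabase call later on.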
| 61 | + |
| 62 | +## Architecture |
| 63 | + |
| 64 | +**Frontend Stack** |
| 65 | +- Svelte + Vite for rapid development |
| 66 | +- Chart.js for visualisations |
| 67 | +- TypeScript for type safety |
| 68 | +- Tailwind CSS for styling |
| 69 | + |
| 70 | +**Backend Stack** |
| 71 | +- FastAPI for HTTP API |
| 72 | +- LangChain for RAG and SQL generation |
| 73 | +- ChromaDB for vector embeddings |
| 74 | +- Supabase PostgreSQL for data |
| 75 | + |
| 76 | +**Data Flow** |
| 77 | +1. User types natural language question |
| 78 | +2. Frontend sends to backend `/query` endpoint |
| 79 | +3. Backend retrieves schema context via embeddings |
| 80 | +4. LLM generates SQL with OpenAI/Anthropic |
| 81 | +5. SQL validation and repair layer handles edge cases |
| 82 | +6. Query executes via Supabase RPC |
| 83 | +7. Results return with explanation |
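
Step 5 can be illustrated with a small read-only check. This is a minimal sketch of one possible validation rule, not the project's actual validation and repair layer; `is_safe_select` and the keyword list are assumptions made for illustration.

```python
import re

# Keywords that a read-only analytics query should not contain.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Return True if `sql` is a single statement that starts with SELECT or WITH."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False
    return FORBIDDEN.search(statement) is None
```

The check is deliberately conservative: a query with a forbidden word inside a string literal is rejected too, which is usually acceptable for LLM-generated SQL that can simply be regenerated.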
| 84 | + |
| 85 | +## Common Issues |
| 86 | + |
| 87 | +**"No API key configured"** |
| 88 | +- Add `VITE_OPENAI_API_KEY` to `.env` for AI features |
| 89 | +- Platform still works with template-based query generation |
| 90 | + |
| 91 | +**"Failed to initialise ChromaDB"** |
| 92 | +- Ensure `backend/.venv` is activated |
| 93 | +- Check `chromadb_data/` directory is writable |
| 94 | +- Clear directory and restart if schema changed |
| 95 | + |
| 96 | +**Queries returning empty results** |
| 97 | +- Verify league code is correct (E0 = Premier League, SP1 = La Liga, etc.) |
| 98 | +- Check the `match_date` format in the WHERE clause |
| 99 | +- Most data is from the 2024-2025 season |
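
The first two checks can be automated before a query runs. The sketch below assumes `match_date` is stored as an ISO 8601 date, which the source does not confirm, and only knows the two league codes named above; `check_filters` is a hypothetical helper.

```python
from datetime import date

# Only the two codes named above; extend this from your own data.
KNOWN_LEAGUES = {"E0": "Premier League", "SP1": "La Liga"}


def check_filters(league_code: str, match_date: str) -> list[str]:
    """Return a list of problems; an empty list means the filters look plausible."""
    problems = []
    if league_code not in KNOWN_LEAGUES:
        problems.append(f"unknown league code: {league_code!r}")
    try:
        date.fromisoformat(match_date)
    except ValueError:
        problems.append(f"match_date is not an ISO 8601 date: {match_date!r}")
    return problems
```

For example, `check_filters("E1", "17/08/2024")` reports both an unknown league code and a badly formatted date.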
| 100 | + |
| 101 | +## Deployment |
| 102 | + |
| 103 | +**Vercel (Frontend)** |
| 104 | +```bash |
| 105 | +npm run build |
| 106 | +# Deploy dist/ folder to Vercel |
| 107 | +``` |
| 108 | + |
| 109 | +Set environment variables in Vercel project settings. |
| 110 | + |
| 111 | +**Backend Options** |
| 112 | +- Railway.app (easiest) |
| 113 | +- Heroku (no free tier) |
| 114 | +- Self-hosted VPS |
| 115 | + |
| 116 | +Set the backend's environment variables on the deployment platform. |
| 117 | + |
| 118 | +## Tech Details |
| 119 | + |
| 120 | +**Why This Stack?** |
| 121 | + |
| 122 | +- Svelte: Smaller bundle than React, excellent for data visualisations |
| 123 | +- FastAPI: Modern Python async framework, auto-generates OpenAPI docs |
| 124 | +- LangChain: Handles complex RAG pipelines without reinventing the wheel |
| 125 | +- Supabase: PostgreSQL advantage for complex queries, real-time capabilities |
| 126 | +- ChromaDB: Lightweight embedding store, no external dependency |
| 127 | + |
| 128 | +**Performance Notes** |
| 129 | + |
| 130 | +- Initial schema embedding takes ~2-3 seconds |
| 131 | +- Subsequent queries reuse embeddings cached in memory |
| 132 | +- Vector search dramatically improves context retrieval |
| 133 | +- Falls back to text search if embeddings unavailable |
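
The fallback in the last point might look roughly like this. It is a sketch under stated assumptions: `SCHEMA_DOCS`, its column names other than `match_date`, and the `vector_search` callable are all hypothetical stand-ins for the project's real schema descriptions and ChromaDB lookup.

```python
# Hypothetical schema snippets; the real project keeps its own descriptions.
SCHEMA_DOCS = (
    "matches: match_date, home_team, away_team, home_goals, away_goals",
    "leagues: code, name, country",
)


def keyword_search(question: str) -> list[str]:
    """Plain text fallback: keep docs that share at least one word with the question."""
    words = set(question.lower().split())
    hits = []
    for doc in SCHEMA_DOCS:
        tokens = set(doc.lower().replace(",", " ").replace(":", " ").split())
        if words & tokens:
            hits.append(doc)
    return hits


def retrieve_context(question: str, vector_search=None) -> list[str]:
    """Prefer vector search; fall back to keyword search if it is missing or fails."""
    if vector_search is not None:
        try:
            return vector_search(question)
        except Exception:
            pass  # embeddings unavailable, fall through to text search
    return keyword_search(question)
```

Passing the vector search in as a callable keeps the fallback path testable without a running embedding store.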
| 134 | + |
| 135 | +## Data Source |
| 136 | + |
| 137 | +Match data sourced from [football-data.co.uk](https://www.football-data.co.uk/data.php). |
| 138 | + |
| 139 | +Covers: |
| 140 | +- Premier League, Championship, League One, League Two (England) |
| 141 | +- La Liga, Segunda División (Spain) |
| 142 | +- Bundesliga, 2. Bundesliga (Germany) |
| 143 | +- Serie A, Serie B (Italy) |
| 144 | +- Ligue 1, Ligue 2 (France) |
| 145 | +- Eredivisie (Netherlands) |
| 146 | +- Pro League (Belgium) |
| 147 | +- Primeira Liga (Portugal) |
| 148 | +- Süper Lig (Turkey) |
| 149 | +- Super League (Greece) |
| 150 | +- Premiership, Championship, League One, League Two (Scotland) |
55 | 151 |
|
56 | 152 | ## Contributing |
57 | 153 |
|
58 | | -Contributions are welcome! Please check the issues and submit pull requests for enhancements or bug fixes. For major changes, please open an issue first to discuss the proposed changes. |
| 154 | +Issues and pull requests welcome. For major changes, open an issue first to discuss. |
59 | 155 |
|
60 | 156 | ## License |
61 | 157 |
|
62 | | -This project is licensed under the terms described in the [LICENSE](LICENSE) file. |
| 158 | +See [LICENSE](LICENSE) file for details. |
| 159 | + |
| 160 | +--- |
| 161 | + |
| 162 | +Built as a Codecademy Mastering Generative AI for Developers bootcamp project (August-September 2025). |