A multi-provider AI proxy that fans queries out to multiple AI and search providers simultaneously, ranks results using semantic embeddings and a three-axis scoring system (price · distance · rating), and penalizes sponsored content, putting the user's interests ahead of advertiser dollars.
CUNY capstone project. Final demo: May 15, 2026.
All changes go through pull requests: no direct commits to `main`, including from project owners.

- Create a branch from `main`:

      git checkout main && git pull origin main
      git checkout -b your-name/short-description

- Make your changes, commit, and push:

      git push -u origin your-name/short-description

- Open a pull request on GitHub targeting `main`. Add a brief description of what changed and why.
- Get at least one teammate review before merging.
| Component | State |
|---|---|
| FastAPI server (fallback mode) | Working |
| NLIP server (`NLIPApplication` / `NLIPSession`) | Pending: NLIP libraries not yet installable |
| Provider: OpenAI (`gpt-4o-mini`) | Working; needs `OPENAI_API_KEY` |
| Provider: Gemini (`gemini-2.5-flash`) | Working; needs `GEMINI_API_KEY` |
| Provider: Ollama (`llama3.2`) | Working; runs locally, no key needed |
| Provider: WatsonX (`granite-13b-instruct-v2`) | Working; needs `WATSONX_API_KEY` + `WATSONX_PROJECT_ID` |
| Provider: Brave Search | Ready; needs `BRAVE_API_KEY` |
| Provider: Mock (canned lunch data for tests) | Working (tests only, not in server build) |
| Orchestrator (parallel fan-out, failure isolation) | Working |
| Constraint extraction (`$15`, `within 1 mile`, `4 stars`) | Working |
| Intent detection (price / distance / rating / general) | Working |
| Ranker: semantic similarity (Ollama embeddings) | Working |
| Ranker: three-axis gap scoring (P1/P2/P3) | Working |
| Ranker: multi-intent axis weighting | Working |
| Ranker: hard constraint filtering | Working |
| Ranker: fuzzy consensus clustering | Working |
| Ranker: sponsored content penalty | Working |
| Query result cache (3-hour TTL, 10 query history) | Working |
| `GET /health` | Working |
| `GET /metrics` (Prometheus) | Working |
| `GET /history` (recent queries) | Working |
| `POST /cache/clear` | Working |
| Demo UI: ranked results with score bars | Working |
| Demo UI: 3D scoring space (Plotly) | Working |
| Demo UI: radar chart (top 3 comparison) | Working |
| Demo UI: provider breakdown panel | Working |
| Demo UI: query history dropdown | Working |
| Tests | 23 passing |
    user (browser)
          │
          │  POST /query
          ▼
    ┌─────────────────────────────┐
    │  FastAPI server             │  angel_filter/server.py
    │  + Query cache (3hr TTL)    │  angel_filter/cache.py
    └──────────────┬──────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │  Orchestrator               │  angel_filter/orchestrator.py
    │  1. extract_constraints()   │  angel_filter/constraints.py
    │  2. detect_intent()         │
    │  3. fan-out in parallel     │
    └──┬──────┬──────┬──────┬─────┘
       │      │      │      │
     OpenAI Gemini Ollama WatsonX    angel_filter/providers/*.py
       │      │      │      │        (Brave also available)
       └──────┴──┬───┴──────┘
                 │  normalized ProviderResult list
                 ▼
    ┌─────────────────────────────┐
    │  Ranker                     │  angel_filter/ranker.py
    │  1. hard constraint filter  │
    │  2. Ollama embeddings       │
    │     → semantic similarity   │
    │  3. P1/P2/P3 axis scoring   │
    │  4. fuzzy consensus cluster │
    │  5. sponsored penalty       │
    └──────────────┬──────────────┘
                   │  RankedResult list
                   ▼
    ┌─────────────────────────────┐
    │  Demo UI                    │  static/index.html
    │  - ranked result cards      │
    │  - score bars               │
    │  - 3D scoring space         │
    │  - radar chart              │
    │  - provider breakdown       │
    │  - query history            │
    └─────────────────────────────┘
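
The orchestrator's parallel fan-out with failure isolation can be sketched roughly as follows. This is a minimal illustration, not the code in `angel_filter/orchestrator.py`: the `fan_out` function, the plain-dict results, and the two toy providers are hypothetical names chosen for the example. It only shows the general idea that one failing provider should not take down the others.

```python
import asyncio


async def fan_out(query, providers):
    """Query every provider concurrently; a failing provider contributes no results."""

    async def call(name, search):
        try:
            return name, await search(query)
        except Exception as exc:  # isolate the failure to this one provider
            return name, exc

    outcomes = await asyncio.gather(*(call(n, s) for n, s in providers.items()))

    results, errors = [], {}
    for name, outcome in outcomes:
        if isinstance(outcome, Exception):
            errors[name] = str(outcome)
        else:
            results.extend(outcome)
    return results, errors


# Two hypothetical providers: one succeeds, one fails.
async def good(query):
    return [{"title": "Joe's Pizza", "provider": "good"}]


async def broken(query):
    raise RuntimeError("upstream timeout")


results, errors = asyncio.run(
    fan_out("lunch under $15", {"good": good, "broken": broken})
)
```

Here `results` still holds the answer from the working provider, while `errors` records which provider failed and why.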
Semantic similarity scoring requires an embedding model. The ranker tries three backends in order, using whichever is available:
| Priority | Backend | When it's used |
|---|---|---|
| 1 | Ollama (`nomic-embed-text`) | Local development: Ollama running on `localhost:11434` |
| 2 | OpenAI (`text-embedding-3-small`) | Cloud deployment: Ollama not available, `OPENAI_API_KEY` is set |
| 3 | Keyword overlap | Last resort: no embedding backend available, scores are weaker |

For Render deployment: Ollama cannot run on Render's free tier. Set `OPENAI_API_KEY` in Render's environment variables and the ranker will automatically use OpenAI embeddings instead. All scoring, consensus clustering, and axis weighting remain fully active; only the embedding source changes.
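
The priority order in the table can be expressed as a small selection function. This is a sketch under assumptions: `choose_embedding_backend` and its `ollama_available` flag are hypothetical, and how the real ranker probes Ollama is not shown here.

```python
import os


def choose_embedding_backend(ollama_available, env=None):
    """Return the first usable backend in priority order: ollama, openai, keyword."""
    env = os.environ if env is None else env
    if ollama_available:
        return "ollama"      # priority 1: local nomic-embed-text
    if env.get("OPENAI_API_KEY"):
        return "openai"      # priority 2: text-embedding-3-small
    return "keyword"         # priority 3: keyword overlap, weaker scores


# On Render's free tier Ollama is unavailable, so the key decides the backend.
backend_on_render = choose_embedding_backend(False, {"OPENAI_API_KEY": "your-key-here"})
```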
Every result is scored across four layers:
The user's query and each result's title + snippet are embedded using Ollama
(`nomic-embed-text`). Cosine similarity between the query vector and each
result vector produces a 0 to 1 score. When Ollama is offline, the ranker falls
back to OpenAI embeddings if `OPENAI_API_KEY` is set, and to keyword overlap otherwise.
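
For reference, cosine similarity between two embedding vectors can be computed as below. This is a plain-Python sketch for illustration. Clamping negative values to 0 is an assumption made here to keep the score in the 0 to 1 range; the actual ranker may handle that differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


def similarity_score(query_vec, result_vec):
    """Clamp cosine similarity into 0..1 (assumption: negatives count as no match)."""
    return max(0.0, min(1.0, cosine_similarity(query_vec, result_vec)))
```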
Explicit constraints are extracted from the query and injected into provider prompts and the ranker:
| Axis | Constraint example | Gap math |
|---|---|---|
| P1 Price | `under $15` | `candidate.price - budget` (negative = under budget) |
| P2 Distance | `within 1 mile` | `candidate.distance - max_distance` (negative = closer) |
| P3 Rating | `rated 4 stars` | `min_rating - candidate.rating` (negative = meets threshold) |
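
As a rough illustration of the table, the sketch below pulls the three constraints out of a query with regular expressions and computes the gaps. The regex patterns, the `extract_constraints` body, and the dict-based candidate are assumptions for this example; the real logic lives in `angel_filter/constraints.py` and may differ.

```python
import re


def extract_constraints(query):
    """Extract budget, max distance (miles), and min rating from free text."""
    constraints = {}
    price = re.search(r"\$\s*(\d+(?:\.\d+)?)", query)
    if price:
        constraints["budget"] = float(price.group(1))
    distance = re.search(r"within\s+(\d+(?:\.\d+)?)\s*miles?", query, re.IGNORECASE)
    if distance:
        constraints["max_distance"] = float(distance.group(1))
    rating = re.search(r"(\d+(?:\.\d+)?)\s*stars?", query, re.IGNORECASE)
    if rating:
        constraints["min_rating"] = float(rating.group(1))
    return constraints


def axis_gaps(candidate, constraints):
    """Gap math from the table; a negative gap means the constraint is satisfied."""
    gaps = {}
    if "budget" in constraints:
        gaps["P1"] = candidate["price"] - constraints["budget"]
    if "max_distance" in constraints:
        gaps["P2"] = candidate["distance"] - constraints["max_distance"]
    if "min_rating" in constraints:
        gaps["P3"] = constraints["min_rating"] - candidate["rating"]
    return gaps


constraints = extract_constraints(
    "Find me the top 3 lunch spots under $15, within 1 mile, rated at least 4 stars"
)
gaps = axis_gaps({"price": 12.0, "distance": 0.4, "rating": 4.5}, constraints)
```

For this candidate all three gaps come out negative, meaning it is under budget, inside the distance limit, and above the rating threshold.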
Each gap maps to a 0 to 1 score. Intent detection (price / distance / rating / general) shifts the axis weights: a price query gives P1 60% of the axis score, with P2 and P3 splitting the remaining 40%. All three axes always contribute, so no single axis is winner-take-all.
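
The weighting described above can be sketched as follows. Only the 60/20/20 split for a focused intent comes from the text; the equal split for a `general` query and the handling of a single intent at a time are assumptions, and `axis_weights` / `axis_score` are hypothetical helper names.

```python
AXES = ("P1", "P2", "P3")  # price, distance, rating
INTENT_AXIS = {"price": "P1", "distance": "P2", "rating": "P3"}


def axis_weights(intent):
    """Focused axis gets 60%; the other two split the remaining 40%."""
    focus = INTENT_AXIS.get(intent)
    if focus is None:  # "general": assumed to weight all axes equally
        return {axis: 1 / 3 for axis in AXES}
    return {axis: (0.6 if axis == focus else 0.2) for axis in AXES}


def axis_score(scores, intent):
    """Weighted sum of per-axis scores, each already in the 0..1 range."""
    weights = axis_weights(intent)
    return sum(weights[axis] * scores[axis] for axis in AXES)


# A result that is great on price, average on distance, poor on rating.
price_focused = axis_score({"P1": 1.0, "P2": 0.5, "P3": 0.0}, "price")
```

With a price intent this result scores 0.6 × 1.0 + 0.2 × 0.5 + 0.2 × 0.0 = 0.7, so the weaker axes still pull the score down.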
Hard constraint filtering removes results that are more than 25% over budget or more than 0.5 stars below the minimum rating before scoring begins.
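
A minimal sketch of that filter, using the two thresholds stated above. The function name, the dict-based candidate, and the choice to keep candidates with a missing price or rating are assumptions for illustration.

```python
def passes_hard_filter(candidate, budget=None, min_rating=None):
    """Drop results more than 25% over budget or more than 0.5 below min rating."""
    price = candidate.get("price")
    rating = candidate.get("rating")
    if budget is not None and price is not None and price > budget * 1.25:
        return False
    if min_rating is not None and rating is not None and rating < min_rating - 0.5:
        return False
    return True  # assumption: unknown price or rating is not grounds for removal


candidates = [
    {"title": "Corner deli", "price": 12.0, "rating": 4.2},
    {"title": "Bistro", "price": 28.0, "rating": 4.6},
    {"title": "Food cart", "price": 9.0, "rating": 3.2},
]
kept = [c["title"] for c in candidates if passes_hard_filter(c, budget=15.0, min_rating=4.0)]
```

With a $15 budget the cutoff is $18.75, so the $28 bistro is removed; with a 4-star minimum the cutoff is 3.5, so the 3.2-rated cart is removed as well.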
Results mentioned by multiple providers are boosted. Matching uses embedding cosine similarity ≥ 0.75, so "Joe's Pizza" and "Joe Pizza" cluster together. The boost is capped at 2 extra providers to prevent a mediocre result from winning just because every provider mentioned it.
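
One way to picture the consensus step is the sketch below. The 0.75 threshold and the cap of 2 extra providers come from the text; everything else is an assumption, including the `consensus_bonus` name, the toy two-dimensional vectors standing in for real embeddings, and normalising the bonus to 0..1 by dividing by the cap.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def consensus_bonus(results, threshold=0.75, max_extra=2):
    """For each result, count other providers with a matching result, capped."""
    bonuses = []
    for result in results:
        agreeing = {
            other["provider"]
            for other in results
            if other["provider"] != result["provider"]
            and cosine(result["vector"], other["vector"]) >= threshold
        }
        bonuses.append(min(len(agreeing), max_extra) / max_extra)
    return bonuses


# Toy vectors: four providers mention nearly the same place, one an unrelated one.
results = [
    {"title": "Joe's Pizza", "provider": "openai", "vector": [1.0, 0.0]},
    {"title": "Joe Pizza", "provider": "gemini", "vector": [0.9, 0.1]},
    {"title": "Joes Pizza", "provider": "ollama", "vector": [0.95, 0.05]},
    {"title": "Sushi Bar", "provider": "watsonx", "vector": [0.0, 1.0]},
    {"title": "Joe's Pizza NYC", "provider": "brave", "vector": [1.0, 0.05]},
]
bonuses = consensus_bonus(results)
```

The first result has three agreeing providers but receives the same bonus as it would with two, which is the effect of the cap.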
Any result flagged as sponsored receives a flat score deduction regardless of how well it matches the query. This is the thesis of the project.
Final score formula:

    score = 0.50 × similarity
          + 0.35 × axis_score
          + 0.15 × consensus_bonus
          - 0.20 (if sponsored)
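
The formula translates directly into code. The weights below are the ones stated above; the function name and the example inputs are illustrative only, and no clamping is applied because the text does not mention any.

```python
def final_score(similarity, axis_score, consensus_bonus, sponsored=False):
    """Weighted sum of the three positive layers minus the flat sponsored penalty."""
    score = 0.50 * similarity + 0.35 * axis_score + 0.15 * consensus_bonus
    if sponsored:
        score -= 0.20
    return score


# Same result, organic versus sponsored.
organic = final_score(0.8, 0.6, 0.5)
paid = final_score(0.8, 0.6, 0.5, sponsored=True)
```

With these inputs the organic result scores 0.40 + 0.21 + 0.075 = 0.685, and the sponsored copy drops to 0.485, enough to fall behind a noticeably weaker organic match.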
Choose your operating system below. You need at least one API key to run the server. Gemini has a free tier and is the easiest to get started with.
- macOS 11 or later
- Homebrew (package manager)
- Python 3.12
- Ollama (local AI; free, no key needed)
- Git
- At least one API key (see API Keys below)
1. Install Homebrew (skip if already installed)

       /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Install Python 3.12 and Git

       brew install python@3.12 git

   Verify:

       python3.12 --version   # should print Python 3.12.x
       git --version

3. Install Ollama

   Download from https://ollama.com/download and run the installer.

   Then pull the two models Angel Filter needs:

       ollama pull nomic-embed-text   # embedding model, used for ranking
       ollama pull llama3.2           # generation model, used as a provider

   Verify Ollama is running:

       curl http://localhost:11434/api/tags

   You should see a JSON list of installed models.

4. Clone the repo

       git clone https://github.com/adonisja/NLIP-Project
       cd NLIP-Project

5. Install Python dependencies

       pip3.12 install fastapi "uvicorn[standard]" httpx prometheus-client ollama python-dotenv

6. Download Plotly (required for the 3D visualization)

       curl -o static/plotly.min.js https://cdn.plot.ly/plotly-2.32.0.min.js

7. Set up your API keys

   Create a `.env` file in the project root:

       cp .env.example .env

   Then open `.env` in any text editor and fill in your keys (see API Keys below).

8. Start the server

       ./start.sh

   Open http://localhost:8005 in your browser.
- Windows 10 or 11
- Python 3.12 (check "Add to PATH" during install)
- Ollama for Windows
- Git for Windows
- At least one API key (see API Keys below)
1. Install Python 3.12

   Download from https://www.python.org/downloads/release/python-3120/

   During installation, check "Add python.exe to PATH". This is important.

   Verify in a new terminal (Command Prompt or PowerShell):

       python --version   # should print Python 3.12.x
       pip --version

2. Install Git

   Download from https://git-scm.com/download/win and run the installer with default settings.

3. Install Ollama

   Download from https://ollama.com/download and run the installer.

   Open a new terminal and pull the two models:

       ollama pull nomic-embed-text
       ollama pull llama3.2

   Verify Ollama is running:

       curl http://localhost:11434/api/tags

4. Clone the repo

       git clone https://github.com/adonisja/NLIP-Project
       cd NLIP-Project

5. Install Python dependencies

       pip install fastapi "uvicorn[standard]" httpx prometheus-client ollama python-dotenv

6. Download Plotly

   In PowerShell:

       Invoke-WebRequest -Uri "https://cdn.plot.ly/plotly-2.32.0.min.js" -OutFile "static\plotly.min.js"

7. Set up your API keys

   Copy the example env file:

       copy .env.example .env

   Open `.env` in Notepad or VS Code and fill in your keys.

8. Start the server

   On Windows, `start.sh` won't work directly. Run this instead:

       python -m uvicorn angel_filter.server:app --reload --port 8005

   Or if you have Git Bash installed:

       ./start.sh

   Open http://localhost:8005 in your browser.
You need at least one of the following. The server auto-detects which keys are present and enables those providers.
| Provider | Key name | Where to get it | Cost |
|---|---|---|---|
| Gemini | `GEMINI_API_KEY` | aistudio.google.com | Free tier available |
| OpenAI | `OPENAI_API_KEY` | platform.openai.com | Free trial credits |
| WatsonX | `WATSONX_API_KEY` + `WATSONX_PROJECT_ID` | cloud.ibm.com | Free tier available |
| Brave Search | `BRAVE_API_KEY` | api.search.brave.com | 2,000 free queries/month |
| Ollama | (no key needed) | Runs locally after install | Free |
Create a `.env` file in the project root with your keys:

    # Required: at least one AI provider
    GEMINI_API_KEY=your-key-here
    OPENAI_API_KEY=your-key-here

    # WatsonX (needs both values)
    WATSONX_API_KEY=your-key-here
    WATSONX_PROJECT_ID=your-project-id-here
    WATSONX_REGION=us-east
    WATSONX_MODEL=ibm/granite-13b-instruct-v2

    # Ollama (no key, just set the model name)
    OLLAMA_MODEL=llama3.2:latest

    # Optional
    BRAVE_API_KEY=your-key-here

Never commit your `.env` file. It is already listed in `.gitignore`. Each contributor creates their own `.env` locally.
After starting the server, check that providers loaded correctly:

    curl http://localhost:8005/health

You should see something like:

    {
      "ok": true,
      "mode": "fallback",
      "providers": ["openai", "gemini", "ollama"],
      "uptime_seconds": 5.1
    }

If `providers` is empty, check your `.env` file and make sure the keys are set correctly.
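
The same check can be scripted. The sketch below assumes the `/health` response has the shape shown above; `fetch_health` and `missing_providers` are hypothetical helpers, not part of the project, and `fetch_health` only works while the server is running on port 8005.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:8005"):
    """GET /health and decode the JSON body (requires a running server)."""
    with urlopen(f"{base_url}/health", timeout=5) as response:
        return json.load(response)


def missing_providers(health, expected):
    """Names you expected to be enabled that the server did not load."""
    loaded = set(health.get("providers", []))
    return sorted(set(expected) - loaded)


# Using the sample response from above instead of a live call.
sample = {
    "ok": True,
    "mode": "fallback",
    "providers": ["openai", "gemini", "ollama"],
    "uptime_seconds": 5.1,
}
not_loaded = missing_providers(sample, ["openai", "gemini", "watsonx"])
```

A non-empty result points at the keys to re-check in `.env`; here it would flag `watsonx`.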
Run the test suite (no network or API keys needed):

    # Mac
    python3.12 -m pytest tests/ -v

    # Windows
    python -m pytest tests/ -v

All 23 tests should pass.

23 tests covering:
- End-to-end pipeline with all providers
- Sponsored penalty applied and visible in scores
- Provider failure isolation
- Budget constraint filtering (`$15` pushes `$28` bistro out)
- Distance intent favors nearest result
- Rating intent favors highest-rated result
- Axis scores present and in 0 to 1 range on all results
- Consensus bonus applied when two providers agree
- Intent detection for all four intent types (8 parametrized cases)
- Constraint extraction from natural language (7 parametrized cases)
No tests require network or Ollama. They use the mock provider and keyword-fallback ranker, making them fast and deterministic.
| Query | What it demonstrates |
|---|---|
| `lunch under $15` | Budget constraint + price intent |
| `best rated lunch spots near me` | Rating + distance intent together |
| `Find me the top 3 lunch spots under $15, within 1 mile, rated at least 4 stars` | All three axes, hard filter, constraint injection |
| Run any query twice | Cache hit: instant response, "from cache" badge |
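
The cache hit in the last row relies on a time-limited in-memory cache. The sketch below shows one way such a cache could work, assuming a 3-hour TTL and at most 10 stored queries. The `QueryCache` class, the key normalisation, and the reading of "10 query history" as an entry limit are all assumptions; `angel_filter/cache.py` may be organised differently.

```python
import time
from collections import OrderedDict


class QueryCache:
    """In-memory cache: entries expire after a TTL, oldest evicted past a limit."""

    def __init__(self, ttl_seconds=3 * 60 * 60, max_entries=10, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_entries = max_entries
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = OrderedDict()

    @staticmethod
    def _key(query):
        # Assumption: case and extra whitespace do not make a query different.
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._key(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired
            return None
        return results

    def put(self, query, results):
        key = self._key(query)
        self._entries[key] = (self._clock(), results)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry


# A fake clock makes the 3-hour expiry observable immediately.
now = [0.0]
cache = QueryCache(clock=lambda: now[0])
cache.put("lunch under $15", ["Joe's Pizza"])
hit = cache.get("Lunch  under $15")
now[0] += 3 * 60 * 60 + 1
expired = cache.get("lunch under $15")
```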
    angel_filter/
      server.py              # FastAPI server + provider wiring
      orchestrator.py        # parallel fan-out + ranker call
      ranker.py              # scoring: similarity + axis + consensus + penalty
      constraints.py         # natural language constraint extraction
      prompt.py              # shared prompt builder for AI providers
      cache.py               # in-memory query cache (3-hour TTL)
      providers/
        base.py              # BaseProvider, ProviderResult, ProviderError
        openai_provider.py   # OpenAI gpt-4o-mini
        gemini.py            # Google Gemini
        ollama_provider.py   # Local Ollama (llama3.2)
        watsonx.py           # IBM WatsonX
        brave.py             # Brave Search API
        mock.py              # canned lunch data (tests only)
    static/
      index.html             # demo UI (results + 3D plot + radar chart)
      plotly.min.js          # Plotly served locally (gitignored, download once)
    tests/
      test_orchestrator.py   # 23 tests
    start.sh                 # starts server on port 8005, loads .env
    pyproject.toml
    README.md
Apache-2.0 (matches the upstream NLIP projects).