Commit 79590c1 (parent 5666247)

Enhance vector search capabilities in KillrVideo

- Updated the `content_features` column in `schema-astra.cql` to a 4096-dimensional vector type for improved semantic search using NVIDIA's NV-Embed model.
- Introduced a new `vector_search.md` document outlining the integration plan for semantic search, including architecture, schema changes, and data ingestion pipeline.
- Added `prompts.md` to provide a blueprint for implementing vector search incrementally with LLM build prompts, ensuring a test-driven approach to development.

3 files changed (315 additions, 1 deletion)

prompts.md

Lines changed: 191 additions & 0 deletions
# Vector Search Implementation – LLM Build Prompts

> This document supplies a **blueprint** and a **chain of reusable prompts** for a code-generation LLM (e.g. Cursor-GPT) to implement semantic vector search in KillrVideo **incrementally and test-driven**.
> Every prompt is independent, yet builds on artifacts produced by the previous one. After completing **each** prompt the LLM must:
>
> 1. **Run** `ruff --fix`, `black .`, `pytest -q` (failing tests → iterate).
> 2. Commit only when the *entire* suite is green.

---

## 0. Glossary & Conventions
* **NV-Embed** = NVIDIA embedding model (4096-dim) with a **512-token** input cap.
* **Data API** = Astra DB `$vectorize` endpoint used for vector search.
* The **FastAPI app** lives in `app/`.
* **Test helpers** in `tests/` use `pytest` + `pytest-asyncio`.
* Use the **feature flag** `settings.VECTOR_SEARCH_ENABLED` (default **False**).
* Return `HTTP 400` for token-limit violations in search/query paths.

---

## 1. High-Level Blueprint (single narrative)
1. **Schema Migration** – enlarge the `content_features` column, attach the NVIDIA provider, recreate the SAI index, and seed the backfill.
2. **Ingestion Changes** – on video submit/update, assemble *title + description + tags*, clip to 512 tokens, and store the result as a *string* (the server vectorises it).
3. **Semantic Search** – new helper in `video_service`, integrated into the existing `/search/videos` endpoint; add a `mode` param.
4. **Pagination & Validation** – respect `page`/`pageSize`, enforce query length ≤ 512 tokens, and fall back to keyword search when semantic search is disabled.
5. **OpenAPI & Docs** – update schemas and docs.
6. **Front-end Search UI** – add a search box and results list, with fallback messaging.
7. **Feature Roll-out** – env flag, smoke tests, monitoring hooks.

---

## 2. Iterative Roadmap → Chunks
| Milestone | Chunk | Output |
|-----------|-------|--------|
| M1 Schema | C1.1 DDL script (JSON & CQL) | `migrations/2025_08_vector.cql` & CI-run JSON |
| | C1.2 Py backfill job | `scripts/backfill_vectors.py` |
| M2 Ingest | C2.1 `clip_to_512_tokens` util + tests | `app/utils/text.py` |
| | C2.2 Submit flow rewrite | `video_service.py` patched |
| M3 Search | C3.1 Service helper | `search_semantic()` + unit tests |
| | C3.2 Router wiring | `/api/v1/search/videos` param, tests |
| M4 Docs | C4.1 OpenAPI regen | updated YAML |
| M5 Front | C5.1 Home search UI | React component & e2e tests |
| M6 Roll-out | C6.1 Feature flag infra | settings + toggles |

Chunks are intentionally modest (≈1–3 files each, <150 LoC).

---

## 3. Right-Sized Steps (final cut)
1. **Step 1 – Create DB migration & index recreation**
2. **Step 2 – Backfill existing videos with a `$vectorize` bulk update**
3. **Step 3 – Add a `clip_to_512_tokens()` util + tests**
4. **Step 4 – Modify `submit_new_video()` to store a string & guard the token count**
5. **Step 5 – Implement `search_videos_by_semantic()` helper + unit tests**
6. **Step 6 – Extend the search router with a `mode` param, integrate the helper**
7. **Step 7 – Update the OpenAPI YAML & regenerate the client**
8. **Step 8 – Add the feature-flag env var + toggling logic**
9. **Step 9 – Front-end search bar + API wiring (mocked until the backend is green)**
10. **Step 10 – Smoke & load tests, rollout script**

Each step below is accompanied by an LLM prompt.

---

## 4. Prompts (feed sequentially)

### Prompt 1 – DB Migration
```text
You are working inside the KillrVideo FastAPI repo.
Goal: **Enlarge the `videos.content_features` column to `vector<float, 4096>` and attach the NVIDIA service.** Also drop & recreate the SAI cosine index.
Tasks:
1. Add *migrations/2025_08_vector.cql* containing the necessary `ALTER TABLE`, `DROP INDEX`, `CREATE INDEX` CQL.
2. Add *migrations/2025_08_vector.json* Data API payload (see docs/vector_search.md §3).
3. Register the CQL script in *scripts/migrate.py* so CI picks it up.
4. Unit test: mock the Cassandra session; assert the index metadata after migration.
After coding run **ruff, black, pytest**. Ensure all tests pass.
```

---

### Prompt 2 – Vector Backfill Job
```text
Goal: **Populate the new 4096-dim vectors for existing rows.**
1. Create *scripts/backfill_vectors.py*.
   • Scan `videos` where `content_features IS NULL` (page size 100).
   • Build text = title + description + tags.
   • POST Data API `updateMany` with `$vectorize`.
2. Provide a CLI entry-point: `python -m scripts.backfill_vectors --dry-run`.
3. Add unit tests with `responses` to stub the Data API.
Run lints/tests until green.
```
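
The paging loop this prompt asks for can be sketched as below. The `DataAPIClient` protocol and its method names are placeholders standing in for the real Astra Data API client, so treat this as an illustration of the control flow only:

```python
# Sketch of the backfill loop from Prompt 2; DataAPIClient and its method
# names are hypothetical stand-ins for the real Astra Data API client.
from typing import Iterable, Protocol


class DataAPIClient(Protocol):
    def find_unvectorized(self, page_size: int) -> Iterable[list]: ...
    def update_many(self, docs: list) -> None: ...


def build_embedding_text(row: dict) -> str:
    """Concatenate title + description + tags, as the prompt specifies."""
    parts = [row.get("name", ""), row.get("description", ""),
             " ".join(row.get("tags", []))]
    return "\n".join(p for p in parts if p)


def backfill_vectors(client: DataAPIClient, page_size: int = 100,
                     dry_run: bool = False) -> int:
    """Page through rows missing content_features and push $vectorize updates."""
    updated = 0
    for page in client.find_unvectorized(page_size):
        payload = [
            {"videoid": row["videoid"],
             "content_features": build_embedding_text(row)}  # string → server vectorises
            for row in page
        ]
        if not dry_run:
            client.update_many(payload)
        updated += len(payload)
    return updated
```

A `--dry-run` invocation would walk every page and report the count without issuing any `updateMany` calls.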

---

### Prompt 3 – Token-Clipping Utility
```text
Goal: Guard the 512-token NVIDIA limit.
1. Create *app/utils/text.py* with `clip_to_512_tokens(text: str) -> str` using a whitespace splitter.
2. Edge-cases: consecutive whitespace, Unicode punctuation.
3. Tests: >512 tokens → clipped length == 512; ≤512 tokens → unchanged.
Run ruff/black/pytest.
```

---

### Prompt 4 – Ingestion Pipeline Update
```text
Goal: Use auto-vectorize on insert.
1. In *app/services/video_service.py* → function `submit_new_video`:
   • Build `embedding_text` from name/description/tags.
   • Call `clip_to_512_tokens`.
   • Assign the string to the `content_features` field.
2. Remove the legacy 16-float stub path.
3. Add unit tests with `monkeypatch` to verify the Data API payload contains a **string**, not a list.
Run lints/tests.
```

---

### Prompt 5 – Semantic Search Helper
```text
Goal: Backend ANN search wrapper.
1. Add `search_videos_by_semantic(query: str, page: int, page_size: int)` to *video_service.py*.
   • Validate len(query_tokens) ≤ 512, else raise `InvalidQueryError` (400).
   • Call Data API `find` with `sort={"$vectorize": query}`.
2. Return list[VideoSummary], preserving the existing pagination schema.
3. Tests: stub the API; assert ordering & the error path.
Run lints/tests.
```

---

### Prompt 6 – API Router Wiring
```text
Goal: Expose the semantic mode.
1. In *routers/search.py* add an optional `mode: Literal['semantic','keyword'] = 'semantic'` param.
2. If `mode == 'semantic' and settings.VECTOR_SEARCH_ENABLED` → call the helper; else fall back to keyword search.
3. Update the OpenAPI annotations.
4. Tests: both branches; 400 on an over-long query.
Run lints/tests.
```

---

### Prompt 7 – OpenAPI & Client Regen
```text
Goal: Align the docs with the new behaviour.
1. Update *docs/killrvideo_openapi.yaml* path `/search/videos` (`mode` param, 400 response).
2. Run the generator (`scripts/gen_client.py`) to refresh the `client/` stubs.
3. Ensure CI passes.
```

---

### Prompt 8 – Feature Flag Infrastructure
```text
Goal: Toggle vector search safely.
1. Add `VECTOR_SEARCH_ENABLED: bool = False` to *app/core/config.py* (env-driven).
2. Update the docs in README & `.env.example`.
3. Unit test: flag off ⇒ the helper is not called.
Run lints/tests.
```
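
For illustration, the env parsing behind the flag might look like this stdlib-only sketch; the real implementation belongs in the pydantic-based `app/core/config.py`, so the function name and truthy-value set here are assumptions:

```python
# Minimal env-driven flag sketch; hypothetical helper, not the repo's
# actual Settings class. Default is False, matching the prompt.
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


def vector_search_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read VECTOR_SEARCH_ENABLED from the environment; unset means disabled."""
    env = os.environ if environ is None else environ
    return env.get("VECTOR_SEARCH_ENABLED", "").strip().lower() in _TRUTHY
```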

---

### Prompt 9 – Front-end Search UI
```text
Goal: New search bar (React / Next.js).
1. Create `components/SemanticSearchBar.tsx`.
2. Call the backend `/api/v1/search/videos?q=...`.
3. Display results using the existing `VideoCard`.
4. Cypress e2e: a search term returns the expected mock.
Run `npm run lint && npm run test` until green.
```

---

### Prompt 10 – Smoke & Load Tests + Roll-out Script
```text
Goal: Confidence for the production switch.
1. Add *tests/e2e/test_semantic_search.py* hitting a staging DB.
2. Add a Locust file, *load/semantic_search.py* (20 RPS).
3. Create *scripts/enable_vector_flag.py* that flips the env flag + triggers the migration.
4. Update the GitHub Actions workflow to run the load test nightly.
Run lints/tests.
```

---

**End of prompts.**

Once Prompt 10 passes all checks, the vector search feature is fully integrated, tested and ready for production rollout.

docs/schema-astra.cql

Lines changed: 1 addition & 1 deletion
```diff
@@ -87,7 +87,7 @@ CREATE TABLE IF NOT EXISTS killrvideo.videos (
     name text,
     preview_image_location text,
     tags set<text>,                        -- Collection for efficient tag storage
-    content_features vector<float, 16>,    -- Vector type from Cassandra 5.0 for ML features
+    content_features vector<float, 4096>,  -- Vector type (4096-dim) for NV-Embed semantic search
     userid uuid,
     content_rating text,                   -- 'G', 'PG', 'PG-13', 'R', etc.
     category text,
```

docs/vector_search.md

Lines changed: 123 additions & 0 deletions
# Vector Search Integration Plan

## 1. Objective
Enable semantic (“natural-language”) search across the KillrVideo catalogue by leveraging Astra DB’s vector search on the `videos.content_features` column and NVIDIA’s **NV-Embed** model.

> *User story*: As a viewer I can type *“Find me videos about cats that can talk”* in the search box and receive the most relevant videos, ranked by similarity.

## 2. High-level architecture
1. **Client (web / mobile)** – new search box on the landing page.
2. **FastAPI backend**
   • Accepts the query at `GET /api/v1/search/videos` (existing path).
   • Performs ANN search via the Data API `$vectorize` on `videos`.
3. **Astra DB**
   • `videos.content_features` is a vector column with NVIDIA integration (COSINE metric).
   • A dedicated Storage-Attached Index (SAI) drives ANN retrieval.

```
Client ──▶ /api/v1/search/videos?q=... ──▶ FastAPI ──▶ Data API find(sort={"$vectorize": ...}) ──▶ Astra DB
```

## 3. Schema work
| Table | Column | Change | Notes |
|-------|--------|--------|-------|
| `killrvideo.videos` | `content_features` | Alter type to `vector<float, 4096>` (was 16) and attach the NVIDIA service | NV-Embed-v2 outputs 4096-dim vectors (HuggingFace card). |

Example Data API alteration (one-off, run from CI or manually):
```jsonc
{
  "alterTable": {
    "name": "videos",
    "addColumns": {
      "content_features": {
        "type": "vector",
        "dimension": 4096,
        "service": {
          "provider": "nvidia",
          "modelName": "NV-Embed-QA"
        }
      }
    }
  }
}
```
The existing SAI needs to be dropped & recreated to match the new dimension:
```cql
DROP INDEX IF EXISTS videos_content_features_idx;
CREATE CUSTOM INDEX videos_content_features_idx
  ON killrvideo.videos(content_features)
  USING 'StorageAttachedIndex'
  WITH OPTIONS = {'similarity_function': 'COSINE'};
```

## 4. Data ingestion pipeline
### 4.1 Where
`app/services/video_service.py::submit_new_video()`

### 4.2 How
1. Concatenate the title, description and tag list into a single string, `embedding_text`.
2. Insert **that string** into `content_features` – Astra will auto-vectorize it via NVIDIA.
3. Keep the current list-of-floats fallback for unit-test stub collections.

Pseudo-snippet:
```python
embedding_text = "\n".join([
    new_video.name,
    new_video.description or "",
    " ".join(new_video.tags or []),
])
video_doc["content_features"] = embedding_text  # triggers $vectorize
```

### 4.3 🔒 Token limit guard (512)
According to the NVIDIA integration docs, `$vectorize` payloads **MUST NOT** exceed **512 tokens**.

Implementation guidelines:
* **Helper `clip_to_512_tokens(text: str) -> str`** – a rough tokenizer based on whitespace, or sentencepiece once the provider offers an official tokenizer.
* Apply the guard **before** assigning to `video_doc["content_features"]`.
* Apply the guard to **search queries** – if the user submits >512-token text, return `400` with a validation error, or truncate and warn.
* Unit tests: a long description (>3000 chars) should be gracefully clipped and the insert should succeed.

(The 512-token budget covers title + description + tags, so we may need to drop trailing tokens when the combined string is too long.)
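
A minimal whitespace-based sketch of that guard could look like the following; note that the real NV-Embed tokenizer counts sub-word tokens, so 512 whitespace "tokens" is only a conservative approximation until an official tokenizer is wired in:

```python
# Whitespace-based sketch of clip_to_512_tokens(); an approximation only,
# since NV-Embed counts sub-word tokens, not whitespace-delimited words.
MAX_TOKENS = 512


def clip_to_512_tokens(text: str) -> str:
    """Keep at most 512 whitespace-delimited tokens; shorter text passes through unchanged."""
    tokens = text.split()  # split() collapses consecutive/Unicode whitespace
    if len(tokens) <= MAX_TOKENS:
        return text
    return " ".join(tokens[:MAX_TOKENS])
```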

## 5. Search endpoint
### 5.1 Backend Service
New helper `search_videos_by_semantic(query: str, ...)` in `video_service.py`:
```python
db_table.find(
    filter={},
    sort={"$vectorize": query},  # the Data API will embed the query with NV-Embed
    limit=page_size,
)
```
• Pagination: skip/limit still applies.
• Optional keyword fallback when `$vectorize` fails (e.g. provider quota).
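
Putting the pieces together, the helper might be sketched as below. `InvalidQueryError` and the exact `find()` keyword arguments follow this plan rather than a specific client library, so they are assumptions:

```python
# Hedged sketch of search_videos_by_semantic(); InvalidQueryError and the
# skip/limit argument names are assumptions from this plan, not a real client API.
class InvalidQueryError(ValueError):
    """Maps to HTTP 400 at the API layer."""


def search_videos_by_semantic(db_table, query: str, page: int = 1,
                              page_size: int = 10) -> list:
    if len(query.split()) > 512:  # same 512-token guard as ingestion
        raise InvalidQueryError("query exceeds the 512-token NV-Embed limit")
    cursor = db_table.find(
        filter={},
        sort={"$vectorize": query},   # server-side embedding + ANN ordering
        limit=page_size,
        skip=(page - 1) * page_size,  # assumed skip/limit pagination
    )
    return list(cursor)
```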

### 5.2 API Layer
`routers/search.py` – the route already exists.
Add a query-param `mode` (`semantic|keyword`, default = `semantic`).
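
The router's dispatch logic reduces to a small decision that can be sketched framework-free; `semantic_fn` and `keyword_fn` stand in for the real service helpers, and the flag check mirrors `settings.VECTOR_SEARCH_ENABLED`:

```python
# Sketch of the mode-dispatch rule for the search route; semantic_fn /
# keyword_fn are placeholders for the real video_service helpers.
def dispatch_search(query: str, mode: str, flag_enabled: bool,
                    semantic_fn, keyword_fn):
    """Route to semantic search only when requested AND the feature flag is on."""
    if mode not in ("semantic", "keyword"):
        raise ValueError("mode must be 'semantic' or 'keyword'")
    if mode == "semantic" and flag_enabled:
        return semantic_fn(query)
    return keyword_fn(query)  # keyword fallback also covers flag-off
```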

## 6. Front-end (brief)
* Add a prominent search input on the home page.
* On submit → call `/api/v1/search/videos?q=text` → render the list using the existing `VideoSummary` cards.

## 7. Roll-out strategy
1. **Dev DB**: run the alter-table & backfill vectors via a bulk update with `$vectorize`.
2. **Backend code**: merge the feature branch behind the `VECTOR_SEARCH_ENABLED` flag.
3. **Smoke tests**: ensure new inserts generate embeddings & search returns the expected order.
4. **Prod**: enable the flag; monitor latency & application logs.

## 8. Work items
- [ ] DB migration script (`scripts/migrate_2025_08_vector.cql` or Data API JSON).
- [ ] Code: update `submit_new_video`, create the semantic search helper.
- [ ] Unit tests for ingestion & search.
- [ ] API docs / OpenAPI schema tweaks (new param).
- [ ] Front-end search UI.

## 9. Open questions / decisions
1. Which NV-Embed model? (`NV-Embed-QA`, `NV-Embed-v2`?) – the QA variant is assumed by default.
2. Do we persist the raw embedding string for transparency? (Proposed: yes – stored in the same column.)
3. Hard-limit the max query length (NVIDIA 512 tokens)? Needs validation in the endpoint.

---
*Last updated: 2025-06-16*
