Worker RSS plateaus at ~80% during concurrent match_song on heavy songs #193

@dprodger

Summary

The research worker (ApproachNote Worker on Render, 2GB instance) plateaus at ~80% memory (~1.6GB RSS) while processing popular jazz standards, with one prior OOM-kill observed. The DuckDB cap shipped in eef620b brought the peak from ~95% to ~80% — but the plateau is uncomfortably close to the limit on a single heavy song, and one concurrent allocation away from OOM.

This is not a leak. It's the steady-state working set of three handlers chewing on the same song at once.

Evidence

Observation 1 — Ain't Misbehavin' (yesterday, pre-DuckDB cap)

  • ~10-hour climb to 100% memory, then OOM at ~03:32 UTC.
  • 1438 releases linked to the song.
  • All three workers (spotify.match_song, apple.match_song, youtube.match_recording) running concurrently for the song.

Observation 2 — Summertime (today, post-DuckDB cap)

Worker logs around 2026-05-14T19:24:09:

INFO research_worker.loop.spotify.match_song.job4391: claimed target=song/872d7739-…  (Summertime)
INFO research_worker.loop.spotify: Found 3642 releases to process
INFO research_worker.loop.apple.match_song.job4392:   claimed target=song/872d7739-…  (Summertime)
INFO research_worker.loop.apple:   Found 3642 releases to process
INFO research_worker.loop.youtube.match_recording.job4393+: claimed target=recording/…

Memory chart for the same instance: baseline ~5% → climb starting ~19:23 → plateau ~80% by ~19:28, sustained while both match_song jobs iterate.

Where the memory goes

The worker (research_worker/run.py) spawns one thread per registered (source, job_type), so spotify.match_song, apple.match_song, and youtube.match_recording run in parallel inside the same process (run.py:83-91). On a heavy song, three working sets stack:

  • apple.match_song: DuckDB buffer pool over the Apple Music catalog parquets. Approx. size: capped at 512MB (post-eef620b).
  • spotify.match_song: cur.fetchall() of the N-row release×recording join with JSON-aggregated performers, plus parsed per-release Spotify API responses held live for the duration of the loop. Approx. size: scales with releases (Summertime had 3642 rows).
  • youtube.match_recording: per-recording client + matcher state across many concurrent jobs. Approx. size: tens of MB.
  • Python heap: arenas that don't return to the OS after GC. Approx. size: sticky overhead.
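
For context, the thread-per-handler spawn described above looks roughly like this (a minimal sketch with hypothetical loop names; the real registry and spawn logic live in run.py:83-91):

    import threading

    # Hypothetical stand-ins for the real handler loops.
    def spotify_match_song_loop(): ...
    def apple_match_song_loop(): ...
    def youtube_match_recording_loop(): ...

    HANDLERS = {
        ("spotify", "match_song"): spotify_match_song_loop,
        ("apple", "match_song"): apple_match_song_loop,
        ("youtube", "match_recording"): youtube_match_recording_loop,
    }

    def start_handler_threads():
        # One thread per registered (source, job_type). All three share one
        # process, so on a heavy song their working sets stack in one RSS.
        threads = []
        for (source, job_type), loop_fn in HANDLERS.items():
            t = threading.Thread(target=loop_fn, name=f"{source}.{job_type}", daemon=True)
            t.start()
            threads.append(t)
        return threads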

The Spotify get_releases_for_song query is the most stretchy of these — it materialises every row of a releases × recording_releases × recordings × release_streaming_links join with a JSON-aggregated performers subquery per row, then keeps the entire list reachable during the per-release loop in SpotifyMatcher.match_releases.
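
Schematically, the pattern is this (a sketch, not the verbatim code; RELEASES_FOR_SONG_SQL is a stand-in for the actual join):

    RELEASES_FOR_SONG_SQL = "..."  # releases × recording_releases × recordings
                                   # × release_streaming_links, json_agg'd performers

    def get_releases_for_song(cur, song_id):
        cur.execute(RELEASES_FOR_SONG_SQL, (song_id,))
        return cur.fetchall()  # materialises every row client-side at once

    # SpotifyMatcher.match_releases then iterates the full list, so the
    # whole result set (plus parsed per-release API responses) stays
    # reachable until the loop finishes: peak RSS scales with releases.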

What's already shipped

  • eef620b — Cap DuckDB resource usage on the Apple Music catalog connection. Adds PRAGMA memory_limit='512MB' and PRAGMA threads=2 after every duckdb.connect(), with APPLE_DUCKDB_MEMORY_LIMIT / APPLE_DUCKDB_THREADS env overrides for ops tuning without a redeploy. Reduced observed peak from ~95% to ~80%.
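
For reference, the cap amounts to roughly this (a sketch of the behaviour eef620b describes, not the commit itself; the helper name is invented):

    import os
    import duckdb

    def connect_apple_catalog(path):
        con = duckdb.connect(path)
        # Bound the buffer pool and worker threads so Apple Music catalog
        # scans can't eat the whole instance; env overrides let ops retune
        # without a redeploy.
        mem = os.environ.get("APPLE_DUCKDB_MEMORY_LIMIT", "512MB")
        threads = os.environ.get("APPLE_DUCKDB_THREADS", "2")
        con.execute(f"PRAGMA memory_limit='{mem}'")
        con.execute(f"PRAGMA threads={threads}")
        return con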

Proposed fixes (cheapest first)

  1. Tighten the DuckDB cap further via env. Set APPLE_DUCKDB_MEMORY_LIMIT=256MB on the Render worker. Frees ~250MB headroom; Apple queries may spill to temp disk on heavy songs but the cap is already implemented. Zero code change, instantly reversible.

  2. Stream Spotify's release iteration. Replace cur.fetchall() in integrations/spotify/db.py::get_releases_for_song with a server-side cursor that yields batches of ~200 rows, and process per-batch inside SpotifyMatcher.match_releases (see the sketch after this list). Caps Spotify's working set regardless of song popularity. ~30-50 LOC.

  3. Upgrade worker instance to 4GB on Render. No code change. Buys runway but doesn't address the underlying scaling problem; the next high-coverage standard with double the releases will reproduce the same plateau.
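
A sketch of fix 2, assuming a psycopg2-style driver where a named cursor keeps the result set server-side (function and constant names are placeholders):

    RELEASES_FOR_SONG_SQL = "..."  # the existing join from get_releases_for_song

    def iter_release_batches(conn, song_id, batch_size=200):
        # Named cursor => rows stream from the server instead of being
        # materialised client-side by fetchall().
        with conn.cursor(name="releases_for_song") as cur:
            cur.itersize = batch_size
            cur.execute(RELEASES_FOR_SONG_SQL, (song_id,))
            while True:
                batch = cur.fetchmany(batch_size)
                if not batch:
                    break
                yield batch  # at most ~batch_size rows live at a time

    # SpotifyMatcher.match_releases then loops per batch, so the working
    # set no longer scales with song popularity:
    #   for batch in iter_release_batches(conn, song_id):
    #       for release in batch:
    #           match(release)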

A complementary improvement worth tracking separately: a wall-clock watchdog on match_song handlers (abort + reschedule after, say, 30 min) so a stuck job can't pin DuckDB indefinitely the way the Ain't Misbehavin' run did.
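
In sketch form (all names hypothetical; Python threads can't be killed from outside, so the deadline check has to be cooperative, e.g. once per release):

    import time

    MATCH_SONG_DEADLINE_SECS = 30 * 60  # hypothetical knob

    def match_with_deadline(job, releases, process_release):
        started = time.monotonic()
        for release in releases:
            if time.monotonic() - started > MATCH_SONG_DEADLINE_SECS:
                # Abort cooperatively; rescheduling frees the handler frame
                # (and with it the DuckDB connection) instead of pinning it.
                job.reschedule(reason="watchdog: 30min wall clock exceeded")
                return
            process_release(release)
        job.complete()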

Repro

  1. Queue a deep refresh on a song with > ~2000 releases (Summertime, Ain't Misbehavin', Body and Soul, Stardust, etc.).
  2. Watch /admin/research/ (filter source=spotify or apple, job_type=match_song) — both jobs claim within seconds of each other.
  3. Watch Render's memory chart for the worker; expect rapid climb to 70–85% during the in-flight window.

Acceptance criteria

  • Worker RSS stays below 70% (~1.4GB) during concurrent match_song runs on the heaviest songs in the catalog.
  • Two heavy songs queued back-to-back do not OOM the worker.
