YouTube rematch sweep CLI (parity with rematch_spotify_duration_mismatches.py) #173

@dprodger

Description

Spotify has scripts/rematch_spotify_duration_mismatches.py (commit 0317b98) which walks all songs whose Spotify links have duration mismatches above a threshold and enqueues ('spotify', 'rematch_duration_mismatches') jobs onto the durable research worker queue. YouTube has no equivalent — to bulk-rematch, an admin has to invoke match_youtube_videos.py per song or per recording.

Ask

A scripts/rematch_youtube_videos.py (or similar name) that:

  1. Walks recordings where the existing YouTube link looks suspicious — the natural signals from the matcher's two-mode design:
    • match_confidence < 0.7
    • match_method = 'youtube_conservative'
    • The confidence cutoff can be overridden with --threshold-confidence <float> (default 0.7)
  2. Enqueues one ('youtube', 'match_recording') job per recording onto the research queue with payload={'rematch': True}. Existing handler picks them up.
  3. Honors match_method='manual' (skip — admin already verified).
  4. Same flag shape as the Spotify version: --dry-run (count + sample, no enqueue), --limit N, --debug.
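A minimal sketch of the flag parsing and candidate test described in the list above, not the actual implementation. The `YouTubeLink` dataclass, `build_parser`, and `is_candidate` are hypothetical names. Combining the two signals with OR, and skipping links that have no confidence value, are assumptions this issue does not settle.

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass
class YouTubeLink:
    """Hypothetical in-memory view of one recording's YouTube link."""
    recording_id: int
    match_method: str
    match_confidence: Optional[float]


def build_parser() -> argparse.ArgumentParser:
    """Flags named in this issue; help texts are illustrative."""
    parser = argparse.ArgumentParser(description="Enqueue YouTube rematch jobs")
    parser.add_argument("--threshold-confidence", type=float, default=0.7,
                        help="links below this confidence are candidates")
    parser.add_argument("--dry-run", action="store_true",
                        help="count and sample candidates, enqueue nothing")
    parser.add_argument("--limit", type=int, default=None,
                        help="process at most N recordings")
    parser.add_argument("--debug", action="store_true")
    return parser


def is_candidate(link: YouTubeLink, threshold: float) -> bool:
    """Return True when the link looks suspicious enough to rematch."""
    # Manual matches were verified by an admin, so they are always skipped.
    if link.match_method == "manual":
        return False
    # Assumption: either signal alone qualifies (OR, not AND).
    if link.match_method == "youtube_conservative":
        return True
    # Assumption: a missing confidence value is not treated as suspicious.
    return link.match_confidence is not None and link.match_confidence < threshold
```

In the real script the same predicate would most likely live in the "find candidates" SQL rather than in Python; the function form here only makes the rules easy to read and test.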

Why it's worth a separate issue from the admin review

The CLI is operational tooling — runnable from a Render shell, no Spotify-style "admin page workflow" coupling. Building it first would let an admin trigger broad rematches even before the review UI lands. The two issues can ship in either order.

Implementation notes

  • core/song_research.py already has the per-recording YouTube enqueue helper inline at line 67 — extract to a shared module function (e.g. core/youtube_rematch.py) so the new CLI calls the same thing.
  • Mirror the structure of core/spotify_rematch_mismatches.py. The "find candidates" SQL is what differs; the enqueue + sweep boilerplate is identical.
  • Tests: the 9 cases in tests/test_spotify_rematch_mismatches.py are a good template, covering candidate discovery, dedup, threshold pass-through, and error handling.
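One possible shape for the shared sweep helper the notes above call for, sketched under assumptions. The `sweep` function and the `enqueue(source, job_type, recording_id, payload)` callable signature are hypothetical; the real queue API in core/song_research.py may differ. Injecting the enqueue callable keeps the dedup, dry-run, limit, and error-handling paths testable without a live queue.

```python
from typing import Callable, Iterable, Optional

# Hypothetical signature: (source, job_type, recording_id, payload) -> None
Enqueue = Callable[[str, str, int, dict], None]


def sweep(recording_ids: Iterable[int], enqueue: Enqueue, *,
          dry_run: bool = False, limit: Optional[int] = None) -> dict:
    """Enqueue one ('youtube', 'match_recording') job per unique recording."""
    # Dedup while preserving the order candidates were discovered in.
    seen = set()
    unique = []
    for rid in recording_ids:
        if rid not in seen:
            seen.add(rid)
            unique.append(rid)
    if limit is not None:
        unique = unique[:limit]

    stats = {"candidates": len(unique), "enqueued": 0, "errors": 0}
    if dry_run:
        # Dry run: report a count and a small sample, enqueue nothing.
        stats["sample"] = unique[:10]
        return stats

    for rid in unique:
        try:
            enqueue("youtube", "match_recording", rid, {"rematch": True})
            stats["enqueued"] += 1
        except Exception:
            # One failed enqueue should not abort the rest of the sweep.
            stats["errors"] += 1
    return stats
```

A test can pass a list-appending fake as `enqueue` and assert on the recorded calls, which maps onto the dedup and error-handling cases listed for the Spotify tests.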
