
feat(captions): whisper-small + spoken-language selector #22

Merged
joaothaira merged 1 commit into `main` from `feat/caption-model-small-language` on Jun 11, 2026


@ThairaHub (Collaborator)

Comparing the two caption stacks (a decision tracked since the upstream merge): the fork's Electron-side subtitle script transcribes accurately, while the in-app auto-captions were unreliable. The root cause isn't the pipeline, since both run transformers.js Whisper. It's the model and a missing language hint:

| | In-app (before) | Fork script (reference) |
|---|---|---|
| Model | whisper-tiny (39M, quantized) | whisper-small (244M) |
| Language | autodetect | forced (`language` + `task: 'transcribe'`) |

whisper-tiny's language detection misfires on non-English audio, and the whole transcript degrades from there.

Changes

- `fetch-caption-model.mjs` bundles `Xenova/whisper-small` (the installer grows by ~200 MB; an accepted trade-off, since quality was the blocker for shipping captions at all).
- `language` is threaded from the auto-captions dialog → worker → transcriber options, skipping Whisper's language-detection pass. It defaults to the app UI locale.
- Spoken-language selector in the captions dialog (labels use each language's native name, so there is nothing to translate).
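The language threading described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `CaptionJobRequest`, `LOCALE_TO_WHISPER_LANGUAGE`, `resolveSpokenLanguage` and `toTranscriberOptions` are hypothetical names, the locale table is an assumed subset, and falling back to `'english'` for unmapped locales is an assumption. Only the `{ language, task: 'transcribe' }` option shape comes from this PR.

```typescript
// Hypothetical names throughout; a sketch of dialog → worker → transcriber options.

/** Message the auto-captions dialog might post to the caption worker (assumed shape). */
interface CaptionJobRequest {
  audioUrl: string;
  /** Whisper language name picked in the dialog, e.g. "portuguese"; undefined means "use UI locale". */
  language?: string;
}

/** Options passed to the Whisper transcriber so it skips language detection. */
interface TranscriberOptions {
  language: string;
  task: 'transcribe';
}

// Assumed subset: UI locale base code → Whisper language name.
const LOCALE_TO_WHISPER_LANGUAGE: Record<string, string> = {
  en: 'english',
  pt: 'portuguese',
  es: 'spanish',
  fr: 'french',
  de: 'german',
};

/** Explicit dialog choice wins; otherwise map the UI locale; otherwise assume English. */
function resolveSpokenLanguage(request: CaptionJobRequest, uiLocale: string): string {
  if (request.language) return request.language;
  // "pt-BR" and "pt_BR" both reduce to "pt".
  const base = uiLocale.toLowerCase().split(/[-_]/)[0];
  return LOCALE_TO_WHISPER_LANGUAGE[base] ?? 'english';
}

/** Build the transcriber options the worker would forward with the audio. */
function toTranscriberOptions(request: CaptionJobRequest, uiLocale: string): TranscriberOptions {
  return { language: resolveSpokenLanguage(request, uiLocale), task: 'transcribe' };
}
```

In a worker, the result would be passed as the second argument of the transformers.js ASR pipeline call. Resolving the default on the worker side keeps the dialog free to send nothing when the user never touches the selector.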

This closes the quality gap with the fork's Electron stack while using the packaged-app-ready pipeline. It is the next step toward retiring the duplicate stack without giving up its proven configuration.

Verification

tsc 0 errors · biome clean · vitest 225/225 · i18n-check 12 locales · vite build OK
Functional check pending: in dev mode, the first caption run downloads whisper-small from the Hugging Face CDN.
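For reference, model files on the Hugging Face CDN are addressed with the `https://huggingface.co/<repo>/resolve/<revision>/<path>` URL pattern. The sketch below shows how a fetch script could derive those URLs for bundling; the file list is an assumption (check what the loader actually requests from `Xenova/whisper-small`), and `MODEL_FILES`, `resolveUrl` and `modelFileUrls` are hypothetical names, not what `fetch-caption-model.mjs` does.

```typescript
// Hypothetical helpers; the file list below is an assumption, not taken from the PR.
const MODEL_ID = 'Xenova/whisper-small';
const REVISION = 'main';

// Assumed set of files a quantized transformers.js Whisper load would need.
const MODEL_FILES: readonly string[] = [
  'config.json',
  'generation_config.json',
  'preprocessor_config.json',
  'tokenizer.json',
  'tokenizer_config.json',
  'onnx/encoder_model_quantized.onnx',
  'onnx/decoder_model_merged_quantized.onnx',
];

/** Hugging Face "resolve" URL for one file; the revision is URL-encoded (e.g. refs/pr/1). */
function resolveUrl(modelId: string, revision: string, path: string): string {
  return `https://huggingface.co/${modelId}/resolve/${encodeURIComponent(revision)}/${path}`;
}

/** Every URL a bundling step would download for the given model and revision. */
function modelFileUrls(modelId: string = MODEL_ID, revision: string = REVISION): string[] {
  return MODEL_FILES.map((file) => resolveUrl(modelId, revision, file));
}
```

Pinning `REVISION` to a commit hash rather than `main` would make installer builds reproducible if the upstream repo changes.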

🤖 Generated with Claude Code

tiny with language autodetect was unreliable on non-English audio — a
misdetected language degrades the whole transcript. The fork's Electron
subtitle script (whisper-small, forced language) transcribes the same
clips accurately, so bring the in-app stack up to that configuration:

- bundle/load Xenova/whisper-small instead of whisper-tiny
- thread a Whisper language name from the auto-captions dialog through
  the worker to the transcriber ({ language, task: 'transcribe' }),
  skipping the detection pass; defaults to the app UI locale
- language selector (native-name labels) in the captions dialog
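One way to get native-name labels without a translation table is the standard `Intl.DisplayNames` API, asking for each language's name in that same language. This is a sketch under assumptions: the PR does not say how its labels are produced, and `SpokenLanguageOption`, `SPOKEN_LANGUAGES`, `nativeName` and `buildSpokenLanguageOptions` are hypothetical.

```typescript
// Hypothetical option type and language list; Intl.DisplayNames is standard ECMA-402.
interface SpokenLanguageOption {
  /** Whisper language name sent to the worker. */
  value: string;
  /** Label shown in the dialog, written in the language itself (e.g. "Deutsch"). */
  label: string;
}

// Assumed subset: BCP 47 code → Whisper language name.
const SPOKEN_LANGUAGES: ReadonlyArray<readonly [code: string, whisperName: string]> = [
  ['en', 'english'],
  ['pt', 'portuguese'],
  ['es', 'spanish'],
  ['fr', 'french'],
  ['de', 'german'],
];

/** Name of a language in that language, capitalized for use as a list label. */
function nativeName(code: string): string {
  const name = new Intl.DisplayNames([code], { type: 'language' }).of(code) ?? code;
  // Several locales write language names in lowercase (e.g. "français").
  return name.charAt(0).toLocaleUpperCase(code) + name.slice(1);
}

function buildSpokenLanguageOptions(): SpokenLanguageOption[] {
  return SPOKEN_LANGUAGES.map(([code, whisperName]) => ({
    value: whisperName,
    label: nativeName(code),
  }));
}
```

Because each label is rendered in its own language, the list looks the same in all 12 UI locales, which is what removes the translation burden.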

Installer grows by roughly the model-size delta (~200 MB). Dev fetches
the model from the HF CDN on first use as before.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@joaothaira merged commit 2e849dd into `main` on Jun 11, 2026
4 checks passed
@joaothaira deleted the `feat/caption-model-small-language` branch on June 11, 2026 at 20:55


2 participants