feat(captions): whisper-small + spoken-language selector #22
Merged
whisper-tiny with language autodetect was unreliable on non-English
audio: a misdetected language degrades the whole transcript. The fork's
Electron subtitle script (whisper-small, forced language) transcribes the
same clips accurately, so bring the in-app stack up to that configuration:
- bundle/load Xenova/whisper-small instead of whisper-tiny
- thread a Whisper language name from the auto-captions dialog through
the worker to the transcriber ({ language, task: 'transcribe' }),
skipping the detection pass; defaults to the app UI locale
- language selector (native-name labels) in the captions dialog
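
As a rough illustration of the last two bullets, the locale default and the native-name labels could be sketched as below. This is a minimal sketch, not the PR's actual code: `CAPTION_LANGUAGES`, `defaultCaptionLanguage`, `languageOptions`, the short language list, and the fallback to English are all hypothetical.

```typescript
// Hypothetical entry: one selectable spoken language.
interface CaptionLanguage {
  whisperName: string;  // language name passed to Whisper, e.g. "german"
  localePrefix: string; // primary subtag of a UI locale, e.g. "de"
  nativeLabel: string;  // label shown in the selector, in the language itself
}

// Hypothetical short list; the real dialog presumably covers more languages.
const CAPTION_LANGUAGES: CaptionLanguage[] = [
  { whisperName: "english", localePrefix: "en", nativeLabel: "English" },
  { whisperName: "german", localePrefix: "de", nativeLabel: "Deutsch" },
  { whisperName: "french", localePrefix: "fr", nativeLabel: "Français" },
  { whisperName: "spanish", localePrefix: "es", nativeLabel: "Español" },
  { whisperName: "japanese", localePrefix: "ja", nativeLabel: "日本語" },
];

// Derive the default spoken language from the app UI locale
// (e.g. "de-AT" -> "german"). Falling back to English for an
// unlisted locale is an assumption of this sketch.
function defaultCaptionLanguage(uiLocale: string): string {
  const primary = uiLocale.toLowerCase().replace(/[-_].*$/, "");
  const match = CAPTION_LANGUAGES.find((l) => l.localePrefix === primary);
  return match ? match.whisperName : "english";
}

// Selector options: the value is the Whisper name, the label is the native name.
function languageOptions(): { value: string; label: string }[] {
  return CAPTION_LANGUAGES.map((l) => ({
    value: l.whisperName,
    label: l.nativeLabel,
  }));
}
```

Keeping the Whisper name, locale prefix, and label in one table means the selector and the default cannot drift apart; whether the app stores them this way is not stated in the PR.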
Installer grows by roughly the model-size delta (~200 MB). Dev fetches
the model from the HF CDN on first use as before.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comparing the two caption stacks (decision tracked since the upstream merge): the fork's Electron-side subtitle script transcribes accurately while the in-app auto-captions were unreliable. The root cause isn't the pipeline (both run transformers.js Whisper); it's the model and a missing language hint:
- fork's Electron subtitle script: whisper-small, forced language (language + task: transcribe)
- in-app auto-captions (before this PR): whisper-tiny, language autodetect
whisper-tiny's language detection misfires on non-English audio, and the whole transcript degrades from there.
Changes
- fetch-caption-model.mjs bundles Xenova/whisper-small (installer grows ~200 MB; accepted trade-off: quality was the blocker for shipping captions at all).
- language threads from the auto-captions dialog → worker → transcriber options, skipping Whisper's detection pass. Defaults to the app UI locale.

This closes the quality gap with the fork's Electron stack using the packaged-app-ready pipeline. It is the next step in retiring the duplicate stack while keeping the proven config.
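
The dialog → worker → transcriber threading can be pictured with a small sketch. The shapes and names here (`CaptionRequest`, `buildTranscribeOptions`, `handleCaptionRequest`, the `Transcriber` type) are hypothetical and only illustrate how a selected language might end up in the `{ language, task: 'transcribe' }` options; the transcriber is injected so the sketch does not depend on a loaded model.

```typescript
// Hypothetical message the captions dialog posts to the worker.
interface CaptionRequest {
  audio: Float32Array;
  language: string; // Whisper language name chosen in the dialog, e.g. "german"
}

// Options handed to the transcriber, mirroring { language, task: 'transcribe' }.
interface TranscribeOptions {
  language: string;
  task: "transcribe";
}

// Anything callable like a speech-recognition pipeline (assumed signature).
type Transcriber = (
  audio: Float32Array,
  options: TranscribeOptions,
) => Promise<{ text: string }>;

// Build the options from the request. Because the language is always set,
// the model has no reason to run its own language detection.
function buildTranscribeOptions(request: CaptionRequest): TranscribeOptions {
  return { language: request.language, task: "transcribe" };
}

// Worker-side handler: forwards the audio and the forced language.
async function handleCaptionRequest(
  request: CaptionRequest,
  transcriber: Transcriber,
): Promise<string> {
  const result = await transcriber(
    request.audio,
    buildTranscribeOptions(request),
  );
  return result.text;
}
```

In the app, the injected transcriber would presumably be the transformers.js speech-recognition pipeline loaded with Xenova/whisper-small; passing it in as a parameter is a choice made here to keep the sketch testable with a stub.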
Verification
tsc 0 errors · biome clean · vitest 225/225 · i18n-check 12 locales · vite build OK
Functional check pending: dev-mode caption run downloads whisper-small from the HF CDN on first use.
🤖 Generated with Claude Code