feat(cli): support any OpenAI-compatible vision endpoint for image captioning #1809
Open
yuemeng200 wants to merge 1 commit into
…ible vision endpoint

Allow any OpenAI-compatible vision API (Volcengine ARK, Azure OpenAI, Ollama, vLLM, etc.) to be used for image captioning during `hyperframes capture`.

New env vars (highest priority, checked before OPENROUTER_API_KEY):

- HYPERFRAMES_VISION_API_KEY - bearer token for the custom endpoint
- HYPERFRAMES_VISION_BASE_URL - base URL (e.g. https://ark.cn-beijing.volces.com/api/v3)
- HYPERFRAMES_VISION_MODEL - model ID (required; warns and skips if missing)

Priority order: HYPERFRAMES_VISION_* > OPENROUTER_API_KEY > GEMINI/GOOGLE_API_KEY

The OpenRouter path is refactored to share the same openAiCompatCaptionOne helper used by the custom endpoint; no behaviour change for existing users.

Adds 4 new tests covering: custom endpoint happy path, priority over OpenRouter, missing-model warning, and the no-key skip path.
Summary
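The `hyperframes capture` image captioning feature currently supports only two hardcoded providers: Google Gemini and OpenRouter. This blocks users outside Google's supported regions (e.g. mainland China) from getting AI-driven asset descriptions, since neither Gemini nor OpenRouter is easily accessible there.

This PR adds three new environment variables that let any OpenAI-compatible vision endpoint be used:

- `HYPERFRAMES_VISION_API_KEY`: bearer token for the custom endpoint
- `HYPERFRAMES_VISION_BASE_URL`: base URL (e.g. `https://ark.cn-beijing.volces.com/api/v3`)
- `HYPERFRAMES_VISION_MODEL`: model ID (required; captioning warns and skips if it is missing)

Provider priority (first match wins):

1. `HYPERFRAMES_VISION_*`: custom OpenAI-compatible endpoint (new)
2. `OPENROUTER_API_KEY`: OpenRouter (unchanged)
3. `GEMINI_API_KEY` / `GOOGLE_API_KEY`: Google Gemini (unchanged)

Example (Volcengine ARK / Doubao vision): set `HYPERFRAMES_VISION_API_KEY` to the ARK API key, `HYPERFRAMES_VISION_BASE_URL` to `https://ark.cn-beijing.volces.com/api/v3`, and `HYPERFRAMES_VISION_MODEL` to the ID of the Doubao vision model, then run `hyperframes capture`.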
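For reviewers unfamiliar with the wire format, here is a minimal sketch of the kind of request a custom endpoint would receive. It is not the PR's `openAiCompatCaptionOne` helper: `buildCaptionRequest`, `VisionConfig`, and `CaptionRequest` are hypothetical names, and appending `/chat/completions` to the base URL and sending the image as a base64 data URL are assumptions based on the standard OpenAI chat completions format.

```typescript
// Hypothetical illustration only; names and structure are not taken from the PR.
interface VisionConfig {
  apiKey: string; // value of HYPERFRAMES_VISION_API_KEY
  baseUrl: string; // value of HYPERFRAMES_VISION_BASE_URL
  model: string; // value of HYPERFRAMES_VISION_MODEL
}

interface CaptionRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildCaptionRequest(
  cfg: VisionConfig,
  imageBase64: string,
  mimeType: string,
  prompt: string,
): CaptionRequest {
  // Assumption: the chat completions route sits directly under the base URL.
  const baseUrl = cfg.baseUrl.replace(/\/+$/, "");
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${cfg.apiKey}`,
      },
      body: JSON.stringify({
        model: cfg.model,
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: prompt },
              // The image travels as an `image_url` content part holding a data URL.
              {
                type: "image_url",
                image_url: { url: `data:${mimeType};base64,${imageBase64}` },
              },
            ],
          },
        ],
      }),
    },
  };
}
```

The result can be passed to `fetch(url, init)`; in the standard response shape the caption text would be found at `choices[0].message.content`.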
Works equally well with Azure OpenAI, local Ollama, self-hosted vLLM, or any other endpoint that speaks the OpenAI chat completions wire format with `image_url` content parts.

Implementation notes

- The OpenRouter path is refactored to share the same `openAiCompatCaptionOne` helper as the custom endpoint, so there is no behaviour change for existing OpenRouter users.
- When `HYPERFRAMES_VISION_API_KEY` and `HYPERFRAMES_VISION_BASE_URL` are set but `HYPERFRAMES_VISION_MODEL` is missing, a warning is pushed and captioning is skipped gracefully (same degradation pattern as a missing key).

Tests

4 new unit tests added to `contentExtractor.test.ts`:

- Custom endpoint happy path
- Custom endpoint takes priority over `OPENROUTER_API_KEY` when both are set
- Missing `HYPERFRAMES_VISION_MODEL` → warning emitted, fetch not called
- No-key skip path, now also accounting for `HYPERFRAMES_VISION_API_KEY` for completeness

All 6 tests pass (`bunx vitest run`). Build clean (`bun run build`). Lint + format clean (`oxlint` + `oxfmt`).
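The selection behaviour these tests exercise can be sketched roughly as follows. This is an illustration, not the code in the PR: `resolveProvider`, `Provider`, and the warning text are hypothetical. It also assumes that the custom path applies only when both the key and the base URL are set, and that `GEMINI_API_KEY` is checked before `GOOGLE_API_KEY`; the description does not specify either detail.

```typescript
// Hypothetical sketch of first-match-wins provider selection.
type Env = Record<string, string | undefined>;

type Provider =
  | { kind: "custom"; apiKey: string; baseUrl: string; model: string }
  | { kind: "openrouter"; apiKey: string }
  | { kind: "gemini"; apiKey: string }
  | { kind: "none" };

function resolveProvider(env: Env, warnings: string[]): Provider {
  const customKey = env["HYPERFRAMES_VISION_API_KEY"];
  const customBaseUrl = env["HYPERFRAMES_VISION_BASE_URL"];

  // 1. Custom OpenAI-compatible endpoint (assumed to need key and base URL).
  if (customKey && customBaseUrl) {
    const model = env["HYPERFRAMES_VISION_MODEL"];
    if (!model) {
      // Missing model: warn and skip captioning instead of falling through.
      warnings.push("HYPERFRAMES_VISION_MODEL is not set; skipping image captioning");
      return { kind: "none" };
    }
    return { kind: "custom", apiKey: customKey, baseUrl: customBaseUrl, model };
  }

  // 2. OpenRouter.
  const openRouterKey = env["OPENROUTER_API_KEY"];
  if (openRouterKey) {
    return { kind: "openrouter", apiKey: openRouterKey };
  }

  // 3. Google Gemini (lookup order between the two names is an assumption).
  const geminiKey = env["GEMINI_API_KEY"] || env["GOOGLE_API_KEY"];
  if (geminiKey) {
    return { kind: "gemini", apiKey: geminiKey };
  }

  // No key configured: captioning is skipped.
  return { kind: "none" };
}
```

Passing the environment and the warnings array as arguments, rather than reading `process.env` directly, keeps a function like this easy to unit test without mocking globals.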