feat(cli): support any OpenAI-compatible vision endpoint for image captioning#1809

Open
yuemeng200 wants to merge 1 commit into heygen-com:main from yuemeng200:feat/vision-openai-compatible-endpoint

@yuemeng200

Summary

The image captioning step of `hyperframes capture` currently supports only two hardcoded providers: Google Gemini and OpenRouter. This blocks users outside Google's supported regions (e.g. mainland China) from getting AI-generated asset descriptions, because neither Gemini nor OpenRouter is easily accessible there.

This PR adds three new environment variables that let any OpenAI-compatible vision endpoint be used:

| Variable | Description |
|---|---|
| HYPERFRAMES_VISION_API_KEY | Bearer token for the custom endpoint |
| HYPERFRAMES_VISION_BASE_URL | Base URL (e.g. https://ark.cn-beijing.volces.com/api/v3) |
| HYPERFRAMES_VISION_MODEL | Model ID (required; warns and skips if unset) |

Provider priority (first match wins):

  1. HYPERFRAMES_VISION_* — custom OpenAI-compatible endpoint ← new
  2. OPENROUTER_API_KEY — OpenRouter (unchanged)
  3. GEMINI_API_KEY / GOOGLE_API_KEY — Google Gemini (unchanged)
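The priority rule above can be sketched as a single resolver over the environment. This is a minimal illustration, not the PR's code: the names `resolveVisionProvider` and `VisionProvider` are hypothetical, and it assumes the custom endpoint is selected only when both `HYPERFRAMES_VISION_API_KEY` and `HYPERFRAMES_VISION_BASE_URL` are set.

```typescript
// Hypothetical sketch of the provider priority (first match wins).
// Assumption: the custom endpoint needs BOTH the API key and the base URL.

type Env = Record<string, string | undefined>;

type VisionProvider =
  | { kind: "custom"; apiKey: string; baseUrl: string; model: string }
  | { kind: "openrouter"; apiKey: string }
  | { kind: "gemini"; apiKey: string }
  | { kind: "skip"; warning?: string };

function resolveVisionProvider(env: Env): VisionProvider {
  const apiKey = env.HYPERFRAMES_VISION_API_KEY;
  const baseUrl = env.HYPERFRAMES_VISION_BASE_URL;

  // 1. Custom OpenAI-compatible endpoint.
  if (apiKey && baseUrl) {
    const model = env.HYPERFRAMES_VISION_MODEL;
    if (!model) {
      // The model ID is required: warn and skip rather than guess a default.
      return {
        kind: "skip",
        warning: "HYPERFRAMES_VISION_MODEL is not set; skipping image captioning",
      };
    }
    return { kind: "custom", apiKey, baseUrl, model };
  }

  // 2. OpenRouter (unchanged).
  if (env.OPENROUTER_API_KEY) {
    return { kind: "openrouter", apiKey: env.OPENROUTER_API_KEY };
  }

  // 3. Google Gemini; either variable name is accepted.
  const geminiKey = env.GEMINI_API_KEY || env.GOOGLE_API_KEY;
  if (geminiKey) {
    return { kind: "gemini", apiKey: geminiKey };
  }

  // No key at all. Whether the real code also warns here is not stated,
  // so this sketch leaves the warning empty.
  return { kind: "skip" };
}
```

In the CLI this would presumably be called as `resolveVisionProvider(process.env)`. Returning a tagged union keeps the "skip with a warning" outcome explicit instead of hiding it behind a thrown error.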

Example (Volcengine ARK / Doubao vision):

HYPERFRAMES_VISION_API_KEY=ark-xxx
HYPERFRAMES_VISION_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
HYPERFRAMES_VISION_MODEL=doubao-seed-2-0-mini-260428

It is intended to work with Azure OpenAI, local Ollama, self-hosted vLLM, or any other endpoint that implements the OpenAI chat completions wire format, including image_url content parts.
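For reference, here is a sketch of the request such an endpoint receives: a chat completions body whose user message carries the prompt as a text part and the image as a base64 data URL in an image_url part. The helper name `buildCaptionRequest`, the prompt, and appending `/chat/completions` to the base URL are assumptions for illustration, not taken from the PR.

```typescript
// Hypothetical sketch: builds an OpenAI-compatible chat completions request
// carrying one inline image. Names and the URL suffix are assumptions.

interface EndpointConfig {
  baseUrl: string; // e.g. https://ark.cn-beijing.volces.com/api/v3
  apiKey: string;
  model: string;
}

interface CaptionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildCaptionRequest(
  cfg: EndpointConfig,
  imageBase64: string,
  mimeType: string,
  prompt: string,
): CaptionRequest {
  // Drop trailing slashes so ".../v3/" and ".../v3" yield the same URL.
  const url = `${cfg.baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const body = {
    model: cfg.model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // The image travels in an image_url content part as a data URL.
          {
            type: "image_url",
            image_url: { url: `data:${mimeType};base64,${imageBase64}` },
          },
        ],
      },
    ],
  };
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${cfg.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

Passing the result to `fetch(req.url, req.init)` would send it. An endpoint that expects a different auth header or URL layout would not fit this shape without extra handling.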

Implementation notes

  • The OpenRouter fetch path is refactored to share the new openAiCompatCaptionOne helper — no behaviour change for existing OpenRouter users.
  • If HYPERFRAMES_VISION_API_KEY and HYPERFRAMES_VISION_BASE_URL are set but HYPERFRAMES_VISION_MODEL is missing, a warning is pushed and captioning is skipped gracefully (same degradation pattern as a missing key).
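One way to see why the OpenRouter path can share a helper with the custom endpoint: both reduce to the same (base URL, API key, model) triple and both return an OpenAI-style response. The sketch below is hypothetical; `openRouterConfig`, `customConfig`, and `extractCaption` are illustrative names, and passing the OpenRouter model in as a parameter is an assumption. The OpenRouter base URL is its public OpenAI-compatible API root.

```typescript
// Hypothetical sketch: both providers become one EndpointConfig, so request
// building and response parsing can live in a single shared helper.

interface EndpointConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

function openRouterConfig(apiKey: string, model: string): EndpointConfig {
  return { baseUrl: OPENROUTER_BASE_URL, apiKey, model };
}

function customConfig(apiKey: string, baseUrl: string, model: string): EndpointConfig {
  return { baseUrl, apiKey, model };
}

interface ChatCompletionResponse {
  choices?: Array<{ message?: { content?: string | null } }>;
}

// Reads the caption from choices[0].message.content. Returns null when there
// is no usable text, so the caller can push a warning and move on.
function extractCaption(json: ChatCompletionResponse): string | null {
  const text = json.choices?.[0]?.message?.content;
  return typeof text === "string" && text.trim() !== "" ? text.trim() : null;
}
```

With both providers expressed as an `EndpointConfig`, only the config differs between them, which is consistent with the claim that existing OpenRouter users see no behaviour change.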

Tests

4 new unit tests added to contentExtractor.test.ts:

  • Custom endpoint happy path (verifies URL, auth header, model, base64 image payload)
  • Custom endpoint takes priority over OPENROUTER_API_KEY when both are set
  • Missing HYPERFRAMES_VISION_MODEL → warning emitted, fetch not called
  • No-key path now also stubs HYPERFRAMES_VISION_API_KEY for completeness

All 6 tests pass (bunx vitest run). Build clean (bun run build). Lint + format clean (oxlint + oxfmt).

…ible vision endpoint

Allow any OpenAI-compatible vision API (Volcengine ARK, Azure OpenAI, Ollama,
vLLM, etc.) to be used for image captioning during `hyperframes capture`.

New env vars (highest priority, checked before OPENROUTER_API_KEY):
  HYPERFRAMES_VISION_API_KEY   - bearer token for the custom endpoint
  HYPERFRAMES_VISION_BASE_URL  - base URL (e.g. https://ark.cn-beijing.volces.com/api/v3)
  HYPERFRAMES_VISION_MODEL     - model ID (required; warns and skips if missing)

Priority order: HYPERFRAMES_VISION_* > OPENROUTER_API_KEY > GEMINI/GOOGLE_API_KEY

The OpenRouter path is refactored to share the same openAiCompatCaptionOne
helper used by the custom endpoint — no behaviour change for existing users.

Adds 4 new tests covering: custom endpoint happy path, priority over OpenRouter,
missing-model warning, and the no-key skip path.