LCORE-2282: Normalize Vertex AI model IDs to workaround llama-stack 0.6.x bug #1818

Open
anik120 wants to merge 1 commit into lightspeed-core:main from anik120:lcore-2282

Conversation

@anik120
Contributor

@anik120 anik120 commented May 29, 2026

Description

Fixes 500 error when using Vertex AI models with llama-stack 0.6.x.

Root cause: llama-stack 0.6.x inline::meta-reference responses provider normalizes model IDs before checking allowed_models, but doesn't normalize the allowed_models list itself. This causes validation to fail:

  • Model registered as: publishers/google/models/gemini-2.5-flash
  • llama-stack strips to: google/gemini-2.5-flash internally
  • Checks against allowed list with full prefix
  • Mismatch → 500 error

Solution: Rewrite the publishers/google/models/ prefix to google/ before passing the model ID to llama-stack, matching the form it uses internally.

This workaround can be removed after upgrading to llama-stack 0.7.0+, which fixes the underlying bug via ogx-ai/ogx#5169.
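
For review context, the normalization described above can be sketched as follows. This is a minimal sketch based on the two lines quoted in the review thread, not the full function from the diff. The `VERTEX_AI_PREFIX` constant is an illustrative name, and every sample ID other than `publishers/google/models/gemini-2.5-flash` is an assumed example.

```python
# Minimal sketch of the helper; VERTEX_AI_PREFIX is an illustrative name,
# not necessarily the one used in the PR.
VERTEX_AI_PREFIX = "publishers/google/models/"


def normalize_vertex_ai_model_id(model_id: str) -> str:
    """Rewrite a leading Vertex AI prefix to ``google/``; leave other IDs alone."""
    if model_id.startswith(VERTEX_AI_PREFIX):
        # count=1 guarantees that only the leading occurrence is rewritten.
        return model_id.replace(VERTEX_AI_PREFIX, "google/", 1)
    return model_id


# Sample IDs: only the first one appears in the PR description; the rest
# are assumed examples of the other formats the tests are said to cover.
samples = [
    "publishers/google/models/gemini-2.5-flash",      # Vertex AI
    "publishers/google/models/gemini-2.5-flash-001",  # versioned (assumed)
    "openai/gpt-4o-mini",                             # non-Vertex (assumed)
    "models/gemini-2.5-flash",                        # Gemini API style (assumed)
]
normalized = [normalize_vertex_ai_model_id(s) for s in samples]
```

Only IDs that start with the Vertex AI prefix change. Everything else passes through untouched, which keeps the workaround easy to remove later.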

Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Bug Fixes

    • Improved Vertex AI model compatibility with the Responses API.
  • Tests

    • Added comprehensive tests for model ID normalization.

Review Change Stack

@coderabbitai
Contributor

coderabbitai Bot commented May 29, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9a342a34-88fb-4232-89db-ff2ecf13cdf2

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf6a6b and f528800.

📒 Files selected for processing (5)
  • src/app/endpoints/rlsapi_v1.py
  • src/utils/compaction.py
  • src/utils/query.py
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py

Walkthrough

This PR adds Vertex AI model ID normalization to the Responses API integration. A new normalize_vertex_ai_model_id helper rewrites publishers/google/models/ prefixes to google/ to resolve a llama-stack 0.6.x validation mismatch. The helper is integrated into prepare_responses_params and validated by unit tests covering multiple ID formats.

Changes

Vertex AI Model ID Normalization

| Layer / File(s) | Summary |
|---|---|
| normalize_vertex_ai_model_id helper and test coverage (src/utils/responses.py, tests/unit/utils/test_responses.py) | New normalize_vertex_ai_model_id function rewrites publishers/google/models/ prefixes to google/ while preserving non-Vertex and Gemini API model IDs. TestNormalizeVertexAIModelId verifies normalization across Vertex AI, versioned, non-Vertex, and models/... formats. |
| Integration into prepare_responses_params (src/utils/responses.py) | prepare_responses_params now computes and uses a normalized model ID via the helper before constructing ResponsesApiParams instead of passing the raw model ID. |
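
The integration step summarized above might look roughly like the sketch below. It makes several assumptions: the `ResponsesApiParams` dataclass is a two-field stand-in for the project's real type, the two-argument signature of `prepare_responses_params` is simplified for illustration, and `user_input` is a hypothetical parameter name. Only the order of operations (normalize first, then construct the params) comes from the summary.

```python
from dataclasses import dataclass


@dataclass
class ResponsesApiParams:
    """Hypothetical stand-in; the real class has more fields."""

    model: str
    input: str


def normalize_vertex_ai_model_id(model_id: str) -> str:
    """Rewrite a leading ``publishers/google/models/`` to ``google/``."""
    prefix = "publishers/google/models/"
    if model_id.startswith(prefix):
        return model_id.replace(prefix, "google/", 1)
    return model_id


def prepare_responses_params(model_id: str, user_input: str) -> ResponsesApiParams:
    """Simplified sketch: normalize the model ID before building the params."""
    normalized_model_id = normalize_vertex_ai_model_id(model_id)
    # The raw model_id is no longer passed through to the params object.
    return ResponsesApiParams(model=normalized_model_id, input=user_input)


params = prepare_responses_params(
    "publishers/google/models/gemini-2.5-flash", "hello"
)
```

Doing the rewrite at the single point where the params are built means callers keep using the registered model ID unchanged.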

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: normalizing Vertex AI model IDs to fix a llama-stack 0.6.x compatibility issue, which aligns perfectly with the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@tisnik tisnik requested a review from asimurka June 1, 2026 14:38
Contributor

@tisnik tisnik left a comment

LGTM

Comment thread src/utils/query.py
        Normalized model ID with Vertex AI prefix stripped if present
    """
    if model_id.startswith("publishers/google/models/"):
        return model_id.replace("publishers/google/models/", "google/", 1)
Contributor

@asimurka asimurka Jun 1, 2026

@anik120 is "publishers/google/models" a consistent prefix? I thought the fix would be more general (namely, replacing any model prefix with google/ rather than a hard-coded one).
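
One way to make the rewrite less hard-coded, offered only as a sketch for discussion, is to match any `publishers/<publisher>/models/` prefix and keep the publisher segment. The function name `normalize_publisher_model_id` and the non-Google sample ID are hypothetical. Nothing in this PR establishes that llama-stack normalizes other publishers the same way, so that behavior is an assumption.

```python
import re

# Matches a leading "publishers/<publisher>/models/" and captures the publisher.
_PUBLISHER_PREFIX = re.compile(r"^publishers/(?P<publisher>[^/]+)/models/")


def normalize_publisher_model_id(model_id: str) -> str:
    """Hypothetical generalized variant: rewrite the prefix to ``<publisher>/``.

    Assumes (unverified) that every publisher follows the same ID layout as
    the Google case described in this PR.
    """
    return _PUBLISHER_PREFIX.sub(
        lambda match: f"{match.group('publisher')}/", model_id, count=1
    )


google_id = normalize_publisher_model_id("publishers/google/models/gemini-2.5-flash")
other_id = normalize_publisher_model_id("publishers/example/models/some-model")
untouched = normalize_publisher_model_id("models/gemini-2.5-flash")
```

For Google IDs this produces the same result as the hard-coded version. The trade-off is that it also rewrites IDs the current workaround deliberately leaves alone.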

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants