LCORE-2282: Upgrade llama-stack to 0.7.0 to fix Vertex AI model name mismatch #1816
anik120 wants to merge 8 commits into
Walkthrough
This PR bumps llama-stack to 0.7.0, regenerates build and wheel lockfiles with updated pins, converts many example/test/run configs from agents/RAG → responses/file-search (provider wiring meta-reference → builtin), adds OpenAPI reasoning schemas, updates a constants value, extends response types, normalizes function-call outputs, and aligns unit tests and docker-compose.
Changes
Dependencies and Examples Update
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
151-157: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Update stale provider-type comments to `builtin`.
These comments still reference `inline::meta-reference`, which is outdated for llama-stack 0.7.0 and can mislead future dependency/provider checks.
Also applies to: 176-176
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pyproject.toml` around lines 151 - 157, Update the stale provider-type comments that currently read "inline::meta-reference" to the new provider tag "builtin" in pyproject.toml (the commented dependency lines and the surrounding provider comment blocks); locate the comment instances around the API agents and API eval sections (the lines showing "inline::meta-reference") and replace those tokens with "builtin" including the additional occurrence noted near line 176 so the provider/type metadata matches llama-stack 0.7.0 expectations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.konflux/requirements-build.txt:
- Around line 153-155: Resolve the conflicting exact pins by removing or
consolidating duplicate requirements for the same package (specifically
reconcile hatchling==1.26.3 vs hatchling==1.29.0 and setuptools==78.1.1 vs
setuptools==82.0.1): pick the intended single version (or replace with a
compatible range like >= and < if truly needed) and update the autogenerated
requirements file so only one entry per distribution remains; after editing,
regenerate the requirements file with the correct build Python version (the
header currently says Python 3.13) to ensure compatibility and deterministic
builds.
---
Outside diff comments:
In `@pyproject.toml`:
- Around line 151-157: Update the stale provider-type comments that currently
read "inline::meta-reference" to the new provider tag "builtin" in
pyproject.toml (the commented dependency lines and the surrounding provider
comment blocks); locate the comment instances around the API agents and API eval
sections (the lines showing "inline::meta-reference") and replace those tokens
with "builtin" including the additional occurrence noted near line 176 so the
provider/type metadata matches llama-stack 0.7.0 expectations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: db468d63-6687-4a4c-850f-d3bf3b58f40f
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- .konflux/requirements-build.txt
- .konflux/requirements.hashes.source.txt
- .konflux/requirements.hashes.wheel.txt
- pyproject.toml
- src/models/api/responses/successful/catalog.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Llama Stack imports: Use `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (`str | int`), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use `async def` for I/O operations and external API calls
Use standard log levels with clear purposes: `debug()` for diagnostic info, `info()` for program execution, `warning()` for unexpected events, `error()` for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes must use ABC with `@abstractmethod` decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/models/api/responses/successful/catalog.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use `@model_validator` and `@field_validator` for validation and complete type annotations for all attributes, avoiding `Any` type
Files:
src/models/api/responses/successful/catalog.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/api/responses/successful/catalog.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/api/responses/successful/catalog.py
🔇 Additional comments (3)
.konflux/requirements.hashes.wheel.txt (1)
2-2: LGTM!
Also applies to: 89-90, 109-110, 128-129, 199-200
.konflux/requirements-build.txt (1)
2-5: ⚡ Quick win
Align Python version used by `pybuild-deps compile` with the Konflux 3.12 target.
`.konflux/requirements-build.txt` is labeled “autogenerated by pip-compile with Python 3.13”, but `scripts/konflux_requirements.sh` pins `--python-version 3.12` for the `uv pip compile` steps and then runs `uv run pybuild-deps compile` without any Python-version/interpreter pin. This can lead to resolver/marker skew across build artifacts. Either explicitly run `pybuild-deps compile` under Python 3.12 (via `uv run`/config or a supported flag) or regenerate the file so its header and resolved markers match 3.12.
src/models/api/responses/successful/catalog.py (1)
217-218: LGTM!
setuptools==78.1.1
    # via oci
setuptools==82.0.1
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify duplicate package pins with conflicting versions in build lockfile
python - <<'PY'
import re
from collections import defaultdict

path = ".konflux/requirements-build.txt"
pins = defaultdict(set)
with open(path, "r", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        m = re.match(r'^([A-Za-z0-9_.-]+)==([^\s]+)\s*$', line.strip())
        if m:
            pins[m.group(1).lower()].add((m.group(2), i))
conflicts = {k: v for k, v in pins.items() if len({ver for ver, _ in v}) > 1}
if not conflicts:
    print("No conflicting duplicate pins found.")
else:
    print("Conflicting duplicate pins:")
    for pkg, versions in sorted(conflicts.items()):
        details = ", ".join(f"{ver} (Line {ln})" for ver, ln in sorted(versions, key=lambda x: x[1]))
        print(f"  {pkg}: {details}")
PY
Repository: lightspeed-core/lightspeed-stack
Length of output: 203
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".konflux/requirements-build.txt"
echo "=== File head (first 80 lines) ==="
nl -ba "$file" | head -n 80
echo
echo "=== Lines 120-190 ==="
nl -ba "$file" | sed -n '120,190p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 172
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".konflux/requirements-build.txt"
echo "=== Lines 1-80 ==="
cat -n "$file" | sed -n '1,80p'
echo
echo "=== Lines 140-175 ==="
cat -n "$file" | sed -n '140,175p'
echo
echo "=== Lines mentioning setuptools/hatchling (all occurrences) ==="
rg -n "^(setuptools|hatchling)==|setuptools==|hatchling==" "$file" || true
Repository: lightspeed-core/lightspeed-stack
Length of output: 3577
Fix conflicting exact pins in `.konflux/requirements-build.txt` (autogenerated)
The file contains incompatible `==` pins for the same packages:
- `hatchling==1.26.3` (line 43) and `hatchling==1.29.0` (line 48)
- `setuptools==78.1.1` (line 153) and `setuptools==82.0.1` (line 155, “unsafe” section)
With two different `==` requirements for the same distribution, the requirements input is inherently conflicting (pip can’t satisfy both exact versions), breaking deterministic builds.
Also ensure regeneration/compatibility with the actual build Python version, since the header states it was generated with Python 3.13 (lines 2-6).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.konflux/requirements-build.txt around lines 153 - 155, Resolve the
conflicting exact pins by removing or consolidating duplicate requirements for
the same package (specifically reconcile hatchling==1.26.3 vs hatchling==1.29.0
and setuptools==78.1.1 vs setuptools==82.0.1): pick the intended single version
(or replace with a compatible range like >= and < if truly needed) and update
the autogenerated requirements file so only one entry per distribution remains;
after editing, regenerate the requirements file with the correct build Python
version (the header currently says Python 3.13) to ensure compatibility and
deterministic builds.
3131604 to afb3a44
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/azure-run.yaml (1)
145-148: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Use a real Llama Guard model for `provider_shield_id`.
`provider_shield_id: openai/gpt-4o-mini` points at a chat model, not the guard model. That makes this shield look configured without actually validating requests through Llama Guard. Please switch this to the actual guard model identifier and mirror that fix in the sibling example configs touched by this PR.
Based on learnings: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the guard model identifier and do not use a chat/generative model id.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/azure-run.yaml` around lines 145 - 148, The shield entry uses a chat model id instead of the Llama Guard model id: update the shield with shield_id "llama-guard" by replacing provider_shield_id "openai/gpt-4o-mini" with the actual Llama Guard model identifier (use the guard model id required by your provider), and apply the identical change to the sibling example configs touched in this PR so every Llama Guard shield uses the guard model id rather than a chat/generative model id.
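The shield finding above could be turned into a lint-style check over the parsed shield entries. This is a rough sketch only: it assumes the shield entries have already been loaded into a list of dicts, and it uses a naive heuristic (the id contains "guard") that is an assumption of this example, not a rule from llama-stack. The function name is hypothetical.

```python
def shields_without_guard_model(shields: list[dict[str, str]]) -> list[str]:
    """Return shield_ids whose provider_shield_id does not look like a guard model.

    Heuristic (an assumption of this sketch): a guard model id contains "guard".
    """
    flagged: list[str] = []
    for shield in shields:
        model_id = shield.get("provider_shield_id", "")
        if "guard" not in model_id.lower():
            flagged.append(shield.get("shield_id", "<unknown>"))
    return flagged


# Entry shaped like the one the review describes.
CHAT_MODEL_SHIELD = [
    {
        "shield_id": "llama-guard",
        "provider_id": "llama-guard",
        "provider_shield_id": "openai/gpt-4o-mini",
    }
]

# Same entry pointed at the guard model id cited in the review learnings.
GUARD_MODEL_SHIELD = [
    {
        "shield_id": "llama-guard",
        "provider_id": "llama-guard",
        "provider_shield_id": "meta-llama/Llama-Guard-3-8B",
    }
]

if __name__ == "__main__":
    print(shields_without_guard_model(CHAT_MODEL_SHIELD))
    print(shields_without_guard_model(GUARD_MODEL_SHIELD))
```

A name-based heuristic can misfire on custom guard deployments, so a result from it is a prompt for manual review rather than proof that a shield is or is not effective.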
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/models/api/responses/successful/catalog.py`:
- Around line 217-218: The example API bucket key currently uses "agents" and
needs to be renamed to "responses" so the published OpenAPI example matches the
new config surface; update the example object key (the one that contains entries
like "provider_id": "builtin" and "provider_type": "inline::builtin") from
"agents" to "responses" in the catalog example so /providers displays the
correct name after migration.
In `@tests/configuration/run.yaml`:
- Around line 20-22: The config defines providers.responses
(provider_id/provider_type) but does not enable the responses API; update the
YAML to expose the responses API by enabling the responses feature flag
alongside the existing agents flag (i.e., add a top-level responses: true entry
next to agents: true) so providers.responses is actually active for the
migration; ensure the provider block (provider_id: builtin / provider_type:
inline::builtin) remains unchanged.
---
Outside diff comments:
In `@examples/azure-run.yaml`:
- Around line 145-148: The shield entry uses a chat model id instead of the
Llama Guard model id: update the shield with shield_id "llama-guard" by
replacing provider_shield_id "openai/gpt-4o-mini" with the actual Llama Guard
model identifier (use the guard model id required by your provider), and apply
the identical change to the sibling example configs touched in this PR so every
Llama Guard shield uses the guard model id rather than a chat/generative model
id.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4bc096b8-b60c-41a2-bdeb-3cd336012807
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (25)
- .konflux/requirements-build.txt
- .konflux/requirements.hashes.source.txt
- .konflux/requirements.hashes.wheel.txt
- examples/azure-run.yaml
- examples/bedrock-run.yaml
- examples/run.yaml
- examples/vertexai-run.yaml
- examples/vllm-rhaiis.yaml
- examples/vllm-rhelai.yaml
- examples/vllm-rhoai.yaml
- examples/watsonx-run.yaml
- pyproject.toml
- run.yaml
- src/constants.py
- src/models/api/responses/successful/catalog.py
- tests/configuration/run.yaml
- tests/e2e-prow/rhoai/configs/run.yaml
- tests/e2e/configs/run-azure.yaml
- tests/e2e/configs/run-bedrock.yaml
- tests/e2e/configs/run-ci.yaml
- tests/e2e/configs/run-rhaiis.yaml
- tests/e2e/configs/run-rhelai.yaml
- tests/e2e/configs/run-vertexai.yaml
- tests/e2e/configs/run-watsonx.yaml
- tests/unit/app/endpoints/test_tools.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Llama Stack imports: Use `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (`str | int`), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use `async def` for I/O operations and external API calls
Use standard log levels with clear purposes: `debug()` for diagnostic info, `info()` for program execution, `warning()` for unexpected events, `error()` for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes must use ABC with `@abstractmethod` decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
- src/constants.py
- src/models/api/responses/successful/catalog.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use `@model_validator` and `@field_validator` for validation and complete type annotations for all attributes, avoiding `Any` type
Files:
src/models/api/responses/successful/catalog.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest.mark.asyncio` marker for async tests
Files:
tests/unit/app/endpoints/test_tools.py
🧠 Learnings (4)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/api/responses/successful/catalog.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/api/responses/successful/catalog.py
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
- tests/e2e/configs/run-azure.yaml
- tests/configuration/run.yaml
- examples/vertexai-run.yaml
- tests/e2e/configs/run-bedrock.yaml
- tests/e2e/configs/run-ci.yaml
- examples/azure-run.yaml
- run.yaml
- examples/run.yaml
- examples/bedrock-run.yaml
- tests/e2e/configs/run-vertexai.yaml
- tests/e2e/configs/run-rhaiis.yaml
- tests/e2e/configs/run-rhelai.yaml
- examples/vllm-rhelai.yaml
- examples/watsonx-run.yaml
- examples/vllm-rhoai.yaml
- tests/e2e/configs/run-watsonx.yaml
- examples/vllm-rhaiis.yaml
- tests/e2e-prow/rhoai/configs/run.yaml
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.
Applied to files:
run.yaml
🔇 Additional comments (19)
.konflux/requirements-build.txt (1)
43-48: Duplicate exact pins remain unresolved.
`hatchling` and `setuptools` are still pinned to two conflicting exact versions in this generated file (Line 43/Line 48 and Line 153/Line 155), which matches the previously raised finding.
Also applies to: 153-155
pyproject.toml (1)
31-33: LGTM!
src/constants.py (1)
10-10: LGTM!
.konflux/requirements.hashes.wheel.txt (1)
2-2: LGTM!
Also applies to: 89-90, 109-110, 128-129, 199-200
examples/azure-run.yaml (1)
5-5: LGTM!
Also applies to: 53-55, 66-66, 75-76, 102-103, 158-159
examples/bedrock-run.yaml (1)
4-4: LGTM!
Also applies to: 59-61, 72-72, 81-82, 108-109, 157-158
examples/run.yaml (1)
10-10: LGTM!
Also applies to: 61-63, 74-74, 83-84, 110-111, 163-164
examples/vertexai-run.yaml (1)
5-5: LGTM!
Also applies to: 53-55, 66-66, 75-76, 102-103, 154-155
examples/vllm-rhaiis.yaml (1)
5-5: LGTM!
Also applies to: 55-56, 64-64, 73-74, 100-101, 142-143
examples/vllm-rhelai.yaml (1)
5-5: LGTM!
Also applies to: 55-56, 64-64, 73-74, 100-101, 142-143
examples/vllm-rhoai.yaml (1)
5-5: LGTM!
Also applies to: 55-56, 64-64, 73-74, 100-101, 142-143
examples/watsonx-run.yaml (1)
4-4: LGTM!
Also applies to: 61-63, 74-74, 83-84, 110-111, 166-167
run.yaml (1)
4-4: LGTM!
Also applies to: 54-56, 64-64, 73-74, 100-101, 139-142
tests/e2e-prow/rhoai/configs/run.yaml (1)
5-5: LGTM!
Also applies to: 56-58, 69-69, 78-79, 105-106, 161-162
tests/e2e/configs/run-azure.yaml (1)
5-5: LGTM!
Also applies to: 62-64, 69-69, 78-79, 105-106, 154-155
tests/e2e/configs/run-bedrock.yaml (1)
5-5: LGTM!
Also applies to: 60-62, 67-67, 76-77, 103-104, 152-153
tests/e2e/configs/run-ci.yaml (1)
5-5: LGTM!
Also applies to: 55-57, 62-62, 71-72, 98-99, 143-144
tests/e2e/configs/run-rhaiis.yaml (1)
5-5: LGTM!
Also applies to: 62-64, 69-69, 78-79, 105-106, 154-155
tests/e2e/configs/run-rhelai.yaml (1)
5-5: LGTM!
Also applies to: 62-64, 69-69, 78-79, 105-106, 154-155
"provider_id": "builtin",
"provider_type": "inline::builtin",
Rename the example API bucket from agents to responses.
This example still nests the builtin provider under "agents", so the /providers OpenAPI example will advertise the old API name after the 0.7.0 migration. Update the example key at Line 215 to "responses" so the published schema matches the new config surface.
Suggested fix
-"agents": [
+"responses": [
     {
         "provider_id": "builtin",
         "provider_type": "inline::builtin",
     },
 ],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/models/api/responses/successful/catalog.py` around lines 217 - 218, The
example API bucket key currently uses "agents" and needs to be renamed to
"responses" so the published OpenAPI example matches the new config surface;
update the example object key (the one that contains entries like "provider_id":
"builtin" and "provider_type": "inline::builtin") from "agents" to "responses"
in the catalog example so /providers displays the correct name after migration.
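The rename requested for the catalog example can be expressed as a small pure function. This is a sketch only: `rename_api_bucket` and the sample dict are hypothetical, the `"inference"` entry is invented for illustration, and the function returns a new dict rather than mutating its argument, in line with the path-based guidance quoted earlier in this review.

```python
from typing import Any


def rename_api_bucket(
    example: dict[str, Any], old: str = "agents", new: str = "responses"
) -> dict[str, Any]:
    """Return a copy of the example with one top-level key renamed.

    Key order is preserved and the input dict is left untouched.
    """
    return {(new if key == old else key): value for key, value in example.items()}


# Hypothetical /providers example; only the "agents" bucket comes from the review.
PROVIDERS_EXAMPLE = {
    "inference": [{"provider_id": "example", "provider_type": "remote::example"}],
    "agents": [{"provider_id": "builtin", "provider_type": "inline::builtin"}],
}

if __name__ == "__main__":
    migrated = rename_api_bucket(PROVIDERS_EXAMPLE)
    print(list(migrated))
    print(list(PROVIDERS_EXAMPLE))
```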
fdaef5d to f034340
…mismatch
Upgrades llama-stack from 0.6.0 to 0.7.0 to fix critical Vertex AI model name validation bug where queries fail with HTTP 500 when the optional `model` field is omitted. See: https://redhat.atlassian.net/browse/LCORE-2282 for more info. The fix ogx-ai/ogx#5169 is included in the [0.7.0](https://github.com/ogx-ai/ogx/releases/tag/v0.7.0) release.
Breaking Changes Addressed in llama-stack 0.7.0:
- **Agents API → Responses API** (#5195): Updated all configuration files to use the new `responses` API
- **Provider rename: meta-reference → builtin** (#5131): Updated all provider configurations from `inline::meta-reference` to `inline::builtin`
- **RAG refactor: rag-runtime → file-search** (#5186): Updated tool_runtime providers from `inline::rag-runtime` to `inline::file-search` and toolgroups from `builtin::rag` to `builtin::file_search`
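The identifier renames listed in this commit message lend themselves to a quick scan for leftovers. The sketch below is an illustration under assumptions: the rename table is copied from the commit message, while `find_stale_identifiers`, the word-boundary rule, and the sample text are hypothetical. It reports matches instead of rewriting files, since a blind text replacement could touch unrelated strings.

```python
import re

# Old identifier -> new identifier, as listed in the commit message.
RENAMES = {
    "inline::meta-reference": "inline::builtin",
    "inline::rag-runtime": "inline::file-search",
    "builtin::rag": "builtin::file_search",
}


def find_stale_identifiers(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old, new) for every pre-0.7.0 identifier found."""
    # The lookahead keeps "builtin::rag" from matching a longer identifier.
    patterns = {
        old: re.compile(re.escape(old) + r"(?![\w-])") for old in RENAMES
    }
    hits: list[tuple[int, str, str]] = []
    for number, line in enumerate(text.splitlines(), 1):
        for old, pattern in patterns.items():
            if pattern.search(line):
                hits.append((number, old, RENAMES[old]))
    return hits


# Hypothetical config fragment mixing migrated and stale identifiers.
SAMPLE_CONFIG = (
    "provider_type: inline::meta-reference\n"
    "provider_type: inline::file-search\n"
    "toolgroup_id: builtin::rag\n"
)

if __name__ == "__main__":
    for number, old, new in find_stale_identifiers(SAMPLE_CONFIG):
        print(f"line {number}: {old} -> {new}")
```

Because the scan works on raw text, it also catches stale identifiers that survive in comments.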
The lightspeed-stack service needs access to run.yaml which contains the llama-stack configuration (APIs, providers, etc.). Without this mount, the container uses a stale copy baked into the image, causing failures when testing breaking changes like agents → responses API.
In llama-stack 0.7.0, tool_groups was split out from tool_runtime into its own API. It must be explicitly added to the apis list for toolgroup registration/listing to work in library mode.
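The wiring rule in the commit message above (a registered toolgroup needs `tool_groups` in the `apis` list) can be sketched as a simple consistency check. This is a hedged illustration: the function name is hypothetical, and it takes the `apis` list and the registered toolgroup ids as plain arguments so that it makes no assumption about where those values sit in `run.yaml`.

```python
def needs_tool_groups_api(apis: list[str], toolgroup_ids: list[str]) -> bool:
    """Return True when toolgroups are registered but `tool_groups` is not enabled."""
    return bool(toolgroup_ids) and "tool_groups" not in apis


# Hypothetical inputs modeled on the configs discussed in this PR.
APIS_BEFORE = ["inference", "tool_runtime", "vector_io"]
APIS_AFTER = ["inference", "tool_runtime", "tool_groups", "vector_io"]
TOOLGROUPS = ["builtin::file_search"]

if __name__ == "__main__":
    print(needs_tool_groups_api(APIS_BEFORE, TOOLGROUPS))
    print(needs_tool_groups_api(APIS_AFTER, TOOLGROUPS))
```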
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
examples/vllm-rhelai.yaml (1)
4-14: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
`tool_groups` API missing from `apis` while a toolgroup is registered.
Same issue as flagged in `examples/vertexai-run.yaml`: this config registers `builtin::file_search` (Lines 141-143) but omits `tool_groups` from the `apis` list, unlike the other migrated configs in this PR. Add `tool_groups` to keep wiring consistent and ensure the toolgroup is loaded.
🔧 Proposed fix
 - tool_runtime
+- tool_groups
 - vector_io
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/vllm-rhelai.yaml` around lines 4 - 14, The apis list in examples/vllm-rhelai.yaml omits tool_groups while a toolgroup is registered (builtin::file_search); update the apis array to include "tool_groups" so the toolgroup wiring loads properly; locate the apis block and add "tool_groups" alongside the other entries (ensuring consistency with other configs like examples/vertexai-run.yaml) so the registered builtin::file_search is picked up at startup.
examples/watsonx-run.yaml (1)
3-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
`tool_groups` API missing from `apis` while a toolgroup is registered.
Same issue as flagged in `examples/vertexai-run.yaml`: this config registers `builtin::file_search` (Lines 165-167) but omits `tool_groups` from the `apis` list, unlike the other migrated configs in this PR. Add `tool_groups` for consistency and to ensure the toolgroup loads.
🔧 Proposed fix
 - tool_runtime
+- tool_groups
 - vector_io
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/watsonx-run.yaml` around lines 3 - 13, The apis list is missing the tool_groups entry while a toolgroup (builtin::file_search) is registered; update the apis array in examples/watsonx-run.yaml to include "tool_groups" alongside the other entries so the registered toolgroup loads correctly and matches the other migrated configs.
tests/e2e/configs/run-rhelai.yaml (1)
146-149: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
`provider_shield_id` uses a chat model id rather than a guard model.
As in the other config, `provider_shield_id: openai/gpt-4o-mini` is a generative chat model; this does not confirm the Llama Guard shield is gating queries. Use a real guard model identifier and ensure the e2e coverage exercises an actual Llama Guard model so shield effectiveness is verified. Line is outside the diff, so flagging for confirmation.
Based on learnings: in Llama-stack config YAMLs, set `provider_shield_id` to the guard model identifier and not a chat/generative model id, and ensure E2E tests exercise a real Llama Guard model to verify the shield is effective.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/configs/run-rhelai.yaml` around lines 146 - 149, The YAML uses a chat/generative model id for provider_shield_id (provider_shield_id: openai/gpt-4o-mini) which won't exercise the Llama Guard shield; update the shield entry (shield_id: llama-guard, provider_id: llama-guard) to reference a real Llama Guard model identifier (replace provider_shield_id with the guard model's ID used by your deployment) and update any related test setup so the E2E flow sends inputs that should be gated (e.g., a disallowed prompt) to confirm the shield actually blocks/flags requests; verify shield behavior in the test assertions and adjust any mock/provider mappings that reference provider_shield_id accordingly.
examples/vllm-rhoai.yaml (1)
134-137: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
`provider_shield_id` points at a chat model, not a guard model.
`provider_shield_id: openai/gpt-4o-mini` is a generative chat model id; using it here means only an override landed and does not guarantee the Llama Guard shield is actually gating queries. Prefer a real guard model identifier (e.g. `meta-llama/Llama-Guard-3-8B`). This line is outside the diff, so flagging for confirmation rather than as a regression of this PR.
Based on learnings: in Llama-stack config YAMLs, set `provider_shield_id` to the guard model identifier and not a chat/generative model id, since a chat-model id indicates only an override landed and does not mean the safety shield is effective.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/vllm-rhoai.yaml` around lines 134 - 137, The shields entry uses a chat/generative model id in provider_shield_id which only acts as an override and may not enable the Llama Guard shield; change the provider_shield_id value under the shields block (with shield_id: llama-guard and provider_id: llama-guard) to a real guard model identifier (for example meta-llama/Llama-Guard-3-8B) so the guard model is actually used for gating rather than a chat model override.
♻️ Duplicate comments (1)
src/models/api/responses/successful/catalog.py (1)
215-220: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Rename the example API bucket `agents` → `responses`.
The provider fields were updated to `builtin` / `inline::builtin`, but the surrounding key at Line 215 is still "agents". After the 0.7.0 Agents→Responses migration the published `/providers` OpenAPI example will advertise the stale API name.
Suggested fix
-    "agents": [
+    "responses": [
         {
             "provider_id": "builtin",
             "provider_type": "inline::builtin",
         },
     ],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/api/responses/successful/catalog.py` around lines 215 - 220, Update the example API payload in src/models/api/responses/successful/catalog.py by renaming the top-level key "agents" to "responses" so the OpenAPI example matches the post-0.7.0 Agents→Responses migration; locate the JSON/example dict literal that currently contains the "agents" array (with provider entries using "provider_id": "builtin" and "provider_type": "inline::builtin") and change the key to "responses" so the published /providers example advertises the correct API name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/openapi.json`:
- Around line 1691-1695: The docs/openapi.json file is out of sync with the
generated OpenAPI schema; regenerate the spec by running the provided generation
script so CI passes: run the command shown in the CI output (uv run python
scripts/generate_openapi_schema.py docs/openapi.json) to recreate
docs/openapi.json from the source schema (scripts/generate_openapi_schema.py)
rather than editing the file manually, then commit the regenerated file; repeat
for other affected ranges if necessary.
In `@src/utils/conversations.py`:
- Around line 106-124: _extract_function_output_content currently drops
dict-shaped list elements and can return non-string values; update it to mirror
_extract_text_from_content: when iterating a list, handle dict parts by checking
keys like "type" and fetching "text" or "content" fields (fall back to ""),
treat objects with attributes the same as today, and collect those text pieces;
for the non-list branch ensure you always return a string (e.g., return "" for
None or otherwise str(output)). Also ensure the function's behavior matches how
ToolResultSummary.content expects a str.
---
Outside diff comments:
In `@examples/vllm-rhelai.yaml`:
- Around line 4-14: The apis list in examples/vllm-rhelai.yaml omits tool_groups
while a toolgroup is registered (builtin::file_search); update the apis array to
include "tool_groups" so the toolgroup wiring loads properly—locate the apis
block and add "tool_groups" alongside the other entries (ensuring consistency
with other configs like examples/vertexai-run.yaml) so the registered
builtin::file_search is picked up at startup.
In `@examples/vllm-rhoai.yaml`:
- Around line 134-137: The shields entry uses a chat/generative model id in
provider_shield_id which only acts as an override and may not enable the Llama
Guard shield; change the provider_shield_id value under the shields block (with
shield_id: llama-guard and provider_id: llama-guard) to a real guard model
identifier (for example meta-llama/Llama-Guard-3-8B) so the guard model is
actually used for gating rather than a chat model override.
In `@examples/watsonx-run.yaml`:
- Around line 3-13: The apis list is missing the tool_groups entry while a
toolgroup (builtin::file_search) is registered; update the apis array in
examples/watsonx-run.yaml to include "tool_groups" alongside the other entries
so the registered toolgroup loads correctly and matches the other migrated
configs.
In `@tests/e2e/configs/run-rhelai.yaml`:
- Around line 146-149: The YAML uses a chat/generative model id for
provider_shield_id (provider_shield_id: openai/gpt-4o-mini) which won't exercise
the Llama Guard shield; update the shield entry (shield_id: llama-guard,
provider_id: llama-guard) to reference a real Llama Guard model identifier
(replace provider_shield_id with the guard model's ID used by your deployment)
and update any related test setup so the E2E flow sends inputs that should be
gated (e.g., a disallowed prompt) to confirm the shield actually blocks/flags
requests; verify shield behavior in the test assertions and adjust any
mock/provider mappings that reference provider_shield_id accordingly.
---
Duplicate comments:
In `@src/models/api/responses/successful/catalog.py`:
- Around line 215-220: Update the example API payload in
src/models/api/responses/successful/catalog.py by renaming the top-level key
"agents" to "responses" so the OpenAPI example matches the post-0.7.0
Agents→Responses migration; locate the JSON/example dict literal that currently
contains the "agents" array (with provider entries using "provider_id":
"builtin" and "provider_type": "inline::builtin") and change the key to
"responses" so the published /providers example advertises the correct API name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1e6b0ce9-c05d-40fa-987f-c4521ad45957
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (29)
- .konflux/requirements-build.txt
- .konflux/requirements.hashes.source.txt
- .konflux/requirements.hashes.wheel.txt
- docker-compose.yaml
- docs/openapi.json
- examples/azure-run.yaml
- examples/bedrock-run.yaml
- examples/run.yaml
- examples/vertexai-run.yaml
- examples/vllm-rhaiis.yaml
- examples/vllm-rhelai.yaml
- examples/vllm-rhoai.yaml
- examples/watsonx-run.yaml
- pyproject.toml
- run.yaml
- src/constants.py
- src/models/api/responses/successful/catalog.py
- src/models/common/responses/types.py
- src/utils/conversations.py
- tests/configuration/run.yaml
- tests/e2e-prow/rhoai/configs/run.yaml
- tests/e2e/configs/run-azure.yaml
- tests/e2e/configs/run-bedrock.yaml
- tests/e2e/configs/run-ci.yaml
- tests/e2e/configs/run-rhaiis.yaml
- tests/e2e/configs/run-rhelai.yaml
- tests/e2e/configs/run-vertexai.yaml
- tests/e2e/configs/run-watsonx.yaml
- tests/unit/app/endpoints/test_tools.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: check
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.13)
- GitHub Check: build-pr
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Llama Stack imports: Use `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (`str | int`), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use `async def` for I/O operations and external API calls
Use standard log levels with clear purposes: `debug()` for diagnostic info, `info()` for program execution, `warning()` for unexpected events, `error()` for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes must use ABC with `@abstractmethod` decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
- src/constants.py
- src/models/api/responses/successful/catalog.py
- src/models/common/responses/types.py
- src/utils/conversations.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use `@model_validator` and `@field_validator` for validation and complete type annotations for all attributes, avoiding `Any` type
Files:
- src/models/api/responses/successful/catalog.py
- src/models/common/responses/types.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest.mark.asyncio` marker for async tests
Files:
tests/unit/app/endpoints/test_tools.py
🧠 Learnings (4)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
- src/models/api/responses/successful/catalog.py
- src/models/common/responses/types.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
- src/models/api/responses/successful/catalog.py
- src/models/common/responses/types.py
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
- tests/configuration/run.yaml
- docker-compose.yaml
- examples/azure-run.yaml
- tests/e2e/configs/run-vertexai.yaml
- examples/bedrock-run.yaml
- tests/e2e/configs/run-watsonx.yaml
- tests/e2e/configs/run-azure.yaml
- tests/e2e/configs/run-rhaiis.yaml
- run.yaml
- examples/run.yaml
- tests/e2e/configs/run-ci.yaml
- tests/e2e/configs/run-bedrock.yaml
- examples/vertexai-run.yaml
- tests/e2e-prow/rhoai/configs/run.yaml
- examples/vllm-rhelai.yaml
- examples/watsonx-run.yaml
- examples/vllm-rhaiis.yaml
- examples/vllm-rhoai.yaml
- tests/e2e/configs/run-rhelai.yaml
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.
Applied to files:
run.yaml
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json
[error] 1-1: CI check failed: docs/openapi.json is out of date (diff between docs/openapi.json and /tmp/openapi-generated.json). Regenerate with: uv run python scripts/generate_openapi_schema.py docs/openapi.json
🔇 Additional comments (38)
src/constants.py (1)
10-10: LGTM!
tests/configuration/run.yaml (2)
46-47: LGTM! Also applies to: 89-93, 143-144
21-31: ⚡ Quick win
Confirm `inline::builtin` Responses provider schema for `persistence.agent_state`
In `tests/configuration/run.yaml` (lines 21-31), the `inline::builtin` responses provider config sets both `persistence.agent_state` (namespace/backend) and `persistence.responses` (table_name/backend). Llama Stack v0.7.0 docs for `inline::builtin` reference `persistence.responses.*`, but don't list `persistence.agent_state`; if that key isn't part of the provider's schema, config validation at startup would fail. Confirm whether `persistence.agent_state` is supported by the `inline::builtin` responses provider (or belongs under a different provider/store config).
docker-compose.yaml (1)
87-87: LGTM!
src/utils/conversations.py (1)
283-283: LGTM!
examples/azure-run.yaml (1)
5-5: LGTM! Also applies to: 53-55, 66-76, 102-103, 158-159
pyproject.toml (1)
31-33: ⚡ Quick win
Confirm `0.7.0` exists for all pinned packages (and flag compatibility as dependency-driven).
PyPI lists `0.7.0` for all three: `llama-stack`, `llama-stack-client`, and `llama-stack-api`, so the pins are not referencing a non-existent release. Dependency compatibility still depends on what `llama-stack-client==0.7.0` / `llama-stack-api==0.7.0` declare in their `requires_dist` for `llama-stack`.
src/models/common/responses/types.py (1)
39-41: ⚡ Quick win
Check `OpenAIResponseOutputMessageReasoningItem` is exported by `llama_stack_api.openai_responses` in the pinned `llama_stack_api==0.7.0`.
`src/models/common/responses/types.py` imports `OpenAIResponseOutputMessageReasoningItem as ReasoningItem` and adds it to `ResponseItem`; if that symbol isn't actually exported under that exact name in `0.7.0`, the module will raise an ImportError at load time.
.konflux/requirements-build.txt (1)
43-48: 🏗️ Heavy lift
Conflicting exact pins remain for the same distributions.
`hatchling==1.26.3` (Line 43) and `hatchling==1.29.0` (Line 48), plus `setuptools==78.1.1` (Line 153) and `setuptools==82.0.1` (Line 155), are mutually unsatisfiable `==` constraints that break deterministic resolution. Regenerate this autogenerated file so each distribution has a single pin.
tests/e2e/configs/run-watsonx.yaml (2)
62-69: ⚡ Quick win
`file-search` enabled with empty `vector_io`: same concern as run-vertexai.yaml.
Verify file search resolves a vector backend given `vector_io: []` while `default_provider_id: faiss` is set.
5-5: LGTM! Also applies to: 14-14, 70-80, 106-107, 150-152
tests/e2e/configs/run-azure.yaml (2)
62-69: ⚡ Quick win
`file-search` enabled with empty `vector_io`: same concern as run-vertexai.yaml.
5-5: LGTM! Also applies to: 14-14, 70-80, 106-107, 155-156
tests/e2e/configs/run-rhaiis.yaml (2)
62-69: ⚡ Quick win
`file-search` enabled with empty `vector_io`: same concern as run-vertexai.yaml.
5-5: LGTM! Also applies to: 14-14, 70-80, 106-107, 155-156
tests/e2e/configs/run-vertexai.yaml (2)
69-79: LGTM! Also applies to: 105-106
62-68: ⚡ Quick win
`file-search` enabled while `vector_io` is empty: verify runtime vector backend wiring
`run-vertexai.yaml` enables an inline `file-search` tool but declares `vector_io: []` (lines 62-68). If this e2e config is used standalone (or `vector_io: []` overrides/doesn't inherit the repo's base `run.yaml`), then the `file-search` implementation may have no registered vector backend even though the repo's `run.yaml` configures `vector_io` with FAISS (`provider_id: faiss`, `provider_type: inline::faiss`).
Confirm the e2e config composition/override behavior and that `vector_stores.default_provider_id: faiss` (line 153) resolves to an actual `vector_io` provider during the test run; otherwise `file-search` calls can fail at runtime.
examples/bedrock-run.yaml (2)
59-61: LGTM! Also applies to: 72-82, 108-109, 157-158
3-13: ⚡ Quick win
Add `tool_groups` to the `apis` list in `examples/bedrock-run.yaml`
The `apis` list excerpt (lines 3-13) omits `tool_groups`, but this config registers a `builtin::file_search` toolgroup later (around lines 157-158). If `tool_groups` isn't advertised, the `file_search` toolgroup may not be served.
🔧 Proposed fix
 - tool_runtime
+- tool_groups
 - vector_io
.konflux/requirements.hashes.wheel.txt (1)
1-2: LGTM!
run.yaml (2)
3-13: LGTM! Also applies to: 55-57, 101-102, 140-144
65-75: ⚡ Quick win
Validate `persistence.agent_state` support in `inline::builtin` Responses (0.7.0)
Docs/code for `inline::builtin` Responses persistence describe `persistence.responses.*` (e.g., `table_name`, `backend`, etc.) and persist via a `responses_store`; they don't mention an `agent_state`/`agents_state` key for the built-in Responses provider. `run.yaml` lines 65-75 includes both:
persistence:
  agent_state:
    namespace: agents_state
    backend: kv_default
  responses:
    table_name: agents_responses
    backend: sql_default
Confirm `persistence.agent_state` is actually accepted by the `inline::builtin` Responses provider; if not, it likely belongs under the inline Agents provider config instead of here.
examples/run.yaml (2)
9-20: LGTM!
62-64: LGTM! Also applies to: 75-85, 111-112, 164-165
tests/unit/app/endpoints/test_tools.py (2)
505-505: LGTM! Also applies to: 538-538, 613-613, 664-664
911-914: LGTM! Also applies to: 968-971, 991-991, 1010-1013, 1083-1084, 1108-1108, 1128-1131
tests/e2e/configs/run-ci.yaml (1)
4-15: LGTM! Also applies to: 56-58, 63-73, 99-100, 143-145
tests/e2e/configs/run-bedrock.yaml (1)
4-15: LGTM! Also applies to: 61-63, 68-78, 104-105, 153-154
examples/vertexai-run.yaml (2)
53-55: LGTM! Also applies to: 66-76, 102-103, 154-155
4-14: ⚡ Quick win
Refactor the concern: `tool_groups` in `apis` is likely not required for llama-stack 0.7.0
In llama-stack 0.7.0+, the ToolGroups API (`/v1/toolgroups`) is removed and tool groups are auto-registered based on configured `tool_runtime` providers (via provider `toolgroup_id`). So this config's omission of `tool_groups` from `apis` is likely expected, and adding it may be unnecessary/incorrect. The actionable check is whether the configured `tool_runtime` provider for `builtin::file_search` maps/registers the tool group (e.g., correct `toolgroup_id`).
tests/e2e-prow/rhoai/configs/run.yaml (1)
4-15: LGTM! Also applies to: 57-59, 70-80, 106-107, 162-163
examples/vllm-rhelai.yaml (1)
55-56: LGTM! Also applies to: 64-74, 100-101, 142-143
examples/watsonx-run.yaml (1)
61-63: LGTM! Also applies to: 74-84, 110-111, 166-167
examples/vllm-rhaiis.yaml (1)
5-5: LGTM! Also applies to: 55-56, 64-74, 100-101, 142-143
examples/vllm-rhoai.yaml (2)
55-56: LGTM! Also applies to: 64-74, 100-101, 142-143
4-14: ⚡ Quick win
Add `tool_groups` to `apis` when registering `registered_resources.tool_groups` (file_search)
The `apis` list in `examples/vllm-rhoai.yaml` shown here omits `tool_groups`, even though this config registers `registered_resources.tool_groups` for the file-search capability in the original report. Please confirm that llama-stack 0.7.0 requires `tool_groups` to be declared in `apis` for tool group registration to take effect, then align `examples/vllm-rhoai.yaml` with the corresponding e2e run config(s) that include `- tool_groups`.
tests/e2e/configs/run-rhelai.yaml (2)
63-65: LGTM! Also applies to: 70-80, 106-107, 155-156
4-15: ⚡ Quick win
Confirm `tool_groups` is a valid top-level `apis` entry in llama-stack 0.7.0 and make `run-rhelai.yaml` consistent with the sibling configs.
`tests/e2e/configs/run-rhelai.yaml` adds `- tool_groups` under `apis`, while the sibling `examples/vllm-rhoai.yaml` omits it. If `apis` only supports composite APIs (and ToolGroups are auto-registered from configured `tool_runtime` providers), then `tool_groups` here is either redundant or may fail config validation at startup, so reconcile the two configs based on the repo's config schema/validator expectations.
 "providers": {
     "agents": [
         {
-            "provider_id": "meta-reference",
-            "provider_type": "inline::meta-reference"
+            "provider_id": "builtin",
+            "provider_type": "inline::builtin"
Regenerate the OpenAPI spec to fix CI failure.
The pipeline failure indicates this file is out of sync with the generated output. This is an auto-generated file that should not be manually edited.
Run the regeneration command from the CI output:
uv run python scripts/generate_openapi_schema.py docs/openapi.json
The schema changes themselves (provider rename to builtin, new reasoning schemas, discriminator wiring) appear correct and align with the llama-stack 0.7.0 upgrade.
Also applies to: 14836-14840, 14891-14895, 15264-15268, 15706-15710, 15933-16042, 16145-16162, 17031-17035, 18074-18079, 18468-18473, 18482-18485, 19298-19301
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/openapi.json` around lines 1691 - 1695, The docs/openapi.json file is
out of sync with the generated OpenAPI schema; regenerate the spec by running
the provided generation script so CI passes: run the command shown in the CI
output (uv run python scripts/generate_openapi_schema.py docs/openapi.json) to
recreate docs/openapi.json from the source schema
(scripts/generate_openapi_schema.py) rather than editing the file manually, then
commit the regenerated file; repeat for other affected ranges if necessary.
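Before regenerating, it can help to know whether the committed spec differs from the generated one in content or only in formatting. The following is a minimal sketch, assuming both files are plain JSON; `specs_match` is a hypothetical helper that is not part of the repo, and the CI job itself may compare the files differently (for example byte by byte), so this does not replace the regeneration step.

```python
import json
import tempfile
from pathlib import Path


def specs_match(committed: Path, generated: Path) -> bool:
    """Compare two JSON documents by parsed content rather than by bytes."""
    with committed.open(encoding="utf-8") as left, generated.open(encoding="utf-8") as right:
        return json.load(left) == json.load(right)


# Demo with throwaway files standing in for docs/openapi.json and the generated copy.
with tempfile.TemporaryDirectory() as tmp:
    committed = Path(tmp) / "openapi.json"
    generated = Path(tmp) / "openapi-generated.json"

    # Same content, different key order and whitespace.
    committed.write_text('{"a": 1, "b": 2}', encoding="utf-8")
    generated.write_text('{"b": 2,\n "a": 1}', encoding="utf-8")
    same_content = specs_match(committed, generated)

    # A changed value is a real difference.
    generated.write_text('{"a": 1, "b": 3}', encoding="utf-8")
    differs = not specs_match(committed, generated)
```

If the two files match by content but CI still fails, the difference is likely serialization only, and regenerating with the project script remains the fix.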
def _extract_function_output_content(output: Any) -> str:
    """Extract string content from function call output.

    In llama-stack 0.7.0+, function output can be either a string or a list of content parts.
    This helper extracts text from both formats.

    Args:
        output: Function output (str or list of content parts)

    Returns:
        Extracted text content as string
    """
    if isinstance(output, list):
        text_parts = []
        for part in output:
            if hasattr(part, "type") and getattr(part, "type", None) == "text":
                text_parts.append(getattr(part, "text", ""))
        return " ".join(text_parts) if text_parts else ""
    return output
Handle dict content parts and guarantee a str return.
Two gaps versus the existing `_extract_text_from_content` helper:
- Dict parts dropped: list elements are only inspected via `getattr`, so dict-shaped content parts (common after `model_dump()` / raw API payloads) won't match and their text is silently lost. `_extract_text_from_content` already handles the dict case.
- Return type not guaranteed: the non-list branch does `return output`, so a `None`/non-string output is returned despite the `-> str` annotation, propagating a non-`str` into `ToolResultSummary.content`.
Proposed fix
     if isinstance(output, list):
-        text_parts = []
+        text_parts: list[str] = []
         for part in output:
-            if hasattr(part, "type") and getattr(part, "type", None) == "text":
-                text_parts.append(getattr(part, "text", ""))
-        return " ".join(text_parts) if text_parts else ""
-    return output
+            if isinstance(part, dict):
+                if part.get("type") == "text" and part.get("text"):
+                    text_parts.append(str(part["text"]))
+                continue
+            if getattr(part, "type", None) == "text":
+                text_value = getattr(part, "text", "")
+                if text_value:
+                    text_parts.append(text_value)
+        return " ".join(text_parts)
+    return output if isinstance(output, str) else ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/utils/conversations.py` around lines 106 - 124,
_extract_function_output_content currently drops dict-shaped list elements and
can return non-string values; update it to mirror _extract_text_from_content:
when iterating a list, handle dict parts by checking keys like "type" and
fetching "text" or "content" fields (fall back to ""), treat objects with
attributes the same as today, and collect those text pieces; for the non-list
branch ensure you always return a string (e.g., return "" for None or otherwise
str(output)). Also ensure the function's behavior matches how
ToolResultSummary.content expects a str.
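The behavior requested above can be sketched as a standalone function. This is an illustration under stated assumptions, not the repo's implementation: content parts are assumed to be either dicts or objects exposing `type` and `text`, the `"text"` type tag is taken from the snippet under review, and `TextPart` is a hypothetical stand-in for the real llama-stack content-part class. Non-string, non-list outputs map to `""` as in the proposed fix; `str(output)` would be the other option the prompt mentions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TextPart:
    """Hypothetical stand-in for an object-shaped content part."""

    type: str
    text: str


def extract_function_output_text(output: Any) -> str:
    """Return the text carried by a function-call output, always as a str."""
    if isinstance(output, str):
        return output
    if isinstance(output, list):
        text_parts: list[str] = []
        for part in output:
            # Dict-shaped parts (e.g. raw payloads) and attribute-shaped parts
            # are read through the same two fields.
            if isinstance(part, dict):
                part_type, text = part.get("type"), part.get("text")
            else:
                part_type, text = getattr(part, "type", None), getattr(part, "text", None)
            if part_type == "text" and text:
                text_parts.append(str(text))
        return " ".join(text_parts)
    # None or any other unexpected shape: keep the str contract.
    return ""
```

Called with a mixed list such as `[{"type": "text", "text": "a"}, TextPart("text", "b")]`, the sketch joins both pieces with a space, and `None` yields an empty string instead of leaking a non-`str` value.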
In llama-stack 0.7.0, conversations was split out into its own API. It must be explicitly added to the apis list for conversation storage (the openai_conversations table) to be created.
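A quick pre-flight check can catch a run config that forgot the new entry. The sketch below assumes the config has already been parsed into a dict with a top-level `apis` list, as in the run.yaml files touched by this PR; `find_missing_apis` is a hypothetical helper, and the example `apis` values are illustrative rather than copied from any specific file.

```python
def find_missing_apis(config: dict, required: set[str]) -> set[str]:
    """Return the required API names that the config's `apis` list does not declare."""
    declared = set(config.get("apis") or [])
    return required - declared


# Illustrative config that has not yet been updated for the split-out API.
example_config = {"apis": ["inference", "responses", "tool_runtime", "vector_io"]}
missing = find_missing_apis(example_config, {"conversations"})
```

An empty result means every required API is declared; anything else names the entries to add before starting the stack.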
@tisnik bumping llama-stack to 0.7.0 is turning out to be a behemoth of a task with ALL the breaking changes in just one minor release. I addressed 5 of them, and there are still more e2e errors. I could keep going, but the cost-benefit analysis doesn't justify going down this route for this bug. Created #1818 as a workaround.
Oh, also the Lightspeed-evaluation repo needs to be updated to 0.7.0 too, otherwise the E2E Tests for Lightspeed Evaluation won't pass, so that'd be another behemoth of a task in that repo.
Description
Upgrades llama-stack from 0.6.0 to 0.7.0 to fix a critical Vertex AI model name validation bug where queries fail with HTTP 500 when the optional `model` field is omitted.
See https://redhat.atlassian.net/browse/LCORE-2282 for more info.
The fix, ogx-ai/ogx#5169, is included in the 0.7.0 release.
Breaking changes addressed for llama-stack 0.7.0:
- `responses` API
- `inline::meta-reference` to `inline::builtin`
- `inline::rag-runtime` to `inline::file-search` and toolgroups from `builtin::rag` to `builtin::file_search`
- `tool_groups` is now a separate API that must be explicitly enabled
- `conversations` is now a separate API that must be explicitly enabled
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit