
LCORE-2282: Upgrade llama-stack to 0.7.0 to fix Vertex AI model name mismatch #1816

Draft
anik120 wants to merge 8 commits into lightspeed-core:main from anik120:bump-llama-stack

Conversation

@anik120
Contributor

anik120 commented May 29, 2026

Description

Upgrades llama-stack from 0.6.0 to 0.7.0 to fix a critical Vertex AI model name validation bug: queries fail with HTTP 500 when the optional model field is omitted.

See https://redhat.atlassian.net/browse/LCORE-2282 for details.

The upstream fix, ogx-ai/ogx#5169, is included in the 0.7.0 release.

Also note: this PR addresses the following breaking changes in llama-stack 0.7.0:

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Chores
    • Regenerated dependency lockfiles and bumped several pins; bumped llama-stack packages to 0.7.0.
  • New Features / Configuration
    • Switched default API surface from "agents" to "responses" and replaced RAG with a file-search tool across examples and runtime configs; updated registered toolgroups.
  • Documentation
    • Expanded API schema with new "reasoning" response types and updated schema descriptions.
  • Maintenance
    • Increased max supported llama-stack version constant.
  • Tests
    • Updated tests to expect the builtin file-search toolgroup.


@coderabbitai
Contributor

coderabbitai Bot commented May 29, 2026

Warning

Review limit reached

@anik120, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 10 minutes and 34 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a6215acb-03d3-4bc1-ba7a-4c457cbfdf86

📥 Commits

Reviewing files that changed from the base of the PR and between 1d69e4a and 0f9edf5.

📒 Files selected for processing (11)
  • examples/run.yaml
  • run.yaml
  • tests/configuration/run.yaml
  • tests/e2e-prow/rhoai/configs/run.yaml
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configs/run-bedrock.yaml
  • tests/e2e/configs/run-ci.yaml
  • tests/e2e/configs/run-rhaiis.yaml
  • tests/e2e/configs/run-rhelai.yaml
  • tests/e2e/configs/run-vertexai.yaml
  • tests/e2e/configs/run-watsonx.yaml

Walkthrough

This PR bumps llama-stack to 0.7.0, regenerates the build and wheel lockfiles with updated pins, converts many example/test/run configs from agents/RAG → responses/file-search (provider wiring meta-reference → builtin), adds OpenAPI reasoning schemas, raises MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION in src/constants.py, extends the response types with ReasoningItem, normalizes function-call outputs, and aligns the unit tests and docker-compose.
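To make the agents → responses and meta-reference → builtin renames concrete, here is a minimal sketch that applies them to an already-parsed run.yaml. It assumes the config is a dict with an apis list and a providers mapping from API name to a list of provider_id/provider_type entries. migrate_run_config and the rename tables are hypothetical, not code from this PR, and the RAG → file-search tool swap is not covered.

```python
import copy
from typing import Any

# Rename tables inferred from this PR's walkthrough (assumed config layout).
API_RENAMES: dict[str, str] = {"agents": "responses"}
PROVIDER_ID_RENAMES: dict[str, str] = {"meta-reference": "builtin"}
PROVIDER_TYPE_RENAMES: dict[str, str] = {
    "inline::meta-reference": "inline::builtin",
}


def migrate_run_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return a migrated deep copy of a parsed run.yaml; the input is untouched."""
    migrated = copy.deepcopy(config)

    # Rename entries in the top-level list of enabled APIs.
    migrated["apis"] = [API_RENAMES.get(api, api) for api in migrated.get("apis", [])]

    # Rename provider buckets and rewrite provider_id/provider_type values.
    providers: dict[str, list[dict[str, Any]]] = {}
    for api, entries in migrated.get("providers", {}).items():
        for entry in entries:
            if "provider_id" in entry:
                entry["provider_id"] = PROVIDER_ID_RENAMES.get(
                    entry["provider_id"], entry["provider_id"]
                )
            if "provider_type" in entry:
                entry["provider_type"] = PROVIDER_TYPE_RENAMES.get(
                    entry["provider_type"], entry["provider_type"]
                )
        providers[API_RENAMES.get(api, api)] = entries
    migrated["providers"] = providers
    return migrated
```

Returning a deep copy instead of editing the argument follows the repository guideline to avoid in-place parameter modification.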

Changes

Dependencies and Examples Update

  • Runtime dependency version updates (pyproject.toml): llama-stack, llama-stack-client, and llama-stack-api bumped from 0.6.0 → 0.7.0.
  • Constants bump (src/constants.py): MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION updated to 0.7.0.
  • Build dependencies regeneration (.konflux/requirements-build.txt): Build lockfile regenerated with many pin updates and added transitive packages; unsafe packages section expanded.
  • Wheel hashes and wheel lockfile updates (.konflux/requirements.hashes.wheel.txt): Wheel hash file updated (compile target, packaging bumped to 26.2); added/removed several pins and updated hashes.
  • APIs & tooling config changes (examples/*, run.yaml, tests/*/configs/*, tests/configuration/run.yaml): Multiple example and test run YAMLs switch apis: agents → responses, replace the RAG tool (rag-runtime) with file-search, and change provider wiring from meta-reference/inline::meta-reference to builtin/inline::builtin.
  • OpenAPI reasoning schemas (docs/openapi.json): New reasoning schemas and discriminator wiring added; several schema titles/descriptions updated and provider example entries switched to builtin.
  • API response example update & types (src/models/api/responses/successful/catalog.py, src/models/common/responses/types.py): ProvidersListResponse example changed to use provider_id: "builtin" / provider_type: "inline::builtin"; ResponseItem union extended to include ReasoningItem.
  • Conversation helper (src/utils/conversations.py): Added _extract_function_output_content and used it to normalize function_call_output tool result summaries.
  • Unit tests aligned (tests/unit/app/endpoints/test_tools.py): Unit tests updated to expect builtin::file_search / file-search toolgroup/provider identifiers and adjusted assertions/docstrings.
  • Docker compose (docker-compose.yaml): Added ./run.yaml bind mount into the lightspeed-stack container at /app-root/run.yaml.
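The MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION constant implies an upper-bound check on the llama-stack version in use. As a rough illustration only, the sketch below checks a version string against inclusive bounds. Only the 0.7.0 maximum comes from this PR; the minimum value and the helper names are placeholders, and the parser handles plain dotted releases only (a production check would more likely use packaging.version).

```python
from typing import Final

# Hypothetical lower bound; only the maximum (0.7.0) comes from this PR.
MINIMAL_SUPPORTED_LLAMA_STACK_VERSION: Final[str] = "0.6.0"
MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION: Final[str] = "0.7.0"


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as "0.7.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_supported_version(version: str) -> bool:
    """Return True if version lies within the inclusive supported range."""
    return (
        parse_version(MINIMAL_SUPPORTED_LLAMA_STACK_VERSION)
        <= parse_version(version)
        <= parse_version(MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION)
    )
```

Comparing integer tuples rather than raw strings keeps "0.10.0" ordered after "0.9.0".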

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • tisnik
  • radofuchs
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: upgrading llama-stack to 0.7.0 to fix a Vertex AI model name mismatch issue (LCORE-2282).
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

151-157: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale provider-type comments to builtin.

These comments still reference inline::meta-reference, which is outdated for llama-stack 0.7.0 and can mislead future dependency/provider checks.

Also applies to: 176-176
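A quick way to confirm the comment cleanup is complete is to scan the file for the old token. The helper below is a hypothetical sketch that returns the 1-based line numbers containing inline::meta-reference; it can be run against pyproject.toml, for example via pathlib.Path("pyproject.toml").read_text(), before and after the edit.

```python
# Provider-type token that the review flags as stale for llama-stack 0.7.0.
STALE_TOKEN = "inline::meta-reference"


def find_stale_references(text: str, token: str = STALE_TOKEN) -> list[int]:
    """Return the 1-based line numbers of text that still contain token."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if token in line
    ]
```

An empty list after the edit means no stale references remain.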

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 151 - 157, Update the stale provider-type
comments that currently read "inline::meta-reference" to the new provider tag
"builtin" in pyproject.toml (the commented dependency lines and the surrounding
provider comment blocks); locate the comment instances around the API agents and
API eval sections (the lines showing "inline::meta-reference") and replace those
tokens with "builtin" including the additional occurrence noted near line 176 so
the provider/type metadata matches llama-stack 0.7.0 expectations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: db468d63-6687-4a4c-850f-d3bf3b58f40f

📥 Commits

Reviewing files that changed from the base of the PR and between d531855 and 733ab05.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • .konflux/requirements-build.txt
  • .konflux/requirements.hashes.source.txt
  • .konflux/requirements.hashes.wheel.txt
  • pyproject.toml
  • src/models/api/responses/successful/catalog.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-pr
  • GitHub Check: Pylinter
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/models/api/responses/successful/catalog.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/api/responses/successful/catalog.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/api/responses/successful/catalog.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/api/responses/successful/catalog.py
🔇 Additional comments (3)
.konflux/requirements.hashes.wheel.txt (1)

2-2: LGTM!

Also applies to: 89-90, 109-110, 128-129, 199-200

.konflux/requirements-build.txt (1)

2-5: ⚡ Quick win

Align Python version used by pybuild-deps compile with the Konflux 3.12 target.

.konflux/requirements-build.txt is labeled “autogenerated by pip-compile with Python 3.13”, but scripts/konflux_requirements.sh pins --python-version 3.12 for the uv pip compile steps and then runs uv run pybuild-deps compile without any Python-version/interpreter pin. This can lead to resolver/marker skew across build artifacts—either explicitly run pybuild-deps compile under Python 3.12 (via uv run/config or a supported flag) or regenerate the file so its header and resolved markers match 3.12.
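One way to catch this skew in CI is to compare the Python version recorded in the lockfile header with the 3.12 target. This is a minimal sketch: it assumes the header contains the phrase quoted above ("autogenerated by pip-compile with Python X.Y"), and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches the header phrase quoted in the review; the rest of the header is not parsed.
HEADER_PATTERN = re.compile(r"autogenerated by pip-compile with Python (\d+\.\d+)")


def header_python_version(text: str) -> str | None:
    """Return the X.Y Python version named in the lockfile header, if any."""
    match = HEADER_PATTERN.search(text)
    return match.group(1) if match else None


def header_matches_target(text: str, target: str = "3.12") -> bool:
    """Return True only if the header names exactly the target Python version."""
    return header_python_version(text) == target
```

A missing header is treated as a mismatch, so a regenerated file without the expected comment also fails the check.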

src/models/api/responses/successful/catalog.py (1)

217-218: LGTM!

Comment on lines +153 to 155
setuptools==78.1.1
# via oci
setuptools==82.0.1

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify duplicate package pins with conflicting versions in build lockfile
python - <<'PY'
import re
from collections import defaultdict

path = ".konflux/requirements-build.txt"
pins = defaultdict(set)

with open(path, "r", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        m = re.match(r'^([A-Za-z0-9_.-]+)==([^\s]+)\s*$', line.strip())
        if m:
            pins[m.group(1).lower()].add((m.group(2), i))

conflicts = {k: v for k, v in pins.items() if len({ver for ver, _ in v}) > 1}
if not conflicts:
    print("No conflicting duplicate pins found.")
else:
    print("Conflicting duplicate pins:")
    for pkg, versions in sorted(conflicts.items()):
        details = ", ".join(f"{ver} (Line {ln})" for ver, ln in sorted(versions, key=lambda x: x[1]))
        print(f"  {pkg}: {details}")
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 203


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file=".konflux/requirements-build.txt"

echo "=== File head (first 80 lines) ==="
nl -ba "$file" | head -n 80

echo
echo "=== Lines 120-190 ==="
nl -ba "$file" | sed -n '120,190p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 172


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file=".konflux/requirements-build.txt"

echo "=== Lines 1-80 ==="
cat -n "$file" | sed -n '1,80p'

echo
echo "=== Lines 140-175 ==="
cat -n "$file" | sed -n '140,175p'

echo
echo "=== Lines mentioning setuptools/hatchling (all occurrences) ==="
rg -n "^(setuptools|hatchling)==|setuptools==|hatchling==" "$file" || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 3577


Fix conflicting exact pins in .konflux/requirements-build.txt (autogenerated)

The file contains incompatible == pins for the same packages:

  • hatchling==1.26.3 (line 43) and hatchling==1.29.0 (line 48)
  • setuptools==78.1.1 (line 153) and setuptools==82.0.1 (line 155, “unsafe” section)

With two different == requirements for the same distribution, the requirements input is inherently conflicting (pip can’t satisfy both exact versions), breaking deterministic builds.

Also regenerate the file with the actual build Python version: the header (lines 2-6) states it was generated with Python 3.13.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.konflux/requirements-build.txt around lines 153 - 155, Resolve the
conflicting exact pins by removing or consolidating duplicate requirements for
the same package (specifically reconcile hatchling==1.26.3 vs hatchling==1.29.0
and setuptools==78.1.1 vs setuptools==82.0.1): pick the intended single version
(or replace with a compatible range like >= and < if truly needed) and update
the autogenerated requirements file so only one entry per distribution remains;
after editing, regenerate the requirements file with the correct build Python
version (the header currently says Python 3.13) to ensure compatibility and
deterministic builds.

@anik120 anik120 changed the title LCORE-2282: Upgrade llama-stack to 0.7.0 to fix Vertex AI model name mismatch WIP: LCORE-2282: Upgrade llama-stack to 0.7.0 to fix Vertex AI model name mismatch May 29, 2026
@anik120 anik120 changed the title WIP: LCORE-2282: Upgrade llama-stack to 0.7.0 to fix Vertex AI model name mismatch LCORE-2282: Upgrade llama-stack to 0.7.0 to fix Vertex AI model name mismatch May 29, 2026
@anik120 anik120 force-pushed the bump-llama-stack branch 3 times, most recently from 3131604 to afb3a44 on May 29, 2026 18:15
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/azure-run.yaml (1)

145-148: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a real Llama Guard model for provider_shield_id.

provider_shield_id: openai/gpt-4o-mini points at a chat model, not the guard model. That makes this shield look configured without actually validating requests through Llama Guard. Please switch this to the actual guard model identifier and mirror that fix in the sibling example configs touched by this PR.

Based on learnings: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set provider_shield_id to the guard model identifier and do not use a chat/generative model id.
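The sibling configs can be audited mechanically. The sketch below takes a parsed shields list (assumed to be dicts with shield_id and provider_shield_id keys) and flags guard-named shields whose provider_shield_id does not mention "guard". The substring test is a rough heuristic, not a llama-stack rule, and the function name is hypothetical, so hits are candidates for manual review rather than confirmed errors.

```python
from typing import Any


def find_suspect_guard_shields(shields: list[dict[str, Any]]) -> list[str]:
    """Return shield_ids that look like guard shields but point at a non-guard model."""
    suspects: list[str] = []
    for shield in shields:
        shield_id = str(shield.get("shield_id", ""))
        provider_shield_id = str(shield.get("provider_shield_id", ""))
        # Heuristic: a guard shield should reference a guard model id.
        if "guard" in shield_id.lower() and "guard" not in provider_shield_id.lower():
            suspects.append(shield_id)
    return suspects
```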

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/azure-run.yaml` around lines 145 - 148, The shield entry uses a chat
model id instead of the Llama Guard model id: update the shield with shield_id
"llama-guard" by replacing provider_shield_id "openai/gpt-4o-mini" with the
actual Llama Guard model identifier (use the guard model id required by your
provider), and apply the identical change to the sibling example configs touched
in this PR so every Llama Guard shield uses the guard model id rather than a
chat/generative model id.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/models/api/responses/successful/catalog.py`:
- Around line 217-218: The example API bucket key currently uses "agents" and
needs to be renamed to "responses" so the published OpenAPI example matches the
new config surface; update the example object key (the one that contains entries
like "provider_id": "builtin" and "provider_type": "inline::builtin") from
"agents" to "responses" in the catalog example so /providers displays the
correct name after migration.

In `@tests/configuration/run.yaml`:
- Around line 20-22: The config defines providers.responses
(provider_id/provider_type) but does not enable the responses API; update the
YAML to expose the responses API by enabling the responses feature flag
alongside the existing agents flag (i.e., add a top-level responses: true entry
next to agents: true) so providers.responses is actually active for the
migration; ensure the provider block (provider_id: builtin / provider_type:
inline::builtin) remains unchanged.
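The underlying problem, a provider bucket with no matching entry in the list of enabled APIs, can be detected with a small check. This sketch assumes the parsed config exposes an apis list and a providers mapping keyed by API name; the function name is hypothetical, and it only reports the mismatch rather than implementing the flag-style fix proposed above.

```python
from typing import Any


def find_inactive_provider_apis(config: dict[str, Any]) -> list[str]:
    """Return provider bucket names that are absent from the apis list, sorted."""
    enabled = set(config.get("apis", []))
    return sorted(api for api in config.get("providers", {}) if api not in enabled)
```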


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4bc096b8-b60c-41a2-bdeb-3cd336012807

📥 Commits

Reviewing files that changed from the base of the PR and between 733ab05 and afb3a44.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (25)
  • .konflux/requirements-build.txt
  • .konflux/requirements.hashes.source.txt
  • .konflux/requirements.hashes.wheel.txt
  • examples/azure-run.yaml
  • examples/bedrock-run.yaml
  • examples/run.yaml
  • examples/vertexai-run.yaml
  • examples/vllm-rhaiis.yaml
  • examples/vllm-rhelai.yaml
  • examples/vllm-rhoai.yaml
  • examples/watsonx-run.yaml
  • pyproject.toml
  • run.yaml
  • src/constants.py
  • src/models/api/responses/successful/catalog.py
  • tests/configuration/run.yaml
  • tests/e2e-prow/rhoai/configs/run.yaml
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configs/run-bedrock.yaml
  • tests/e2e/configs/run-ci.yaml
  • tests/e2e/configs/run-rhaiis.yaml
  • tests/e2e/configs/run-rhelai.yaml
  • tests/e2e/configs/run-vertexai.yaml
  • tests/e2e/configs/run-watsonx.yaml
  • tests/unit/app/endpoints/test_tools.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/constants.py
  • src/models/api/responses/successful/catalog.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/api/responses/successful/catalog.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/app/endpoints/test_tools.py
🧠 Learnings (4)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/api/responses/successful/catalog.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/api/responses/successful/catalog.py
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • tests/e2e/configs/run-azure.yaml
  • tests/configuration/run.yaml
  • examples/vertexai-run.yaml
  • tests/e2e/configs/run-bedrock.yaml
  • tests/e2e/configs/run-ci.yaml
  • examples/azure-run.yaml
  • run.yaml
  • examples/run.yaml
  • examples/bedrock-run.yaml
  • tests/e2e/configs/run-vertexai.yaml
  • tests/e2e/configs/run-rhaiis.yaml
  • tests/e2e/configs/run-rhelai.yaml
  • examples/vllm-rhelai.yaml
  • examples/watsonx-run.yaml
  • examples/vllm-rhoai.yaml
  • tests/e2e/configs/run-watsonx.yaml
  • examples/vllm-rhaiis.yaml
  • tests/e2e-prow/rhoai/configs/run.yaml
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.

Applied to files:

  • run.yaml
🔇 Additional comments (19)
.konflux/requirements-build.txt (1)

43-48: Duplicate exact pins remain unresolved.

hatchling and setuptools are still pinned to two conflicting exact versions in this generated file (Line 43/Line 48 and Line 153/Line 155), which matches the previously raised finding.

Also applies to: 153-155

pyproject.toml (1)

31-33: LGTM!

src/constants.py (1)

10-10: LGTM!

.konflux/requirements.hashes.wheel.txt (1)

2-2: LGTM!

Also applies to: 89-90, 109-110, 128-129, 199-200

examples/azure-run.yaml (1)

5-5: LGTM!

Also applies to: 53-55, 66-66, 75-76, 102-103, 158-159

examples/bedrock-run.yaml (1)

4-4: LGTM!

Also applies to: 59-61, 72-72, 81-82, 108-109, 157-158

examples/run.yaml (1)

10-10: LGTM!

Also applies to: 61-63, 74-74, 83-84, 110-111, 163-164

examples/vertexai-run.yaml (1)

5-5: LGTM!

Also applies to: 53-55, 66-66, 75-76, 102-103, 154-155

examples/vllm-rhaiis.yaml (1)

5-5: LGTM!

Also applies to: 55-56, 64-64, 73-74, 100-101, 142-143

examples/vllm-rhelai.yaml (1)

5-5: LGTM!

Also applies to: 55-56, 64-64, 73-74, 100-101, 142-143

examples/vllm-rhoai.yaml (1)

5-5: LGTM!

Also applies to: 55-56, 64-64, 73-74, 100-101, 142-143

examples/watsonx-run.yaml (1)

4-4: LGTM!

Also applies to: 61-63, 74-74, 83-84, 110-111, 166-167

run.yaml (1)

4-4: LGTM!

Also applies to: 54-56, 64-64, 73-74, 100-101, 139-142

tests/e2e-prow/rhoai/configs/run.yaml (1)

5-5: LGTM!

Also applies to: 56-58, 69-69, 78-79, 105-106, 161-162

tests/e2e/configs/run-azure.yaml (1)

5-5: LGTM!

Also applies to: 62-64, 69-69, 78-79, 105-106, 154-155

tests/e2e/configs/run-bedrock.yaml (1)

5-5: LGTM!

Also applies to: 60-62, 67-67, 76-77, 103-104, 152-153

tests/e2e/configs/run-ci.yaml (1)

5-5: LGTM!

Also applies to: 55-57, 62-62, 71-72, 98-99, 143-144

tests/e2e/configs/run-rhaiis.yaml (1)

5-5: LGTM!

Also applies to: 62-64, 69-69, 78-79, 105-106, 154-155

tests/e2e/configs/run-rhelai.yaml (1)

5-5: LGTM!

Also applies to: 62-64, 69-69, 78-79, 105-106, 154-155

Comment on lines +217 to +218
"provider_id": "builtin",
"provider_type": "inline::builtin",

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Rename the example API bucket from agents to responses.

This example still nests the builtin provider under "agents", so the /providers OpenAPI example will advertise the old API name after the 0.7.0 migration. Update the example key at Line 215 to "responses" so the published schema matches the new config surface.

Suggested fix
-                        "agents": [
+                        "responses": [
                             {
                                 "provider_id": "builtin",
                                 "provider_type": "inline::builtin",
                             },
                         ],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/api/responses/successful/catalog.py` around lines 217 - 218, The
example API bucket key currently uses "agents" and needs to be renamed to
"responses" so the published OpenAPI example matches the new config surface;
update the example object key (the one that contains entries like "provider_id":
"builtin" and "provider_type": "inline::builtin") from "agents" to "responses"
in the catalog example so /providers displays the correct name after migration.

Comment thread tests/configuration/run.yaml
anik120 force-pushed the bump-llama-stack branch 3 times, most recently from fdaef5d to f034340 (May 29, 2026 18:51)
…mismatch

Upgrades llama-stack from 0.6.0 to 0.7.0 to fix critical Vertex AI model name validation bug
where queries fail with HTTP 500 when the optional `model` field is omitted.

See: https://redhat.atlassian.net/browse/LCORE-2282 for more info.

The fix ogx-ai/ogx#5169 is included in
the [0.7.0](https://github.com/ogx-ai/ogx/releases/tag/v0.7.0) release.

Breaking Changes Addressed in llama-stack 0.7.0:
- **Agents API → Responses API** (#5195): Updated all configuration files to use the new `responses` API
- **Provider rename: meta-reference → builtin** (#5131): Updated all provider configurations from `inline::meta-reference` to `inline::builtin`
- **RAG refactor: rag-runtime → file-search** (#5186): Updated tool_runtime providers from `inline::rag-runtime` to `inline::file-search` and toolgroups from `builtin::rag` to `builtin::file_search`
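A minimal sketch of these three renames applied to a run.yaml that has already been parsed into a Python dict (for example with a YAML loader). The key layout (`apis`, `providers.<api>[].provider_type`, `registered_resources.tool_groups[].toolgroup_id`) is assumed from the configs discussed in this PR, and `migrate_run_config` is a hypothetical helper, not part of llama-stack. Provider IDs are user-chosen labels, so the sketch leaves them alone; rename them by hand if you want `meta-reference` to read `builtin`.

```python
from copy import deepcopy
from typing import Any

# Renames taken from the 0.7.0 breaking-changes list above.
PROVIDER_TYPE_RENAMES: dict[str, str] = {
    "inline::meta-reference": "inline::builtin",
    "inline::rag-runtime": "inline::file-search",
}
TOOLGROUP_RENAMES: dict[str, str] = {"builtin::rag": "builtin::file_search"}


def migrate_run_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return a migrated deep copy of a parsed run.yaml; the input is not modified."""
    migrated = deepcopy(config)

    # Agents API -> Responses API, in the apis list and in the providers map.
    if "apis" in migrated:
        migrated["apis"] = [
            "responses" if api == "agents" else api for api in migrated["apis"]
        ]
    providers = migrated.get("providers") or {}
    if "agents" in providers:
        # Merge into an existing "responses" bucket instead of overwriting it.
        providers.setdefault("responses", []).extend(providers.pop("agents") or [])

    # Provider type renames apply to every API bucket.
    for entries in providers.values():
        for entry in entries or []:
            new_type = PROVIDER_TYPE_RENAMES.get(entry.get("provider_type"))
            if new_type is not None:
                entry["provider_type"] = new_type

    # builtin::rag -> builtin::file_search for registered toolgroups.
    resources = migrated.get("registered_resources") or {}
    for group in resources.get("tool_groups") or []:
        new_id = TOOLGROUP_RENAMES.get(group.get("toolgroup_id"))
        if new_id is not None:
            group["toolgroup_id"] = new_id
    return migrated
```

Returning a copy rather than editing in place makes it easy to diff the old and new config before writing anything back.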
anik120 force-pushed the bump-llama-stack branch from f034340 to b676a16 (May 29, 2026 18:54)
anik120 added 5 commits May 29, 2026 15:07
The lightspeed-stack service needs access to run.yaml which contains
the llama-stack configuration (APIs, providers, etc.). Without this
mount, the container uses a stale copy baked into the image, causing
failures when testing breaking changes like agents → responses API.
anik120 force-pushed the bump-llama-stack branch from b676a16 to 4209b21 (May 29, 2026 19:08)
In llama-stack 0.7.0, tool_groups was split out from tool_runtime
into its own API. It must be explicitly added to the apis list
for toolgroup registration/listing to work in library mode.
coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
examples/vllm-rhelai.yaml (1)

4-14: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

tool_groups API missing from apis while a toolgroup is registered.

Same issue as flagged in examples/vertexai-run.yaml: this config registers builtin::file_search (Lines 141-143) but omits tool_groups from the apis list, unlike the other migrated configs in this PR. Add tool_groups to keep wiring consistent and ensure the toolgroup is loaded.

🔧 Proposed fix
 - tool_runtime
+- tool_groups
 - vector_io
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/vllm-rhelai.yaml` around lines 4 - 14, The apis list in
examples/vllm-rhelai.yaml omits tool_groups while a toolgroup is registered
(builtin::file_search); update the apis array to include "tool_groups" so the
toolgroup wiring loads properly—locate the apis block and add "tool_groups"
alongside the other entries (ensuring consistency with other configs like
examples/vertexai-run.yaml) so the registered builtin::file_search is picked up
at startup.
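Because this omission recurs across several example configs, a small consistency check may help. The sketch below is hypothetical (`check_tool_groups_api` is not an existing helper) and only encodes the rule this PR follows: a config that registers toolgroups also lists `tool_groups` (and `tool_runtime`) in `apis`. A later review comment questions whether 0.7.0 actually requires `tool_groups` there, so treat this as a consistency lint, not a schema check.

```python
from typing import Any


def check_tool_groups_api(config: dict[str, Any]) -> list[str]:
    """Return consistency problems in a parsed run.yaml; an empty list means none found."""
    apis = set(config.get("apis") or [])
    registered = (config.get("registered_resources") or {}).get("tool_groups") or []
    problems: list[str] = []
    if registered and "tool_groups" not in apis:
        ids = ", ".join(str(group.get("toolgroup_id")) for group in registered)
        problems.append(f"toolgroups registered ({ids}) but 'tool_groups' missing from apis")
    if registered and "tool_runtime" not in apis:
        problems.append("toolgroups registered but 'tool_runtime' missing from apis")
    return problems
```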
examples/watsonx-run.yaml (1)

3-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

tool_groups API missing from apis while a toolgroup is registered.

Same issue as flagged in examples/vertexai-run.yaml: this config registers builtin::file_search (Lines 165-167) but omits tool_groups from the apis list, unlike the other migrated configs in this PR. Add tool_groups for consistency and to ensure the toolgroup loads.

🔧 Proposed fix
 - tool_runtime
+- tool_groups
 - vector_io
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/watsonx-run.yaml` around lines 3 - 13, The apis list is missing the
tool_groups entry while a toolgroup (builtin::file_search) is registered; update
the apis array in examples/watsonx-run.yaml to include "tool_groups" alongside
the other entries so the registered toolgroup loads correctly and matches the
other migrated configs.
tests/e2e/configs/run-rhelai.yaml (1)

146-149: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

provider_shield_id uses a chat model id rather than a guard model.

As in the other config, provider_shield_id: openai/gpt-4o-mini is a generative chat model; this does not confirm the Llama Guard shield is gating queries. Use a real guard model identifier and ensure the e2e coverage exercises an actual Llama Guard model so shield effectiveness is verified. Line is outside the diff—flagging for confirmation.

Based on learnings: in Llama-stack config YAMLs, set provider_shield_id to the guard model identifier and not a chat/generative model id, and ensure E2E tests exercise a real Llama Guard model to verify the shield is effective.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/configs/run-rhelai.yaml` around lines 146 - 149, The YAML uses a
chat/generative model id for provider_shield_id (provider_shield_id:
openai/gpt-4o-mini) which won't exercise the Llama Guard shield; update the
shield entry (shield_id: llama-guard, provider_id: llama-guard) to reference a
real Llama Guard model identifier (replace provider_shield_id with the guard
model's ID used by your deployment) and update any related test setup so the E2E
flow sends inputs that should be gated (e.g., a disallowed prompt) to confirm
the shield actually blocks/flags requests; verify shield behavior in the test
assertions and adjust any mock/provider mappings that reference
provider_shield_id accordingly.
examples/vllm-rhoai.yaml (1)

134-137: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

provider_shield_id points at a chat model, not a guard model.

provider_shield_id: openai/gpt-4o-mini is a generative chat model id; using it here means only an override landed and does not guarantee the Llama Guard shield is actually gating queries. Prefer a real guard model identifier (e.g. meta-llama/Llama-Guard-3-8B). This line is outside the diff, so flagging for confirmation rather than as a regression of this PR.

Based on learnings: in Llama-stack config YAMLs, set provider_shield_id to the guard model identifier and not a chat/generative model id, since a chat-model id indicates only an override landed and does not mean the safety shield is effective.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/vllm-rhoai.yaml` around lines 134 - 137, The shields entry uses a
chat/generative model id in provider_shield_id which only acts as an override
and may not enable the Llama Guard shield; change the provider_shield_id value
under the shields block (with shield_id: llama-guard and provider_id:
llama-guard) to a real guard model identifier (for example
meta-llama/Llama-Guard-3-8B) so the guard model is actually used for gating
rather than a chat model override.
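A rough lint for this pattern, assuming the shield entries have already been pulled out of the parsed YAML as a list of dicts. The "contains `llama-guard`" test is a heuristic chosen for illustration, not a llama-stack rule, and `find_non_guard_shields` is a hypothetical name. It catches ids like `openai/gpt-4o-mini` but cannot prove the shield actually gates queries; that still needs an E2E test against a real guard model.

```python
from typing import Any


def find_non_guard_shields(shields: list[dict[str, Any]]) -> list[str]:
    """Return shield_ids whose provider_shield_id does not look like a Llama Guard model.

    Heuristic only: a guard model id is assumed to contain "llama-guard"
    (case-insensitive), e.g. meta-llama/Llama-Guard-3-8B.
    """
    flagged: list[str] = []
    for shield in shields:
        model_id = str(shield.get("provider_shield_id", ""))
        if "llama-guard" not in model_id.lower():
            flagged.append(str(shield.get("shield_id")))
    return flagged
```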
♻️ Duplicate comments (1)
src/models/api/responses/successful/catalog.py (1)

215-220: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Rename the example API bucket agents → responses.

The provider fields were updated to builtin/inline::builtin, but the surrounding key at Line 215 is still "agents". After the 0.7.0 Agents→Responses migration the published /providers OpenAPI example will advertise the stale API name.

Suggested fix
-                        "agents": [
+                        "responses": [
                             {
                                 "provider_id": "builtin",
                                 "provider_type": "inline::builtin",
                             },
                         ],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/api/responses/successful/catalog.py` around lines 215 - 220,
Update the example API payload in src/models/api/responses/successful/catalog.py
by renaming the top-level key "agents" to "responses" so the OpenAPI example
matches the post-0.7.0 Agents→Responses migration; locate the JSON/example dict
literal that currently contains the "agents" array (with provider entries using
"provider_id": "builtin" and "provider_type": "inline::builtin") and change the
key to "responses" so the published /providers example advertises the correct
API name.
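For reference, a sketch of what the corrected example fragment could look like, plus a tiny check that could back a unit test so the stale key does not creep back in. `PROVIDERS_EXAMPLE`, `STALE_API_KEYS`, and `stale_api_keys` are illustrative names, not the actual contents of catalog.py.

```python
from typing import Any

# Hypothetical excerpt of the /providers example after the rename: only the
# bucket key changes, the provider fields already read builtin / inline::builtin.
PROVIDERS_EXAMPLE: dict[str, list[dict[str, Any]]] = {
    "responses": [
        {
            "provider_id": "builtin",
            "provider_type": "inline::builtin",
        },
    ],
}

# Pre-0.7.0 API names that should no longer appear as top-level bucket keys.
STALE_API_KEYS: frozenset[str] = frozenset({"agents"})


def stale_api_keys(providers: dict[str, Any]) -> set[str]:
    """Return any top-level API bucket keys that still use pre-0.7.0 names."""
    return set(providers) & STALE_API_KEYS
```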
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/openapi.json`:
- Around line 1691-1695: The docs/openapi.json file is out of sync with the
generated OpenAPI schema; regenerate the spec by running the provided generation
script so CI passes: run the command shown in the CI output (uv run python
scripts/generate_openapi_schema.py docs/openapi.json) to recreate
docs/openapi.json from the source schema (scripts/generate_openapi_schema.py)
rather than editing the file manually, then commit the regenerated file; repeat
for other affected ranges if necessary.

In `@src/utils/conversations.py`:
- Around line 106-124: _extract_function_output_content currently drops
dict-shaped list elements and can return non-string values; update it to mirror
_extract_text_from_content: when iterating a list, handle dict parts by checking
keys like "type" and fetching "text" or "content" fields (fall back to ""),
treat objects with attributes the same as today, and collect those text pieces;
for the non-list branch ensure you always return a string (e.g., return "" for
None or otherwise str(output)). Also ensure the function's behavior matches how
ToolResultSummary.content expects a str.
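A minimal sketch of the behaviour described above, written as a standalone function so it can be tested in isolation. The part shapes (dicts with `text` or `content`, objects exposing the same attributes) are assumptions based on this comment, joining parts with no separator is an arbitrary choice, and `extract_function_output_content` is a hypothetical stand-in for the private helper in src/utils/conversations.py. A stricter version could also filter dict parts on their `type` key.

```python
from typing import Any


def _part_text(part: Any) -> str:
    """Return the text of one content part (plain string, dict, or object)."""
    if isinstance(part, str):
        return part
    if isinstance(part, dict):
        # Dict-shaped parts: prefer "text", fall back to "content", else "".
        value = part.get("text") or part.get("content") or ""
    else:
        # Object-shaped parts (e.g. pydantic models) expose the same fields as attributes.
        value = getattr(part, "text", None) or getattr(part, "content", None) or ""
    return value if isinstance(value, str) else str(value)


def extract_function_output_content(output: Any) -> str:
    """Always return a str, whether output is None, a str, or a list of content parts."""
    if output is None:
        return ""
    if isinstance(output, list):
        return "".join(_part_text(part) for part in output)
    return output if isinstance(output, str) else str(output)
```

Guaranteeing a `str` return keeps the value safe to assign to `ToolResultSummary.content` regardless of which output shape the server sends.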

---

Outside diff comments:
In `@examples/vllm-rhelai.yaml`:
- Around line 4-14: The apis list in examples/vllm-rhelai.yaml omits tool_groups
while a toolgroup is registered (builtin::file_search); update the apis array to
include "tool_groups" so the toolgroup wiring loads properly—locate the apis
block and add "tool_groups" alongside the other entries (ensuring consistency
with other configs like examples/vertexai-run.yaml) so the registered
builtin::file_search is picked up at startup.

In `@examples/vllm-rhoai.yaml`:
- Around line 134-137: The shields entry uses a chat/generative model id in
provider_shield_id which only acts as an override and may not enable the Llama
Guard shield; change the provider_shield_id value under the shields block (with
shield_id: llama-guard and provider_id: llama-guard) to a real guard model
identifier (for example meta-llama/Llama-Guard-3-8B) so the guard model is
actually used for gating rather than a chat model override.

In `@examples/watsonx-run.yaml`:
- Around line 3-13: The apis list is missing the tool_groups entry while a
toolgroup (builtin::file_search) is registered; update the apis array in
examples/watsonx-run.yaml to include "tool_groups" alongside the other entries
so the registered toolgroup loads correctly and matches the other migrated
configs.

In `@tests/e2e/configs/run-rhelai.yaml`:
- Around line 146-149: The YAML uses a chat/generative model id for
provider_shield_id (provider_shield_id: openai/gpt-4o-mini) which won't exercise
the Llama Guard shield; update the shield entry (shield_id: llama-guard,
provider_id: llama-guard) to reference a real Llama Guard model identifier
(replace provider_shield_id with the guard model's ID used by your deployment)
and update any related test setup so the E2E flow sends inputs that should be
gated (e.g., a disallowed prompt) to confirm the shield actually blocks/flags
requests; verify shield behavior in the test assertions and adjust any
mock/provider mappings that reference provider_shield_id accordingly.

---

Duplicate comments:
In `@src/models/api/responses/successful/catalog.py`:
- Around line 215-220: Update the example API payload in
src/models/api/responses/successful/catalog.py by renaming the top-level key
"agents" to "responses" so the OpenAPI example matches the post-0.7.0
Agents→Responses migration; locate the JSON/example dict literal that currently
contains the "agents" array (with provider entries using "provider_id":
"builtin" and "provider_type": "inline::builtin") and change the key to
"responses" so the published /providers example advertises the correct API name.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1e6b0ce9-c05d-40fa-987f-c4521ad45957

📥 Commits

Reviewing files that changed from the base of the PR and between afb3a44 and 1d69e4a.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
  • .konflux/requirements-build.txt
  • .konflux/requirements.hashes.source.txt
  • .konflux/requirements.hashes.wheel.txt
  • docker-compose.yaml
  • docs/openapi.json
  • examples/azure-run.yaml
  • examples/bedrock-run.yaml
  • examples/run.yaml
  • examples/vertexai-run.yaml
  • examples/vllm-rhaiis.yaml
  • examples/vllm-rhelai.yaml
  • examples/vllm-rhoai.yaml
  • examples/watsonx-run.yaml
  • pyproject.toml
  • run.yaml
  • src/constants.py
  • src/models/api/responses/successful/catalog.py
  • src/models/common/responses/types.py
  • src/utils/conversations.py
  • tests/configuration/run.yaml
  • tests/e2e-prow/rhoai/configs/run.yaml
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configs/run-bedrock.yaml
  • tests/e2e/configs/run-ci.yaml
  • tests/e2e/configs/run-rhaiis.yaml
  • tests/e2e/configs/run-rhelai.yaml
  • tests/e2e/configs/run-vertexai.yaml
  • tests/e2e/configs/run-watsonx.yaml
  • tests/unit/app/endpoints/test_tools.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: check
  • GitHub Check: Pylinter
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/constants.py
  • src/models/api/responses/successful/catalog.py
  • src/models/common/responses/types.py
  • src/utils/conversations.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/api/responses/successful/catalog.py
  • src/models/common/responses/types.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/app/endpoints/test_tools.py
🧠 Learnings (4)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/api/responses/successful/catalog.py
  • src/models/common/responses/types.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/api/responses/successful/catalog.py
  • src/models/common/responses/types.py
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • tests/configuration/run.yaml
  • docker-compose.yaml
  • examples/azure-run.yaml
  • tests/e2e/configs/run-vertexai.yaml
  • examples/bedrock-run.yaml
  • tests/e2e/configs/run-watsonx.yaml
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configs/run-rhaiis.yaml
  • run.yaml
  • examples/run.yaml
  • tests/e2e/configs/run-ci.yaml
  • tests/e2e/configs/run-bedrock.yaml
  • examples/vertexai-run.yaml
  • tests/e2e-prow/rhoai/configs/run.yaml
  • examples/vllm-rhelai.yaml
  • examples/watsonx-run.yaml
  • examples/vllm-rhaiis.yaml
  • examples/vllm-rhoai.yaml
  • tests/e2e/configs/run-rhelai.yaml
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.

Applied to files:

  • run.yaml
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json

[error] 1-1: CI check failed: docs/openapi.json is out of date (diff between docs/openapi.json and /tmp/openapi-generated.json). Regenerate with: uv run python scripts/generate_openapi_schema.py docs/openapi.json

🔇 Additional comments (38)
src/constants.py (1)

10-10: LGTM!

tests/configuration/run.yaml (2)

46-47: LGTM!

Also applies to: 89-93, 143-144


21-31: ⚡ Quick win

Confirm inline::builtin Responses provider schema for persistence.agent_state
In tests/configuration/run.yaml (lines 21-31), the inline::builtin responses provider config sets both persistence.agent_state (namespace/backend) and persistence.responses (table_name/backend). Llama Stack v0.7.0 docs for inline::builtin reference persistence.responses.*, but don’t list persistence.agent_state; if that key isn’t part of the provider’s schema, config validation at startup would fail. Confirm whether persistence.agent_state is supported by the inline::builtin responses provider (or belongs under a different provider/store config).

docker-compose.yaml (1)

87-87: LGTM!

src/utils/conversations.py (1)

283-283: LGTM!

examples/azure-run.yaml (1)

5-5: LGTM!

Also applies to: 53-55, 66-76, 102-103, 158-159

pyproject.toml (1)

31-33: ⚡ Quick win

Confirm 0.7.0 exists for all pinned packages (and flag compatibility as dependency-driven).

PyPI lists 0.7.0 for all three: llama-stack, llama-stack-client, and llama-stack-api, so the pins are not referencing a non-existent release. Dependency compatibility still depends on what llama-stack-client==0.7.0 / llama-stack-api==0.7.0 declare in their requires_dist for llama-stack.
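To confirm what was actually installed after locking, a check like the following can run inside the project environment. It uses only `importlib.metadata` from the standard library; `pin_mismatches` and `EXPECTED_PINS` are hypothetical names. It verifies installed versions only, not the `requires_dist` compatibility mentioned above.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# The three pins from pyproject.toml discussed in this comment.
EXPECTED_PINS: dict[str, str] = {
    "llama-stack": "0.7.0",
    "llama-stack-client": "0.7.0",
    "llama-stack-api": "0.7.0",
}


def pin_mismatches(expected: dict[str, str]) -> dict[str, str | None]:
    """Map each package whose installed version differs from its pin to the installed version (None if absent)."""
    mismatches: dict[str, str | None] = {}
    for name, wanted in expected.items():
        try:
            installed: str | None = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

An empty result from `pin_mismatches(EXPECTED_PINS)` means all three packages resolved to 0.7.0.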

src/models/common/responses/types.py (1)

39-41: ⚡ Quick win

Check that OpenAIResponseOutputMessageReasoningItem is exported by llama_stack_api.openai_responses in the pinned llama_stack_api==0.7.0.

src/models/common/responses/types.py imports OpenAIResponseOutputMessageReasoningItem as ReasoningItem and adds it to ResponseItem; if that symbol isn’t actually exported under that exact name in 0.7.0, the module will raise an ImportError at load time.
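One quick way to answer this is to attempt the import in an environment with the pinned packages installed. The helper below is a hypothetical sketch built on `importlib.import_module`; it returns False both when the module cannot be imported and when the name is not defined on it.

```python
import importlib


def symbol_is_exported(module_name: str, symbol: str) -> bool:
    """Return True if module_name imports cleanly and defines symbol."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)
```

Usage: `symbol_is_exported("llama_stack_api.openai_responses", "OpenAIResponseOutputMessageReasoningItem")`.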

.konflux/requirements-build.txt (1)

43-48: 🏗️ Heavy lift

Conflicting exact pins remain for the same distributions.

hatchling==1.26.3 (Line 43) and hatchling==1.29.0 (Line 48) — plus setuptools==78.1.1 (Line 153) and setuptools==82.0.1 (Line 155) — are mutually unsatisfiable == constraints that break deterministic resolution. Regenerate this autogenerated file so each distribution has a single pin.
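Regenerating is the real fix, but a guard like this could catch the problem earlier, for example in a pre-commit hook. It is a sketch that only understands plain `name==version` lines (hash and continuation lines are skipped) and normalizes names in the PEP 503 style; `conflicting_pins` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches "name==version" at the start of a line; stops at whitespace, ";" markers, or "\".
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s;\\]+)")


def conflicting_pins(text: str) -> dict[str, list[str]]:
    """Return distributions pinned to more than one exact version."""
    pins: dict[str, set[str]] = defaultdict(set)
    for line in text.splitlines():
        match = PIN_RE.match(line.strip())
        if match:
            # Normalize names per PEP 503 so "Foo_Bar" and "foo-bar" collide.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name].add(match.group(2))
    return {name: sorted(found) for name, found in pins.items() if len(found) > 1}
```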

tests/e2e/configs/run-watsonx.yaml (2)

62-69: ⚡ Quick win

file-search enabled with empty vector_io — same concern as run-vertexai.yaml.

Verify file search resolves a vector backend given vector_io: [] while default_provider_id: faiss is set.


5-5: LGTM!

Also applies to: 14-14, 70-80, 106-107, 150-152

tests/e2e/configs/run-azure.yaml (2)

62-69: ⚡ Quick win

file-search enabled with empty vector_io — same concern as run-vertexai.yaml.


5-5: LGTM!

Also applies to: 14-14, 70-80, 106-107, 155-156

tests/e2e/configs/run-rhaiis.yaml (2)

62-69: ⚡ Quick win

file-search enabled with empty vector_io — same concern as run-vertexai.yaml.


5-5: LGTM!

Also applies to: 14-14, 70-80, 106-107, 155-156

tests/e2e/configs/run-vertexai.yaml (2)

69-79: LGTM!

Also applies to: 105-106


62-68: ⚡ Quick win

file-search enabled while vector_io is empty — verify runtime vector backend wiring

run-vertexai.yaml enables an inline file-search tool but declares vector_io: [] (lines 62-68). If this e2e config is used standalone (or vector_io: [] overrides/doesn’t inherit the repo’s base run.yaml), then the file-search implementation may have no registered vector backend even though the repo’s run.yaml configures vector_io with FAISS (provider_id: faiss, provider_type: inline::faiss).
Confirm the e2e config composition/override behavior and that vector_stores.default_provider_id: faiss (line 153) resolves to an actual vector_io provider during the test run; otherwise file-search calls can fail at runtime.
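A sketch of the check described here, assuming it runs on the final config dict that the e2e run actually loads. If the harness overlays this file on the base run.yaml, check the merged result, not the file alone. The key paths come from this comment; `unresolved_default_vector_provider` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def unresolved_default_vector_provider(config: dict[str, Any]) -> str | None:
    """Return the default vector provider id if no vector_io provider defines it, else None."""
    default_id = (config.get("vector_stores") or {}).get("default_provider_id")
    if default_id is None:
        return None
    vector_io = (config.get("providers") or {}).get("vector_io") or []
    known_ids = {provider.get("provider_id") for provider in vector_io}
    return None if default_id in known_ids else default_id
```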

examples/bedrock-run.yaml (2)

59-61: LGTM!

Also applies to: 72-82, 108-109, 157-158


3-13: ⚡ Quick win

Add tool_groups to the apis list in examples/bedrock-run.yaml
The apis list excerpt (lines 3-13) omits tool_groups, but this config registers a builtin::file_search toolgroup later (around lines 157-158). If tool_groups isn’t advertised, the file_search toolgroup may not be served.

🔧 Proposed fix
 - tool_runtime
+- tool_groups
 - vector_io
.konflux/requirements.hashes.wheel.txt (1)

1-2: LGTM!

run.yaml (2)

3-13: LGTM!

Also applies to: 55-57, 101-102, 140-144


65-75: ⚡ Quick win

Validate persistence.agent_state support in inline::builtin Responses (0.7.0)

Docs/code for inline::builtin Responses persistence describe persistence.responses.* (e.g., table_name, backend, etc.) and persist via a responses_store; they don’t mention an agent_state/agents_state key for the built-in Responses provider. run.yaml lines 65-75 includes both:

persistence:
  agent_state:
    namespace: agents_state
    backend: kv_default
  responses:
    table_name: agents_responses
    backend: sql_default

Confirm persistence.agent_state is actually accepted by the inline::builtin Responses provider; if not, it likely belongs under the inline Agents provider config instead of here.

examples/run.yaml (2)

9-20: LGTM!


62-64: LGTM!

Also applies to: 75-85, 111-112, 164-165

tests/unit/app/endpoints/test_tools.py (2)

505-505: LGTM!

Also applies to: 538-538, 613-613, 664-664


911-914: LGTM!

Also applies to: 968-971, 991-991, 1010-1013, 1083-1084, 1108-1108, 1128-1131

tests/e2e/configs/run-ci.yaml (1)

4-15: LGTM!

Also applies to: 56-58, 63-73, 99-100, 143-145

tests/e2e/configs/run-bedrock.yaml (1)

4-15: LGTM!

Also applies to: 61-63, 68-78, 104-105, 153-154

examples/vertexai-run.yaml (2)

53-55: LGTM!

Also applies to: 66-76, 102-103, 154-155


4-14: ⚡ Quick win

Reconsider the concern: tool_groups in apis is likely not required for llama-stack 0.7.0
In llama-stack 0.7.0+, the ToolGroups API (/v1/toolgroups) is removed and tool groups are auto-registered based on configured tool_runtime providers (via provider toolgroup_id). So this config’s omission of tool_groups from apis is likely expected, and adding it may be unnecessary/incorrect. The actionable check is whether the configured tool_runtime provider for builtin::file_search maps/registers the tool group (e.g., correct toolgroup_id).

tests/e2e-prow/rhoai/configs/run.yaml (1)

4-15: LGTM!

Also applies to: 57-59, 70-80, 106-107, 162-163

examples/vllm-rhelai.yaml (1)

55-56: LGTM!

Also applies to: 64-74, 100-101, 142-143

examples/watsonx-run.yaml (1)

61-63: LGTM!

Also applies to: 74-84, 110-111, 166-167

examples/vllm-rhaiis.yaml (1)

5-5: LGTM!

Also applies to: 55-56, 64-74, 100-101, 142-143

examples/vllm-rhoai.yaml (2)

55-56: LGTM!

Also applies to: 64-74, 100-101, 142-143


4-14: ⚡ Quick win

Add tool_groups to apis when registering registered_resources.tool_groups (file_search)

examples/vllm-rhoai.yaml’s apis list shown here omits tool_groups, even though this config registers registered_resources.tool_groups for the file-search capability in the original report. Please confirm that llama-stack 0.7.0 requires tool_groups to be declared in apis for tool group registration to take effect, then align examples/vllm-rhoai.yaml with the corresponding e2e run config(s) that include - tool_groups.

tests/e2e/configs/run-rhelai.yaml (2)

63-65: LGTM!

Also applies to: 70-80, 106-107, 155-156


4-15: ⚡ Quick win

Confirm tool_groups is a valid top-level apis entry in llama-stack 0.7.0 and make run-rhelai.yaml consistent with the sibling configs.

tests/e2e/configs/run-rhelai.yaml adds - tool_groups under apis, while the sibling examples/vllm-rhoai.yaml omits it. If apis only supports composite APIs (and ToolGroups are auto-registered from configured tool_runtime providers), then tool_groups here is either redundant or may fail config validation at startup—so reconcile the two configs based on the repo’s config schema/validator expectations.

Comment thread docs/openapi.json
Comment on lines 1691 to +1695
 "providers": {
     "agents": [
         {
-            "provider_id": "meta-reference",
-            "provider_type": "inline::meta-reference"
+            "provider_id": "builtin",
+            "provider_type": "inline::builtin"

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Regenerate the OpenAPI spec to fix CI failure.

The pipeline failure indicates this file is out of sync with the generated output. This is an auto-generated file that should not be manually edited.

Run the regeneration command from the CI output:

uv run python scripts/generate_openapi_schema.py docs/openapi.json

The schema changes themselves (provider rename to builtin, new reasoning schemas, discriminator wiring) appear correct and align with the llama-stack 0.7.0 upgrade.

Also applies to: 14836-14840, 14891-14895, 15264-15268, 15706-15710, 15933-16042, 16145-16162, 17031-17035, 18074-18079, 18468-18473, 18482-18485, 19298-19301
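The CI job's actual comparison isn't shown here. As a hypothetical sketch of what such a drift check does, the helpers below parse the committed and freshly generated specs and report the first JSON location where they diverge, ignoring whitespace and key order. `first_difference` and `openapi_in_sync` are illustrative names, and the example documents are trimmed to the provider fragment from this diff. If CI compares bytes instead, formatting-only differences would also fail it, which is another reason to regenerate rather than hand-edit.

```python
import json
from typing import Any, Optional


def first_difference(a: Any, b: Any, path: str = "$") -> Optional[str]:
    """Return a JSON-path-like location of the first difference, or None if equal.

    Dict key order is ignored (keys are visited in sorted order); list order matters.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                return f"{path}.{key}"
            diff = first_difference(a[key], b[key], f"{path}.{key}")
            if diff is not None:
                return diff
        return None
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return path
        for index, (left, right) in enumerate(zip(a, b)):
            diff = first_difference(left, right, f"{path}[{index}]")
            if diff is not None:
                return diff
        return None
    # Scalars, or mismatched container types.
    return None if a == b else path


def openapi_in_sync(committed_text: str, generated_text: str) -> bool:
    """True if the committed spec structurally matches freshly generated output."""
    return first_difference(json.loads(committed_text), json.loads(generated_text)) is None


committed = '{"providers": {"agents": [{"provider_id": "meta-reference", "provider_type": "inline::meta-reference"}]}}'
generated = '{"providers": {"agents": [{"provider_id": "builtin", "provider_type": "inline::builtin"}]}}'
print(openapi_in_sync(committed, generated))  # → False
print(first_difference(json.loads(committed), json.loads(generated)))
# → $.providers.agents[0].provider_id
```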

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/openapi.json` around lines 1691 - 1695, The docs/openapi.json file is
out of sync with the generated OpenAPI schema; regenerate the spec by running
the provided generation script so CI passes: run the command shown in the CI
output (uv run python scripts/generate_openapi_schema.py docs/openapi.json) to
recreate docs/openapi.json from the source schema
(scripts/generate_openapi_schema.py) rather than editing the file manually, then
commit the regenerated file; repeat for other affected ranges if necessary.

Comment on lines +106 to +124
def _extract_function_output_content(output: Any) -> str:
    """Extract string content from function call output.

    In llama-stack 0.7.0+, function output can be either a string or a list of content parts.
    This helper extracts text from both formats.

    Args:
        output: Function output (str or list of content parts)

    Returns:
        Extracted text content as string
    """
    if isinstance(output, list):
        text_parts = []
        for part in output:
            if hasattr(part, "type") and getattr(part, "type", None) == "text":
                text_parts.append(getattr(part, "text", ""))
        return " ".join(text_parts) if text_parts else ""
    return output

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle dict content parts and guarantee a str return.

Two gaps versus the existing _extract_text_from_content helper:

  • Dict parts dropped: list elements are only inspected via getattr, so dict-shaped content parts (common after model_dump() / raw API payloads) won't match and their text is silently lost. _extract_text_from_content already handles the dict case.
  • Return type not guaranteed: the non-list branch does return output, so a None/non-string output is returned despite the -> str annotation, propagating a non-str into ToolResultSummary.content.
Proposed fix
     if isinstance(output, list):
-        text_parts = []
+        text_parts: list[str] = []
         for part in output:
-            if hasattr(part, "type") and getattr(part, "type", None) == "text":
-                text_parts.append(getattr(part, "text", ""))
-        return " ".join(text_parts) if text_parts else ""
-    return output
+            if isinstance(part, dict):
+                if part.get("type") == "text" and part.get("text"):
+                    text_parts.append(str(part["text"]))
+                continue
+            if getattr(part, "type", None) == "text":
+                text_value = getattr(part, "text", "")
+                if text_value:
+                    text_parts.append(text_value)
+        return " ".join(text_parts)
+    return output if isinstance(output, str) else ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/conversations.py` around lines 106 - 124,
_extract_function_output_content currently drops dict-shaped list elements and
can return non-string values; update it to mirror _extract_text_from_content:
when iterating a list, handle dict parts by checking keys like "type" and
fetching "text" or "content" fields (fall back to ""), treat objects with
attributes the same as today, and collect those text pieces; for the non-list
branch ensure you always return a string (e.g., return "" for None or otherwise
str(output)). Also ensure the function's behavior matches how
ToolResultSummary.content expects a str.

In llama-stack 0.7.0, conversations was split out into its own API.
It must be explicitly added to the apis list for conversation storage
(the openai_conversations table) to be created.
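A quick way to catch this in each run config is to check the parsed `apis` list for required entries. This is a minimal sketch assuming the YAML has already been loaded into a dict; `missing_apis` is a hypothetical helper and the example `apis` list is illustrative.

```python
from collections.abc import Iterable
from typing import Any


def missing_apis(config: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required API names absent from the config's top-level apis list."""
    declared = set(config.get("apis") or [])
    return [name for name in required if name not in declared]


# Illustrative config that predates the 0.7.0 conversations split.
config = {"apis": ["agents", "files", "inference", "safety", "tool_runtime"]}
print(missing_apis(config, ["conversations"]))  # → ['conversations']
```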
anik120 force-pushed the bump-llama-stack branch from b42a632 to 0f9edf5 on May 29, 2026 20:21
Contributor Author

anik120 commented May 29, 2026

@tisnik bumping llama-stack to 0.7.0 is turning out to be a behemoth of a task, with ALL the breaking changes in just one minor release. I addressed 5 of them, and there are still more e2e errors. I could keep going, but the cost-benefit analysis doesn't justify continuing down this route for this bug. Created #1818 as a workaround.

Contributor Author

anik120 commented May 29, 2026

Oh, also the Lightspeed-evaluation repo needs to be updated to 0.7.0 too; otherwise the E2E Tests for Lightspeed Evaluation won't pass, so that'd be another behemoth of a task in that repo.

anik120 marked this pull request as draft on May 29, 2026 21:26
