feat: add ci:run-evals label support, missing scorers and parameterize evaluation reporting project for flexible CI execution#168

Merged
omkargaikwad23 merged 4 commits into main from evalbench-ci
May 5, 2026

Conversation

@omkargaikwad23
Contributor

  • Pipeline Trigger Alignment: Updated cloudbuild.yaml to support the manual evaluation trigger label (ci:run-evals) on non-release branches and to set the correct RELEASE_VERSION context.
  • Scorers Sync: Added the missing skills_best_practices and skills_trajectory evaluation scorers to evals/run_config.yaml.
  • Tool Updates: Bumped gemini_cli_version to @google/gemini-cli@latest and set the GEMINI_CLI_TRUST_WORKSPACE: "true" environment variable so that the CLI trusts the execution workspace in the automated sandbox environment.
  • Repository Labels: Added the ci:run-evals label definition to .github/labels.yaml.
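
For reviewers who want a mental model of the trigger change, here is a minimal Python sketch of the label gate described in the first bullet. It is not the logic in cloudbuild.yaml: the `release/` branch prefix, the `should_run_evals` helper, and the rule that release branches always run evals are assumptions made for illustration. Only the label name ci:run-evals comes from this PR.

```python
# Illustrative sketch only; the real gating lives in cloudbuild.yaml.
EVAL_LABEL = "ci:run-evals"  # label name taken from this PR

# Assumption: release branches share a common prefix. The actual
# naming convention in this repository is not shown in the PR.
RELEASE_PREFIX = "release/"


def is_release_branch(branch: str) -> bool:
    """Return True if the branch name looks like a release branch."""
    return branch.startswith(RELEASE_PREFIX)


def should_run_evals(branch: str, pr_labels: list[str]) -> bool:
    """Decide whether the evaluation pipeline should run.

    Assumed rule: release branches always run evals; any other branch
    runs them only when the PR carries the manual trigger label.
    """
    return is_release_branch(branch) or EVAL_LABEL in pr_labels


if __name__ == "__main__":
    print(should_run_evals("evalbench-ci", ["ci:run-evals"]))
    print(should_run_evals("evalbench-ci", []))
```

Under these assumptions, a feature branch such as evalbench-ci runs the evaluation pipeline only while the label is applied, which keeps evals opt-in for ordinary PRs.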

omkargaikwad23 requested review from a team as code owners May 5, 2026 07:13
omkargaikwad23 added the ci:run-evals label ("Manually trigger the evaluation CI pipeline on a PR.") May 5, 2026
omkargaikwad23 changed the title from "feat: add ci:run-evals label support and parameterize evaluation reporting project for flexible CI execution" to "feat: add ci:run-evals label support, missing scorers and parameterize evaluation reporting project for flexible CI execution" May 5, 2026
omkargaikwad23 merged commit 69c0c82 into main May 5, 2026
11 checks passed
omkargaikwad23 deleted the evalbench-ci branch May 5, 2026 09:29
Labels

ci:run-evals Manually trigger the evaluation CI pipeline on a PR.

2 participants