Commit dacee17

ci: update evalbench pipeline trigger, sync scorers, and bump gemini-cli

- **Pipeline Trigger Alignment:** Updated `cloudbuild.yaml` to support the manual evaluation trigger label (`ci:run-evals`) for non-release branches and set the correct `RELEASE_VERSION` context.
- **Scorers Sync:** Added missing `skills_best_practices` and `skills_trajectory` evaluation scorers to `evals/run_config.yaml`.
- **Tool Updates:** Bumped `gemini_cli_version` to `@google/gemini-cli@latest` and enabled `GEMINI_CLI_TRUST_WORKSPACE: "true"` environment variable to ensure secure execution workspace trust in the automated sandbox environment.
- **Repository Labels:** Appended the `ci:run-evals` label definition to `.github/labels.yaml`.

1 parent c6e37b1 commit dacee17

4 files changed

Lines changed: 23 additions & 12 deletions

.github/labels.yaml

Lines changed: 5 additions & 1 deletion
@@ -83,4 +83,8 @@
 
 - name: 'release-please:force-run'
   color: bdca82
-  description: Manually trigger the release please workflow on a PR.
+  description: Manually trigger the release please workflow on a PR.
+
+- name: 'ci:run-evals'
+  color: 4285f4
+  description: Manually trigger the evaluation CI pipeline on a PR.

cloudbuild.yaml

Lines changed: 12 additions & 10 deletions
@@ -27,12 +27,7 @@ steps:
 - |
   set -e
 
-  # Only run on release branches
-  if [[ "$_HEAD_BRANCH" != release-please-* ]]; then
-    echo "Not a release-please branch. Exiting."
-    exit 0
-  fi
-  echo "Release branch detected. Fetching PR data from GitHub API..."
+  echo "Fetching PR data from GitHub API..."
 
   # Fetch PR data and status code
   HTTP_STATUS=$(curl -s -o pr_data.json -w "%{http_code}" -H "Authorization: token $$GITHUB_TOKEN" \
@@ -50,15 +45,22 @@ steps:
   PR_LABELS=$(echo "$$PR_DATA" | jq -r '[.labels[].name] | join(",")')
   PR_TITLE=$(echo "$$PR_DATA" | jq -r '.title')
 
-  # Determine Release Version (Use double quotes and $$ for bash variables)
-  if [[ "$$PR_LABELS" == *"autorelease: pending"* ]]; then
+  # Check if execution labels are present
+  if [[ "$$PR_LABELS" != *"autorelease: pending"* && "$$PR_LABELS" != *"ci:run-evals"* ]]; then
+    echo "PR does not have 'autorelease: pending' or 'ci:run-evals' label. Skipping execution."
+    exit 0
+  fi
+  echo "Execution label detected. Processing release version context..."
+
+  # Determine Release Version based on branch name
+  if [[ "$_HEAD_BRANCH" == release-please-* ]]; then
     if [[ "$$PR_TITLE" =~ release\ ([0-9]+\.[0-9]+\.[0-9]+) ]]; then
       export RELEASE_VERSION="$${BASH_REMATCH[1]}"
     else
-      export RELEASE_VERSION="unknown"
+      export RELEASE_VERSION="pr-$_PR_NUMBER-release-unknown"
     fi
   else
-    export RELEASE_VERSION="unknown"
+    export RELEASE_VERSION="pr-$_PR_NUMBER-ci-run-evals"
   fi
 
   # Workaround for evalbench bug: settings are only applied if path basename matches extension ID
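
The label gate and version selection in this hunk can be exercised outside Cloud Build with a small Bash sketch. The function names `should_run_evals` and `resolve_release_version` are hypothetical and do not appear in the commit. The sketch uses a single `$` throughout, since the `$$` escaping is only needed inside `cloudbuild.yaml`, and it takes the branch, title, and PR number as arguments rather than reading `$_HEAD_BRANCH` and `$_PR_NUMBER`.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: both function names are hypothetical helpers,
# not part of the commit. Requires bash for [[ ]] and BASH_REMATCH.

# Succeeds (exit 0) when the comma-separated label list contains either
# execution label checked by the pipeline step.
should_run_evals() {
  local labels="$1"
  [[ "$labels" == *"autorelease: pending"* || "$labels" == *"ci:run-evals"* ]]
}

# Prints the value the step would export as RELEASE_VERSION.
# Arguments: head branch, PR title, PR number.
resolve_release_version() {
  local head_branch="$1" pr_title="$2" pr_number="$3"
  # Same pattern as the diff, stored in a variable to avoid escaping the space.
  local re='release ([0-9]+\.[0-9]+\.[0-9]+)'
  if [[ "$head_branch" == release-please-* ]]; then
    if [[ "$pr_title" =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    else
      printf 'pr-%s-release-unknown\n' "$pr_number"
    fi
  else
    printf 'pr-%s-ci-run-evals\n' "$pr_number"
  fi
}
```

For example, `resolve_release_version "release-please--branches--main" "chore(main): release 1.2.3" 42` prints `1.2.3`, while a feature branch with PR number 42 yields `pr-42-ci-run-evals`.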

evals/model_config.yaml

Lines changed: 2 additions & 1 deletion
@@ -12,12 +12,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-gemini_cli_version: "@google/gemini-cli@0.38.1"
+gemini_cli_version: "@google/gemini-cli@latest"
 generator: gemini_cli
 env:
   GOOGLE_CLOUD_PROJECT: "${GOOGLE_CLOUD_PROJECT}"
   GOOGLE_CLOUD_LOCATION: "global"
   GOOGLE_GENAI_USE_VERTEXAI: "true"
+  GEMINI_CLI_TRUST_WORKSPACE: "true"
 setup:
   extensions:
     # Points to the symlink created in cloudbuild.yaml to match the extension ID

evals/run_config.yaml

Lines changed: 4 additions & 0 deletions
@@ -25,12 +25,16 @@ scorers:
     model_config: /workspace/evals/gemini_2.5_pro_model.yaml
   behavioral_metrics:
     model_config: /workspace/evals/gemini_2.5_pro_model.yaml
+  skills_best_practices:
+    model_config: /workspace/evals/gemini_2.5_pro_model.yaml
+    skills_dir: /workspace/bigquery-data-analytics/skills
 
   # Performance
   turn_count: {}
   end_to_end_latency: {}
   tool_call_latency: {}
   token_consumption: {}
+  skills_trajectory: {}
 
 reporting:
   bigquery:
