Add Claude AI test failure analysis to Slack notifications #3381

Open
robbycochran wants to merge 35 commits into master from add-test-analysis-job

@robbycochran
Collaborator

Summary

Adds an AI-powered test failure analysis job that runs within the same workflow as the tests and includes its findings (root cause, failure pattern, recommendations) in Slack notifications.

New Architecture

Instead of a separate workflow_run, this adds an analyze-failures job to the existing test workflows:

Test Workflow
 ├── amd64-integration-tests → (may fail)
 ├── arm64-integration-tests → (may fail)
 ├── s390x-integration-tests → (may fail)
 ├── ppc64le-integration-tests → (may fail)
 │
 ├── analyze-failures (runs if any fail)
 │    ├── Downloads all artifacts
 │    ├── Parses JUnit XML
 │    ├── Calls Claude AI
 │    └── Uploads analysis-report.md
 │
 └── notify (waits for analysis)
      ├── Downloads analysis-report.md
      └── Posts to Slack with insights

What Changed

Modified Workflows

  • .github/workflows/integration-tests.yml

    • Added analyze-failures job before notify
    • Modified notify to download and include analysis report
    • Graceful fallback if analysis unavailable
  • .github/workflows/unit-tests.yml

    • Same pattern for unit test failures

New Files

  • .github/scripts/analyze_test_failures.py

    • Parses JUnit XML test reports
    • Extracts failure patterns
    • Calls Claude via Vertex AI
    • Generates markdown report
  • .github/scripts/requirements.txt

    • anthropic - Claude API client
    • google-auth - GCP authentication
    • google-cloud-aiplatform - Vertex AI
  • .github/scripts/README.md

    • Setup instructions
    • Usage examples
    • Troubleshooting guide
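
As a rough illustration of the "Parses JUnit XML test reports" step, here is a minimal sketch using only the standard library. The names `Problem` and `collect_problems` are hypothetical and are not taken from analyze_test_failures.py; the sketch also assumes reports are `*.xml` files with the usual `<testsuite>`/`<testcase>` layout.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Problem:
    """One failed or errored test case (hypothetical record type)."""
    suite: str
    name: str
    kind: str      # "failure" or "error"
    message: str


def collect_problems(artifacts_dir: str) -> list[Problem]:
    """Walk artifacts_dir and return every failed or errored test case."""
    problems = []
    for xml_path in sorted(Path(artifacts_dir).rglob("*.xml")):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        # iter() covers both a <testsuites> wrapper and a bare <testsuite> root
        for suite in root.iter("testsuite"):
            # findall() only looks at direct children, so nested suites
            # are not counted twice
            for case in suite.findall("testcase"):
                for kind in ("failure", "error"):
                    node = case.find(kind)
                    if node is not None:
                        problems.append(Problem(
                            suite=suite.get("name", ""),
                            name=case.get("name", ""),
                            kind=kind,
                            message=node.get("message") or (node.text or "").strip(),
                        ))
    return problems
```

The resulting list could feed both the statistics section of the report and the prompt sent to the model.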

Example Output

Before (current):

@acs-collector-oncall
Integration tests failed.

After (with AI):

@acs-collector-oncall

🤖 AI Analysis

**Root Cause**: eBPF LSM hook attachment failures on RHCOS nodes. 
Tests failed with "permission denied" when attaching to lsm/file_open.

**Pattern**: All 3 failures occurred on RHCOS amd64 test VMs.

**Recommendations**:
• Check kernel version on RHCOS VMs - LSM BPF requires kernel 5.7+
• Verify CONFIG_BPF_LSM is enabled in kernel config
• Review recent changes to eBPF program loading logic
• Check for SELinux policy changes blocking BPF operations

---
**Statistics**
• Total Failures: 3
• Total Errors: 0
• Failed Jobs: amd64-integration-tests

Key Benefits

  • Same workflow - No workflow_run complexity, better job tracking
  • Job dependencies - notify waits for analyze-failures to complete
  • Markdown report - Clean format, easy to include in Slack
  • Graceful fallback - Still notifies even if analysis fails
  • Artifact-based - Analysis report uploaded as workflow artifact
  • Context available - All test artifacts already in the workflow
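
The "graceful fallback" behaviour can be sketched roughly as follows. This is an illustration only: `read_report` and `build_slack_payload` are hypothetical names, and the sketch assumes the notification is sent as a Slack incoming-webhook JSON body with a single `text` field.

```python
import json
from pathlib import Path

# The pre-existing message, used when no analysis report is available.
FALLBACK_TEXT = "Integration tests failed."


def read_report(report_path: str):
    """Return the report text, or None if the file is missing or empty."""
    try:
        text = Path(report_path).read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return text or None


def build_slack_payload(report_path: str, mention: str = "@acs-collector-oncall") -> str:
    """Build the JSON body for the webhook, falling back to the plain message."""
    body = read_report(report_path) or FALLBACK_TEXT
    return json.dumps({"text": f"{mention}\n{body}"})
```

With this shape, a missing or empty analysis-report.md degrades to the existing notification rather than blocking it.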

Required Secrets

Add these to repository secrets (see .github/scripts/README.md for setup):

# Required
GCP_CLAUDE_SERVICE_ACCOUNT_KEY  # GCP service account JSON with Vertex AI access

# Optional (have defaults)
GCP_CLAUDE_PROJECT_ID           # Defaults to "rhacs-eng"
GCP_CLAUDE_REGION               # Defaults to "us-central1"

Existing secret used:

  • SLACK_COLLECTOR_ONCALL_WEBHOOK ✅ Already configured
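
For illustration, a script could resolve these settings as sketched below. `VertexConfig` and `load_config` are hypothetical names; the sketch assumes the secrets are exposed to the script as environment variables with the same names. An unset GitHub secret expands to an empty string, so empty values are treated as "not set".

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VertexConfig:
    service_account_key: str
    project_id: str
    region: str


def load_config(env=None) -> VertexConfig:
    """Read the three settings, applying the defaults listed above."""
    env = os.environ if env is None else env
    key = env.get("GCP_CLAUDE_SERVICE_ACCOUNT_KEY")
    if not key:
        raise RuntimeError("GCP_CLAUDE_SERVICE_ACCOUNT_KEY is required")
    return VertexConfig(
        service_account_key=key,
        # "or" (rather than a .get() default) also covers empty strings
        project_id=env.get("GCP_CLAUDE_PROJECT_ID") or "rhacs-eng",
        region=env.get("GCP_CLAUDE_REGION") or "us-central1",
    )
```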

Setup Instructions

1. Create GCP Service Account

gcloud iam service-accounts create claude-test-analyzer \
  --display-name="Claude Test Failure Analyzer"

2. Grant Vertex AI Access

gcloud projects add-iam-policy-binding rhacs-eng \
  --member="serviceAccount:claude-test-analyzer@rhacs-eng.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

3. Create Key

gcloud iam service-accounts keys create key.json \
  --iam-account=claude-test-analyzer@rhacs-eng.iam.gserviceaccount.com

4. Add to GitHub Secrets

Repository Settings → Secrets and variables → Actions → New secret:

  • Name: GCP_CLAUDE_SERVICE_ACCOUNT_KEY
  • Value: Paste entire contents of key.json

Testing

The workflow will activate on the next test failure. To test manually:

# Install dependencies
pip install -r .github/scripts/requirements.txt

# Run analysis locally
python .github/scripts/analyze_test_failures.py \
  --artifacts-dir test-artifacts \
  --output-file analysis-report.md \
  --workflow-name "Integration Tests" \
  --failed-jobs "amd64-integration-tests"

# View report
cat analysis-report.md
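
A command-line interface matching the flags used above might look roughly like this. It is a sketch, not the script's actual parser: `parse_args` is a hypothetical name, and the defaults and the comma-separated format for `--failed-jobs` are assumptions.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the four flags shown in the manual test invocation."""
    parser = argparse.ArgumentParser(description="Analyze test failures")
    parser.add_argument("--artifacts-dir", required=True,
                        help="directory containing downloaded test artifacts")
    parser.add_argument("--output-file", default="analysis-report.md",
                        help="where to write the markdown report")
    parser.add_argument("--workflow-name", default="",
                        help="workflow name to mention in the report")
    parser.add_argument("--failed-jobs", default="",
                        help="failed job names (assumed comma-separated)")
    args = parser.parse_args(argv)
    # Normalise the job list so callers get a clean list of names.
    args.failed_jobs = [j.strip() for j in args.failed_jobs.split(",") if j.strip()]
    return args
```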

Advantages Over workflow_run Approach

| Feature | This PR | workflow_run |
|---|---|---|
| Workflow complexity | ✅ Single workflow | ❌ Two workflows |
| Job dependencies | ✅ Native needs: | ❌ Separate trigger |
| Artifact access | ✅ Direct download | ❌ Cross-workflow |
| Debugging | ✅ Same workflow run | ❌ Different run |
| Failure handling | ✅ Built-in fallback | ❌ Must handle separately |

Future Enhancements

  • Correlate failures with recent commits
  • Link to similar past failures
  • Track failure patterns over time
  • Auto-create issues for recurring failures

Testing Plan

  1. ✅ Workflow syntax validated
  2. ✅ Python script tested locally
  3. ⏳ Will trigger on next test failure
  4. ⏳ Monitor for false positives
  5. ⏳ Iterate based on team feedback

cc @stackrox/acs-collector

Add analyze-failures job that runs within the same workflow when tests fail,
generating a markdown report that is included in Slack notifications.

Changes:
- Add analyze-failures job to integration-tests.yml
- Add analyze-failures job to unit-tests.yml
- Modify notify jobs to wait for and include analysis report
- Add analyze_test_failures.py script using Claude via Vertex AI
- Generate markdown reports with root cause, patterns, recommendations

Architecture:
1. Test jobs run (may fail)
2. analyze-failures job downloads artifacts, parses JUnit XML, calls Claude
3. Uploads analysis-report.md as artifact
4. notify job waits for analysis, downloads report, includes in Slack

Benefits:
- Same workflow (no separate workflow_run complexity)
- Job dependencies ensure analysis completes before notify
- Markdown report format easy to include in Slack
- Graceful fallback if analysis fails
- All test context available in same workflow

Required secrets:
- GCP_CLAUDE_SERVICE_ACCOUNT_KEY: Service account with Vertex AI access
- GCP_CLAUDE_PROJECT_ID: GCP project (optional, defaults to rhacs-eng)
- GCP_CLAUDE_REGION: Vertex AI region (optional, defaults to us-central1)

See .github/scripts/README.md for setup instructions.
@robbycochran robbycochran requested a review from a team as a code owner May 20, 2026 19:54
StackRox Automation and others added 2 commits May 20, 2026 20:01
Replace Python-based test analysis with Claude Code skill for deeper
investigation capabilities:

- Claude can now read source code, git history, and test implementation
- More specific root cause identification with file/line references
- Better platform-specific pattern detection
- Simplified workflow dependencies (no Python setup needed)

The skill has full tool access (Read, Grep, Bash) to investigate
failures rather than just summarizing error messages.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use npm install for more reliable package management and
better caching support in GitHub Actions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 27.34%. Comparing base (01135a9) to head (f3b267c).
⚠️ Report is 1 commit behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3381   +/-   ##
=======================================
  Coverage   27.34%   27.34%           
=======================================
  Files          95       95           
  Lines        5420     5420           
  Branches     2545     2545           
=======================================
  Hits         1482     1482           
  Misses       3211     3211           
  Partials      727      727           
| Flag | Coverage Δ |
|---|---|
| collector-unit-tests | 27.34% <ø> (ø) |


StackRox Automation and others added 3 commits May 20, 2026 20:28
Replace npm-based CLI installation with official claude-code-base-action
for better integration and reliability:

- Direct Vertex AI authentication via use_vertex flag
- Inline prompt with detailed analysis instructions
- Proper GitHub Action outputs and error handling
- No manual installation or dependency management needed
- Better caching and optimization

The action runs headless analysis on test artifacts and generates
markdown reports with root cause, evidence, and recommendations.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Unit tests are stable and rarely fail, so AI analysis isn't needed there.
Keep analysis only for integration tests where failures are more complex
and platform-specific.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Provides a safe way to test the Claude analysis + Slack notification
flow before relying on it for production failures.

Usage:
- Add label 'test-oncall-workflow' to any PR, or
- Manually trigger via workflow_dispatch

The test:
1. Creates realistic fake test failure artifacts (JUnit XML)
2. Runs Claude analysis on the failures
3. Posts to Slack with [TEST] prefix

This allows verification that:
- GCP authentication works
- Claude can parse test reports and analyze code
- Slack webhook is configured correctly
- The full end-to-end flow works

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
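
The "creates realistic fake test failure artifacts" step from the commit above could be approximated as in the sketch below. The function name, suite name, and test names are all hypothetical; the actual test workflow may generate its artifacts differently.

```python
import xml.etree.ElementTree as ET


def write_fake_junit(path: str, suite_name: str = "fake-integration-tests") -> None:
    """Write a JUnit XML file with one passing and one failing test case."""
    suite = ET.Element("testsuite", name=suite_name,
                       tests="2", failures="1", errors="0")
    ET.SubElement(suite, "testcase", classname=suite_name, name="TestPasses")
    failing = ET.SubElement(suite, "testcase", classname=suite_name, name="TestFails")
    failure = ET.SubElement(failing, "failure",
                            message="synthetic failure for workflow test")
    failure.text = "This failure was generated on purpose."
    ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)
```

Keeping the synthetic failure clearly labelled makes it harder to mistake a test notification for a real incident.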
StackRox Automation and others added 21 commits May 20, 2026 20:35
Clarify what the test workflow actually validates:
- Workflow execution and job dependencies
- Artifact creation and download
- GCP Vertex AI authentication
- Claude action integration
- Slack webhook delivery
- Report generation and formatting

The test uses synthetic JUnit XML files to verify the infrastructure
works end-to-end, not to validate Claude's analysis quality on real
collector failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
GCP_CLAUDE_SERVICE_ACCOUNT_KEY is already configured,
no need to include setup instructions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
New workflows in PRs can't be triggered by PR events until merged
to the base branch. Change to workflow_dispatch so it can be tested
after merge.

Also adds weekly scheduled run to verify the workflow continues
to work over time.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Can trigger on PR branch via workflow_dispatch.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Ensures the test workflow uses the same GCP Vertex AI configuration
as the production workflow.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- ANTHROPIC_VERTEX_PROJECT_ID from GCP_CLAUDE_PROJECT_ID secret (no default)
- CLOUD_ML_REGION set to global for Vertex AI

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
workflow_dispatch doesn't work for workflows that only exist in PR branches.
Add push trigger so it runs when the workflow file is updated.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created analyze-and-notify.yml as a reusable workflow that contains
the actual Claude analysis and Slack notification logic.

Both the production integration-tests workflow and the test workflow
now call this same reusable workflow, ensuring:
- No code duplication
- Test runs the exact same logic as production
- Single source of truth for analysis behavior
- Easy to maintain and update

Also fixed CLOUD_ML_REGION from 'global' to 'us-east5' - Anthropic
models on Vertex AI are not available in the global endpoint.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Per user feedback, should use global region.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Artifacts don't work across jobs in reusable workflows. Combined
analyze-failures and notify into a single job so the analysis report
can be read directly without artifact upload/download.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separated Claude analysis from Slack notification as requested.

Added explicit check for analysis-report.md creation to debug
why Claude action isn't generating the file. This will help
identify if it's:
- Model availability issue (404 error)
- Permissions issue
- File path issue

The artifact upload now only happens if the report exists,
and notify job handles missing artifact gracefully.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The 404 error suggests the default model isn't available. Trying:
- Explicit model: claude-3-5-sonnet-v2@20241022 (Vertex AI naming)
- Region: us-east5 (common region where Claude is available)

If this doesn't work, we may need to:
- Check what models are available in the GCP project
- Use Anthropic API directly instead of Vertex AI
- Verify service account has proper permissions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Keeping region as global per user requirement.
Still trying explicit model: claude-3-5-sonnet-v2@20241022

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Using claude-opus-4-6 model for Vertex AI.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Trying claude-sonnet-4-6 for Vertex AI availability.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed to us-east5 region and removed explicit model parameter
to use the action's default.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added continue-on-error to Claude step
- Show Claude action outcome/conclusion
- Display first 20 lines of report if created
- Always run check steps to see what happened

This will help diagnose why the analysis report isn't being created.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
If Claude action doesn't succeed, generate a fallback analysis
report that:
- Explains Claude failed
- Shows the action outcome
- Provides debug steps
- Lists what to check in GCP/Vertex AI

This allows the workflow to complete successfully and send a
Slack notification with useful debugging info instead of just
'artifact not found'.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
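
The fallback report described in the commit above might be generated along these lines. This is a sketch under assumptions: `fallback_report` and `ensure_report` are hypothetical names, `outcome` is assumed to be the step outcome string reported by GitHub Actions (for example `success` or `failure`), and the wording of the report is illustrative.

```python
from pathlib import Path


def fallback_report(outcome: str, workflow_name: str) -> str:
    """Build a short markdown report explaining that the analysis failed."""
    return "\n".join([
        "**AI analysis unavailable**",
        "",
        f"The Claude analysis step for {workflow_name} finished with "
        f"outcome `{outcome}` and did not produce a report.",
        "",
        "**Things to check**",
        "- Model availability in the configured Vertex AI region",
        "- Service account permissions",
        "- The path where the report was expected",
        "",
    ])


def ensure_report(report_path: str, outcome: str, workflow_name: str) -> bool:
    """Write a fallback report if needed; return True when one was written."""
    path = Path(report_path)
    if outcome == "success" and path.is_file():
        return False  # a real report exists, leave it untouched
    path.write_text(fallback_report(outcome, workflow_name), encoding="utf-8")
    return True
```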
Added:
- allowed_tools: Read, Grep, Glob, Bash so Claude can access code
- Explicit instruction to create analysis-report.md using cat/heredoc

The action runs Claude but doesn't automatically create files from
the prompt. Claude needs explicit tools and instructions to write
the analysis-report.md file.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Following the pattern from PR #3170, moved all analysis logic
into a proper Claude Code skill.

Added:
- .claude/commands/analyze-test-failures.md
  - Skill definition with usage, implementation details
  - Follows repo's slash command pattern
  - Contains all analysis instructions and report format

Changed:
- .github/workflows/analyze-and-notify.yml
  - Simplified prompt to just invoke the skill
  - Removed ~60 lines of inline instructions
  - Cleaner, more maintainable workflow

Benefits:
- Single source of truth for analysis logic
- Can be tested locally: `claude /analyze-test-failures`
- Can be improved independently of workflow
- Follows established repo conventions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
StackRox Automation and others added 8 commits May 20, 2026 22:41
The claude-code-base-action doesn't auto-load skills from
.claude/commands/, so it was asking for approval to run an
unknown command.

Solution: Inline the instructions in the workflow prompt.

The skill file (.claude/commands/analyze-test-failures.md) remains
as documentation and can be used when running Claude CLI locally,
but the workflow uses the instructions directly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
After testing, claude-code-base-action doesn't auto-execute skills
from .claude/commands/ the same way the CLI does.

Solution:
- Keep .claude/commands/analyze-test-failures.md as documentation
- Inline the instructions in the workflow for reliability
- Condensed to ~30 lines (readable, maintainable)

The skill file is still valuable for:
- Local testing: read it to understand the approach
- Documentation of the analysis methodology
- Future migration if action adds skill support

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Using append_system_prompt to tell Claude that commands from
.claude/commands/ are trusted and should execute without approval.

This approach:
- Keeps the workflow prompt minimal (4 lines)
- Trusts the skill definition in .claude/commands/analyze-test-failures.md
- Uses /analyze-test-failures command directly

If this works, we get:
✅ DRY - single source of truth in skill file
✅ Minimal workflow code
✅ Same skill works locally and in CI

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Instead of invoking /analyze-test-failures directly (which requires approval),
tell Claude to:
1. Read the skill file: .claude/commands/analyze-test-failures.md
2. Follow its instructions
3. Auto-approve Read, Grep, Glob, Bash tools via settings

This approach:
- Uses action's settings feature for permissions
- Skill file is the source of truth
- Claude reads and executes the documented steps
- No approval prompts needed

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created proper plugin structure:
- .claude/plugins/test-failure-analyzer/manifest.json
- .claude/plugins/test-failure-analyzer/analyze-test-failures.md

Using action's built-in plugin support:
- plugin_marketplaces: file://.claude/plugins
- plugins: test-failure-analyzer

Workflow now:
1. Action loads plugin from local .claude/plugins/
2. Plugin provides /analyze-test-failures command
3. Workflow invokes it with simple prompt

This is the proper way to use the action's plugin feature.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added:
- allowed_tools: Bash so Claude can create files
- Explicit reminder in prompt to create the file
- Fallback step to capture Claude's text output and write to file

If Claude doesn't create analysis-report.md (as we saw - it just
returned text), this fallback reads the execution output JSON and
writes the result field to analysis-report.md.

This ensures we always get a report, either:
1. Claude creates it directly (preferred)
2. We capture Claude's output and write it (fallback)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
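
A sketch of the capture-and-write fallback from the commit above follows. The exact shape of the execution output JSON is an assumption here: the code accepts either a single object or a list of message objects and takes the last one that carries a string `result` field. `extract_result` and `write_report_from_output` are hypothetical names.

```python
import json
from pathlib import Path


def extract_result(execution_file: str):
    """Return the last string 'result' field found in the output, or None."""
    data = json.loads(Path(execution_file).read_text(encoding="utf-8"))
    messages = data if isinstance(data, list) else [data]
    for msg in reversed(messages):
        if isinstance(msg, dict) and isinstance(msg.get("result"), str):
            return msg["result"]
    return None


def write_report_from_output(execution_file: str,
                             report_path: str = "analysis-report.md") -> bool:
    """Ensure report_path exists; return False if nothing could be written."""
    report = Path(report_path)
    if report.is_file():
        return True  # preferred path: Claude already created the file
    result = extract_result(execution_file)
    if result is None:
        return False
    report.write_text(result + "\n", encoding="utf-8")
    return True
```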
Changes to skill:
- Changed from 'IMPORTANT' to 'CRITICAL: File Creation Step'
- Added actual template in the bash command example
- Emphasized 'DO NOT just summarize' - must create file
- Explained workflow depends on file existing

Changes to workflow prompt:
- Changed to 'CRITICAL REQUIREMENT'
- Added explicit bash command example in prompt
- Emphasized file MUST exist for workflow to continue

Changes to check step:
- Search entire workspace with find command
- Show current directory
- List all files to debug where Claude is running

This should make it unmistakably clear that creating the file
is not optional.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The heredoc with GitHub expressions was causing YAML parsing errors.
Changed to use echo statements instead of cat heredoc.

This avoids the YAML parser trying to interpret the markdown content
with ** as YAML aliases.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2 participants