Add Claude AI test failure analysis to Slack notifications #3381
Open
robbycochran wants to merge 35 commits into
Add analyze-failures job that runs within the same workflow when tests fail, generating a markdown report that is included in Slack notifications.
Changes:
- Add analyze-failures job to integration-tests.yml
- Add analyze-failures job to unit-tests.yml
- Modify notify jobs to wait for and include analysis report
- Add analyze_test_failures.py script using Claude via Vertex AI
- Generate markdown reports with root cause, patterns, recommendations
Architecture:
1. Test jobs run (may fail)
2. analyze-failures job downloads artifacts, parses JUnit XML, calls Claude
3. Uploads analysis-report.md as artifact
4. notify job waits for analysis, downloads report, includes in Slack
Benefits:
- Same workflow (no separate workflow_run complexity)
- Job dependencies ensure analysis completes before notify
- Markdown report format easy to include in Slack
- Graceful fallback if analysis fails
- All test context available in same workflow
Required secrets:
- GCP_CLAUDE_SERVICE_ACCOUNT_KEY: Service account with Vertex AI access
- GCP_CLAUDE_PROJECT_ID: GCP project (optional, defaults to rhacs-eng)
- GCP_CLAUDE_REGION: Vertex AI region (optional, defaults to us-central1)
See .github/scripts/README.md for setup instructions.
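Step 2 of this architecture parses JUnit XML before calling Claude. The actual `analyze_test_failures.py` is not shown on this page, so the following is only a minimal, hypothetical sketch of that parsing stage using the standard library. `Failure`, `parse_junit`, and `collect_failures` are illustrative names, and the sketch assumes the common `<testsuites>`/`<testsuite>`/`<testcase>` layout with `<failure>` or `<error>` children.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Failure:
    suite: str
    name: str
    kind: str      # "failure" or "error"
    message: str   # the message attribute, often a one-line summary
    details: str   # element text, typically a stack trace or log excerpt


def parse_junit(xml_text: str) -> List[Failure]:
    """Collect failed or errored test cases from one JUnit XML document."""
    root = ET.fromstring(xml_text)
    failures: List[Failure] = []
    # iter() includes the root itself, so this handles both a <testsuites>
    # wrapper and a bare <testsuite> root.
    for suite in root.iter("testsuite"):
        # Direct children only, so nested suites are not counted twice.
        for case in suite.findall("testcase"):
            for kind in ("failure", "error"):
                node = case.find(kind)
                if node is not None:
                    failures.append(Failure(
                        suite=suite.get("name", ""),
                        name=case.get("name", ""),
                        kind=kind,
                        message=node.get("message", ""),
                        details=(node.text or "").strip(),
                    ))
    return failures


def collect_failures(artifact_dir: Path) -> List[Failure]:
    """Parse every *.xml file under a downloaded-artifacts directory."""
    found: List[Failure] = []
    for path in sorted(artifact_dir.rglob("*.xml")):
        try:
            found.extend(parse_junit(path.read_text()))
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
    return found
```

The resulting list could then be formatted into the prompt sent to Claude; keeping `message` and `details` separate makes it easy to send short summaries first and full traces only when needed.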
Replace Python-based test analysis with Claude Code skill for deeper investigation capabilities:
- Claude can now read source code, git history, and test implementation
- More specific root cause identification with file/line references
- Better platform-specific pattern detection
- Simplified workflow dependencies (no Python setup needed)
The skill has full tool access (Read, Grep, Bash) to investigate failures rather than just summarizing error messages.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use npm install for more reliable package management and better caching support in GitHub Actions. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff:

| | master | #3381 | +/- |
|---|---|---|---|
| Coverage | 27.34% | 27.34% | |
| Files | 95 | 95 | |
| Lines | 5420 | 5420 | |
| Branches | 2545 | 2545 | |
| Hits | 1482 | 1482 | |
| Misses | 3211 | 3211 | |
| Partials | 727 | 727 | |

Flags with carried forward coverage won't be shown.
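The coverage figure can be reproduced from the other rows. Codecov documents its coverage ratio as hits divided by the sum of hits, misses, and partials, and here 1482 + 3211 + 727 = 5420, matching the Lines row. This small check assumes rounding to two decimals; Codecov's rounding mode and precision are configurable, so other repos may see slightly different values.

```python
def codecov_ratio(hits: int, misses: int, partials: int) -> float:
    """Coverage percentage: hits / (hits + misses + partials), two decimals."""
    lines = hits + misses + partials
    return round(100 * hits / lines, 2)


if __name__ == "__main__":
    print(codecov_ratio(1482, 3211, 727))
```

Since the PR only touches workflow and tooling files, none of the counted lines change, which is why every row is identical.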
Replace npm-based CLI installation with official claude-code-base-action for better integration and reliability:
- Direct Vertex AI authentication via use_vertex flag
- Inline prompt with detailed analysis instructions
- Proper GitHub Action outputs and error handling
- No manual installation or dependency management needed
- Better caching and optimization
The action runs headless analysis on test artifacts and generates markdown reports with root cause, evidence, and recommendations.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Unit tests are stable and rarely fail, so AI analysis isn't needed there. Keep analysis only for integration tests where failures are more complex and platform-specific. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Provides a safe way to test the Claude analysis + Slack notification flow before relying on it for production failures.
Usage:
- Add label 'test-oncall-workflow' to any PR, or
- Manually trigger via workflow_dispatch
The test:
1. Creates realistic fake test failure artifacts (JUnit XML)
2. Runs Claude analysis on the failures
3. Posts to Slack with [TEST] prefix
This allows verification that:
- GCP authentication works
- Claude can parse test reports and analyze code
- Slack webhook is configured correctly
- The full end-to-end flow works
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
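How the test workflow produces its fake artifacts is not shown here. As a hypothetical illustration of step 1, this sketch builds a small synthetic JUnit XML report with the standard library; `make_fake_junit` and the suite and test names in the usage block are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def make_fake_junit(suite_name: str, cases: Dict[str, Optional[str]]) -> str:
    """Build a synthetic JUnit XML report.

    `cases` maps a test name to a failure message, or to None if it passed.
    """
    failed = sum(1 for msg in cases.values() if msg is not None)
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(cases)), failures=str(failed), errors="0")
    for name, message in cases.items():
        case = ET.SubElement(suite, "testcase", name=name, classname=suite_name)
        if message is not None:
            failure = ET.SubElement(case, "failure", message=message)
            failure.text = f"Synthetic failure for {name}: {message}"
    root = ET.Element("testsuites")
    root.append(suite)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical test names, purely for illustration.
    print(make_fake_junit("ExampleSuite", {
        "test_passes": None,
        "test_times_out": "timed out waiting for events",
    }))
```

Mixing passing and failing cases keeps the fixture close to a real report, so the analysis step has to pick out the failures rather than summarize everything.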
Clarify what the test workflow actually validates:
- Workflow execution and job dependencies
- Artifact creation and download
- GCP Vertex AI authentication
- Claude action integration
- Slack webhook delivery
- Report generation and formatting
The test uses synthetic JUnit XML files to verify the infrastructure works end-to-end, not to validate Claude's analysis quality on real collector failures.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
GCP_CLAUDE_SERVICE_ACCOUNT_KEY is already configured, no need to include setup instructions. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
New workflows in PRs can't be triggered by PR events until merged to the base branch. Change to workflow_dispatch so it can be tested after merge. Also adds weekly scheduled run to verify the workflow continues to work over time. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Can trigger on PR branch via workflow_dispatch. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Ensures the test workflow uses the same GCP Vertex AI configuration as the production workflow. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- ANTHROPIC_VERTEX_PROJECT_ID from GCP_CLAUDE_PROJECT_ID secret (no default)
- CLOUD_ML_REGION set to global for Vertex AI
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
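Because the project ID now has no default, a missing secret should fail loudly rather than silently targeting the wrong project. This is a hypothetical sketch of that validation; `VertexConfig` and `resolve_vertex_config` are illustrative names, and the fallback to `global` when `CLOUD_ML_REGION` is unset is an assumption that mirrors the value this commit sets.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VertexConfig:
    project_id: str
    region: str


def resolve_vertex_config(env: Mapping[str, str]) -> VertexConfig:
    """Read the Vertex AI settings from environment variables.

    ANTHROPIC_VERTEX_PROJECT_ID has no default, so a missing or blank value
    raises. CLOUD_ML_REGION falls back to "global" (an assumption).
    """
    project = env.get("ANTHROPIC_VERTEX_PROJECT_ID", "").strip()
    if not project:
        raise ValueError("ANTHROPIC_VERTEX_PROJECT_ID is not set")
    region = env.get("CLOUD_ML_REGION", "").strip() or "global"
    return VertexConfig(project_id=project, region=region)


if __name__ == "__main__":
    print(resolve_vertex_config(os.environ))
```

Taking a `Mapping` instead of reading `os.environ` directly keeps the function easy to exercise with plain dictionaries.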
workflow_dispatch doesn't work for workflows that only exist in PR branches. Add push trigger so it runs when the workflow file is updated. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created analyze-and-notify.yml as a reusable workflow that contains the actual Claude analysis and Slack notification logic. Both the production integration-tests workflow and the test workflow now call this same reusable workflow, ensuring:
- No code duplication
- Test runs the exact same logic as production
- Single source of truth for analysis behavior
- Easy to maintain and update
Also fixed CLOUD_ML_REGION from 'global' to 'us-east5': Anthropic models on Vertex AI are not available in the global endpoint.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Per user feedback, should use global region. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Artifacts don't work across jobs in reusable workflows. Combined analyze-failures and notify into a single job so the analysis report can be read directly without artifact upload/download. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separated Claude analysis from Slack notification as requested. Added explicit check for analysis-report.md creation to debug why Claude action isn't generating the file. This will help identify if it's:
- Model availability issue (404 error)
- Permissions issue
- File path issue
The artifact upload now only happens if the report exists, and notify job handles missing artifact gracefully.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The 404 error suggests the default model isn't available. Trying:
- Explicit model: claude-3-5-sonnet-v2@20241022 (Vertex AI naming)
- Region: us-east5 (common region where Claude is available)
If this doesn't work, we may need to:
- Check what models are available in the GCP project
- Use Anthropic API directly instead of Vertex AI
- Verify service account has proper permissions
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
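When chasing these 404s it helps to know exactly which endpoint a request targets, since a 404 commonly means the model ID is not served at that location or is not enabled for the project. Vertex AI exposes Anthropic models through a `rawPredict` endpoint whose host depends on the location: regional locations such as `us-east5` use a `<region>-aiplatform.googleapis.com` host, while `global` uses `aiplatform.googleapis.com`. The sketch below builds that URL; `vertex_anthropic_url` is a hypothetical helper (not part of `claude-code-base-action`), and the layout should be double-checked against the current Vertex AI documentation.

```python
def vertex_anthropic_url(project_id: str, region: str, model: str) -> str:
    """URL of the Vertex AI rawPredict endpoint for an Anthropic model.

    The "global" location is served from the un-prefixed host; regional
    locations (e.g. us-east5) use a "<region>-" host prefix.
    """
    host = ("aiplatform.googleapis.com" if region == "global"
            else f"{region}-aiplatform.googleapis.com")
    return (f"https://{host}/v1/projects/{project_id}/locations/{region}"
            f"/publishers/anthropic/models/{model}:rawPredict")


if __name__ == "__main__":
    # "my-project" is a placeholder; the model ID is the one tried in this commit.
    print(vertex_anthropic_url("my-project", "us-east5",
                               "claude-3-5-sonnet-v2@20241022"))
```

Logging this URL next to the 404 makes it obvious whether the failing combination is the model ID, the region, or the project.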
Keeping region as global per user requirement. Still trying explicit model: claude-3-5-sonnet-v2@20241022 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Using claude-opus-4-6 model for Vertex AI. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Trying claude-sonnet-4-6 for Vertex AI availability. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed to us-east5 region and removed explicit model parameter to use the action's default. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added continue-on-error to Claude step
- Show Claude action outcome/conclusion
- Display first 20 lines of report if created
- Always run check steps to see what happened
This will help diagnose why the analysis report isn't being created.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
If Claude action doesn't succeed, generate a fallback analysis report that:
- Explains Claude failed
- Shows the action outcome
- Provides debug steps
- Lists what to check in GCP/Vertex AI
This allows the workflow to complete successfully and send a Slack notification with useful debugging info instead of just 'artifact not found'.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
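The fallback logic amounts to "write a report only if a usable one is not already there". The workflow step itself is not shown, so this is a hypothetical Python rendering of it; `ensure_report` and `FALLBACK_TEMPLATE` are illustrative names, and the template text is only an example of the four items listed above.

```python
from pathlib import Path

FALLBACK_TEMPLATE = """# Test Failure Analysis (fallback)

Claude analysis did not complete, so no AI report was generated.

**Claude action outcome:** {outcome}

## Debug steps
- Open the workflow run logs for the Claude analysis step
- Check that the service account can call Vertex AI in project `{project}`
- Check that the requested model is available in region `{region}`
"""


def ensure_report(path: Path, outcome: str, project: str, region: str) -> bool:
    """Write a fallback report if Claude did not produce a non-empty one.

    Returns True if the fallback was written, False if a real report exists.
    """
    if path.is_file() and path.stat().st_size > 0:
        return False
    path.write_text(FALLBACK_TEMPLATE.format(
        outcome=outcome, project=project, region=region))
    return True
```

Treating an empty file as missing guards against a step that creates the file but never writes to it.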
Added:
- allowed_tools: Read, Grep, Glob, Bash so Claude can access code
- Explicit instruction to create analysis-report.md using cat/heredoc
The action runs Claude but doesn't automatically create files from the prompt. Claude needs explicit tools and instructions to write the analysis-report.md file.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Following the pattern from PR #3170, moved all analysis logic into a proper Claude Code skill.
Added:
- .claude/commands/analyze-test-failures.md
  - Skill definition with usage, implementation details
  - Follows repo's slash command pattern
  - Contains all analysis instructions and report format
Changed:
- .github/workflows/analyze-and-notify.yml
  - Simplified prompt to just invoke the skill
  - Removed ~60 lines of inline instructions
  - Cleaner, more maintainable workflow
Benefits:
- Single source of truth for analysis logic
- Can be tested locally: `claude /analyze-test-failures`
- Can be improved independently of workflow
- Follows established repo conventions
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The claude-code-base-action doesn't auto-load skills from .claude/commands/, so it was asking for approval to run an unknown command. Solution: Inline the instructions in the workflow prompt. The skill file (.claude/commands/analyze-test-failures.md) remains as documentation and can be used when running Claude CLI locally, but the workflow uses the instructions directly. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
After testing, claude-code-base-action doesn't auto-execute skills from .claude/commands/ the same way the CLI does.
Solution:
- Keep .claude/commands/analyze-test-failures.md as documentation
- Inline the instructions in the workflow for reliability
- Condensed to ~30 lines (readable, maintainable)
The skill file is still valuable for:
- Local testing: read it to understand the approach
- Documentation of the analysis methodology
- Future migration if action adds skill support
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Using append_system_prompt to tell Claude that commands from .claude/commands/ are trusted and should execute without approval.
This approach:
- Keeps the workflow prompt minimal (4 lines)
- Trusts the skill definition in .claude/commands/analyze-test-failures.md
- Uses /analyze-test-failures command directly
If this works, we get:
✅ DRY - single source of truth in skill file
✅ Minimal workflow code
✅ Same skill works locally and in CI
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Instead of invoking /analyze-test-failures directly (which requires approval), tell Claude to:
1. Read the skill file: .claude/commands/analyze-test-failures.md
2. Follow its instructions
3. Auto-approve Read, Grep, Glob, Bash tools via settings
This approach:
- Uses action's settings feature for permissions
- Skill file is the source of truth
- Claude reads and executes the documented steps
- No approval prompts needed
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created proper plugin structure:
- .claude/plugins/test-failure-analyzer/manifest.json
- .claude/plugins/test-failure-analyzer/analyze-test-failures.md
Using action's built-in plugin support:
- plugin_marketplaces: file://.claude/plugins
- plugins: test-failure-analyzer
Workflow now:
1. Action loads plugin from local .claude/plugins/
2. Plugin provides /analyze-test-failures command
3. Workflow invokes it with simple prompt
This is the proper way to use the action's plugin feature.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added:
- allowed_tools: Bash so Claude can create files
- Explicit reminder in prompt to create the file
- Fallback step to capture Claude's text output and write to file
If Claude doesn't create analysis-report.md (as we saw, it just returned text), this fallback reads the execution output JSON and writes the result field to analysis-report.md.
This ensures we always get a report, either:
1. Claude creates it directly (preferred)
2. We capture Claude's output and write it (fallback)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
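The fallback depends on the shape of the execution output file, which is not documented on this page. The sketch below assumes two plausible shapes: a JSON object with a top-level `result` string, or a JSON array of messages whose last `{"type": "result"}` entry carries `result`. Both are assumptions to verify against a real run. `extract_result` and `write_report_from_execution` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Optional


def extract_result(execution: Any) -> Optional[str]:
    """Pull Claude's final text out of the parsed execution output.

    Assumed (unverified) shapes:
    - a JSON object with a top-level "result" string, or
    - a JSON array of messages whose last {"type": "result"} entry
      carries a "result" string.
    """
    if isinstance(execution, dict):
        result = execution.get("result")
        return result if isinstance(result, str) and result.strip() else None
    if isinstance(execution, list):
        for message in reversed(execution):
            if isinstance(message, dict) and message.get("type") == "result":
                return extract_result(message)
    return None


def write_report_from_execution(execution_file: Path, report: Path) -> bool:
    """Fallback: write the report from captured output if Claude didn't."""
    if report.exists():
        return False  # Claude created the file itself (preferred path)
    text = extract_result(json.loads(execution_file.read_text()))
    if text is None:
        return False
    report.write_text(text)
    return True
```

Returning a boolean lets the calling step decide whether to fall through to a generic "analysis failed" report when neither path produced anything.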
Changes to skill:
- Changed from 'IMPORTANT' to 'CRITICAL: File Creation Step'
- Added actual template in the bash command example
- Emphasized 'DO NOT just summarize' - must create file
- Explained workflow depends on file existing
Changes to workflow prompt:
- Changed to 'CRITICAL REQUIREMENT'
- Added explicit bash command example in prompt
- Emphasized file MUST exist for workflow to continue
Changes to check step:
- Search entire workspace with find command
- Show current directory
- List all files to debug where Claude is running
This should make it unmistakably clear that creating the file is not optional.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
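The check step searches the whole workspace and previews the first 20 lines of the report. As a hypothetical Python equivalent of that `find` plus `head -n 20` combination (the real step is a shell command that is not shown), `find_report` returns the shallowest match so a copy at the workspace root wins over one buried in a subdirectory.

```python
from pathlib import Path
from typing import Optional


def find_report(workspace: Path, name: str = "analysis-report.md") -> Optional[Path]:
    """Return the shallowest file called `name` under `workspace`, or None."""
    matches = sorted(workspace.rglob(name), key=lambda p: (len(p.parts), str(p)))
    return matches[0] if matches else None


def preview(path: Path, lines: int = 20) -> str:
    """First `lines` lines of a file, similar to `head -n 20`."""
    with path.open() as fh:
        return "".join(line for _, line in zip(range(lines), fh))
```

Sorting by path depth makes the result deterministic when Claude writes the file from a different working directory than expected.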
The heredoc with GitHub expressions was causing YAML parsing errors. Changed to use echo statements instead of cat heredoc. This avoids the YAML parser trying to interpret the markdown content with ** as YAML aliases. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary
Adds AI-powered test failure analysis that runs within the same workflow and includes intelligent insights in Slack notifications.
New Architecture
Instead of a separate `workflow_run`, this adds an `analyze-failures` job to the existing test workflows:
What Changed
Modified Workflows
.github/workflows/integration-tests.yml
- Add `analyze-failures` job before `notify`
- Modify `notify` to download and include analysis report
.github/workflows/unit-tests.yml
New Files
- `.github/scripts/analyze_test_failures.py`
- `.github/scripts/requirements.txt`
  - `anthropic` - Claude API client
  - `google-auth` - GCP authentication
  - `google-cloud-aiplatform` - Vertex AI
- `.github/scripts/README.md`
Example Output
Before (current):
After (with AI):
Key Benefits
✅ Same workflow - No workflow_run complexity, better job tracking
✅ Job dependencies - `notify` waits for `analyze-failures` to complete
✅ Markdown report - Clean format, easy to include in Slack
✅ Graceful fallback - Still notifies even if analysis fails
✅ Artifact-based - Analysis report uploaded as workflow artifact
✅ Context available - All test artifacts already in the workflow
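The "graceful fallback" benefit comes down to always building a valid Slack message, whether or not a report exists. This hypothetical sketch builds an incoming-webhook JSON body with Block Kit; `build_slack_payload` is an illustrative name, the `[TEST]` prefix follows the test workflow's convention, and truncation uses Slack's 3000-character limit on section block text. Note that Slack renders its own `mrkdwn` dialect, so some Markdown from the report (such as `#` headings or `**bold**`) may not render as expected.

```python
import json
from typing import Optional

SECTION_TEXT_LIMIT = 3000  # Slack's maximum length for a section block's text


def build_slack_payload(title: str, report: Optional[str], test_run: bool = False) -> str:
    """JSON body for a Slack incoming webhook; still notifies without a report."""
    prefix = "[TEST] " if test_run else ""
    if report and report.strip():
        body = report.strip()
    else:
        body = "_AI analysis unavailable; see workflow logs._"
    if len(body) > SECTION_TEXT_LIMIT:
        marker = "\n...(truncated)"
        body = body[: SECTION_TEXT_LIMIT - len(marker)] + marker
    payload = {
        "text": f"{prefix}{title}",  # plain-text fallback for notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": body}},
        ],
    }
    return json.dumps(payload)
```

Truncating here, rather than letting Slack reject the request, means an unusually long analysis still produces a notification.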
Required Secrets
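Add these to repository secrets (see `.github/scripts/README.md` for setup):
- `GCP_CLAUDE_SERVICE_ACCOUNT_KEY`: Service account with Vertex AI access
- `GCP_CLAUDE_PROJECT_ID`: GCP project (optional, defaults to rhacs-eng)
- `GCP_CLAUDE_REGION`: Vertex AI region (optional, defaults to us-central1)
Existing secret used: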
`SLACK_COLLECTOR_ONCALL_WEBHOOK` ✅ Already configured
Setup Instructions
1. Create GCP Service Account
    gcloud iam service-accounts create claude-test-analyzer \
      --display-name="Claude Test Failure Analyzer"
2. Grant Vertex AI Access
3. Create Key
4. Add to GitHub Secrets
Repository Settings → Secrets and variables → Actions → New secret:
Name: `GCP_CLAUDE_SERVICE_ACCOUNT_KEY`, Value: contents of `key.json`
Testing
The workflow will activate on the next test failure. To test manually:
Advantages Over workflow_run Approach
`needs:`
Future Enhancements
Testing Plan
cc @stackrox/acs-collector