Skip to content

Gate automated review on risk/complexity scores and test coverage#6249

Open
msujaws wants to merge 2 commits into
mozilla:masterfrom
msujaws:feature-1-risk-gating
Open

Gate automated review on risk/complexity scores and test coverage#6249
msujaws wants to merge 2 commits into
mozilla:masterfrom
msujaws:feature-1-risk-gating

Conversation

@msujaws

@msujaws msujaws commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Depends on #6248.

Adds a cheap pre-screen so the expensive review agent only runs on revisions that score low on risk and complexity, letting teams ease in by raising thresholds as confidence grows. Also adds reviewer-group resolution (with aliases) so gating can be configured per group.
Library (bugbug/tools/code_review/):

  • diff_analysis.py — classify in-diff test signal (absent / sparse / adequate / tests_only / no_code_change) and collect changed non-test paths.
  • test_coverage.py — probe mozilla-central via the async searchfox client for existing tests referencing changed files; degrades gracefully if unavailable.
  • scoring.py — RiskComplexityScorer: one structured Haiku call returning 0-10 risk/complexity plus factors, fed the precomputed <test_signals> block.
  • agent.py — inject the test-signals block into the prompt; surface token usage and tool-call counts in the response.
  • llms.py — DEFAULT_SCORING_MODEL and a shared usage_from_messages() helper.

Service (reviewhelper-api):

  • reviewer_groups.py + reviewer_groups.yaml — per-group thresholds (with defaults) from a checked-in YAML; matching_groups() resolves a revision's groups via the new PhabricatorPatch plumbing. An aliases field lets a rotation project (e.g. home-newtab-reviewers-rotation) count as the same logical group, matched against historical reviewer groups.
  • review_processor.py — _score_and_gate() scores before reviewing and raises ReviewSkipped("above_threshold") when either score meets its threshold.
    • ReviewStatus.SKIPPED (terminal, no retry) + skipped_reason and risk/complexity/test-signal columns on review_requests, recorded even when skipped; alembic migration + status index.
    • internal.py records skips as SKIPPED (HTTP 204); pricing.py adds per-model token pricing for later cost reporting.

msujaws added 2 commits June 30, 2026 11:45
Adds a cheap pre-screen so the (expensive) review agent only runs on revisions
that score low on risk and complexity, letting teams "ease in" by raising
thresholds as confidence grows.

Library (bugbug/tools/code_review/):
- diff_analysis.py: classify the patch's in-diff test signal (absent / sparse /
  adequate / tests_only / no_code_change) and collect changed non-test paths.
- test_coverage.py: probe mozilla-central via the async searchfox client for
  existing tests referencing the changed files (covered / partial / uncovered);
  degrades gracefully if searchfox is unavailable.
- scoring.py: RiskComplexityScorer, a single structured Haiku call returning
  0-10 risk/complexity plus factors, fed the precomputed <test_signals> block.
- agent.py: inject the test-signals block into the review prompt and surface
  token usage + tool-call counts in the response details.
- llms.py: DEFAULT_SCORING_MODEL and a shared usage_from_messages() helper.

Service (reviewhelper-api):
- reviewer_groups.py + reviewer_groups.yaml: per-group thresholds (with
  defaults) loaded from a checked-in YAML file; matching_groups() resolves a
  revision's reviewer groups via the new PhabricatorPatch plumbing.
- review_processor.py: _score_and_gate() scores before reviewing and raises
  ReviewSkipped("above_threshold") when either score meets its threshold.
- ReviewStatus.SKIPPED (terminal, no retry) + skipped_reason and
  risk/complexity/test-signal columns on review_requests, recorded even when
  skipped; alembic migration + status index.
- internal.py records skips as SKIPPED (HTTP 204); request.py messaging.
- pricing.py: per-model token pricing for later cost reporting.
Consume PhabricatorPatch.historical_reviewer_project_phids so a group that was assigned via a rotation (added, then swapped for an individual and removed) still matches. Add an `aliases` field to ReviewerGroup so a rotation project (e.g. home-newtab-reviewers-rotation) counts as the same logical group as home-newtab-reviewers, and resolve every slug (own + aliases) when matching. Falls back to the current reviewer snapshot for patches without history.
@msujaws msujaws force-pushed the feature-1-risk-gating branch from 48cbf57 to b403990 Compare June 30, 2026 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant