XY-1073: Add quality scoreboard grammar and adversarial benchmark gates#239

Merged: yvette-carlisle merged 1 commit into main from y/elf-xy-1073 on Jun 22, 2026

Conversation

@yvette-carlisle

Member

Summary

  • Add the P4 quality scoreboard grammar for typed non-pass states, evidence classes, external-adapter manifest boundaries, and unqualified-win claim blocking.
  • Add adversarial quality fixtures covering source authority conflicts, stale current answers, unsupported-claim refusal, private excluded spans, and correction persistence.
  • Add benchmark/runbook docs for the new quality scoreboard and adversarial gate.
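
As a rough illustration of what "typed non-pass states" and "unqualified-win claim blocking" could look like, here is a minimal sketch. It is not the code from this PR: every name in it (`NonPass`, `Outcome`, `EvidenceClass`, `JobResult`, `may_claim_unqualified_win`) is hypothetical, the non-pass variants are simply taken from the adversarial fixture categories listed above, and the rule that only local-fixture evidence can support an unqualified win is an assumption made for the example.

```rust
// Hypothetical sketch: none of these names are taken from the elf-eval crate.

/// Typed reasons a job did not pass, replacing a bare `false`.
/// Variants mirror the adversarial fixture categories in the summary.
#[derive(Debug, Clone, PartialEq, Eq)]
enum NonPass {
    SourceAuthorityConflict,
    StaleCurrentAnswer,
    UnsupportedClaim,
    PrivateSpanLeaked,
    CorrectionNotPersisted,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Outcome {
    Pass,
    NonPass(NonPass),
}

/// Where the evidence for a result came from (assumed classes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvidenceClass {
    LocalFixture,
    ExternalAdapter,
}

#[derive(Debug, Clone)]
struct JobResult {
    name: &'static str,
    outcome: Outcome,
    evidence: EvidenceClass,
}

/// Allow an unqualified "win" claim only when there is at least one result
/// and every result passed on local-fixture evidence (assumed policy).
fn may_claim_unqualified_win(results: &[JobResult]) -> bool {
    !results.is_empty()
        && results.iter().all(|r| {
            r.outcome == Outcome::Pass && r.evidence == EvidenceClass::LocalFixture
        })
}

fn main() {
    let results = vec![
        JobResult {
            name: "correction-persistence",
            outcome: Outcome::Pass,
            evidence: EvidenceClass::LocalFixture,
        },
        JobResult {
            name: "stale-current-answer",
            outcome: Outcome::NonPass(NonPass::StaleCurrentAnswer),
            evidence: EvidenceClass::LocalFixture,
        },
    ];
    for r in &results {
        println!("{}: {:?} ({:?})", r.name, r.outcome, r.evidence);
    }
    println!("unqualified win allowed: {}", may_claim_unqualified_win(&results));
}
```

Encoding the non-pass reason as an enum rather than a boolean lets a scoreboard report why a job failed, and lets the claim gate be a plain predicate over results.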

Validation

  • cargo make real-world-memory-adversarial-quality
  • cargo nextest run -p elf-eval --test real_world_job_benchmark adversarial_quality_fixtures_score_scoreboard_gates adversarial_quality_fixture_catches_unsupported_and_stale_regressions adversarial_quality_repeated_fixture_run_is_deterministic
  • cargo nextest run -p elf-eval --test real_world_job_benchmark real_world_memory_fixtures_report_aggregate_metrics
  • cargo make checks

Handoff Note

  • Decodex produced the original XY-1073 commit but failed before creating a PR; the retained lane entered a failed/orphaned handoff state after the interrupt.
  • Manual recovery amended the generated commit to clarify job-level vs external-adapter typed non-pass scoreboard fields, then reran validation before this PR.

…r and adversarial benchmark gates","authority":"XY-1073"}
@yvette-carlisle yvette-carlisle merged commit 846880f into main Jun 22, 2026
12 checks passed
@yvette-carlisle yvette-carlisle deleted the y/elf-xy-1073 branch June 22, 2026 21:26
