audit: classify api round-trip failures by measured divergence and rank a worklist #225

Merged: webern merged 2 commits into main from claude/tender-mccarthy-ye7mii on Jun 20, 2026

@webern webern commented Jun 20, 2026

Owner

Summary

Reworks the api round-trip failure classifier (audit/classify.py) to rank fixes purely by measured behavior, and produces the ranked worklist #212 asked for.

The classifier keyed its categories on the hand-authored support attribute in data/api.features.xml — a prediction of round-trip behavior, not a measurement, and demonstrably wrong (part-group was marked full yet dropped data, #224). Ranking built on a fallible prediction is untrustworthy.

What changed:

Ran natively over the corpus: 828 failing files, 0 crashes, 550 reorder-free candidates. The top signals are add: signatures, where mx::api injects elements the source lacked (encoding, identification, type). The old support-based classifier could never surface these. The first two fixes (stop emitting an empty <encoding/>, preserve part-name/@print-object) land 26 files; 15 fixes land 71.

The individual fixes are filed as separate issues under #208 / #213.

Testing

  • make test-audit (classifier unit tests): 19 cases, green
  • Rewrote the classifier tests for the measured model (signatures, distance, candidate/reorder split, sole-blocker worklist, greedy batch plan)
  • End-to-end: built mxtest-api-roundtrip, ran make dump-api-roundtrip (828 files) and make classify-api-roundtrip — produces the worklist + batch plan

References

webern added 2 commits June 20, 2026 18:19
…upport opinion

The classifier keyed categories (B/D/E/G) on the hand-authored `support`
attribute in data/api.features.xml -- a prediction of round-trip behavior, not a
measurement, and demonstrably wrong (part-group was marked full yet dropped
data, #224). Ranking built on a fallible prediction is untrustworthy.

Rework classification to be grounded only in what each expected/actual pair
actually shows:

- Drop the api.features.xml cross-reference entirely. Whether a drop is intended
  is a present-day human call (#214), not something the classifier asserts.
- Reduce every difference to a signature (drop/add/value/attr/reorder); a file's
  distance to passing is its count of unique signatures (a tag dropped many
  times is one signature).
- value/attr now come from the alignment walk (recurse SequenceMatcher equal
  blocks) so they survive sibling drops; drops/adds stay on the O(n) multiset.
- Status is PASS/FAIL/CRASH. A FAIL with no reorder is a candidate; reorders are
  expected mx::api behavior, deferred to test normalization (#214).
- Worklist ranks signatures by candidate files they are the sole blocker of,
  then by total files blocked; report adds a distance histogram and distance-1..3
  near-miss buckets so small fix-sets are visible.
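
The signature model described in the list above can be sketched roughly as follows. This is an illustration only, not the code in audit/classify.py: the function names (`signatures`, `_walk`, `classify`, `is_candidate`) and the exact signature strings other than the `add:` prefix are assumptions, and text comparison is simplified to stripped element text.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from difflib import SequenceMatcher


def signatures(expected: ET.Element, actual: ET.Element) -> set[str]:
    """Reduce every difference between two trees to a set of unique signatures."""
    sigs: set[str] = set()
    # drop/add: compare tag multisets over the whole tree (linear in node count).
    exp_tags = Counter(el.tag for el in expected.iter())
    act_tags = Counter(el.tag for el in actual.iter())
    sigs.update(f"drop:{tag}" for tag in exp_tags - act_tags)
    sigs.update(f"add:{tag}" for tag in act_tags - exp_tags)
    # value/attr/reorder: walk only the children that align.
    _walk(expected, actual, sigs)
    return sigs


def _walk(exp: ET.Element, act: ET.Element, sigs: set[str]) -> None:
    if (exp.text or "").strip() != (act.text or "").strip():
        sigs.add(f"value:{exp.tag}")
    for name in exp.attrib.keys() | act.attrib.keys():
        if exp.attrib.get(name) != act.attrib.get(name):
            sigs.add(f"attr:{exp.tag}/@{name}")
    exp_kids, act_kids = list(exp), list(act)
    exp_seq = [kid.tag for kid in exp_kids]
    act_seq = [kid.tag for kid in act_kids]
    # Same children in a different order counts as a reorder, not a drop/add.
    if exp_seq != act_seq and Counter(exp_seq) == Counter(act_seq):
        sigs.add(f"reorder:{exp.tag}")
    # Recurse into SequenceMatcher's equal blocks so that a dropped or added
    # sibling does not shift the pairing of the elements that did survive.
    matcher = SequenceMatcher(None, exp_seq, act_seq, autojunk=False)
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            _walk(exp_kids[i + k], act_kids[j + k], sigs)


def classify(sigs: set[str], crashed: bool = False) -> str:
    """Status is PASS, FAIL, or CRASH."""
    if crashed:
        return "CRASH"
    return "FAIL" if sigs else "PASS"


def is_candidate(sigs: set[str]) -> bool:
    """A FAIL with no reorder signature is a candidate."""
    return bool(sigs) and not any(s.startswith("reorder:") for s in sigs)


expected = ET.fromstring(
    '<part><part-name print-object="no">Violin</part-name>'
    "<group/><group/><note><pitch>C</pitch></note></part>"
)
actual = ET.fromstring(
    "<part><encoding/><part-name>Violin</part-name>"
    "<note><pitch>D</pitch></note></part>"
)
sigs = signatures(expected, actual)
print(sorted(sigs))  # two dropped <group/> elements collapse to one signature
print("distance:", len(sigs), "status:", classify(sigs))
```

In this sketch the distance to passing is simply `len(sigs)`, so a tag dropped many times still contributes one unit of distance.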

Rewrite the classifier tests for the measured model and update the design doc,
audit/README, the explain-api-roundtrip skill, and the CLI help to match.

The signature worklist ranks fixes independently, but most candidate files need
several fixes before they pass strict comparison. Add build_batch_plan: a greedy
set-cover that, at each step, picks the signature clearing the most candidate
files outright, answering #212's "minimal changes -> most files" directly. The
report gains a batch_plan section and the stdout summary prints it.
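
To make the difference between the independent worklist and the greedy batch plan concrete, here is a minimal sketch over a toy mapping of candidate file to signature set. It is not the real implementation: `rank_worklist`, the return shapes, the file names, and the fallback used when no single fix clears a file are all assumptions made for illustration. Only the name build_batch_plan and the "pick the signature clearing the most candidate files outright" rule come from the commit message.

```python
from collections import Counter


def rank_worklist(files: dict[str, set[str]]) -> list[tuple[str, int, int]]:
    """Rank signatures by sole-blocker count, then by total files blocked."""
    sole: Counter[str] = Counter()
    total: Counter[str] = Counter()
    for sigs in files.values():
        total.update(sigs)
        if len(sigs) == 1:  # this signature is the only thing blocking the file
            sole.update(sigs)
    rows = ((sig, sole[sig], total[sig]) for sig in total)
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


def build_batch_plan(files: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Greedy plan: at each step fix the signature that lands the most files."""
    remaining = {name: set(sigs) for name, sigs in files.items() if sigs}
    plan: list[tuple[str, list[str]]] = []
    while remaining:
        # Count, per signature, the files that would pass outright if it were fixed.
        cleared: Counter[str] = Counter()
        for sigs in remaining.values():
            if len(sigs) == 1:
                cleared.update(sigs)
        if cleared:
            sig = min(cleared, key=lambda s: (-cleared[s], s))
        else:
            # Assumed fallback: nothing clears a file yet, so take the most
            # widely shared signature to make progress.
            blocked: Counter[str] = Counter()
            for sigs in remaining.values():
                blocked.update(sigs)
            sig = min(blocked, key=lambda s: (-blocked[s], s))
        for sigs in remaining.values():
            sigs.discard(sig)
        landed = sorted(name for name, sigs in remaining.items() if not sigs)
        for name in landed:
            del remaining[name]
        plan.append((sig, landed))
    return plan


# Hypothetical candidate files and their signature sets.
candidates = {
    "a.xml": {"add:encoding"},
    "b.xml": {"add:encoding"},
    "c.xml": {"add:encoding", "attr:part-name/@print-object"},
    "d.xml": {"attr:part-name/@print-object"},
    "e.xml": {"drop:group", "value:pitch"},
}
for sig, sole, total in rank_worklist(candidates):
    print(f"{sig}: sole blocker of {sole}, blocks {total}")
for step, (sig, landed) in enumerate(build_batch_plan(candidates), start=1):
    print(f"step {step}: fix {sig} -> lands {landed}")
```

In the toy data, c.xml is not counted as a sole-blocker win for either signature in the worklist, but the batch plan lands it at step 2 because the step 1 fix has already removed its other blocker.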

On the current corpus the first two fixes -- stop emitting an empty <encoding/>
and preserve part-name/@print-object -- land 26 files; 15 fixes land 71.
@webern webern added the testing, non-breaking, and ai labels on Jun 20, 2026 (with Claude)
@github-actions

gen-quality gen/

gen-quality: 84.5 / 100   (floor 84.5, +0.0)

  structure     86.5  x0.50   [fn 90.5 / file 82.6]
  cyclomatic    88.4  x0.25
  cognitive     76.6  x0.25

  409 functions across 31 files, 7702 lines (largest file 1044)
  max cc 56  max cognitive 44  max fn loc 152

Worst offenders (top 5 per axis; full lists in score.json):
  cyclomatic gen/xsd/analyze.py:311     report                             56
  cyclomatic gen/plates/build.py:956    _validate_config_against_ir        35
  cyclomatic gen/press/context.py:145   plate_context                      34
  cyclomatic gen/__main__.py:46         _ir                                23
  cyclomatic gen/tests/test_ir.py:102   _check_references                  20
  cognitive  gen/xsd/analyze.py:311     report                             44
  cognitive  gen/ir/resolve.py:119      flat_elements                      40
  cognitive  gen/tests/test_ir.py:102   _check_references                  38
  cognitive  gen/press/context.py:145   plate_context                      37
  cognitive  gen/xsd/analyze.py:207     _sccs                              37
  size       gen/xsd/analyze.py:311     report                             152
  size       gen/press/context.py:145   plate_context                      96
  size       gen/plates/build.py:533    _value_plate                       89
  size       gen/plates/build.py:956    _validate_config_against_ir        89
  size       gen/ir/resolve.py:119      flat_elements                      78

Commit 6d4db346b0eb9ec5b90b2551208453b9c343e0dd.

@github-actions

Coverage report

Core-dev coverage src/private/mx/core/

Metric      Coverage   Covered / Total
Lines       77.9%      28539 / 36624
Functions   74.4%      6360 / 8550
Branches    50.7%      22672 / 44725

API coverage src/private/mx/{api,impl,utility}/

Metric      Coverage   Covered / Total
Lines       72.7%      5428 / 7468
Functions   60.3%      1831 / 3034
Branches    43.7%      4532 / 10375

Core HTML report | API HTML report

Commit 6d4db346b0eb9ec5b90b2551208453b9c343e0dd.

@webern webern merged commit 9c19b20 into main Jun 20, 2026
7 checks passed
@webern webern deleted the claude/tender-mccarthy-ye7mii branch June 20, 2026 20:56

Labels

ai: Issues opened by, or through, a coding agent.
non-breaking: fixes or implementation that do not require breaking changes
testing

Development

Successfully merging this pull request may close these issues.

api: rank round-trip failure fixes by files unblocked vs effort