167 changes: 167 additions & 0 deletions .claude/skills/explain-api-roundtrip/SKILL.md
@@ -0,0 +1,167 @@
---
name: explain-api-roundtrip
description: >
  Use this skill to explain, in plain language, what is wrong with the `mx::api` round-trip and what
  it needs next. It drives the failure classifier (dump -> classify) over the corpus, then reads
  build/api/classified.json and turns it into a prioritized, human-readable explanation grouped by
  failure mode (hard crash, dropped supported elements, reorder, by-design drop, audit blind spot).
  Invoke for requests like "what's broken about mx::api", "explain the round-trip failures",
  "what does the api need next", or "triage the api round-trip".
argument-hint: "<optional: a category or element to focus on>"
disable-model-invocation: false
user-invocable: true
---
# Explain the `mx::api` round-trip

`mx::api` is a deliberate subset of MusicXML, so some round-trip data loss is by design. This skill
separates the by-design losses from the real defects and produces a plain-English answer to two
questions: what is broken, and what should we fix next.

It is the read-out layer on top of the failure classifier (issue #211): the classifier produces a
machine-readable `build/api/classified.json`; this skill interprets it for a human.

## How it works

The pipeline has two steps, kept separate on purpose (see `audit/README.md`):

1. `make dump-api-roundtrip` — runs the api pipeline over every corpus file and writes the
normalized expected/actual XML pairs to `build/api/roundtrip-dump/`. Slow: it builds the C++
harness and runs ~800 files. Re-run only when api/impl code changed.
2. `make classify-api-roundtrip` — pure Python, fast. Diffs each pair as an element multiset
(`Counter(expected) - Counter(actual)`), cross-references `data/api.features.xml`, and writes
`build/api/classified.json` plus a stdout summary.
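
The multiset diff in step 2 can be sketched as follows. This is a minimal illustration, not the classifier's actual code: `tag_multiset` and `missing_elements` are hypothetical helper names, and the sketch assumes the comparison is keyed on element tag names only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_multiset(xml_text: str) -> Counter:
    """Count how often each element tag occurs, ignoring document order."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def missing_elements(expected_xml: str, actual_xml: str) -> Counter:
    """Tags that occur more often in expected than in actual."""
    # Counter subtraction keeps only positive counts, so elements that are
    # extra in the actual output do not appear in the result.
    return tag_multiset(expected_xml) - tag_multiset(actual_xml)


# Hypothetical sample pair: one <lyric> is lost and the order changes.
expected = "<note><pitch/><lyric/><lyric/></note>"
actual = "<note><lyric/><pitch/></note>"
print(missing_elements(expected, actual))  # Counter({'lyric': 1})
```

Because the counts are order-free, the reordering of `<pitch>` and `<lyric>` does not register as a loss here; only the dropped `<lyric>` does.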

The categories in the JSON (`primary_category`):

| id | meaning |
|----|---------|
| B | drop-only: every missing element is `support="none"` — a by-design subset drop |
| C | reorder-only: same elements, different order |
| D | enum bug: a value maps to a known-missing enum member |
| E | missing attribute: a `partial` feature dropped one attribute |
| F | pipeline error: LOADFAIL/GETDATAFAIL/CREATEFAIL — no output produced (a crash) |
| unknown | a FAIL that matched none of the above — usually a `support="full"` element that was dropped (a real bug) or an element not tracked in `api.features.xml` |
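
As an illustration of the category B rule in the table, the check below returns true only when every missing element is audited as `support="none"`. It is a sketch under assumptions: `is_by_design_drop` is a hypothetical name, the `support` mapping (tag to support level) mirrors the one built in the Step 2 analyses, and the sample support levels are invented for the example.

```python
def is_by_design_drop(missing: dict[str, int], support: dict[str, str]) -> bool:
    """True when something is missing and every missing tag is support="none"."""
    # An untracked tag (absent from the support index) does not count as
    # by-design, so the file would not qualify for category B.
    return bool(missing) and all(support.get(tag) == "none" for tag in missing)


# Invented sample data, not taken from data/api.features.xml.
support = {"lyric": "full", "bookmark": "none", "link": "none"}

print(is_by_design_drop({"bookmark": 2, "link": 1}, support))   # True
print(is_by_design_drop({"bookmark": 2, "lyric": 1}, support))  # False
print(is_by_design_drop({"grouping": 1}, support))              # False
```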

## Procedure

### Step 1 — produce the data

If `build/api/roundtrip-dump/` is empty or stale (api/impl changed since it was written), run:

```
make dump-api-roundtrip
```

Then always run:

```
make classify-api-roundtrip
```

Read the stdout summary it prints — that is the top-level shape (counts per category + the worklist
of features blocking the most files).

### Step 2 — mine `build/api/classified.json`

Run these read-only analyses (they join the classifier output against the support index). Adjust the
path if `--out` was overridden.

Top dropped elements, with their audited support level (the key signal — `support="full"` drops are
bugs, `support="none"` drops are by design):

```
python3 - <<'PY'
import json, re
from collections import Counter
d = json.load(open("build/api/classified.json"))
support = {m.group(1): m.group(2) for m in
           re.finditer(r'name="([^"]+)" support="([a-z]+)"', open("data/api.features.xml").read())}
miss = Counter()
for r in d["files"]:
    for tag in r["missing_element_counts"]:
        miss[tag] += 1  # files affected
for tag, n in miss.most_common(25):
    print(f"{n:>4} files {tag:<18} support={support.get(tag, 'NOT-IN-INDEX')}")
PY
```

Pipeline-error (crash) cluster — group by file/feature to find the common root:

```
python3 - <<'PY'
import json
d = json.load(open("build/api/classified.json"))
for r in d["files"]:
    if r["primary_category"] == "F":
        print(r["pipeline_error_kind"], r["file"])
PY
```

Reorder cluster — where in the tree the order diverges:

```
python3 - <<'PY'
import json
from collections import Counter
d = json.load(open("build/api/classified.json"))
paths = Counter(r["first_divergence_path"] for r in d["files"] if r["primary_category"] == "C")
for path, n in paths.most_common(10):
    print(f"{n:>4} {path}")
PY
```

What is driving the `unknown` bucket (the warnings on stderr from Step 1 also list this; this is the
programmatic view):

```
python3 - <<'PY'
import json, re
from collections import Counter
d = json.load(open("build/api/classified.json"))
support = {m.group(1): m.group(2) for m in
           re.finditer(r'name="([^"]+)" support="([a-z]+)"', open("data/api.features.xml").read())}
full_drop, untracked = Counter(), Counter()
for r in d["files"]:
    if r["primary_category"] != "unknown":
        continue
    for tag in r["missing_elements"]:
        s = support.get(tag)
        if s in ("full", "partial"):
            full_drop[tag] += 1  # claimed supported but dropped -> bug
        elif s is None:
            untracked[tag] += 1  # not in api.features.xml -> audit gap
print("supported-but-dropped:", full_drop.most_common(10))
print("untracked:", untracked.most_common(10))
PY
```

To drill into one file, look at its record (`missing_elements`, `mismatch_type`,
`first_divergence_path`) and diff the pair directly:
`diff build/api/roundtrip-dump/<flat>.expected.xml build/api/roundtrip-dump/<flat>.actual.xml`
where `<flat>` is the corpus path with `/` replaced by `__`.
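
The naming rule above can be sketched as a small helper. `dump_pair` is a hypothetical name, and the sample corpus path is invented; whether the real `<flat>` keeps the file extension or is relative to `data/` is not stated here, so treat the exact output as an assumption.

```python
from pathlib import PurePosixPath


def dump_pair(corpus_path: str, dump_dir: str = "build/api/roundtrip-dump"):
    """Return the (expected, actual) dump paths for one corpus file."""
    # Flatten the corpus path by replacing every "/" with "__".
    flat = corpus_path.replace("/", "__")
    base = PurePosixPath(dump_dir)
    return base / f"{flat}.expected.xml", base / f"{flat}.actual.xml"


# Invented corpus path, for illustration only.
exp, act = dump_pair("suite/example.xml")
print(exp)
print(act)
```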

### Step 3 — write the explanation

Synthesize the findings into plain language grouped by failure mode, ordered by severity. Use this
structure (fill the numbers and element names from Step 2; do not invent them):

1. Frame it: `mx::api` is a subset, so some loss is by design — separate that from the real defects.
2. Hard crashes (category F). Highest severity: no output at all. Name the cluster (the crash
analysis usually points at one feature). These are bugs.
3. Dropped supported elements (the `support="full"`/`partial` rows from the top-dropped table and the
`supported-but-dropped` view). Either an impl round-trip bug or `api.features.xml` overstates
support — say which needs checking, per element.
4. Reorder (category C). Lower severity: content intact, order wrong. Name the divergence path.
5. By-design drops (category B): mention briefly — these are expected subset behavior, not bugs.
6. Audit blind spots: the `untracked` view — elements dropped but not in `api.features.xml`, so they
can't be categorized. Recommend running `api-feature-audit` to close the gap.

Then give a prioritized "what it needs" list. Be honest about caveats: the comparison is strict
full-DOM, so if the pinned baseline (`roundtrip-baseline.txt`) has not been grown yet, almost the
whole corpus shows as failing. That is the raw landscape, not a regression.
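
To keep the write-up ordered by severity, a per-category tally like the one below can help. It is only a sketch: `SEVERITY_ORDER` and `category_summary` are hypothetical names, the placement of D and E between `unknown` and C is an assumption (Step 3 does not rank them), and the sample records are invented.

```python
from collections import Counter

# Assumed ordering: crashes first, by-design drops last.
SEVERITY_ORDER = ["F", "unknown", "D", "E", "C", "B"]


def category_summary(records):
    """Return (category, file count) pairs, most severe category first."""
    counts = Counter(r["primary_category"] for r in records)
    return [(cat, counts[cat]) for cat in SEVERITY_ORDER if counts[cat]]


# Invented records shaped like entries of d["files"].
sample = [
    {"primary_category": "B"},
    {"primary_category": "F"},
    {"primary_category": "B"},
    {"primary_category": "C"},
]
for cat, n in category_summary(sample):
    print(f"{cat:<8}{n:>4} files")
```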

## Handing off fixes (if requested)

- To fix a dropped/under-supported element or a crash: use the `add-mx-api-feature` skill.
- To correct or extend support levels in `data/api.features.xml`: use the `api-feature-audit` skill.
- The findings belong under the tracking issue #208; file specifics with the `open-mx-issue` skill.
3 changes: 3 additions & 0 deletions .github/workflows/ci.yaml
@@ -58,6 +58,9 @@ jobs:
- name: Generator tests
  run: make test-gen

- name: Audit tests
  run: make test-audit

- name: plates --check (all targets)
  run: make gen-check

67 changes: 0 additions & 67 deletions .github/workflows/replace-claude.yaml

This file was deleted.

39 changes: 37 additions & 2 deletions Makefile
@@ -65,12 +65,13 @@ FIND_CPP := find src \

.DEFAULT_GOAL := help
.PHONY: help sdk fmt check core-dev check-core-dev test-core-dev test-cpp-unit \
- validate-cpp probe-cpp coverage-core-dev test-gen gen-check \
+ validate-cpp probe-cpp coverage-core-dev test-gen test-audit gen-check \
gen-quality gen-lint \
gen gen-cpp gen-go gen-c gen-schema \
audit audit-force \
build-go build-c test-go test-c \
- lib dev test run-examples test-api-roundtrip discover-api-roundtrip coverage-api \
+ lib dev test run-examples test-api-roundtrip discover-api-roundtrip \
+ dump-api-roundtrip classify-api-roundtrip coverage-api \
clean clean-docker check-docker docker-volume

help:
@@ -84,6 +85,8 @@ help:
	@echo ' make run-examples Build and run all three api example programs.'
	@echo ' make test-api-roundtrip Run the corpus api roundtrip in regression mode (CI gate).'
	@echo ' make discover-api-roundtrip Run discovery mode over the full corpus (manual only).'
	@echo ' make dump-api-roundtrip Dump normalized expected/actual XML for failures.'
	@echo ' make classify-api-roundtrip Classify dumped failures by root cause (Python).'
	@echo ' make coverage-api Instrumented api/impl/utility build + gcovr report.'
	@echo ''
	@echo ' C++ core:'
@@ -97,6 +100,7 @@ help:
	@echo ''
	@echo ' Generator:'
	@echo ' make test-gen Run the generator (parser + IR + plates + press) Python tests.'
	@echo ' make test-audit Run the audit tool Python tests (incl. failure classifier).'
	@echo ' make gen-check plates --check for every target (renames, collisions).'
	@echo ' make gen Run the generator for every target (cpp/go/c/schema).'
	@echo ' make gen-cpp Run the generator for the C++ target (src/private/mx/core/generated).'
@@ -177,6 +181,24 @@ test-api-roundtrip: dev
discover-api-roundtrip: dev
	$(BUILD_ROOT)/api/mxtest-api-roundtrip discovery $(CURDIR)/data

# Dump normalized expected/actual XML for every failing api round-trip.
# Output goes to build/api/roundtrip-dump/ (build dir, already gitignored).
# Feeds the classifier: make dump-api-roundtrip && make classify-api-roundtrip
dump-api-roundtrip: dev
	mkdir -p $(BUILD_ROOT)/api/roundtrip-dump
	$(BUILD_ROOT)/api/mxtest-api-roundtrip discovery $(CURDIR)/data \
		--dump $(CURDIR)/$(BUILD_ROOT)/api/roundtrip-dump

# Classify api round-trip failures by root cause.
# Reads the dump produced by dump-api-roundtrip; writes build/api/classified.json.
# Fast (pure Python); kept separate from the slow dump step so classification
# logic can be re-run without re-dumping. Pass DUMP_DIR=path to override.
DUMP_DIR ?= $(BUILD_ROOT)/api/roundtrip-dump
classify-api-roundtrip:
	python3 -m audit classify $(DUMP_DIR) \
		--data $(CURDIR)/data \
		--out $(BUILD_ROOT)/api/classified.json

# Instrumented api coverage: build mx+mxtest with --coverage, run all
# suites, produce gcovr report for src/private/mx/{api,impl,utility}/.
coverage-api:
@@ -299,6 +321,10 @@ coverage-core-dev:
test-gen:
	python3 -m unittest discover -s gen/tests -t . $(ARGS)

# Audit tool Python tests (feature-audit + the round-trip failure classifier).
test-audit:
	python3 -m unittest discover -s audit/tests -t . $(ARGS)

# plates --check for every target: validates renames and detects identifier
# collisions (a CI gate, like test-gen).
gen-check:
@@ -406,6 +432,12 @@ test-api-roundtrip: $(DOCKER_STAMP) docker-volume
discover-api-roundtrip: $(DOCKER_STAMP) docker-volume
	$(DOCKER_RUN) make discover-api-roundtrip BUILD_TYPE=$(BUILD_TYPE)

dump-api-roundtrip: $(DOCKER_STAMP) docker-volume
	$(DOCKER_RUN) make dump-api-roundtrip BUILD_TYPE=$(BUILD_TYPE)

classify-api-roundtrip: $(DOCKER_STAMP) docker-volume
	$(DOCKER_RUN) make classify-api-roundtrip BUILD_TYPE=$(BUILD_TYPE)

coverage-api: $(DOCKER_STAMP) docker-volume
	@rm -rf $(COV_DIR)/api
	$(DOCKER_RUN) make coverage-api BUILD_TYPE=$(BUILD_TYPE) ARGS='$(ARGS)'
@@ -437,6 +469,9 @@ coverage-core-dev: $(DOCKER_STAMP) docker-volume
test-gen: $(DOCKER_STAMP)
	$(DOCKER_RUN) make test-gen ARGS='$(ARGS)'

test-audit: $(DOCKER_STAMP)
	$(DOCKER_RUN) make test-audit ARGS='$(ARGS)'

gen-check: $(DOCKER_STAMP)
	$(DOCKER_RUN) make gen-check

26 changes: 26 additions & 0 deletions audit/README.md
@@ -37,6 +37,32 @@ python3 -m audit all [--force] # both
common case (a new corpus file was added) only writes the new sidecar. Use
`--force` when the output format itself changes.

## Classifying api round-trip failures

```
make dump-api-roundtrip # C++: write normalized expected/actual XML pairs
make classify-api-roundtrip # Python: classify those failures by root cause

python3 -m audit classify <dump_dir> [--data DIR] [--out FILE]
```

`classify` reads the dump directory produced by `make dump-api-roundtrip`
(`build/api/roundtrip-dump/`), diffs each expected/actual pair as an order-free
element **multiset** (`Counter(expected) - Counter(actual)`), cross-references
`data/api.features.xml`, and assigns each non-passing file a root-cause category
(drop-only, reorder-only, enum bug, missing attribute, pipeline error). It writes
`build/api/classified.json` and prints a worklist of the features blocking the
most files. The two steps are kept separate: dumping is slow (runs the C++
pipeline over the whole corpus), classifying is fast (pure Python), so the
classification logic can be iterated without re-dumping. See
`docs/ai/design/api-roundtrip-classifier.md`.
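
For intuition, the reorder-only case (same element multiset, different document order) can be sketched as below. The helper names `tag_sequence` and `is_reorder_only` are hypothetical and this is not the classifier's code; the sketch compares tag names only and ignores attributes and text.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_sequence(xml_text: str) -> list[str]:
    """Element tags in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


def is_reorder_only(expected_xml: str, actual_xml: str) -> bool:
    """True when both documents hold the same tags but in a different order."""
    exp, act = tag_sequence(expected_xml), tag_sequence(actual_xml)
    return exp != act and Counter(exp) == Counter(act)


# Invented sample documents.
print(is_reorder_only("<m><a/><b/></m>", "<m><b/><a/></m>"))  # True
print(is_reorder_only("<m><a/><b/></m>", "<m><a/></m>"))      # False
```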

## Tests

```
make test-audit # python3 -m unittest discover -s audit/tests -t .
```

## Audited set

The audited files are exactly those the `corert` round-trip suite processes (see