Spec 30: real-world beta-normalizer fixtures — capture session data per provider #102

@0bserver07

Description

Goal

Replace synthetic-but-spec-accurate beta-normalizer fixtures with real session data captured from each beta provider. The current fixtures catch structural failure modes, but they cannot catch the next "Cursor v3 conversationId-in-the-key" surprise: that class of bug only surfaces when real local data hits the normalizer.

Why now

HANDOFF #6 has been open since v0.7.0. The defensive empty/malformed coverage from v0.6.1 + the spec-shape coverage from Wave 5 are not enough for normalizer correctness on real data — the codeburn catalog spec doesn't capture every edge case in the wild.

Schema

None. Test-fixture additions only.

User-visible surface

None. CI-only correctness improvement.

Implementation plan

  1. Set up minimal accounts / installs for each beta provider where reasonable (the maintainer or a contributor needs to do this on their machine):
    • Qwen, Gemini, Copilot, Codeium, Continue, Droid, Kiro, OpenClaw, Pi/OMP, OpenCode, Cursor Agent, KiloCode, Roo Code (13 in total).
  2. Run a representative session in each (5-10 turns, mix of prompts + tool calls).
  3. Copy the resulting source files (each adapter's enumerate() knows where) into tests/fixtures/beta_normalizers/<provider>/real_session_<id>.<ext>.
  4. Strip PII / secrets — write a redaction helper that zeroes API keys, file paths in /Users/<name>/, etc.
  5. Add real-shape parity tests: run each provider's normalizer on each of its real fixtures and assert that cost_source != "unknown" (or document which models legitimately are unknown), model is non-empty, and tokens_in / tokens_out are non-zero.
  6. Update docs/beta-normalizer-drift.md with new findings.
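
The redaction helper from step 4 could start as a small regex pass like the sketch below. The function name `redact_text`, the `REDACTED` placeholder, and the specific patterns (an `sk-` key shape, bearer tokens, `/Users/` and `/home/` paths) are assumptions for illustration, not existing project code; real captures will almost certainly need provider-specific patterns on top.

```python
import re

# Hypothetical patterns: extend per provider after inspecting real captures.
_PATTERNS = [
    # Secret keys with an "sk-" prefix (assumed shape, not exhaustive).
    (re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"), "sk-REDACTED"),
    # Bearer tokens in captured headers: keep the scheme, drop the token.
    (re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/-]{16,}=*"), r"\1REDACTED"),
    # Home directories: keep the prefix, drop the user name.
    (re.compile(r"(/Users/|/home/)[^/\s\"']+"), r"\1REDACTED"),
]


def redact_text(text: str) -> str:
    """Apply every redaction pattern once; safe to run repeatedly."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    raw = '{"cwd": "/Users/alice/proj", "key": "sk-abcdefghijklmnopqrstuvwx"}'
    print(redact_text(raw))
```

Because each placeholder is too short to match its own pattern again, running the helper twice gives the same result as running it once, which makes it safe to chain several passes.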

Tests

  • Per-provider parity: run normalizer on each real fixture, assert key invariants.
  • Redaction helper: smoke-test that no secrets / personal paths leak into the committed fixture.
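
The per-provider invariants could be expressed as one shared checker that each parity test calls. This is a minimal sketch that assumes a normalizer emits dict-like records with `model`, `cost_source`, `tokens_in`, and `tokens_out` keys; the function name `invariant_violations` and the `KNOWN_UNPRICED_MODELS` allowlist are hypothetical.

```python
from typing import Any, Iterable

# Hypothetical allowlist: models whose cost_source is legitimately "unknown".
KNOWN_UNPRICED_MODELS: frozenset = frozenset()


def invariant_violations(
    record: dict[str, Any],
    unpriced: Iterable[str] = KNOWN_UNPRICED_MODELS,
) -> list[str]:
    """Return a human-readable list of broken invariants (empty means OK)."""
    problems: list[str] = []
    model = record.get("model") or ""
    if not model:
        problems.append("model is empty")
    # A missing cost_source is treated the same as "unknown".
    if record.get("cost_source", "unknown") == "unknown" and model not in set(unpriced):
        problems.append("cost_source is unknown for a model not on the allowlist")
    for field in ("tokens_in", "tokens_out"):
        if not record.get(field):
            problems.append(f"{field} is zero or missing")
    return problems


if __name__ == "__main__":
    sample = {"model": "example-model", "cost_source": "catalog",
              "tokens_in": 120, "tokens_out": 45}
    print(invariant_violations(sample))
```

Returning a list of messages rather than raising on the first failure lets a single test run report every broken invariant for a fixture at once.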

Hard parts

  • Most beta providers don't have real local sessions on the maintainer's machine. This is a logistics problem, not a code problem. The agent can write the parity test infrastructure; the user (or a contributor) has to capture the real fixtures.
  • PII redaction is high-stakes. Get it wrong, you commit secrets. Use multiple redaction passes; require a manual review step before any commit.
  • Some providers may have changed their format since the synthetic fixtures were written — those are exactly the regressions this finds.

Out of scope

  • Real-session capture from providers the maintainer doesn't have access to (contributors can fill in).
  • Stress / load testing — different effort.

Dependencies

  • None.

Estimated effort

Size M — agent does ~1-2 hr of test-infra + redaction-helper work; the real fixture capture is asynchronous and depends on the maintainer's machine state.

Hard rules

  • DO NOT touch versions / CHANGELOG headings.
  • ANY committed fixture must pass through the redaction helper FIRST.
  • Branch: feat/real-beta-fixtures off main.
  • Document explicitly which providers have real fixtures vs synthetic-only.
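
The real-vs-synthetic record could be generated from the fixture tree rather than maintained by hand. This sketch assumes the layout from the implementation plan (one directory per provider under tests/fixtures/beta_normalizers/, with real captures named real_session_*); the function name `fixture_coverage` and the "real" / "synthetic-only" status strings are hypothetical.

```python
from pathlib import Path


def fixture_coverage(root: Path) -> dict[str, str]:
    """Map each provider directory to "real" or "synthetic-only"."""
    coverage: dict[str, str] = {}
    for provider_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Any file matching the real_session_* naming convention counts.
        has_real = any(provider_dir.glob("real_session_*"))
        coverage[provider_dir.name] = "real" if has_real else "synthetic-only"
    return coverage


if __name__ == "__main__":
    root = Path("tests/fixtures/beta_normalizers")
    if root.is_dir():
        for provider, status in fixture_coverage(root).items():
            print(f"{provider}: {status}")
```

The printed table can be pasted into the drift document or checked in CI so the list does not go stale as contributors add captures.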

Labels

  • size-m: ~1 hr agent run
  • spec: Spec/feature for an agent to implement
  • wave-6: Wave 6: sensitive / long-tail
