qa(v0.13.0): regression guards + 3 production-bug fixes#13
Merged
Conversation
AI 에이전트(Claude Code 등)가 perp-cli 레포에서 QA 작업을 수행할 때 따라야 하는 워크플로우/가드레일 정의. 13 섹션 + 한국어 보고 포맷. - Section 3: 명시적 사용자 승인 없이 절대 금지 (main 머지, npm publish, git tag, mainnet 실거래, deps major 업데이트 등) - Section 7: 보안 가드 (.env/PK/mnemonic stage 검증, signer abstraction 우회 금지, testnet PK 도 레포 커밋 금지) - Section 9: perp-cli 특화 (다중 어댑터 호환성, SKILL.md ↔ CLI 정합성) - Section 11: 한국어 구조화 보고 포맷 - Section 13: 우선순위 충돌 시 명시적 금지 > 보안 가드 > 사용자 지시 CLAUDE.md 에 요약 + 정본 포인터 추가 (CLAUDE.md 는 .gitignore). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QA finding: README "Command Groups" table drifted from `perp --help` ground truth — 4 stale groups + 2 missing. ### Removed (already migrated, no longer top-level) - `manage` → `wallet manage ...` (margin mode, subaccount, API keys) - `dashboard` / `status` → folded into `portfolio` per --help description - `agent` → `wallet agent ...` (approve/list/revoke/rotate/verify) ### Added - `outcome` — Hyperliquid HIP-4 binary/range contracts (v0.13.0 new asset class, USDH-quoted, no leverage) - `health` — adapter health check across 4 DEX ### Description tightened - `wallet` — explicitly notes nested `wallet agent` / `wallet manage` so users do not look for the old top-level groups - `portfolio` — notes it replaces former `account balance` / `status` / `dashboard` - `alerts` — Telegram + Discord (was Telegram-only label) ### Core Commands section Adds "Outcome markets" examples (list / view / book / positions / orders / buy / sell / cancel) — already present in skills/perp-cli/references/ commands.md but missing from README. Verified via container at `qa/2026-05-05-v0.13.0-validation`: - `perp --help` enumerates exactly 17 groups; new README table = 17/17 match - `perp --json outcome view 2` returns symmetric Yes/No book + underlying gap (BTC binary live market) - `pnpm test` 1323/1323 passed (70 files) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QA finding (v0.13.0): `outcome view` (last commit `edffc77`) ships the gap / inTheMoney classification on the live mainnet path with zero unit coverage — only `parseDescription` was tested in 0.13.0. Live verified once at release; no regression guard. ### Refactor (minimal) Extract `_computeUnderlying(parsed, allMids)` static helper from `getView()`. Pure function, no SDK dependency. `getView()` shrinks from ~30 lines of inline branching to a single helper call. ### New tests (9, total 16 → 25 in this file) - returns null when description has no underlying - priceBinary in-the-money (mark > target → "yes") - priceBinary out-of-the-money (mark < target → "no") - priceBinary edge: gap exactly 0 → "yes" (Yes = mark >= target convention) - non-priceBinary class: gap computed, classification suppressed (Rule #2) - missing class: classification suppressed - missing markPrice for symbol: gap/gapPct undefined, inTheMoney null - missing targetPrice: gap/gapPct undefined, inTheMoney null - underlying symbol uppercased before allMids lookup All 25 tests pass; tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…istry
QA finding: exchange enumeration is duplicated between two places —
`exchanges/registry.ts` (the SSOT for `-e` flag resolution) and
`landing.ts:LANDING_EXCHANGES` (powers the no-arg `perp` landing page).
Drift is silent: a new exchange registered in registry.ts will not
appear on the landing page until a human notices.
Plus `landing.ts:exchangeLabel()` is an inline ternary chain — if a 5th
exchange is added to LANDING_EXCHANGES without updating that switch,
the new entry silently falls through to the last arm's label ("Aster")
on the landing page. This is the same class of bug as the v0.12.16-18
landing-page misclassifications.
### New tests (2, total 5 → 7 in this file)
- LANDING_EXCHANGES.sort() === listExchanges().sort()
(registry SSOT comparison; fails immediately if a registered exchange
is missing from the landing enumeration)
- distinct, non-empty label per LANDING_EXCHANGES member
(catches inline-switch fallthrough by asserting Set(labels).size ===
LANDING_EXCHANGES.length)
Both cover Section 9 of QA_WORKFLOW (multi-adapter consistency).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a P0 gap from previous QA cycle: getView() ships midSum (binary symmetry invariant Yes+No≈1.0) and msToExpiry (Date.now-dependent) on the live mainnet path with zero unit coverage. ### Refactor (minimal, same `_helper` convention as _computeUnderlying) - Extract `_computeMidSum(sides)` static. Adds a small Rule #2 fix: previously `Number(mid)` could produce NaN from a malformed venue payload and propagate as `midSum: NaN` downstream. Helper now returns `undefined` ("symmetry unknown") for any non-finite impliedProb instead of emitting NaN. - Extract `_computeTimeStatus(expiryMs, nowMs)`. `nowMs` is injected so callers can pin the clock under test instead of patching Date.now globally. Does NOT clamp negatives — expired-UX classification is the caller's job, not the helper's (Rule #2). ### New tests (9, total 25 → 34 in this file) midSum (5): - healthy binary sums to ~1.0 - one side missing impliedProb → undefined - any side NaN/Infinity → undefined (no NaN propagation) - empty side list → undefined - preserves arithmetic faithfully (sum < 1 / > 1 are both surfaced) timeStatus (4): - now < expiry → positive ms - now === expiry → 0 - now > expiry → negative ms (caller classifies expired UX) - expiryMs undefined → msToExpiry undefined Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sion) QA finding from previous cycle: v0.13.0 README "Command Groups" table drifted from `perp --help` ground truth — 4 stale rows (`agent`, `manage`, `dashboard`, `status`) + 2 missing (`outcome`, `health`). Fixed in d6c9fe0 but no regression guard. ### Approach Hand-maintained `KNOWN_TOP_LEVEL_GROUPS` SSOT constant in this test file + README markdown-table parser. Adding a new top-level group fails this test until BOTH places are updated; deleting a row fails until the constant is also updated. This is the 80/20 guard. The deeper SSOT collapse — deriving the list dynamically from a Commander program-builder factory so a single source covers index.ts + README + this test — is filed as a follow-up P2 item (needs a small refactor of src/index.ts). ### New tests (3, new file `src/__tests__/readme-cli-sync.test.ts`) - README group rows == KNOWN_TOP_LEVEL_GROUPS (set equality) - README table has no duplicate rows - README table preserves SSOT order (strict — guards silent reshuffling that would break agent docs / skill bundles relying on order) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 from QA cycle review: Section 7 ("거래 관련 커맨드는 반드시
--testnet 플래그 또는 mock signer 로만 실행. mainnet 실행 금지") had
no automated guard. The --dry-run gate is correct in source but a
single missed `dryRunGuard()` call in a future trade subcommand would
silently leak a venue order. This test pins that contract.
### Approach
Integration-style: stub `execution-log` + `client-id-tracker` modules
(file IO), inject a mock ExchangeAdapter where every venue method is a
spy, parse `trade ...` argv through Commander, then assert the venue
methods received zero calls.
A positive control test runs the same path without --dry-run and
confirms `adapter.marketOrder("BTC", "buy", "0.01")` IS called once —
without it, the negative tests could be green because of a wiring bug
rather than because gating works.
### New tests (5, new file `src/__tests__/commands/trade-dry-run-gating.test.ts`)
- `trade market BTC buy 0.01 --dry-run` → marketOrder NOT called
- `trade market BTC sell 0.01 --dry-run` → marketOrder NOT called
- `trade buy BTC 0.01 --dry-run` (shortcut) → marketOrder NOT called
- `trade sell BTC 0.01 --dry-run` (shortcut) → marketOrder NOT called
- positive control: same `trade market` path WITHOUT --dry-run reaches
marketOrder exactly once with the expected args
### Out of scope (deferred)
- limit / split / close / cancel / outcome buy / outcome sell — same
pattern, follow-up commit.
- 4-DEX matrix (PAC / LT / Aster) — adapter mock is HL-shaped here;
follow-up cycle covers cross-exchange.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 from review: getView() depth handling and outcome/side gating shipped without unit coverage. Tightens both with explicit Rule #2 gates instead of silent slice() coercion. ### Refactor - Extract `_assertOutcomeRange(outcome, side)` static — pure arithmetic gate. The previous private method had a small ordering bug: NaN outcomes slipped past the `encoding > MAX_ENCODING` post-check because `Number.isInteger(NaN)=false` was only checked for `side`, not for `outcome` paired with `encoding=NaN`. Now rejected explicitly. - Extract `_trimBook(book, depth)` static — depth is now strictly validated (Number.isInteger + depth >= 0). Previously `slice(0, NaN)` silently returned [] and `slice(0, -1)` silently dropped the last entry. Both are caller bugs, not "depth=10 default". - Throws EXCHANGE_ERROR when the venue payload is missing bids/asks array — previously crashed with TypeError or fabricated empty book. `getView()` and `_validateOutcomeSide` updated to compose these helpers. ### New tests (11, total 34 → 45 in this file) _assertOutcomeRange (4): - valid range + boundary (9_999_999, 9 → MAX_ENCODING) - outcome NaN / -1 / 0.5 / Infinity rejected (the silent-pass bug) - side outside 0..9 rejected (with friendly error message) - encoding overflow on outcome=10_000_000 → 100_000_000 rejected _trimBook (7): - normal trim with bestBid/bestAsk - depth=0 → empty + undefined best prices - depth larger than book → full book, no padding/error - negative depth rejected (previously dropped last entry silently) - NaN / Infinity / fractional depth rejected - malformed payload (missing bids or asks) → EXCHANGE_ERROR - empty book passes through cleanly Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 from review: the public --json envelope is what every external agent (MCP, scripts, Claude Code) parses against. Drift in `jsonOk` / `jsonError` shape would silently break consumers pinned to the old shape. ### Approach Zod schema (EnvelopeOk / EnvelopeErr / Envelope union) mirroring ApiResponse in src/utils.ts. 11 cases covering: - ok envelope with object / array / null / undefined data - timestamp ISO-8601 regex - meta merge (exchange, duration_ms) - error envelope minimal (code + message only) - error envelope with full agent-actionable payload (status/retryable/ retryAfterMs/remediation/details) - mutual exclusivity of ok=true and ok=false on the same parse - malformed: ok envelope missing meta.timestamp → reject - malformed: error without `code` → reject - live `outcome view 2` response shape (Appendix B regression — captures the exact shape that ships at v0.13.0 so future envelope changes break this test instead of agents in the field) ### Out of scope (deferred) - Per-command schemas (outcome view, market list, portfolio, …) — this commit guards the envelope wrapper, not each `data` shape. Per-command schemas are a worthwhile follow-up but explode test count; deferred to a separate cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2 from review: scripts/sync-skill-version.mjs ran only at prepublishOnly time, so a PR that bumps package.json version without re-running the sync would silently merge with SKILL.md still pointing at the old version. The npm package would be fine (prepublishOnly covers it) but the GitHub-checkout install path (used by Docker QA and agent skill installers reading from main) would ship a stale version field. ### Changes - scripts/sync-skill-version.mjs: add --check mode that exits non-zero on drift instead of writing. Default (no flag) behavior unchanged. - .github/workflows/ci.yml: add a new "SKILL.md version sync check" step between typecheck and unit tests that runs the script in --check mode. Verified locally: `node scripts/sync-skill-version.mjs --check` exits 0 when synced (current state at 0.13.0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the cycle report to reflect the full 10-commit / 50-test expansion (initial 4 commits + 6 commits from "전체 진행" phase 1-3). ### Highlights - 1323 → 1373 tests, 70 → 73 files - 6 findings (vs. 3 in the initial report) — adds dry-run gating gap, envelope contract gap, SKILL.md sync CI gap - 사람 검토 필요 항목 expanded to 5 (helper underscore convention, regex fragility, outcome status alias, hand-maintained KNOWN list, getView integration coverage) - 다음 권장 액션 split into 4 buckets (이번 PR / 다음 사이클 분리 / micro-PR / refactor) with explicit branch names and dependencies ### Appendix B — live ↔ unit cross-validation The live `outcome view 2` mainnet response is now mapped field-by-field to the matching unit test case (gap / gapPct / inTheMoney / midSum / encoding / envelope schema). This is the cross-validation pattern the user explicitly asked to keep across QA cycles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes 사람 검토 항목 #5 (격상 권장 from cycle review): 35 helper unit tests cover the math but cannot catch a regression where Hyperliquid's `/info` payload shape changes. This file mocks `_infoPost` at the adapter level and asserts the assembled `OutcomeView` against the expected shape end-to-end. ### Test pattern - Inject a stub HyperliquidAdapter (only `isTestnet` needed for the paths that getView exercises). - `vi.spyOn(adapter as any, "_infoPost")` routes by `body.type`: outcomeMeta / allMids / l2Book(coin) → controllable mocks. - `vi.useFakeTimers` + `setSystemTime` pin the clock so msToExpiry is deterministic (60_000 ms before the BTC binary expiry). ### New tests (7, new file `hyperliquid-outcome-getView.test.ts`) - assembles OutcomeView from outcomeMeta + allMids + l2Book responses (live BTC binary shape; verifies all 5 helper outputs combined) - propagates depth into bids/asks length - throws SYMBOL_NOT_FOUND for unknown outcome id - throws EXCHANGE_ERROR when outcomeMeta payload shape changes (no `outcomes` field) - returns gap/inTheMoney undefined/null when allMids drops the underlying perp symbol (live HL cache-miss simulation) - returns midSum undefined when allMids drops a side's encoding - RECORDS a known Rule #2 violation: getOrderbook silently substitutes [[],[]] when l2Book omits `levels` — pinned for follow-up fix ### Follow-up The last test pins a Rule #2 violation in `getOrderbook` (line 419, `book?.levels ?? [[], []]`) — currently silent-empty-book on malformed payload. Will be fixed in a follow-up commit that flips the assertion from "silent empty" to "throws EXCHANGE_ERROR". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production bug #3 surfaced by the getView integration test (#22b9286 follow-up): `getOrderbook` did const levels = book?.levels ?? [[], []]; which silently substituted an empty book whenever the venue payload was missing the `levels` field — a Rule #2 violation that masks upstream contract breaks as "no resting orders". This is the same class of bug as the NaN propagation in `_computeMidSum` and the outcome NaN silent-pass in `_validateOutcomeSide` (both fixed earlier in this cycle). Three Rule #2 issues found via the helper-extract + unit-test pattern in a single audit pass — supports the user's hypothesis that other modules without this treatment likely harbor more. ### Change Reject when `book.levels` is missing OR not a 2-tuple of arrays. Throws EXCHANGE_ERROR with a coin-tagged message so the caller (CLI / agent) can attribute the failure to a specific outcome side. ### Test Flips the previously "RECORDS silent empty" assertion to actively verify the throw, plus a new case for `levels` being a length-1 tuple (also rejected — partial venue payload is not "valid empty book"). Test count: 35 + 7 → 35 + 8 in the outcome suite. All other tests unaffected (`book.levels` is the actual production field; previous silent fallback was unreachable for healthy responses). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…son changes Process gate from cycle review: v0.13.0's `outcome view` shipped with zero unit-test coverage on getView's gap/inTheMoney/midSum/msToExpiry — caught only because a manual QA cycle ran. Make this catch automatic at PR-review time. ### Change - Adds an "If applicable" checkbox: Live ↔ Unit cross-validation table required for any change that adds/modifies a `--json` command or affects envelope shape. - Adds a "Live ↔ Unit Test Cross-Validation" section with a Korean comment block explaining when it's required and what to do if not applicable (one-line N/A reason). - Pattern lifted directly from docs/qa-reports/2026-05-05 Appendix B. PR authors with envelope-touching changes now have to either: (a) capture a live response + map fields to unit-test cases, or (b) explicitly state why N/A. Either way, drift between live behavior and tested behavior is visible at review time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full revision after the "전체 진행" cycle (initial 4 commits → 14 commits, 1323 → 1381 tests). New shape mirrors the user's review feedback: ### New / expanded sections - "테스트 작성 중 발견된 production 결함" (new) — promotes the 3 Rule #2 violations (NaN silent-pass, NaN propagation, silent empty book) out of the "Changed interfaces" footnote and into a first- class section. Three bugs found in one audit pass via the same helper-extract pattern is the user's "tip of the iceberg" signal. - Findings 4-7 added (dry-run gating gap, envelope contract gap, SKILL.md sync gap, live-↔-unit cross-validation process gate). - "다음 권장 액션" reorganized per user's cycle-review priority reordering: numeric-validation-audit (new P0), aster-signer- regression (P3 → P0), program-builder factory (P2 → P1). - Coverage section explicitly notes `@vitest/coverage-v8` is missing (Section 3 user approval needed — filed as micro-PR). - 사람 검토 항목 #5 (getView integration coverage) flagged as closed by 22b9286. - Live-↔-unit cross-validation table extended to include the getView integration test row. ### Verified - Container ground-truth: 74 files / 1381 tests passed at HEAD `49392da`, 23.09s. Reflects all 14 cycle commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
qa/2026-05-05-v0.13.0-validation사이클의 15 commits._computeUnderlying/_computeMidSum/_computeTimeStatus/_assertOutcomeRange/_trimBook) + 단위 35 case + 통합 8 case.getOrderbook.--dry-runvenue 게이팅 /--jsonZod schema / CI sync-skill-version--check/ PR template Live↔Unit cross-validation.Findings (7개 모두 처리)
agent/manage/dashboard/status) + 2 missing (outcome/health) → fix + 회귀 가드outcome view핵심 로직 unit test 0% → 43 case (35 helper + 8 SDK-mock 통합)LANDING_EXCHANGESdrift +exchangeLabelfallthrough footgun 가드 부재 → 회귀 가드--dry-runvenue 콜 차단 자동 가드 부재 → 4 path + positive control--jsonenvelope contract 자동 가드 부재 → Zod schema 11 case자세한 보고
docs/qa-reports/2026-05-05-v0.13.0-validation.md전문 참조 (310+ 줄, 11 섹션 + 부록 A·B). 라이브outcome view응답 ↔ 단위 테스트 case field-by-field 매핑 포함.Out of scope — 별도 사이클로 분리
KNOWN_TOP_LEVEL_GROUPShand-maintained 부분 제거.@vitest/coverage-v8(coverage 정량 측정),fast-check(property test),@typescript-eslint/naming-convention(lint).Test plan
pnpm test(unit suite): 1381 / 1381 passed (host + container 동일)pnpm build(tsc strict): clean--dry-run(Section 3 / Section 7 준수)perp-qaDocker, GitHub clone): 74 files / 1381 tests / 23.09soutcome view/health/arb scan/portfolio/ 4 DEXmarket list모두 OK--check+ unit + integration withcontinue-on-error)QA Workflow Section 3 / 13 — 절대 금지 항목 준수
main 머지/push (이 PR 자체는 사용자 명시 승인) · npm publish ✗ · git tag ✗ · GitHub Release ✗ · mainnet 실거래 ✗ · deps 메이저 변경 ✗ · 새 npm 패키지 추가 ✗
🤖 Generated with Claude Code