qa(v0.13.0): regression guards + 3 production-bug fixes by Hiksang · Pull Request #13 · hypurrquant/perp-cli

Hiksang · 2026-05-05T08:40:49Z

Summary

v0.13.0 (HL Outcome / HIP-4) 회귀 가드 + 3 production-bug fixes — qa/2026-05-05-v0.13.0-validation 사이클의 15 commits.
1323 → 1381 tests (+58), 3 신규 unit + 1 통합 test file, 호스트와 컨테이너 cross-validate (74 files / 23.09s).
5 helper 추출 (_computeUnderlying / _computeMidSum / _computeTimeStatus / _assertOutcomeRange / _trimBook) + 단위 35 case + 통합 8 case.
3 production Rule v0.2.1: scaled TP, pnl/liquidation alerts, bot fix #2 violations 수정 (단위 테스트 작성 중 발견) — NaN silent-pass on outcome / NaN propagation in midSum / silent empty book in getOrderbook.
회귀 가드 7 영역 추가 — README↔commander parity / LANDING_EXCHANGES drift / --dry-run venue 게이팅 / --json Zod schema / CI sync-skill-version --check / PR template Live↔Unit cross-validation.

Findings (7개 모두 처리)

README "Command Groups" 표 4 stale (agent / manage / dashboard / status) + 2 missing (outcome / health) → fix + 회귀 가드
outcome view 핵심 로직 unit test 0% → 43 case (35 helper + 8 SDK-mock 통합)
LANDING_EXCHANGES drift + exchangeLabel fallthrough footgun 가드 부재 → 회귀 가드
--dry-run venue 콜 차단 자동 가드 부재 → 4 path + positive control
--json envelope contract 자동 가드 부재 → Zod schema 11 case
SKILL.md ↔ package.json version drift → CI gate
Live↔Unit cross-validation process gate 부재 → PR template 추가

자세한 보고

docs/qa-reports/2026-05-05-v0.13.0-validation.md 전문 참조 (310+ 줄, 11 섹션 + 부록 A·B). 라이브 outcome view 응답 ↔ 단위 테스트 case field-by-field 매핑 포함.

Out of scope — 별도 사이클로 분리

numeric-validation-audit (P0 신설) — 3 production 결함이 helper-extract 패턴으로 발견된 점 follow-up. 다른 모듈에 동일 패턴 audit.
aster-signer-regression (P0 격상) — v0.12.16-18 active fragility 신호.
commander program-builder factory (P2 → P1) — KNOWN_TOP_LEVEL_GROUPS hand-maintained 부분 제거.
cross-adapter-matrix (P1) / failure-modes (P2) — 4-DEX mock 인프라 / 5xx·429·malformed.
사용자 승인 필요 dep: @vitest/coverage-v8 (coverage 정량 측정), fast-check (property test), @typescript-eslint/naming-convention (lint).

Test plan

pnpm test (unit suite): 1381 / 1381 passed (host + container 동일)
pnpm build (tsc strict): clean
mainnet 실거래 0건 — 모두 readonly + --dry-run (Section 3 / Section 7 준수)
컨테이너 정본 검증 (perp-qa Docker, GitHub clone): 74 files / 1381 tests / 23.09s
라이브 readonly: outcome view / health / arb scan / portfolio / 4 DEX market list 모두 OK
CI workflow 통과 확인 (typecheck + skill-sync --check + unit + integration with continue-on-error)
사람 review

QA Workflow Section 3 / 13 — 절대 금지 항목 준수

main 머지/push (이 PR 자체는 사용자 명시 승인) · npm publish ✗ · git tag ✗ · GitHub Release ✗ · mainnet 실거래 ✗ · deps 메이저 변경 ✗ · 새 npm 패키지 추가 ✗

🤖 Generated with Claude Code

AI 에이전트(Claude Code 등)가 perp-cli 레포에서 QA 작업을 수행할 때 따라야 하는 워크플로우/가드레일 정의. 13 섹션 + 한국어 보고 포맷. - Section 3: 명시적 사용자 승인 없이 절대 금지 (main 머지, npm publish, git tag, mainnet 실거래, deps major 업데이트 등) - Section 7: 보안 가드 (.env/PK/mnemonic stage 검증, signer abstraction 우회 금지, testnet PK 도 레포 커밋 금지) - Section 9: perp-cli 특화 (다중 어댑터 호환성, SKILL.md ↔ CLI 정합성) - Section 11: 한국어 구조화 보고 포맷 - Section 13: 우선순위 충돌 시 명시적 금지 > 보안 가드 > 사용자 지시 CLAUDE.md 에 요약 + 정본 포인터 추가 (CLAUDE.md 는 .gitignore). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

QA finding: README "Command Groups" table drifted from `perp --help` ground truth — 4 stale groups + 2 missing. ### Removed (already migrated, no longer top-level) - `manage` → `wallet manage ...` (margin mode, subaccount, API keys) - `dashboard` / `status` → folded into `portfolio` per --help description - `agent` → `wallet agent ...` (approve/list/revoke/rotate/verify) ### Added - `outcome` — Hyperliquid HIP-4 binary/range contracts (v0.13.0 new asset class, USDH-quoted, no leverage) - `health` — adapter health check across 4 DEX ### Description tightened - `wallet` — explicitly notes nested `wallet agent` / `wallet manage` so users do not look for the old top-level groups - `portfolio` — notes it replaces former `account balance` / `status` / `dashboard` - `alerts` — Telegram + Discord (was Telegram-only label) ### Core Commands section Adds "Outcome markets" examples (list / view / book / positions / orders / buy / sell / cancel) — already present in skills/perp-cli/references/ commands.md but missing from README. Verified via container at `qa/2026-05-05-v0.13.0-validation`: - `perp --help` enumerates exactly 17 groups; new README table = 17/17 match - `perp --json outcome view 2` returns symmetric Yes/No book + underlying gap (BTC binary live market) - `pnpm test` 1323/1323 passed (70 files) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

QA finding (v0.13.0): `outcome view` (last commit `edffc77`) ships the gap / inTheMoney classification on the live mainnet path with zero unit coverage — only `parseDescription` was tested in 0.13.0. Live verified once at release; no regression guard. ### Refactor (minimal) Extract `_computeUnderlying(parsed, allMids)` static helper from `getView()`. Pure function, no SDK dependency. `getView()` shrinks from ~30 lines of inline branching to a single helper call. ### New tests (9, total 16 → 25 in this file) - returns null when description has no underlying - priceBinary in-the-money (mark > target → "yes") - priceBinary out-of-the-money (mark < target → "no") - priceBinary edge: gap exactly 0 → "yes" (Yes = mark >= target convention) - non-priceBinary class: gap computed, classification suppressed (Rule #2) - missing class: classification suppressed - missing markPrice for symbol: gap/gapPct undefined, inTheMoney null - missing targetPrice: gap/gapPct undefined, inTheMoney null - underlying symbol uppercased before allMids lookup All 25 tests pass; tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…istry QA finding: exchange enumeration is duplicated between two places — `exchanges/registry.ts` (the SSOT for `-e` flag resolution) and `landing.ts:LANDING_EXCHANGES` (powers the no-arg `perp` landing page). Drift is silent: a new exchange registered in registry.ts will not appear on the landing page until a human notices. Plus `landing.ts:exchangeLabel()` is an inline ternary chain — if a 5th exchange is added to LANDING_EXCHANGES without updating that switch, the new entry silently falls through to the last arm's label ("Aster") on the landing page. This is the same class of bug as the v0.12.16-18 landing-page misclassifications. ### New tests (2, total 5 → 7 in this file) - LANDING_EXCHANGES.sort() === listExchanges().sort() (registry SSOT comparison; fails immediately if a registered exchange is missing from the landing enumeration) - distinct, non-empty label per LANDING_EXCHANGES member (catches inline-switch fallthrough by asserting Set(labels).size === LANDING_EXCHANGES.length) Both cover Section 9 of QA_WORKFLOW (multi-adapter consistency). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes a P0 gap from previous QA cycle: getView() ships midSum (binary symmetry invariant Yes+No≈1.0) and msToExpiry (Date.now-dependent) on the live mainnet path with zero unit coverage. ### Refactor (minimal, same `_helper` convention as _computeUnderlying) - Extract `_computeMidSum(sides)` static. Adds a small Rule #2 fix: previously `Number(mid)` could produce NaN from a malformed venue payload and propagate as `midSum: NaN` downstream. Helper now returns `undefined` ("symmetry unknown") for any non-finite impliedProb instead of emitting NaN. - Extract `_computeTimeStatus(expiryMs, nowMs)`. `nowMs` is injected so callers can pin the clock under test instead of patching Date.now globally. Does NOT clamp negatives — expired-UX classification is the caller's job, not the helper's (Rule #2). ### New tests (9, total 25 → 34 in this file) midSum (5): - healthy binary sums to ~1.0 - one side missing impliedProb → undefined - any side NaN/Infinity → undefined (no NaN propagation) - empty side list → undefined - preserves arithmetic faithfully (sum < 1 / > 1 are both surfaced) timeStatus (4): - now < expiry → positive ms - now === expiry → 0 - now > expiry → negative ms (caller classifies expired UX) - expiryMs undefined → msToExpiry undefined Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sion) QA finding from previous cycle: v0.13.0 README "Command Groups" table drifted from `perp --help` ground truth — 4 stale rows (`agent`, `manage`, `dashboard`, `status`) + 2 missing (`outcome`, `health`). Fixed in d6c9fe0 but no regression guard. ### Approach Hand-maintained `KNOWN_TOP_LEVEL_GROUPS` SSOT constant in this test file + README markdown-table parser. Adding a new top-level group fails this test until BOTH places are updated; deleting a row fails until the constant is also updated. This is the 80/20 guard. The deeper SSOT collapse — deriving the list dynamically from a Commander program-builder factory so a single source covers index.ts + README + this test — is filed as a follow-up P2 item (needs a small refactor of src/index.ts). ### New tests (3, new file `src/__tests__/readme-cli-sync.test.ts`) - README group rows == KNOWN_TOP_LEVEL_GROUPS (set equality) - README table has no duplicate rows - README table preserves SSOT order (strict — guards silent reshuffling that would break agent docs / skill bundles relying on order) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P0 from QA cycle review: Section 7 ("거래 관련 커맨드는 반드시 --testnet 플래그 또는 mock signer 로만 실행. mainnet 실행 금지") had no automated guard. The --dry-run gate is correct in source but a single missed `dryRunGuard()` call in a future trade subcommand would silently leak a venue order. This test pins that contract. ### Approach Integration-style: stub `execution-log` + `client-id-tracker` modules (file IO), inject a mock ExchangeAdapter where every venue method is a spy, parse `trade ...` argv through Commander, then assert the venue methods received zero calls. A positive control test runs the same path without --dry-run and confirms `adapter.marketOrder("BTC", "buy", "0.01")` IS called once — without it, the negative tests could be green because of a wiring bug rather than because gating works. ### New tests (5, new file `src/__tests__/commands/trade-dry-run-gating.test.ts`) - `trade market BTC buy 0.01 --dry-run` → marketOrder NOT called - `trade market BTC sell 0.01 --dry-run` → marketOrder NOT called - `trade buy BTC 0.01 --dry-run` (shortcut) → marketOrder NOT called - `trade sell BTC 0.01 --dry-run` (shortcut) → marketOrder NOT called - positive control: same `trade market` path WITHOUT --dry-run reaches marketOrder exactly once with the expected args ### Out of scope (deferred) - limit / split / close / cancel / outcome buy / outcome sell — same pattern, follow-up commit. - 4-DEX matrix (PAC / LT / Aster) — adapter mock is HL-shaped here; follow-up cycle covers cross-exchange. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P0 from review: getView() depth handling and outcome/side gating shipped without unit coverage. Tightens both with explicit Rule #2 gates instead of silent slice() coercion. ### Refactor - Extract `_assertOutcomeRange(outcome, side)` static — pure arithmetic gate. The previous private method had a small ordering bug: NaN outcomes slipped past the `encoding > MAX_ENCODING` post-check because `Number.isInteger(NaN)=false` was only checked for `side`, not for `outcome` paired with `encoding=NaN`. Now rejected explicitly. - Extract `_trimBook(book, depth)` static — depth is now strictly validated (Number.isInteger + depth >= 0). Previously `slice(0, NaN)` silently returned [] and `slice(0, -1)` silently dropped the last entry. Both are caller bugs, not "depth=10 default". - Throws EXCHANGE_ERROR when the venue payload is missing bids/asks array — previously crashed with TypeError or fabricated empty book. `getView()` and `_validateOutcomeSide` updated to compose these helpers. ### New tests (11, total 34 → 45 in this file) _assertOutcomeRange (4): - valid range + boundary (9_999_999, 9 → MAX_ENCODING) - outcome NaN / -1 / 0.5 / Infinity rejected (the silent-pass bug) - side outside 0..9 rejected (with friendly error message) - encoding overflow on outcome=10_000_000 → 100_000_000 rejected _trimBook (7): - normal trim with bestBid/bestAsk - depth=0 → empty + undefined best prices - depth larger than book → full book, no padding/error - negative depth rejected (previously dropped last entry silently) - NaN / Infinity / fractional depth rejected - malformed payload (missing bids or asks) → EXCHANGE_ERROR - empty book passes through cleanly Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P1 from review: the public --json envelope is what every external agent (MCP, scripts, Claude Code) parses against. Drift in `jsonOk` / `jsonError` shape would silently break consumers pinned to the old shape. ### Approach Zod schema (EnvelopeOk / EnvelopeErr / Envelope union) mirroring ApiResponse in src/utils.ts. 11 cases covering: - ok envelope with object / array / null / undefined data - timestamp ISO-8601 regex - meta merge (exchange, duration_ms) - error envelope minimal (code + message only) - error envelope with full agent-actionable payload (status/retryable/ retryAfterMs/remediation/details) - mutual exclusivity of ok=true and ok=false on the same parse - malformed: ok envelope missing meta.timestamp → reject - malformed: error without `code` → reject - live `outcome view 2` response shape (Appendix B regression — captures the exact shape that ships at v0.13.0 so future envelope changes break this test instead of agents in the field) ### Out of scope (deferred) - Per-command schemas (outcome view, market list, portfolio, …) — this commit guards the envelope wrapper, not each `data` shape. Per-command schemas are a worthwhile follow-up but explode test count; deferred to a separate cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P2 from review: scripts/sync-skill-version.mjs ran only at prepublishOnly time, so a PR that bumps package.json version without re-running the sync would silently merge with SKILL.md still pointing at the old version. The npm package would be fine (prepublishOnly covers it) but the GitHub-checkout install path (used by Docker QA and agent skill installers reading from main) would ship a stale version field. ### Changes - scripts/sync-skill-version.mjs: add --check mode that exits non-zero on drift instead of writing. Default (no flag) behavior unchanged. - .github/workflows/ci.yml: add a new "SKILL.md version sync check" step between typecheck and unit tests that runs the script in --check mode. Verified locally: `node scripts/sync-skill-version.mjs --check` exits 0 when synced (current state at 0.13.0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Updates the cycle report to reflect the full 10-commit / 50-test expansion (initial 4 commits + 6 commits from "전체 진행" phase 1-3). ### Highlights - 1323 → 1373 tests, 70 → 73 files - 6 findings (vs. 3 in the initial report) — adds dry-run gating gap, envelope contract gap, SKILL.md sync CI gap - 사람 검토 필요 항목 expanded to 5 (helper underscore convention, regex fragility, outcome status alias, hand-maintained KNOWN list, getView integration coverage) - 다음 권장 액션 split into 4 buckets (이번 PR / 다음 사이클 분리 / micro-PR / refactor) with explicit branch names and dependencies ### Appendix B — live ↔ unit cross-validation The live `outcome view 2` mainnet response is now mapped field-by-field to the matching unit test case (gap / gapPct / inTheMoney / midSum / encoding / envelope schema). This is the cross-validation pattern the user explicitly asked to keep across QA cycles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes 사람 검토 항목 #5 (격상 권장 from cycle review): 35 helper unit tests cover the math but cannot catch a regression where Hyperliquid's `/info` payload shape changes. This file mocks `_infoPost` at the adapter level and asserts the assembled `OutcomeView` against the expected shape end-to-end. ### Test pattern - Inject a stub HyperliquidAdapter (only `isTestnet` needed for the paths that getView exercises). - `vi.spyOn(adapter as any, "_infoPost")` routes by `body.type`: outcomeMeta / allMids / l2Book(coin) → controllable mocks. - `vi.useFakeTimers` + `setSystemTime` pin the clock so msToExpiry is deterministic (60_000 ms before the BTC binary expiry). ### New tests (7, new file `hyperliquid-outcome-getView.test.ts`) - assembles OutcomeView from outcomeMeta + allMids + l2Book responses (live BTC binary shape; verifies all 5 helper outputs combined) - propagates depth into bids/asks length - throws SYMBOL_NOT_FOUND for unknown outcome id - throws EXCHANGE_ERROR when outcomeMeta payload shape changes (no `outcomes` field) - returns gap/inTheMoney undefined/null when allMids drops the underlying perp symbol (live HL cache-miss simulation) - returns midSum undefined when allMids drops a side's encoding - RECORDS a known Rule #2 violation: getOrderbook silently substitutes [[],[]] when l2Book omits `levels` — pinned for follow-up fix ### Follow-up The last test pins a Rule #2 violation in `getOrderbook` (line 419, `book?.levels ?? [[], []]`) — currently silent-empty-book on malformed payload. Will be fixed in a follow-up commit that flips the assertion from "silent empty" to "throws EXCHANGE_ERROR". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Production bug #3 surfaced by the getView integration test (#22b9286 follow-up): `getOrderbook` did const levels = book?.levels ?? [[], []]; which silently substituted an empty book whenever the venue payload was missing the `levels` field — a Rule #2 violation that masks upstream contract breaks as "no resting orders". This is the same class of bug as the NaN propagation in `_computeMidSum` and the outcome NaN silent-pass in `_validateOutcomeSide` (both fixed earlier in this cycle). Three Rule #2 issues found via the helper-extract + unit-test pattern in a single audit pass — supports the user's hypothesis that other modules without this treatment likely harbor more. ### Change Reject when `book.levels` is missing OR not a 2-tuple of arrays. Throws EXCHANGE_ERROR with a coin-tagged message so the caller (CLI / agent) can attribute the failure to a specific outcome side. ### Test Flips the previously "RECORDS silent empty" assertion to actively verify the throw, plus a new case for `levels` being a length-1 tuple (also rejected — partial venue payload is not "valid empty book"). Test count: 35 + 7 → 35 + 8 in the outcome suite. All other tests unaffected (`book.levels` is the actual production field; previous silent fallback was unreachable for healthy responses). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…son changes Process gate from cycle review: v0.13.0's `outcome view` shipped with zero unit-test coverage on getView's gap/inTheMoney/midSum/msToExpiry — caught only because a manual QA cycle ran. Make this catch automatic at PR-review time. ### Change - Adds an "If applicable" checkbox: Live ↔ Unit cross-validation table required for any change that adds/modifies a `--json` command or affects envelope shape. - Adds a "Live ↔ Unit Test Cross-Validation" section with a Korean comment block explaining when it's required and what to do if not applicable (one-line N/A reason). - Pattern lifted directly from docs/qa-reports/2026-05-05 Appendix B. PR authors with envelope-touching changes now have to either: (a) capture a live response + map fields to unit-test cases, or (b) explicitly state why N/A. Either way, drift between live behavior and tested behavior is visible at review time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Full revision after the "전체 진행" cycle (initial 4 commits → 14 commits, 1323 → 1381 tests). New shape mirrors the user's review feedback: ### New / expanded sections - "테스트 작성 중 발견된 production 결함" (new) — promotes the 3 Rule #2 violations (NaN silent-pass, NaN propagation, silent empty book) out of the "Changed interfaces" footnote and into a first- class section. Three bugs found in one audit pass via the same helper-extract pattern is the user's "tip of the iceberg" signal. - Findings 4-7 added (dry-run gating gap, envelope contract gap, SKILL.md sync gap, live-↔-unit cross-validation process gate). - "다음 권장 액션" reorganized per user's cycle-review priority reordering: numeric-validation-audit (new P0), aster-signer- regression (P3 → P0), program-builder factory (P2 → P1). - Coverage section explicitly notes `@vitest/coverage-v8` is missing (Section 3 user approval needed — filed as micro-PR). - 사람 검토 항목 #5 (getView integration coverage) flagged as closed by 22b9286. - Live-↔-unit cross-validation table extended to include the getView integration test row. ### Verified - Container ground-truth: 74 files / 1381 tests passed at HEAD `49392da`, 23.09s. Reflects all 14 cycle commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Hiksang and others added 15 commits May 5, 2026 13:40

Hiksang merged commit 1d0e961 into main May 5, 2026
4 checks passed

Hiksang deleted the qa/2026-05-05-v0.13.0-validation branch May 5, 2026 08:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

qa(v0.13.0): regression guards + 3 production-bug fixes#13

qa(v0.13.0): regression guards + 3 production-bug fixes#13
Hiksang merged 15 commits into
mainfrom
qa/2026-05-05-v0.13.0-validation

Hiksang commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Hiksang commented May 5, 2026

Summary

Findings (7개 모두 처리)

자세한 보고

Out of scope — 별도 사이클로 분리

Test plan

QA Workflow Section 3 / 13 — 절대 금지 항목 준수

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant