Skip to content

qa(v0.13.0): regression guards + 3 production-bug fixes#13

Merged
Hiksang merged 15 commits into
mainfrom
qa/2026-05-05-v0.13.0-validation
May 5, 2026
Merged

qa(v0.13.0): regression guards + 3 production-bug fixes#13
Hiksang merged 15 commits into
mainfrom
qa/2026-05-05-v0.13.0-validation

Conversation

@Hiksang
Copy link
Copy Markdown
Collaborator

@Hiksang Hiksang commented May 5, 2026

Summary

  • v0.13.0 (HL Outcome / HIP-4) 회귀 가드 + 3 production-bug fixesqa/2026-05-05-v0.13.0-validation 사이클의 15 commits.
  • 1323 → 1381 tests (+58), 3 신규 unit + 1 통합 test file, 호스트와 컨테이너 cross-validate (74 files / 23.09s).
  • 5 helper 추출 (_computeUnderlying / _computeMidSum / _computeTimeStatus / _assertOutcomeRange / _trimBook) + 단위 35 case + 통합 8 case.
  • 3 production Rule v0.2.1: scaled TP, pnl/liquidation alerts, bot fix #2 violations 수정 (단위 테스트 작성 중 발견) — NaN silent-pass on outcome / NaN propagation in midSum / silent empty book in getOrderbook.
  • 회귀 가드 7 영역 추가 — README↔commander parity / LANDING_EXCHANGES drift / --dry-run venue 게이팅 / --json Zod schema / CI sync-skill-version --check / PR template Live↔Unit cross-validation.

Findings (7개 모두 처리)

  1. README "Command Groups" 표 4 stale (agent / manage / dashboard / status) + 2 missing (outcome / health) → fix + 회귀 가드
  2. outcome view 핵심 로직 unit test 0% → 43 case (35 helper + 8 SDK-mock 통합)
  3. LANDING_EXCHANGES drift + exchangeLabel fallthrough footgun 가드 부재 → 회귀 가드
  4. --dry-run venue 콜 차단 자동 가드 부재 → 4 path + positive control
  5. --json envelope contract 자동 가드 부재 → Zod schema 11 case
  6. SKILL.md ↔ package.json version drift → CI gate
  7. Live↔Unit cross-validation process gate 부재 → PR template 추가

자세한 보고

docs/qa-reports/2026-05-05-v0.13.0-validation.md 전문 참조 (310+ 줄, 11 섹션 + 부록 A·B). 라이브 outcome view 응답 ↔ 단위 테스트 case field-by-field 매핑 포함.

Out of scope — 별도 사이클로 분리

  • numeric-validation-audit (P0 신설) — 3 production 결함이 helper-extract 패턴으로 발견된 점 follow-up. 다른 모듈에 동일 패턴 audit.
  • aster-signer-regression (P0 격상) — v0.12.16-18 active fragility 신호.
  • commander program-builder factory (P2 → P1)KNOWN_TOP_LEVEL_GROUPS hand-maintained 부분 제거.
  • cross-adapter-matrix (P1) / failure-modes (P2) — 4-DEX mock 인프라 / 5xx·429·malformed.
  • 사용자 승인 필요 dep: @vitest/coverage-v8 (coverage 정량 측정), fast-check (property test), @typescript-eslint/naming-convention (lint).

Test plan

  • pnpm test (unit suite): 1381 / 1381 passed (host + container 동일)
  • pnpm build (tsc strict): clean
  • mainnet 실거래 0건 — 모두 readonly + --dry-run (Section 3 / Section 7 준수)
  • 컨테이너 정본 검증 (perp-qa Docker, GitHub clone): 74 files / 1381 tests / 23.09s
  • 라이브 readonly: outcome view / health / arb scan / portfolio / 4 DEX market list 모두 OK
  • CI workflow 통과 확인 (typecheck + skill-sync --check + unit + integration with continue-on-error)
  • 사람 review

QA Workflow Section 3 / 13 — 절대 금지 항목 준수

main 머지/push (이 PR 자체는 사용자 명시 승인) · npm publish ✗ · git tag ✗ · GitHub Release ✗ · mainnet 실거래 ✗ · deps 메이저 변경 ✗ · 새 npm 패키지 추가 ✗

🤖 Generated with Claude Code

Hiksang and others added 15 commits May 5, 2026 13:40
AI 에이전트(Claude Code 등)가 perp-cli 레포에서 QA 작업을 수행할 때
따라야 하는 워크플로우/가드레일 정의. 13 섹션 + 한국어 보고 포맷.

- Section 3: 명시적 사용자 승인 없이 절대 금지 (main 머지, npm publish,
  git tag, mainnet 실거래, deps major 업데이트 등)
- Section 7: 보안 가드 (.env/PK/mnemonic stage 검증, signer abstraction
  우회 금지, testnet PK 도 레포 커밋 금지)
- Section 9: perp-cli 특화 (다중 어댑터 호환성, SKILL.md ↔ CLI 정합성)
- Section 11: 한국어 구조화 보고 포맷
- Section 13: 우선순위 충돌 시 명시적 금지 > 보안 가드 > 사용자 지시

CLAUDE.md 에 요약 + 정본 포인터 추가 (CLAUDE.md 는 .gitignore).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QA finding: README "Command Groups" table drifted from `perp --help`
ground truth — 4 stale groups + 2 missing.

### Removed (already migrated, no longer top-level)
- `manage`  → `wallet manage ...` (margin mode, subaccount, API keys)
- `dashboard` / `status` → folded into `portfolio` per --help description
- `agent` → `wallet agent ...` (approve/list/revoke/rotate/verify)

### Added
- `outcome` — Hyperliquid HIP-4 binary/range contracts (v0.13.0 new asset
  class, USDH-quoted, no leverage)
- `health` — adapter health check across 4 DEX

### Description tightened
- `wallet` — explicitly notes nested `wallet agent` / `wallet manage` so
  users do not look for the old top-level groups
- `portfolio` — notes it replaces former `account balance` / `status` /
  `dashboard`
- `alerts` — Telegram + Discord (was Telegram-only label)

### Core Commands section
Adds "Outcome markets" examples (list / view / book / positions / orders /
buy / sell / cancel) — already present in skills/perp-cli/references/
commands.md but missing from README.

Verified via container at `qa/2026-05-05-v0.13.0-validation`:
- `perp --help` enumerates exactly 17 groups; new README table = 17/17 match
- `perp --json outcome view 2` returns symmetric Yes/No book + underlying
  gap (BTC binary live market)
- `pnpm test` 1323/1323 passed (70 files)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QA finding (v0.13.0): `outcome view` (last commit `edffc77`) ships the
gap / inTheMoney classification on the live mainnet path with zero unit
coverage — only `parseDescription` was tested in 0.13.0. Live verified
once at release; no regression guard.

### Refactor (minimal)
Extract `_computeUnderlying(parsed, allMids)` static helper from
`getView()`. Pure function, no SDK dependency. `getView()` shrinks from
~30 lines of inline branching to a single helper call.

### New tests (9, total 16 → 25 in this file)
- returns null when description has no underlying
- priceBinary in-the-money (mark > target → "yes")
- priceBinary out-of-the-money (mark < target → "no")
- priceBinary edge: gap exactly 0 → "yes" (Yes = mark >= target convention)
- non-priceBinary class: gap computed, classification suppressed (Rule #2)
- missing class: classification suppressed
- missing markPrice for symbol: gap/gapPct undefined, inTheMoney null
- missing targetPrice: gap/gapPct undefined, inTheMoney null
- underlying symbol uppercased before allMids lookup

All 25 tests pass; tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…istry

QA finding: exchange enumeration is duplicated between two places —
`exchanges/registry.ts` (the SSOT for `-e` flag resolution) and
`landing.ts:LANDING_EXCHANGES` (powers the no-arg `perp` landing page).
Drift is silent: a new exchange registered in registry.ts will not
appear on the landing page until a human notices.

Plus `landing.ts:exchangeLabel()` is an inline ternary chain — if a 5th
exchange is added to LANDING_EXCHANGES without updating that switch,
the new entry silently falls through to the last arm's label ("Aster")
on the landing page. This is the same class of bug as the v0.12.16-18
landing-page misclassifications.

### New tests (2, total 5 → 7 in this file)
- LANDING_EXCHANGES.sort() === listExchanges().sort()
  (registry SSOT comparison; fails immediately if a registered exchange
  is missing from the landing enumeration)
- distinct, non-empty label per LANDING_EXCHANGES member
  (catches inline-switch fallthrough by asserting Set(labels).size ===
  LANDING_EXCHANGES.length)

Both cover Section 9 of QA_WORKFLOW (multi-adapter consistency).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a P0 gap from previous QA cycle: getView() ships midSum (binary
symmetry invariant Yes+No≈1.0) and msToExpiry (Date.now-dependent) on
the live mainnet path with zero unit coverage.

### Refactor (minimal, same `_helper` convention as _computeUnderlying)
- Extract `_computeMidSum(sides)` static. Adds a small Rule #2 fix:
  previously `Number(mid)` could produce NaN from a malformed venue
  payload and propagate as `midSum: NaN` downstream. Helper now returns
  `undefined` ("symmetry unknown") for any non-finite impliedProb instead
  of emitting NaN.
- Extract `_computeTimeStatus(expiryMs, nowMs)`. `nowMs` is injected so
  callers can pin the clock under test instead of patching Date.now
  globally. Does NOT clamp negatives — expired-UX classification is the
  caller's job, not the helper's (Rule #2).

### New tests (9, total 25 → 34 in this file)
midSum (5):
- healthy binary sums to ~1.0
- one side missing impliedProb → undefined
- any side NaN/Infinity → undefined (no NaN propagation)
- empty side list → undefined
- preserves arithmetic faithfully (sum < 1 / > 1 are both surfaced)

timeStatus (4):
- now < expiry → positive ms
- now === expiry → 0
- now > expiry → negative ms (caller classifies expired UX)
- expiryMs undefined → msToExpiry undefined

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sion)

QA finding from previous cycle: v0.13.0 README "Command Groups" table
drifted from `perp --help` ground truth — 4 stale rows (`agent`,
`manage`, `dashboard`, `status`) + 2 missing (`outcome`, `health`).
Fixed in d6c9fe0 but no regression guard.

### Approach
Hand-maintained `KNOWN_TOP_LEVEL_GROUPS` SSOT constant in this test
file + README markdown-table parser. Adding a new top-level group fails
this test until BOTH places are updated; deleting a row fails until the
constant is also updated.

This is the 80/20 guard. The deeper SSOT collapse — deriving the list
dynamically from a Commander program-builder factory so a single source
covers index.ts + README + this test — is filed as a follow-up P2 item
(needs a small refactor of src/index.ts).

### New tests (3, new file `src/__tests__/readme-cli-sync.test.ts`)
- README group rows == KNOWN_TOP_LEVEL_GROUPS (set equality)
- README table has no duplicate rows
- README table preserves SSOT order (strict — guards silent reshuffling
  that would break agent docs / skill bundles relying on order)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 from QA cycle review: Section 7 ("거래 관련 커맨드는 반드시
--testnet 플래그 또는 mock signer 로만 실행. mainnet 실행 금지") had
no automated guard. The --dry-run gate is correct in source but a
single missed `dryRunGuard()` call in a future trade subcommand would
silently leak a venue order. This test pins that contract.

### Approach
Integration-style: stub `execution-log` + `client-id-tracker` modules
(file IO), inject a mock ExchangeAdapter where every venue method is a
spy, parse `trade ...` argv through Commander, then assert the venue
methods received zero calls.

A positive control test runs the same path without --dry-run and
confirms `adapter.marketOrder("BTC", "buy", "0.01")` IS called once —
without it, the negative tests could be green because of a wiring bug
rather than because gating works.

### New tests (5, new file `src/__tests__/commands/trade-dry-run-gating.test.ts`)
- `trade market BTC buy 0.01 --dry-run` → marketOrder NOT called
- `trade market BTC sell 0.01 --dry-run` → marketOrder NOT called
- `trade buy BTC 0.01 --dry-run` (shortcut) → marketOrder NOT called
- `trade sell BTC 0.01 --dry-run` (shortcut) → marketOrder NOT called
- positive control: same `trade market` path WITHOUT --dry-run reaches
  marketOrder exactly once with the expected args

### Out of scope (deferred)
- limit / split / close / cancel / outcome buy / outcome sell — same
  pattern, follow-up commit.
- 4-DEX matrix (PAC / LT / Aster) — adapter mock is HL-shaped here;
  follow-up cycle covers cross-exchange.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 from review: getView() depth handling and outcome/side gating
shipped without unit coverage. Tightens both with explicit Rule #2
gates instead of silent slice() coercion.

### Refactor
- Extract `_assertOutcomeRange(outcome, side)` static — pure arithmetic
  gate. The previous private method had a small ordering bug: NaN
  outcomes slipped past the `encoding > MAX_ENCODING` post-check
  because `Number.isInteger(NaN)=false` was only checked for `side`,
  not for `outcome` paired with `encoding=NaN`. Now rejected explicitly.
- Extract `_trimBook(book, depth)` static — depth is now strictly
  validated (Number.isInteger + depth >= 0). Previously
  `slice(0, NaN)` silently returned [] and `slice(0, -1)` silently
  dropped the last entry. Both are caller bugs, not "depth=10 default".
- Throws EXCHANGE_ERROR when the venue payload is missing bids/asks
  array — previously crashed with TypeError or fabricated empty book.

`getView()` and `_validateOutcomeSide` updated to compose these helpers.

### New tests (11, total 34 → 45 in this file)
_assertOutcomeRange (4):
- valid range + boundary (9_999_999, 9 → MAX_ENCODING)
- outcome NaN / -1 / 0.5 / Infinity rejected (the silent-pass bug)
- side outside 0..9 rejected (with friendly error message)
- encoding overflow on outcome=10_000_000 → 100_000_000 rejected

_trimBook (7):
- normal trim with bestBid/bestAsk
- depth=0 → empty + undefined best prices
- depth larger than book → full book, no padding/error
- negative depth rejected (previously dropped last entry silently)
- NaN / Infinity / fractional depth rejected
- malformed payload (missing bids or asks) → EXCHANGE_ERROR
- empty book passes through cleanly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 from review: the public --json envelope is what every external
agent (MCP, scripts, Claude Code) parses against. Drift in `jsonOk` /
`jsonError` shape would silently break consumers pinned to the old
shape.

### Approach
Zod schema (EnvelopeOk / EnvelopeErr / Envelope union) mirroring
ApiResponse in src/utils.ts. 11 cases covering:
- ok envelope with object / array / null / undefined data
- timestamp ISO-8601 regex
- meta merge (exchange, duration_ms)
- error envelope minimal (code + message only)
- error envelope with full agent-actionable payload (status/retryable/
  retryAfterMs/remediation/details)
- mutual exclusivity of ok=true and ok=false on the same parse
- malformed: ok envelope missing meta.timestamp → reject
- malformed: error without `code` → reject
- live `outcome view 2` response shape (Appendix B regression — captures
  the exact shape that ships at v0.13.0 so future envelope changes
  break this test instead of agents in the field)

### Out of scope (deferred)
- Per-command schemas (outcome view, market list, portfolio, …) —
  this commit guards the envelope wrapper, not each `data` shape.
  Per-command schemas are a worthwhile follow-up but explode test
  count; deferred to a separate cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2 from review: scripts/sync-skill-version.mjs ran only at
prepublishOnly time, so a PR that bumps package.json version without
re-running the sync would silently merge with SKILL.md still pointing
at the old version. The npm package would be fine (prepublishOnly
covers it) but the GitHub-checkout install path (used by Docker QA and
agent skill installers reading from main) would ship a stale version
field.

### Changes
- scripts/sync-skill-version.mjs: add --check mode that exits non-zero
  on drift instead of writing. Default (no flag) behavior unchanged.
- .github/workflows/ci.yml: add a new "SKILL.md version sync check"
  step between typecheck and unit tests that runs the script in
  --check mode.

Verified locally: `node scripts/sync-skill-version.mjs --check` exits 0
when synced (current state at 0.13.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the cycle report to reflect the full 10-commit / 50-test
expansion (initial 4 commits + 6 commits from "전체 진행" phase 1-3).

### Highlights
- 1323 → 1373 tests, 70 → 73 files
- 6 findings (vs. 3 in the initial report) — adds dry-run gating gap,
  envelope contract gap, SKILL.md sync CI gap
- 사람 검토 필요 항목 expanded to 5 (helper underscore convention,
  regex fragility, outcome status alias, hand-maintained KNOWN list,
  getView integration coverage)
- 다음 권장 액션 split into 4 buckets (이번 PR / 다음 사이클 분리 /
  micro-PR / refactor) with explicit branch names and dependencies

### Appendix B — live ↔ unit cross-validation
The live `outcome view 2` mainnet response is now mapped field-by-field
to the matching unit test case (gap / gapPct / inTheMoney / midSum /
encoding / envelope schema). This is the cross-validation pattern the
user explicitly asked to keep across QA cycles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes 사람 검토 항목 #5 (격상 권장 from cycle review): 35 helper
unit tests cover the math but cannot catch a regression where
Hyperliquid's `/info` payload shape changes. This file mocks
`_infoPost` at the adapter level and asserts the assembled
`OutcomeView` against the expected shape end-to-end.

### Test pattern
- Inject a stub HyperliquidAdapter (only `isTestnet` needed for the
  paths that getView exercises).
- `vi.spyOn(adapter as any, "_infoPost")` routes by `body.type`:
  outcomeMeta / allMids / l2Book(coin) → controllable mocks.
- `vi.useFakeTimers` + `setSystemTime` pin the clock so msToExpiry is
  deterministic (60_000 ms before the BTC binary expiry).

### New tests (7, new file `hyperliquid-outcome-getView.test.ts`)
- assembles OutcomeView from outcomeMeta + allMids + l2Book responses
  (live BTC binary shape; verifies all 5 helper outputs combined)
- propagates depth into bids/asks length
- throws SYMBOL_NOT_FOUND for unknown outcome id
- throws EXCHANGE_ERROR when outcomeMeta payload shape changes
  (no `outcomes` field)
- returns gap/inTheMoney undefined/null when allMids drops the
  underlying perp symbol (live HL cache-miss simulation)
- returns midSum undefined when allMids drops a side's encoding
- RECORDS a known Rule #2 violation: getOrderbook silently substitutes
  [[],[]] when l2Book omits `levels` — pinned for follow-up fix

### Follow-up
The last test pins a Rule #2 violation in `getOrderbook` (line 419,
`book?.levels ?? [[], []]`) — currently silent-empty-book on malformed
payload. Will be fixed in a follow-up commit that flips the assertion
from "silent empty" to "throws EXCHANGE_ERROR".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production bug #3 surfaced by the getView integration test (#22b9286
follow-up): `getOrderbook` did
    const levels = book?.levels ?? [[], []];
which silently substituted an empty book whenever the venue payload
was missing the `levels` field — a Rule #2 violation that masks
upstream contract breaks as "no resting orders".

This is the same class of bug as the NaN propagation in
`_computeMidSum` and the outcome NaN silent-pass in
`_validateOutcomeSide` (both fixed earlier in this cycle). Three
Rule #2 issues found via the helper-extract + unit-test pattern in a
single audit pass — supports the user's hypothesis that other modules
without this treatment likely harbor more.

### Change
Reject when `book.levels` is missing OR not a 2-tuple of arrays. Throws
EXCHANGE_ERROR with a coin-tagged message so the caller (CLI / agent)
can attribute the failure to a specific outcome side.

### Test
Flips the previously "RECORDS silent empty" assertion to actively
verify the throw, plus a new case for `levels` being a length-1 tuple
(also rejected — partial venue payload is not "valid empty book").

Test count: 35 + 7 → 35 + 8 in the outcome suite. All other tests
unaffected (`book.levels` is the actual production field; previous
silent fallback was unreachable for healthy responses).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…son changes

Process gate from cycle review: v0.13.0's `outcome view` shipped with
zero unit-test coverage on getView's gap/inTheMoney/midSum/msToExpiry —
caught only because a manual QA cycle ran. Make this catch automatic
at PR-review time.

### Change
- Adds an "If applicable" checkbox: Live ↔ Unit cross-validation table
  required for any change that adds/modifies a `--json` command or
  affects envelope shape.
- Adds a "Live ↔ Unit Test Cross-Validation" section with a Korean
  comment block explaining when it's required and what to do if not
  applicable (one-line N/A reason).
- Pattern lifted directly from docs/qa-reports/2026-05-05 Appendix B.

PR authors with envelope-touching changes now have to either:
  (a) capture a live response + map fields to unit-test cases, or
  (b) explicitly state why N/A.

Either way, drift between live behavior and tested behavior is
visible at review time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full revision after the "전체 진행" cycle (initial 4 commits → 14
commits, 1323 → 1381 tests). New shape mirrors the user's review
feedback:

### New / expanded sections
- "테스트 작성 중 발견된 production 결함" (new) — promotes the 3
  Rule #2 violations (NaN silent-pass, NaN propagation, silent empty
  book) out of the "Changed interfaces" footnote and into a first-
  class section. Three bugs found in one audit pass via the same
  helper-extract pattern is the user's "tip of the iceberg" signal.
- Findings 4-7 added (dry-run gating gap, envelope contract gap,
  SKILL.md sync gap, live-↔-unit cross-validation process gate).
- "다음 권장 액션" reorganized per user's cycle-review priority
  reordering: numeric-validation-audit (new P0), aster-signer-
  regression (P3 → P0), program-builder factory (P2 → P1).
- Coverage section explicitly notes `@vitest/coverage-v8` is missing
  (Section 3 user approval needed — filed as micro-PR).
- 사람 검토 항목 #5 (getView integration coverage) flagged as closed
  by 22b9286.
- Live-↔-unit cross-validation table extended to include the getView
  integration test row.

### Verified
- Container ground-truth: 74 files / 1381 tests passed at HEAD
  `49392da`, 23.09s. Reflects all 14 cycle commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Hiksang Hiksang merged commit 1d0e961 into main May 5, 2026
4 checks passed
@Hiksang Hiksang deleted the qa/2026-05-05-v0.13.0-validation branch May 5, 2026 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant