test(controlplane): cover the cred-refresh/hot-idle-reaper seam #815

Open
EDsCODE wants to merge 1 commit into main from test/hot-idle-reaper-seam-coverage


@EDsCODE EDsCODE commented Jun 25, 2026

Contributor

Why

A post-mortem on the hot-idle reaper bug (fixed in #810) found that its two regression tests forge state with a raw SQL UPDATE, so they pin the new predicate but never exercise the real production write paths. They prove the new COALESCE(hot_idle_since, updated_at) query behaves — not that the actual credential-refresh writers leave the reap clock alone. That's the exact seam the original bug lived in: two subsystems sharing worker_records.updated_at, each tested in isolation.

This adds coverage across that seam, through the public API.

Tests

  • TestListExpiredHotIdleWorkersSurvivesRealCredentialRefresh: a worker idle for 30m (TTL 5m) is still reaped after the real BumpWorkerEpoch and MarkCredentialsRefreshed bump its updated_at. The test also asserts that the bug precondition holds (the bumped updated_at would hide the worker under the old predicate), so it can't silently lose its teeth. Verified to fail against the pre-#810 updated_at-keyed predicate (reverting the one-line predicate makes exactly this test fail) and to pass with the fix.
  • TestCredentialRefreshWritesLeaveHotIdleSinceUnchanged — parameterized over both real writers (BumpWorkerEpoch, MarkCredentialsRefreshed); asserts each advances updated_at but never moves hot_idle_since. This is the durable guard for the whole class of bug: a timestamp column read as a semantic clock by one subsystem must not be written by another.

Raw SQL is used only to age a row into the past (wall-clock time can't advance in a test); the mutation under test is always the real writer.

Testing

go test ./tests/configstore/ green; golangci-lint clean. Confirmed the seam test fails on the reverted (buggy) predicate.

Follow-ups (not in this PR)

The post-mortem also recommended deeper coverage that this PR intentionally leaves out: an injectable Clock threaded through both the reaper read path and the updated_at writers; making the janitor's fake expiry store compute expiry from (now, defaultTTL) instead of returning canned data; and a short-TTL idle-reclamation e2e lane. Happy to follow up on those separately.

PR #810 fixed the hot-idle reaper but its regression tests forge state with a
raw UPDATE, so they validate the new predicate without proving the real
production writers leave the reap clock alone. Add two tests that exercise the
actual write paths:

- TestListExpiredHotIdleWorkersSurvivesRealCredentialRefresh: a worker idle for
  30m (ttl 5m) is still reaped after the real BumpWorkerEpoch +
  MarkCredentialsRefreshed bump its updated_at. Verified to fail against the old
  updated_at-keyed predicate.
- TestCredentialRefreshWritesLeaveHotIdleSinceUnchanged: parameterized over both
  real writers; asserts each advances updated_at but never moves hot_idle_since —
  the durable guard for the class of bug (a timestamp read as a clock by one
  subsystem being written by another).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions


Test Impact Plan

Deterministic summary of how this PR changes tests, CI runners, and coverage-risk signals.

Summary

| Area | Added | Changed | Deleted |
| --- | --- | --- | --- |
| Test files | 0 | 1 | 0 |
| E2E/journey files | 0 | 0 | 0 |
| Workflow files | 0 | 0 | 0 |

Signals

  • Test cases: +2 / -0
  • Assertions: +16 / -0
  • Skips or known failures added: 0
  • Workflow continue-on-error added: 0
  • Workflow path filters added: 0
  • Test commands removed from justfile: 0
  • E2E/journey retry lines added: 0

Coverage risk: neutral or increased

No coverage-reduction warnings detected.
