test(controlplane): cover the cred-refresh/hot-idle-reaper seam #815
Open
EDsCODE wants to merge 1 commit into
PR #810 fixed the hot-idle reaper, but its regression tests forge state with a raw UPDATE, so they validate the new predicate without proving that the real production writers leave the reap clock alone. Add two tests that exercise the actual write paths:

- TestListExpiredHotIdleWorkersSurvivesRealCredentialRefresh: a worker idle for 30m (ttl 5m) is still reaped after the real BumpWorkerEpoch + MarkCredentialsRefreshed bump its updated_at. Verified to fail against the old updated_at-keyed predicate.
- TestCredentialRefreshWritesLeaveHotIdleSinceUnchanged: parameterized over both real writers; asserts that each advances updated_at but never moves hot_idle_since. This is the durable guard for the class of bug (a timestamp read as a clock by one subsystem being written by another).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Test Impact Plan: deterministic summary of how this PR changes tests, CI runners, and coverage-risk signals.

Signals: coverage risk neutral or increased. No coverage-reduction warnings detected.
Why
A post-mortem on the hot-idle reaper bug (fixed in #810) found that its two regression tests forge state with a raw SQL `UPDATE`, so they pin the new predicate but never exercise the real production write paths. They prove that the new `COALESCE(hot_idle_since, updated_at)` query behaves correctly, not that the actual credential-refresh writers leave the reap clock alone. That is exactly the seam the original bug lived in: two subsystems sharing `worker_records.updated_at`, each tested in isolation.

This PR adds coverage across that seam, through the public API.
Tests
- `TestListExpiredHotIdleWorkersSurvivesRealCredentialRefresh`: a worker idle for 30m (ttl 5m) is still reaped after the real `BumpWorkerEpoch` + `MarkCredentialsRefreshed` bump its `updated_at`. The test also asserts that the bug precondition holds (the bumped `updated_at` would hide the worker under the old predicate), so it can't silently lose its teeth. Verified to fail against the `updated_at`-keyed predicate from before #810 ("fix(controlplane): reap hot-idle workers on a refresh-immune idle clock"); reverting that one-line predicate makes exactly this test fail, and it passes with the fix.
- `TestCredentialRefreshWritesLeaveHotIdleSinceUnchanged`: parameterized over both real writers (`BumpWorkerEpoch`, `MarkCredentialsRefreshed`); asserts that each advances `updated_at` but never moves `hot_idle_since`. This is the durable guard for the whole class of bug: a timestamp column read as a semantic clock by one subsystem must not be written by another.

Raw SQL is used only to age a row into the past (wall-clock time can't advance in a test); the mutation under test is always the real writer.
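The shape of the parameterized writer test can be sketched against a hypothetical in-memory fake. `store`, `row`, `age`, and `checkWriter` are illustrative names, and the two methods only borrow the real writers' names: their actual signatures and SQL are not shown in this PR. A real suite would presumably drive the table with `t.Run` subtests; a plain `main` keeps the sketch standalone.

```go
package main

import (
	"fmt"
	"time"
)

// row is a hypothetical in-memory stand-in for a worker_records row.
type row struct {
	Epoch        int
	UpdatedAt    time.Time
	HotIdleSince time.Time
}

// store is a hypothetical fake; the real configstore API may differ.
type store struct {
	now  func() time.Time
	rows map[string]*row
}

// Stand-ins for the two real writers: each touches updated_at and must not
// touch hot_idle_since.
func (s *store) BumpWorkerEpoch(id string) {
	r := s.rows[id]
	r.Epoch++
	r.UpdatedAt = s.now()
}

func (s *store) MarkCredentialsRefreshed(id string) {
	s.rows[id].UpdatedAt = s.now()
}

// age mimics the raw-SQL step: it moves both timestamps into the past,
// because wall-clock time cannot advance during a test.
func (s *store) age(id string, d time.Duration) {
	r := s.rows[id]
	r.UpdatedAt = r.UpdatedAt.Add(-d)
	r.HotIdleSince = r.HotIdleSince.Add(-d)
}

// checkWriter runs one writer against a freshly aged row and returns an error
// if it fails to advance updated_at or if it moves hot_idle_since.
func checkWriter(write func(*store, string)) error {
	now := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	s := &store{now: func() time.Time { return now }, rows: map[string]*row{
		"w1": {UpdatedAt: now, HotIdleSince: now},
	}}
	s.age("w1", 30*time.Minute)
	before := *s.rows["w1"]

	write(s, "w1")

	after := *s.rows["w1"]
	if !after.UpdatedAt.After(before.UpdatedAt) {
		return fmt.Errorf("updated_at did not advance")
	}
	if !after.HotIdleSince.Equal(before.HotIdleSince) {
		return fmt.Errorf("hot_idle_since moved from %v to %v", before.HotIdleSince, after.HotIdleSince)
	}
	return nil
}

func main() {
	// Parameterized over both writers; a slice keeps the output order stable.
	writers := []struct {
		name  string
		write func(*store, string)
	}{
		{"BumpWorkerEpoch", (*store).BumpWorkerEpoch},
		{"MarkCredentialsRefreshed", (*store).MarkCredentialsRefreshed},
	}
	for _, tc := range writers {
		fmt.Printf("%s: %v\n", tc.name, checkWriter(tc.write))
	}
}
```

A writer that also reset `hot_idle_since` would make `checkWriter` return an error, which is exactly the regression this guard is meant to catch.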
Testing
`go test ./tests/configstore/` green; `golangci-lint` clean. Confirmed that the seam test fails on the reverted (buggy) predicate.

Follow-ups (not in this PR)
The post-mortem also recommended deeper coverage that this PR intentionally leaves out: an injectable `Clock` threaded through both the reaper read path and the `updated_at` writers; making the janitor's fake expiry store compute expiry from `(now, defaultTTL)` instead of returning canned data; and a short-TTL idle-reclamation e2e lane. Happy to follow up on those separately.