fix(leader-election): suppress standby lease false positive by rswigginton · Pull Request #72 · Nextdoor/vigil

rswigginton · 2026-06-14T22:06:03Z

In HA mode (leader election on, 2+ replicas), the non-leader replica logged "no controller is watching nodes" at ERROR every 2x leaseDuration indefinitely, even with a healthy leader. The monitor watched only its own Elected() channel, which never fires on a standby, so it could not distinguish "another pod is leading" (fine) from "nobody is leading" (the real #55 wedge).

monitorLeaseAcquisition now reads the election Lease before escalating. If another replica holds it and renewed within the lease duration, this pod is a standby and logs at debug instead of warning. ERROR is reserved for when no leader is live, preserving #55 detection.

The lease namespace is pinned to POD_NAMESPACE (downward API) so the probe and the manager read the same Lease.

Tests: healthy standby -> no ERROR; no live leader -> ERROR; a stale lease fed through the real probe -> ERROR (#55 end to end); leaderLeaseLive table.

Fixes #70

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rswigginton requested a review from a team as a code owner June 14, 2026 22:06

rswigginton wants to merge 1 commit into Nextdoor:main from rswigginton:fix/70-standby-lease-false-positive
