fix(diplomat): demote per-request fallback log, rate-limit the warn (EIP-2816) #118
The "Diplomat server check failed - falling back to direct routing" warn fires on every request when the diplomat server is unreachable, which is the common case for non-Diplomat installs. At ~1.2M log lines / 14d this swamps log volume with what is effectively the expected steady state.

Demote the per-request log to debug. Keep a single warn per installId every 5 minutes so an operator still sees one signal per affected install per window, without per-request amplification.

EIP-2816

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
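For reviewers who want a concrete picture of the gating described above, here is a minimal sketch. It is not the code from this PR: `logDiplomatFallback`, `WARN_WINDOW_MS`, `lastWarnAt`, and the `Logger` interface are hypothetical names chosen for illustration, and the injectable `now` parameter is an assumption made so the window can be exercised without real timers.

```typescript
// Hypothetical sketch: names and the Logger shape are assumptions, not the PR's actual code.
const WARN_WINDOW_MS = 5 * 60 * 1000; // the 5-minute window from the description

interface Logger {
  debug(msg: string): void;
  warn(msg: string): void;
}

// installId -> timestamp (ms) of the last warn emitted for that install.
// Module-local, so each process keeps its own view of the window.
const lastWarnAt = new Map<string, number>();

function logDiplomatFallback(
  logger: Logger,
  installId: string,
  now: number = Date.now(), // injectable clock, assumed here for testability
): void {
  const msg = "Diplomat server check failed - falling back to direct routing";

  // Every fallback is still traced, but at debug level.
  logger.debug(msg);

  // At most one warn per installId per window.
  const last = lastWarnAt.get(installId);
  if (last === undefined || now - last >= WARN_WINDOW_MS) {
    lastWarnAt.set(installId, now);
    logger.warn(msg);
  }
}

// Example with a console-backed logger.
const consoleLogger: Logger = {
  debug: (msg) => console.debug(msg),
  warn: (msg) => console.warn(msg),
};
logDiplomatFallback(consoleLogger, "install-a"); // debug + warn
logDiplomatFallback(consoleLogger, "install-a"); // debug only, still inside the window
```

Note that this sketch never evicts entries from the map, so it grows with the number of distinct installIds seen by the process. Whether that matters depends on how many installs a single process serves.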
Summary
The `warn: Diplomat server check failed - falling back to direct routing` log fires on every request where the diplomat-server check fails, which is the steady state for non-Diplomat installs. At ~1.2M log lines / 14d (44k+ in the last 14d alone, per Datadog) this is mostly noise.

This PR:
- Demotes the per-request log from `warn` to `debug` (it's the expected path, not an exception)
- Keeps a rate-limited `warn` that still fires per installId per 5-minute window, which preserves operator visibility without per-request amplification
- Adds `test/util/diplomat.test.ts` covering: debug on every fallback, single warn within the window, warn re-fires after the window expires, independent gating per installId

Why this approach
The ticket asks for two things:
- Demote the log to `debug`: falling back to direct routing is the normal mode for non-Diplomat customers, not a warn-worthy exception.
- Add a `warnOnce`-style rate limiter so the log fires once per installId per 5 minutes.

Interpreting both: every fallback emits at `debug` (full per-event trace if you need it), and a separate gated `warn` fires once per (installId, 5-min window) so dashboards/alerts still see one signal per affected install per window.

The cache is a `Map<string, number>` keyed by installId, with the same shape and TTL as the existing `CACHE_TTL_MS` in the plugin-side consumer, but local to this module. No external dependencies, no shared state across instances (each process has its own cache, which is fine for a "tell me once per window" signal).

Test plan
- `npm test`: 30/30 tests pass, including 3 new tests for the gating behavior
- `npm run lint`: no new lint findings on changed files
- `npx prettier --check src/util/diplomat.ts test/util/diplomat.test.ts`: clean

Links
🤖 Generated with Claude Code