
fix(diplomat): demote per-request fallback log, rate-limit the warn (EIP-2816)#118

Open
cluckienv wants to merge 1 commit into jfernandes/EIP-1341-Improve-integrations-nodejs-sdk-so-that-all-integrations-that-use-createAxiosClient-receive-Diplomat-support-by-default from cluckie/EIP-2816-demote-diplomat-fallback-warn

@cluckienv cluckienv commented Jun 22, 2026

Summary

The "Diplomat server check failed - falling back to direct routing" warn fires on every request where the diplomat-server check fails, which is the steady state for non-Diplomat installs. At ~1.2M log lines / 14d (44k+ in the last 14d alone, per Datadog) this is mostly noise.

This PR:

  • Demotes the per-request log from warn to debug (it's the expected path, not an exception)
  • Adds an in-memory rate limiter so a single warn still fires per installId per 5-minute window — keeps operator visibility without per-request amplification
  • Adds test/util/diplomat.test.ts covering: debug on every fallback, single warn within the window, warn re-fires after the window expires, independent gating per installId

Why this approach

The ticket asks for two things:

  1. "Demote the warn to debug — falling back to direct routing is the normal mode for non-Diplomat customers, not a warn-worthy exception."
  2. "Add a warnOnce-style rate limiter so the log fires once per installId per 5 minutes."

Interpreting both: every fallback emits at debug (full per-event trace if you need it), and a separate gated warn fires once per (installId, 5-min window) so dashboards/alerts still see one signal per affected install per window.

The cache is a Map<string, number> keyed by installId — same shape and TTL as the existing CACHE_TTL_MS in the plugin-side consumer, but local to this module. No external dependencies, no shared state across instances (each process has its own cache, which is fine for a "tell me once per window" signal).
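
For reviewers who want the gating logic in one place, here is a minimal sketch of the behavior described above. It is not the code in this diff: `WARN_WINDOW_MS`, `Logger`, `createFallbackLogger`, and the injectable `now` clock are hypothetical names chosen for illustration, and the 5-minute value is taken from the ticket wording rather than from `CACHE_TTL_MS`.

```typescript
// Illustrative sketch only. All names here are hypothetical, not taken from the diff.
const WARN_WINDOW_MS = 5 * 60 * 1000; // 5-minute window, per the ticket

interface Logger {
  debug(msg: string, meta?: Record<string, unknown>): void;
  warn(msg: string, meta?: Record<string, unknown>): void;
}

const FALLBACK_MSG =
  "Diplomat server check failed - falling back to direct routing";

// `now` is injectable so tests can advance time without real timers.
function createFallbackLogger(logger: Logger, now: () => number = Date.now) {
  // installId -> timestamp (ms) of the last warn emitted for that install.
  // Process-local: each instance keeps its own map.
  const lastWarnAt = new Map<string, number>();

  return function logFallback(installId: string): void {
    // Every fallback is traced at debug.
    logger.debug(FALLBACK_MSG, { installId });

    // At most one warn per installId per window.
    const t = now();
    const last = lastWarnAt.get(installId);
    if (last === undefined || t - last >= WARN_WINDOW_MS) {
      lastWarnAt.set(installId, t);
      logger.warn(FALLBACK_MSG, { installId });
    }
  };
}
```

Calling `logFallback("install-a")` twice in quick succession would emit two debug lines and one warn; a call for a different installId is gated independently. Note that this sketch never evicts entries, so the map grows with the number of distinct installIds a process sees. Whether that matters depends on install cardinality per process.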

Test plan

  • npm test — 30/30 tests pass, including 3 new tests for the gating behavior
  • npm run lint — no new lint findings on changed files
  • npx prettier --check src/util/diplomat.ts test/util/diplomat.test.ts — clean

Links

  • Jira: EIP-2816
  • Parent epic: EIP-2786 — Cost Reduction: envoy-plugin-ccure Log Volume

🤖 Generated with Claude Code

The "Diplomat server check failed - falling back to direct routing" warn
fires on every request when the diplomat server is unreachable, which is
the common case for non-Diplomat installs. At ~1.2M log lines / 14d this
swamps log volume with what is effectively the expected steady state.

Demote the per-request log to debug. Keep a single warn per installId
every 5 minutes so an operator still sees one signal per affected
install per window, without per-request amplification.

EIP-2816

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>