[openai] Add rate limits data stream with headroom dashboard panel and alert by stefans-elastic · Pull Request #19347 · elastic/integrations

stefans-elastic · 2026-06-03T08:39:41Z

Proposed commit message

openai: add rate_limits stream and rate-limit headroom view

This change adds an openai.rate_limits data stream and a rate-limit
headroom view that compares OpenAI's configured per-project, per-model
rate limits against actual Usage API consumption, so operators can see
how close each project and model is to being throttled with HTTP 429
responses. The stream provides the limit side of that comparison; the
dashboard panels and alert join it to the existing usage streams.

The stream uses a CEL work-list: it pages the organization projects
list, then drains each project's rate-limit endpoint, paginating both
with after, last_id and has_more. To collect on every interval rather
than every other one, the program never emits an empty event batch
mid-chain (the filebeat CEL input ends a periodic run as soon as an
evaluation yields zero events, before want_more is checked), so
project-listing and the first rate-limit page are folded into a single
emitting step. Each record is annotated with its project_id, name and
status, is stamped with collected_at, and includes an optional limit
field only when the API returns it, so sparse records never write null
into the long-typed mappings. A failed call emits an error event and
the run continues with the rest of the work list, and the admin token
is redacted.

The OpenAI dashboard gains two ES|QL tables. "Rate limit headroom"
joins limits to usage per project_id and model and reports peak
one-minute used, limit and utilization for requests (RPM), tokens
(TPM) and images (IPM). "Rate limit headroom - by model (org-wide)"
rolls the same metrics up by model across projects; because OpenAI
enforces limits per project, the summed per-project limit is a
synthetic aggregate capacity and is labelled as indicative, not an
exact throttle boundary. The model join normalizes dated model
snapshots (e.g. omni-moderation-2024-09-26, gpt-image-1-2025-04-23) to
their base name, because the Usage API reports per dated snapshot while
rate_limits lists the base name; an exact-string join silently dropped
those rows (e.g. gpt-image-1) from the headroom view. All six usage datasets
(completions, embeddings, moderations, images, audio_speeches,
audio_transcriptions) are added to the dashboard's global
event.dataset filter so RPM is counted for audio and image models too.

A prebuilt "[OpenAI] Rate limit headroom low" .es-query rule fires at
or above 80% peak TPM utilization, grouped by project_id::model. Its
saved-object id and filename use the same "low" wording as the
rule name and dashboard panel.

Token and image usage are normalized into single
openai.base.usage_tokens and openai.base.usage_images fields,
populated by the relevant usage pipelines. The panels and alert
reference these normalized fields rather than per-stream columns, so
the ES|QL resolves even before any usage data exists and keeps working
when only some usage streams have data.

Each usage stream gains a finalization_grace setting (default 15m).
OpenAI does not finalize a per-minute usage bucket when it ends; its
counts keep rising for minutes afterward. The stream holds back any
bucket whose end time is younger than the grace period and re-fetches
it on a later poll, so finalized counts are stored instead of partial
ones; the grace value is persisted in CEL state across polls. A
documented residual limitation remains: OpenAI can revise buckets
upward beyond any fixed grace window under high-volume bursts, and
buckets are ingested once, so a small (few-percent) undercount can
persist for the busiest minutes. This does not affect the alert's
ability to flag over-limit conditions.

Utilization is computed with TO_DOUBLE because the used and limit values
are longs; plain division truncated to zero until usage reached the
limit, which also left the alert threshold ineffective. Headroom rows are
ordered so that models with usage appear first, ranked by highest TPM
utilization, with unused models sorted to the bottom.
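
The truncation described above can be illustrated outside ES|QL. The following is a minimal Python sketch, not the shipped query: the function names and sample numbers are hypothetical, and integer floor division stands in for dividing two long values.

```python
def utilization_long(used: int, limit: int) -> int:
    """Divide two integers the way a long/long division behaves: the fraction is dropped."""
    return used // limit


def utilization_double(used: int, limit: int) -> float:
    """Convert to floating point first, which is the role TO_DOUBLE plays in the query."""
    return float(used) / float(limit)


THRESHOLD = 0.8  # the 80% utilization threshold used by the alert

# Hypothetical sample values: 9,000 tokens used against a 10,000 TPM limit.
used_tokens, token_limit = 9_000, 10_000
truncated = utilization_long(used_tokens, token_limit)
precise = utilization_double(used_tokens, token_limit)

# The integer result stays at 0 below the limit, so it cannot cross the threshold.
print(truncated, truncated >= THRESHOLD)
# The floating-point result keeps the fraction and crosses the threshold.
print(precise, precise >= THRESHOLD)
```

With integer inputs the ratio only becomes nonzero once usage equals the limit, which is why a threshold below 100% could not fire before the conversion was added.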

The peak one-minute calculation depends on the usage streams running with
bucket_width set to 1m, which is the default. The package is bumped to
2.2.0 and format_version to 3.5.7 for alerting_rule_template support.

Checklist

I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

[ ]

How to test this PR locally

Related issues

Screenshots


github-actions · 2026-06-03T08:42:31Z

Elastic Docs Style Checker (Vale)

Summary: 1 warning, 3 suggestions found

⚠️ Warnings (1): Fix when the suggestion improves clarity or correctness.

File	Line	Rule	Message
packages/openai/_dev/build/docs/README.md	50	Elastic.QuotesPunctuation	Place punctuation inside closing quotation marks.

💡 Suggestions (3): Optional style improvements. Apply when helpful.

File	Line	Rule	Message
packages/openai/_dev/build/docs/README.md	107	Elastic.HeadingColons	Capitalize ': r'.
packages/openai/_dev/build/docs/README.md	109	Elastic.WordChoice	Consider using 'can, might' instead of 'may', unless the term is in the UI.
packages/openai/_dev/build/docs/README.md	151	Elastic.Ellipses	In general, don't use an ellipsis.

The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.

github-actions · 2026-06-03T09:19:17Z

TL;DR

The Buildkite failure is real, but the available log payload only contains teardown/output-upload lines and does not include the failing command/test error itself. Immediate next step: rerun the failed Check integrations openai job (or share the full step log/JUnit failure section) so the actual root cause can be identified.

Remediation

Re-run Check integrations openai and capture the first failure block (the section before teardown starts), especially the test name and assertion/error text.
If available, inspect and share the failing JUnit artifact content from build/test-results/openai-*.xml (the uploaded files indicate one of these contains the real failure).

Investigation details

Root Cause

The provided job log (/tmp/gh-aw/buildkite-logs/integrations-check-integrations-openai.txt) is truncated to cleanup + artifact upload output; it does not contain the original failing stack trace/assertion, so a code-level root cause cannot be proven from available evidence.

Evidence

Build: https://buildkite.com/elastic/integrations/builds/44062
Job/step: Check integrations openai
Key log excerpt:
- --- [openai] failed (/tmp/gh-aw/buildkite-logs/integrations-check-integrations-openai.txt:71)
- 🚨 Error: The command exited with status 1 (...:74)
- user command error: exit status 1 (...:76)
- Remaining lines are artifact upload + stack teardown only.

Verification

Attempted local CI script repro with .buildkite/scripts/test_one_package.sh packages/openai origin/main bc03335307d316073d5af257c78f13ef47115df6, but the local run exits early due to missing CI env/tooling (YQ_VERSION unbound).
Could not retrieve PR metadata/comments via the GitHub read tool in this run due to integrity gating, so deduplication against prior detective comments could not be confirmed.

Follow-up

Once the full failing section (or failing JUnit XML contents) is available, I can provide a precise root cause classification and patch-level fix recommendation.

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

#19347 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
[openai] Add rate limits data stream with headroom dashboard panel and alert #19347 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

What is this? | From workflow: PR Buildkite Detective

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

The README documented "images ↔ images per minute" as a compared headroom dimension, but the Rate limit headroom panel only computed RPM and TPM, so max_images_per_1_minute was collected and documented yet never shown.

Add image headroom to the panel: ES|QL now computes peak per-minute image usage (SUM of openai.images.images) against max_images_per_1_minute, gated independently so images don't inflate TPM/RPM, plus three table columns (Images used/limit/utilization).

Audio stays usage-only by design: the limit is in megabytes but the Usage API reports audio only in seconds/characters, so there is no comparable usage figure. Clarify the README deferral note accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

elastic-vault-github-plugin-prod · 2026-06-03T14:19:35Z

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

OpenAI's Usage API reports dated model snapshots (e.g. gpt-image-1-2025-04-23, omni-moderation-2024-09-26), while the Rate Limits API often lists only the base name (gpt-image-1). The headroom dashboard panels and the alert rule joined usage to limits on the exact model string, so usage for a snapshot without a matching rate-limit row was dropped from headroom entirely (verified live: gpt-image token usage was invisible on the panel).

Strip a trailing -YYYY-MM-DD snapshot suffix on both sides of the join so usage collapses to its base family and joins to the limit. Applied to all four dashboard panel queries and the alert rule template.

Re-verified against live data: gpt-image-1 token usage now joins to its limit, and moderation utilization is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
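
The snapshot normalization in this commit can be sketched in Python. This is an illustration under assumptions, not the shipped ES|QL: the regular expression is the one quoted for the REPLACE call elsewhere in this PR, while the function names, the sample usage and limit values, and the choice to sum usage across snapshots of one family are hypothetical.

```python
import re

# Trailing -YYYY-MM-DD snapshot suffix, the same pattern quoted for REPLACE in this PR.
SNAPSHOT_SUFFIX = re.compile(r"-[0-9]{4}-[0-9]{2}-[0-9]{2}$")


def base_model(model: str) -> str:
    """Strip a trailing dated-snapshot suffix; names without one are returned unchanged."""
    return SNAPSHOT_SUFFIX.sub("", model)


def join_headroom(usage, limits):
    """Join usage to limits on (project_id, normalized model)."""
    limit_by_key = {(p, base_model(m)): lim for (p, m), lim in limits.items()}
    joined = {}
    for (project, model), used in usage.items():
        key = (project, base_model(model))
        if key in limit_by_key:
            # Simplification: usage from several snapshots of one family is summed.
            previous = joined.get(key, (0, limit_by_key[key]))[0]
            joined[key] = (previous + used, limit_by_key[key])
    return joined


# Hypothetical sample data: usage is keyed by dated snapshot, limits by base name.
usage = {
    ("proj_a", "gpt-image-1-2025-04-23"): 1200,
    ("proj_a", "omni-moderation-2024-09-26"): 300,
}
limits = {
    ("proj_a", "gpt-image-1"): 5000,
    ("proj_a", "omni-moderation"): 1000,
}
print(join_headroom(usage, limits))
```

An exact-string join over the same sample data would return no rows, which matches the dropped-row symptom described in the commit message.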

OpenAI's Usage API finalizes per-minute buckets with a multi-minute delay: a bucket's request/token counts keep climbing for several minutes after it ends. The usage CEL only guarded the single newest bucket, so any bucket that was no longer newest but still finalizing was emitted partial, and the cursor advanced past it -- a permanent undercount that did not self-correct (most visible under bursts).

Widen the partial-skip guard across all 6 usage streams (completions, embeddings, moderations, images, audio_speeches, audio_transcriptions):

- New `finalization_grace` config var (default 15m).
- events now emit only buckets with end_time <= now - finalization_grace.
- The cursor is held at min(start_time) of the still-finalizing buckets so they are re-queried (start_time is inclusive) and emitted once final.

Emitted buckets are strictly older than the held cursor, so they are never re-fetched and never double-counted -- no dedup needed.

Docs updated with a Finalization grace period section and corrected collection-process steps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
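
The grace-period guard in this commit can be sketched as follows. This is a Python illustration of the rule as stated, not the CEL program: the bucket shape, the function name, and the sample timestamps are hypothetical, and returning None when nothing is pending is an assumption about how a caller would advance the cursor normally.

```python
from datetime import datetime, timedelta, timezone


def split_by_grace(buckets, now, grace):
    """Return the buckets that are safe to emit and the cursor to hold for the rest."""
    cutoff = now - grace
    # Emit only buckets whose end time is at or before now - grace.
    final = [b for b in buckets if b["end_time"] <= cutoff]
    pending = [b for b in buckets if b["end_time"] > cutoff]
    # Hold the cursor at the earliest still-finalizing bucket so it is re-queried.
    held_cursor = min((b["start_time"] for b in pending), default=None)
    return final, held_cursor


# Hypothetical sample data: three one-minute buckets of different ages.
now = datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc)
minute = timedelta(minutes=1)
buckets = [
    {"start_time": now - 20 * minute, "end_time": now - 19 * minute, "requests": 40},
    {"start_time": now - 10 * minute, "end_time": now - 9 * minute, "requests": 12},
    {"start_time": now - 2 * minute, "end_time": now - 1 * minute, "requests": 3},
]

final, held_cursor = split_by_grace(buckets, now, timedelta(minutes=15))
print([b["requests"] for b in final], held_cursor)
```

Because every emitted bucket ends at or before the held cursor, the next poll starting at that cursor does not return it again, which is the no-dedup property the commit message relies on.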

The issue-07 fix seeded finalization_grace (and initial_interval) in the CEL state: block and read state.finalization_grace on every poll, but the program's returned state map never re-emitted those keys. Elastic's CEL input replaces persisted state with the returned map (the state: block only seeds the first run), so from the second poll onward state.finalization_grace was absent and every evaluation failed with "no such key: finalization_grace" (event.kind: pipeline_error), ingesting zero usage across all six streams.

Re-emit finalization_grace and initial_interval in both returned objects (success and error path) of every usage stream's cel.yml.hbs, mirroring the existing access_token persistence pattern.

Verified live: agent resumed with 0 errors and backfilled the burst; per-bucket ES SUM(num_model_requests) equals the direct OpenAI usage API on all six streams (0 mismatches).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
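
The state-replacement failure described in this commit can be reproduced with a toy model. The Python below is a sketch of the behavior as the commit message describes it, not the CEL input's implementation: the poll loop, both program functions, the cursor field, and the initial_interval value are hypothetical.

```python
def run_polls(program, seed_state, polls):
    """Toy poll loop: the map returned by each poll replaces the persisted state."""
    state = dict(seed_state)
    for _ in range(polls):
        state = program(state)
    return state


def program_dropping_keys(state):
    """Reads the grace value on every poll but never returns it."""
    _grace = state["finalization_grace"]
    return {"cursor": state.get("cursor", 0) + 1}


def program_reemitting_keys(state):
    """Returns the seeded keys again so the next poll can still read them."""
    grace = state["finalization_grace"]
    return {
        "cursor": state.get("cursor", 0) + 1,
        "finalization_grace": grace,
        "initial_interval": state["initial_interval"],
    }


# Seed values: 15m is the documented default; the initial_interval value is made up.
seed = {"finalization_grace": "15m", "initial_interval": "24h"}

try:
    run_polls(program_dropping_keys, seed, polls=2)
    second_poll_failed = False
except KeyError as err:
    second_poll_failed = True
    print("second poll failed, missing key:", err)

print(run_polls(program_reemitting_keys, seed, polls=3))
```

The first poll succeeds in both variants because the seed is still present, which is consistent with the failure only appearing from the second poll onward.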

…L eval

filebeat's CEL input ends a periodic cycle as soon as an evaluation emits zero events, before it inspects want_more. The old two-phase design listed projects in an evaluation that returned no events (events: [], want_more: true), so that empty eval ended the cycle and rate-limit draining only ran on the next interval -- producing every-other-interval collection. Resetting the drain phase's terminal state (prior attempts) could not help, because the blocker was the empty listing eval upstream.

Fold listing and emitting into the same evaluation: a LOAD step pages the project list and immediately drains the first newly discovered project's first rate-limit page, and a DRAIN step pages each project's remaining rate limits. Every productive evaluation now emits at least one event, so the want_more chain completes within a single interval. An accumulating worklist with a next index plus projects_done/project_after cursors handle project-list pagination, and a top-level want_more==false reset starts each fresh cycle clean.

Verified with the rate_limits system test (batch_size: 1, exercising both project-list and per-project rate-limit pagination): all expected docs are collected in one want_more chain per interval. rate_limits ships unreleased in 2.2.0, so no changelog entry is needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
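
The difference between the two-phase and folded designs can be simulated. The Python below is a toy model built only from the behavior described in this commit message (a periodic run stops at the first evaluation that yields zero events, before want_more is read); it is not the CEL program or the input's code, it omits pagination, and the project data and function names are hypothetical.

```python
# Hypothetical project -> models mapping standing in for the two API endpoints.
PROJECTS = {"proj_a": ["gpt-4o", "gpt-image-1"], "proj_b": ["omni-moderation"]}


def drain_next(state):
    """Emit the rate limits of the next project on the work list."""
    project, rest = state["worklist"][0], state["worklist"][1:]
    events = [{"project_id": project, "model": m} for m in PROJECTS[project]]
    return {"worklist": rest, "events": events, "want_more": bool(rest)}


def two_phase(state):
    if not state.get("want_more"):
        # Fresh cycle: list projects only, emitting nothing.
        return {"worklist": list(PROJECTS), "events": [], "want_more": True}
    return drain_next(state)


def folded(state):
    if not state.get("want_more"):
        # Fresh cycle: list projects and drain the first one in the same step.
        state = {"worklist": list(PROJECTS)}
    return drain_next(state)


def run_interval(evaluate, state):
    """One periodic run: stop at the first empty evaluation, before want_more is read."""
    published = []
    while True:
        state = evaluate(state)
        events = state.pop("events")
        if not events:
            break
        published.extend(events)
        if not state["want_more"]:
            break
    return published, state


def collect(evaluate, intervals):
    """Count published events per interval, carrying state between intervals."""
    state, counts = {}, []
    for _ in range(intervals):
        published, state = run_interval(evaluate, state)
        counts.append(len(published))
    return counts


print("two-phase:", collect(two_phase, 4))
print("folded:   ", collect(folded, 4))
```

In this model the two-phase evaluator publishes on alternating intervals only, while the folded evaluator publishes the full set every interval.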

… works

The rate_limits pipeline renamed message into openai.rate_limits.results and parsed from there. This broke two paths:

- Logstash/reroute (event.original pre-set): message was removed and openai.rate_limits.results never populated, dropping every parsed field.
- Agent with preserve_original_event=true: event.original was never created, making the option a silent no-op.

Adopt the canonical pattern (audit stream + 38 other pipelines): rename message -> event.original, parse event.original -> openai.rate_limits, and remove event.original unless the preserve_original_event tag is present. The default path output is unchanged; the raw copy is kept only when opted in.

Add a pipeline test asserting both event.original and the parsed openai.rate_limits.* fields are present on the preserve path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
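
The rename, parse, and conditional-remove sequence adopted here can be sketched over plain dictionaries. This Python is an approximation of the pattern as the commit message describes it, not the ingest pipeline definition: the function name and sample payload are hypothetical, and keeping a pre-set event.original instead of overwriting it is an assumption about the Logstash/reroute path.

```python
import json


def rate_limits_pipeline(doc):
    """Rename message -> event.original, parse it, then drop the raw copy unless opted in."""
    doc = {**doc, "event": dict(doc.get("event", {}))}
    message = doc.pop("message", None)
    if message is not None:
        # Assumption: a pre-set event.original (Logstash/reroute) is kept as-is.
        doc["event"].setdefault("original", message)
    # Parse from event.original into the openai.rate_limits object.
    doc["openai"] = {"rate_limits": json.loads(doc["event"]["original"])}
    if "preserve_original_event" not in doc.get("tags", []):
        del doc["event"]["original"]
    return doc


# Hypothetical payload; only the field names appear elsewhere in this PR.
raw = json.dumps({"model": "gpt-image-1", "max_images_per_1_minute": 5})

default_path = rate_limits_pipeline({"message": raw})
preserved = rate_limits_pipeline({"message": raw, "tags": ["preserve_original_event"]})
rerouted = rate_limits_pipeline({"event": {"original": raw}})

print(default_path)
print(preserved["event"])
print(rerouted["openai"])
```

All three inputs end up with the same parsed fields; only the preserve path retains the raw copy.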

The org-wide headroom panel summed per-project rate limits within each 1-minute bucket, then took the MAX across minutes. Because rate_limits docs are periodic snapshots, the per-minute project set depends on poll timing, so the denominator could undercount aggregate capacity and swing between polls when snapshots landed in different minutes.

Stabilize each project's limit as the MAX over the look-back window first (matching the per-project panel), then sum by model. Usage becomes sum-of-per-project-peaks. Both numerator and denominator are now poll-timing independent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
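
The effect of the aggregation order can be shown with two snapshots that land in different minutes. The Python below is a sketch of the two orderings described in this commit message, not the panel's ES|QL: the row layout, function names, and limit values are hypothetical.

```python
from collections import defaultdict

# Hypothetical snapshots: (minute, project_id, model, limit). The two projects
# were polled in different minutes, so no single minute contains both.
snapshots = [
    (0, "proj_a", "gpt-4o", 30_000),
    (1, "proj_b", "gpt-4o", 20_000),
]


def sum_then_max(rows):
    """Old order: sum limits per minute, then take the max across minutes."""
    per_minute = defaultdict(int)
    for minute, _project, model, limit in rows:
        per_minute[(minute, model)] += limit
    by_model = defaultdict(int)
    for (_minute, model), total in per_minute.items():
        by_model[model] = max(by_model[model], total)
    return dict(by_model)


def max_then_sum(rows):
    """New order: take each project's max over the window, then sum by model."""
    per_project = defaultdict(int)
    for _minute, project, model, limit in rows:
        key = (project, model)
        per_project[key] = max(per_project[key], limit)
    by_model = defaultdict(int)
    for (_project, model), limit in per_project.items():
        by_model[model] += limit
    return dict(by_model)


print("sum then max:", sum_then_max(snapshots))
print("max then sum:", max_then_sum(snapshots))
```

With the old order the aggregate depends on which snapshots happen to share a minute; with the new order it depends only on which projects reported within the window.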

The README and the rate-limit-headroom alert investigation guide claimed limits and usage join on the exact project_id and model strings "without any normalization", but the shipped alert ES|QL and both dashboard headroom panels normalize model via REPLACE(<model>, "-[0-9]{4}-[0-9]{2}-[0-9]{2}$", "").

Rewrite the docs to describe the implemented contract: join on exact project_id with model normalized on both sides (strip trailing -YYYY-MM-DD), so dated usage snapshots match base rate-limit names. Also correct the aggregation note to reflect the MAX-per-bucket dedup of identical family/snapshot limits. Updates the generated README, its build template, and the alert blob.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The prebuilt rate-limit headroom alert capped its ES|QL query at | LIMIT 100. With groupBy: row, that explicit limit is the binding cap on alert instances, and the ES|QL executor path never sets groupAggCount, so groups beyond the top 100 were dropped with no truncation warning. termField/termSize are ignored on the ES|QL path, so termSize: 100 was never the real constraint.

Raise the cap to 1000 to align with the alerting max-alerts circuit breaker (xpack.alerting.rules.run.alerts.max); at that value truncation surfaces a warning instead of silently dropping breaching project/model pairs. SORT tpm_utilization DESC stays ahead of the LIMIT so the worst offenders survive any truncation. Align the inert termSize to 1000 and document the cap and large-org tuning in the investigation guide.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
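
Why the sort has to precede the cap can be seen in a small Python sketch. It mirrors only the filter, sort, and limit order described in this commit message, not the alert's ES|QL: the function name, the group label format (project_id::model), and the generated rows are illustrative.

```python
def breaching_rows(rows, threshold=0.8, cap=1000):
    """Keep rows at or above the threshold, worst first, then apply the cap."""
    breaching = [r for r in rows if r["tpm_utilization"] >= threshold]
    breaching.sort(key=lambda r: r["tpm_utilization"], reverse=True)
    return breaching[:cap]


# Hypothetical data: 150 breaching groups with steadily increasing utilization.
rows = [
    {"group": f"proj_{i}::gpt-4o", "tpm_utilization": 0.80 + i / 1000}
    for i in range(150)
]

kept = breaching_rows(rows, cap=100)
dropped = [r for r in rows if r not in kept]

print(len(kept), kept[0]["group"])
print(len(dropped), max(r["tpm_utilization"] for r in dropped))
```

Every dropped group has utilization no higher than the lowest kept group, so a truncated result still contains the most severe breaches.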

filebeat's CEL input ends a polling cycle as soon as an evaluation emits zero events, before it reads want_more. Three paths in the rate_limits CEL could return [] mid want_more chain -- a project with no configured rate limits (LOAD and DRAIN branches) and an empty project-list page that still has more pages -- each re-introducing the every-other-interval stall the data stream was designed to avoid.

Emit a dropped keep-alive sentinel instead of [] whenever the chain still wants to continue, and drop the sentinel in the ingest pipeline so it keeps the chain alive in the agent without ever being indexed. Pipeline test covers the drop (rendered as null in expected output).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
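
The keep-alive sentinel can be modeled with the same stop-on-empty rule. The Python below is a toy sketch of the idea in this commit message, not the CEL program or the ingest pipeline: the sentinel's field name (keep_alive), the project data, and the function names are hypothetical.

```python
# Hypothetical work list: the middle project has no configured rate limits.
LIMITS = {"proj_a": ["gpt-4o"], "proj_empty": [], "proj_b": ["gpt-image-1"]}
SENTINEL = {"keep_alive": True}  # hypothetical marker document


def evaluate(worklist, use_sentinel):
    """Drain one project; optionally replace an empty mid-chain batch with the sentinel."""
    project, rest = worklist[0], worklist[1:]
    events = [{"project_id": project, "model": m} for m in LIMITS[project]]
    want_more = bool(rest)
    if not events and want_more and use_sentinel:
        events = [SENTINEL]
    return events, rest, want_more


def run_interval(use_sentinel):
    """One periodic run: it ends on zero events, before want_more is read."""
    worklist, shipped = list(LIMITS), []
    while worklist:
        events, worklist, want_more = evaluate(worklist, use_sentinel)
        if not events:
            break
        shipped.extend(events)
        if not want_more:
            break
    return shipped


def ingest_pipeline(docs):
    """Drop sentinel documents so they are never indexed."""
    return [d for d in docs if not d.get("keep_alive")]


print("without sentinel:", ingest_pipeline(run_interval(use_sentinel=False)))
print("with sentinel:   ", ingest_pipeline(run_interval(use_sentinel=True)))
```

Without the sentinel the run stops at the empty project and never reaches proj_b in that interval; with it, the chain continues and the indexed output contains only real rate-limit documents.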

The finalization grace period adds ~15m of latency to all usage streams and the headroom panels/alert that read from them, but the README only described the undercount trade-off. Document the concrete latency cost, that it applies to every usage stream, where to change it (Advanced options), and that 0s disables it for fresher-but-undercounted data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The README described the org-wide usage rollup as sum-then-peak (sum across projects per 1-minute bucket, then take the peak minute), but the shipped ES|QL does peak-then-sum (each project's peak minute first, then sum across projects). Reconcile the docs to the query: describe it as an indicative upper bound and align the limit-column wording with the issue-10 stabilize-then-sum implementation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

elastic-vault-github-plugin-prod · 2026-06-05T13:25:37Z

✅ All changelog entries have the correct PR link.

elasticmachine · 2026-06-05T13:40:59Z

💚 Build Succeeded

Buildkite Build
Commit: efc9017

History

💚 Build #44219 succeeded 30a9bf5
💚 Build #44213 succeeded 92bce25
💚 Build #44196 succeeded ca3dfa5
💚 Build #44194 succeeded 67467b5
💚 Build #44147 succeeded e789b3d
💚 Build #44144 succeeded 797a828

cc @stefans-elastic

stefans-elastic added 3 commits June 2, 2026 16:42

[openai] Add rate limits data stream with headroom dashboard panel and alert

ce5c88f

Merge branch 'main' into openai-rate-limit-headroom

aa308f4

fix version in manifest.yml

f332411

add codeowner for openai/data_stream/rate_limits

bc03335

shmsr changed the title ~~[openai] Add rate limits data stream with headroom dashboard panel an…~~ [openai] Add rate limits data stream with headroom dashboard panel and alert Jun 3, 2026

shmsr assigned stefans-elastic Jun 3, 2026

shmsr added the enhancement label Jun 3, 2026

stefans-elastic added 2 commits June 3, 2026 13:16

fix CI failure

ae41d9a

corrected PR link in changelog.yml

468f8ed

andrewkroh added the dashboard, documentation, and Integration:openai labels Jun 3, 2026

stefans-elastic and others added 7 commits June 3, 2026 15:06

fix: count audio & image requests in rate limit headroom RPM

5a4f483

fix: align rate limit alert id with "headroom low" naming

1b78171

fix: collect rate limits every interval, not every other

921049f

fix: sort rate limit headroom by TPM utilization first

b152041

feat: add org-wide rate limit headroom rollup by model

a38e673

fix: harden rate limit headroom panels against missing images stream

9786341

stefans-elastic and others added 5 commits June 4, 2026 15:19

Merge branch 'main' into openai-rate-limit-headroom

aef07d4

stefans-elastic and others added 10 commits June 5, 2026 13:09

Merge branch 'main' into openai-rate-limit-headroom

c271e76

Merge branch 'openai-rate-limit-headroom' of github.com:stefans-elastic/integrations into openai-rate-limit-headroom

67467b5

docs: note residual usage undercount under high-volume bursts

ca3dfa5

[openai] Add rate limits data stream with headroom dashboard panel and alert #19347
stefans-elastic wants to merge 28 commits into elastic:main from stefans-elastic:openai-rate-limit-headroom

4 participants
