fix(tracing): support minute/second window shorthand in spans API#66382
Draft
jonmcwest wants to merge 3 commits into
Draft
fix(tracing): support minute/second window shorthand in spans API#66382jonmcwest wants to merge 3 commits into
jonmcwest wants to merge 3 commits into
Conversation
The tracing (APM) spans API parses dateRange.date_from via PostHog's global relative-date grammar, where a bare lowercase "m" means months. For a short-window product that meant "-30m" resolved to 30 months ago and silently returned stale spans instead of the last 30 minutes — and any unparseable window fell back to "now" with no error. Add a tracing-scoped normalizer that reads s/m/h/d/w intuitively (m = minutes), passes ISO 8601 timestamps and the broader global relative grammar through untouched, and raises a 400 on genuinely unparseable input instead of failing open. Applied at every dateRange parse site across the spans endpoints, with the serializer help_text updated so the documented format flows into the generated MCP tool schemas. Generated-By: PostHog Code Task-Id: 81588816-332c-4fa0-8c26-84f0f973eaab
Contributor
|
Reviews (1): Last reviewed commit: "fix(tracing): support minute/second wind..." | Re-trigger Greptile |
The import-linter "presentation must use facade" contract forbids the presentation layer from importing new top-level backend modules. The window normalizer is DRF-aware request validation (it raises a DRF ValidationError), so it belongs in the presentation package rather than behind the facade, which must stay free of HTTP concerns. Move it there — presentation→presentation imports are allowlisted — and revert the serializer help_text to its original wording to avoid drift in the committed OpenAPI/MCP generated artifacts. Generated-By: PostHog Code Task-Id: 81588816-332c-4fa0-8c26-84f0f973eaab
The global-grammar passthrough regex used \d* (optional number), so a bare unit like "h" or "-m" matched and was forwarded to the global parser, which resolves it to "now" — the silent fail-open this module exists to prevent. Require either a number (position optional) or a bare unit with an explicit Start/End boundary, so legitimate boundary forms like "wStart"/"mStart" still pass while a bare unit alone now returns a 400. Flagged by Greptile. Generated-By: PostHog Code Task-Id: 81588816-332c-4fa0-8c26-84f0f973eaab
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The tracing (APM) spans API accepts a relative
dateRange.date_fromand parses it with PostHog's global relative-date grammar — where a bare lowercasemmeans months. For a product whose data is measured in hours and days, that's the wrong default:-30m(intended as "30 minutes ago") resolves to ~30 months ago, so a "last 30 minutes" query silently returns very old spans. Worse, an unrecognized window string fell back to "now" with no error, so a typo'd range returned stale data with no signal that anything was wrong.This was hit in practice via the MCP
query-apm-spanstool: a "slow queries in the last 30 minutes" request returned weeks-old rows.Changes
products/tracing/backend/date_window.pywith a tracing-scoped window normalizer:s/m/h/d/wintuitively, so-30m= 30 minutes (the one deliberate divergence from the global grammar, mapped onto itsM-for-minutes token).-1dStart,-3q,-1y, month boundaries …) through unchanged, so nothing that worked before regresses.ValidationError(HTTP 400) on genuinely unparseable input instead of failing open to "now".dateRangeparse site across the spans endpoints (query, sparkline, aggregation, breakdown, histogram, symbol stats, attribute names/values, service names).dateRangeserializerhelp_textso the documented format flows into the generated MCP tool schemas.How did you test this code?
I'm an agent (PostHog Slack app). I added
products/tracing/backend/tests/test_date_window.py— parameterized unit tests over the pure normalizer covering: lowercase-m-means-minutes translation, case-insensitive units, ISO and global-grammar passthrough (locking in no-regression for-1dStart/-3q/-1mStart), empty-value handling, and the 400-on-garbage behavior that replaces the old silent fallback. These guard the exact regression this PR fixes (a window meaning the wrong unit, or bad input silently returning stale data).I could not run the Python suite in this environment (no flox/venv), so I validated the regex/translation logic against all test cases in a standalone harness and
py_compiled the changed modules. Ordering and lint should be confirmed by CI.🤖 Agent context
Autonomy: Human-driven (agent-assisted)
Authored by the PostHog Slack app from a Slack thread, after diagnosing the
-30m-returns-stale-data behavior live via thequery-apm-spans/apm-trace-getMCP tools. Skills invoked:/improving-drf-endpoints(serializer/help_text review) and/writing-tests(kept the tests at the pure-function level rather than standing up the API).Key decision: lowercase
mmeans months everywhere else in PostHog and is heavily tested, so the minutes interpretation is scoped to the tracing layer rather than changed globally. Months/years are meaningless for trace retention, so the divergence is safe here; the global grammar is still accepted for anyone relying on it.Created with PostHog from a Slack thread