Skip to content

fix(tracing): support minute/second window shorthand in spans API#66382

Draft
jonmcwest wants to merge 3 commits into
masterfrom
posthog-code/tracing-minute-window-shorthand
Draft

fix(tracing): support minute/second window shorthand in spans API#66382
jonmcwest wants to merge 3 commits into
masterfrom
posthog-code/tracing-minute-window-shorthand

Conversation

@jonmcwest

Copy link
Copy Markdown
Contributor

Problem

The tracing (APM) spans API accepts a relative dateRange.date_from and parses it with PostHog's global relative-date grammar — where a bare lowercase m means months. For a product whose data is measured in hours and days, that's the wrong default: -30m (intended as "30 minutes ago") resolves to ~30 months ago, so a "last 30 minutes" query silently returns very old spans. Worse, an unrecognized window string fell back to "now" with no error, so a typo'd range returned stale data with no signal that anything was wrong.

This was hit in practice via the MCP query-apm-spans tool: a "slow queries in the last 30 minutes" request returned weeks-old rows.

Changes

  • New products/tracing/backend/date_window.py with a tracing-scoped window normalizer:
    • Reads s/m/h/d/w intuitively, so -30m = 30 minutes (the one deliberate divergence from the global grammar, mapped onto its M-for-minutes token).
    • Passes ISO 8601 timestamps and the broader global relative grammar (-1dStart, -3q, -1y, month boundaries …) through unchanged, so nothing that worked before regresses.
    • Raises a ValidationError (HTTP 400) on genuinely unparseable input instead of failing open to "now".
  • Applied at every dateRange parse site across the spans endpoints (query, sparkline, aggregation, breakdown, histogram, symbol stats, attribute names/values, service names).
  • Updated the dateRange serializer help_text so the documented format flows into the generated MCP tool schemas.

How did you test this code?

I'm an agent (PostHog Slack app). I added products/tracing/backend/tests/test_date_window.py — parameterized unit tests over the pure normalizer covering: lowercase-m-means-minutes translation, case-insensitive units, ISO and global-grammar passthrough (locking in no-regression for -1dStart/-3q/-1mStart), empty-value handling, and the 400-on-garbage behavior that replaces the old silent fallback. These guard the exact regression this PR fixes (a window meaning the wrong unit, or bad input silently returning stale data).

I could not run the Python suite in this environment (no flox/venv), so I validated the regex/translation logic against all test cases in a standalone harness and py_compiled the changed modules. Ordering and lint should be confirmed by CI.

🤖 Agent context

Autonomy: Human-driven (agent-assisted)

Authored by the PostHog Slack app from a Slack thread, after diagnosing the -30m-returns-stale-data behavior live via the query-apm-spans / apm-trace-get MCP tools. Skills invoked: /improving-drf-endpoints (serializer/help_text review) and /writing-tests (kept the tests at the pure-function level rather than standing up the API).

Key decision: lowercase m means months everywhere else in PostHog and is heavily tested, so the minutes interpretation is scoped to the tracing layer rather than changed globally. Months/years are meaningless for trace retention, so the divergence is safe here; the global grammar is still accepted for anyone relying on it.


Created with PostHog from a Slack thread

The tracing (APM) spans API parses dateRange.date_from via PostHog's global
relative-date grammar, where a bare lowercase "m" means months. For a
short-window product that meant "-30m" resolved to 30 months ago and silently
returned stale spans instead of the last 30 minutes — and any unparseable
window fell back to "now" with no error.

Add a tracing-scoped normalizer that reads s/m/h/d/w intuitively (m = minutes),
passes ISO 8601 timestamps and the broader global relative grammar through
untouched, and raises a 400 on genuinely unparseable input instead of failing
open. Applied at every dateRange parse site across the spans endpoints, with the
serializer help_text updated so the documented format flows into the generated
MCP tool schemas.

Generated-By: PostHog Code
Task-Id: 81588816-332c-4fa0-8c26-84f0f973eaab
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Reviews (1): Last reviewed commit: "fix(tracing): support minute/second wind..." | Re-trigger Greptile

Comment thread products/tracing/backend/presentation/date_window.py Outdated
The import-linter "presentation must use facade" contract forbids the
presentation layer from importing new top-level backend modules. The window
normalizer is DRF-aware request validation (it raises a DRF ValidationError),
so it belongs in the presentation package rather than behind the facade, which
must stay free of HTTP concerns. Move it there — presentation→presentation
imports are allowlisted — and revert the serializer help_text to its original
wording to avoid drift in the committed OpenAPI/MCP generated artifacts.

Generated-By: PostHog Code
Task-Id: 81588816-332c-4fa0-8c26-84f0f973eaab
The global-grammar passthrough regex used \d* (optional number), so a bare unit
like "h" or "-m" matched and was forwarded to the global parser, which resolves
it to "now" — the silent fail-open this module exists to prevent. Require either
a number (position optional) or a bare unit with an explicit Start/End boundary,
so legitimate boundary forms like "wStart"/"mStart" still pass while a bare unit
alone now returns a 400. Flagged by Greptile.

Generated-By: PostHog Code
Task-Id: 81588816-332c-4fa0-8c26-84f0f973eaab
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant