Skip to content

Rewrite webhooks reference with event lifecycle and v1/v2 schema notes#52

Draft
samgutentag wants to merge 1 commit into
mainfrom
sam-gutentag/webhook-lifecycle
Draft

Rewrite webhooks reference with event lifecycle and v1/v2 schema notes#52
samgutentag wants to merge 1 commit into
mainfrom
sam-gutentag/webhook-lifecycle

Conversation

@samgutentag
Copy link
Copy Markdown
Member

Summary

  • Rewrites merge-queue/webhooks.mdx (previously a near-empty stub deferring to Svix) into a real reference covering subscription families, event lifecycle, batch behavior, payload caveats, v1/v2 test_case.status_changed, quarantining events, and AI investigation events.
  • Adds a cross-link from flaky-tests/webhooks/index.mdx to the Merge Queue webhooks page so integrators landing on either side find the other.

Why

Sourced from customer feedback mining (cluster webhook-event-lifecycle, verdict missing, 11 pairs across 7 customers). The current webhooks.mdx is a stub that defers to Svix; customers integrating against Trunk webhooks hit recurring confusion on batch vs PR lifecycle, v1/v2 schema, bisection semantics, and event-subscription requirements.

Items flagged for review

  • v2.test_case.status_changed naming — used the literal event name based on the Brex/Healthie Slack replies and the existing flaky-tests/webhooks/index.mdx page. Confirm this matches the Svix catalog exactly.
  • pull_request.queued reason field enum — claims BISECTION_REQUIRED (entering) and BISECTION_TEST_RUN_PASSED (returning). Pulled verbatim from the Faire thread; confirm these are the exact enum strings in the live payload.
  • Bisection lifecycle for higher-queue failures — claim that PRs above a pending-failure batch can get kicked first and the lower ones re-queued (so later pull_request.queued events don't necessarily map back to the original pending_failure). Confirm phrasing matches actual queue behavior.
  • Payload timestamp absence — the existing flaky-tests/webhooks/index.mdx payload tables list timestamp as an ISO 8601 field for v2.test_case.status_changed, but the AgencyAnalytics thread says payloads have no timestamp. One of the two is wrong — flagging so the right one stays.
  • Quarantining enum — claims test_case.quarantining_setting_changed fires only for manual ALWAYS / NEVER overrides, not auto-quarantine. Pulled from the Descript thread; confirm there isn't a third trigger.
  • current_status == FLAKY check — taken verbatim from the Descript reply. Confirm the field is literally current_status (not new_status or status) on the test_case.status_changed payload.
  • flaky-tests/get-test-details API path — linked to api.trunk.io/docs#tag/flaky-tests as a best guess. Confirm the actual API reference URL.
  • Replaced Slack-community support link with support@trunk.io per global memory (no community Slack for support). The sibling migrating-from-github-merge-queue.mdx still uses the old Slack link — separate cleanup pass if desired.

Customer signal

@samgutentag samgutentag added the needs review PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge label May 20, 2026
@mintlify
Copy link
Copy Markdown
Contributor

mintlify Bot commented May 20, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
trunk 🟢 Ready View Preview May 20, 2026, 11:04 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@samgutentag
Copy link
Copy Markdown
Member Author

Verification status (2026-05-21): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

  • Eng PR: none referenced in PR body
  • Flag: n/a (no eng work to verify)
  • Signals checked:
    • No trunk-io/<repo>#NNN or PR URL references in PR body
    • No TRUNK-XXX Linear ticket linked
    • PR scope is content (documenting existing webhooks event lifecycle), not a flag-gated new feature

This is a content-accuracy PR, not a feature-rollout PR. The seven "items flagged for review" in the PR body (event name, enum values, bisection lifecycle, timestamp absence, quarantining triggers, current_status field, API path) are the real blockers and need direct verification against source. Auto-chaining to verify-docs-against-code for that pass now.

@samgutentag samgutentag marked this pull request as draft May 21, 2026 07:10
@samgutentag
Copy link
Copy Markdown
Member Author

Code verification (2026-05-21): 3 confirmed / 3 contradicted / 0 ambiguous / 1 partial

Claim Verdict Source
1. v2.test_case.status_changed event name confirmed flake-detection/src/types.ts#L235-L237
2. pull_request.queued reason field with BISECTION_REQUIRED / BISECTION_TEST_RUN_PASSED contradicted events_merge.ts#L157
3. PRs above a pending-failure batch can be kicked, lower ones re-queued with a non-bisection reason confirmed merge_graph.ts#L2784-L2794
4. Webhook payloads have no timestamp field contradicted events_merge.ts#L91-L93, flake-detection/src/types.ts#L239-L242
5. test_case.quarantining_setting_changed fires only on manual override, not auto-quarantine confirmed (with note on enum strings) flaky_tests_service.ts#L782-L794
6. v2 status-changed payload field is current_status contradicted flake-detection/src/types.ts#L254
7. flaky-tests/get-test-details is a real API path confirmed (URL fragment is rendering-specific) docs2/openapi.json#L3104

Three contradictions to resolve before publishing. (1) The pull_request.queued payload field is queued_reason, not reason. (2) Merge Queue webhook payloads include a required ISO 8601 timestamp field; same for the v2 flaky-tests payload. The "no timestamp" workaround section should be deleted or rewritten. (3) The v2.test_case.status_changed payload uses new_status (and previous_status) as top-level fields; current_status only appears in the legacy test_case.status_changed payload, nested inside status_change.current_status. The current_status == FLAKY example needs to become new_status == "FLAKY". Existing flaky-tests/webhooks/index.mdx is correct about both timestamp and new_status; the new PR drifted from it. Also note the quarantining enum string values are ALWAYS_QUARANTINE / NEVER_QUARANTINE (not ALWAYS / NEVER).


Source #1v2.test_case.status_changed is the canonical event name (confirmed)

File: trunk-io/trunk2/ts/packages/flake-detection/src/types.ts#L235-L242

export const V2_TEST_CASE_STATUS_CHANGED_SCHEMA = z.object({
  type: z
    .literal("v2.test_case.status_changed")
    .describe("The type of webhook event."),
  timestamp: z
    .string()
    .datetime()
    .describe("The timestamp of the webhook event."),

Cross-check: registered with Svix at svix-publish-schemas/src/event-types.ts#L34 as "v2.test_case.status_changed": V2_TEST_CASE_STATUS_CHANGED_EVENT_DEFINITION, and emitted from flake-detection-side-effects-handler/src/webhook.ts.

Reasoning: The literal string, casing, v2. prefix, and dot separators all match the PR copy exactly. The publish-schemas registry is what Svix sees, so the Svix catalog reflects this same name.

Source #2pull_request.queued field name is queued_reason, not reason (contradicted)

File: trunk-io/trunk/trunk/services/public-api-shared/webhook_events/events_merge.ts#L155-L165

export const EVENT_MERGE_PULL_REQUEST_QUEUED_SCHEMA = z
  .object({
    queued_reason: z.enum(enumValues(MergeItemPendingReason)).optional().openapi({
      description: "The reason the PR was queued",
    }),
  })
  .merge(pullRequestBaseWebhookPayloadSchema("queued"))
  .openapi({
    description:
      "Triggered when a PR has passed any branch protection requirements and is ready to be tested in the merge queue",
  }) satisfies CommonlyDefinedWebhookEvent;

Confirmed again in the emission integration test reporter.test.ts#L2638-L2653:

expectCreateWebhookMessage(mockedMessageClient, "pull_request.queued", {
  type: "pull_request.queued",
  action: "queued",
  status: "queued",
  ...
  timestamp: transitionTime.toISOString(),
  queued_reason: types.MergeItemPendingReason.MERGE_ITEM_NOW_READY,
});

The enum values BISECTION_REQUIRED and BISECTION_TEST_RUN_PASSED themselves are confirmed in merge-types/index.ts#L174-L185.

Reasoning: The Zod schema, the OpenAPI export, and the integration test all consistently name the field queued_reason. Same pattern applies to sibling events (failure_reason, merged_reason, cancellation_reason, submitted_reason) — every state-transition event uses <state>_reason, not a bare reason. The PR's reason shorthand will not match payload field selectors in customer code. Fix: replace reason with queued_reason in both bullet items and the surrounding prose.

Source #3 — Higher PRs can be kicked first, surviving PRs re-queued with non-bisection reasons (confirmed)

File: trunk-io/trunk/trunk/services/merge/src/controller/merge_graph.ts#L2784-L2794

for (const node of nowBisectingPendingNodes) {
  nowPendingNodes.push(node);
  pendingIdsToTransitionReason.set(node.id, MergeItemPendingReason.BISECTION_REQUIRED);
  nowPendingNodeTestRunIds.add(node.testRunId!);
}
for (const { node, prerequisiteMergeItemIds } of nowNonBisectingPendingNodes) {
  nowPendingNodes.push(node);
  pendingIdsToTransitionReason.set(
    node.id,
    MergeItemPendingReason.PREREQUISITE_MERGE_ITEM_FAILED,
  );

Reasoning: The bisection unwind path splits affected nodes into two buckets and assigns different queued_reason values to each. BISECTION_REQUIRED goes to the nodes actually bisecting; PREREQUISITE_MERGE_ITEM_FAILED goes to the surviving downstream nodes re-queued behind them. So after a single pull_request_batch.pending_failure, subsequent pull_request.queued events on related PRs may carry PREREQUISITE_MERGE_ITEM_FAILED rather than BISECTION_REQUIRED, exactly as the PR describes.

Source #4 — Webhook payloads DO include timestamp (contradicted)

File (Merge Queue events): trunk-io/trunk/trunk/services/public-api-shared/webhook_events/events_merge.ts#L91-L93

    timestamp: z.string().datetime().openapi({
      description: "The ISO 8601 timestamp of when this state transition occurred",
    }),

timestamp is on the base schema (pullRequestBaseWebhookPayloadSchema), so every pull_request.* event inherits it. The OpenAPI export in docs2/openapi.json#L5091-L5106 marks timestamp as required.

File (v2 flaky-tests event): trunk-io/trunk2/ts/packages/flake-detection/src/types.ts#L239-L242

  timestamp: z
    .string()
    .datetime()
    .describe("The timestamp of the webhook event."),

Reasoning: Both webhook families covered by this docs page (Merge Queue pull_request.* and Flaky Tests v2.test_case.status_changed) emit an ISO 8601 timestamp. The PR's "no event timestamp" claim and the "use receive-time as the event time" workaround are wrong on both sides. Drop the entire "Payload notes" section, or rewrite it to reference the timestamp field. Existing flaky-tests/webhooks/index.mdx already documents timestamp correctly — the new PR is the regression, not the existing page.

Source #5test_case.quarantining_setting_changed only fires from the manual override RPC (confirmed)

File: trunk-io/trunk/trunk/services/metrics-flaky-tests/flaky_tests_service.ts#L782-L794

        await sendQuarantineSettingChangedWebhook({
          organizationServiceClient: this.clients.organizationServiceClient,
          userServiceClient: this.clients.userServiceClient,
          metadata: call.metadata,
          svixClient: this.svixClient,
          previousQuarantiningOverride,
          updatedQuarantiningOverride: mappedReq.quarantiningOverride,
          quarantineAuditLog,

This call lives inside the setTestQuarantiningOverride gRPC handler (user-initiated RPC). A repo-wide cross-search for sendQuarantineSettingChangedWebhook returns this one caller (plus the function definition). No auto-quarantine code path emits this event.

Enum-string nit: override values are ALWAYS_QUARANTINE, NEVER_QUARANTINE, UNSPECIFIED per trunk2/ts/packages/prisma/flakyDb/kysely/types.ts#L24-L28:

export const QuarantiningOverrideStatus = {
    UNSPECIFIED: "UNSPECIFIED",
    NEVER_QUARANTINE: "NEVER_QUARANTINE",
    ALWAYS_QUARANTINE: "ALWAYS_QUARANTINE"
} as const;

Reasoning: The "fires only on manual override" claim is correct. There is no third trigger. Tighten the doc copy to use the actual enum strings (ALWAYS_QUARANTINE / NEVER_QUARANTINE) since customers filtering on payload values will need the literal strings.

Source #6 — v2 payload uses new_status, not current_status (contradicted)

File: trunk-io/trunk2/ts/packages/flake-detection/src/types.ts#L235-L268

export const V2_TEST_CASE_STATUS_CHANGED_SCHEMA = z.object({
  type: z
    .literal("v2.test_case.status_changed")
    .describe("The type of webhook event."),
  timestamp: z.string().datetime().describe("The timestamp of the webhook event."),
  repo_id: z.string().uuid().describe("A unique identifier for the repository."),
  test_case_id: z.string().uuid().describe("A unique identifier for the test case."),
  previous_status: TestCaseStatusSchema.describe(
    "The previous status of the test case.",
  ),
  new_status: TestCaseStatusSchema.describe("The new status of the test case."),

For contrast, the legacy v1 event (kept for backward compatibility) does use current_status nested inside status_change per detection-engine-webhook-event-handler/src/types.ts#L42-L52:

export type EnrichedTestCaseStatusChangedEvent = {
  type: "test_case.status_changed";
  status_change: {
    current_status: { reason: string; timestamp: string; value: "healthy" | "flaky" | "broken"; };
    previous_status: "healthy" | "flaky" | "broken";
  };
};

Reasoning: The PR conflates the legacy v1 shape (status_change.current_status) with the v2 shape (top-level new_status). The "Quarantining events" section says to "check for current_status == FLAKY in the payload" — that selector hits undefined against a v2 payload. Fix: replace current_status == FLAKY with new_status == "FLAKY". The existing flaky-tests/webhooks/index.mdx already documents new_status correctly; the new PR is the drift, not the existing page.

Source #7/flaky-tests/get-test-details is a real public-API path (confirmed; URL fragment unverifiable)

File: trunk-io/docs2/openapi.json#L3104

"/flaky-tests/get-test-details": {

The full path /flaky-tests/get-test-details is defined as an operation in the public OpenAPI spec, sibling to /flaky-tests/list-quarantined-tests, /flaky-tests/list-unhealthy-tests, etc.

Reasoning: The API path is real. The #tag/flaky-tests URL hash in the PR is a rendering convention (Redoc / Scalar / Mintlify all group operations under tags derived from the path prefix). The OpenAPI file as exported here doesn't declare explicit tags, so the fragment depends on whatever renders the spec at api.trunk.io. Worth a quick manual click-through before publishing.

@samgutentag samgutentag added the needs eng review verify-docs-against-code: at least one claim contradicts source. label May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs eng review verify-docs-against-code: at least one claim contradicts source. needs review PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge

Development

Successfully merging this pull request may close these issues.

1 participant