fix(engine): retry probe on pollHfReady zero-duration timeout#1824

Merged
miguel-heygen merged 1 commit into main from fix/probe-retry-zero-duration-timeout
Jul 1, 2026

Conversation

@miguel-heygen


Summary

  • pollHfReady's own timeout ("[FrameCapture] Composition has zero duration. Runtime ready: false, ...") was classified as non-transient by isTransientBrowserError, so the probe stage's existing retry-with-fresh-session logic (already used for frame detachment, disconnects, navigation timeouts, and launch failures) never ran, and the render failed outright on the first hiccup.
  • In practice this timeout fires when window.__renderReady simply doesn't flip true within playerReadyTimeout (45s), most often under host contention (several renders/browsers running concurrently) rather than any defect in the composition. Re-running an affected composition standalone succeeded immediately (~3.5-4.4s init vs. the 45s it hit under concurrent load).
  • Added the "Runtime ready: false" message shape to the transient-error patterns so it gets the same one-retry-with-fresh-session treatment. Left the "Runtime ready: true" fast-fail case (no GSAP timeline + no data-duration) unmatched on purpose — that's a genuine authoring bug, not a timing fluke, and should keep failing fast without wasting a retry.
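
The classification described above can be sketched roughly as follows. This is an illustration only, not the engine's code: `TRANSIENT_PATTERNS`, `isTransientSketch`, and the exact regular expression are hypothetical stand-ins, and the real `isTransientBrowserError` may match the message differently.

```typescript
// Hypothetical sketch; the real pattern list lives in the engine and is not
// shown in this PR description.
const TRANSIENT_PATTERNS: RegExp[] = [
  // Assumed shape of the new entry: the zero-duration message, but only when
  // the runtime never became ready (a timeout, not an authoring bug).
  /Composition has zero duration\.[\s\S]*Runtime ready: false/,
];

// Returns true when the error message matches any transient pattern.
function isTransientSketch(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return TRANSIENT_PATTERNS.some((pattern) => pattern.test(message));
}

// The two message shapes discussed in the summary.
const timedOut = new Error(
  "[FrameCapture] Composition has zero duration. Runtime ready: false, ...",
);
const authoringBug = new Error(
  "[FrameCapture] Composition has zero duration. Runtime ready: true, ...",
);

console.log(isTransientSketch(timedOut)); // retry candidate
console.log(isTransientSketch(authoringBug)); // stays fail-fast
```

Anchoring the pattern on "Runtime ready: false" rather than on "zero duration" alone is what keeps the authoring-bug variant out of the retry path.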

Test plan

  • frameCapture-transientErrors.test.ts — added transient/non-transient cases for both message shapes; all 20 pass.
  • probeStage.test.ts — added an end-to-end test that the probe stage retries once and succeeds on this error, and a companion test that the permanent (Runtime ready: true) variant still fails fast without retrying; all 14 pass.
  • oxlint / oxfmt clean on changed files.
  • Manually reproduced the original failure mode and confirmed a plain retry (fresh browser session) succeeds, matching the fix's behavior.
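
The behavior the probeStage tests exercise (one retry on a transient error, fail-fast on a permanent one) could look something like the sketch below. All names here (`probeWithOneRetry`, `runProbe`, `Classifier`) are hypothetical, and `runProbe` is assumed to open a fresh browser session on every call; the actual probe stage is not shown in this PR description.

```typescript
// Hypothetical names throughout; only the retry-once shape comes from the PR text.
type Classifier = (error: unknown) => boolean;

async function probeWithOneRetry<T>(
  runProbe: () => Promise<T>, // assumed to start a fresh browser session per call
  isTransient: Classifier,
): Promise<T> {
  try {
    return await runProbe();
  } catch (error) {
    // Permanent errors (e.g. the "Runtime ready: true" variant) fail fast.
    if (!isTransient(error)) throw error;
    // Transient errors get exactly one more attempt; a second failure propagates.
    return runProbe();
  }
}
```

A caller would pass the engine's transient-error classifier as `isTransient`, so the retry decision stays in one place.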

Renders were failing outright with "[FrameCapture] Composition has zero
duration. Runtime ready: false, ..." whenever window.__renderReady didn't
flip true within playerReadyTimeout (45s), most often under host
contention (e.g. several renders running concurrently) rather than a
defect in the composition itself. Confirmed by re-running an affected
composition standalone: it succeeded immediately (initMs ~3.5-4.4s vs.
the 45s timeout it hit under concurrent load).

The probe stage already retries once with a fresh browser session for
exactly this class of "succeeds on retry" infra flakiness (frame
detachment, disconnects, navigation timeouts, launch failures), but
isTransientBrowserError didn't recognize this message, so it fell
through to an immediate, unretried failure.

Match "Composition has zero duration ... Runtime ready: false" as
transient. Left the "Runtime ready: true" case (pollHfReady's fast-fail:
no GSAP timeline and no data-duration) unmatched — that's a genuine
authoring bug, not a timing fluke, and should keep failing fast.
miguel-heygen merged commit b33d54f into main on Jul 1, 2026
45 checks passed
miguel-heygen deleted the fix/probe-retry-zero-duration-timeout branch on July 1, 2026 05:50