Skip to content

feat(telemetry): unify CLI and Studio PostHog identity (Layer 1)#1829

Open
jrusso1020 wants to merge 3 commits into
mainfrom
telemetry/unify-cli-studio-identity
Open

feat(telemetry): unify CLI and Studio PostHog identity (Layer 1)#1829
jrusso1020 wants to merge 3 commits into
mainfrom
telemetry/unify-cli-studio-identity

Conversation

@jrusso1020

@jrusso1020 jrusso1020 commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Problem

A HyperFrames developer using the CLI and the Studio it launches shows up as multiple different people in PostHog (project phc_zjjbX0Pn…), even though all surfaces write to the same project. Concretely:

  • The CLI uses config.anonymousId (a randomUUID() in ~/.hyperframes/config.json) as its distinct_id and never calls identify/alias.
  • Studio minted two independent browser ids, neither aliased:
    • localStorage["hf-studio-anon-id"]studio:* events (utils/studioTelemetry.ts)
    • localStorage["hyperframes-studio:anonymousId"]studio_* + render events (telemetry/config.ts)

So one developer looked like up to three separate persons, and the CLI→Studio→render funnel could not be joined per user.

Approach (Layer 1 — device-level stitch, no login, no new PII)

The CLI owns both the Studio launch and the local server, so it seeds the browser with its own anonymous machine id and Studio adopts it. This uses only the existing anonymous config.anonymousId (a random UUID) — no email, no user id, no personal info — so the existing "No personal info" telemetry disclosure stays valid.

  1. CLI seeds its id. The embedded studio server injects window.__HF_CLI_DISTINCT_ID into the served index.html <head>, mirroring the existing window.__HF_STUDIO_ENV__ injection. A hash/query param was deliberately avoided so the id never leaks into $current_url / url_hash telemetry or browser history. A fallback GET /api/telemetry-identity endpoint is also exposed. Seeding is gated on CLI telemetry being enabled — if the CLI is opted out / in dev / CI, nothing is seeded.
  2. Studio adopts it. A new single-source-of-truth resolver (telemetry/distinctId.ts) reads window.__HF_CLI_DISTINCT_ID at boot; when present it adopts it as the Studio distinct_id and persists it, so the browser session joins the CLI machine's PostHog person.
  3. Studio's two ids are unified. Both getAnonymousId() (telemetry/config.ts) and getDistinctId() (utils/studioTelemetry.ts) now delegate to the one resolver, so studio:*, studio_*, and render events share a single id. The resolver also backfills both legacy localStorage keys so existing browsers converge.
  4. Clean fallback. If no CLI id is present (Studio opened standalone), the resolver falls back to the existing per-browser localStorage id — behaviour is identical to today.

Since the render channel (useRenderQueue.tstelemetryDistinctId: getAnonymousId() → CLI studioRenderTelemetry.ts) already routes through getAnonymousId(), render events now automatically carry the unified id too.

How identity flows CLI → Studio

CLI reads config.anonymousId (random UUID, ~/.hyperframes/config.json)
        │  (only if CLI telemetry enabled)
        ▼
studioServer injects <script>window.__HF_CLI_DISTINCT_ID="…"</script> into index.html <head>
   (fallback: GET /api/telemetry-identity → { distinctId })
        │
        ▼
Studio boot: resolveStudioDistinctId() adopts __HF_CLI_DISTINCT_ID, persists to both localStorage keys
        │
        ├── studio:* events (utils/studioTelemetry.ts) ─┐
        ├── studio_* events (telemetry/config.ts) ──────┤ all share ONE distinct_id
        └── render events (useRenderQueue → CLI) ───────┘  == CLI's distinct_id

Result: CLI cli_command*, Studio studio:* / studio_*, and render events all resolve to the same PostHog person.

Files changed

  • packages/cli/src/server/telemetryIdentity.ts (new) — resolveCliTelemetryDistinctId() (gated on shouldTrack(), fail-silent) and buildCliIdentityScript() (HTML-escaped inline script). Kept separate from studioServer.ts so it's unit-testable without the server's heavy render deps.
  • packages/cli/src/server/studioServer.ts — inject the identity script into the SPA <head> alongside the env script; add GET /api/telemetry-identity fallback endpoint.
  • packages/studio/src/telemetry/distinctId.ts (new) — single source of truth: adopt CLI id → else reuse persisted id (canonical or legacy) → else mint new; memoized; persists to both keys.
  • packages/studio/src/telemetry/config.tsgetAnonymousId() now delegates to the resolver.
  • packages/studio/src/utils/studioTelemetry.tsgetDistinctId() now delegates to the resolver.
  • Tests: packages/cli/src/server/telemetryIdentity.test.ts (new, 7 tests) and packages/studio/src/telemetry/distinctId.test.ts (new, 10 tests).

Privacy / disclosure

Uses only the pre-existing anonymous machine UUID — no PII is introduced or shared. The CLI's "No personal info, file paths, or content is collected" notice remains accurate. Seeding is suppressed whenever CLI telemetry is off (opt-out / dev / CI / DO_NOT_TRACK).

Test / verification

  • bun run --filter @hyperframes/studio test1220 passed (incl. new distinctId.test.ts).
  • bun run --filter @hyperframes/cli test1100 passed (incl. new telemetryIdentity.test.ts).
  • typecheck clean for both packages; bun run lint (oxlint + skills) clean; oxfmt --check clean.
  • Production builds succeed: CLI ESM build + Studio Vite build.
  • Pre-commit hooks (lint, format, fallow audit, typecheck, commitlint) all pass.

Decisions & risks

  • Injected global vs endpoint: the injected window.__HF_CLI_DISTINCT_ID is primary because it's synchronous and race-free at boot (Studio fires session_start in main.tsx before an async fetch could resolve). /api/telemetry-identity is a documented fallback for clients that can't use the global.
  • Adopt vs alias: we adopt the CLI id directly (Studio's clients are hand-rolled batch clients, not posthog-js) rather than sending $identify/$create_alias. This avoids irreversible PostHog merges while still unifying events.
  • Over/under-merge: device-level stitching can over-merge if multiple people share one machine/config, or under-merge across machines. Acceptable for Layer 1; Layer 2 addresses cross-machine identity.
  • Back-compat: older Studio bundles simply ignore the injected global (fallback path unchanged); the render distinctId was already optional.

Follow-up — Layer 2 (person-level, cross-machine, needs login)

When the CLI is authenticated (HeyGen OAuth already exists in packages/cli/src/auth), $identify both surfaces to a stable HeyGen user id as the canonical distinct_id so identity unifies across machines/browsers. This requires capturing a stable id from /v3/users/me (today only email/username are extracted; the numeric/sub id is dropped) and updating the telemetry disclosure + consent story, since identifying by user id/email is a change from the current anonymous-only model.

@jrusso1020 jrusso1020 marked this pull request as ready for review July 1, 2026 07:57

@miga-heygen miga-heygen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(telemetry): unify CLI and Studio PostHog identity (Layer 1)

Summary: Stitches CLI and Studio telemetry to a single PostHog person by seeding the CLI's anonymous UUID into the Studio SPA via an injected global, with a clean resolver that unifies Studio's two legacy identity keys. Clean design, good test coverage, proper XSS escaping.

Findings:

# Location Severity Note
1 telemetryIdentity.ts:49-51 nit JSON.stringify + manual < and / escaping is correct for </script> breakout prevention. Comment could mention the / escape also prevents </ inside the script tag, not just <. Test at line 74-78 validates this.
2 distinctId.ts:106-108 nit cachedId ??= "anonymous"??= is redundant since cachedId is guaranteed null here (checked at top of function). Plain = would be clearer.
3 distinctId.ts:55-60 nit getCliDistinctId catches all exceptions silently. A console.debug would help during development. Consistent with fail-silent telemetry contract.
4 studioServer.ts:572-577 nit /api/telemetry-identity returns { distinctId: null } when telemetry is off. Studio doesn't use this endpoint (reads window global). Fine as a fallback for future consumers.
5 studioServer.ts — injection ordering suggestion No test asserting buildCliIdentityScript() runs before buildRuntimeEnvScript() in the <head>. If swapped, Studio telemetry init could race on the window global.
6 distinctId.ts:41-46 suggestion safeLocalStorage helper duplicated from config.ts. Extracting to shared util would remove ~6 lines of duplication.

Verified correct:

  • getAnonymousId no longer returns "anonymous" directly on localStorage failure — resolveStudioDistinctId() handles this at line 105-108. Behavior preserved.
  • Module-level distinctId cache removed from studioTelemetry.ts — now handled by cachedId inside distinctId.ts. Memoization contract maintained.
  • readConfig is NOT called when telemetry is disabled (test at line 37-40 confirms). No unnecessary disk I/O.

Test coverage (17 new tests): CLI id adoption + persistence, CLI override of persisted id, empty CLI id fallback, canonical key reuse, legacy key migration, fresh mint, memoization, telemetry-disabled path, XSS escaping. Solid.

Ponytail assessment: net: -5 lines possible (dedup safeLocalStorage, inline getDistinctId wrapper) — neither worth blocking on.

Verdict: LGTM — Clean design, exemplary PR description, proper XSS defense, privacy-preserving (anonymous UUID only, gated on opt-in). Ship it.

— Miga

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at HEAD 0ca2344e40cb76bd908f28c33ada9c194a778404.

Overall: Clean design, coherent identity story, solid test coverage (17 new tests). Traced end-to-end: CLI seeds window.__HF_CLI_DISTINCT_ID via classic inline script in <head> → executes before deferred main.tsx module → session_start reads it synchronously. Race-free as PR body claims. Layering on top of Miga's review posted ~10 min ago — most of her findings match mine; I'm adding a few she didn't cover rather than reposting the overlap.

Layering on Miga's review:

  • 🟡 Miga's #5 (no test asserting buildCliIdentityScript() runs before buildRuntimeEnvScript()) — agreed, and I'd extend: the more important invariant is "identity script is the FIRST child of <head>, not just before env script." A future contributor adding a third <head> inject that lands ahead of it wouldn't break tests. A single test asserting html.indexOf("__HF_CLI_DISTINCT_ID") < html.indexOf("<script") for the built HTML would pin this.
  • 🟡 Miga's #6 (safeLocalStorage duplication) — worth also noting Studio's telemetry/config.ts still has its own copy at lines 12-19 (unchanged). If dedup happens, both callers should move to the shared util.

My additional findings:

  • 🟡 packages/cli/src/server/studioServer.ts:575/api/telemetry-identity has zero consumers in-repo (verified: grep -rn "telemetry-identity" packages/ returns only the endpoint definition itself). PR body says it's "documented fallback for clients that can't use the global." Fine, but either (a) add a comment identifying the intended consumer (SSR frame? out-of-tree Studio embedding?) or (b) drop until a real caller exists. Right now the tests in telemetryIdentity.test.ts don't cover the HTTP endpoint at all — only the pure helpers.

  • 🟡 packages/studio/src/telemetry/distinctId.test.ts — one memoization gap: the test at line 70-74 sets no CLI global before the first resolveStudioDistinctId() call, so it verifies "persisted-id memo survives localStorage mutation." Worth adding a symmetric test: "CLI-adopted id memo survives later window.__HF_CLI_DISTINCT_ID mutation" — arm the CLI id, call resolve, then reassign window.__HF_CLI_DISTINCT_ID = "different", assert memo still returns the first id. Guards against future code that toggles the global late.

  • ↩️ packages/studio/src/telemetry/distinctId.ts:98-118 — small semantic quirk: when ls === null (SSR / locked-down browser), the fallback is "anonymous" (line 117). When ls exists but getItem throws AND minting fails silently, we fall through to generateId() at line 122. Two different fallback shapes for two adjacent failure modes. Consistency suggestion: in the storage-unavailable branch, also mint a generateId() so all first-open-no-storage cases share one id-per-session shape. Very minor; the current shape is defensible (both are stable within a session), just asymmetric.

  • ⚡ Observability: PR is itself an observability change. Once this ships, how does the team verify the merge worked? A unifiedDistinctIdSource property on studio_session_start (values: "cli-seeded" | "persisted-canonical" | "persisted-legacy" | "fresh-mint") would let you see the CLI-stitch adoption rate in PostHog and confirm legacy-key browsers are converging. Not blocking — but you're building a distributed identity funnel and can't measure funnel completeness without a source tag on the browser side.

  • ↩️ Rollback lever question (non-blocker, worth answering before merge): if this stitching causes an unexpected PostHog merge issue in prod (e.g. one CLI id used across a team via shared config, over-merging real users into one machine), what's the walk-back path? The existing HYPERFRAMES_NO_TELEMETRY=1 opt-out kills both sides, but there's no lever to keep CLI + Studio telemetry ON while un-stitching. If you want a dedicated flag (env var or config toggle for disableStudioIdentitySeed), it's cheap to add now and free insurance. Alternative: given "adopt not alias" (no irreversible PostHog merge), rollback is just "revert the PR" — that's genuine, but only if Studio's persisted key already got the CLI id, that persistence outlives the revert. Some browsers will keep the CLI id in localStorage until manually cleared. Worth noting in follow-up.

Verified correct (my independent trace):

  • main.tsx:8 trackStudioEvent("session_start") at module top-level → classic inline <script> in <head> executes before module scripts → identity always set at session_start firing time.
  • useRenderQueue.ts:122 telemetryDistinctId: getAnonymousId() → posts to /api/projects/:id/render → shared studio-server reads telemetryDistinctId from body (packages/studio-server/src/routes/render.ts:116) → passes as distinctId opt to emitStudioRenderComplete/Error → forwards to trackRenderComplete/Error in CLI's telemetry/events → PostHog with adopted id. Full chain intact.
  • readConfig() fast-path: when CLI telemetry is off, resolveCliTelemetryDistinctId short-circuits before calling readConfig (test at telemetryIdentity.test.ts:37-40 pins this). No unnecessary disk I/O on opt-out.
  • No CSP header set on Studio server, no Content-Security-Policy meta in Studio HTML — inline script injection is CSP-safe.
  • resolveCliTelemetryDistinctId correctly returns null when anonymousId is empty string (test at line 32-36); the id.length > 0 check on the id after the readConfig call handles this defensively even though readConfig normally auto-mints if missing.

Non-issues I checked and cleared:

  • Boot ordering under Vite production build: identity injected as classic inline <script> in <head> → executes synchronously before any <script type="module"> (deferred by default). Race-free.
  • session_start double-fire under React StrictMode: fires at module top-level outside <StrictMode>, executed exactly once per bundle load.
  • Legacy-key backfill: resolveStudioDistinctId writes to BOTH keys on any successful resolution, ensuring utils/studioTelemetry.ts (studio:) and telemetry/config.ts (studio_) always agree.

Verdict: No blockers. Ship-able as-is. The additions above are follow-up polish, not gate criteria. Nice PR — the "adopt not alias" decision is the right call for a first pass.

Review by Rames D Jusso

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — feat(telemetry): unify CLI ↔ Studio PostHog identity (Layer 1)

Verdict: LGTM
Repo: heygen-com/hyperframes Head: 0ca2344
Scope: CLI seeds window.__HF_CLI_DISTINCT_ID into the served Studio index.html; Studio adopts it in a new single-source-of-truth resolver (telemetry/distinctId.ts) that also unifies the two legacy Studio anon-id keys. getAnonymousId() / getDistinctId() both delegate through the resolver, so studio:* + studio_* + render events converge on one distinct_id.

Summary

Clean stitch. Injected inline <script> runs synchronously before the deferred Studio module bundle, so the session_start fire at main.tsx:8 cannot race the CLI id. XSS-hardened script emission (JSON.stringify + < + / escape), gated on shouldTrack() so opt-out / dev / CI seeds nothing. Legacy localStorage keys are backfilled on adoption so any straggling direct reads converge. Consumer audit clean — 3 sites (utils/studioTelemetry.ts getDistinctId, telemetry/config.ts getAnonymousId, components/renders/useRenderQueue.ts telemetryDistinctId) all route through the resolver. Miga posted LGTM first (07-01 08:02 UTC) with six nits; I re-verified independently and confirm.

Findings

🟢 — race-safety-verification

File: packages/cli/src/server/studioServer.ts:713-716, packages/studio/src/main.tsx:8

Miga flagged "no test asserts identity script runs before env script"; I want to make the actual race analysis explicit for future readers. The injected identity is a plain <script>window.__HF_CLI_DISTINCT_ID=…</script> (classic script, synchronous parse). Vite emits the Studio bundle as <script type="module"> (deferred by spec). Classic scripts in <head> execute before deferred module scripts fire. trackStudioEvent("session_start") at main.tsx:8 runs at module top-level, so by the time the memoized resolveStudioDistinctId() first reads window.__HF_CLI_DISTINCT_ID, the global is set. Race-safe by browser-spec ordering, not by test. If someone ever rewrites the server injection to use <script type="module"> or async, the invariant breaks silently — that's the scenario Miga's test suggestion would guard. Non-blocking; noting for the historical record.

🟢 — consumer-audit

Files verified at head SHA:

  • packages/studio/src/utils/studioTelemetry.tsgetDistinctId()resolveStudioDistinctId() (line 27). studio:* events (lines 71, 104) use it. Module-level distinctId cache removed; memo now lives inside the resolver.
  • packages/studio/src/telemetry/config.tsgetAnonymousId()resolveStudioDistinctId() (line 27). studio_* events via client.ts:trackEvent consume this.
  • packages/studio/src/telemetry/client.ts:7 — imports getAnonymousId (delegates to resolver).
  • packages/studio/src/components/renders/useRenderQueue.ts:122telemetryDistinctId: getAnonymousId() in the render POST body. Server side (studioRenderTelemetry.ts:24) accepts the id and attributes the emitted render_complete / render_error to it. Render funnel joinable ✓.

No other reads of hf-studio-anon-id or hyperframes-studio:anonymousId remain outside the new distinctId.ts (which is the sole persistence gateway).

💭 — cli-opt-out-asymmetry (informational, not a bug)

File: packages/cli/src/server/telemetryIdentity.ts:29

When the CLI has telemetry disabled (HYPERFRAMES_NO_TELEMETRY=1, DO_NOT_TRACK=1, dev mode, config.telemetryEnabled=false), the CLI correctly seeds nothing and Studio falls back to its own per-browser id — but Studio's own opt-out is independent (localStorage[hyperframes-studio:telemetryDisabled]), so the browser still sends studio_* events unless also opted out in the UI. This is pre-existing behavior, not a regression introduced by this PR; flagging only because the identity unification makes the asymmetry more visible. Layer 2 (login-based) is the natural place to plumb CLI opt-out → Studio opt-out.

🟢 — nit-convergent-with-miga

Concur with Miga's #2 (redundant ??= at distinctId.ts:106), #4 (/api/telemetry-identity unused today, harmless fallback), #6 (safeLocalStorage duplicated across config.ts + distinctId.ts — ~6 lines dedupe possible). Not worth blocking; band-aid check clear (no duplicate-guard / contradictory-rule / silent-scope-gap patterns).

💭 — readConfig-first-run-side-effect

File: packages/cli/src/telemetry/config.ts:readConfig

readConfig() writes a default config to disk on first-ever call (if ~/.hyperframes/config.json is absent). resolveCliTelemetryDistinctId() calls readConfig() on every served index.html when telemetry is on. In practice the config already exists from prior CLI activity, so this is a no-op — but if someone launches hyperframes preview from a completely fresh environment, serving the SPA is the first thing that materializes the config file. Documented for reviewers; not a defect.

Verification

  • Read packages/cli/src/server/studioServer.ts, packages/cli/src/server/telemetryIdentity.ts, packages/cli/src/telemetry/client.ts, packages/cli/src/telemetry/config.ts, packages/studio/src/telemetry/distinctId.ts, packages/studio/src/telemetry/config.ts, packages/studio/src/utils/studioTelemetry.ts, packages/studio/src/telemetry/client.ts, packages/studio/src/components/renders/useRenderQueue.ts, packages/studio/src/main.tsx, packages/cli/src/server/studioRenderTelemetry.ts, packages/studio/src/utils/generateId.ts — all at 0ca2344.
  • Consumer audit: 3 call sites of the resolver (getDistinctId / getAnonymousId × 2 packages, telemetryDistinctId in render POST). All route through resolveStudioDistinctId(). No orphan direct reads of legacy localStorage keys.
  • Race analysis: classic-script injection in <head> vs Vite deferred module — spec-ordered, safe.
  • XSS: JSON.stringify + < + / escape; input is a random UUID from disk config; test at telemetryIdentity.test.ts:74-78 verifies escaping even for malicious </script><script> input.
  • Test coverage: 17 new tests (7 CLI + 10 Studio) exercise enabled/disabled/empty/exception paths, key migration, memoization, XSS. Solid.
  • CI: Preflight, Lint, Typecheck, Build, Fallow audit, SDK: unit+contract+smoke, Studio: load smoke, regression, Preview parity, player-perf all green at HEAD. Windows shards + CLI-smoke still in progress at review time (not blocking for a stamp; author already reports full test suites passing per PR body).
  • Prior reviews: Miga (miga-heygen) LGTM at 2026-07-01 08:02:02Z, 6 nits, all cosmetic. Rames not yet posted. Convergent on verdict.

Review by Via

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code pass is clean. The CLI-side gate in packages/cli/src/server/telemetryIdentity.ts:27-51 only exposes the anonymous id when CLI telemetry is enabled and escapes the inline script value; packages/cli/src/server/studioServer.ts:570-577 adds the fallback endpoint without changing the shared API routing; and packages/studio/src/telemetry/distinctId.ts:84-125 unifies the CLI-seeded, canonical, and legacy Studio ids fail-silently. The new tests cover the main precedence and fallback paths.

No functional blockers found. I agree with Miga that an ordering test for packages/cli/src/server/studioServer.ts:713-717 and deduping the localStorage helper are useful follow-ups, not blockers.

Blocking cleanup before stamp: the PR body still has a generated-tool footer, and both commits still carry extra co-author trailers. Please scrub those from the PR metadata/commit messages before merge.

CI note: no failures from the completed checks; the two Windows required checks were still pending during this pass.

Verdict: REQUEST CHANGES
Reasoning: The implementation looks sound, but the generated attribution metadata needs cleanup before I can approve.

— Magi

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addendum to prior review — HF #1829

Three reviewers landed within ~4 seconds of my initial LGTM (parallel-reviewer race — 13s later than mine's read-start freshness check). Recording the aggregated state so James has one canonical picture, and adjusting my own verdict.

State after race

  • Miga — LGTM with 6 cosmetic nits (??= redundancy, safeLocalStorage dedup, comment clarification, endpoint currently unused, ordering-test suggestion).
  • Rames — No blockers, adds several polish items I didn't cover: memoization test gap for CLI-id late reassignment, ls===null"anonymous" vs generateId fallback asymmetry, unifiedDistinctIdSource observability tag suggestion (self-check the merge worked in prod), rollback-lever question (localStorage-persisted CLI id outlives a PR revert). All good calls.
  • Miguel — 🚫 CHANGES REQUESTED on PR metadata cleanup: generated-tool footer in the PR body + Co-Authored-By: Claude Opus 4.8 (1M context) trailers on both commits. Verified — both commits carry the trailer, PR body ends with the 🤖 Generated with [Claude Code] block.

My verdict — revised

Deferring to Miguel's blocker. Downgrading my earlier LGTM to LGTM-with-should-fix pending metadata scrub. The code review stands (design + race analysis + consumer audit + XSS — all clear); this is a repo-convention gate on the commit + PR metadata that I missed in my initial pass.

To land:

  1. Scrub the 🤖 Generated with [Claude Code](...) footer from the PR body.
  2. Amend both commits (9d556ca1, 0ca2344e) to drop the Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> trailer, then force-push.

Convergence on the code substance

Three of us (Miga, Rames, me) traced the same code paths and reached "clean design, ship-worthy" independently. Miguel didn't flag any code substance. The only meaningful adjacencies worth landing before merge (still all optional — not gating):

  • Rames's unifiedDistinctIdSource telemetry property on studio_session_start — cheap, and this PR is itself an observability change so measuring the stitch's adoption rate is the natural instrument.
  • Rames's ls===null vs storage-failure fallback asymmetry (line 117 "anonymous" vs later generateId()) — one-line unification.

Not doing any of the polish is fine; the code itself is production-ready modulo metadata cleanup.


Addendum by Via

Seed the CLI's anonymous distinct_id into Studio at launch so a developer's
CLI and their Studio browser session resolve to the same PostHog person.
Also unifies Studio's two previously-independent anonymous ids into one
source of truth. Uses only the existing anonymous machine id (no new PII).

- cli: inject window.__HF_CLI_DISTINCT_ID into the served index.html <head>
  (mirrors the existing __HF_STUDIO_ENV__ injection) + add a fallback
  GET /api/telemetry-identity endpoint. Only seeds when CLI telemetry is
  enabled; empty/no-op otherwise.
- studio: new telemetry/distinctId.ts single source of truth; adopts the
  CLI-seeded id when present, else falls back to the existing per-browser
  localStorage id. Both Studio clients (studio:* and studio_*/render) now
  share this one id.
resolveStudioDistinctId read localStorage.getItem() outside a try/catch
while every other external access in the module is guarded. In a
storage-restricted context where the localStorage reference resolves but
getItem throws, the resolver threw — breaking the module's fail-silent
contract (telemetry must never break Studio). Guard the reads and treat a
throw as "no id". Also drop an unnecessary `as` cast in the test per the
repo CLAUDE.md convention (the optional global is already declared).
- dedup safeLocalStorage/safeSessionStorage into utils/safeStorage.ts,
  used by both telemetry/config.ts and telemetry/distinctId.ts (Miga #6)
- replace redundant `??=` with `=` in the no-storage branch; cachedId is
  guaranteed null there (Miga #2)
- extract buildStudioHeadScripts() so the "identity script before env
  script" head-injection ordering is a pure, tested invariant (Miga #5)
- add tests: head-script ordering + telemetry-off passthrough, and a
  Studio memoization test proving an adopted CLI id survives a later
  window.__HF_CLI_DISTINCT_ID reassignment (Rames)
- clarify the XSS-escaping comment (both < and / escaped so no </script>
  sequence can form) (Miga #1)
@jrusso1020 jrusso1020 force-pushed the telemetry/unify-cli-studio-identity branch from 0ca2344 to 5574eb4 Compare July 1, 2026 08:15
@jrusso1020

Copy link
Copy Markdown
Collaborator Author

Thanks all — addressed the blocker and the cheap, convergent nits. Pushed as refactor(telemetry): address review feedback… (5574eb4).

@miguel-heygen — blocker resolved:

  • Dropped the Co-Authored-By: Claude Opus 4.8 (1M context) trailer from both existing commits (history rewritten; the two commits are now bd333f4 / ec93a26, re-signed and GitHub-verified).
  • Removed the 🤖 Generated with [Claude Code] footer from the PR body.

Code nits addressed:

  • Miga initial code #2 — redundant ??== in the no-storage branch (cachedId is provably null there).
  • Miga feat(core): add compiler entry point and runtime composition fixes #6 / Rames / Via — extracted safeLocalStorage + safeSessionStorage into packages/studio/src/utils/safeStorage.ts; both telemetry/config.ts and telemetry/distinctId.ts now import it (removes the duplication, including config.ts's own copy).
  • Miga fix(cli): update dev command for consolidated studio #5 / Rames / Via — pulled the head-injection into a pure buildStudioHeadScripts(envScript) and added tests asserting the identity script lands before the env script (and that it passes the env script through untouched when telemetry is off). The ordering invariant is now pinned by a test, not just a comment.
  • Rames — added a Studio test proving an adopted CLI id survives a later window.__HF_CLI_DISTINCT_ID reassignment (memoization guard).
  • Miga Initial repo setup #1 — clarified the XSS-escaping comment (both < and / escaped, so no </script> sequence can form).

Tests: studio distinctId 11/11, cli telemetryIdentity 9/9, plus broader telemetry (24) + server (35) suites green; lint/format/typecheck clean.

Deferred as non-blocking follow-ups (happy to take any in this PR if you'd prefer):

  • Rames — unifiedDistinctIdSource tag on studio_session_start (cli-seeded | persisted-canonical | persisted-legacy | fresh-mint): agree it's the right instrument to measure the stitch's adoption rate, but it's a small API/telemetry addition I'd rather land as a focused follow-up than fold into a fix pass. Tracking it for Layer 1.5.
  • Rames — rollback lever / localStorage-persisted-CLI-id outliving a revert: real, and worth a doc note in the follow-up. With "adopt, not alias" a revert stops new stitching, but browsers that already persisted the CLI id keep it until cleared; a dedicated disableStudioIdentitySeed toggle would be the clean walk-back. Not adding speculative config this pass.
  • Rames — ls===null"anonymous" vs storage-failure → generateId() asymmetry: left as-is intentionally — the "anonymous" no-storage fallback matches config.ts's long-standing behavior; changing it shifts identity for locked-down browsers, which I didn't want to do silently in a review pass. Easy to revisit if you'd rather unify on generateId().
  • Rames — HTTP-level test for /api/telemetry-identity: the pure resolver is covered; the endpoint is a documented fallback with no in-repo consumer yet, so I left server-level coverage for when a real caller lands.

@jrusso1020 jrusso1020 requested a review from miguel-heygen July 1, 2026 08:17

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed current head 5574eb4.

Metadata blocker is resolved: the PR body is clean and the branch commits no longer carry generated-tool/co-author trailers.

The review-feedback commit is also clean:

  • packages/cli/src/server/telemetryIdentity.ts:55-65 centralizes the head script ordering invariant, with the CLI identity script before the env script.
  • packages/cli/src/server/telemetryIdentity.test.ts:84-104 pins that ordering and the telemetry-off passthrough path.
  • packages/studio/src/utils/safeStorage.ts:6-20 removes the duplicated safe storage helpers, and the telemetry modules import it instead.
  • packages/studio/src/telemetry/distinctId.test.ts:76-83 covers the CLI-id memoization case Rames flagged.

No new blockers found. Remaining items (source tagging for adoption-rate telemetry, rollback note, endpoint HTTP-level coverage) are reasonable follow-ups, not merge gates.

CI note: no required failures at review time; several required checks were still in progress. The Smoke: global install check is advisory/cancelled, not a code failure.

Verdict: APPROVE
Reasoning: The implementation remains sound, the metadata blocker is fixed, and the added tests cover the load-bearing ordering and memoization concerns. Merge should still wait for required CI to finish green.

— Magi

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2 — unify CLI ↔ Studio PostHog identity (Layer 1)

Verdict: 🟢 R2 LGTM
Repo: heygen-com/hyperframes
R1 head: 0ca2344 R2 head: 5574eb4
Status: R1 envelope findings addressed; R2 code delta (review-feedback commit) verified clean.

R1 → R2 delta

  • Envelope (Co-Authored-By trailers): Addressed. Walked all 3 commits (bd333f4d, ec93a266, 5574eb40) via gh api /pulls/1829/commits — no Co-Authored-By: Claude, no noreply@anthropic.com, no Claude/anthropic tokens anywhere in commit messages. (The lone CLAUDE.md mention in 5574eb40 is a legit code-convention reference: "drop an unnecessary as cast in the test per the repo CLAUDE.md convention".)
  • Envelope (Claude Code footer in body): Addressed. gh api /pulls/1829 body scanned — no 🤖 Generated, no [Claude Code], no Claude, no anthropic. Body is clean.
  • Code delta: New commit 5574eb40 refactor(telemetry): address review feedback on identity unification — code-substantive, not a trailer strip. Addresses Miga #1, #2, #5, #6 + Rames follow-up. Re-ran the R1 code rubric fast:
    • Miga #6 (dedup safeStorage): New packages/studio/src/utils/safeStorage.ts (20 LOC, fail-silent safeLocalStorage() / safeSessionStorage()), imported by both telemetry/config.ts and telemetry/distinctId.ts. Clean.
    • Miga #5 (extract head-script ordering): buildStudioHeadScripts(envScript) in telemetryIdentity.ts:55-65 — pure function enforcing identity-script BEFORE env-script. studioServer.ts:713-720 uses it: const headScript = buildStudioHeadScripts(buildRuntimeEnvScript());. Ordering invariant is now a tested, single-callsite pure function — future <head> injections can't silently land ahead of the identity script.
    • Miga #2 (??==): distinctId.ts:113cachedId = "anonymous" in the no-storage branch. cachedId is guaranteed null there (early-return at top otherwise), so the ??= was redundant.
    • Miga #1 (XSS comment): telemetryIdentity.ts:47-49 — clarified: "Escaping both means no </script> (or </…) sequence can form in the emitted value, so it can never terminate the inline <script> or open a new tag." Escape logic (.replace(/</g, "\\u003c").replace(/\//g, "\\/")) unchanged.
    • Rames follow-up (memoization test): New test in distinctId.test.ts proves an adopted CLI id survives a later window.__HF_CLI_DISTINCT_ID reassignment (cache stability).
    • Fail-silent contract preserved end-to-end: getCliDistinctId() try/catches window read; resolveStudioDistinctId() try/catches getItem (per ec93a266 fix); persist() swallows setItem errors.
    • Consumer chain intact: getAnonymousId() (telemetry/config.ts:22) and getDistinctId() (utils/studioTelemetry.ts:22) both delegate to resolveStudioDistinctId(). Render → CLI path (useRenderQueue → studioRenderTelemetry) still unified via getAnonymousId().
    • Race safety preserved: Identity script still injected first in <head>, before the deferred Studio bundle boots and reads window.__HF_CLI_DISTINCT_ID.

Verification

  • Direct fetches at R2 head 5574eb4 via gh api .../pulls/1829/commits, .../pulls/1829 (body), .../pulls/1829/reviews, .../contents/<path>?ref=5574eb40865b36efb5149af59a0b31f67c6340da.
  • Compared R1 head 0ca2344 → R2 head 5574eb4: 3 commits ahead, 2 behind main (rebase noise — verified via ?ref=<sha> fetches per repo caveat).
  • Miguel's R2:APPROVED at 2026-07-01T08:19:05Z — "Metadata blocker is resolved: the PR body is clean and the branch commits no longer carry generated-tool/co-author trailers. … No new blockers found." RC formally cleared.
  • Convergence with Magi: Miguel's APPROVE is signed "— Magi" (Magi posts under login miguel-heygen); Magi's R2 landed 14 min before mine and independently verifies the envelope-strip + code-refactor cleanliness. My R2 is convergent add — cross-checked the code-refactor claims against actual file contents at R2 SHA.

Notes / follow-ups (not merge gates)

  • CI is blocked per mergeable_state, but Magi confirms "no required failures at review time; several required checks were still in progress." Standard wait-for-green before merge.
  • Magi's follow-ups (source-tagging for adoption-rate telemetry, rollback note, endpoint HTTP-level coverage) are reasonable Layer-1.5 items, not blockers — I concur.

R2 by Via

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants