fix: handle poor and intermittent connectivity gracefully #2447

Open
lost-particles wants to merge 1 commit into PostHog:main from lost-particles:fix/intermittent-connectivity

@lost-particles
Contributor

Problem

PostHog Code handled poor or intermittent connectivity badly: sending while offline silently dropped the message, in-flight sessions could hang indefinitely, and a network-interrupted turn was shown either as a clean "finished" response or as "Interrupted by user", never as a network failure. This PR addresses the reports collected in the tracking issue (offline sends lost, stuck sessions, long unexplained waits).

Closes #2163

Changes

Sending / input

  • A send that fails on a flaky network no longer drops the message — it's restored to the composer.
  • Offline sends wait briefly for the connection (bounded grace window) instead of failing instantly; while waiting they show the "Connection lost — waiting to reconnect…" indicator with a live timer, then give up after ~40s.
  • Locally queued follow-up messages are re-enqueued if their drain dispatch fails, instead of being lost.
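
The grace-wait and restore-to-composer behaviour in the list above could look roughly like the following sketch. All names here (`waitForConnectivity`, `sendWithGrace`, `SendDeps`) are hypothetical and not taken from the PR; the 40 s and 1 s defaults only mirror the give-up window and poll interval mentioned in this thread.

```typescript
// Hypothetical sketch; function and parameter names are illustrative, not the PR's code.
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `isOnline` until it reports true or the bounded window is used up.
// Time is tracked by summing sleep steps, which keeps the sketch testable
// with a fake `sleep`; a real implementation might compare wall-clock time.
async function waitForConnectivity(
  isOnline: () => boolean,
  timeoutMs: number,
  pollMs: number,
  sleep: Sleep = realSleep,
): Promise<boolean> {
  let waited = 0;
  while (!isOnline()) {
    if (waited >= timeoutMs) return false; // give up: still offline
    // Never sleep past the deadline; the Math.max guards against pollMs <= 0.
    const step = Math.max(1, Math.min(pollMs, timeoutMs - waited));
    await sleep(step);
    waited += step;
  }
  return true;
}

interface SendDeps {
  isOnline: () => boolean;
  dispatch: (text: string) => Promise<void>;
  restoreToComposer: (text: string) => void;
  sleep?: Sleep;
}

// Dispatches only once online, so nothing is sent speculatively. If the wait
// times out or the dispatch throws, the text goes back to the composer.
async function sendWithGrace(
  text: string,
  deps: SendDeps,
  timeoutMs = 40_000,
  pollMs = 1_000,
): Promise<"sent" | "restored"> {
  const online = await waitForConnectivity(deps.isOnline, timeoutMs, pollMs, deps.sleep);
  if (online) {
    try {
      await deps.dispatch(text);
      return "sent";
    } catch {
      // Fall through and restore the draft instead of dropping it.
    }
  }
  deps.restoreToComposer(text);
  return "restored";
}
```

Injecting `isOnline` and `sleep` is a design choice for the sketch: it lets a unit test drive the give-up path without waiting 40 real seconds.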

Stuck / interrupted turns

  • Main-process prompt watchdog: a stream that goes silent fails loudly instead of leaving the turn pending forever.
  • A turn affected by a network drop is no longer presented as a clean finish (no false "finished" notification). Instead it renders an inline footer "Response may be incomplete. Failed due to network issue" with a Retry button (replacing the misleading "Interrupted by user"). Retry re-sends that turn's prompt; the label persists across retries.
  • Offline give-up (~40s) cancels the hung local turn with a connection_lost reason rather than a user-interrupt.
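
A silence watchdog of the kind described in the first bullet above can be sketched as a resettable timer. This is a minimal illustration under assumptions: `PromptWatchdog`, `touch`, `stop`, and `onStall` are hypothetical names, and the PR's main-process implementation may be structured differently.

```typescript
// Hypothetical sketch; class and callback names are illustrative, not the PR's code.
class PromptWatchdog {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly silenceMs: number;
  private readonly onStall: () => void;

  constructor(silenceMs: number, onStall: () => void) {
    this.silenceMs = silenceMs;
    this.onStall = onStall;
  }

  // Call when the prompt starts and again on every stream event.
  // Each call restarts the silence countdown.
  touch(): void {
    this.stop();
    this.timer = setTimeout(() => {
      this.timer = undefined;
      this.onStall(); // e.g. reject the pending turn with a stall error
    }, this.silenceMs);
  }

  // Call when the turn completes or is cancelled.
  stop(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }
}
```

The caller would typically have `onStall` reject the pending prompt promise, so the turn ends with an explicit error rather than staying pending.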

Recovery / transport

  • Cloud sessions auto-recover on the offline→online transition (previously only on window focus).
  • Auth-proxy aborts the connection on a mid-stream upstream failure (res.destroy) instead of cleanly ending it, so a truncated reply surfaces as an error rather than a clean end-of-stream.
  • "Slow connection? Still trying…" hint after ~8s of no stream activity.
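
The auth-proxy change in the list above (destroy instead of end on a mid-stream failure) can be illustrated with a small relay function. This is a sketch under assumptions: `relayUpstream` and `ResponseLike` are hypothetical names, and the interface lists only the `write`, `end`, and `destroy` methods that Node's `http.ServerResponse` also provides. Backpressure handling is omitted for brevity.

```typescript
// Hypothetical sketch; names are illustrative, not the PR's code.
interface ResponseLike {
  write(chunk: Uint8Array): unknown;
  end(): unknown;
  destroy(error?: Error): unknown;
}

async function relayUpstream(
  upstream: AsyncIterable<Uint8Array>,
  res: ResponseLike,
): Promise<"ended" | "aborted"> {
  try {
    for await (const chunk of upstream) {
      res.write(chunk); // forward each upstream chunk to the client
    }
  } catch (err) {
    // Mid-stream upstream failure: tear the connection down so the client
    // sees an error rather than a normal end-of-stream.
    res.destroy(err instanceof Error ? err : new Error(String(err)));
    return "aborted";
  }
  res.end(); // upstream finished cleanly
  return "ended";
}
```

Ending the response cleanly after a partial body would make a truncated reply indistinguishable from a complete one, which is why the failure path aborts instead.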

Screenshots

1. Network drops mid-turn — "Connection lost — waiting to reconnect…" with a live timer
Screenshot 2026-05-31 at 11 39 13 PM

2. Still offline after ~40s — the turn fails with "Response may be incomplete. Failed due to network issue" + a Retry button
Screenshot 2026-05-31 at 11 40 12 PM

3. Clicking Retry re-runs the prompt. If the connection issue persists, the first message is shown again ("Connection lost — waiting to reconnect…"); if the connection is back, the turn runs normally and returns a response.
Screenshot 2026-05-31 at 11 41 12 PM

How did you test this?

Automated (all run locally):

  • Added unit/integration tests and ran them: service.test.ts (offline grace-wait, give-up, re-enqueue, turn-complete flagging), buildConversationItems.test.ts (network-failed footer + prompt-text for retry), interrupt-reason.test.ts, connectivityRecovery.test.ts, GeneratingIndicator.test.tsx, auth-proxy/service.test.ts, plus updates to agent/service.test.ts and useDraftSync.test.tsx.
  • pnpm lint (biome) — clean, 1574 files.
  • pnpm typecheck (turbo, all packages) — 13/13 pass.
  • pnpm test (turbo, all packages) — 1627 pass. 3 timeouts in the unrelated archive git-integration tests under full-suite load; confirmed passing 23/23 when run in isolation.

Manual (live app, toggling the OS network off/on while running pnpm dev):

  • Verified against the running app's main-process logs: the auth-proxy returned 502 when fully offline and aborted the connection on a mid-stream "terminated" failure, the offline give-up fired and cancelled the turn with reason: connection_lost, and the network-interrupted turn was flagged instead of triggering a "finished" notification.
  • Reproduced the flow in the screenshots above: offline retry shows the reconnecting timer; on sustained outage the turn shows "Failed due to network issue" + Retry; clicking Retry re-runs the prompt once the connection is back.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Make the app resilient to offline/flaky networks instead of silently
dropping messages or hanging:

- Preserve typed input when a send fails; restore it to the composer
- Re-enqueue local queued messages when a drain dispatch fails
- Auto-recover cloud sessions on reconnect, not just on window focus
- Bounded connectivity grace-wait before failing a send
- Prompt watchdog so a stalled stream fails loudly instead of hanging
- Auth-proxy aborts the connection on a mid-stream upstream failure so a
  truncated reply isn't treated as a clean end-of-stream
- Don't present a network-interrupted turn as a clean finish: show an
  inline "Response may be incomplete. Failed due to network issue" footer
  with a Retry button (instead of "Interrupted by user")
- Offline send/retry shows a "waiting to reconnect" indicator with timer,
  then gives up after 40s
@greptile-apps
Contributor

greptile-apps Bot commented Jun 1, 2026

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/code/src/renderer/features/sessions/service/service.ts:101-106
`SEND_CONNECTIVITY_WAIT_MS` is declared and described as the grace window for offline sends, but the only call site passes `OFFLINE_TURN_GIVEUP_MS` (40 s) instead. As a result the 10 s default is never applied, making the constant dead code. If the intent is for offline sends to wait the full 40 s (matching the PR description), the constant and its comment are superfluous; if a shorter grace window was intended, the wrong constant is being used.

```suggestion
const SEND_CONNECTIVITY_POLL_MS = 1_000;
```

### Issue 2 of 2
apps/code/src/renderer/features/sessions/components/GeneratingIndicator.tsx:185-212
**Timer in offline reconnect indicator counts from prompt start, not from when the network dropped.** `elapsed` is computed as `Date.now() - startedAt`, so a user who has been waiting a full minute before the network drops will see the timer jump straight to `60.0s` when the "Connection lost — waiting to reconnect…" banner appears. The visual intent reads as "how long have I been waiting to reconnect", but the value actually measures total turn duration.

Reviews (1): Last reviewed commit: "fix: handle poor and intermittent connec..." | Re-trigger Greptile

Comment on lines +101 to +106
// Grace window: when a send is attempted while offline, wait this long for the
// connection to come back before giving up, so a brief drop doesn't instantly
// fail the message. We only dispatch once connectivity returns — nothing is
// sent speculatively — so this never risks delivering a message twice.
const SEND_CONNECTIVITY_WAIT_MS = 10_000;
const SEND_CONNECTIVITY_POLL_MS = 1_000;

P2 SEND_CONNECTIVITY_WAIT_MS is declared and described as the grace window for offline sends, but the only call site passes OFFLINE_TURN_GIVEUP_MS (40 s) instead. As a result the 10 s default is never applied, making the constant dead code. If the intent is for offline sends to wait the full 40 s (matching the PR description), the constant and its comment are superfluous; if a shorter grace window was intended, the wrong constant is being used.

Suggested change
- // Grace window: when a send is attempted while offline, wait this long for the
- // connection to come back before giving up, so a brief drop doesn't instantly
- // fail the message. We only dispatch once connectivity returns — nothing is
- // sent speculatively — so this never risks delivering a message twice.
- const SEND_CONNECTIVITY_WAIT_MS = 10_000;
- const SEND_CONNECTIVITY_POLL_MS = 1_000;
+ const SEND_CONNECTIVITY_POLL_MS = 1_000;

Comment on lines +185 to 212
if (!isOnline) {
  return (
    <Flex
      align="center"
      gap="2"
      className="select-none"
      style={{ WebkitUserSelect: "none" }}
    >
      <WifiSlash size={12} className="text-amber-11" />
      <Text className="text-[13px] text-amber-11">
        Connection lost — waiting to reconnect…
      </Text>
      <Text color="gray" className="text-[13px]">
        (Esc to stop
      </Text>
      <Circle size={4} weight="fill" className="mx-[2px] my-0 text-gray-9" />
      <Text
        color="gray"
        style={{ fontVariantNumeric: "tabular-nums" }}
        className="text-[13px]"
      >
        {formatDuration(elapsed, 1)})
      </Text>
    </Flex>
  );
}

return (

P2 Timer in offline reconnect indicator counts from prompt start, not from when the network dropped. elapsed is computed as Date.now() - startedAt, so a user who has been waiting a full minute before the network drops will see the timer jump straight to 60.0s when the "Connection lost — waiting to reconnect…" banner appears. The visual intent reads as "how long have I been waiting to reconnect", but the value actually measures total turn duration.
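
One possible way to address the timer issue described above is to record when the outage began and measure from that point while offline. The sketch below is framework-free and uses hypothetical names (`IndicatorTimerState`, `applyConnectivity`, `displayedElapsedMs`); in the component this state would presumably live in a ref or store updated from an effect, and that wiring is not shown.

```typescript
// Hypothetical sketch; the state shape and function names are illustrative.
interface IndicatorTimerState {
  startedAt: number; // when the turn began (ms timestamp)
  offlineSince: number | null; // when the current outage began, if any
}

// Call on every connectivity change. Records the start of an outage once
// and clears it on reconnect.
function applyConnectivity(
  state: IndicatorTimerState,
  isOnline: boolean,
  now: number,
): IndicatorTimerState {
  if (isOnline) return { ...state, offlineSince: null };
  return state.offlineSince === null ? { ...state, offlineSince: now } : state;
}

// While offline, report time spent waiting to reconnect; otherwise report
// total turn duration.
function displayedElapsedMs(state: IndicatorTimerState, now: number): number {
  const origin = state.offlineSince ?? state.startedAt;
  return Math.max(0, now - origin);
}
```

With this shape, a turn that has been running for a minute before the network drops would start the reconnect timer near zero rather than at 60 s.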

Development

Successfully merging this pull request may close these issues.

Handle poor and intermittent internet connectivity gracefully
