fix: handle poor and intermittent connectivity gracefully #2447
lost-particles wants to merge 1 commit
Make the app resilient to offline/flaky networks instead of silently dropping messages or hanging:
- Preserve typed input when a send fails; restore it to the composer
- Re-enqueue local queued messages when a drain dispatch fails
- Auto-recover cloud sessions on reconnect, not just on window focus
- Bounded connectivity grace-wait before failing a send
- Prompt watchdog so a stalled stream fails loudly instead of hanging
- Auth-proxy aborts the connection on a mid-stream upstream failure so a truncated reply isn't treated as a clean end-of-stream
- Don't present a network-interrupted turn as a clean finish: show an inline "Response may be incomplete. Failed due to network issue" footer with a Retry button (instead of "Interrupted by user")
- Offline send/retry shows a "waiting to reconnect" indicator with timer, then gives up after 40s
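The prompt watchdog item could look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: `StallWatchdog`, `stallAfterMs`, and the explicit timestamps are hypothetical names chosen so the logic can be exercised without real timers.

```typescript
// Hypothetical sketch of a stream watchdog; the class name, field names,
// and any timeout value passed in are assumptions for illustration.
class StallWatchdog {
  private stallAfterMs: number;
  private lastActivityAt: number;

  constructor(stallAfterMs: number, startedAt: number) {
    this.stallAfterMs = stallAfterMs;
    this.lastActivityAt = startedAt;
  }

  // Call whenever the stream produces a chunk, pushing the deadline out.
  touch(now: number): void {
    this.lastActivityAt = now;
  }

  // True once no chunk has arrived for longer than the allowed window.
  isStalled(now: number): boolean {
    return now - this.lastActivityAt > this.stallAfterMs;
  }
}
```

A caller would check `isStalled(Date.now())` on an interval and fail the turn with an explicit error when it returns `true`, rather than leaving the stream hanging.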
### Issue 1 of 2
apps/code/src/renderer/features/sessions/service/service.ts:101-106
`SEND_CONNECTIVITY_WAIT_MS` is declared and described as the grace window for offline sends, but the only call site passes `OFFLINE_TURN_GIVEUP_MS` (40 s) instead. As a result the 10 s default is never applied, making the constant dead code. If the intent is for offline sends to wait the full 40 s (matching the PR description), the constant and its comment are superfluous; if a shorter grace window was intended, the wrong constant is being used.
```suggestion
const SEND_CONNECTIVITY_POLL_MS = 1_000;
```
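To make the intended window explicit, the wait could take its limit as a single parameter. The sketch below is an assumption about how such a loop might be shaped, not the code under review: `decideSend`, `waitForConnectivity`, and the injected `isOnline`/`sleep` callbacks are hypothetical, and only the two constant names come from the comment above.

```typescript
// Constant names mirror the review comment; the control flow is an assumption.
const OFFLINE_TURN_GIVEUP_MS = 40_000;
const SEND_CONNECTIVITY_POLL_MS = 1_000;

type SendDecision = "dispatch" | "wait" | "give-up";

// Pure decision step: dispatch as soon as we are online, otherwise keep
// waiting until the grace window has been used up.
function decideSend(online: boolean, waitedMs: number, graceMs: number): SendDecision {
  if (online) return "dispatch";
  return waitedMs >= graceMs ? "give-up" : "wait";
}

// Polling loop built on the decision step. `isOnline` and `sleep` are
// injected so the loop can be driven by a fake clock in tests.
async function waitForConnectivity(
  isOnline: () => boolean,
  sleep: (ms: number) => Promise<void>,
  graceMs: number = OFFLINE_TURN_GIVEUP_MS,
  pollMs: number = SEND_CONNECTIVITY_POLL_MS,
): Promise<boolean> {
  let waitedMs = 0;
  for (;;) {
    const decision = decideSend(isOnline(), waitedMs, graceMs);
    if (decision === "dispatch") return true; // safe to send now
    if (decision === "give-up") return false; // caller fails the send
    await sleep(pollMs);
    waitedMs += pollMs;
  }
}
```

With one default for the window, there is no second constant that can silently go unused.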
### Issue 2 of 2
apps/code/src/renderer/features/sessions/components/GeneratingIndicator.tsx:185-212
**Timer in offline reconnect indicator counts from prompt start, not from when the network dropped.** `elapsed` is computed as `Date.now() - startedAt`, so a user who has been waiting a full minute before the network drops will see the timer jump straight to `60.0s` when the "Connection lost — waiting to reconnect…" banner appears. The visual intent reads as "how long have I been waiting to reconnect", but the value actually measures total turn duration.
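One possible fix is to measure from the moment connectivity was lost whenever the offline banner is showing. The helper below is a hedged sketch: `indicatorElapsedMs` and the `offlineSince` timestamp are hypothetical and would need to be recorded wherever the component learns that the network dropped.

```typescript
// Hypothetical helper: `offlineSince` is an assumed extra timestamp recorded
// when connectivity is lost; it is not part of the reviewed code.
function indicatorElapsedMs(
  now: number,
  startedAt: number,
  offlineSince: number | null,
): number {
  // While offline, count from the moment the connection dropped;
  // otherwise fall back to total turn duration.
  const origin = offlineSince ?? startedAt;
  return Math.max(0, now - origin);
}
```

For a turn that started at `0` and lost its connection at `60_000`, the banner would then start near zero instead of jumping straight to a minute.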
Reviews (1): Last reviewed commit: "fix: handle poor and intermittent connec..."
apps/code/src/renderer/features/sessions/service/service.ts, lines 101-106:

    // Grace window: when a send is attempted while offline, wait this long for the
    // connection to come back before giving up, so a brief drop doesn't instantly
    // fail the message. We only dispatch once connectivity returns — nothing is
    // sent speculatively — so this never risks delivering a message twice.
    const SEND_CONNECTIVITY_WAIT_MS = 10_000;
    const SEND_CONNECTIVITY_POLL_MS = 1_000;
apps/code/src/renderer/features/sessions/components/GeneratingIndicator.tsx, lines 185-212:

    if (!isOnline) {
      return (
        <Flex
          align="center"
          gap="2"
          className="select-none"
          style={{ WebkitUserSelect: "none" }}
        >
          <WifiSlash size={12} className="text-amber-11" />
          <Text className="text-[13px] text-amber-11">
            Connection lost — waiting to reconnect…
          </Text>
          <Text color="gray" className="text-[13px]">
            (Esc to stop
          </Text>
          <Circle size={4} weight="fill" className="mx-[2px] my-0 text-gray-9" />
          <Text
            color="gray"
            style={{ fontVariantNumeric: "tabular-nums" }}
            className="text-[13px]"
          >
            {formatDuration(elapsed, 1)})
          </Text>
        </Flex>
      );
    }

    return (
Problem
PostHog Code handled poor/intermittent connectivity poorly: sending while offline silently dropped the message, in-flight sessions could hang indefinitely, and a network-interrupted turn was shown either as a clean "finished" response or as "Interrupted by user" — never as a network failure. This addresses the reports collected in the tracking issue (offline sends lost, stuck sessions, long unexplained waits).
Closes #2163
Changes
Sending / input
Stuck / interrupted turns
- […] `connection_lost` reason rather than a user-interrupt.
Recovery / transport
- […] `res.destroy`) instead of cleanly ending it, so a truncated reply surfaces as an error rather than a clean end-of-stream.
Screenshots
1. Network drops mid-turn — "Connection lost — waiting to reconnect…" with a live timer

2. Still offline after ~40s — the turn fails with "Response may be incomplete. Failed due to network issue" + a Retry button

3. Clicking Retry re-runs the prompt. If the connection issue persists, the first message is shown again with "Connection lost — waiting to reconnect…"; if the connection is back, the prompt runs normally and returns a response.

How did you test this?
Automated (all run locally):
- `service.test.ts` (offline grace-wait, give-up, re-enqueue, turn-complete flagging), `buildConversationItems.test.ts` (network-failed footer + prompt-text for retry), `interrupt-reason.test.ts`, `connectivityRecovery.test.ts`, `GeneratingIndicator.test.tsx`, `auth-proxy/service.test.ts`, plus updates to `agent/service.test.ts` and `useDraftSync.test.tsx`.
- `pnpm lint` (biome): clean, 1574 files.
- `pnpm typecheck` (turbo, all packages): 13/13 pass.
- `pnpm test` (turbo, all packages): 1627 pass. 3 timeouts in the unrelated `archive` git-integration tests under full-suite load; confirmed passing 23/23 when run in isolation.

Manual (live app, toggling the OS network off/on while running `pnpm dev`):
- […] `502` when fully offline and aborting on a mid-stream `terminated`, the offline give-up firing and cancelling with `reason: connection_lost`, and the network-interrupted turn being flagged instead of notifying "finished".
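The mid-stream abort behaviour exercised above can be illustrated with a small relay. This is a sketch under assumptions, not the auth-proxy code: `relay`, `ProxyResponse`, and the `readNext` callback are hypothetical stand-ins, with `ProxyResponse` covering only the `write`, `end`, and `destroy` methods that a Node response object also exposes.

```typescript
// Minimal structural stand-in for the parts of a response used here.
interface ProxyResponse {
  write(chunk: string): void;
  end(): void;
  destroy(error?: Error): void;
}

// Hypothetical relay: `readNext` stands in for reading the upstream body and
// returns null at a clean end. If it throws midway, the downstream response
// is destroyed instead of ended, so the client sees an error rather than a
// clean end-of-stream.
function relay(readNext: () => string | null, res: ProxyResponse): void {
  try {
    for (let chunk = readNext(); chunk !== null; chunk = readNext()) {
      res.write(chunk);
    }
  } catch (err) {
    res.destroy(err instanceof Error ? err : new Error(String(err)));
    return;
  }
  res.end();
}
```

The key design point is that `end()` is reached only after the upstream has finished cleanly; any failure takes the `destroy` path.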