Make top-level sandbox execution stateless by default#765
Make top-level sandbox execution stateless by default#765ghostwriternr wants to merge 74 commits into
Conversation
Introduce an isolated lab package for proving the terminal session runtime shape without coupling it to the sandbox container yet. The package runs its default tests in a Linux container so shell and PTY semantics match the target environment rather than the host OS.
Move protocol framing into a focused module and resolve command completion from ordered terminal output. Treat execution timeouts as fatal so later commands cannot inherit corrupted shell state.
Pin Bun to 1.3.14 and route terminal-session commands through Bun-managed extra stdio. This keeps bash attached to a controlling terminal while avoiding filesystem FIFO lifecycle issues.
Replace the unified terminal-session lab with separate command-session and terminal resources so the prototype matches the cleaner API model.
Missing session identifiers should not silently create or reuse a persistent shell. Route them through stateless execution so the container runtime starts matching the explicit-session model.
Process records should not describe missing-session processes as part of a default persistent shell. Store stateless command handles from the start so process lifecycle metadata matches execution routing.
Top-level exec should behave like a one-shot command instead of creating or reusing a default shell. Keep default-session coverage on APIs that still depend on that legacy path while exec moves to the new model.
Streaming command execution overlaps with process lifecycle semantics. Drop execStream from the public SDK surface so callers use exec for completion-only commands and startProcess for streaming work.
Top-level startProcess should create sandbox-level processes instead of attaching them to a hidden default shell. Explicit session process starts continue to pass their session id through the wrapper path.
Top-level process lookups should not inherit an unrelated default session annotation. Only explicit session callers should receive a public session id on returned process objects.
Remove the streaming and callback options from exec so command streaming has a single lifecycle API through startProcess. The bridge exec route now uses process callbacks for its SSE transport.
Use a package name that matches the broader execution runtime boundary, including stateless commands, processes, sessions, and terminals.
Promote stateless command execution into the sandbox-execution package and have the container service delegate sessionless exec to it. The container keeps service result shaping and logging at the control seam.
Sessionless process streaming now delegates to the execution runtime so process lifecycle behavior has one container-tested implementation.
Persistent session exec now goes through the execution runtime while the container keeps the existing session service surface for streaming and PTY integration during the migration.
Remove execStream and process-tracking hooks from the legacy container session class. Runtime-backed sessions now own process streaming and termination, so keeping the older surface would preserve two competing execution paths.
Internal sandbox commands and local R2 sync mounts should not create or reuse hidden command sessions. They now use the same sessionless routing as top-level execution while explicit sessions remain unchanged.
Direct Sandbox DO methods like `sandbox.writeFile()` and `sandbox.mkdir()` no longer lazily create a hidden default command session when `sessionId` is omitted. They now correctly resolve to stateless execution.
Preview URL activation now starts and records runtime identity without creating a hidden command session. With implicit APIs already routed sessionlessly, the stored default-session machinery is no longer needed.
E2E coverage now exercises streaming through process resources so the worker no longer carries a public streaming exec test surface.
Explicit command sessions now run through the sandbox-execution runtime, so the old container-local Session class and its direct tests only kept stale execution semantics alive.
Process lifecycle streaming now uses process-oriented internal names, and the control-plane command domain no longer exposes the stale executeStream surface.
Process command handles now identify sessionless and session-backed execution with semantic targets instead of storing the sessionless sentinel as a fake session ID.
Normalize legacy session-id inputs at service boundaries so execution internals route through explicit session or sessionless targets.
Omitted session IDs now represent sessionless execution across the SDK and control-plane boundary. The legacy sentinel string remains only as a container-side compatibility value for older persisted process handles.
Remove the internal execution-context wrapper now that omitted session IDs are the canonical sessionless boundary. Share the session-create RPC type so per-session command timeouts stay typed across SDK and container code.
🦋 Changeset detectedLatest commit: bf04741 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
| async getOrCreateTerminal( | ||
| options: CreateTerminalOptions | ||
| ): Promise<TerminalHandle> { | ||
| const existing = this.getTerminal(options.id); | ||
| if (existing) { | ||
| return existing; | ||
| } | ||
|
|
||
| const pty = new Pty({ | ||
| cwd: options.cwd ?? defaultTerminalCwd(), | ||
| env: options.env, | ||
| logger: this.logger | ||
| }); | ||
|
|
||
| try { | ||
| await pty.initialize(options.pty); | ||
| const handle = { id: options.id, pty }; | ||
| this.terminals.set(options.id, new ManagedTerminal(handle)); | ||
| return handle; | ||
| } catch (error) { | ||
| await pty.destroy().catch(() => {}); | ||
| throw error; | ||
| } |
There was a problem hiding this comment.
🟡 Race condition in TerminalManager.getOrCreateTerminal leaks PTY resources
Two concurrent getOrCreateTerminal calls with the same terminal ID can both pass the existing check at terminal-manager.ts:38-41, because pty.initialize() at line 50 yields to the event loop via await. When both calls complete initialization, the second this.terminals.set() overwrites the first, orphaning the first PTY process (it's initialized and running but no longer tracked in the map, so it's never destroyed).
Race sequence
- Call A enters
getOrCreateTerminal('t1'), passes theexistingcheck (no terminal exists) - Call A
awaitspty.initialize(), yielding to the event loop - Call B enters
getOrCreateTerminal('t1'), also passes theexistingcheck (terminal not yet stored) - Call B
awaits its ownpty.initialize() - Call A's init completes, stores terminal handle in map
- Call B's init completes, overwrites the map entry — Call A's PTY is leaked
This can happen when proxyTerminal in packages/sandbox/src/pty/proxy.ts:60 calls stub.createTerminal(createOptions) concurrently for the same terminal ID (e.g., two browser tabs reconnecting simultaneously), since the container Bun server handles requests concurrently.
Prompt for agents
The getOrCreateTerminal method has a TOCTOU race condition: two concurrent calls with the same terminal ID can both pass the existence check because pty.initialize() is async and yields to the event loop. The second caller overwrites the first in the map, leaking the first PTY process.
Approach: Add a pending-creation coordination mechanism similar to SessionManager.creatingLocks. Store a Promise in a Map<string, Promise<TerminalHandle>> when creation starts, and have concurrent callers await the same promise. Alternatively, use a per-ID mutex (like the session manager's approach) to serialize creation. The simplest fix would be to re-check the map after initialize() completes and destroy the newly created PTY if the ID was already claimed by another call.
Was this helpful? React with 👍 or 👎 to provide feedback.
| executor | ||
| .exec(command, { | ||
| .startProcess(command, { | ||
| ...opts, | ||
| stream: true, | ||
| onOutput(stream: 'stdout' | 'stderr', data: string) { | ||
| writeSSE(stream, toBase64(data)); | ||
| }, | ||
| onComplete(result: { exitCode: number }) { | ||
| writeSSE('exit', JSON.stringify({ exit_code: result.exitCode })); | ||
| onExit(code: number | null) { | ||
| writeSSE('exit', JSON.stringify({ exit_code: code ?? -1 })); | ||
| closeStream(); | ||
| }, | ||
| onError(err: Error) { |
There was a problem hiding this comment.
🚩 Bridge /exec SSE endpoint now creates persistent process records for every streaming command
The bridge route at packages/sandbox/src/bridge/routes.ts:512-540 migrated from executor.exec(command, { stream: true, ... }) to executor.startProcess(command, { onOutput, onExit, onError }). While the old streaming exec path also created process records internally through ProcessService, the new code path is more explicit about creating long-lived process objects. Each bridge /exec call now returns a Process that appears in listProcesses() and must be cleaned up. If bridge consumers were relying on the old behavior where streaming exec didn't create visible process records, they may see process accumulation. The autoCleanup option is not set by the bridge, so cleanup behavior depends on the sandbox defaults.
(Refers to lines 512-540)
Was this helpful? React with 👍 or 👎 to provide feedback.
268eca5 to
bf04741
Compare
Top-level sandbox calls used to run against a persistent "default" session, so cd/export/aliases carried across exec() calls and file operations. That was a deliberate choice for the original audience: humans driving a sandbox interactively, who expect a shell that remembers what they just did.
The primary consumer has shifted. AI agents increasingly expect a stateless API surface even though the sandbox itself is stateful. An agent already tracks state across its turn history and is good at passing the right cwd/env/arguments per call, so an implicit shared shell works against it: it couples unrelated calls and makes the lifecycle of a command hard to reason about. This reworks the model so the default surface is stateless and shared state is always something the caller opts into.
The model is now four explicit surfaces
sandbox.exec()runs one stateless shell command. No state persists.sandbox.startProcess()owns process lifecycle: streaming, wait, kill, timeout, and process-tree cleanup.sandbox.createSession()is the only way to get a persistent shell;session.exec()runs in that shell so state persists, andsession.startProcess()launches a lifecycle process from a snapshot of session state without writing mutations back.sandbox.terminal()is an independent PTY resource created/destroyed over the control plane, with/ws/terminalas byte transport only.The shell mechanics now live in a new internal package,
@repo/sandbox-execution(CommandSession,StatelessCommandRunner,StatelessProcessRunner, terminal), which the container builds on. The legacy containerSessionclass and the public/internal streaming-exec paths are removed;SessionManagerand the services adapt the runtime through a structuralManagedSessionboundary and explicitExecutionTargetrouting instead of a sentinel__DISABLE_SESSION__session id.Breaking changes (migration)
exec(),startProcess(), and file, watch, git, and backup operations are sessionless unless you pass asessionId. Usesandbox.createSession()when commands or file operations must share shell state.enableDefaultSessionoption is removed.execStream/exec({ stream })); usestartProcess()for streaming and lifecycle control.listProcesses({ sessionId })/getProcess(id, { sessionId }).sandbox.terminal()returns a handle withconnect()/destroy();session.terminal()is gone.Known boundaries
Called out so they aren't mistaken for bugs:
psandmkfifobeing present in the container image.Tests and docs
Unit, container, and E2E tests are updated alongside the behavior, and docs/skills (
SESSION_EXECUTION,ARCHITECTURE) are rewritten to match.This is a large change, but it is one self-contained re-architecture organized as a stack of focused commits on top of
next. Reviewing commit-by-commit is the intended path.