Skip to content

Make top-level sandbox execution stateless by default#765

Draft
ghostwriternr wants to merge 74 commits into
nextfrom
spike/terminal-session-runtime
Draft

Make top-level sandbox execution stateless by default#765
ghostwriternr wants to merge 74 commits into
nextfrom
spike/terminal-session-runtime

Conversation

@ghostwriternr

Copy link
Copy Markdown
Member

Top-level sandbox calls used to run against a persistent "default" session, so cd/export/aliases carried across exec() calls and file operations. That was a deliberate choice for the original audience: humans driving a sandbox interactively, who expect a shell that remembers what they just did.

The primary consumer has shifted. AI agents increasingly expect a stateless API surface even though the sandbox itself is stateful. An agent already tracks state across its turn history and is good at passing the right cwd/env/arguments per call, so an implicit shared shell works against it: it couples unrelated calls and makes the lifecycle of a command hard to reason about. This reworks the model so the default surface is stateless and shared state is always something the caller opts into.

The model is now four explicit surfaces

  • sandbox.exec() runs one stateless shell command. No state persists.
  • sandbox.startProcess() owns process lifecycle: streaming, wait, kill, timeout, and process-tree cleanup.
  • sandbox.createSession() is the only way to get a persistent shell; session.exec() runs in that shell so state persists, and session.startProcess() launches a lifecycle process from a snapshot of session state without writing mutations back.
  • sandbox.terminal() is an independent PTY resource created/destroyed over the control plane, with /ws/terminal as byte transport only.

The shell mechanics now live in a new internal package, @repo/sandbox-execution (CommandSession, StatelessCommandRunner, StatelessProcessRunner, terminal), which the container builds on. The legacy container Session class and the public/internal streaming-exec paths are removed; SessionManager and the services adapt the runtime through a structural ManagedSession boundary and explicit ExecutionTarget routing instead of a sentinel __DISABLE_SESSION__ session id.

Breaking changes (migration)

  • Hidden default sessions are gone. exec(), startProcess(), and file, watch, git, and backup operations are sessionless unless you pass a sessionId. Use sandbox.createSession() when commands or file operations must share shell state.
  • The enableDefaultSession option is removed.
  • There is no public streaming exec (execStream / exec({ stream })); use startProcess() for streaming and lifecycle control.
  • Session IDs moved into options objects across command, process, file, git, and backup APIs; process reads use listProcesses({ sessionId }) / getProcess(id, { sessionId }).
  • Process kill methods no longer accept signal arguments.
  • Terminals are explicit resources: sandbox.terminal() returns a handle with connect()/destroy(); session.terminal() is gone.
  • The container runtime moves to Bun 1.3.14.

Known boundaries

Called out so they aren't mistaken for bugs:

  • Structured stdout/stderr capture is text-oriented, not binary-safe.
  • Process cleanup covers ordinary descendants; detached or reparented processes remain a Unix boundary.
  • Persistent-session shell integration relies on ps and mkfifo being present in the container image.

Tests and docs

Unit, container, and E2E tests are updated alongside the behavior, and docs/skills (SESSION_EXECUTION, ARCHITECTURE) are rewritten to match.

This is a large change, but it is one self-contained re-architecture organized as a stack of focused commits on top of next. Reviewing commit-by-commit is the intended path.

Introduce an isolated lab package for proving the terminal session
runtime shape without coupling it to the sandbox container yet. The
package runs its default tests in a Linux container so shell and PTY
semantics match the target environment rather than the host OS.
Move protocol framing into a focused module and resolve command
completion from ordered terminal output. Treat execution timeouts as
fatal so later commands cannot inherit corrupted shell state.
Pin Bun to 1.3.14 and route terminal-session commands through
Bun-managed extra stdio. This keeps bash attached to a controlling
terminal while avoiding filesystem FIFO lifecycle issues.
Replace the unified terminal-session lab with separate command-session
and terminal resources so the prototype matches the cleaner API model.
Missing session identifiers should not silently create or reuse a
persistent shell. Route them through stateless execution so the
container runtime starts matching the explicit-session model.
Process records should not describe missing-session processes as part
of a default persistent shell. Store stateless command handles from the
start so process lifecycle metadata matches execution routing.
Top-level exec should behave like a one-shot command instead of
creating or reusing a default shell. Keep default-session coverage on
APIs that still depend on that legacy path while exec moves to the new
model.
Streaming command execution overlaps with process lifecycle semantics.
Drop execStream from the public SDK surface so callers use exec for
completion-only commands and startProcess for streaming work.
Top-level startProcess should create sandbox-level processes instead
of attaching them to a hidden default shell. Explicit session process
starts continue to pass their session id through the wrapper path.
Top-level process lookups should not inherit an unrelated default
session annotation. Only explicit session callers should receive a
public session id on returned process objects.
Remove the streaming and callback options from exec so command
streaming has a single lifecycle API through startProcess. The bridge
exec route now uses process callbacks for its SSE transport.
Use a package name that matches the broader execution runtime boundary,
including stateless commands, processes, sessions, and terminals.
Promote stateless command execution into the sandbox-execution package
and have the container service delegate sessionless exec to it. The
container keeps service result shaping and logging at the control seam.
Sessionless process streaming now delegates to the execution runtime so
process lifecycle behavior has one container-tested implementation.
Persistent session exec now goes through the execution runtime while the
container keeps the existing session service surface for streaming and
PTY integration during the migration.
Remove execStream and process-tracking hooks from the legacy container
session class. Runtime-backed sessions now own process streaming and
termination, so keeping the older surface would preserve two competing
execution paths.
Internal sandbox commands and local R2 sync mounts should not create or
reuse hidden command sessions. They now use the same sessionless routing
as top-level execution while explicit sessions remain unchanged.
Direct Sandbox DO methods like `sandbox.writeFile()` and
`sandbox.mkdir()` no longer lazily create a hidden default command
session when `sessionId` is omitted. They now correctly resolve to
stateless execution.
Preview URL activation now starts and records runtime identity without
creating a hidden command session. With implicit APIs already routed
sessionlessly, the stored default-session machinery is no longer needed.
E2E coverage now exercises streaming through process resources so the
worker no longer carries a public streaming exec test surface.
Explicit command sessions now run through the sandbox-execution runtime,
so the old container-local Session class and its direct tests only kept
stale execution semantics alive.
Process lifecycle streaming now uses process-oriented internal names,
and the control-plane command domain no longer exposes the stale
executeStream surface.
Process command handles now identify sessionless and session-backed
execution with semantic targets instead of storing the sessionless
sentinel as a fake session ID.
Normalize legacy session-id inputs at service boundaries so execution
internals route through explicit session or sessionless targets.
Omitted session IDs now represent sessionless execution across the SDK
and control-plane boundary. The legacy sentinel string remains only as a
container-side compatibility value for older persisted process handles.
Remove the internal execution-context wrapper now that omitted session
IDs are the canonical sessionless boundary. Share the session-create RPC
type so per-session command timeouts stay typed across SDK and
container code.
@changeset-bot

changeset-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: bf04741

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@cloudflare/sandbox Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

Open in Devin Review

Comment on lines +35 to +57
async getOrCreateTerminal(
options: CreateTerminalOptions
): Promise<TerminalHandle> {
const existing = this.getTerminal(options.id);
if (existing) {
return existing;
}

const pty = new Pty({
cwd: options.cwd ?? defaultTerminalCwd(),
env: options.env,
logger: this.logger
});

try {
await pty.initialize(options.pty);
const handle = { id: options.id, pty };
this.terminals.set(options.id, new ManagedTerminal(handle));
return handle;
} catch (error) {
await pty.destroy().catch(() => {});
throw error;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Race condition in TerminalManager.getOrCreateTerminal leaks PTY resources

Two concurrent getOrCreateTerminal calls with the same terminal ID can both pass the existing check at terminal-manager.ts:38-41, because pty.initialize() at line 50 yields to the event loop via await. When both calls complete initialization, the second this.terminals.set() overwrites the first, orphaning the first PTY process (it's initialized and running but no longer tracked in the map, so it's never destroyed).

Race sequence
  1. Call A enters getOrCreateTerminal('t1'), passes the existing check (no terminal exists)
  2. Call A awaits pty.initialize(), yielding to the event loop
  3. Call B enters getOrCreateTerminal('t1'), also passes the existing check (terminal not yet stored)
  4. Call B awaits its own pty.initialize()
  5. Call A's init completes, stores terminal handle in map
  6. Call B's init completes, overwrites the map entry — Call A's PTY is leaked

This can happen when proxyTerminal in packages/sandbox/src/pty/proxy.ts:60 calls stub.createTerminal(createOptions) concurrently for the same terminal ID (e.g., two browser tabs reconnecting simultaneously), since the container Bun server handles requests concurrently.

Prompt for agents
The getOrCreateTerminal method has a TOCTOU race condition: two concurrent calls with the same terminal ID can both pass the existence check because pty.initialize() is async and yields to the event loop. The second caller overwrites the first in the map, leaking the first PTY process.

Approach: Add a pending-creation coordination mechanism similar to SessionManager.creatingLocks. Store a Promise in a Map<string, Promise<TerminalHandle>> when creation starts, and have concurrent callers await the same promise. Alternatively, use a per-ID mutex (like the session manager's approach) to serialize creation. The simplest fix would be to re-check the map after initialize() completes and destroy the newly created PTY if the ID was already claimed by another call.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines 512 to 522
executor
.exec(command, {
.startProcess(command, {
...opts,
stream: true,
onOutput(stream: 'stdout' | 'stderr', data: string) {
writeSSE(stream, toBase64(data));
},
onComplete(result: { exitCode: number }) {
writeSSE('exit', JSON.stringify({ exit_code: result.exitCode }));
onExit(code: number | null) {
writeSSE('exit', JSON.stringify({ exit_code: code ?? -1 }));
closeStream();
},
onError(err: Error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Bridge /exec SSE endpoint now creates persistent process records for every streaming command

The bridge route at packages/sandbox/src/bridge/routes.ts:512-540 migrated from executor.exec(command, { stream: true, ... }) to executor.startProcess(command, { onOutput, onExit, onError }). While the old streaming exec path also created process records internally through ProcessService, the new code path is more explicit about creating long-lived process objects. Each bridge /exec call now returns a Process that appears in listProcesses() and must be cleaned up. If bridge consumers were relying on the old behavior where streaming exec didn't create visible process records, they may see process accumulation. The autoCleanup option is not set by the bridge, so cleanup behavior depends on the sandbox defaults.

(Refers to lines 512-540)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@ghostwriternr ghostwriternr force-pushed the spike/terminal-session-runtime branch from 268eca5 to bf04741 Compare June 18, 2026 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant