Spec: Sandboxed Exec Contract
Mirrored from docs/design/policy-sandbox/specs/sandbox-exec-contract.md. Edit the source document in the repository, not this generated page.
`orbit-exec::run_process` is the single primitive every shell-invoking tool spawns through. This spec names the supervision invariants and failure modes that the contract must preserve.
Why This Exists
Process supervision is full of subtle deadlocks (full pipe buffers, orphan grandchildren, signal races). Without a prescriptive contract, callers may build tools that bypass the supervision layer or assume invariants that the layer does not actually provide.
Spawn Invariants
- Sandbox validation first. `run_process` calls `sandbox.validate(req)` before spawning. The default `NoSandbox` always returns `Ok`, but any future impl that returns `Err` aborts the spawn before any state changes.
- Pipes for capture. Stdout and stderr are always piped to the parent. Tools that want live terminal output use `ExecRequest::debug = true`, which tees the captured bytes through a redaction-aware drain rather than skipping capture.
- Stdin mode. `StdinMode::Inherit` (default), `Null`, or `Bytes(Vec<u8>)`. `Bytes` allocates a stdin pipe and a writer thread; the other modes do not.
- Environment mode. `EnvironmentMode::Inherit` (default) or `ClearAndSet(pairs)`. `ClearAndSet` calls `command.env_clear()` then sets the supplied pairs. The `Debug` impl redacts values for keys that match `is_sensitive_env_name`.
- Process group leadership (Unix). Children spawn with `command.process_group(0)`, so the child’s PGID equals its PID. Non-Unix builds skip this step.
- Working directory. `current_dir` is applied to the spawn before the child runs.
- Spawn failure. If the OS fails to spawn the program, `run_process` returns `OrbitError::Execution("failed to spawn: <error>")` and never enters the supervision loop.
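The ordering of the first and last invariants above (validate, then spawn, then map an OS spawn failure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: the trimmed-down `ExecRequest`, the single-variant `OrbitError`, the shape of the `Sandbox` trait, and the helper name `spawn_validated` are all hypothetical stand-ins chosen so the block compiles with only the standard library.

```rust
use std::process::{Child, Command, Stdio};

/// Hypothetical stand-in for the crate's error type; only the
/// `Execution` variant is named in this spec.
#[derive(Debug)]
pub enum OrbitError {
    Execution(String),
}

/// Hypothetical, trimmed-down request: just a program and its arguments.
pub struct ExecRequest {
    pub program: String,
    pub args: Vec<String>,
}

/// Assumed trait shape; the spec only states that `validate(req)` exists.
pub trait Sandbox {
    fn validate(&self, req: &ExecRequest) -> Result<(), OrbitError>;
}

/// Mirrors the documented default: validation always succeeds.
pub struct NoSandbox;

impl Sandbox for NoSandbox {
    fn validate(&self, _req: &ExecRequest) -> Result<(), OrbitError> {
        Ok(())
    }
}

/// Validate first, then spawn with stdout and stderr piped to the parent.
pub fn spawn_validated(sandbox: &dyn Sandbox, req: &ExecRequest) -> Result<Child, OrbitError> {
    // A rejecting sandbox returns here, before any process exists.
    sandbox.validate(req)?;

    let mut command = Command::new(&req.program);
    command
        .args(&req.args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());

    // An OS-level spawn failure maps to the documented error message.
    command
        .spawn()
        .map_err(|e| OrbitError::Execution(format!("failed to spawn: {e}")))
}
```

With `NoSandbox` and a program path that does not exist, the helper returns an `Execution` error whose message starts with `failed to spawn: `; with a sandbox that rejects the request, the error comes back without `spawn()` ever being called.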
Supervision Invariants
- Background drains. `wait_with_optional_timeout` spawns reader threads for stdout and stderr immediately after spawn. The child must never block on a full pipe buffer because the parent is not reading.
- Stdin writer thread. When `StdinMode::Bytes` is set, a writer thread copies the payload to the child’s stdin. A failed write terminates the child via `terminate_process_group` and surfaces as `OrbitError::Execution(<message>)`.
- Poll interval. The wait loop polls with `WAIT_POLL_INTERVAL = 100ms` (or the remaining deadline, whichever is smaller). The interval is global and not per-request configurable.
- Signal handler installation (Unix). A `SignalHandlerGuard` installs SIGINT and SIGTERM handlers for the duration of the wait loop. Installation acquires a process-global `Mutex` so concurrent calls cannot race. Drop restores the previous handlers in reverse order.
- Timeout escalation. When the deadline expires, `terminate_process_group(child, SIGTERM, poll_interval)` is called. If the group does not exit within `TERMINATION_GRACE_PERIOD = 5 seconds`, `kill_process_group` (SIGKILL) is invoked plus a direct `child.kill()`/`child.wait()`.
- Parent-signal escalation. When the parent receives SIGINT or SIGTERM during the wait, the same termination path runs with the received signal. The result reports `exit_code = Some(128 + signal)` and `success = false`.
- Clean-exit reaping. When the child exits cleanly, the wait loop calls `kill_process_group(child.id())` to reap any orphan subprocesses still holding pipe write ends, then joins the reader threads. Without this, an orphan grandchild can keep the pipes open and block reader-thread completion indefinitely.
- Stderr annotation. Timeouts append `process timed out` to stderr; parent-signal interruption appends `process interrupted by signal SIG<NAME>`. The annotations are added before the result is constructed, not by the caller.
- Exit code reporting. `ExecutionResult::exit_code` is `Some(code)` for clean exits, `Some(128 + signal)` for parent-signal exits, and `None` for timeouts.
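The poll-interval, exit-code, and stderr-annotation rules above are pure mappings, so they can be illustrated without spawning anything. The sketch below is an assumption-laden illustration: `WaitOutcome`, `ParentSignal`, and the three helper functions are hypothetical names, the signal numbers 2 and 15 are the conventional values for SIGINT and SIGTERM, and the newline placed before each annotation is a guess, since the spec does not say how the suffix is separated from existing stderr.

```rust
use std::time::Duration;

/// Value stated in this spec.
pub const WAIT_POLL_INTERVAL: Duration = Duration::from_millis(100);

/// The two parent signals the spec says are handled.
pub enum ParentSignal {
    Int,
    Term,
}

impl ParentSignal {
    /// Conventional signal numbers (SIGINT = 2, SIGTERM = 15).
    pub fn number(&self) -> i32 {
        match self {
            ParentSignal::Int => 2,
            ParentSignal::Term => 15,
        }
    }

    pub fn name(&self) -> &'static str {
        match self {
            ParentSignal::Int => "SIGINT",
            ParentSignal::Term => "SIGTERM",
        }
    }
}

/// Hypothetical summary of how the wait loop ended.
pub enum WaitOutcome {
    Exited(i32),
    TimedOut,
    Interrupted(ParentSignal),
}

/// Sleep for the poll interval or the remaining deadline, whichever is smaller.
pub fn next_poll_sleep(remaining: Option<Duration>) -> Duration {
    match remaining {
        Some(left) => left.min(WAIT_POLL_INTERVAL),
        None => WAIT_POLL_INTERVAL,
    }
}

/// Exit code as documented: the child's code, 128 + signal, or `None` on timeout.
pub fn exit_code(outcome: &WaitOutcome) -> Option<i32> {
    match outcome {
        WaitOutcome::Exited(code) => Some(*code),
        WaitOutcome::TimedOut => None,
        WaitOutcome::Interrupted(sig) => Some(128 + sig.number()),
    }
}

/// Append the documented annotation. The newline separator is an assumption.
pub fn annotate_stderr(stderr: &mut String, outcome: &WaitOutcome) {
    let note = match outcome {
        WaitOutcome::Exited(_) => return,
        WaitOutcome::TimedOut => "process timed out".to_string(),
        WaitOutcome::Interrupted(sig) => {
            format!("process interrupted by signal {}", sig.name())
        }
    };
    if !stderr.is_empty() && !stderr.ends_with('\n') {
        stderr.push('\n');
    }
    stderr.push_str(&note);
}
```

Under these assumptions, an interrupted wait with SIGINT maps to `Some(130)` and SIGTERM to `Some(143)`, which matches the familiar shell convention for signal-terminated commands.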
Result Shape
`ExecutionResult { success, stdout, stderr, exit_code, duration_ms, output }`:
- `success` reflects the child’s exit status (clean exit with zero status). Timeouts and parent-signal exits report `success = false`.
- `stdout` and `stderr` are `String::from_utf8_lossy` conversions of the captured bytes. Invalid UTF-8 sequences are not preserved; each is replaced with the U+FFFD replacement character.
- `duration_ms` is wall-clock time from `Instant::now()` at spawn entry to spawn return.
- `output` is reserved for callers that want to attach a parsed-output payload after the fact; `run_process` itself does not populate it.
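To make the result shape concrete, here is a minimal sketch of assembling such a value from captured bytes. The field names come from the spec, but the field types, the placeholder type of `output`, and the helper `build_result` are assumptions made for illustration only.

```rust
use std::time::Instant;

/// Field names come from the spec; the field types are assumptions.
#[derive(Debug)]
pub struct ExecutionResult {
    pub success: bool,
    pub stdout: String,
    pub stderr: String,
    pub exit_code: Option<i32>,
    pub duration_ms: u64,
    /// Left empty by the runner; the payload type here is a placeholder.
    pub output: Option<String>,
}

/// Hypothetical helper: build a result from raw captured bytes.
pub fn build_result(
    started: Instant,
    exit_code: Option<i32>,
    stdout: &[u8],
    stderr: &[u8],
) -> ExecutionResult {
    ExecutionResult {
        // Only a zero exit status counts as success; `None` (timeout)
        // and `Some(128 + signal)` both yield `false`.
        success: exit_code == Some(0),
        // Lossy conversion: invalid UTF-8 becomes U+FFFD.
        stdout: String::from_utf8_lossy(stdout).into_owned(),
        stderr: String::from_utf8_lossy(stderr).into_owned(),
        exit_code,
        duration_ms: started.elapsed().as_millis() as u64,
        // The runner does not populate this; callers may fill it later.
        output: None,
    }
}
```

For example, captured stdout bytes `b"ok\xff"` become the string `"ok\u{FFFD}"`, so callers that need the exact bytes cannot recover them from `stdout`.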
Failure Modes
- Spawn failure. `OrbitError::Execution("failed to spawn …")`; the caller cannot retry without changing the request.
- Stdin write failure. Writer-thread error → child terminated → `OrbitError::Execution(<error>)` returned. Captured stdout/stderr up to that point are discarded.
- Stdin writer panic. Writer-thread panic → `OrbitError::Execution("stdin writer thread panicked")` returned.
- Signal handler install failure. If `sigaction` fails for SIGINT or SIGTERM, the guard rolls back any partial install and `run_process` returns `OrbitError::Execution(<error>)` before entering the wait loop.
- Wait error. `child.wait_timeout` errors surface as `OrbitError::Execution("wait timeout error: …")`. The child is left to be reaped by the OS rather than force-killed in this path; this is a known soft spot.
- Timeout. `success = false`, `exit_code = None`, stderr suffixed with `process timed out`.
- Parent signal. `success = false`, `exit_code = Some(128 + signal)`, stderr suffixed with the signal name.
Concurrency Constraints
- Single signal-handler install at a time. The global `Mutex` in `SignalHandlerGuard` serializes installs. Two concurrent `run_process` calls in the same process must take turns at install/drop boundaries; the wait loops themselves run concurrently once the handlers are active.
- No assumption about thread-local state. Reader threads, writer threads, and the signal handler are spawned with `'static` requirements; callers must not rely on thread-local data from the spawning thread.
- No retry inside `run_process`. The runner does not retry spawn failures, wait errors, or signal-install failures. Retry policy belongs to the caller.
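The thread-local constraint above is easy to trip over, so a small standalone demonstration may help. It does not use any `orbit-exec` code: `REQUEST_TAG` and `observe_tag_across_threads` are hypothetical names, and the helper thread merely stands in for a reader or writer thread spawned by the runner.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    /// Hypothetical per-thread tag, e.g. a request id set by the caller.
    static REQUEST_TAG: Cell<u32> = Cell::new(0);
}

/// Sets the tag on the calling thread, then reads it from a helper thread.
/// Returns (value seen by the caller, value seen by the helper).
pub fn observe_tag_across_threads(tag: u32) -> (u32, u32) {
    REQUEST_TAG.with(|t| t.set(tag));
    let caller_view = REQUEST_TAG.with(|t| t.get());

    // The closure is 'static: it borrows nothing from this stack frame,
    // and it gets its own fresh copy of every thread-local.
    let helper_view = thread::spawn(|| REQUEST_TAG.with(|t| t.get()))
        .join()
        .expect("helper thread panicked");

    (caller_view, helper_view)
}
```

Calling `observe_tag_across_threads(7)` yields `(7, 0)`: the helper thread sees the initial value, not the one the caller set. Any context a supervision thread needs has to be moved into it explicitly.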
Migration Rules
- New `ExecRequest` fields must default to a backwards-compatible behavior; `EnvironmentMode::default()` and `StdinMode::default()` exist precisely so callers can adopt new fields incrementally.
- A future kernel-level `Sandbox` impl must implement `validate` to either (a) gate at request-time before spawn, or (b) wrap the spawned process inside its isolation primitive. Mid-spawn isolation that races with `process::spawn` is out of scope for this contract.
- Changes to `TERMINATION_GRACE_PERIOD` or `WAIT_POLL_INTERVAL` require an ADR because both constants are observable in the timeout/cancel behavior of every shell-invoking tool.
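The first migration rule can be sketched with derived defaults and struct-update syntax. The enum variants follow the spec, but the `ExecRequest` field names `stdin` and `environment`, the `Default` derive on the request itself, and the `request_for` helper are assumptions for illustration; the real struct may be shaped differently.

```rust
/// Variants as listed in the spec; `Inherit` is the documented default.
#[derive(Debug, Default, PartialEq)]
pub enum StdinMode {
    #[default]
    Inherit,
    Null,
    Bytes(Vec<u8>),
}

/// Variants as listed in the spec; `Inherit` is the documented default.
#[derive(Debug, Default, PartialEq)]
pub enum EnvironmentMode {
    #[default]
    Inherit,
    ClearAndSet(Vec<(String, String)>),
}

/// Hypothetical request layout; only `debug` is named as a field in the spec.
#[derive(Debug, Default)]
pub struct ExecRequest {
    pub program: String,
    pub stdin: StdinMode,
    pub environment: EnvironmentMode,
    pub debug: bool,
}

/// A caller that names only the fields it cares about keeps compiling,
/// and keeps its old behavior, when new defaulted fields are added.
pub fn request_for(program: &str) -> ExecRequest {
    ExecRequest {
        program: program.to_string(),
        ..Default::default()
    }
}
```

Because every omitted field falls back to its default, adding a new field with a backwards-compatible `Default` value does not require touching call sites written this way.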
Agent Signature
Last revised by claude / claude-opus-4-7 for [T20260426-0622].