Runtime: decide on the verified Rust TCB + real process sandbox#3
Open
Aliipou wants to merge 8 commits into
Open
Runtime: decide on the verified Rust TCB + real process sandbox#3Aliipou wants to merge 8 commits into
Aliipou wants to merge 8 commits into
Conversation
Replace PolyForm-NC/MIT with a contact-first license: read/evaluate only; any other use (incl. educational/research/RDI/commercial) needs prior written permission. Add NOTICE asserting independent authorship by Ali Pourrahim and that no third party holds an IP claim. - pyproject: license->LICENSE, author->Ali Pourrahim, classifier->Other/Proprietary (was MIT) - Cargo.toml: license-file + publish=false + author (was PolyForm) - deny.toml: ignore private crate license, drop PolyForm allow - README: license badge + section updated Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…served) Move from the contact-first license to PolyForm Noncommercial 1.0.0: research, educational, evaluation, and internal non-commercial use are allowed with attribution; commercial use, production deployment, resale, and SaaS require a separate commercial license. This keeps the IP/commercial leverage while enabling PhD evaluation, citation, and external validation. - LICENSE: PolyForm Noncommercial 1.0.0 (Required Notice: Ali Pourrahim) + commercial-license note - README: license badge + use-rights table + commercial contact - NOTICE: use section updated to the noncommercial model - pyproject/Cargo: license -> PolyForm-Noncommercial-1.0.0 (SPDX) - deny.toml: re-allow PolyForm-Noncommercial-1.0.0, drop private-ignore Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Recovers the runtime MVP interrupted by a shutdown and hardens it against two real vulnerabilities found by the adversarial harness. Runtime layer (non-TCB): Agent -> Planner -> loop -> CallGate.verify -> sandboxed Tool -> RunLog/AuditLog. 3 tools (calculator, file_read, web_search), mock/scripted planners, append-only jsonl trace. No multi-agent, no delegation DSL, no federation — MVP scope only. redteam/runtime_redteam.py: deterministic harness of N adversarial "engineers" across 10 attack categories; any escape is a hard failure. Vulnerabilities found and fixed: - DoS via unbounded `**` in calculator: `2**2**2**2**2**2` builds a ~10^19728-digit int and hangs the process (it hung the 1000-engineer run). Now bounded by expr length, exponent cap, and result-bits cap; overflow/divide-by-zero normalized to ValueError so the gate sees a clean denial instead of a hang. - Sandbox bypass via Windows reserved device names: `CON`/`NUL`/`COMn`/ `LPTn` resolve inside the sandbox yet open a device at open() time (`CON` blocks forever = DoS). Refused by name on every platform. Both vectors now covered by unit tests and the red-team harness. Full suite: 1283 passed; ruff + mypy clean (AUTHGATE_BACKEND=python). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Closes the two structural gaps between "proven" and "running": 1. Runtime authorization now runs on the verified engine. RustBackedVerifier serializes the Python registry+action to the kernel JSON wire and calls authgate_kernel.verify_json -> crate::engine::verify (the Kani/Lean-verified engine), returning an ed25519-signed verdict. JSON is the only boundary, so no Rust pyclass objects enter Python and the dual-Entity TypeError that forced AUTHGATE_BACKEND=python cannot occur. Epoch revocation (absent from the wire) is preserved by serializing only currently-valid claims. build_runtime(backend="python"|"rust"|"auto"); default stays python so environments without the built extension are unaffected. 2. Tools run in a real sandbox, not an in-process check. SandboxPolicy executes each tool in an isolated subprocess with a wall-clock deadline + output cap (all platforms) and opt-in POSIX rlimits (cpu/memory/file-size; off by default to avoid interpreter-startup footguns). A hanging/runaway/crashing tool is killed and reaped -> clean denial. Verification: 1000-engineer red team is green on BOTH backends (0 escapes), including epoch-revocation and audit-tamper. Full suite 1297 passed, 1 skipped (POSIX-rlimit test on Windows); ruff + mypy clean. Benchmarks: native Rust verify ~0.8us (10-claim); Python verifier ~38us/decision; verified engine via JSON+ed25519 ~544us/decision (~14x, sub-ms, JSON+signing dominated). See redteam/bench_verify.py. The PyO3 extension is built separately (cargo --lib); the runtime falls back to the pure-Python verifier wherever it is absent (CI included), keeping CI green. README updated with the runtime, backends, sandbox, red team, and benchmarks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… margin) Adds extension-free tests for the JSON-wire marshalling (_entity/_resource/ _claim/_action_wire, _live_claims epoch/revocation filter) and in-process tests for the sandbox child runner. These run without the compiled extension, so they lift CI coverage of rust_backend.py and _sandbox_runner.py and keep the --cov-fail-under=85 gate comfortably satisfied (86.5%). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o integration # Conflicts: # README.md
On POSIX, 'C:\Windows\win.ini' is not a drive-absolute path but a nonexistent in-sandbox filename, so file_read raises FileNotFoundError, not PermissionError — both mean "refused, no real file read". Accept either so the test passes on Linux CI as well as Windows. (Was the only red check on PR #3.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the claims match what is actually true: - Verified-Rust backend is OPT-IN and not run by CI (default + CI = Python); drop "the running and proven code are finally the same code". - Sandbox is OPT-IN (off by default) and is NOT a syscall/network jail; reword "Tools run in a real sandbox" -> "Tools can run in a real sandbox (opt-in)". - Red team: add explicit scope caveat — self-authored attacks on 3 MVP tools, evidence the known vectors are closed, not a proof of security or a substitute for external review. - Benchmarks: mark as single-machine, order-of-magnitude, not a spec. - Engineering Gaps: change "Done" rows to honest "opt-in / not run by CI". - Badge scoped to "runtime red team — 1000 cases" (not next to formal badges). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes the two structural gaps between the proven kernel and the running runtime.
1. Runtime decides on the verified engine
RustBackedVerifierserializes the Python registry+action to the kernel JSON wire and callsauthgate_kernel.verify_json->crate::engine::verify(Kani/Lean-verified), returning an ed25519-signed verdict. JSON is the only boundary, so no Rust pyclass objects enter Python (the dual-Entity bug cannot occur). Epoch revocation is preserved by serializing only currently-valid claims.build_runtime(backend="python"|"rust"|"auto"); default stayspython.2. Real tool sandbox
SandboxPolicyruns each tool in an isolated subprocess with a wall-clock deadline + output cap (all platforms) and opt-in POSIX rlimits. Hanging/runaway/crashing tools are killed and reaped.Verification
**calculator DoS; Windows reserved-device sandbox bypassBenchmarks
native Rust verify ~0.8us; Python verifier ~38us/decision; verified engine via JSON+ed25519 ~544us/decision.
The PyO3 extension is built separately; the runtime falls back to the pure-Python verifier where it is absent (CI included), so CI stays green.
Generated with Claude Code.