Runtime: decide on the verified Rust TCB + real process sandbox by Aliipou · Pull Request #3 · Aliipou/authgate-kernel

Aliipou · 2026-06-10T15:56:51Z

Closes the two structural gaps between the proven kernel and the running runtime.

1. Runtime decides on the verified engine

RustBackedVerifier serializes the Python registry+action to the kernel JSON wire and calls authgate_kernel.verify_json -> crate::engine::verify (Kani/Lean-verified), returning an ed25519-signed verdict. JSON is the only boundary, so no Rust pyclass objects enter Python (the dual-Entity bug cannot occur). Epoch revocation is preserved by serializing only currently-valid claims. build_runtime(backend="python"|"rust"|"auto"); default stays python.

2. Real tool sandbox

SandboxPolicy runs each tool in an isolated subprocess with a wall-clock deadline + output cap (all platforms) and opt-in POSIX rlimits. Hanging/runaway/crashing tools are killed and reaped.

Verification

1000-engineer red team: 0 escapes on both backends (python + verified rust)
Two real vulns found & fixed by the harness: unbounded-** calculator DoS; Windows reserved-device sandbox bypass
Full suite 1302 passed; ruff + mypy clean; coverage 86.5% (>=85 gate)

Benchmarks

native Rust verify ~0.8us; Python verifier ~38us/decision; verified engine via JSON+ed25519 ~544us/decision.

The PyO3 extension is built separately; the runtime falls back to the pure-Python verifier where it is absent (CI included), so CI stays green.

Generated with Claude Code.

Replace PolyForm-NC/MIT with a contact-first license: read/evaluate only; any other use (incl. educational/research/RDI/commercial) needs prior written permission. Add NOTICE asserting independent authorship by Ali Pourrahim and that no third party holds an IP claim. - pyproject: license->LICENSE, author->Ali Pourrahim, classifier->Other/Proprietary (was MIT) - Cargo.toml: license-file + publish=false + author (was PolyForm) - deny.toml: ignore private crate license, drop PolyForm allow - README: license badge + section updated Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…served) Move from the contact-first license to PolyForm Noncommercial 1.0.0: research, educational, evaluation, and internal non-commercial use are allowed with attribution; commercial use, production deployment, resale, and SaaS require a separate commercial license. This keeps the IP/commercial leverage while enabling PhD evaluation, citation, and external validation. - LICENSE: PolyForm Noncommercial 1.0.0 (Required Notice: Ali Pourrahim) + commercial-license note - README: license badge + use-rights table + commercial contact - NOTICE: use section updated to the noncommercial model - pyproject/Cargo: license -> PolyForm-Noncommercial-1.0.0 (SPDX) - deny.toml: re-allow PolyForm-Noncommercial-1.0.0, drop private-ignore Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Recovers the runtime MVP interrupted by a shutdown and hardens it against two real vulnerabilities found by the adversarial harness. Runtime layer (non-TCB): Agent -> Planner -> loop -> CallGate.verify -> sandboxed Tool -> RunLog/AuditLog. 3 tools (calculator, file_read, web_search), mock/scripted planners, append-only jsonl trace. No multi-agent, no delegation DSL, no federation — MVP scope only. redteam/runtime_redteam.py: deterministic harness of N adversarial "engineers" across 10 attack categories; any escape is a hard failure. Vulnerabilities found and fixed: - DoS via unbounded `**` in calculator: `2**2**2**2**2**2` builds a ~10^19728-digit int and hangs the process (it hung the 1000-engineer run). Now bounded by expr length, exponent cap, and result-bits cap; overflow/divide-by-zero normalized to ValueError so the gate sees a clean denial instead of a hang. - Sandbox bypass via Windows reserved device names: `CON`/`NUL`/`COMn`/ `LPTn` resolve inside the sandbox yet open a device at open() time (`CON` blocks forever = DoS). Refused by name on every platform. Both vectors now covered by unit tests and the red-team harness. Full suite: 1283 passed; ruff + mypy clean (AUTHGATE_BACKEND=python). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Closes the two structural gaps between "proven" and "running": 1. Runtime authorization now runs on the verified engine. RustBackedVerifier serializes the Python registry+action to the kernel JSON wire and calls authgate_kernel.verify_json -> crate::engine::verify (the Kani/Lean-verified engine), returning an ed25519-signed verdict. JSON is the only boundary, so no Rust pyclass objects enter Python and the dual-Entity TypeError that forced AUTHGATE_BACKEND=python cannot occur. Epoch revocation (absent from the wire) is preserved by serializing only currently-valid claims. build_runtime(backend="python"|"rust"|"auto"); default stays python so environments without the built extension are unaffected. 2. Tools run in a real sandbox, not an in-process check. SandboxPolicy executes each tool in an isolated subprocess with a wall-clock deadline + output cap (all platforms) and opt-in POSIX rlimits (cpu/memory/file-size; off by default to avoid interpreter-startup footguns). A hanging/runaway/crashing tool is killed and reaped -> clean denial. Verification: 1000-engineer red team is green on BOTH backends (0 escapes), including epoch-revocation and audit-tamper. Full suite 1297 passed, 1 skipped (POSIX-rlimit test on Windows); ruff + mypy clean. Benchmarks: native Rust verify ~0.8us (10-claim); Python verifier ~38us/decision; verified engine via JSON+ed25519 ~544us/decision (~14x, sub-ms, JSON+signing dominated). See redteam/bench_verify.py. The PyO3 extension is built separately (cargo --lib); the runtime falls back to the pure-Python verifier wherever it is absent (CI included), keeping CI green. README updated with the runtime, backends, sandbox, red team, and benchmarks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… margin) Adds extension-free tests for the JSON-wire marshalling (_entity/_resource/ _claim/_action_wire, _live_claims epoch/revocation filter) and in-process tests for the sandbox child runner. These run without the compiled extension, so they lift CI coverage of rust_backend.py and _sandbox_runner.py and keep the --cov-fail-under=85 gate comfortably satisfied (86.5%). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…o integration # Conflicts: # README.md

On POSIX, 'C:\Windows\win.ini' is not a drive-absolute path but a nonexistent in-sandbox filename, so file_read raises FileNotFoundError, not PermissionError — both mean "refused, no real file read". Accept either so the test passes on Linux CI as well as Windows. (Was the only red check on PR #3.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make the claims match what is actually true: - Verified-Rust backend is OPT-IN and not run by CI (default + CI = Python); drop "the running and proven code are finally the same code". - Sandbox is OPT-IN (off by default) and is NOT a syscall/network jail; reword "Tools run in a real sandbox" -> "Tools can run in a real sandbox (opt-in)". - Red team: add explicit scope caveat — self-authored attacks on 3 MVP tools, evidence the known vectors are closed, not a proof of security or a substitute for external review. - Benchmarks: mark as single-machine, order-of-magnitude, not a spec. - Engineering Gaps: change "Done" rows to honest "opt-in / not run by CI". - Badge scoped to "runtime red team — 1000 cases" (not next to formal badges). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexanderthenth and others added 8 commits June 1, 2026 15:13

Merge branch 'main' of https://github.com/Aliipou/authgate-kernel int…

73f1b3c

…o integration # Conflicts: # README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Runtime: decide on the verified Rust TCB + real process sandbox#3

Runtime: decide on the verified Rust TCB + real process sandbox#3
Aliipou wants to merge 8 commits into
mainfrom
integration

Aliipou commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Aliipou commented Jun 10, 2026

1. Runtime decides on the verified engine

2. Real tool sandbox

Verification

Benchmarks

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants