feat: build Linux releases against musl with mimalloc allocator #112

Open
jacderida wants to merge 6 commits into WithAutonomi:rc-2026.5.4 from jacderida:feat/musl-linux-builds

@jacderida
Collaborator

Summary

  • Switch both Linux release targets from glibc to musl (x86_64-unknown-linux-musl,
    aarch64-unknown-linux-musl, built via cross). Published binaries are now musl-static, so they
    run on any Linux distribution — including Alpine and other musl-based systems — while continuing
    to run on glibc hosts (a static binary has no dynamic-linker dependency). Asset filenames are
    unchanged (ant-node-cli-linux-{arm64,x64}.tar.gz), so the existing auto-upgrade asset
    matcher and ant node add keep working with no client-side change.
  • Use mimalloc as the global allocator for ant-node and ant-devnet. musl's default
    allocator is significantly slower than glibc's under the concurrent allocation churn of a
    DHT/relay node; mimalloc neutralises that, and tends to beat glibc's allocator too.
  • Carries the upgrade-cache hardening from #100 (fix(upgrade): re-verify ML-DSA signature on every
    cache hit): per-cache-hit ML-DSA re-verification, FIFO/pipe rejection at the cache entry path, and a
    narrowed copy TOCTOU window. These were validated together with the musl swap on the same testnets
    (see Evidence below), so #100 can be closed in favour of this PR.
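
For readers unfamiliar with the mechanism behind the second bullet: Rust selects its process-wide allocator through the `#[global_allocator]` attribute. With the mimalloc crate the declaration is typically `static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;`. The sketch below is std-only and is not the code in this PR: `CountingAlloc`, `ALLOC_CALLS` and `allocations_so_far` are hypothetical names, and the wrapper simply forwards to the system allocator so the attribute can be shown without an external dependency.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in allocator: forwards to the system allocator and
/// counts allocation calls. With the mimalloc crate, the static below would
/// instead be `static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;`.
pub struct CountingAlloc;

/// Number of `alloc` calls observed since process start.
pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        // Delegate the real work to the platform allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the binary (Vec, String, Box, ...) now goes
// through `CountingAlloc`. Only one such static may exist per binary.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Convenience accessor for the counter.
pub fn allocations_so_far() -> usize {
    ALLOC_CALLS.load(Ordering::Relaxed)
}
```

Because the attribute applies per binary, each binary crate needs its own declaration, which matches the commit note that the change was applied to both ant-node and ant-devnet.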

Why

On musl distros (e.g. Alpine), ant node start previously failed with a misleading
No such file or directory (os error 2): the downloaded glibc ant-node requires a glibc dynamic
linker (/lib/ld-linux-*.so) that doesn't exist on musl, so execve returns ENOENT for the
missing loader rather than the binary. Shipping musl-static Linux binaries fixes this for all
Linux distributions at once, with no separate "musl variant" to maintain.
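
To make the failure mode concrete: a dynamically linked ELF binary names its loader in a `PT_INTERP` program header, and the kernel must open that path during `execve`. A fully static binary has no such header. The following is a minimal sketch, assuming a 64-bit little-endian ELF image already read into memory; `elf_interpreter` and `read_bytes` are illustrative names, not part of ant-node.

```rust
/// Copies `N` bytes starting at `off`, or returns `None` if out of range.
fn read_bytes<const N: usize>(image: &[u8], off: usize) -> Option<[u8; N]> {
    let mut out = [0u8; N];
    out.copy_from_slice(image.get(off..off.checked_add(N)?)?);
    Some(out)
}

/// Returns the interpreter (dynamic loader) path recorded in a 64-bit
/// little-endian ELF image. Returns `None` when there is no PT_INTERP
/// segment (a fully static binary) or the bytes are not ELF64-LE.
pub fn elf_interpreter(image: &[u8]) -> Option<String> {
    const PT_INTERP: u32 = 3;
    // Magic bytes, then ELFCLASS64 (2) and ELFDATA2LSB (1).
    if !image.starts_with(&[0x7f, b'E', b'L', b'F', 2, 1]) {
        return None;
    }
    // ELF64 header fields: e_phoff @0x20, e_phentsize @0x36, e_phnum @0x38.
    let phoff = u64::from_le_bytes(read_bytes::<8>(image, 0x20)?) as usize;
    let phentsize = u16::from_le_bytes(read_bytes::<2>(image, 0x36)?) as usize;
    let phnum = u16::from_le_bytes(read_bytes::<2>(image, 0x38)?) as usize;

    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        // Program header fields: p_type @0, p_offset @8, p_filesz @32.
        if u32::from_le_bytes(read_bytes::<4>(image, ph)?) != PT_INTERP {
            continue;
        }
        let off = u64::from_le_bytes(read_bytes::<8>(image, ph.checked_add(8)?)?) as usize;
        let len = u64::from_le_bytes(read_bytes::<8>(image, ph.checked_add(32)?)?) as usize;
        let raw = image.get(off..off.checked_add(len)?)?;
        // The segment holds a NUL-terminated path.
        let path = raw.split(|&b| b == 0).next()?;
        return String::from_utf8(path.to_vec()).ok();
    }
    None
}
```

Run against the old glibc build, a check like this would be expected to report a path such as `/lib64/ld-linux-x86-64.so.2`; against a musl-static build it should return `None`, which is why the same file can start on both Alpine and glibc hosts.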

Evidence

DEV-03 — auto-upgrade glibc → musl (PASS)

A staged-rollout test confirming existing glibc nodes upgrade cleanly to the musl build.

DEV-03 auto-upgrade musl-swap test — PASS

Setup verified before triggering: pre-upgrade binary was glibc-dynamic (interpreter
/lib64/ld-linux-x86-64.so.2), ant-node 0.11.3, identical across the sampled VMs — premise held. Channel was
correctly set to Beta (the gotcha from the DEV-02 run), and the v0.11.14-rc.1 pre-release was already
published, so the staged rollout fired automatically. 157 services (10 node VMs × 15 + 7 bootstraps) rolled
over the 2-hour window with zero errors.

┌───────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│ Binary downloaded once    │ ✅ Downloading ant-node binary ×1 on every host (422 MB build)              │
│ per host                  │                                                                             │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ GitHub release info       │ ✅ 1 detection fetch/host; 36 cache-hits vs 2 fetches confirms lock+cache   │
│ fetched once per host     │ dedup (2nd fetch = TTL refresh or post-upgrade routine re-check, never      │
│                           │ per-service)                                                                │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ No upgrade errors         │ ✅ 0 ERROR/WARN, 0 crashes, 0 failed units                                  │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ Upgraded binary is        │ ✅ 157/157 static-pie linked / ldd: statically linked; 0 glibc              │
│ musl-static everywhere    │                                                                             │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ Cache-hit signature       │ ✅ Reused signature-verified cached archive ×14 per node VM                 │
│ re-verification ran       │                                                                             │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ Peer ID retained          │ ✅ 157/157 identical pre/post                                               │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ Restart times evenly      │ ✅ scheduled & actual both uniform, 12 bins of 10–18, not 2–4 bursts        │
│ distributed               │                                                                             │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ Real process restart each │ ✅ pre-exit msg + NRestarts≥1 + new PID + journald stop/start, all 157      │
│  node                     │                                                                             │
├───────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ No (deleted) inode (hard  │ ✅ 0/157                                                                    │
│ failure)                  │                                                                             │
└───────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘

The key result for the (deleted) concern: all 157 services share /usr/local/bin/ant-node. The coordination
logged exactly Replacing binary ×1 per host, with every other service hitting Binary already upgraded …
skipping replacement after reusing the signature-verified cached archive — so the file is replaced once per
host, no process runs from a stale inode, and /proc/<pid>/exe is clean on every service.
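
The stale-inode check referred to here can be reproduced with a few lines. On Linux, the `/proc/<pid>/exe` link target gains a ` (deleted)` suffix once the file the process was started from has been unlinked or replaced. This is a sketch of such a check, not the tooling used for the DEV-03 run; both function names are hypothetical.

```rust
use std::fs;
use std::io;

/// True when a `/proc/<pid>/exe` link target carries the " (deleted)"
/// suffix that Linux appends after the original file has been unlinked.
pub fn exe_link_is_stale(link_target: &str) -> bool {
    link_target.ends_with(" (deleted)")
}

/// Reads `/proc/<pid>/exe` and reports whether the process is still
/// running from an inode that no longer has a name on disk.
/// Linux-only: other platforms return an error from `read_link`.
pub fn process_runs_stale_binary(pid: u32) -> io::Result<bool> {
    let target = fs::read_link(format!("/proc/{pid}/exe"))?;
    Ok(exe_link_is_stale(&target.to_string_lossy()))
}
```

Calling `process_runs_stale_binary` for each service PID and counting the `true` results would give the same kind of "0/157" figure reported in the table.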

DEV-01 vs DEV-02 — 12 h musl + mimalloc vs glibc reference A/B soak (PASS)

Two identical 450-node fleets under identical load; only the binary differs.

DEV-01 (musl + mimalloc) vs DEV-02 (glibc reference) — 12h A/B Report

Setup: two identical 450-node fleets (30 VMs × 15 nodes, DO+Vultr, 9 NAT-simulated), identical client load,
both ant-node 0.11.4. Only the binary differs — DEV-01 musl-static + mimalloc (BuildID 884f505), DEV-02
official glibc release (4e07325). Soak window ~12.2 h (17:05 → 05:17 UTC).

Crash / stability — PASS, both arms

- 906 services (453/arm) active/running, NRestarts=0. Zero crashes, restarts, or failed units on either arm.
- Zero OOM-killer events, zero SIGSEGV/SIGABRT/panic/stack-overflow lines across all 66 VMs.
- musl-specific risks (128 KiB thread stack, stripped resolver) did not materialise — 0 DNS errors fleet-wide.

Memory — the headline

Total RSS tracks within 1–3% between arms the entire window:

┌───────────────────┬──────┬──────┬────────┐
│                   │ t+1h │ t+6h │ t+12h  │
├───────────────────┼──────┼──────┼────────┤
│ DEV-01 musl mean  │ 230  │ 567  │ 979 MB │
├───────────────────┼──────┼──────┼────────┤
│ DEV-02 glibc mean │ 202  │ 565  │ 982 MB │
└───────────────────┴──────┴──────┴────────┘

Both grow ~5×, but that growth is memory-mapped LMDB chunk store, not heap. Per-node breakdown at 12h:

┌─────────────────┬────────────────────────────────────┬──────────────────────────────────┐
│       Arm       │ RssAnon (heap, allocator-governed) │ RssFile (LMDB mmap, reclaimable) │
├─────────────────┼────────────────────────────────────┼──────────────────────────────────┤
│ musl + mimalloc │ 250 MB/node                        │ 726 MB/node                      │
├─────────────────┼────────────────────────────────────┼──────────────────────────────────┤
│ glibc + system  │ 231 MB/node                        │ 750 MB/node                      │
└─────────────────┴────────────────────────────────────┴──────────────────────────────────┘

chunks.mdb is 4.6 GB (mmap'd); free shows ~10.8 GB available per VM — no memory pressure. The RSS climb is
chunk accumulation under continuous upload load (file-backed page cache, reclaimable), allocator-independent,
and identical on both arms.
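
The RssAnon / RssFile split above comes from `/proc/<pid>/status`, which reports both values in kB. A minimal parsing sketch follows; `RssBreakdown` and `parse_rss_breakdown` are hypothetical names, and this is not the collection script used for the soak.

```rust
/// Anonymous and file-backed resident memory, in kB, as reported by the
/// `RssAnon:` and `RssFile:` lines of `/proc/<pid>/status` on Linux.
#[derive(Debug, PartialEq, Eq)]
pub struct RssBreakdown {
    pub anon_kb: u64,
    pub file_kb: u64,
}

/// Extracts both fields from the text of a status file. Returns `None`
/// if either line is missing or its value is not an integer.
pub fn parse_rss_breakdown(status: &str) -> Option<RssBreakdown> {
    let field = |name: &str| -> Option<u64> {
        status
            .lines()
            .find_map(|line| line.strip_prefix(name))? // e.g. "\t  256000 kB"
            .split_whitespace()
            .next()? // the numeric part, before the unit
            .parse()
            .ok()
    };
    Some(RssBreakdown {
        anon_kb: field("RssAnon:")?,
        file_kb: field("RssFile:")?,
    })
}
```

Feeding it `std::fs::read_to_string("/proc/self/status")` on a Linux host yields the two numbers for the current process; averaging `anon_kb` across nodes gives the heap figure that the allocator actually governs.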

- Allocator verdict: mimalloc's heap (RssAnon) is ~250 vs glibc's ~231 MB/node — a modest +8% (~19 MB/node),
stable, no leak. Consistent with mimalloc's known slight RSS-for-speed tradeoff.
- Worst-node total-RSS tail ran a bit higher on musl (peak ~3.8 vs ~2.8 GB), but that's dominated by random
chunk-distribution variance in the mmap component, not the allocator.

CPU — neutral

Mean per-node CPU: musl 25.7% vs glibc 25.1%.

P2P health — PASS, both arms

Uploads completing ok (10/25/500 MB) on both; 1300–3200 accepted connections/node; 0 DNS errors; negligible
ERROR rate (0–3 lines/node/12h). Comparable across arms.

Overall verdict: PASS — musl + mimalloc is a safe swap

Over 12 h under identical load, the musl + mimalloc build is stability-, memory-, and CPU-neutral versus the
glibc reference release: zero crashes, ~8% (~19 MB/node) more anonymous heap, identical total-RSS trajectory,
identical CPU. No leak, no musl regression. The large RSS growth seen on both arms is the LMDB chunk store and
is not allocator-related (it affects the current reference release equally).

Test plan

  • Auto-upgrade glibc → musl on a 157-service testnet (DEV-03): single download per host, musl-static
    binary on every node, peer IDs retained, even rollout, real process restarts, no stale-inode execution.
  • 12 h A/B soak musl+mimalloc vs glibc reference (DEV-01/DEV-02): no crashes, memory/CPU neutral, no
    musl-specific failures (DNS, thread-stack), no allocator leak.
  • Confirm release workflow produces the expected musl assets (filenames unchanged) on the next tagged
    build.

Closes #100

grumbach and others added 6 commits May 23, 2026 16:14
The shared upgrade binary cache stored the extracted binary and, on a
cache hit, returned it after only a SHA-256 check against a sibling
.meta.json. A SHA-256 check against attacker-writable metadata is not a security control: anyone able to write to
the shared cache directory (a co-located process, a shared container
volume, a low-privilege foothold on the host) could drop a malicious
binary plus a forged matching metadata hash, and the next ant-node
instance to upgrade would execute it with no signature verification at
all — persistent RCE on every co-located node. The ML-DSA-65 signature
covers the archive and was only checked on the initial download, never
on a cache hit.

Changes:

- Cache the signed *archive + detached signature* instead of the
  extracted binary. `BinaryCache::get_verified_archive` re-runs ML-DSA-65
  verification on every cache hit; the binary is always extracted fresh
  from the just-verified archive. A tampered archive, tampered or
  missing signature, or forged metadata fails verification against the
  pinned release public key, so a poisoned cache entry is rejected and a
  fresh verified download runs.

- Stage cached files into the caller's process-private temp directory
  and verify that copy, then extract from the same private path. Closes
  the verify-vs-extract TOCTOU on the shared cache files: an attacker
  cannot swap the bytes between when the verifier reads them and when
  the extractor reads them.

- Size policy before any copy or read. `fs::symlink_metadata` +
  `file_type().is_file()` rejects symlinks / FIFOs / devices outright;
  archive size is bounded by `MAX_ARCHIVE_SIZE_BYTES` and the signature
  must be exactly `SIGNATURE_SIZE` bytes. Otherwise an attacker could
  plant `cached.archive -> /dev/zero` (stats as 0 bytes) and force
  unbounded disk fill in the staging dir or OOM in `signature::verify`.

- Cache only after successful extraction. A validly-signed-but-malformed
  release no longer becomes a shared cache poison pill that every later
  node downloads, fails to extract, and re-downloads.

- `cache_dir.rs` restricts the shared upgrade cache directory to 0700
  on Unix as defence in depth; the ML-DSA gate is the primary control.

- `store_archive` mirrors the same size / file-type / signature checks
  before persisting, so a poisoned entry cannot be created through the
  supported path either.
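
The stage-then-verify ordering described above can be sketched in a few lines. This is an illustration of the control flow only, under stated assumptions: `stage_and_verify` is a hypothetical name, the `verify` closure stands in for the ML-DSA-65 check against the pinned key, and the file-type and size policy from the list is omitted for brevity.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies `shared_archive` into `private_dir`, runs `verify` on the
/// private copy, and returns the private path only if verification passes.
/// The caller then extracts from the returned path, never the shared one.
pub fn stage_and_verify<F>(
    shared_archive: &Path,
    private_dir: &Path,
    verify: F,
) -> io::Result<PathBuf>
where
    F: Fn(&[u8]) -> bool,
{
    let staged = private_dir.join("staged.archive");
    // After this copy, later writes to the shared file cannot affect us.
    fs::copy(shared_archive, &staged)?;

    // Verify exactly the bytes that will later be extracted.
    let bytes = fs::read(&staged)?;
    if !verify(&bytes) {
        fs::remove_file(&staged)?;
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "signature check failed on staged archive",
        ));
    }
    Ok(staged)
}
```

The design point is that the verifier and the extractor read the same private bytes, so a swap of the shared file after the copy has no effect, and a swap before the copy fails verification.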

Tests in `src/upgrade/binary_cache.rs` cover the tamper path
(SHA-256-forged swap on disk rejected by the signature re-check), the
post-hit shared-file swap (private copy unaffected), the symlink-to-
`/dev/zero` bypass attempt, oversize archive / wrong-sized signature
rejection, and round-trip storage. Production verifies against the
pinned `RELEASE_SIGNING_KEY`; tests use a `#[cfg(test)]`-only
constructor that injects a generated key without weakening the
production trust anchor.

Residual: cache entries are not bound to a specific release version
(the ML-DSA signing context is constant across versions), so a
same-UID attacker who already has any past validly-signed release can
plant it under a newer version's cache key and force a downgrade to
that old signed binary. Not RCE (still legitimately-signed bytes) and
a same-UID attacker has easier paths anyway; closing it cleanly
requires coordinated changes in the release-signing pipeline,
ant-keygen, ant-node, and ant-client, and is tracked in the
binary_cache module docs.

Review feedback on the upgrade binary cache:

- `meta.json` was read with an unbounded `fs::read_to_string`. An
  attacker with write access to the shared cache directory could plant
  the metadata sidecar as a symlink to `/dev/zero` or as a huge file
  and stall the read into a hang/OOM before the archive/sig hardening
  ran. The metadata path now goes through the same
  open-once-and-validate gate as the archive: regular-file check on
  the opened handle, capped at `MAX_META_BYTES` (4 KiB).

- Archive + signature staging previously did `symlink_metadata` (path)
  followed by `fs::copy` (path), leaving a small TOCTOU window where
  an attacker could race-swap the path to a symlink/FIFO/device or an
  oversized file between the check and the copy. Both files are now
  opened once via `open_regular_capped`, validated on the resulting
  `File` handle (size + file-type), and copied into the private
  staging dir from the open handle (wrapped in `Read::take(len)` as
  belt-and-braces against a post-open extension). All subsequent
  operations on those files use the staged private bytes, never the
  shared path.

- Comment fix: the prior comment claimed `sha256_file` loads the
  archive into memory in full. It actually streams in 8 KiB chunks;
  the memory-pressure concern is `signature::verify_from_file*`
  (FIPS-204 requires the message as a slice). Wording updated.

- Stale error message "Failed to serialize binary cache meta" updated
  to "Failed to serialize cached archive metadata" — the cache now
  stores archive metadata, not extracted-binary metadata.
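
The open-once-and-validate pattern from this commit can be illustrated with std alone. The sketch below is not the PR's `open_regular_capped`; `open_regular_capped_sketch` and `copy_from_handle` are hypothetical names chosen to avoid implying otherwise. It shows the two ideas from the list: validate type and size on the opened handle rather than the path, then copy at most the validated length.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Opens `path` once and validates the resulting handle: it must be a
/// regular file no larger than `max_len`. Returns the handle and its size.
pub fn open_regular_capped_sketch(path: &Path, max_len: u64) -> io::Result<(File, u64)> {
    let file = File::open(path)?;
    // Metadata of the open handle, so a later path swap cannot change it.
    let meta = file.metadata()?;
    if !meta.is_file() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a regular file"));
    }
    if meta.len() > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "file exceeds size cap"));
    }
    Ok((file, meta.len()))
}

/// Copies at most `len` bytes from the already-validated handle into `dst`.
/// `Read::take` bounds the copy even if the file grows after the check.
pub fn copy_from_handle<W: Write>(file: File, len: u64, dst: &mut W) -> io::Result<u64> {
    io::copy(&mut file.take(len), dst)
}
```

A caller would pass a cap such as the 4 KiB metadata limit mentioned above and write into a file inside its private staging directory.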

Two new tests:
  test_oversized_meta_is_rejected
  test_meta_symlink_to_special_file_is_rejected  (Unix-only)

488 lib tests pass; cfd clean.

Close a local DoS on auto-upgrade: a cache-dir attacker could plant a
FIFO at ant-node-<ver>.archive (or .sig / .meta.json) and open() for
reading would block indefinitely waiting for a writer, hanging the
upgrade. open_regular_capped previously only checked file type AFTER
the blocking open.

Two-layer defence in open_regular_capped:
- Pre-check via fs::metadata (follows symlinks), reject non-regular
  files before open(). A symlink-to-regular is still accepted as
  before; a symlink-to-FIFO/device/socket is rejected.
- On Unix, also open with O_NONBLOCK so a race between the pre-check
  and open() cannot reopen the FIFO window. Reads on regular files
  ignore O_NONBLOCK, so this is a no-op for the happy path. Platform-
  specific constant (0o4000 Linux, 0x0004 macOS/BSD); fallback to no
  flag on unknown unix-likes.

The existing post-open is_file() check on the file handle remains the
TOCTOU-safe final gate.
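
A rough std-only sketch of the layered open described in this commit follows. It is an assumption-laden illustration, not the PR's code: `open_regular_nonblocking` and `set_nonblocking` are hypothetical names, and the hard-coded flag value is only valid for Linux on x86_64/aarch64. The follow-up commit in this PR replaces such constants with `libc::O_NONBLOCK`, which is the safer choice in real code.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

// ASSUMPTION: 0o4000 is O_NONBLOCK only on Linux x86_64/aarch64. Real code
// should take the value from the libc crate instead of hard-coding it.
#[cfg(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64")))]
fn set_nonblocking(opts: &mut OpenOptions) {
    use std::os::unix::fs::OpenOptionsExt;
    const O_NONBLOCK: i32 = 0o4000;
    opts.custom_flags(O_NONBLOCK);
}

// Everywhere else this sketch falls back to a plain open.
#[cfg(not(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64"))))]
fn set_nonblocking(_opts: &mut OpenOptions) {}

/// Opens `path` for reading without risking a blocking open on a FIFO.
pub fn open_regular_nonblocking(path: &Path) -> io::Result<File> {
    let not_regular = || io::Error::new(io::ErrorKind::InvalidInput, "not a regular file");

    // Layer 1: reject non-regular targets before calling open().
    // `fs::metadata` follows symlinks, so symlink-to-regular still passes.
    if !fs::metadata(path)?.is_file() {
        return Err(not_regular());
    }

    // Layer 2: if the path is swapped for a FIFO after the pre-check,
    // a non-blocking open returns immediately instead of waiting for a writer.
    let mut opts = OpenOptions::new();
    opts.read(true);
    set_nonblocking(&mut opts);
    let file = opts.open(path)?;

    // Final gate: re-check the file type on the handle actually opened.
    if !file.metadata()?.is_file() {
        return Err(not_regular());
    }
    Ok(file)
}
```

Reads on the returned handle behave normally for regular files, since O_NONBLOCK has no effect on them.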

New regression test test_fifo_cached_archive_does_not_hang plants a
real FIFO via mkfifo and asserts return in well under 2s. 14/14
binary_cache tests pass; cfd clean.

Round 2 from adversarial review:

- Replace hand-coded O_NONBLOCK constants with libc::O_NONBLOCK. The
  previous 0o4000/0x0004 per-OS values were correct on
  x86_64/aarch64/arm but wrong on Linux/MIPS (0o200) and Linux/SPARC
  (0x4000), where 0o4000 maps to O_NOATIME. Using the libc constant
  always picks the right value for the target arch. Add libc as a
  Unix-only direct dependency (was already transitive).

- Test test_fifo_cached_archive_does_not_hang: replace the mkfifo
  shell-out with libc::mkfifo so a CI image that drops coreutils
  cannot silently skip this test. Bump the budget from 2s to 5s to
  absorb GitHub Actions macOS runner cold-start variance, since the
  failure mode "O_NONBLOCK wrong on this arch" and "CI runner slow"
  look identical from the assertion.

- Document the load-bearing invariant on get_verified_archive's
  private_dir: callers MUST supply a process-private 0o700 dir
  (apply.rs already does via tempfile + permissions). Without that the
  reopens-by-path in sha256_file/verify_archive would reopen a TOCTOU
  window.

- Add a cross-reference comment explaining the intentional asymmetry
  between store_archive (uses symlink_metadata, rejects symlinks) and
  open_regular_capped (uses fs::metadata, accepts symlink-to-regular)
  so a later editor doesn't unify them in the wrong direction.

14/14 binary_cache tests pass, 489/489 lib tests pass, cfd clean.

Switch both Linux release targets from glibc to musl so the published
binaries run on any Linux distribution, including Alpine and other
musl-based systems. Asset filenames are unchanged
(ant-node-cli-linux-{arm64,x64}.tar.gz) so existing auto-upgraders on
deployed nodes continue to find them.

x86_64-unknown-linux-musl now uses `cross` for the musl toolchain
(matching aarch64). musl-static binaries have no dynamic linker
dependency and execute on glibc hosts as well as musl hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

musl's default malloc is notably slower than glibc's under concurrent
allocation churn — the steady-state shape of a DHT-bridged P2P node.
Switching the global allocator to mimalloc neutralises that regression
for the musl Linux builds, and tends to outperform glibc's allocator as
well, so all builds benefit.

Applied to both ant-node and ant-devnet binaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator

@dirvine left a comment

Review: PR #112 — musl builds + mimalloc + upgrade cache hardening

This is a high-quality PR. The changes are well-scoped, thoroughly documented in the code, and validated with real-world testnet evidence (12h A/B soak on 2×450-node fleets + 157-service auto-upgrade test). I have no blocking concerns. Below is a structured review.

1. musl build switch (CI) ✅

Clean change. Both x86_64-unknown-linux-gnu and aarch64-unknown-linux-gnu switch to their -musl counterparts, with cross: true selecting the musl toolchain container. Asset filenames are unchanged so the auto-upgrade asset matcher works without client-side changes. The DEV-03 test confirmed glibc→musl auto-upgrade on 157 services with zero errors, peer IDs retained, and no stale-inode execution.

2. mimalloc global allocator ✅

Correct choice — musl's default malloc under concurrent allocation churn is the known weak point, and mimalloc neutralises it (tending to beat glibc too). The 12h A/B soak confirmed: zero crashes, ~8% more RssAnon (250 vs 231 MB/node, ~19 MB difference), identical total-RSS trajectory. No leak, no CPU regression (25.7% vs 25.1%).

Minor note (non-blocking): The Cargo.toml spec mimalloc = "0.1" is a semver range (^0.1), which accepts any 0.1.x release. The Cargo.lock pins 0.1.50, so deterministic builds are fine. If you want to guard against cargo update pulling in an untested 0.1.x release, consider adding a precise-pin comment in Cargo.toml referencing the soak-tested version.

3. Upgrade cache hardening — security rewrite ✅✅

This is the most important change. The previous design cached the extracted binary and only SHA-256-checked it on cache hits, so an attacker with write access to the cache directory could replace the binary, forge the matching SHA-256 in the metadata sidecar, and have it executed without any signature verification (persistent RCE). The new design:

  • Caches signed archives (not extracted binaries), with detached ML-DSA-65 signatures
  • Re-verifies the ML-DSA signature on every cache hit — the SHA-256 metadata is a corruption pre-check only
  • Closes the verify-vs-extract TOCTOU by copying the archive into a process-private 0700 staging directory before verification, so extraction reads exactly the bytes that were verified
  • Defends against FIFO/pipe/symlink attacks via open_regular_capped(): pre-check + O_NONBLOCK (via libc::O_NONBLOCK for correct per-arch constant) + post-open is_file() check
  • Sets 0700 on the shared cache directory (defence in depth)
  • Creates upgrade temp dirs with 0700 via tempfile::Builder::permissions

The test coverage for the cache is excellent: store-and-retrieve, tampered archive rejection, private-copy immunity to post-verify swap, missing signature, missing meta, oversize rejection, wrong-size signature, symlink rejection, FIFO non-hang, oversized meta rejection, meta-symlink-to-special rejection. The FIFO test correctly uses libc::mkfifo (not a shell-out), so CI can't silently skip it.

4. Residual concerns (documented, acceptable)

Cache version binding: The PR's own docs acknowledge that SIGNING_CONTEXT = "ant-node-release-v1" is constant across versions, so an attacker with cache-dir write could swap an old signed archive under a new version's key (forced downgrade/wrong-arch crash loop, not RCE). This is out of scope and is acceptable given:

  • The 0700 cache dir permissions shrink the attack surface significantly
  • An attacker with same-UID write can already replace the running binary directly

open_regular_capped TOCTOU: Between the fs::metadata() pre-check and opts.open(), a cache-dir writer could swap the file to something that passes the size/file-type check but contains different bytes. The security model handles this correctly: the open fd is used to stream into the private staging dir, and the private copy is what gets ML-DSA-verified. Even with a swapped-open file, the signature won't match the new bytes → cache hit rejected → fresh verified download.

Summary

Verdict: Approve. This is a well-structured, well-tested, properly documented PR. The musl+mimalloc swap is validated by production-grade soak testing, and the upgrade cache hardening closes a genuine RCE vulnerability (the old SHA-256-only cache-hit gate). The code quality is high, with thorough comments explaining the threat model, design choices, and residual risks throughout.
