Skip to content

[Concurrency] Data races & memory-safety bugs in the generated AtomFS code: file-scope / global state is accessed outside the per-inode locks #1

@LittleS321

Description

@LittleS321

Summary

While studying the generated AtomFS filesystem produced by SpecFS (the eval/<variant>/ C code emitted by DeepSeek R1), we ran a concurrency audit and found 7 distinct concurrency / memory-safety defects across the pre_alloc, extent, rbtree, and delay_alloc variants.

Note on scope: these are defects in the LLM-generated FS implementation, not in the hand-written FUSE skeleton, the MCS lock, or the Python orchestration. The file headers (Generated by LLM (DeepSeek R1) with the SpecFS framework) confirm the affected files are generated.


Findings at a glance

ID Type Severity Variant(s) Primary location
A1 atomicity → OOB write High delay_alloc delalloc.c:95-101 (bwrite)
F7 OOB write (mem-safety) High all variants lowlevel_file.c:100-102 (+ increment sites)
F1 data race High pre_alloc, rbtree (baseline+opt) lowlevel_file.c:100-102 decl, :275-277 write
F5 data race High extent (baseline+opt) lowlevel_file.c:102-105 (+8 increment sites)
F6 data race High rbtree (baseline+opt) mballoc.c:5 decl, :12/42/81
R1/R2 data race (benign) Low delay_alloc read atomfs-fuse.c:45-46 ↔ write delalloc.c:50/74
D1 lock-lifecycle: destroy-held-lock + leak Low–Med pre_alloc, delay_alloc util.c dispose_inode + del/rename callers

(Line numbers are from the baseline/optimization source as indicated; the same pattern recurs in the sibling variants noted.)


Group A — Memory safety: A1, F7

A1 — bwrite flush window: check-then-act OOB on the global alloc buffer [High]

  • Where: eval/delay_alloc/optimization/delalloc.c, bwrite:
 95     if (alloc_buffer.size == ALLOC_BUFFER_BLOCK_NUM) {   // capacity == 128 (common.h:41)
 96         buffer_unlock(&alloc_buffer);                    // <-- lock released
 97         bflush();                                        // resets alloc_buffer.size = 0
 98         buffer_lock(&alloc_buffer);                      // <-- re-acquired, size NOT re-checked
 99     }
100
101     struct record *new_rec = &alloc_buffer.rec[alloc_buffer.size];   // rec[128] if refilled in the gap
...
104     new_rec->buf_offset = alloc_buffer.size * PAGE_SIZE;             // buf + 128*PAGE_SIZE

with rec[ALLOC_BUFFER_BLOCK_NUM] and buf[ALLOC_BUFFER_BLOCK_NUM * PAGE_SIZE] (i.e. [128]) declared at common.h:56-57.

  • Why it's an atomicity violation, not just a missing lock: buffer_lock is a non-recursive per-buffer MCS mutex, and bflush() itself takes buffer_lock — so bwrite is forced to drop the lock around bflush() to avoid self-deadlock. In that gap, another thread can refill the buffer back to 128. After re-locking, bwrite does not re-check size, so rec[alloc_buffer.size] and alloc_buffer.buf + size*PAGE_SIZE index at [128] → out-of-bounds write into fixed-size global arrays. alloc_buffer is a single global (all files share it), so two threads writing different files (each under its own inode lock) contend here — the same "per-inode lock doesn't cover the global" structure as the global-counter races in Group B (F1/F5/F6).
  • Deeper root cause (spec-level): the SySSpec specification for bwrite requires both "lock the mutex before proceeding and unlock after the operation is done" (the whole find-or-append must be one atomic critical section) and "AA-deadlock avoidance: never acquire a lock already held by yourself." These two guarantees are unsatisfiable at the bwrite → bflush call: the model resolved it by dropping the lock mid-operation (satisfying no-self-lock) at the cost of tearing the atomic region (violating atomicity) — and then forgot the post-gap re-check. This is arguably a specification blind spot, not just a coding slip.
  • Impact: out-of-bounds write to fixed global arrays (the only concurrency-induced memory corruption in this set — F7 is the same OOB class but sequential/cumulative).
  • Fix: don't release buffer_lock across the flush (use a lock-free / already-locked bflush variant).

F7 — access_count is an unbounded index into a fixed-size array → OOB write [High]

  • Where: eval/*/lowlevel_file.caccess_data / access_extent are fixed [10000000] (:100-101); access_count is unsigned, initialized = 0, incremented per page op (:249, :285, …) and never reset.
  • Why: on a long-lived mount, after ~10^7 page operations access_data[access_count] = ... writes past the end of the array (the array bound is hit long before the unsigned wraps at ~4.3×10^9). Independent of the race — a deterministic latent OOB.
  • Impact: heap/global corruption → crash or silent corruption.
  • Fix: bound/reset the index, or remove the profiling arrays from the production path (also resolves F1's index corruption).

Group B — Unsynchronized global accounting state (data races): F1, F5, F6, R1/R2

These all have the same shape: a file-scope / global counter (and, for F1, parallel arrays) is incremented in the read/write I/O path with no lock at all, while the only synchronization in the system is the per-inode MCS lock — which does not serialize threads operating on different inodes.

F1 — access_count / access_data[] / access_extent[] raced [High]

  • Where: eval/pre_alloc/baseline/lowlevel_file.c
    • 100 void* access_data[10000000];
    • 101 void* access_extent[10000000];
    • 102 unsigned access_count = 0;
    • Read-modify-written in both file_write (≈:275-277) and file_read (≈:236-249). Zero lock() calls in the whole file.
    • Same globals present in rbtree/{baseline,optimization} and pre_alloc/optimization.
  • Why it races: atomfs_write / atomfs_read on two different files each take their own per-inode MCS lock, then call file_write / file_read, which do access_count++ and write access_data[access_count] on the shared globals. The per-inode locks do not mutually exclude here.
  • Impact: non-atomic RMW on access_count (lost updates) + a racing index into access_data[] / access_extent[] (conflicting / torn writes). The corrupted index also feeds F7.

F5 — extent counters data_*_count / metadata_*_count unlocked [High]

  • Where: eval/extent/baseline/lowlevel_file.c:102-105 declares metadata_read_count, metadata_write_count, data_read_count, data_write_count; incremented at 8 sites in file_read/file_write. Zero locks in the file.
  • Why / impact: identical to F1 — concurrent extent I/O updates four globals with no synchronization. (This variant was the one most easily missed: a counter audit that only greps access_count will skip it.)

F6 — rbtree allocator pool_access_count unlocked [High]

  • Where: eval/rbtree/baseline/mballoc.c:5 declares unsigned pool_access_count = 0;, incremented at :12 / :42 / :81. Zero locks in mballoc.c.
  • Why: atomfs_write → inode_write → file_write → file_allocate → mballoc → pool_access_count++, all under only the leaf inode lock. Concurrent allocation for different files races.
  • Reachability note: slightly narrower than F1 — needs ≥2 threads allocating new pages (create / extend / write into a hole); pure overwrite of already-allocated pages skips mballoc.

R1/R2 — background delayed_function reads data_read_count / data_write_count unlocked [Low, benign]

  • Where: writer side delalloc.c:50 (data_write_count) / :74,:109 (data_read_count) holds buffer_lock; reader side atomfs-fuse.c:45-46 (delayed_function) reads both without taking buffer_lock.
  • Why it races: buffer_lock only serializes writer-vs-writer. The background thread (pthread_create(..., delayed_function, ...) at atomfs-fuse.c:599, runs in the production main, while(true){ sleep(5); print counters; }) never takes that lock, so nothing orders "background read" against "locked writer increment" → a genuine read/write race.
  • Impact: benign in practice — counters are int, aligned int reads are atomic on the target, so the stats file is merely a few counts stale; no tearing, no lost updates. Rated Low.
  • Also (real, separate bug): a type mismatch — atomfs-fuse.c:35 declares extern unsigned data_read_count, data_write_count; but delalloc.c:7-8 defines them as int. That's an ODR / UB inconsistency (usually benign on common ABIs, but worth fixing).

Suggested fix for Group B (F1/F5/F6/R1-R2): make the counters _Atomic (atomic_fetch_add) or, since they are eval/profiling instrumentation, drop them from the production path. For R1/R2 also take buffer_lock on the read side (or make the counters atomic) and reconcile the unsigned vs int declaration.


Group C — Lock lifecycle: D1

D1 — dispose_inode destroys a still-held lock and leaks its MCS queue node [Low–Med]

  • Where:
    • pre_alloc: util.c:156-157 (mcs_mutex_destroy(inum->impl); free(inum); with no preceding unlock); callers atomfs.c rename dispose_inode(dstinode) (:374, dstinode still locked) and atomfs_del (:206).
    • delay_alloc: util.c dispose_inodemain_dir.c:47/52 (atomfs_del: unlock(cur) then dispose_inode(inum)) ← directory_operations.c:118/125 (check_del returns with inum still locked).
  • Why: every unlink / rmdir(empty) / rename-overwrite disposes the target inode while its MCS lock is still held — destroying a held lock and leaking the inum->hd queue node that lock() malloc'd (mcs_mutex_destroy only frees the lock struct, never the held node).
  • Impact: deterministic memory leak — one inum->hd node leaked per successful delete/rename-overwrite (triggers single-threaded). The "destroy a held lock" is currently benign only because the parent-directory lock serializes deletion: a concurrent waiter can't reach the inode after the directory entry is unlinked, so no use-after-free materializes. It remains a latent correctness hazard for any future refactor (e.g. hard-link support).
  • Fix: unlock(inum) before dispose_inode in atomfs_del / atomfs_rename (and free inum->hd in dispose as defense-in-depth).

Suggested fixes (summary)

  1. A1: keep buffer_lock held across the flush.
  2. F7: bound/reset the index or drop the fixed [10000000] arrays.
  3. F1 / F5 / F6 / R1-R2: make the accounting counters _Atomic, or remove the profiling instrumentation from the production path; for R1/R2 also lock the reader / reconcile the unsigned vs int declaration.
  4. D1: unlock(inum) before dispose_inode.

Environment / reproduction

  • Affected source: eval/{pre_alloc,extent,rbtree}/{baseline,optimization}/ + eval/delay_alloc/optimization/ (generated FS implementation).
  • All line numbers re-verified against the current eval/ source at audit time.

Found by @LittleS321, @hemnd and Claude Code via formal concurrency analysis.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions