Symptom
Windows CI intermittently fails test/cli/daemon.test.ts > bitsocial daemon can use a kubo node started by another program with:
Daemon failed to start: EPERM: operation not permitted, unlink
'C:\Users\runneradmin\AppData\Local\bitsocial\Data\.daemon_states\9244-daemon.state'
The spawned daemon exits 1 and the test throws spawnAsync process exited with code '1'. Only windows-latest is affected; ubuntu and macos pass.
Root cause
On daemon startup, daemon.ts calls await pruneStaleStates() (unguarded). That walks getAliveDaemonStates() and, for each dead PID, calls deleteDaemonState() → fs.unlink(). The catch there only swallows ENOENT, so any other error propagates and aborts startup.
On Windows, unlinking a file that another process still has open (or that is in "delete-pending" state) fails with EPERM/EACCES/EBUSY, unlike POSIX, where unlink of an open file succeeds. Concurrent daemons share the global .daemon_states dir and race to prune the same dead-PID file; the loser of that race used to crash.
Fix
Pruning a stale file is best-effort cleanup and must never be fatal. Make deleteDaemonState tolerate EPERM/EACCES/EBUSY in addition to ENOENT (another daemon reclaims the file on its next prune). Add a regression unit test that mocks fs.unlink to throw EPERM and asserts deleteDaemonState resolves.
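A minimal sketch of the tolerant delete described above, assuming deleteDaemonState takes the path of the state file; the real signature in this repo may differ. The helper isIgnorableUnlinkError and the constant IGNORABLE_UNLINK_CODES are hypothetical names introduced only for illustration.

```typescript
import { unlink } from "node:fs/promises";

// Error codes treated as non-fatal for best-effort cleanup.
// ENOENT: file already gone. EPERM/EACCES/EBUSY: on Windows, the file is
// still open elsewhere or is pending deletion by another process.
const IGNORABLE_UNLINK_CODES = new Set(["ENOENT", "EPERM", "EACCES", "EBUSY"]);

// Hypothetical helper: true when an unlink error can be safely ignored.
function isIgnorableUnlinkError(err: unknown): boolean {
    const code = (err as { code?: unknown } | null)?.code;
    return typeof code === "string" && IGNORABLE_UNLINK_CODES.has(code);
}

// Sketch only: the parameter (a full path to the state file) is an assumption.
async function deleteDaemonState(statePath: string): Promise<void> {
    try {
        await unlink(statePath);
    } catch (err) {
        // Anything outside the ignorable set is still surfaced to the caller.
        if (!isIgnorableUnlinkError(err)) throw err;
    }
}
```

Keeping the code check in a small pure function makes the policy testable without touching the filesystem; unexpected errors still propagate, so the change narrows only the startup failure described above.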