fix: soft resync — recover a desynced world cache without losing agent state by atiweb · Pull Request #796 · mindcraft-bots/mindcraft

atiweb · 2026-06-17T21:13:49Z

Problem

When the world cache desyncs — lag-induced "ghost blocks", where the bot's cached chunks no longer match the server (long-standing upstream issue PrismarineJS/mineflayer#2600) — the pathfinder plans against blocks that aren't really there and the agent can wedge itself in ways no in-game recovery fixes. Today the only escape is a full process restart (cleanKill), which re-downloads chunks but also throws away everything the agent was doing: conversation history, memory, the self-prompt goal, the current task.

The same heavy hammer is used in a far more common place: after !smeltItem succeeds, the code does a full restart just to refresh mineflayer's inventory cache — cleanKill('Safely restarting to update inventory.').

Fix

Add Agent.softResync(): quit and reconnect the mineflayer bot in place. A fresh connection re-downloads chunks — the same thing a restart does to clear the desync — but the Node process, and therefore all agent state, survives.

!smeltItem now calls softResync('refresh inventory after smelting') instead of restarting. If a resync ever fails, it falls back to the existing cleanKill restart, so behavior is never worse than today.
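
For readers who want the shape of the idea without opening the diff, here is a minimal sketch of a reconnect-in-place with a restart fallback. It is not the code in this PR: `AgentSketch`, the injected `connect` factory, the `settleMs` option and the `history` array are hypothetical stand-ins, and only `softResync`, `cleanKill`, `_reconnecting` and the bot's `quit()` are names taken from the description or from mineflayer.

```javascript
// Illustrative sketch only; the real Agent class in the PR is structured differently.
class AgentSketch {
  constructor({ connect, cleanKill, settleMs = 0 }) {
    this.connect = connect;     // hypothetical factory: returns a Promise of a fresh bot
    this.cleanKill = cleanKill; // the existing full-restart path, injected for the sketch
    this.settleMs = settleMs;   // pause between quitting and reconnecting (assumed knob)
    this.bot = null;
    this.history = [];          // stands in for agent state that must survive a resync
    this._reconnecting = false;
  }

  async start() {
    this.bot = await this.connect();
  }

  // Quit and reconnect in place. Returns true on success, false if it was
  // skipped or had to fall back to a full restart.
  async softResync(reason) {
    if (this._reconnecting) return false; // ignore overlapping requests
    this._reconnecting = true;
    try {
      this.bot.quit(reason);
      await new Promise(resolve => setTimeout(resolve, this.settleMs));
      this.bot = await this.connect();    // fresh connection, fresh chunk data
      return true;
    } catch (err) {
      // Never worse than today: a failed resync ends in the old restart path.
      this.cleanKill(`Soft resync failed (${err.message}); restarting.`);
      return false;
    } finally {
      this._reconnecting = false;
    }
  }
}
```

The point of the sketch is that `history` lives on the agent object, not on the bot, so replacing `this.bot` leaves it untouched, while any error on the reconnect path is routed to the restart that exists today.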

Why the diff is larger than a one-liner

_start originally assumed one connection for the lifetime of the process: the connection handlers, the NPC controller and the update loop were all set up inline, once. Reconnecting in place means generalizing that assumption, and that is most of the diff:

_bindConnectionHandlers() — the kicked / end / error / login handlers are extracted out of _start so they can be re-bound to the new bot instead of duplicated.
_eventsInitialized guard — the NPC controller and the update loop are process-global, not per-connection, so they must start exactly once; otherwise a resync would spin up a second update loop.
_reconnecting flag — a deliberate disconnect must not be treated as a crash by _onDisconnect, and update() must skip its tick while this.bot is being swapped out.
isReconnect param on _setupEventHandlers — on a reconnect we skip the greeting / memory reload / task bootstrap so the bot quietly rejoins with the state it already has.

Each piece is as small as I could make it; the size comes from centralizing the connection lifecycle, not from added features.
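
To make the four pieces above concrete, here is a small self-contained model of how they interact. It is a sketch under assumptions, not the PR's code: `FakeBot`, `LifecycleSketch`, `attach` and the counters are hypothetical, the `login` handler is left out for brevity, and the real `_onDisconnect` does more than record a string.

```javascript
// Minimal stand-in for a mineflayer bot: only on() and emit() are modelled.
class FakeBot {
  constructor() { this.handlers = {}; }
  on(event, fn) {
    (this.handlers[event] = this.handlers[event] || []).push(fn);
  }
  emit(event, ...args) {
    (this.handlers[event] || []).forEach(fn => fn(...args));
  }
}

class LifecycleSketch {
  constructor() {
    this.bot = null;
    this._eventsInitialized = false;
    this._reconnecting = false;
    this.updateLoops = 0; // how many update loops were started (should stay at 1)
    this.greetings = 0;   // how many times the first-join bootstrap ran
    this.crashes = [];    // disconnects that were treated as crashes
  }

  // Per-connection: must be re-bound to every new bot instance.
  _bindConnectionHandlers(bot) {
    bot.on('kicked', reason => this._onDisconnect('kicked', reason));
    bot.on('end', reason => this._onDisconnect('end', reason));
    bot.on('error', err => this._onDisconnect('error', err));
  }

  _onDisconnect(kind, detail) {
    if (this._reconnecting) return; // deliberate disconnect, not a crash
    this.crashes.push(`${kind}: ${detail}`);
  }

  _setupEventHandlers(isReconnect = false) {
    if (!this._eventsInitialized) {
      this._eventsInitialized = true;
      this.updateLoops += 1; // stands in for NPC controller + update loop startup
    }
    if (!isReconnect) this.greetings += 1; // greeting / memory reload / task bootstrap
  }

  attach(bot, isReconnect = false) {
    this.bot = bot;
    this._bindConnectionHandlers(bot);
    this._setupEventHandlers(isReconnect);
  }

  // Returns false when the tick is skipped because the bot is being swapped.
  update() {
    return !this._reconnecting;
  }
}
```

In this model, attaching a second bot with `isReconnect` set to `true` leaves `updateLoops` and `greetings` at 1, and an `end` event emitted while `_reconnecting` is set never reaches `crashes`.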

Validation — please read this part

I want to be upfront about how far this has and hasn't been tested, because it touches the connection lifecycle.

Tested live on a real server (Paper 1.21.x), running the exact code in this PR. I connected the bot through a local TCP proxy so I could inject faults at the network layer without touching the code under test, and ran it autonomously with another real player online:

softResync, in place (the smelt trigger). A real !smeltItem completed (Successfully smelted raw_iron, got 1 iron_ingot), then softResync fired: the bot quit and reconnected through the proxy in ~3s and logged Soft resync complete; world cache refreshed, agent state preserved. The process never exited (no restart), and the agent immediately continued its self-prompt loop with its goal and history intact — the exact win over the old full-restart path.
Connection refactor under real network drops. I destroyed the bot's TCP connection at the proxy several times — while idle and while mid-action (pathfinding). Each time the refactored handlers detected the drop, the process exited cleanly, the mindserver respawned it, and it reconnected and resumed its goal within ~5s. No hang, no double update loop, no orphaned listeners.
The desync case, via fault injection. I can't summon real lag-induced ghost blocks on demand, so I also exercised the resync with a harness that drops block_change / multi_block_change packets to force the world cache out of sync, then triggers softResync() and verifies the cache is correct again afterwards.
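
The fault-injection idea in the last item can be illustrated with a toy model. This is not the harness I used, and every name in it (`CacheClientSketch`, `lossyLink`, `desyncedKeys`, the string position keys) is hypothetical; only the two packet names come from the description above. It shows the principle: drop block updates so the client cache drifts from the server, then confirm a fresh download brings it back.

```javascript
// Packet names the toy link refuses to deliver while faults are enabled.
const DROPPED = new Set(['block_change', 'multi_block_change']);

class CacheClientSketch {
  constructor(server) {
    this.server = server;         // Map of position key -> block name (the truth)
    this.cache = new Map(server); // client-side copy, as after an initial chunk load
  }
  receive(packet) {
    if (packet.name === 'block_change') this.cache.set(packet.pos, packet.block);
  }
  // Stand-in for reconnecting: the cache is rebuilt from the server's state.
  resync() {
    this.cache = new Map(this.server);
  }
}

// Returns a send function that silently drops block updates when faults are on.
function lossyLink(client, faultsEnabled) {
  return packet => {
    if (faultsEnabled && DROPPED.has(packet.name)) return false;
    client.receive(packet);
    return true;
  };
}

// Positions where the client's cached block differs from the server's.
function desyncedKeys(server, cache) {
  return [...server.keys()].filter(key => cache.get(key) !== server.get(key));
}
```

A run of the model changes a block on the server, sends the matching `block_change` through a link with faults enabled, checks that `desyncedKeys` reports that position, then calls `resync()` and checks that the list is empty again.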

What I have NOT done, and where I'd genuinely value help:

Only one server / one latency profile (Paper). Not validated on vanilla / Spigot / Fabric, across mineflayer versions, or with multiple agents.
I have not observed softResync recover a genuine organic desync in the wild (only fault-injected). The fixed 2.5s settle before respawn is a guess that may need tuning on slower/faster servers.

I'm opening this so others can run it on different servers and conditions, surface the cases I can't reproduce, and suggest improvements — rather than sitting on it until I've personally covered every environment. Happy to gate it behind a setting until it's proven more broadly, or share the fault-injection harness if it helps review.

…ing agent state

The only recovery from a desynced world cache (lag-induced ghost blocks, PrismarineJS/mineflayer#2600) was a full process restart, which throws away the agent's conversation history, memory and self-prompt goal. smeltItem also forced a full restart just to refresh mineflayer's inventory cache after smelting.

Add Agent.softResync(): quit and reconnect the mineflayer bot in place. A fresh connection re-downloads chunks -- the same fix a restart applies -- but the Node process (and all agent state) survives.

To reconnect in place safely:
- extract the connection handlers into _bindConnectionHandlers() so they can be rebound to the new bot (previously inline in _start, which assumed a single connection for the whole process lifetime);
- guard the NPC controller and the update loop behind _eventsInitialized so a resync re-attaches per-bot listeners without spinning up a second update loop;
- mark the deliberate disconnect (_reconnecting) so _onDisconnect does not treat it as a crash, and skip the update tick while the bot is being swapped out;
- skip the greeting / memory reload / task bootstrap on a reconnect (isReconnect).

Wire smeltItem to softResync instead of cleanKill, giving the feature an organic trigger. Falls back to a full restart if the reconnect fails.

atiweb wants to merge 1 commit into mindcraft-bots:develop from atiweb:fix/soft-resync-preserve-state