Skip to content

perf: replace deepcopy with shallow copy in graph/swarm state management#2276

Closed
AnnasMazhar wants to merge 1 commit intostrands-agents:mainfrom
AnnasMazhar:perf/reduce-deepcopy-overhead
Closed

perf: replace deepcopy with shallow copy in graph/swarm state management#2276
AnnasMazhar wants to merge 1 commit intostrands-agents:mainfrom
AnnasMazhar:perf/reduce-deepcopy-overhead

Conversation

@AnnasMazhar
Copy link
Copy Markdown

Closes #2275

Summary

Replace copy.deepcopy() with targeted shallow copies in GraphNode and SwarmNode state save/restore.

Motivation

deepcopy on a typical conversation (20 messages, 10 tools) costs 0.218ms per call. In a 5-node graph that's 1.1ms of pure copy overhead. See #2275 for full benchmark.

Changes

  • Added _copy_messages() helper: [msg.copy() for msg in messages] (~67x faster)
  • Added _copy_model_state() helper: shallow dict copy + tools list copy (~70x faster)
  • Replaced 8 copy.deepcopy() calls across graph.py and swarm.py
  • Added 2 tests verifying copy isolation (append safety + mutation safety)

Safety

Safe because the executor only:

  1. Appends new messages (agent.py:1123) — never mutates existing message dicts
  2. Replaces model state values — never mutates nested tool specs

Testing

  • hatch fmt --formatter
  • hatch fmt --linter ✓ (ruff + mypy pass)
  • New tests: test_copy_messages_isolation, test_copy_model_state_isolation

Replace copy.deepcopy() with targeted shallow copies in GraphNode and
SwarmNode state save/restore. The executor only appends new messages
and replaces content blocks — it does not mutate existing message dicts
in-place, making shallow copies safe.

Benchmark (20 messages, 10 tools, 5-node graph):
  Before: 1.1ms per graph execution (deepcopy)
  After:  0.002ms per graph execution (shallow copy)
  Speedup: ~600x on state management overhead

For a 20-node complex workflow, this reduces copy overhead from 9ms
to 0.01ms per execution.

Added tests:
- test_copy_messages_isolation: verifies append and key replacement
  on copy do not affect original
- test_copy_model_state_isolation: verifies scalar and tools list
  modifications on copy do not affect original
@AnnasMazhar
Copy link
Copy Markdown
Author

Closing this — after deeper analysis I found that the executor DOES mutate existing message dicts in-place:

  • agent.py:933: self.messages[-1]["content"] = ... (replaces content list)
  • bedrock.py:398: messages[last_user_idx]["content"].append(...) (appends to content)
  • repository_session_manager.py:303: messages[index + 1]["content"].extend(...)

A shallow copy (msg.copy()) shares the content list reference, so these mutations would leak back to the saved initial state. deepcopy is necessary here for correctness.

A safe optimization would need to deep-copy only the content lists (not the entire recursive structure), but that requires more careful analysis of all mutation paths. Leaving the issue open for discussion.

Sorry for the premature PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: deepcopy in GraphNode/SwarmNode state management adds ~1ms per graph node

1 participant