Skip to content

Refactor big mutex to reduce contention#917

Draft
seanmonstar wants to merge 2 commits into
masterfrom
sean/bye-bye-big-mutex
Draft

Refactor big mutex to reduce contention#917
seanmonstar wants to merge 2 commits into
masterfrom
sean/bye-bye-big-mutex

Conversation

@seanmonstar

Copy link
Copy Markdown
Member

This refactor puts the big internal shared mutex on a diet. The goal is to reduce contention, providing performance improvements when h2 is used on multi-threaded runtimes.

Warning

This is a large internal refactor, and it hasn't yet been sufficiently tested in a production deployment. While I tried to keep the behavior exactly the same, it's possible there's subtle changes or bugs.

If you do test this, let me know. I hope to have this tested thoroughly before possibly landing.

Architectural changes

There is now unique ownership of much of the connection state, owned solely by the "connection task".

Most conceptual things were split between an owned handle and shared values such as counts::Counts and counts::Shared. The Store is now uniquely owned by the connection, and the Streams stored in it contain an Arc<stream::Shared> for the parts that need to be shared with a "stream handle".

Stream handle operations now do very little. For receiving, it locks the shared connection recv buffer, pops a frame, and unlocks.

Likewise, when sending data, (or any other status-updating operation), it briefly locks a connection pending_streams queue, inserts itself, and unlocks. All actual changes are written on the stream's pending_ops field. The connection task includes a loop to briefly lock the queue, pop a stream, unlock, and then process the streams pending ops.

So, while this does introduce more locks, they are finer-grained, and should only be held very briefly. Longer "work" is no longer done while holding any lock.

This should result in tasks needing to wait less time for another task that previously was holding the world-lock.

cc #531

@seanmonstar seanmonstar force-pushed the sean/bye-bye-big-mutex branch 3 times, most recently from f6131ee to a1a3c07 Compare June 16, 2026 16:25
@seanmonstar seanmonstar force-pushed the sean/bye-bye-big-mutex branch from a1a3c07 to 170914e Compare June 16, 2026 16:48
@howardjohn

Copy link
Copy Markdown
Contributor

Very exciting! Nice work.

I gave it a quick try and found it would reliably deadlock in our setup. Full trace logs here:

h2-deadlock.zip

The flow is running under https://github.com/istio/ztunnel, opening 4 parallel iperf streams on the same connection. usually lasts for 1-2s before it starts to hang.

Ill work on investigating some more later just need to shift to something else right now and wanted to give quick feedback

@seanmonstar

Copy link
Copy Markdown
Member Author

Extremely helpful, thanks! I'm scouring the traces now, hopefully I'll have a fix. 🤓

@seanmonstar seanmonstar force-pushed the sean/bye-bye-big-mutex branch 2 times, most recently from 167cbdf to 1c670de Compare June 16, 2026 19:33
@seanmonstar

Copy link
Copy Markdown
Member Author

I took a look, and I'm hopeful I've fixed it (I found a lost waker bug). Any further benchmarking is greatly appreciated!

@howardjohn

Copy link
Copy Markdown
Contributor

With 1c670de I get the same symptoms unfortunately.

The repro is a bit annoying but:

clone https://github.com/istio/ztunnel

Run

PROXY_WORKLOAD_INFO=default/local/default LOCAL_XDS_PATH=./examples/localhost.yaml PROXY_MODE=dedicated FAKE_CA="true" XDS_ADDRESS="" LOCAL_XDS_PATH=./examples/localhost.yaml cargo run --features testing --profile quick-release -- proxy

In another terminal:

iperf3 -s

In a third terminal:

sudo useradd -u 1001 iptables1
sudo iptables -t nat -I OUTPUT 1 -p tcp -m owner --uid-owner 1001 -j REDIRECT --to-ports "15001"
sudo -u iptables1 iperf3 -c 127.0.0.1

usually runs for 2-3s before it dies on my machine

@howardjohn

Copy link
Copy Markdown
Contributor

Actually just cargo test on that repo runs into issues as well (not sure if its the same but does appear to deadlock)

@seanmonstar seanmonstar force-pushed the sean/bye-bye-big-mutex branch from 1c670de to e159e34 Compare June 22, 2026 18:21
@seanmonstar

This comment was marked as resolved.

@howardjohn

Copy link
Copy Markdown
Contributor

Thanks! stress_test_multiple_source does still seem to fail (or rather, hangs forever) and the iperf test still does reproduce. So seems possible one fix was made but other issues remain

@seanmonstar seanmonstar force-pushed the sean/bye-bye-big-mutex branch from e159e34 to d58e405 Compare June 22, 2026 20:54
@seanmonstar

This comment was marked as resolved.

@howardjohn

This comment was marked as resolved.

@seanmonstar

This comment was marked as resolved.

@howardjohn

Copy link
Copy Markdown
Contributor

np! If it helps I am on a amd 9950x (16c, 32 threads) on linux 7.0.9 x86

@seanmonstar seanmonstar force-pushed the sean/bye-bye-big-mutex branch 2 times, most recently from aa2cc3c to 51155df Compare June 23, 2026 19:41
@seanmonstar seanmonstar force-pushed the sean/bye-bye-big-mutex branch from 51155df to b9b304a Compare June 23, 2026 19:56
@seanmonstar

Copy link
Copy Markdown
Member Author

Did some more review for any lost wakers or lock ordering. CI is happy, and local ztunnel is happy.

Meanwhile, I'm going to try pushing up a new commit to checkout ztunnel in our CI since GH actions runners have 4 cores. I can ditch the commit once all is happy.

@seanmonstar

Copy link
Copy Markdown
Member Author

Well, the CI job checking ztunnel test suite passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants