Make py_zstd_compress_mt2 thread-safe#321
Conversation
645526c to
693cd72
Compare
|
I intentionally created context only one time, hoped it speed things up. |
That's what I assumed 😅 I think if you want to keep caching contexts, I could change the code to have a kind of pool for contexts and re-use them across calls while guaranteeing a single context is never used for more than one call at a time.
If you want a reproducer, I don't know if it's possible to create one easily, not one that will consistently fail anyway (the crash is rare). Maybe under free-threaded Python, it would show up more. I can try to give it a look. |
|
|
|
I have a reproducer now, which consistently triggers crashes even with the GIL (and triggers TSan, on a free-threaded TSan build). I think this could be added as a regression test after the fix is added. import threading
import zstd
# Large payload so each call takes long enough for threads to overlap.
DATA: bytes = b"hello world this is a test " * 200_000 # 5 MB
THREADS: int = 64
ITERS: int = 500
LEVELS: list[int] = [1, 3, 5, 9, 3, 1] # mix of levels triggers reset_cContext
errors: list[Exception] = []
lock = threading.Lock()
def compress_loop() -> None:
for i in range(ITERS):
level = LEVELS[i % len(LEVELS)]
try:
zstd.compress2(DATA, level)
except Exception as exc:
with lock:
errors.append(exc)
threads = [threading.Thread(target=compress_loop) for _ in range(THREADS)]
for t in threads:
t.start()
for t in threads:
t.join()Output |
|
Ok, maybe there is no simple way. And it's safer to do a new context every time. |
|
Marked 15.7.2 as last safe release. |
Ah, I thought you were looking into it! I'll have a look this week :) |
What is this PR?
This PR is a fix for a crash we witnessed in a service of ours.
py_zstd_compress_mt2used a single global compression context (at the module level),m_cctx, which is thus shared across all calls:python-zstd/src/python-zstd.h
Line 130 in fec54dd
python-zstd/src/python-zstd.c
Lines 279 to 281 in fec54dd
Since
ZSTD_CCtxis not thread-safe, and because compression releases the GIL, two Python threads calling these functions at the same time use the same context concurrently. This can corrupt the state and lead to out-of-bounds access inside zstd.The crash was detected in this form:
Looking further into it, we found another race as well. When called with different
levelvalues,reset_cContextfrees and recreatesm_cctx. If one thread does this while another is still inside the GIL-releasedZSTD_compress2, the latter compresses into a freed context.Proposed fix
The fix proposed in this PR is to make
py_zstd_compress_mt2create and free its ownZSTD_CCtxon each call, instead of sharing the module-global one. This is the same pattern already used bypy_zstd_compress_mt:python-zstd/src/python-zstd.c
Lines 141 to 149 in fec54dd