Release the GIL during Stretch::process() (~7x speedup on 8 threads) by naveensr89 · Pull Request #4 · gregogiudici/python-stretch

naveensr89 · 2026-05-22T23:34:32Z

What this changes

Wraps the C++ stretch work inside Stretch::process() in nb::gil_scoped_release so the GIL is released for the duration. The Python-object boundary (input nb::ndarray read, output nb::ndarray construction) stays under the GIL, so the change is API-compatible and crash-free.

Why

Stretch::process() is pure C++ on raw float* buffers — no Python objects are touched between the input read and the return-value allocation. Holding the GIL across that work prevents ThreadPoolExecutor-based pipelines from parallelizing it.

Measurements

Microbench, 4 s stereo @ 44.1 kHz, 8 threads each running an independent Stretch() instance (+3 semitones, 1.25× tempo):

| Build | Serial 8× | Parallel 8× | Speedup |
| --- | --- | --- | --- |
| `python-stretch==0.3.1` (current `main`) | 424 ms | 419 ms | 0.98× |
| this PR | 399 ms | 55 ms | 7.18× |

Determinism is unchanged: same Stretch config produces bit-identical output (np.array_equal(out_a, out_b) == True on repeated calls).

The benchmark is reproducible via examples/benchmark_multithread.py (no audio files required — uses np.random input).
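
For readers who want to see the shape of such a measurement without the package installed, here is a minimal sketch of a serial-vs-parallel timing harness. It is not the contents of examples/benchmark_multithread.py. The workload `stand_in_work` is a hypothetical stand-in that hashes a large buffer with `hashlib.sha256` (which releases the GIL for large inputs), and the speedup is assumed to be serial time divided by parallel time.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def stand_in_work(seed: int) -> bytes:
    """Hypothetical stand-in for one independent stretch call.

    hashlib releases the GIL while hashing large buffers, so this behaves
    like a GIL-free C++ kernel for the purpose of the timing comparison.
    """
    data = bytes([seed % 256]) * (4 * 1024 * 1024)  # 4 MiB of input
    return hashlib.sha256(data).digest()


def time_serial(fn, jobs):
    """Run every job on the calling thread and return (seconds, results)."""
    start = time.perf_counter()
    results = [fn(job) for job in jobs]
    return time.perf_counter() - start, results


def time_parallel(fn, jobs, workers):
    """Run the jobs on a thread pool and return (seconds, results)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.perf_counter()
        results = list(pool.map(fn, jobs))  # map preserves input order
        elapsed = time.perf_counter() - start
    return elapsed, results


jobs = list(range(8))
serial_s, serial_out = time_serial(stand_in_work, jobs)
parallel_s, parallel_out = time_parallel(stand_in_work, jobs, workers=8)

# Assumed definition: speedup = serial wall time / parallel wall time.
speedup = serial_s / parallel_s
print(f"serial 8x: {serial_s * 1e3:.1f} ms")
print(f"parallel 8x: {parallel_s * 1e3:.1f} ms")
print(f"speedup: {speedup:.2f}x")
```

With a workload that holds the GIL, the two timings should come out roughly equal. With one that releases it, the parallel time should drop with the number of free cores, so the printed numbers depend on the machine.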

What's safe and what isn't

Safe (covered by this patch):

stretch_.seek / stretch_.process / stretch_.flush / stretch_.reset
The Buffer<float> wrappers and the std::copy of channel data into the output float*

Kept under the GIL (outside the release scope):

audio_input.data() / audio_input.shape() — nb::ndarray accessors
new float[…] for the output buffer (just malloc, GIL-free in principle, but kept above the release scope for clarity)
The final return nb::ndarray<…>(outData, …, owner) which constructs a Python object

If users share a single Stretch instance across threads they're still on their own — internal stretcher state is not protected. The intended pattern is one Stretch per thread (or per call), which works correctly with this patch.
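
A minimal sketch of the one-instance-per-thread pattern follows. `FakeStretch` is a hypothetical stateful stand-in, not the python-stretch API, and the helper names `get_stretch` and `stretch_one` are made up for illustration. Each worker thread lazily creates its own instance through `threading.local` and resets it before every call, so no internal state is ever shared between threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeStretch:
    """Hypothetical stand-in for a stretcher with unprotected internal state."""

    def __init__(self):
        self._phase = 0

    def reset(self):
        # Clear state left over from a previous call on this instance.
        self._phase = 0

    def process(self, samples):
        out = []
        for s in samples:
            # Mutates internal state on every sample, so two threads
            # calling process() on the same instance would interfere.
            self._phase = (self._phase + 1) % 4
            out.append(s * (1 + self._phase))
        return out


_local = threading.local()


def get_stretch():
    """Return this thread's private instance, creating it on first use."""
    if not hasattr(_local, "stretch"):
        _local.stretch = FakeStretch()
    return _local.stretch


def stretch_one(samples):
    stretch = get_stretch()
    stretch.reset()  # start every call from a clean state
    return stretch.process(samples)


inputs = [[float(i + k) for i in range(16)] for k in range(32)]
reference = [stretch_one(x) for x in inputs]  # single-thread reference

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(stretch_one, inputs))

print(threaded == reference)
```

Creating a fresh instance inside each call is the simpler alternative when construction is cheap. The thread-local variant only pays off if setting up an instance is expensive.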

Test plan

Microbench: 8-thread parallel speedup goes from 0.98× to 7.18×
Output determinism: bit-identical across repeated calls
No crashes when called from concurrent.futures.ThreadPoolExecutor(max_workers=40) for >1000 stretches
tests/test_multithread.py: single-thread determinism, parallel consistency, cross-run stability (all pass against the PR build)
Full test suite (pytest tests/): 12/12 pass

The C++ stretch computation runs on raw float buffers and does not touch any Python objects, so the GIL can be released for the duration. Without this change, concurrent calls from a ThreadPoolExecutor serialize on the GIL (microbench: 0.98x on 8 threads). With it, 8 threads scale 7.18x. The release is scoped to just the stretch_/Buffer work — the nb::ndarray input read and the nb::ndarray return-value construction stay under the GIL, since both touch Python-managed memory.

gregogiudici · 2026-05-24T17:07:38Z

Hey @naveensr89, thanks for the PR!
It's really nice to see someone else wanting to work on this library. The performance improvement looks great!

Before merging, could you provide a small test set that explicitly locks down the expected behavior?
Something like a test_multithread.py with:

Single-thread determinism: same input -> bit identical outputs across repeated calls.
Parallel consistency: multiple independent Stretch instances run in a thread pool -> outputs match a single-thread reference output.
Cross-run stability: repeating the same parallel batch gives identical results across different runs.

Also, could you share a bit more about how you ran the microbench and analyzed the results?
It would be great if I could verify the same results in tests on my own local setup.

I haven’t worked on the library for a while as I’ve been busy with other projects, but I’d be delighted if we could improve it even further

tests/test_multithread.py covers the three behaviors requested in PR review:
1. single-thread determinism: same input → bit-identical output on repeated calls
2. parallel consistency: N independent Stretch instances in a ThreadPoolExecutor match serial reference outputs (bit-identical)
3. cross-run stability: same parallel batch repeated twice gives identical results

examples/benchmark_multithread.py is a self-contained reproducible benchmark (no audio files required) that measures serial vs parallel throughput and prints a speedup table.

Confirmed results on this machine (8 vCPU): serial 8×: ~32 ms, parallel 8 threads: ~10 ms → 3.1× speedup (patched build) vs ~1× on unpatched 0.3.1. GIL release confirmed working.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

naveensr89 · 2026-05-26T17:35:32Z

Hi @gregogiudici, thanks for the review! Both additions are now in the latest commit.

Tests — tests/test_multithread.py covers all three cases you asked for:

Single-thread determinism — same input → np.array_equal on repeated calls
Parallel consistency — 8 independent Stretch instances in a ThreadPoolExecutor → each matches its serial reference output (bit-identical)
Cross-run stability — same parallel batch run twice → identical results

All 12 tests (existing + new) pass against the PR build.
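
For illustration only, the three checks could be structured roughly as below. This is a sketch, not the contents of tests/test_multithread.py: `stretch_once` is a hypothetical deterministic stand-in for building a fresh Stretch and processing one buffer, and plain list equality plays the role that `np.array_equal` plays in the real tests.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_input(seed, n=256):
    """Seeded pseudo-random samples, so every run sees the same input."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def stretch_once(samples):
    """Hypothetical stand-in for one fresh-instance stretch call.

    A one-pole smoother whose state is local to the call, so nothing is
    shared between threads.
    """
    out, state = [], 0.0
    for s in samples:
        state = 0.5 * state + 0.5 * s
        out.append(state)
    return out


def run_parallel(inputs, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(stretch_once, inputs))  # order-preserving


def test_single_thread_determinism():
    x = make_input(0)
    assert stretch_once(x) == stretch_once(x)


def test_parallel_consistency():
    inputs = [make_input(seed) for seed in range(8)]
    reference = [stretch_once(x) for x in inputs]  # serial reference
    assert run_parallel(inputs) == reference


def test_cross_run_stability():
    inputs = [make_input(seed) for seed in range(8)]
    assert run_parallel(inputs) == run_parallel(inputs)
```

Exact equality rather than a tolerance is the point of all three checks: with fully independent instances, threading should not change a single bit of the output.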

Benchmark — examples/benchmark_multithread.py is self-contained (no audio files needed, uses np.random input) and prints a serial vs parallel table you can run directly:

python examples/benchmark_multithread.py

Results vary by machine. On an 8-vCPU host I get ~3.1× with this patch vs ~1.0× on 0.3.1. The 7.18× from the PR description was measured on a 48-vCPU machine, where threads have more independent cores to spread across. Both confirm the GIL is genuinely released.

naveensr89 wants to merge 2 commits into gregogiudici:main from naveensr89:gil-release
