feat(subinterpreter): add opt-in TLS-cached thread state mode#6073
Open
ymwang78 wants to merge 2 commits into
Open
feat(subinterpreter): add opt-in TLS-cached thread state mode#6073ymwang78 wants to merge 2 commits into
ymwang78 wants to merge 2 commits into
Conversation
subinterpreter_scoped_activate previously created and destroyed a fresh PyThreadState on every activation when the calling OS thread was not already running the target interpreter. Workloads that repeatedly re-enter the same sub-interpreter from the same thread therefore churn thread states and lose per-thread interpreter state between activations (see pybind#6040). Add an opt-in subinterpreter_thread_state::cached policy: on first use a PyThreadState is created and stored in OS-thread-local storage keyed by the target interpreter; subsequent activations on that thread only swap it in/out and never destroy it. The default stays transient, so existing behavior is unchanged. Since pybind11 does not control thread lifetime, cleanup is explicit: subinterpreter::release_cached_thread_state() releases the calling thread's cached state for one interpreter, and the static release_all_cached_thread_states() releases all of the calling thread's cached states as an end-of-thread hook. The TLS map's destructor only frees its own nodes and never touches the Python C API, so an unreleased state leaks rather than crashing at thread exit. Includes test coverage and embedding docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an opt-in “cached” subinterpreter activation mode that reuses a per-OS-thread PyThreadState for the same interpreter, plus APIs/docs/tests for explicitly releasing cached thread states to avoid leaks.
Changes:
- Introduce
subinterpreter_thread_statepolicy (transientvscached) and thread-local cache plumbing. - Add
subinterpreter::release_cached_thread_state()andrelease_all_cached_thread_states()for explicit cleanup. - Add documentation and a Catch2 test validating reuse and cleanup behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_with_catch/test_subinterpreter.cpp | Adds coverage for cached thread-state reuse across activations and threads, and for cleanup APIs. |
| include/pybind11/subinterpreter.h | Implements cached activation policy + new release APIs backed by a thread-local unordered_map. |
| docs/advanced/embedding.rst | Documents cached activation behavior, intended use cases, and required cleanup responsibilities. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
360
to
+366
| detail::get_internals().tstate.reset(); | ||
| PyThreadState_Clear(tstate_); | ||
| PyThreadState_DeleteCurrent(); | ||
| if (!cached_) { | ||
| PyThreadState_Clear(tstate_); | ||
| PyThreadState_DeleteCurrent(); | ||
| } | ||
| // When cached_, tstate_ stays alive in the OS-thread-local cache for reuse; the | ||
| // PyThreadState_Swap below merely detaches it from this thread. |
Comment on lines
+374
to
+392
| inline void subinterpreter::release_cached_thread_state() const { | ||
| if (istate_ == nullptr) { | ||
| return; | ||
| } | ||
| auto &cache = detail::subinterpreter_thread_state_cache(); | ||
| auto it = cache.find(istate_); | ||
| if (it == cache.end()) { | ||
| return; | ||
| } | ||
| PyThreadState *cached = it->second; | ||
| cache.erase(it); | ||
|
|
||
| // Make the cached state current (acquiring this interpreter's GIL) so it can be cleared and | ||
| // destroyed on the OS thread that created it, then restore whatever was active before. | ||
| PyThreadState *prev = PyThreadState_Swap(cached); | ||
| PyThreadState_Clear(cached); | ||
| PyThreadState_DeleteCurrent(); | ||
| PyThreadState_Swap(prev); | ||
| } |
Comment on lines
+386
to
+404
| // Make the cached state current (acquiring this interpreter's GIL) so it can be cleared and | ||
| // destroyed on the OS thread that created it, then restore whatever was active before. | ||
| PyThreadState *prev = PyThreadState_Swap(cached); | ||
| PyThreadState_Clear(cached); | ||
| PyThreadState_DeleteCurrent(); | ||
| PyThreadState_Swap(prev); | ||
| } | ||
|
|
||
| inline void subinterpreter::release_all_cached_thread_states() { | ||
| auto &cache = detail::subinterpreter_thread_state_cache(); | ||
| for (auto const &entry : cache) { | ||
| PyThreadState *cached = entry.second; | ||
| // prev is the state active before this swap; it is restored after each deletion, so it is | ||
| // never one of the cached states being destroyed here. | ||
| PyThreadState *prev = PyThreadState_Swap(cached); | ||
| PyThreadState_Clear(cached); | ||
| PyThreadState_DeleteCurrent(); | ||
| PyThreadState_Swap(prev); | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
subinterpreter_scoped_activate previously created and destroyed a fresh PyThreadState on every activation when the calling OS thread was not already running the target interpreter. Workloads that repeatedly re-enter the same sub-interpreter from the same thread therefore churn thread states and lose per-thread interpreter state between activations (see #6040).
Add an opt-in subinterpreter_thread_state::cached policy: on first use a PyThreadState is created and stored in OS-thread-local storage keyed by the target interpreter; subsequent activations on that thread only swap it in/out and never destroy it. The default stays transient, so existing behavior is unchanged.
Since pybind11 does not control thread lifetime, cleanup is explicit: subinterpreter::release_cached_thread_state() releases the calling thread's cached state for one interpreter, and the static release_all_cached_thread_states() releases all of the calling thread's cached states as an end-of-thread hook. The TLS map's destructor only frees its own nodes and never touches the Python C API, so an unreleased state leaks rather than crashing at thread exit.
Includes test coverage and embedding docs.
Description
Suggested changelog entry:
📚 Documentation preview 📚: https://pybind11--6073.org.readthedocs.build/