[ty] Break the semantic index out into its own crate by AlexWaygood · Pull Request #24471 · astral-sh/ruff

AlexWaygood · 2026-04-07T14:02:31Z

Summary

The hope is that this might make our incremental compile times a bit faster for ty_python_semantic. It seems like a logical split to me anyway, even if it doesn't: almost all of our machinery in the semantic_index submodule currently has no knowledge of types or type inference, and will never need to have any knowledge of types and type inference.

The only bit that gets a little bit painful here is with reachability constraints. Several constraint-resolution methods on the UseDefMap have to be converted into standalone functions that live in the ty_python_semantic crate and receive a &UseDefMap as an argument, which isn't quite as ergonomic.

One outstanding question here is whether we should rename ty_python_semantic? ty_types might be a good fit if the semantic-index crate is called ty_semantic_index. ty_python_semantic and ty_semantic_index just feel a little similar.

Happily, codspeed reports no performance regressions on this PR :-)

One obvious other downside to this PR is that a bunch of things that were previously pub(crate) now need to be pub, which will make it harder to detect unused methods and fields.

Test Plan

existing tests

astral-sh-bot · 2026-04-07T14:08:55Z

`ecosystem-analyzer` results

No diagnostic changes detected ✅

Large timing changes:

Project	Old Time	New Time	Change
`egglog-python`	0.28s	0.44s	+56%
`pyjwt`	0.05s	0.08s	+54%
`com2ann`	0.06s	0.08s	+51%

Full report with detailed diff (timing results)

astral-sh-bot · 2026-04-07T14:10:39Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

astral-sh-bot · 2026-04-07T14:11:46Z

Typing conformance results

No changes detected ✅

Current numbers

The percentage of diagnostics emitted that were expected errors held steady at 87.92%. The percentage of expected errors that received a diagnostic held steady at 83.11%. The number of fully passing files held steady at 78/133.

astral-sh-bot · 2026-04-07T14:15:28Z

Memory usage report

Memory usage unchanged ✅

AlexWaygood · 2026-04-07T14:46:30Z

This file had to be split into two: one reachability_constraints.rs file in the ty_semantic_index crate that is totally naive about types, and one reachability_constraints.rs file in ty_python_semantic that uses the data structures exposed by the ty_semantic_index module and uses them to actually evaluate reachability

AlexWaygood · 2026-04-07T14:48:34Z

+    pub(crate) fn of_node(
+        db: &'db dyn Db,
+        node: &NodeWithScopeKind,
+        index: &SemanticIndex<'db>,
+    ) -> Option<Self> {


this is NodeWithScopeKind::generic_context() on main, but we can no longer define methods directly on NodeWithScopeKind that return a GenericContext with this refactor, so it has to become a method on GenericContext instead.

Why of_node and not from_node? We could by using an extension trait, but I don't think it's worth it here. This one method seems perfectly fine.

Why of_node and not from_node?

Just because, semantically, we're not converting a node into a generic context, we're trying to construct the generic context of the node

MichaReiser · 2026-04-07T16:09:11Z

My preference here would be to move the type stuff into ty_python_types and use ty_python_semantic.

AlexWaygood · 2026-04-07T16:47:54Z

The refactor I'm proposing here would split semantic analysis of a Python module between two crates: a first pass (the semantic indexing phase) in ty_semantic_index, and the second phase (type inference) in a higher-level crate. So I'm happy to rename the higher-level crate to ty_python_types, but I don't think it makes sense to name the lower-level crate ty_python_semantic, since it's only very low-level semantic analysis that's being done in that crate.

carljm · 2026-04-07T16:52:53Z

I agree with @AlexWaygood that I would find it strange to have a crate named ty_python_semantic which does only the semantic indexing; that's only the most minimal understanding of "Python semantics".

Also agree that probably the best naming here requires not having any crate named ty_python_semantic (even though that means more rename churn), since the whole point of this refactor is that we will no longer have a single crate responsible for implementing Python semantics.

AlexWaygood · 2026-04-07T17:44:23Z

Also agree that probably the best naming here requires not having any crate named ty_python_semantic (even though that means more rename churn), since the whole point of this refactor is that we will no longer have a single crate responsible for implementing Python semantics.

I'll wait until we have consensus on this point before doing that rename, since it'll only increase the risk of painful merge conflicts on this PR. (And, in fact, I'm somewhat inclined to do that rename in a followup.)

MichaReiser · 2026-04-08T06:56:14Z

My concern with ty_python_semantic_index is that it's too narrowly scoped, unless we factor out even more crate. It won't just contain the semantic index; it is also a better fit for suppressions and fixes. It contains the semantic analysis settings struct ProgramSettings, and it should probably contain the logic for how to define lint rules (so that a ty_python_linter crate doesn't have to depend on ty_python_types.

@AlexWaygood did you measure whether this helps with incremental compile times? I want to avoid spending time on this if it doesn't.

AlexWaygood · 2026-04-08T10:42:16Z

I tested the timings of incremental builds on main and on this PR branch by running cargo build -p ty on both branches, then applying this diff, then running cargo build -p ty again (with no other processes running):

diff --git a/crates/ty_python_semantic/src/types.rs b/crates/ty_python_semantic/src/types.rs
index 1a92b994cc..18bccfb807 100644
--- a/crates/ty_python_semantic/src/types.rs
+++ b/crates/ty_python_semantic/src/types.rs
@@ -157,7 +157,7 @@ pub fn check_types(db: &dyn Db, file: File) -> Vec<Diagnostic> {
     for scope_id in index.scope_ids() {
         // Scopes that may require type context are inferred during the inference of
         // their outer scope.
-        if scope_id.accepts_type_context(db) {
+        if !scope_id.accepts_type_context(db) {
             continue;
         }

I did one measurement for both branches with CARGO_PROFILE_DEV_OPT_LEVEL=0, which is cargo's default, and measured a small decrease in incremental compile times on this branch: 4.67s to 4.40s. I did several measurements with CARGO_PROFILE_DEV_OPT_LEVEL=1, which is now what we tell agents to set when compiling ty, and what I've started setting locally, because it makes running mdtests so much faster once ty has been compiled. Here the impact was more significant, though still not huge: I measured incremental compile times of 13.80s, 13.90s, and 14.00s on main, compared to timings of 12.40s, 12.51s and 12.61s on this branch.

AlexWaygood · 2026-04-08T10:43:49Z

it should probably contain the logic for how to define lint rules (so that a ty_python_linter crate doesn't have to depend on ty_python_types.

Hmm, but surely the whole point of being able to write a type-aware linter on top of ty's semantic model is that the lints would be able to know about types? I don't see how a type-aware linter could do that without having ty_python_types in its dependency tree.

AlexWaygood · 2026-04-09T17:22:50Z

Did you test the impact of an incremental change to semantic indexing? Probably not significant, but it's possible that there is an regression in that case.

I didn't. I can do so tomorrow, but I'm less worried about that anyway since I feel like changes to the semantic index are overall much less common than changes to other parts of the crate currently known as ty_python_semantic

ibraheemdev · 2026-04-09T17:30:54Z

Naming aside, I think I'm in favor of this change. It is a very straightforward extraction of an existing module into a separate crate, and has a real impact on compile times right now. We can always restructure things in the future as things become more integrated.

MichaReiser

I obviously didn't review this line by line:

You could consider using an extension trait to avoid regressing the reachability API. I think that's worth it because the new free-standing functions are rather awkward to use and discover
I slightly prefer ty_python_analysis over ty_python_core because it makes it clear that, e.g,. formatting logic should be put elsewhere ;)
Git fails to detect some moves, which is very unfortunate. Can we extract some refactors into separate PRs so that git recognizes the file move?

AlexWaygood · 2026-04-11T11:59:33Z

I slightly prefer ty_python_analysis over ty_python_core because it makes it clear that, e.g,. formatting logic should be put elsewhere ;)

I still dislike ty_python_analysis, because it implies that all analysis will be done from this crate, when really only very basic, low-level analysis is done from this crate -- nearly all "meaningful" analysis will be done from higher-level crates.

Does the _python_ in the name do much for us here? ty_core_analysis or ty_foundational_analysis could work well?

MichaReiser · 2026-04-11T12:24:12Z

ty_python_core is fine for now. We can always rename it.

MichaReiser

Thank you. This looks good.

It's s a shame that we'll lose most of Rust's unused code detection because everything is now pub.

I'd prefer if we moved PythonPlatform to ty_python_core, considering that it isn't used within ty_site_packages and it simplifies ty_site_packages's dependencies.

I'd also prefer if we remove the testing feature from ty_python_core (at least if my understanding of what we use it for is correct, see my inline comment)

I'm not looking forward to my merge conflicts :(

MichaReiser · 2026-04-13T06:59:14Z

+    pub(crate) fn of_node(
+        db: &'db dyn Db,
+        node: &NodeWithScopeKind,
+        index: &SemanticIndex<'db>,
+    ) -> Option<Self> {


Why of_node and not from_node? We could by using an extension trait, but I don't think it's worth it here. This one method seems perfectly fine.

MichaReiser · 2026-04-13T07:02:32Z

+pub(crate) fn is_reachable<'db>(
+    db: &'db dyn Db,
+    use_def: &UseDefMap<'db>,
+    reachability: ScopedReachabilityConstraintId,
+) -> bool {
+    evaluate_reachability(db, use_def, reachability).may_be_true()
+}
+
+pub(crate) fn binding_reachability<'db, 'map>(
+    db: &'db dyn Db,
+    use_def: &'map UseDefMap<'db>,
+    binding: &BindingWithConstraints<'map, 'db>,
+) -> Truthiness {
+    use_def.reachability_constraints().evaluate(
+        db,
+        use_def.predicates(),
+        binding.reachability_constraint,
+    )
+}
+
+pub(crate) fn evaluate_reachability(
+    db: &dyn Db,
+    use_def: &UseDefMap,
+    reachability: ScopedReachabilityConstraintId,
+) -> Truthiness {
+    use_def
+        .reachability_constraints()
+        .evaluate(db, use_def.predicates(), reachability)
+}


I assume those aren't used enough to justify an extension trait?

It was mainly that the ergonomics of these APIs actually seem clearer as standalone functions than as methods. And that I prefer to avoid extension traits unless there's a clear improvement in ergonomics, since the fact that the trait has to be in scope for the methods to be used can make the methods less discoverable

Missed in #24471

AlexWaygood added the ty Multi-file analysis & type inference label Apr 7, 2026

AlexWaygood force-pushed the alex/break-out-index branch from 8c4cbee to e6800d8 Compare April 7, 2026 14:15

AlexWaygood added the internal An internal refactor or improvement label Apr 7, 2026

AlexWaygood force-pushed the alex/break-out-index branch from e6800d8 to e36173a Compare April 7, 2026 14:44

AlexWaygood commented Apr 7, 2026

View reviewed changes

AlexWaygood force-pushed the alex/break-out-index branch 2 times, most recently from 64d5a0c to 4728bdf Compare April 7, 2026 15:04

AlexWaygood marked this pull request as ready for review April 7, 2026 15:08

AlexWaygood requested review from BurntSushi, Gankra, MichaReiser, carljm, dcreager, ibraheemdev and sharkdp as code owners April 7, 2026 15:08

astral-sh-bot Bot assigned MichaReiser Apr 7, 2026

AlexWaygood force-pushed the alex/break-out-index branch from 4728bdf to 4f9e4da Compare April 7, 2026 18:45

AlexWaygood force-pushed the alex/break-out-index branch from 4f9e4da to 99455e1 Compare April 8, 2026 10:14

AlexWaygood force-pushed the alex/break-out-index branch 2 times, most recently from 163229d to 509611c Compare April 10, 2026 22:42

MichaReiser reviewed Apr 11, 2026

View reviewed changes

AlexWaygood mentioned this pull request Apr 11, 2026

[ty] Split reachability analysis into two files #24562

Merged

AlexWaygood force-pushed the alex/break-out-index branch from 509611c to 6ce51d2 Compare April 11, 2026 17:08

AlexWaygood changed the base branch from main to alex/split-reachability April 11, 2026 17:08

AlexWaygood force-pushed the alex/break-out-index branch from 6ce51d2 to bb76eae Compare April 11, 2026 17:22

AlexWaygood requested a review from MichaReiser April 11, 2026 17:29

Base automatically changed from alex/split-reachability to main April 12, 2026 11:51

AlexWaygood force-pushed the alex/break-out-index branch 5 times, most recently from 6a81614 to ba401ec Compare April 12, 2026 16:46

MichaReiser approved these changes Apr 13, 2026

View reviewed changes

AlexWaygood force-pushed the alex/break-out-index branch 5 times, most recently from 92c9a0b to 212cc62 Compare April 13, 2026 10:55

[ty] Break out the semantic index into a separate crate

150cf4b

AlexWaygood force-pushed the alex/break-out-index branch from 212cc62 to 150cf4b Compare April 13, 2026 11:02

AlexWaygood merged commit 4619940 into main Apr 13, 2026
56 checks passed

AlexWaygood deleted the alex/break-out-index branch April 13, 2026 11:39

AlexWaygood added a commit that referenced this pull request Apr 13, 2026

Add ty_python_core to the pr-assignee-pools.toml config

1bc55d7

Missed in #24471

AlexWaygood mentioned this pull request Apr 13, 2026

Add ty_python_core to the pr-assignee-pools.toml config #24605

Merged

Conversation

AlexWaygood commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

astral-sh-bot Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ecosystem-analyzer results

Uh oh!

astral-sh-bot Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

Uh oh!

astral-sh-bot Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

No changes detected ✅

Uh oh!

astral-sh-bot Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Memory usage report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser commented Apr 7, 2026

Uh oh!

AlexWaygood commented Apr 7, 2026

Uh oh!

carljm commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexWaygood commented Apr 7, 2026

Uh oh!

MichaReiser commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexWaygood commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexWaygood commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexWaygood commented Apr 9, 2026

Uh oh!

ibraheemdev commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexWaygood commented Apr 11, 2026

Uh oh!

MichaReiser commented Apr 11, 2026

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

AlexWaygood commented Apr 7, 2026 •

edited

Loading

astral-sh-bot Bot commented Apr 7, 2026 •

edited

Loading

`ecosystem-analyzer` results

astral-sh-bot Bot commented Apr 7, 2026 •

edited

Loading

`ruff-ecosystem` results

astral-sh-bot Bot commented Apr 7, 2026 •

edited

Loading

astral-sh-bot Bot commented Apr 7, 2026 •

edited

Loading

carljm commented Apr 7, 2026 •

edited

Loading

MichaReiser commented Apr 8, 2026 •

edited

Loading

AlexWaygood commented Apr 8, 2026 •

edited

Loading

AlexWaygood commented Apr 8, 2026 •

edited

Loading

ibraheemdev commented Apr 9, 2026 •

edited

Loading