Skip to content

[ty] Break the semantic index out into its own crate#24471

Merged
AlexWaygood merged 1 commit intomainfrom
alex/break-out-index
Apr 13, 2026
Merged

[ty] Break the semantic index out into its own crate#24471
AlexWaygood merged 1 commit intomainfrom
alex/break-out-index

Conversation

@AlexWaygood
Copy link
Copy Markdown
Member

@AlexWaygood AlexWaygood commented Apr 7, 2026

Summary

The hope is that this might make our incremental compile times a bit faster for ty_python_semantic. It seems like a logical split to me anyway, even if it doesn't: almost all of our machinery in the semantic_index submodule currently has no knowledge of types or type inference, and will never need to have any knowledge of types and type inference.

The only bit that gets a little bit painful here is with reachability constraints. Several constraint-resolution methods on the UseDefMap have to be converted into standalone functions that live in the ty_python_semantic crate and receive a &UseDefMap as an argument, which isn't quite as ergonomic.

One outstanding question here is whether we should rename ty_python_semantic? ty_types might be a good fit if the semantic-index crate is called ty_semantic_index. ty_python_semantic and ty_semantic_index just feel a little similar.

Happily, codspeed reports no performance regressions on this PR :-)

One obvious other downside to this PR is that a bunch of things that were previously pub(crate) now need to be pub, which will make it harder to detect unused methods and fields.

Test Plan

existing tests

@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Apr 7, 2026
@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 7, 2026

ecosystem-analyzer results

No diagnostic changes detected ✅

Large timing changes:

Project Old Time New Time Change
egglog-python 0.28s 0.44s +56%
pyjwt 0.05s 0.08s +54%
com2ann 0.06s 0.08s +51%

Full report with detailed diff (timing results)

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 7, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 7, 2026

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 87.92%. The percentage of expected errors that received a diagnostic held steady at 83.11%. The number of fully passing files held steady at 78/133.

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 7, 2026

Memory usage report

Memory usage unchanged ✅

@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch from 8c4cbee to e6800d8 Compare April 7, 2026 14:15
@AlexWaygood AlexWaygood added the internal An internal refactor or improvement label Apr 7, 2026
@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch from e6800d8 to e36173a Compare April 7, 2026 14:44
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file had to be split into two: one reachability_constraints.rs file in the ty_semantic_index crate that is totally naive about types, and one reachability_constraints.rs file in ty_python_semantic that uses the data structures exposed by the ty_semantic_index module and uses them to actually evaluate reachability

Comment on lines +325 to +329
pub(crate) fn of_node(
db: &'db dyn Db,
node: &NodeWithScopeKind,
index: &SemanticIndex<'db>,
) -> Option<Self> {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is NodeWithScopeKind::generic_context() on main, but we can no longer define methods directly on NodeWithScopeKind that return a GenericContext with this refactor, so it has to become a method on GenericContext instead.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why of_node and not from_node? We could by using an extension trait, but I don't think it's worth it here. This one method seems perfectly fine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why of_node and not from_node?

Just because, semantically, we're not converting a node into a generic context, we're trying to construct the generic context of the node

@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch 2 times, most recently from 64d5a0c to 4728bdf Compare April 7, 2026 15:04
@AlexWaygood AlexWaygood marked this pull request as ready for review April 7, 2026 15:08
@MichaReiser
Copy link
Copy Markdown
Member

My preference here would be to move the type stuff into ty_python_types and use ty_python_semantic.

@AlexWaygood
Copy link
Copy Markdown
Member Author

The refactor I'm proposing here would split semantic analysis of a Python module between two crates: a first pass (the semantic indexing phase) in ty_semantic_index, and the second phase (type inference) in a higher-level crate. So I'm happy to rename the higher-level crate to ty_python_types, but I don't think it makes sense to name the lower-level crate ty_python_semantic, since it's only very low-level semantic analysis that's being done in that crate.

@carljm
Copy link
Copy Markdown
Contributor

carljm commented Apr 7, 2026

I agree with @AlexWaygood that I would find it strange to have a crate named ty_python_semantic which does only the semantic indexing; that's only the most minimal understanding of "Python semantics".

Also agree that probably the best naming here requires not having any crate named ty_python_semantic (even though that means more rename churn), since the whole point of this refactor is that we will no longer have a single crate responsible for implementing Python semantics.

@AlexWaygood
Copy link
Copy Markdown
Member Author

Also agree that probably the best naming here requires not having any crate named ty_python_semantic (even though that means more rename churn), since the whole point of this refactor is that we will no longer have a single crate responsible for implementing Python semantics.

I'll wait until we have consensus on this point before doing that rename, since it'll only increase the risk of painful merge conflicts on this PR. (And, in fact, I'm somewhat inclined to do that rename in a followup.)

@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch from 4728bdf to 4f9e4da Compare April 7, 2026 18:45
@MichaReiser
Copy link
Copy Markdown
Member

MichaReiser commented Apr 8, 2026

My concern with ty_python_semantic_index is that it's too narrowly scoped, unless we factor out even more crate. It won't just contain the semantic index; it is also a better fit for suppressions and fixes. It contains the semantic analysis settings struct ProgramSettings, and it should probably contain the logic for how to define lint rules (so that a ty_python_linter crate doesn't have to depend on ty_python_types.

@AlexWaygood did you measure whether this helps with incremental compile times? I want to avoid spending time on this if it doesn't.

@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch from 4f9e4da to 99455e1 Compare April 8, 2026 10:14
@AlexWaygood
Copy link
Copy Markdown
Member Author

AlexWaygood commented Apr 8, 2026

I tested the timings of incremental builds on main and on this PR branch by running cargo build -p ty on both branches, then applying this diff, then running cargo build -p ty again (with no other processes running):

diff --git a/crates/ty_python_semantic/src/types.rs b/crates/ty_python_semantic/src/types.rs
index 1a92b994cc..18bccfb807 100644
--- a/crates/ty_python_semantic/src/types.rs
+++ b/crates/ty_python_semantic/src/types.rs
@@ -157,7 +157,7 @@ pub fn check_types(db: &dyn Db, file: File) -> Vec<Diagnostic> {
     for scope_id in index.scope_ids() {
         // Scopes that may require type context are inferred during the inference of
         // their outer scope.
-        if scope_id.accepts_type_context(db) {
+        if !scope_id.accepts_type_context(db) {
             continue;
         }

I did one measurement for both branches with CARGO_PROFILE_DEV_OPT_LEVEL=0, which is cargo's default, and measured a small decrease in incremental compile times on this branch: 4.67s to 4.40s. I did several measurements with CARGO_PROFILE_DEV_OPT_LEVEL=1, which is now what we tell agents to set when compiling ty, and what I've started setting locally, because it makes running mdtests so much faster once ty has been compiled. Here the impact was more significant, though still not huge: I measured incremental compile times of 13.80s, 13.90s, and 14.00s on main, compared to timings of 12.40s, 12.51s and 12.61s on this branch.

@AlexWaygood
Copy link
Copy Markdown
Member Author

AlexWaygood commented Apr 8, 2026

it should probably contain the logic for how to define lint rules (so that a ty_python_linter crate doesn't have to depend on ty_python_types.

Hmm, but surely the whole point of being able to write a type-aware linter on top of ty's semantic model is that the lints would be able to know about types? I don't see how a type-aware linter could do that without having ty_python_types in its dependency tree.

@AlexWaygood
Copy link
Copy Markdown
Member Author

Did you test the impact of an incremental change to semantic indexing? Probably not significant, but it's possible that there is an regression in that case.

I didn't. I can do so tomorrow, but I'm less worried about that anyway since I feel like changes to the semantic index are overall much less common than changes to other parts of the crate currently known as ty_python_semantic

@ibraheemdev
Copy link
Copy Markdown
Member

ibraheemdev commented Apr 9, 2026

Naming aside, I think I'm in favor of this change. It is a very straightforward extraction of an existing module into a separate crate, and has a real impact on compile times right now. We can always restructure things in the future as things become more integrated.

@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch 2 times, most recently from 163229d to 509611c Compare April 10, 2026 22:42
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I obviously didn't review this line by line:

  • You could consider using an extension trait to avoid regressing the reachability API. I think that's worth it because the new free-standing functions are rather awkward to use and discover
  • I slightly prefer ty_python_analysis over ty_python_core because it makes it clear that, e.g,. formatting logic should be put elsewhere ;)
  • Git fails to detect some moves, which is very unfortunate. Can we extract some refactors into separate PRs so that git recognizes the file move?

Comment thread crates/ty/tests/file_watching.rs Outdated
Comment thread crates/ty_python_semantic/src/semantic_index/reachability_constraints.rs Outdated
Comment thread crates/ty_python_semantic/src/types/class/static_literal.rs Outdated
Comment thread crates/ty_python_semantic/src/types/infer/builder/function.rs
@AlexWaygood
Copy link
Copy Markdown
Member Author

I slightly prefer ty_python_analysis over ty_python_core because it makes it clear that, e.g,. formatting logic should be put elsewhere ;)

I still dislike ty_python_analysis, because it implies that all analysis will be done from this crate, when really only very basic, low-level analysis is done from this crate -- nearly all "meaningful" analysis will be done from higher-level crates.

Does the _python_ in the name do much for us here? ty_core_analysis or ty_foundational_analysis could work well?

@MichaReiser
Copy link
Copy Markdown
Member

ty_python_core is fine for now. We can always rename it.

@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch from 509611c to 6ce51d2 Compare April 11, 2026 17:08
@AlexWaygood AlexWaygood changed the base branch from main to alex/split-reachability April 11, 2026 17:08
@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch from 6ce51d2 to bb76eae Compare April 11, 2026 17:22
@AlexWaygood AlexWaygood requested a review from MichaReiser April 11, 2026 17:29
Base automatically changed from alex/split-reachability to main April 12, 2026 11:51
@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch 5 times, most recently from 6a81614 to ba401ec Compare April 12, 2026 16:46
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This looks good.

It's s a shame that we'll lose most of Rust's unused code detection because everything is now pub.

I'd prefer if we moved PythonPlatform to ty_python_core, considering that it isn't used within ty_site_packages and it simplifies ty_site_packages's dependencies.

I'd also prefer if we remove the testing feature from ty_python_core (at least if my understanding of what we use it for is correct, see my inline comment)

I'm not looking forward to my merge conflicts :(

Comment thread crates/ty/tests/file_watching.rs Outdated
Comment on lines +325 to +329
pub(crate) fn of_node(
db: &'db dyn Db,
node: &NodeWithScopeKind,
index: &SemanticIndex<'db>,
) -> Option<Self> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why of_node and not from_node? We could by using an extension trait, but I don't think it's worth it here. This one method seems perfectly fine.

Comment on lines +806 to +834
pub(crate) fn is_reachable<'db>(
db: &'db dyn Db,
use_def: &UseDefMap<'db>,
reachability: ScopedReachabilityConstraintId,
) -> bool {
evaluate_reachability(db, use_def, reachability).may_be_true()
}

pub(crate) fn binding_reachability<'db, 'map>(
db: &'db dyn Db,
use_def: &'map UseDefMap<'db>,
binding: &BindingWithConstraints<'map, 'db>,
) -> Truthiness {
use_def.reachability_constraints().evaluate(
db,
use_def.predicates(),
binding.reachability_constraint,
)
}

pub(crate) fn evaluate_reachability(
db: &dyn Db,
use_def: &UseDefMap,
reachability: ScopedReachabilityConstraintId,
) -> Truthiness {
use_def
.reachability_constraints()
.evaluate(db, use_def.predicates(), reachability)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume those aren't used enough to justify an extension trait?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was mainly that the ergonomics of these APIs actually seem clearer as standalone functions than as methods. And that I prefer to avoid extension traits unless there's a clear improvement in ergonomics, since the fact that the trait has to be in scope for the methods to be used can make the methods less discoverable

Comment thread crates/ty_python_core/src/platform.rs
Comment thread crates/ty_python_core/src/scope.rs Outdated
@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch 5 times, most recently from 92c9a0b to 212cc62 Compare April 13, 2026 10:55
@AlexWaygood AlexWaygood force-pushed the alex/break-out-index branch from 212cc62 to 150cf4b Compare April 13, 2026 11:02
@AlexWaygood AlexWaygood merged commit 4619940 into main Apr 13, 2026
56 checks passed
@AlexWaygood AlexWaygood deleted the alex/break-out-index branch April 13, 2026 11:39
AlexWaygood added a commit that referenced this pull request Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants