[ty] Break the semantic index out into its own crate#24471
Conversation
|
| Project | Old Time | New Time | Change |
|---|---|---|---|
egglog-python |
0.28s | 0.44s | +56% |
pyjwt |
0.05s | 0.08s | +54% |
com2ann |
0.06s | 0.08s | +51% |
|
Typing conformance resultsNo changes detected ✅Current numbersThe percentage of diagnostics emitted that were expected errors held steady at 87.92%. The percentage of expected errors that received a diagnostic held steady at 83.11%. The number of fully passing files held steady at 78/133. |
Memory usage reportMemory usage unchanged ✅ |
8c4cbee to
e6800d8
Compare
e6800d8 to
e36173a
Compare
There was a problem hiding this comment.
This file had to be split into two: one reachability_constraints.rs file in the ty_semantic_index crate that is totally naive about types, and one reachability_constraints.rs file in ty_python_semantic that uses the data structures exposed by the ty_semantic_index module and uses them to actually evaluate reachability
| pub(crate) fn of_node( | ||
| db: &'db dyn Db, | ||
| node: &NodeWithScopeKind, | ||
| index: &SemanticIndex<'db>, | ||
| ) -> Option<Self> { |
There was a problem hiding this comment.
this is NodeWithScopeKind::generic_context() on main, but we can no longer define methods directly on NodeWithScopeKind that return a GenericContext with this refactor, so it has to become a method on GenericContext instead.
There was a problem hiding this comment.
Why of_node and not from_node? We could by using an extension trait, but I don't think it's worth it here. This one method seems perfectly fine.
There was a problem hiding this comment.
Why
of_nodeand notfrom_node?
Just because, semantically, we're not converting a node into a generic context, we're trying to construct the generic context of the node
64d5a0c to
4728bdf
Compare
|
My preference here would be to move the type stuff into |
|
The refactor I'm proposing here would split semantic analysis of a Python module between two crates: a first pass (the semantic indexing phase) in |
|
I agree with @AlexWaygood that I would find it strange to have a crate named Also agree that probably the best naming here requires not having any crate named |
I'll wait until we have consensus on this point before doing that rename, since it'll only increase the risk of painful merge conflicts on this PR. (And, in fact, I'm somewhat inclined to do that rename in a followup.) |
4728bdf to
4f9e4da
Compare
|
My concern with @AlexWaygood did you measure whether this helps with incremental compile times? I want to avoid spending time on this if it doesn't. |
4f9e4da to
99455e1
Compare
|
I tested the timings of incremental builds on diff --git a/crates/ty_python_semantic/src/types.rs b/crates/ty_python_semantic/src/types.rs
index 1a92b994cc..18bccfb807 100644
--- a/crates/ty_python_semantic/src/types.rs
+++ b/crates/ty_python_semantic/src/types.rs
@@ -157,7 +157,7 @@ pub fn check_types(db: &dyn Db, file: File) -> Vec<Diagnostic> {
for scope_id in index.scope_ids() {
// Scopes that may require type context are inferred during the inference of
// their outer scope.
- if scope_id.accepts_type_context(db) {
+ if !scope_id.accepts_type_context(db) {
continue;
}I did one measurement for both branches with |
Hmm, but surely the whole point of being able to write a type-aware linter on top of ty's semantic model is that the lints would be able to know about types? I don't see how a type-aware linter could do that without having |
I didn't. I can do so tomorrow, but I'm less worried about that anyway since I feel like changes to the semantic index are overall much less common than changes to other parts of the crate currently known as |
|
Naming aside, I think I'm in favor of this change. It is a very straightforward extraction of an existing module into a separate crate, and has a real impact on compile times right now. We can always restructure things in the future as things become more integrated. |
163229d to
509611c
Compare
MichaReiser
left a comment
There was a problem hiding this comment.
I obviously didn't review this line by line:
- You could consider using an extension trait to avoid regressing the reachability API. I think that's worth it because the new free-standing functions are rather awkward to use and discover
- I slightly prefer
ty_python_analysisoverty_python_corebecause it makes it clear that, e.g,. formatting logic should be put elsewhere ;) - Git fails to detect some moves, which is very unfortunate. Can we extract some refactors into separate PRs so that git recognizes the file move?
I still dislike Does the |
|
|
509611c to
6ce51d2
Compare
6ce51d2 to
bb76eae
Compare
6a81614 to
ba401ec
Compare
MichaReiser
left a comment
There was a problem hiding this comment.
Thank you. This looks good.
It's s a shame that we'll lose most of Rust's unused code detection because everything is now pub.
I'd prefer if we moved PythonPlatform to ty_python_core, considering that it isn't used within ty_site_packages and it simplifies ty_site_packages's dependencies.
I'd also prefer if we remove the testing feature from ty_python_core (at least if my understanding of what we use it for is correct, see my inline comment)
I'm not looking forward to my merge conflicts :(
| pub(crate) fn of_node( | ||
| db: &'db dyn Db, | ||
| node: &NodeWithScopeKind, | ||
| index: &SemanticIndex<'db>, | ||
| ) -> Option<Self> { |
There was a problem hiding this comment.
Why of_node and not from_node? We could by using an extension trait, but I don't think it's worth it here. This one method seems perfectly fine.
| pub(crate) fn is_reachable<'db>( | ||
| db: &'db dyn Db, | ||
| use_def: &UseDefMap<'db>, | ||
| reachability: ScopedReachabilityConstraintId, | ||
| ) -> bool { | ||
| evaluate_reachability(db, use_def, reachability).may_be_true() | ||
| } | ||
|
|
||
| pub(crate) fn binding_reachability<'db, 'map>( | ||
| db: &'db dyn Db, | ||
| use_def: &'map UseDefMap<'db>, | ||
| binding: &BindingWithConstraints<'map, 'db>, | ||
| ) -> Truthiness { | ||
| use_def.reachability_constraints().evaluate( | ||
| db, | ||
| use_def.predicates(), | ||
| binding.reachability_constraint, | ||
| ) | ||
| } | ||
|
|
||
| pub(crate) fn evaluate_reachability( | ||
| db: &dyn Db, | ||
| use_def: &UseDefMap, | ||
| reachability: ScopedReachabilityConstraintId, | ||
| ) -> Truthiness { | ||
| use_def | ||
| .reachability_constraints() | ||
| .evaluate(db, use_def.predicates(), reachability) | ||
| } |
There was a problem hiding this comment.
I assume those aren't used enough to justify an extension trait?
There was a problem hiding this comment.
It was mainly that the ergonomics of these APIs actually seem clearer as standalone functions than as methods. And that I prefer to avoid extension traits unless there's a clear improvement in ergonomics, since the fact that the trait has to be in scope for the methods to be used can make the methods less discoverable
92c9a0b to
212cc62
Compare
212cc62 to
150cf4b
Compare
Summary
The hope is that this might make our incremental compile times a bit faster for
ty_python_semantic. It seems like a logical split to me anyway, even if it doesn't: almost all of our machinery in thesemantic_indexsubmodule currently has no knowledge of types or type inference, and will never need to have any knowledge of types and type inference.The only bit that gets a little bit painful here is with reachability constraints. Several constraint-resolution methods on the
UseDefMaphave to be converted into standalone functions that live in thety_python_semanticcrate and receive a&UseDefMapas an argument, which isn't quite as ergonomic.One outstanding question here is whether we should rename
ty_python_semantic?ty_typesmight be a good fit if the semantic-index crate is calledty_semantic_index.ty_python_semanticandty_semantic_indexjust feel a little similar.Happily, codspeed reports no performance regressions on this PR :-)
One obvious other downside to this PR is that a bunch of things that were previously
pub(crate)now need to bepub, which will make it harder to detect unused methods and fields.Test Plan
existing tests