perf: Reduce Box and Arc allocation churn during tree rewriting (#21749)
## Which issue does this PR close?
- Closes #21751.
## Rationale for this change
Profiling the planner showed that a surprising amount of time was
being spent on tree rewriting in the logical optimizer. One culprit
is `TreeNodeContainer::map_elements()` for `Box<C>` and `Arc<C>`, which
do the following:
* Fetch the inner `C` value from the `Box`/`Arc`
* Pass the inner value to the closure
* Wrap the return value of the closure in a newly allocated `Box` /
`Arc`, respectively
This allocates a fresh `Box` or `Arc` for every node visited while
walking an expression or logical plan, even if the tree rewrite we're
doing didn't modify the expression/plan node.
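For illustration, the previous behavior can be sketched roughly as follows. This is a simplified stand-alone model, not the actual DataFusion code: `map_box_realloc` and `map_arc_realloc` are hypothetical helper names, and the closure signature is an assumption chosen to resemble a fallible rewrite.

```rust
use std::sync::Arc;

/// Hypothetical model of the old `Box<C>` behavior: unwrap, rewrite, re-box.
fn map_box_realloc<C, E>(
    b: Box<C>,
    f: impl FnOnce(C) -> Result<C, E>,
) -> Result<Box<C>, E> {
    // Moving the value out of the Box frees the original heap allocation.
    let inner = *b;
    // The result is wrapped in a newly allocated Box, even if `f` changed nothing.
    Ok(Box::new(f(inner)?))
}

/// Hypothetical model of the old `Arc<C>` behavior: unwrap (or clone), rewrite, re-wrap.
fn map_arc_realloc<C: Clone, E>(
    a: Arc<C>,
    f: impl FnOnce(C) -> Result<C, E>,
) -> Result<Arc<C>, E> {
    // Take the inner value if this is the only reference; otherwise clone it.
    let inner = Arc::try_unwrap(a).unwrap_or_else(|shared| (*shared).clone());
    // A fresh Arc is allocated for every node visited.
    Ok(Arc::new(f(inner)?))
}
```

In both helpers an identity closure still pays for one deallocation and one allocation per node, which is the churn described above.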
Instead, we can reuse the current `Box<C>` or `Arc<C>`: use
`std::mem::take()` to swap the inner value with `C::default()`, pass the
inner value to the closure, and put the result back in the original
container. Swapping the inner value with `C::default()` means the
container always has a valid value, which is important if the closure
panics.
For `Arc<C>`, we need to use `Arc::make_mut()`, which only clones if the
`Arc` is not unique.
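A minimal sketch of the in-place approach, under the same caveat: `map_box_in_place` and `map_arc_in_place` are hypothetical names for illustration, and the real `map_elements()` signature may differ.

```rust
use std::sync::Arc;

/// Hypothetical helper: rewrite the boxed value while reusing the Box allocation.
fn map_box_in_place<C: Default, E>(
    mut b: Box<C>,
    f: impl FnOnce(C) -> Result<C, E>,
) -> Result<Box<C>, E> {
    // Swap the inner value with `C::default()` so the Box always holds a
    // valid value, even if `f` panics.
    let inner = std::mem::take(&mut *b);
    // Write the result back into the existing allocation.
    *b = f(inner)?;
    Ok(b)
}

/// Hypothetical helper: rewrite the value behind an Arc, cloning only if shared.
fn map_arc_in_place<C: Default + Clone, E>(
    mut a: Arc<C>,
    f: impl FnOnce(C) -> Result<C, E>,
) -> Result<Arc<C>, E> {
    // `Arc::make_mut` yields `&mut C`; it clones into a new allocation only
    // when other strong references to the same value exist.
    let slot = Arc::make_mut(&mut a);
    let inner = std::mem::take(slot);
    *slot = f(inner)?;
    Ok(a)
}
```

When the `Arc` is unique, the pointer returned by `Arc::as_ptr` is the same before and after the call. When it is shared, the other holders keep the old value and the caller gets a new allocation.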
This reduces the bytes allocated to plan TPC-H Q13 by ~22% (988 kB ->
765 kB), and reduces allocated blocks by 8.5% (210k -> 192k).
## What changes are included in this PR?
* Optimize `Box<C>::map_elements()` and `Arc<C>::map_elements()` as
described above
* Change `map_children()` for `Expr::Alias` to use `map_elements()`,
rather than invoking `f(*expr)` directly; this ensures that it can take
advantage of this optimization
* Make `LogicalPlan::default()` use a shared `DFSchema`, rather than
allocating a fresh `DFSchema` for every call. Because `default()` is now
in the hot path for tree rewriting, it is important that it is cheap
* Add unit tests for new `map_elements()` behavior
* Add note to migration guide for breaking API change
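The shared-schema `default()` from the list above could look roughly like the sketch below. `Schema` and `Plan` are simplified stand-ins for `DFSchema` and `LogicalPlan`, and the use of `OnceLock` is an assumption about one reasonable way to share the value, not necessarily what the PR does.

```rust
use std::sync::{Arc, OnceLock};

/// Stand-in for `DFSchema` (hypothetical, heavily simplified).
#[derive(Debug, Default)]
struct Schema {
    fields: Vec<String>,
}

/// Stand-in for `LogicalPlan` (hypothetical, heavily simplified).
#[derive(Debug, Clone)]
enum Plan {
    Empty { schema: Arc<Schema> },
    Scan { table: String, schema: Arc<Schema> },
}

impl Plan {
    fn schema(&self) -> &Arc<Schema> {
        match self {
            Plan::Empty { schema } | Plan::Scan { schema, .. } => schema,
        }
    }
}

/// Returns a handle to one process-wide empty schema; after the first call
/// this only bumps a reference count instead of allocating.
fn shared_empty_schema() -> Arc<Schema> {
    static EMPTY: OnceLock<Arc<Schema>> = OnceLock::new();
    Arc::clone(EMPTY.get_or_init(|| Arc::new(Schema::default())))
}

impl Default for Plan {
    fn default() -> Self {
        Plan::Empty { schema: shared_empty_schema() }
    }
}
```

With this shape, every `Plan::default()` placeholder left behind by `std::mem::take()` points at the same schema allocation.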
## Are these changes tested?
Yes, by the existing tests plus newly added unit tests.
## Are there any user-facing changes?
Yes: `TreeNodeContainer` impls for `Box<C>` and `Arc<C>` now require `C:
Default`. This is a breaking API change for third-party code that
implements `TreeNodeContainer` for a custom type. The fix is usually
straightforward.
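As a hedged example of such a fix, assuming the affected code stores a custom element type in a `Box` or `Arc`, adding a cheap `Default` implementation to that type is typically enough. `MyExpr` below is a hypothetical third-party type, not part of DataFusion.

```rust
/// Hypothetical third-party node type that is stored as `Box<MyExpr>`.
#[derive(Debug, Default, Clone, PartialEq)]
enum MyExpr {
    /// Cheap placeholder used as the default; it only exists transiently
    /// while a rewrite closure holds the real value.
    #[default]
    Placeholder,
    Literal(i64),
    Negate(Box<MyExpr>),
}
```

The default value should be cheap to construct, since it is created once per visited node during a rewrite.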