[cute] Cleaner LICM: alias DCE + FMA-friendly scale hoist by oulgen · Pull Request #2575 · pytorch/helion

oulgen · 2026-05-25T04:23:03Z

Stacked PRs:

[cute] Cleaner LICM: alias DCE + FMA-friendly scale hoist

Extends `hoist_loop_invariant_recip.py` with four sub-passes, run in dependency order:

1. Alias DCE: inlines SSA-style `NAME = ANOTHER_NAME` chains so that `mi_copy_1 = mi; mi_copy_1_0 = mi_copy_1` collapses to the root `mi`. This removes the per-iteration copy instructions that Helion's SSA maintenance leaves behind.
2. Outer-in reciprocal hoist (was inner-first in P16): places `inv = 1.0 / di` at the outermost legal scope, so we no longer emit cascade aliases like `_helion_inv_div_0 = 1.0 * _helion_inv_div_1` at each nested loop level.
3. FMA-friendly scale hoist: detects `(A - INV) * CONST` where `INV` is loop-invariant, emits `scaled_K = INV * CONST` above the loop, and rewrites the loop body to `A * CONST - scaled_K`. The result is algebraically the same, with fewer per-iteration instructions and a multiply-subtract shape that can map to an FMA.
4. DCE for dead pure assigns: removes `v_N = pure-expr` assignments whose target is never read afterwards (e.g. the `v_10 = v_9 - mi` that the FMA hoist supersedes). Iterates to a fixed point.
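As a rough illustration of the alias-inlining and dead-assign passes listed above, here is a minimal sketch built on Python's standard `ast` module. It is not the code from this PR: `inline_aliases`, `drop_dead_assigns`, `is_pure`, and the sample `SRC` are hypothetical names, and the sketch assumes straight-line code in which every name is assigned at most once.

```python
import ast


def inline_aliases(src: str) -> str:
    """Collapse NAME = OTHER_NAME chains to their root (hypothetical helper).

    Assumes straight-line, single-assignment code; no loops or reassignment.
    """
    tree = ast.parse(src)
    alias = {}  # alias name -> root name
    kept = []
    for stmt in tree.body:
        # Rewrite every read so it refers to the root of its alias chain.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                node.id = alias.get(node.id, node.id)
        is_alias = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Name)
        )
        if is_alias:
            # The right-hand side was already resolved to the root above.
            alias[stmt.targets[0].id] = stmt.value.id
            continue  # drop the copy statement itself
        kept.append(stmt)
    tree.body = kept
    return ast.unparse(tree)


def is_pure(expr: ast.expr) -> bool:
    """Conservative purity check: names, constants, and arithmetic only."""
    allowed = (ast.Name, ast.Constant, ast.BinOp, ast.UnaryOp,
               ast.operator, ast.unaryop, ast.expr_context)
    return all(isinstance(n, allowed) for n in ast.walk(expr))


def drop_dead_assigns(src: str, live_out: set[str]) -> str:
    """Remove pure assignments whose target is never read; repeat until stable."""
    tree = ast.parse(src)
    changed = True
    while changed:
        changed = False
        reads = set(live_out)
        for stmt in tree.body:
            for n in ast.walk(stmt):
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load):
                    reads.add(n.id)
        for stmt in list(tree.body):
            if (
                isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and stmt.targets[0].id not in reads
                and is_pure(stmt.value)
            ):
                tree.body.remove(stmt)
                changed = True  # removing one assign may expose another
    return ast.unparse(tree)


# Sample input modelled on the names in the description (illustrative only).
SRC = """\
mi_copy_1 = mi
mi_copy_1_0 = mi_copy_1
v_9 = x * 2.0
v_10 = v_9 - mi_copy_1_0
out = v_9 * 1.4427 - scaled
"""

result = drop_dead_assigns(inline_aliases(SRC), live_out={"out"})
print(result)
```

The dead-assign pass is rerun until nothing changes because deleting one assignment can leave its operands unread in turn, which is presumably why the description calls for a fixed point.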

A correctness piece (rename-aware invariance) plumbs the rename group map from `DeviceFunction` so that `v_1_0` (which a post-pass renames to `mi`) is recognized as the same variable as `mi`. Without this, the FMA hoist would lift `mi * 1.4427` above the reduce loop and capture the stale initial `-inf`, producing wildly wrong softmax outputs.
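To make the hazard concrete, the following sketch compares the exponent argument computed three ways: the original `(x - mi) * CONST`, the hoisted form with `mi * CONST` placed after the reduce loop, and the same hoist placed above the reduce loop. The function names and input values are hypothetical, and plain Python floats stand in for the generated kernel code.

```python
import math

LOG2E = 1.4427  # constant as written in the PR description


def row_max(xs):
    mi = -math.inf
    for x in xs:  # reduce loop: mi is written here, so it is not invariant
        mi = max(mi, x)
    return mi


def exponents_reference(xs):
    """Original form: (x - mi) * CONST inside the consume loop."""
    mi = row_max(xs)
    return [(x - mi) * LOG2E for x in xs]


def exponents_hoisted(xs):
    """Legal hoist: scaled is computed after the reduce loop has finished."""
    mi = row_max(xs)
    scaled = mi * LOG2E  # invariant with respect to the consume loop only
    return [x * LOG2E - scaled for x in xs]


def exponents_hoisted_stale(xs):
    """Illegal hoist: scaled is computed above the reduce loop."""
    mi = -math.inf
    scaled = mi * LOG2E  # captures the initial -inf
    for x in xs:
        mi = max(mi, x)
    return [x * LOG2E - scaled for x in xs]


xs = [1.0, 3.5, -2.0]
print(exponents_reference(xs))
print(exponents_hoisted(xs))
print(exponents_hoisted_stale(xs))
```

The first two lists agree up to floating-point rounding, while the stale variant yields `inf` for every element, which is consistent with the "wildly wrong" outputs described above.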

Bench (B200, fp16, HELION_AUTOTUNE_EFFORT=quick):
Shape          Pre-P17     Post-P17    vs ATen
(4096, 256)    84 GB/s     90 GB/s     0.18x (launcher bound)
(4096, 6400)   1384 GB/s   1422 GB/s   1.47x (beats ATen by 47%)
(4096, 12672)  1719 GB/s   1747 GB/s   0.79x
average        1062 GB/s   1087 GB/s   0.81x

Cumulative: 0.45x -> 0.81x ATen (+80% from baseline).

Tests added (TestCuteHoistLoopInvariantP17, 6 tests):

- useless_cascade_alias_removed
- ssa_alias_chain_inlined
- fma_scale_hoist_above_consume
- fma_scale_hoist_in_reduce_v_loop
- dce_removes_dead_sub_after_fma_hoist
- invariance_canonicalization_does_not_break_consume

jansel

Fix test failures before landing

stack-info: PR: #2575, branch: oulgen/stack/317

oulgen force-pushed the oulgen/stack/317 branch from 3c4029e to 3d64316 Compare May 25, 2026 04:23

oulgen force-pushed the oulgen/stack/316 branch from fbd0549 to 2221fb6 Compare May 25, 2026 04:23

meta-cla Bot added the CLA Signed label May 25, 2026

oulgen requested a review from jansel May 25, 2026 04:24

jansel approved these changes May 25, 2026

oulgen marked this pull request as draft May 25, 2026 16:41

oulgen changed the base branch from oulgen/stack/316 to main May 25, 2026 16:41

oulgen force-pushed the oulgen/stack/317 branch 2 times, most recently from 5b8b743 to 301c843 Compare May 25, 2026 16:42

oulgen mentioned this pull request May 25, 2026

[cute] skip test_xsa on cute #2581

Merged

oulgen changed the base branch from main to oulgen/stack/316 May 25, 2026 16:42

oulgen marked this pull request as ready for review May 25, 2026 16:42

oulgen marked this pull request as draft May 25, 2026 17:11

oulgen changed the base branch from oulgen/stack/316 to main May 25, 2026 17:11

oulgen force-pushed the oulgen/stack/317 branch from 301c843 to ffc20a2 Compare May 25, 2026 17:11

oulgen changed the base branch from main to oulgen/stack/316 May 25, 2026 17:11

oulgen marked this pull request as ready for review May 25, 2026 17:12

oulgen force-pushed the oulgen/stack/316 branch from ae85a58 to 8a8c885 Compare May 25, 2026 17:44

oulgen force-pushed the oulgen/stack/317 branch from ffc20a2 to 637dade Compare May 25, 2026 17:45

oulgen changed the base branch from oulgen/stack/316 to main May 26, 2026 19:05

oulgen force-pushed the oulgen/stack/317 branch from 637dade to 25bd005 Compare May 26, 2026 19:06

oulgen changed the base branch from main to oulgen/stack/316 May 26, 2026 19:06

oulgen marked this pull request as ready for review May 26, 2026 19:06

oulgen marked this pull request as draft May 26, 2026 19:36

oulgen changed the base branch from oulgen/stack/316 to main May 26, 2026 19:36

oulgen force-pushed the oulgen/stack/317 branch from 25bd005 to 161a763 Compare May 26, 2026 19:36

oulgen changed the base branch from main to oulgen/stack/316 May 26, 2026 19:37

oulgen marked this pull request as ready for review May 26, 2026 19:37

oulgen marked this pull request as draft May 26, 2026 20:24

oulgen changed the base branch from oulgen/stack/316 to main May 26, 2026 20:24

oulgen force-pushed the oulgen/stack/317 branch from 161a763 to c954159 Compare May 26, 2026 20:24

oulgen changed the base branch from main to oulgen/stack/316 May 26, 2026 20:24

oulgen marked this pull request as ready for review May 26, 2026 20:25

oulgen marked this pull request as draft May 26, 2026 20:26

oulgen changed the base branch from oulgen/stack/316 to main May 26, 2026 20:26

oulgen force-pushed the oulgen/stack/317 branch from c954159 to 07e5bcf Compare May 26, 2026 20:26

oulgen changed the base branch from main to oulgen/stack/316 May 26, 2026 20:27

oulgen marked this pull request as ready for review May 26, 2026 20:27

oulgen marked this pull request as draft May 26, 2026 20:31

oulgen changed the base branch from oulgen/stack/316 to main May 26, 2026 20:31

oulgen force-pushed the oulgen/stack/317 branch from 07e5bcf to 0b4d2de Compare May 26, 2026 20:31

oulgen changed the base branch from main to oulgen/stack/316 May 26, 2026 20:31

oulgen marked this pull request as ready for review May 26, 2026 20:32

oulgen marked this pull request as draft May 26, 2026 20:58

oulgen changed the base branch from oulgen/stack/316 to main May 26, 2026 20:58

oulgen force-pushed the oulgen/stack/317 branch from 0b4d2de to 79f419b Compare May 26, 2026 20:58

oulgen changed the base branch from main to oulgen/stack/316 May 26, 2026 20:58

oulgen marked this pull request as ready for review May 26, 2026 20:58

oulgen wants to merge 1 commit into oulgen/stack/316 from oulgen/stack/317