Pull requests: pytorch/helion
[pallas] direct-call hot-path squeezes: output cache + sig-lock + closure baking + speculative dispatch
CLA Signed
This label is managed by the Meta Open Source bot.
[pallas] bypass JaxCallable for static-shape kernels via direct call_custom_kernel
CLA Signed
[pallas] fast-path host-side launcher with pre-computed cache entries
CLA Signed
[pallas] matmul pipeline launcher VMEM strips + outer-grid strategy
CLA Signed
[Pallas] Fix synchronize_device skipping sync for tuple returns on TPU
CLA Signed | #2580 opened May 25, 2026 by thcmbs (Collaborator)
[test] Split cute compiler-pass tests by pass, drop kernel name from filenames
CLA Signed | #2579 opened May 25, 2026 by oulgen (Contributor)
[cute] Rewrite online softmax_two_pass → equivalent 3-pass form
CLA Signed | #2578 opened May 25, 2026 by oulgen (Contributor)
[cute] Optional carried-only gate for the load pipeline pass
CLA Signed | #2577 opened May 25, 2026 by oulgen (Contributor)
[cute] Software-pipeline inner vec loads to hide HBM latency
CLA Signed | #2576 opened May 25, 2026 by oulgen (Contributor)
[cute] Cleaner LICM: alias DCE + FMA-friendly scale hoist
CLA Signed | #2575 opened May 25, 2026 by oulgen (Contributor)
[cute] LICM for reciprocals: hoist 1/divisor out of inner loops
CLA Signed | #2574 opened May 25, 2026 by oulgen (Contributor)
Fix fbcode CI torch.compile fusion with newer PyTorch
CLA Signed | #2567 opened May 23, 2026 by choijon5 (Contributor)
Speed up Helion kernel launches by avoiding repeated Python work
CLA Signed
[Pallas] Fix layernorm example tolerances and split bwd test
CLA Signed
[Pallas] Propagate inner tile alignment min_size to bounding outer tiles
CLA Signed | #2559 opened May 22, 2026 by thcmbs (Collaborator)
[Pallas] Add support for non zero dim in gather
CLA Signed | #2558 opened May 22, 2026 by thcmbs (Collaborator)
Reject tensor_descriptor indexing when block size exceeds tensor dim (#2555)
CLA Signed, fb-exported, meta-exported | #2555 opened May 22, 2026 by mengluy0125 (Contributor)
Skip even more Python on repeated identical calls
CLA Signed
Reuse kernel output buffers instead of allocating fresh on every call
CLA Signed
Use the fast launcher during autotuning
CLA Signed
Add a C extension so launches skip more Python frames
CLA Signed
Speed up Helion kernel launches by avoiding repeated Python work
CLA Signed
[WIP] Pallas grid index map fp8 attention
CLA Signed
[WIP] Fix Pallas grid index BlockSpecs
CLA Signed
[Pallas] Reclaim HBM between kernels in run_tpu.py sweep
CLA Signed