Enable NVFP4 fused grouped MLP SwiGLU by sraman-rgb · Pull Request #5 · pggPL/TransformerEngine

sraman-rgb · 2026-05-23T13:31:38Z

This PR adds the incremental NVFP4 grouped MLP SwiGLU fusion changes on top of PR 2971, without modifying PR 2971 itself and without the temporary Python bulk_allocate fallback.

Changes:

- Enable NVFP4 recipes in grouped MLP fusion selection.
- Add NVFP4 forward/backward fused grouped MLP SwiGLU handling.
- Use grouped tensor GEMM for NVFP4 FC1 dgrad with grouped_fc1_weight directly.
- Fix NVFP4 discrete-input grouped GEMM layout metadata.
- Avoid the single-group split-size host sync in grouped linear.
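
For readers unfamiliar with the operation being fused, here is a minimal, unfused reference sketch of a grouped MLP with SwiGLU in plain Python. It is not the kernel added by this PR and involves no NVFP4 quantization. The function names, the list-of-lists tensor representation, and the gate-first/linear-second layout of the FC1 output are all assumptions made for illustration; the actual layout used in Transformer Engine may differ.

```python
import math

def matmul(x, w):
    """Multiply x ([m][k]) by w ([k][n]) and return an [m][n] list of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(h):
    """Apply SwiGLU to rows of width 2*d.

    Assumed layout (hypothetical): first d columns are the gate,
    last d columns are the linear branch.
    """
    d = len(h[0]) // 2
    return [[silu(row[j]) * row[d + j] for j in range(d)] for row in h]

def grouped_mlp_swiglu(x, split_sizes, fc1_weights, fc2_weights):
    """Run FC1 -> SwiGLU -> FC2 independently on each group of rows.

    x is split along its first dimension according to split_sizes, and
    group i uses fc1_weights[i] ([k][2*d]) and fc2_weights[i] ([d][n]).
    """
    assert sum(split_sizes) == len(x), "split sizes must cover all rows"
    out = []
    start = 0
    for size, w1, w2 in zip(split_sizes, fc1_weights, fc2_weights):
        rows = x[start:start + size]
        start += size
        if not rows:
            continue  # an empty group contributes no output rows
        out.extend(matmul(swiglu(matmul(rows, w1)), w2))
    return out
```

A fused implementation computes the same per-group result while avoiding the separate passes over the intermediate FC1 output; this sketch is only useful as a mental model or as a reference to compare small cases against.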

Validation:

NVTE_GROUPED_LINEAR_SINGLE_PARAM=1 NVTE_CUTEDSL_FUSED_GROUPED_MLP=1 python3 -m pytest -q --tb=short -ra tests/pytorch/test_fusible_ops.py::TestSequentialModules::test_grouped_mlp -k 'scaled_swiglu and nvfp4' -> 48 passed, 336 skipped
NVTE_GROUPED_LINEAR_SINGLE_PARAM=1 NVTE_CUTEDSL_FUSED_GROUPED_MLP=1 python3 -m pytest -q --tb=short -ra tests/pytorch/test_fusible_ops.py::TestSequentialModules::test_grouped_mlp -> 1216 passed, 4544 skipped
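
The two validation commands differ only in the `-k` filter, so they can be driven from one small helper. The sketch below is an illustration, not part of the PR: `build_invocation` and `run` are hypothetical names, and it uses `sys.executable` in place of the literal `python3` from the commands above. It assumes it is run from the root of a checkout that contains the test file and on hardware the tests support.

```python
import os
import subprocess
import sys

# Test node ID taken from the validation commands in the PR description.
TEST_ID = "tests/pytorch/test_fusible_ops.py::TestSequentialModules::test_grouped_mlp"

def build_invocation(keyword=None):
    """Return (argv, env) for the grouped MLP test run.

    keyword, if given, is passed to pytest's -k option,
    for example "scaled_swiglu and nvfp4".
    """
    env = dict(os.environ)
    # Environment variables set in the PR's validation commands.
    env["NVTE_GROUPED_LINEAR_SINGLE_PARAM"] = "1"
    env["NVTE_CUTEDSL_FUSED_GROUPED_MLP"] = "1"
    argv = [sys.executable, "-m", "pytest", "-q", "--tb=short", "-ra", TEST_ID]
    if keyword:
        argv += ["-k", keyword]
    return argv, env

def run(keyword=None):
    """Run pytest with the settings above and return its exit code."""
    argv, env = build_invocation(keyword)
    return subprocess.run(argv, env=env, check=False).returncode
```

Calling `run("scaled_swiglu and nvfp4")` corresponds to the first command and `run()` to the second.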

Enable NVFP4 fused grouped MLP SwiGLU

d102471

sraman-rgb closed this May 23, 2026

sraman-rgb wants to merge 1 commit into pggPL:grouped_gemm_nvfp4_and_hopper from sraman-rgb:nvfp4-grouped-mlp-fc1-swiglu
