feat(cambricon): add SwigluOp in Cambricon by bitzyz · Pull Request #37 · InfiniTensor/InfiniOps

bitzyz · 2026-03-30T06:41:54Z

Summary

Add Cambricon MLU backend implementation for the Swiglu operator (src/native/cambricon/ops/swiglu/swiglu.h, src/native/cambricon/ops/swiglu/kernel.mlu)
Supported data types: float16, bfloat16, float32
Includes a fast path for contiguous tensors with matching shapes (no broadcast) and a general path handling non-contiguous tensors and broadcasting
For float32, computes SwiGLU element-wise as input * gate * sigmoid(gate) using a scalar loop; for float16/bfloat16, uses the BANG intrinsics __bang_active_sigmoid and __bang_mul
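As a reference for the math in the summary, here is a minimal host-side C++ sketch of the element-wise SwiGLU computation. It is an illustration, not the MLU kernel from this PR. `swiglu_scalar` and `swiglu_staged` are hypothetical names, and everything runs on the CPU in `float`. The staged variant only mimics the order of whole-buffer steps (sigmoid, then two multiplies) that a path built on `__bang_active_sigmoid` and `__bang_mul` would plausibly follow. The intrinsics themselves are not called, and their real signatures are not shown.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference, analogous to the float32 scalar loop described above:
// out = input * gate * sigmoid(gate).
inline float swiglu_scalar(float input, float gate) {
  const float sigmoid = 1.0f / (1.0f + std::exp(-gate));
  return input * gate * sigmoid;
}

// Staged variant: whole-buffer passes in the order a vector-intrinsic
// implementation might use (hypothetical; no BANG intrinsics are called).
// Assumes `input` and `gate` have the same size (the no-broadcast fast path).
inline std::vector<float> swiglu_staged(const std::vector<float>& input,
                                        const std::vector<float>& gate) {
  const std::size_t n = gate.size();
  std::vector<float> tmp(n);
  // Pass 1: tmp = sigmoid(gate).
  for (std::size_t i = 0; i < n; ++i) tmp[i] = 1.0f / (1.0f + std::exp(-gate[i]));
  // Pass 2: tmp = tmp * gate, i.e. SiLU(gate).
  for (std::size_t i = 0; i < n; ++i) tmp[i] *= gate[i];
  // Pass 3: tmp = tmp * input, the final SwiGLU output.
  for (std::size_t i = 0; i < n; ++i) tmp[i] *= input[i];
  return tmp;
}
```

Because the two variants multiply in a different order, their float results can differ in the last bits. A comparison between them, or against the MLU output, should therefore use a tolerance rather than exact equality.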

Motivation

Extends operator coverage to the Cambricon platform by implementing the element-wise Swiglu operator using BANG MLU kernels.

Type of Change

feat — new feature / new operator / new platform

Platforms Affected

Test Results on Supported Platforms

Platform	Built	`pytest` Result	Notes / Hardware
NVIDIA	N/A	N/A	Not affected
Iluvatar	N/A	N/A	Not affected
MetaX	N/A	N/A	Not affected
Cambricon	✅	✅ Passed	Tested locally
Moore	N/A	N/A	Not affected
Ascend	N/A	N/A	Not affected

Benchmark / Performance Impact

N/A

Notes for Reviewers

The kernel uses the Union1 task type (a single cluster) for simplicity. Multi-cluster support can be added as a follow-up if needed.
The workspace stores shape/stride metadata for device-side access during kernel execution.
No test file changes were needed — the existing test_swiglu.py parameterization covers the MLU device through the default device/dtype fixtures.
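The general path, the single-cluster Union1 launch, and the workspace metadata mentioned in the notes can be pictured together with a host-side C++ sketch. It shows one common way to index non-contiguous, broadcast tensors and to split the output among tasks. It is an assumption-laden illustration rather than the code in kernel.mlu. `TensorMeta`, `offset_of`, `task_range`, and `swiglu_general` are hypothetical, and the even split across `task_dim` tasks is a guess at the partitioning scheme. The output is assumed contiguous, and broadcasting is modeled in the usual way by giving a broadcast dimension a stride of 0.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-tensor metadata, standing in for the shape/stride data the
// PR stores in the workspace. Strides are in elements; a broadcast dimension
// has stride 0.
struct TensorMeta {
  std::vector<std::int64_t> strides;
};

// Maps a row-major linear index over `out_shape` to an element offset in a
// tensor described by `meta`.
inline std::int64_t offset_of(std::int64_t linear,
                              const std::vector<std::int64_t>& out_shape,
                              const TensorMeta& meta) {
  std::int64_t offset = 0;
  for (std::size_t d = out_shape.size(); d-- > 0;) {
    const std::int64_t coord = linear % out_shape[d];
    linear /= out_shape[d];
    offset += coord * meta.strides[d];
  }
  return offset;
}

// Splits `n` elements as evenly as possible among `task_dim` tasks and returns
// the half-open range [begin, end) owned by `task_id`.
inline std::pair<std::int64_t, std::int64_t> task_range(std::int64_t n,
                                                        std::int64_t task_id,
                                                        std::int64_t task_dim) {
  const std::int64_t base = n / task_dim;
  const std::int64_t rem = n % task_dim;
  const std::int64_t begin = task_id * base + (task_id < rem ? task_id : rem);
  const std::int64_t end = begin + base + (task_id < rem ? 1 : 0);
  return {begin, end};
}

// Host-side stand-in for the general path: each task gathers `input` and
// `gate` through their strides and writes its slice of a contiguous `out`.
inline void swiglu_general(const float* input, const TensorMeta& input_meta,
                           const float* gate, const TensorMeta& gate_meta,
                           float* out,
                           const std::vector<std::int64_t>& out_shape,
                           std::int64_t task_dim) {
  std::int64_t n = 1;
  for (std::int64_t s : out_shape) n *= s;
  // On the device the tasks run concurrently; here they run one after another.
  for (std::int64_t task_id = 0; task_id < task_dim; ++task_id) {
    const std::pair<std::int64_t, std::int64_t> range =
        task_range(n, task_id, task_dim);
    for (std::int64_t i = range.first; i < range.second; ++i) {
      const float x = input[offset_of(i, out_shape, input_meta)];
      const float g = gate[offset_of(i, out_shape, gate_meta)];
      // x * g * sigmoid(g), written as x * g / (1 + exp(-g)).
      out[i] = x * g / (1.0f + std::exp(-g));
    }
  }
}
```

On the device, each core of the cluster would evaluate only its own `task_range`. The outer loop over `task_id` exists only so the sketch runs serially on a CPU.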

Checklist

Title, Branch, and Commits

PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
Each commit message follows Conventional Commits.
Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
No stray merge commits from master — the branch is rebased cleanly on top of the current master.
No fixup! / squash! / wip commits remain.

Scope and Design

Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
No dead code, commented-out blocks, or debug prints (printf/std::cout/print(...)) are left behind, and no TODO lacks an owner and issue link.
No unrelated formatting churn that would obscure the diff.
Public API changes (if any) are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene (applies to all languages)

The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
All comments and error messages are in English (CONTRIBUTING.md §Code/General).
Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

Python Specific (if Python files changed)

N/A — no Python files were changed in this PR.

Testing

pytest was run locally on every supported platform that this PR can affect, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests).
For any platform that could not be tested, an explicit reason is given in the table and a reviewer with access has been tagged.
New functionality has matching tests under tests/ following tests/test_add.py / tests/test_gemm.py patterns (CONTRIBUTING.md §Adding an Operator).
Tests use pytest.mark.parametrize correctly: dependent parameters share one decorator (e.g. @pytest.mark.parametrize("dtype, rtol, atol", …)), independent parameters use separate decorators ordered by parameter declaration.
Where appropriate, pytest.mark.auto_act_and_assert is used and the test returns a Payload whose func and ref share the same calling convention.
Default dtype / device parameterization is relied on, or overridden with an explicit pytest.mark.parametrize when necessary.
Any new test that is flaky under parallelism is marked so, or documented to require pytest -n 1.
For bug fixes: a regression test has been added that fails on master and passes with this PR.

Build, CI, and Tooling

The project builds cleanly from a fresh directory with pip install .[dev] on at least one affected platform.
compile_commands.json still regenerates (CMake option CMAKE_EXPORT_COMPILE_COMMANDS=ON in pyproject.toml — required by the code-lint skill and clang-tidy -p).
New backends / devices have been added to auto-detection in CMakeLists.txt under if(AUTO_DETECT_DEVICES) and to if(AUTO_DETECT_BACKENDS) if applicable.
Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not broken.
Both CI workflows (clang-format.yml, ruff.yml) pass locally (or are expected to pass on CI).
No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies] (or justified in the PR description).

Documentation

README.md, CONTRIBUTING.md, or inline docs updated when behavior, build flags, or developer workflow changed.
New operators, new dispatch helpers, or new public utilities are documented (docstring, header comment, or an addition to CONTRIBUTING.md §Some Code Explanations).
Any user-visible breaking change is called out explicitly under "Motivation" and in the commit/PR title with a ! or BREAKING CHANGE: footer.

Security and Safety

No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
Third-party code is license-compatible and attributed.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

bitzyz self-assigned this Mar 30, 2026

bitzyz requested review from Ziminli and voltjia March 30, 2026 06:42

bitzyz force-pushed the feat/dev-swiglu-cambricon branch from bed0481 to 15cd664 March 30, 2026 06:47

bitzyz force-pushed the feat/dev-swiglu-cambricon branch from 15cd664 to 607361a April 8, 2026 03:18

bitzyz changed the base branch from feat/dev-infra to master April 8, 2026 03:19

bitzyz marked this pull request as draft April 29, 2026 08:45

bitzyz force-pushed the feat/dev-swiglu-cambricon branch from 607361a to c16b0a6 May 28, 2026 03:14

Commit d8ea2a7: feat(cambricon): add Swiglu op in Cambricon

bitzyz force-pushed the feat/dev-swiglu-cambricon branch from c16b0a6 to d8ea2a7 May 28, 2026 03:19

bitzyz changed the title from "feat: add cambricon swiglu op" to "feat(cambricon): add SwigluOp in Cambricon" May 28, 2026

bitzyz wants to merge 1 commit into master from feat/dev-swiglu-cambricon
