fix: export operator call instantiations #623

Draft
voltjia wants to merge 2 commits into master from fix/export-call-instantiations

Conversation

voltjia (Collaborator) commented May 28, 2026

Summary

  • Reverts the public infini::ops::functional layer added by feat: add public C++ operator API #618.
  • Generates explicit Operator<Op>::Call template instantiations into libinfiniops.so and generated extern template declarations for C++ consumers.
  • Restores #include <infini/ops.h> as the public C++ entrypoint for existing operator classes and adds an external C++ smoke test for infini::ops::Add::Call.

Motivation

Closes #593

Downstream C++ consumers should not need backend kernel headers or vendor compilers just to call existing Operator<Op>::Call APIs. Explicit template instantiation keeps the existing operator class API and moves backend-dependent instantiation into libinfiniops.so, avoiding the extra functional wrapper layer.
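The mechanism described above can be sketched in a single file. This is a minimal illustration under stated assumptions, not the code generated by this PR: `Tensor`, `Add::Kernel`, and `AddViaLibrary` are hypothetical stand-ins, and in a real project the three sections would live in the public header, the consumer's source file, and a source file compiled into the library, respectively.

```cpp
#include <cassert>

namespace eti_sketch {

// Hypothetical stand-in for a tensor handle; not the real library type.
struct Tensor {
  float* data;
};

// ---- Section 1: what a public header could expose (declarations only) ----
template <typename Op>
struct Operator {
  // Declared but not defined here, so consumers never see backend code.
  template <typename... Args>
  static void Call(const Args&... args);
};

struct Add : Operator<Add> {
  // Hypothetical backend entry point; its body lives in the library section.
  static void Kernel(const Tensor& a, const Tensor& b, const Tensor& out);
};

// Explicit instantiation declaration: suppresses implicit instantiation in
// the consumer and promises that the definition is provided elsewhere.
extern template void Operator<Add>::Call<Tensor, Tensor, Tensor>(
    const Tensor&, const Tensor&, const Tensor&);

// ---- Section 2: consumer code, compiled without the template body ----
inline float AddViaLibrary(float x, float y) {
  float result = 0.0f;
  Tensor a{&x}, b{&y}, out{&result};
  Add::Call(a, b, out);  // Resolved against the exported instantiation.
  return result;
}

// ---- Section 3: what the library would compile (definitions) ----
template <typename Op>
template <typename... Args>
void Operator<Op>::Call(const Args&... args) {
  Op::Kernel(args...);
}

void Add::Kernel(const Tensor& a, const Tensor& b, const Tensor& out) {
  *out.data = *a.data + *b.data;
}

// Explicit instantiation definition: emits the symbol the consumer links to.
template void Operator<Add>::Call<Tensor, Tensor, Tensor>(
    const Tensor&, const Tensor&, const Tensor&);

}  // namespace eti_sketch
```

Calling `eti_sketch::AddViaLibrary(2.0f, 3.0f)` goes through the explicitly instantiated `Call`. The consumer section compiles even though the body of `Call` is not visible at that point, which is the property that lets downstream code build without backend headers.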

Type of Change

  • feat — new feature / new operator / new platform
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • build / ci — build system or CI configuration
  • docs — documentation only
  • chore — tooling, formatting, or other non-code changes
  • N/A — Breaking change: this restores the existing Operator<Op>::Call surface rather than introducing an ABI break.

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
|---|---|---|---|
| CPU | Yes | tests/test_cpp_api.py: 1 passed | ssh nvidia, infiniops-ci/nvidia:latest, CPU-only build |
| NVIDIA | Yes | tests/test_cpp_api.py: 1 passed | ssh nvidia, card 6 via Docker --gpus 'device=6' |
| Iluvatar | No | Not run | Not run in this PR creation pass |
| MetaX | No | Not run | Not run in this PR creation pass |
| Cambricon | No | Not run | Not run in this PR creation pass |
| Moore | No | Not run | Not run in this PR creation pass |
| Ascend | No | Not run | Not run in this PR creation pass |
| Pybind | Yes | Build only | GENERATE_PYTHON_BINDINGS=ON, target ops built on NVIDIA |

Validation commands
# NVIDIA + CPU build/install/smoke
cmake -S . -B .eti-build -DWITH_CPU=ON -DWITH_NVIDIA=ON -DGENERATE_PYTHON_BINDINGS=OFF -DCMAKE_BUILD_TYPE=Release
cmake --build .eti-build --target infiniops -j2
cmake --install .eti-build --prefix .eti-install
INFINIOPS_INSTALL_PREFIX=/workspace/project/.eti-install pytest -q tests/test_cpp_api.py
# Result: 1 passed

# CPU-only build/install/smoke
cmake -S . -B .cpu-build -DWITH_CPU=ON -DGENERATE_PYTHON_BINDINGS=OFF -DCMAKE_BUILD_TYPE=Release
cmake --build .cpu-build --target infiniops -j2
cmake --install .cpu-build --prefix .cpu-install
INFINIOPS_INSTALL_PREFIX=/workspace/project/.cpu-install pytest -q tests/test_cpp_api.py
# Result: 1 passed

# Pybind build
cmake -S . -B .pybind-build -DWITH_CPU=ON -DWITH_NVIDIA=ON -DGENERATE_PYTHON_BINDINGS=ON -DCMAKE_BUILD_TYPE=Release
cmake --build .pybind-build --target ops -j2
# Result: built target ops

# Style checks
python -m ruff format --check scripts/generate_wrappers.py tests/test_cpp_api.py
python -m ruff check scripts/generate_wrappers.py tests/test_cpp_api.py
clang-format --dry-run --Werror include/infini/ops.h src/operator.h

Benchmark / Performance Impact

N/A — this PR changes build/codegen/linkage for C++ operator calls, not operator kernels or performance paths.

Notes for Reviewers

  • Public C++ consumers should include <infini/ops.h> to get the generated extern template declarations before calling infini::ops::<Op>::Call.
  • The generated instantiation sources include backend marker and implementation headers, so downstream consumers no longer need backend kernel headers for the covered Call signatures.
  • Operator::Call now takes const Args&... to make the instantiation signature stable across lvalue/rvalue call sites.
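The last note, on signature stability, can be illustrated with a small sketch. It assumes the previous signature used forwarding references (`Args&&...`), which the PR text does not state explicitly. `Tensor`, `ForwardingCall`, and `ConstRefCall` are hypothetical names, and the function-local `static` is only a trick to tell specializations apart by address.

```cpp
#include <cassert>

namespace call_sig_sketch {

struct Tensor {};  // Hypothetical placeholder type.

// Forwarding references: `Args` deduces to `Tensor&` for an lvalue,
// `const Tensor&` for a const lvalue, and `Tensor` for an rvalue, so each
// call shape produces a different specialization.
template <typename... Args>
const void* ForwardingCall(Args&&...) {
  static int tag = 0;  // One `tag` object per specialization.
  return &tag;
}

// Const references: `Args` deduces to `Tensor` in every case, so a single
// explicit instantiation covers all call sites.
template <typename... Args>
const void* ConstRefCall(const Args&...) {
  static int tag = 0;  // One `tag` object per specialization.
  return &tag;
}

// True when lvalue and rvalue calls hit different specializations.
inline bool ForwardingSplitsByValueCategory() {
  Tensor t;
  return ForwardingCall(t) != ForwardingCall(Tensor{});
}

// True when lvalue, rvalue, and const-lvalue calls share one specialization.
inline bool ConstRefSharesOneSpecialization() {
  Tensor t;
  const Tensor ct{};
  const void* id = ConstRefCall(t);
  return id == ConstRefCall(Tensor{}) && id == ConstRefCall(ct);
}

}  // namespace call_sig_sketch
```

With the forwarding-reference form, a library would have to export one instantiation per combination of value categories. With `const Args&...`, the exported symbol set is determined by argument types alone.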

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type.
  • Each commit message follows Conventional Commits.
  • Every commit is meaningful, well-formed, and independently reviewable.
  • No stray merge commits from master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal and scoped to reverting functional plus exporting Call instantiations.
  • No dead code, debug prints, or unowned TODOs were added.
  • No unrelated formatting churn was introduced.
  • Public API changes are intentional and covered by the external C++ smoke test.

General Code Hygiene

  • Comments were added only where the behavior is non-obvious.
  • Modified and added files end with trailing newlines.
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages use backticks where applicable.
  • Comments and error messages are in English.
  • Comments and error messages follow project conventions.

C++ Specific

  • Code follows the Google C++ Style Guide.
  • clang-format --dry-run --Werror include/infini/ops.h src/operator.h passed on ssh nvidia.
  • N/A — clang-tidy was not run in this pass; this PR does not add new kernel logic.
  • N/A — operator parameter order is unchanged.
  • No exceptions are thrown.
  • No new error or warning messages were introduced.
  • N/A — no kernel files were added or renamed.
  • N/A — no kernel/kernel launcher split was changed.
  • N/A — no constructor initializer lists were changed.
  • Namespace and spacing in touched C++ headers are formatted by clang-format.
  • N/A — no new operators were added.
  • No raw new/delete was introduced.

Python Specific

  • ruff check scripts/generate_wrappers.py tests/test_cpp_api.py passed.
  • ruff format --check scripts/generate_wrappers.py tests/test_cpp_api.py passed.
  • Comments and strings follow surrounding Python conventions.
  • Framework-specific pytest.skip conventions are preserved.
  • Function-body spacing follows the surrounding style.
  • Control-flow spacing follows the surrounding style.
  • Return statement spacing follows the surrounding style.
  • N/A — no docstrings were added.
  • N/A — the touched generator/test code follows the existing no-type-hints style.

Testing

  • N/A — full pytest was not run on every supported platform in this PR creation pass; the table above records tested and untested platforms explicitly.
  • Untested platforms are listed with explicit reasons in the table.
  • New functionality has a regression smoke test under tests/.
  • N/A — the new smoke test does not need parametrization.
  • N/A — pytest.mark.auto_act_and_assert is not applicable to this external compile/link smoke test.
  • N/A — dtype/device parameterization is not applicable to this external compile/link smoke test.
  • N/A — no new flaky test behavior was observed.
  • The external Add::Call smoke fails on the previous behavior with an empty backend dispatch and passes with this PR.

Build, CI, and Tooling

  • N/A — pip install .[dev] was not run; CMake build/install validation was run instead.
  • N/A — compile_commands.json regeneration was not separately checked.
  • N/A — no new backend/device was added.
  • Existing CUDA-like backend mutual-exclusion logic was not changed.
  • Local equivalents of clang-format.yml and ruff.yml checks passed for touched files.
  • No new runtime dependency was added.

Documentation

  • N/A — README/CONTRIBUTING updates are not required for this internal codegen/linkage fix.
  • N/A — no new operator or public utility API was added beyond restoring <infini/ops.h> as the entrypoint.
  • N/A — no breaking change is introduced.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers were committed.
  • N/A — no third-party code was added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

voltjia force-pushed the fix/export-call-instantiations branch from a02cb31 to ca40c6f on May 28, 2026 09:55

Development

Successfully merging this pull request may close these issues.

Export C API from libinfiniops.so to eliminate vendor compiler requirement for downstream consumers