fix https://github.com/NVIDIA/TensorRT-Edge-LLM/issues/87: hard-error instead of silent return when CuTe DSL GEMM is not compiled #97
Open
suharvest wants to merge 1 commit into
What does this PR do?
Type of change: Bug fix
Overview:
cpp/kernels/talkerMLPKernels/talkerMLPKernels.cu wraps both runBiasSiLU and runBias in #ifdef CUTE_DSL_GEMM_ENABLED. In the #else branch (around lines 341 and 385) the code logs via LOG_ERROR(...) and then returns. Because both helpers (invokeTalkerMLP, invokeLinearLayer) return void and write their result into a pre-allocated output tensor, the caller has no way to observe the failure: the error line goes to stderr, but the program continues and the downstream Talker sampler reads whatever the buffer happened to contain. The program "succeeds" and writes a valid 24 kHz WAV, but the audio decodes to unintelligible filler.

This PR replaces the silent-return path with ELLM_CHECK(false, ...), which is the project's existing pattern for unrecoverable runtime preconditions (defined in cpp/common/checkMacros.h, already included by this file). The check fires at the first call site, raises a clear stderr message, and aborts, so a missing -DENABLE_CUTE_DSL=gemm build flag is impossible to overlook.
Usage
This change adds no new user-facing API. It hardens an existing failure mode.
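The difference between the two failure paths can be sketched in plain C++. This is an illustration, not the code in talkerMLPKernels.cu: HYPOTHETICAL_CHECK is a stand-in for ELLM_CHECK (whose real definition lives in cpp/common/checkMacros.h and is not reproduced here; it is modelled as an exception), and linearLayerSilent, linearLayerChecked, placeholderGemm and the two helper functions are hypothetical names. Only the CUTE_DSL_GEMM_ENABLED macro and the -DENABLE_CUTE_DSL=gemm flag come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ELLM_CHECK: throws when the condition is false.
#define HYPOTHETICAL_CHECK(cond, msg)                    \
    do {                                                 \
        if (!(cond)) {                                   \
            throw std::runtime_error(std::string(msg));  \
        }                                                \
    } while (0)

// Placeholder for the real GEMM path; it just doubles each input element.
inline void placeholderGemm(std::vector<float> const& in, std::vector<float>& out)
{
    for (std::size_t i = 0; i < out.size() && i < in.size(); ++i) {
        out[i] = 2.0f * in[i];
    }
}

// Old shape: log to stderr and return. The void signature hides the failure,
// and the pre-allocated output keeps whatever it held before the call.
void linearLayerSilent(std::vector<float> const& in, std::vector<float>& out)
{
#ifdef CUTE_DSL_GEMM_ENABLED
    placeholderGemm(in, out);
#else
    (void)in;
    (void)out;
    std::cerr << "CuTe DSL GEMM is not compiled\n";
    return;
#endif
}

// New shape: fail hard before any resource is acquired or any work is launched.
void linearLayerChecked(std::vector<float> const& in, std::vector<float>& out)
{
#ifdef CUTE_DSL_GEMM_ENABLED
    placeholderGemm(in, out);
#else
    (void)in;
    (void)out;
    HYPOTHETICAL_CHECK(false, "CuTe DSL GEMM is not compiled; rebuild with -DENABLE_CUTE_DSL=gemm");
#endif
}

// True when the silent path returned without touching the output buffer.
bool silentCallLeavesStaleOutput()
{
    std::vector<float> in(4, 1.0f);
    std::vector<float> out(4, -1.0f); // -1 marks stale content
    linearLayerSilent(in, out);
    return out[0] == -1.0f;
}

// True when the checked path reported the failure to the caller.
bool checkedCallFails()
{
    std::vector<float> in(4, 1.0f);
    std::vector<float> out(4, -1.0f);
    try {
        linearLayerChecked(in, out);
    } catch (std::runtime_error const&) {
        return true;
    }
    return false;
}
```

Compiled without CUTE_DSL_GEMM_ENABLED, the silent variant returns normally with the stale marker still in the buffer, while the checked variant surfaces the failure at the first call. Both variants behave identically when the macro is defined.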
Reachability + safety
- The #else branch only compiles when the operator was built without CuTe DSL support. Inside that branch nothing has launched yet (no cudaMalloc, no stream ownership transfer, no GEMM kernel launch), so throwing is leak-free.
- When CUTE_DSL_GEMM_ENABLED is defined, the diff is unreachable and behavior is unchanged for every existing caller.
🚀 Pull Request Checklist
✅ Pre-commit Checks
- pip install pre-commit
- pre-commit install
- pre-commit run --files cpp/kernels/talkerMLPKernels/talkerMLPKernels.cu passes (clang-format / codespell / license header all green)
🧪 Tests
- The change is confined to the #else branch that compiles only when the operator was built without CuTe DSL. The new behavior is "abort cleanly at first call with a clear message" instead of "return uninitialized memory and continue."
- The configuration with CuTe DSL enabled (-DENABLE_CUTE_DSL=gemm) still passes.
📄 Documentation
⚙️ Compatibility
- When CUTE_DSL_GEMM_ENABLED is defined: the diff is unreachable and behavior is unchanged for every existing caller. Breaks only the build configuration that was already producing wrong output silently.
Additional Information
- This is one of three issues found with Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice on Orin NX. The other two (CustomVoice language-conditioned prefix; CuTe DSL prebuilt-artifact row-corruption) are tracked separately in the issue thread.
- Possible follow-up: a message(FATAL_ERROR ...) in cmake/CuteDsl.cmake that catches the missing flag at build time rather than at first call. Happy to send as a separate PR if maintainers prefer.