fix https://github.com/NVIDIA/TensorRT-Edge-LLM/issues/87: hard-error instead of silent return when CuTe DSL GEMM is not compiled by suharvest · Pull Request #97 · NVIDIA/TensorRT-Edge-LLM

suharvest · 2026-05-27T03:11:04Z

What does this PR do?

Type of change: Bug fix

Overview:

cpp/kernels/talkerMLPKernels/talkerMLPKernels.cu wraps both runBiasSiLU and runBias in #ifdef CUTE_DSL_GEMM_ENABLED. In the #else branch (around lines 341 and 385) the code calls LOG_ERROR(...) and then return;. Because both helpers (invokeTalkerMLP, invokeLinearLayer) return void and write their result into a pre-allocated output tensor, the caller has no way to observe the failure: the error line goes to stderr, but the program continues and the downstream Talker sampler reads whatever the buffer happened to contain. The program "succeeds" and writes a valid 24 kHz WAV, but the audio decodes to unintelligible filler.
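To make the failure mode concrete, here is a minimal host-only sketch of the pattern described above. It is not the code from talkerMLPKernels.cu: sketchInvokeMLP, sketchRunPipeline, the SKETCH_GEMM_ENABLED macro, and the -999 sentinel are all hypothetical stand-ins, and a std::vector plays the role of the pre-allocated output tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a void helper that writes into a caller-provided
// buffer. SKETCH_GEMM_ENABLED is an illustrative macro, not the project's flag.
void sketchInvokeMLP(std::vector<float> const& input, std::vector<float>& output)
{
#ifdef SKETCH_GEMM_ENABLED
    assert(output.size() == input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        output[i] = 2.0f * input[i]; // placeholder for the real GEMM path
    }
#else
    (void) input;
    (void) output;
    std::fprintf(stderr, "[ERROR] GEMM path not compiled.\n");
    return; // output is left untouched and the caller cannot tell
#endif
}

// Caller: runs the helper and consumes the buffer no matter what happened.
float sketchRunPipeline()
{
    std::vector<float> input{1.0f, 2.0f, 3.0f};
    // The sentinel stands in for whatever stale data a real buffer would hold.
    std::vector<float> output(input.size(), -999.0f);
    sketchInvokeMLP(input, output);
    return output[0]; // stale sentinel when the GEMM path is compiled out
}
```

Built without the illustrative macro, sketchRunPipeline() logs one error line and still returns the sentinel, which is the same shape as the "valid WAV, wrong audio" symptom.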

This PR replaces the silent-return path with ELLM_CHECK(false, ...), which is the project's existing pattern for unrecoverable runtime preconditions (defined in cpp/common/checkMacros.h, already included by this file). The check fires at the first call site, prints a clear message to stderr, and aborts, so a missing -DENABLE_CUTE_DSL=gemm build flag is impossible to overlook.
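The fail-loud behavior can be sketched as follows. SKETCH_CHECK is a hypothetical macro written for this illustration and is not the definition of ELLM_CHECK in cpp/common/checkMacros.h; it assumes only what the "After" output in the Usage section suggests, namely that a failed check throws std::runtime_error. The function names and SKETCH_GEMM_ENABLED are likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical check macro (NOT the project's ELLM_CHECK): throws
// std::runtime_error with file and line when the condition is false.
#define SKETCH_CHECK(cond, msg)                                   \
    do                                                            \
    {                                                             \
        if (!(cond))                                              \
        {                                                         \
            std::ostringstream oss;                               \
            oss << __FILE__ << ":" << __LINE__ << ": " << (msg);  \
            throw std::runtime_error(oss.str());                  \
        }                                                         \
    } while (0)

// Same void signature as before, but the compiled-out branch now throws.
void sketchInvokeMLPChecked(std::vector<float> const& input, std::vector<float>& output)
{
#ifdef SKETCH_GEMM_ENABLED
    assert(output.size() == input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        output[i] = 2.0f * input[i]; // placeholder for the real GEMM path
    }
#else
    (void) input;
    (void) output;
    SKETCH_CHECK(false, "sketchInvokeMLPChecked requires the GEMM path; rebuild with it enabled.");
#endif
}

// The failure is now observable: it propagates as an exception.
bool sketchCallerSeesFailure()
{
    std::vector<float> input{1.0f};
    std::vector<float> output(1, -999.0f);
    try
    {
        sketchInvokeMLPChecked(input, output);
    }
    catch (std::runtime_error const&)
    {
        return true;
    }
    return false;
}
```

If no caller catches the exception, the C++ runtime calls std::terminate, which matches the "terminate called after throwing" output shown under Usage.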

Usage

This change has no new user-facing API. It hardens an existing failure mode:

# Before this PR — build without CUTE_DSL_GEMM_ENABLED, run anyway:
$ ./qwen3_tts_inference --inputFile in.json ...
[ERROR] CuTe DSL GEMM not compiled. Rebuild with -DENABLE_CUTE_DSL=gemm (or ALL).
... (continues, writes wrong WAV) ...
DONE_EVENT: {"ok": true, ...}                # <-- false positive

# After this PR — same build, same invocation:
$ ./qwen3_tts_inference --inputFile in.json ...
terminate called after throwing an instance of 'std::runtime_error'
  what(): invokeTalkerMLP requires CuTe DSL GEMM. Rebuild with -DENABLE_CUTE_DSL=gemm (or ALL); ...
Aborted

Reachability + safety

The #else branch only compiles when the operator was built without CuTe DSL support. At that point nothing has been allocated or launched (no cudaMalloc, no stream ownership transfer, no GEMM kernel launch), so throwing is leak-free.
When CUTE_DSL_GEMM_ENABLED is defined, the diff is unreachable and behavior is unchanged for every existing caller.
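The leak-free argument can be illustrated with a small host-only sketch, assuming the throw is the first statement of the compiled-out branch. The counter g_acquireCalls, the function names, and SKETCH_GEMM_ENABLED are hypothetical; the counter stands in for cudaMalloc calls or kernel launches.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counter standing in for resource acquisitions
// (allocations, stream hand-offs, kernel launches).
int g_acquireCalls = 0;

void sketchInvokeLinear()
{
#ifdef SKETCH_GEMM_ENABLED
    ++g_acquireCalls; // the real path would acquire resources and launch here
    assert(g_acquireCalls > 0);
#else
    // The throw comes before any acquisition, so there is nothing to release.
    throw std::runtime_error("GEMM path not compiled");
#endif
}

// Returns how many acquisitions happened during one call that may throw.
int sketchAcquisitionsAfterCall()
{
    g_acquireCalls = 0;
    try
    {
        sketchInvokeLinear();
    }
    catch (std::runtime_error const&)
    {
        // Swallowed only so the counter can be inspected afterwards.
    }
    return g_acquireCalls;
}
```

Built without the illustrative macro, the call throws with zero acquisitions recorded; built with it, the throwing branch is not compiled at all, which mirrors the "behavior is unchanged" claim.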

🚀 Pull Request Checklist

✅ Pre-commit Checks

pip install pre-commit
pre-commit install
pre-commit run --files cpp/kernels/talkerMLPKernels/talkerMLPKernels.cu passes (clang-format / codespell / license header all green)

🧪 Tests

No new test added: this is a #else branch that compiles only when the operator was built without CuTe DSL. The new behavior is "abort cleanly at first call with a clear message" instead of "return uninitialized memory and continue."
Existing Qwen3-TTS smoke (built with -DENABLE_CUTE_DSL=gemm) still passes.

📄 Documentation

No doc change required: the build flag is already mentioned in the Qwen3-TTS section of the docs; this PR makes its absence non-silent.

⚙️ Compatibility

Backward compatible when CUTE_DSL_GEMM_ENABLED is defined: the diff is unreachable and behavior is unchanged for every existing caller. Breaks only the build configuration that was already producing wrong output silently.

Additional Information

Related issue: #87 — [Bug] Qwen3-TTS 0.6B TTS output is incorrect on Jetson (Issue A in the follow-up comment from 2026-05-27)
This is one of three issues we localized while validating Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice on Orin NX. The other two (CustomVoice language-conditioned prefix; CuTe DSL prebuilt-artifact row-corruption) are tracked separately in the issue thread.
Optional follow-up (not in this PR): a message(FATAL_ERROR ...) in cmake/CuteDsl.cmake that catches the missing flag at build time rather than at first call. Happy to send as a separate PR if maintainers prefer.

fix NVIDIA#87: hard-error instead of silent return when CuTe DSL GEMM is not compiled

24f2401

suharvest requested a review from a team May 27, 2026 03:11

suharvest mentioned this pull request May 27, 2026

[Bug] Qwen3-TTS 0.6B TTS output is incorrect on Jetson unless Talker / CodePredictor / Code2Wav contracts are fixed #87

Open

suharvest wants to merge 1 commit into NVIDIA:main from suharvest:harden/talker-mlp-fail-loud
