fix https://github.com/NVIDIA/TensorRT-Edge-LLM/issues/87: hard-error instead of silent return when CuTe DSL GEMM is not compiled #97

Open: suharvest wants to merge 1 commit into NVIDIA:main from suharvest:harden/talker-mlp-fail-loud

What does this PR do?

Type of change: Bug fix

Overview:

cpp/kernels/talkerMLPKernels/talkerMLPKernels.cu wraps both runBiasSiLU and runBias in #ifdef CUTE_DSL_GEMM_ENABLED. In the #else branch (around lines 341 and 385) the code calls LOG_ERROR(...) and then executes return;. Because both helpers (invokeTalkerMLP, invokeLinearLayer) return void and write their result into a pre-allocated output tensor, the caller has no way to observe the failure: the error line goes to stderr, but the program continues and the downstream Talker sampler reads whatever the buffer happened to contain. The program "succeeds" and writes a valid 24 kHz WAV, but the audio decodes to unintelligible filler.
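To make the failure mode concrete, here is a minimal host-only sketch of the log-and-return pattern. It is not the code in talkerMLPKernels.cu: demoInvokeLinear, DEMO_GEMM_ENABLED, and the -1.0f sentinel are hypothetical stand-ins for the real helper, the real CUTE_DSL_GEMM_ENABLED macro, and the unspecified contents of the output tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a void helper that writes into a caller-owned buffer.
// DEMO_GEMM_ENABLED plays the role of CUTE_DSL_GEMM_ENABLED and is left undefined here.
void demoInvokeLinear(std::vector<float> const& input, std::vector<float>& output)
{
    assert(input.size() == output.size());
#ifdef DEMO_GEMM_ENABLED
    for (std::size_t i = 0; i < output.size(); ++i)
    {
        output[i] = 2.0f * input[i];
    }
#else
    (void) input;
    std::fprintf(stderr, "[ERROR] demo GEMM not compiled\n");
    return; // output is never written, and the void return hides that from the caller
#endif
}

// Returns true when the output buffer still holds its pre-call contents.
bool outputLeftStale()
{
    std::vector<float> input{1.0f, 2.0f, 3.0f};
    std::vector<float> output(3, -1.0f); // sentinel standing in for stale memory
    demoInvokeLinear(input, output);
    for (float v : output)
    {
        if (v != -1.0f)
        {
            return false;
        }
    }
    return true;
}
```

With the macro undefined, outputLeftStale() reports that the buffer was never touched, yet nothing in the function's signature lets a caller detect that.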

This PR replaces the silent-return path with ELLM_CHECK(false, ...), which is the project's existing pattern for unrecoverable runtime preconditions (defined in cpp/common/checkMacros.h, already included by this file). The check fires at the first call site and throws with a clear message; left uncaught, the exception terminates the process, so a missing -DENABLE_CUTE_DSL=gemm build flag is impossible to overlook.
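The fail-loud variant can be sketched the same way. DEMO_CHECK below is a hypothetical macro written for this illustration; the real ELLM_CHECK in cpp/common/checkMacros.h may be defined differently. The only assumption carried over from this PR is that a failed check throws std::runtime_error, as the output in the Usage section shows.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical check macro for illustration only; not the project's ELLM_CHECK.
#define DEMO_CHECK(cond, msg)                                                                      \
    do                                                                                             \
    {                                                                                              \
        if (!(cond))                                                                               \
        {                                                                                          \
            throw std::runtime_error(std::string(msg));                                            \
        }                                                                                          \
    } while (false)

// DEMO_GEMM_ENABLED plays the role of CUTE_DSL_GEMM_ENABLED and is left undefined here.
void demoInvokeLinearStrict(std::vector<float> const& input, std::vector<float>& output)
{
    assert(input.size() == output.size());
#ifdef DEMO_GEMM_ENABLED
    for (std::size_t i = 0; i < output.size(); ++i)
    {
        output[i] = 2.0f * input[i];
    }
#else
    (void) input;
    (void) output;
    // Nothing has been allocated or launched yet, so throwing here leaves nothing to clean up.
    DEMO_CHECK(false, "demoInvokeLinearStrict requires the demo GEMM backend; rebuild with it enabled");
#endif
}

// Returns true when the call surfaces the missing backend as an exception.
bool strictCallThrows()
{
    std::vector<float> input(3, 1.0f);
    std::vector<float> output(3, -1.0f);
    try
    {
        demoInvokeLinearStrict(input, output);
    }
    catch (std::runtime_error const&)
    {
        return true;
    }
    return false;
}
```

A caller that does not catch the exception terminates at this first call instead of consuming the unwritten buffer.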

Usage

This change has no new user-facing API. It hardens an existing failure mode:

# Before this PR — build without CUTE_DSL_GEMM_ENABLED, run anyway:
$ ./qwen3_tts_inference --inputFile in.json ...
[ERROR] CuTe DSL GEMM not compiled. Rebuild with -DENABLE_CUTE_DSL=gemm (or ALL).
... (continues, writes wrong WAV) ...
DONE_EVENT: {"ok": true, ...}                # <-- false positive

# After this PR — same build, same invocation:
$ ./qwen3_tts_inference --inputFile in.json ...
terminate called after throwing an instance of 'std::runtime_error'
  what(): invokeTalkerMLP requires CuTe DSL GEMM. Rebuild with -DENABLE_CUTE_DSL=gemm (or ALL); ...
Aborted

Reachability + safety

  • The #else branch is compiled only when the operator was built without CuTe DSL support. Inside that branch nothing has launched yet (no cudaMalloc, no stream ownership transfer, no GEMM kernel launch), so throwing is leak-free.
  • When CUTE_DSL_GEMM_ENABLED is defined, the changed code is excluded by the preprocessor and behavior is unchanged for every existing caller.

🚀 Pull Request Checklist

✅ Pre-commit Checks

  • pip install pre-commit
  • pre-commit install
  • pre-commit run --files cpp/kernels/talkerMLPKernels/talkerMLPKernels.cu passes (clang-format / codespell / license header all green)

🧪 Tests

  • No new test added: this is a #else branch that compiles only when the operator was built without CuTe DSL. The new behavior is "abort cleanly at first call with a clear message" instead of "return uninitialized memory and continue."
  • Existing Qwen3-TTS smoke test (built with -DENABLE_CUTE_DSL=gemm) still passes.

📄 Documentation

  • No doc change required: the build flag is already mentioned in the Qwen3-TTS section of the docs; this PR makes its absence non-silent.

⚙️ Compatibility

  • Backward compatible when CUTE_DSL_GEMM_ENABLED is defined: the changed code is not compiled, so behavior is unchanged for every existing caller. The only configuration affected is the one that was already silently producing wrong output.

Additional Information

  • Related issue: #87 — [Bug] Qwen3-TTS 0.6B TTS output is incorrect on Jetson (Issue A in the follow-up comment from 2026-05-27)
  • This is one of three issues we localized while validating Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice on Orin NX. The other two (CustomVoice language-conditioned prefix; CuTe DSL prebuilt-artifact row-corruption) are tracked separately in the issue thread.
  • Optional follow-up (not in this PR): a message(FATAL_ERROR ...) in cmake/CuteDsl.cmake that catches the missing flag at build time rather than at first call. Happy to send as a separate PR if maintainers prefer.
