Fix draft model ignoring draft_gpu_split on load by baronrabban · Pull Request #425 · theroyallab/tabbyAPI

baronrabban · 2026-06-10T22:47:21Z

Is your pull request related to a problem? Please describe.

The exllamav3 backend reads the user-configured draft_gpu_split option into self.draft_gpu_split (backends/exllamav3/model.py:172), but self.draft_gpu_split is never actually used anywhere. When loading the draft model in load_model_sync, the code passes self.gpu_split — the main model's split — to draft_model.load_gen():

if self.use_draft_model:
    for value in self.draft_model.load_gen(
        reserve_per_device=self.autosplit_reserve,
        use_per_device=self.gpu_split,   # <-- main model's split
        callback=progress_callback,
    ):

As a result, the user's draft_gpu_split setting is silently ignored and the draft model is loaded using the main model's GPU split instead.

Why should this change be made?

So that the draft_gpu_split config option actually takes effect. Users who want to place the draft model on a specific GPU / with a specific split currently have no working way to do so on the exllamav3 backend.

Examples

The fix is a one-line change — use the draft model's own split when loading the draft model:

use_per_device=self.draft_gpu_split,

Additional context

Single-line change in backends/exllamav3/model.py. self.draft_gpu_split is already parsed and defaults to [] (matching the prior behavior when no draft split is configured), so this has no effect for users who don't set the option.
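To make the effect of the change concrete, here is a minimal, self-contained sketch of the corrected call. `FakeDraftModel` and `Container` are hypothetical stand-ins written for illustration, not tabbyAPI or exllamav3 code. Only the attribute names and the `load_gen` keyword arguments are taken from the snippet in this PR, and the numeric values are arbitrary. How the real `load_gen` interprets an empty list is not shown in this thread, so the sketch only records what gets passed.

```python
from typing import Iterator, List, Optional


class FakeDraftModel:
    """Hypothetical stand-in that records the split it was asked to load with."""

    def __init__(self) -> None:
        self.loaded_split: Optional[List[float]] = None

    def load_gen(self, reserve_per_device, use_per_device, callback=None) -> Iterator[int]:
        # Remember the split we received, then yield once to mimic a progress generator.
        self.loaded_split = list(use_per_device)
        yield 1


class Container:
    """Hypothetical, pared-down container using the attribute names from the PR."""

    def __init__(self, gpu_split: List[float], draft_gpu_split: Optional[List[float]] = None):
        self.gpu_split = gpu_split
        # Assumption mirrored from the PR text: the draft split defaults to [].
        self.draft_gpu_split = draft_gpu_split or []
        self.autosplit_reserve = [0.25]  # illustrative value only
        self.use_draft_model = True
        self.draft_model = FakeDraftModel()

    def load_draft(self, progress_callback=None) -> None:
        if self.use_draft_model:
            for _ in self.draft_model.load_gen(
                reserve_per_device=self.autosplit_reserve,
                use_per_device=self.draft_gpu_split,  # draft model's own split
                callback=progress_callback,
            ):
                pass


# A configured draft split now reaches the draft model instead of the main split.
configured = Container(gpu_split=[20.0, 24.0], draft_gpu_split=[0.0, 4.0])
configured.load_draft()
print(configured.draft_model.loaded_split)

# With no draft split configured, the empty default is passed through.
unconfigured = Container(gpu_split=[20.0, 24.0])
unconfigured.load_draft()
print(unconfigured.draft_model.loaded_split)
```

Swapping `self.draft_gpu_split` back to `self.gpu_split` in `load_draft` reproduces the reported behavior in this sketch: the first case would record the main model's split rather than the draft one.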

The exllamav3 backend parses the user-configured draft_gpu_split into self.draft_gpu_split, but load_model_sync passed self.gpu_split (the main model's split) when loading the draft model, so the draft split was silently ignored. Use self.draft_gpu_split instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

turboderp · 2026-06-12T19:11:51Z

👍

turboderp merged commit 637b595 into theroyallab:main Jun 12, 2026

turboderp merged 1 commit into theroyallab:main from baronrabban:fix/draft-model-gpu-split
