
Fix draft model ignoring draft_gpu_split on load #425

Merged
turboderp merged 1 commit into theroyallab:main from baronrabban:fix/draft-model-gpu-split
Jun 12, 2026

@baronrabban

Contributor

Is your pull request related to a problem? Please describe.

The exllamav3 backend reads the user-configured draft_gpu_split option into self.draft_gpu_split (backends/exllamav3/model.py:172), but self.draft_gpu_split is never actually used anywhere. When loading the draft model in load_model_sync, the code passes self.gpu_split — the main model's split — to draft_model.load_gen():

if self.use_draft_model:
    for value in self.draft_model.load_gen(
        reserve_per_device=self.autosplit_reserve,
        use_per_device=self.gpu_split,   # <-- main model's split
        callback=progress_callback,
    ):

As a result, the user's draft_gpu_split setting is silently ignored and the draft model is loaded using the main model's GPU split instead.

Why should this change be made?

So that the draft_gpu_split config option actually takes effect. Users who want to place the draft model on a specific GPU, or give it its own split, currently have no working way to do so on the exllamav3 backend.

Examples

The fix is a one-line change — use the draft model's own split when loading the draft model:

use_per_device=self.draft_gpu_split,
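
To illustrate what the change guarantees, here is a minimal regression-style sketch. It does not use the real exllamav3 classes: StubModel and StubContainer are hypothetical stand-ins that only mirror the attribute and keyword names quoted in this PR (gpu_split, draft_gpu_split, autosplit_reserve, load_gen, use_per_device), and the split values are made up. The stub records the split each model receives so the wiring can be checked without loading anything onto a GPU.

```python
from typing import Callable, Iterator, List, Optional


class StubModel:
    """Hypothetical stand-in for a loadable model; records the split it receives."""

    def __init__(self) -> None:
        self.received_split: Optional[List[float]] = None

    def load_gen(
        self,
        reserve_per_device: Optional[List[int]] = None,
        use_per_device: Optional[List[float]] = None,
        callback: Optional[Callable[[int, int], None]] = None,
    ) -> Iterator[int]:
        # Record the split for later inspection, then yield one fake progress step.
        self.received_split = use_per_device
        yield 1


class StubContainer:
    """Hypothetical, simplified container using the attribute names from this PR."""

    def __init__(self, gpu_split: List[float], draft_gpu_split: List[float]) -> None:
        self.gpu_split = gpu_split
        self.draft_gpu_split = draft_gpu_split
        self.autosplit_reserve = [0]
        self.use_draft_model = True
        self.model = StubModel()
        self.draft_model = StubModel()

    def load_model_sync(self, progress_callback=None) -> None:
        if self.use_draft_model:
            for _ in self.draft_model.load_gen(
                reserve_per_device=self.autosplit_reserve,
                use_per_device=self.draft_gpu_split,  # draft model's own split
                callback=progress_callback,
            ):
                pass
        for _ in self.model.load_gen(
            reserve_per_device=self.autosplit_reserve,
            use_per_device=self.gpu_split,  # main model's split
            callback=progress_callback,
        ):
            pass


# Illustrative values only; any two distinct lists would do.
container = StubContainer(gpu_split=[20.0, 24.0], draft_gpu_split=[0.0, 4.0])
container.load_model_sync()
assert container.draft_model.received_split == [0.0, 4.0]
assert container.model.received_split == [20.0, 24.0]
```

With the pre-fix wiring (passing self.gpu_split to the draft loader), the first assertion would fail, which is the behavior this PR corrects.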

Additional context

Single-line change in backends/exllamav3/model.py. self.draft_gpu_split is already parsed and defaults to [] (matching the prior behavior when no draft split is configured), so this has no effect for users who don't set the option.

The exllamav3 backend parses the user-configured draft_gpu_split into
self.draft_gpu_split, but load_model_sync passed self.gpu_split (the main
model's split) when loading the draft model, so the draft split was
silently ignored. Use self.draft_gpu_split instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@turboderp

Collaborator

👍

@turboderp turboderp merged commit 637b595 into theroyallab:main Jun 12, 2026
2 participants