Fix 24486: TP: allows the usage of 4-10 gpus for stepfun by krampenschiesser · Pull Request #24554 · ggml-org/llama.cpp

krampenschiesser · 2026-06-13T01:18:10Z

Overview

This fixes #24486 and allows the use of 4-10 GPUs, and probably more. I cannot test with fewer than 4 because of too little VRAM.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes. I used llama.cpp and qwen3.5 122b as an advanced grep tool for exploring the codebase; all code comes from me (as little as there is).

krampenschiesser · 2026-06-13T18:34:55Z

I am seeing some problems with tensor splits when going to a larger granularity. I will try to implement the discovery of additional tensors during the std::vector initialization. This will include passing the user data and model through to the ggml_backend_meta_get_split_state function so I can ask the model about additional tensors by tensor name.
I will post some logs tomorrow regarding the core issue with the graph.
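
To illustrate how split granularity interacts with the GPU count, here is a minimal sketch of dividing one tensor dimension across devices in whole blocks. It is not code from this PR or from ggml: the function name `split_with_granularity` and its parameters are hypothetical, and the even-distribution policy is an assumption made only for the example.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, NOT part of llama.cpp or ggml.
// Splits `total` elements across `n_devices` so that every share is a
// multiple of `granularity` and the shares sum to `total`.
std::vector<int64_t> split_with_granularity(int64_t total, int n_devices, int64_t granularity) {
    if (n_devices <= 0 || granularity <= 0 || total < 0 || total % granularity != 0) {
        throw std::invalid_argument("total must be a non-negative multiple of granularity");
    }
    const int64_t blocks = total / granularity;  // indivisible units to hand out
    const int64_t base   = blocks / n_devices;   // blocks every device receives
    const int64_t extra  = blocks % n_devices;   // leftover blocks, one each to the first devices

    std::vector<int64_t> sizes(n_devices);
    for (int i = 0; i < n_devices; ++i) {
        sizes[i] = (base + (i < extra ? 1 : 0)) * granularity;
    }
    return sizes;
}
```

Under these assumptions, a dimension of 4096 with granularity 128 on 10 devices yields two shares of 512 and eight of 384, whereas granularity 512 leaves only 8 blocks, so two of the 10 devices receive nothing. Whether this is the effect behind the problems described above is not established here.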

krampenschiesser · 2026-06-16T07:48:49Z

This is way better.
It works in all configurations I could test (4-10 GPUs), and probably with 2, 3, and 10+ too.
Tensor splits look better too.
Should be good for review.
@JohannesGaessler please take a look. I don't know if it is OK to add more model-specific exceptions to llama-model.cpp.

krampenschiesser requested a review from CISC as a code owner June 13, 2026 01:18

krampenschiesser mentioned this pull request Jun 13, 2026

Eval bug: TP: Stepfun 3.7 does not work with uneven splits (GGML_META_DEBUG=1) #24486

Open

CISC requested a review from JohannesGaessler June 13, 2026 11:10

krampenschiesser marked this pull request as draft June 13, 2026 18:29

allows the usage of 4-10 gpus for stpefun in tensor parallel

6df847c

krampenschiesser force-pushed the fix--TP--stepfun-3.7-8+gpus branch from 22c7460 to 6df847c Compare June 16, 2026 07:45

krampenschiesser changed the title ~~Fix 24486: TP: allows the usage of 8,9,10 gpus for stepfun~~ Fix 24486: TP: allows the usage of 4-10 gpus for stepfun Jun 16, 2026

krampenschiesser marked this pull request as ready for review June 16, 2026 07:48

krampenschiesser wants to merge 1 commit into ggml-org:master from krampenschiesser:fix--TP--stepfun-3.7-8+gpus