
Fix 24486: TP: allows the usage of 4-10 gpus for stepfun #24554

Open
krampenschiesser wants to merge 1 commit into ggml-org:master from krampenschiesser:fix--TP--stepfun-3.7-8+gpus

Conversation

krampenschiesser commented Jun 13, 2026

Overview

This fixes #24486 and allows the use of 4-10 GPUs, and probably more. I could not test with fewer than 4 GPUs because I do not have enough VRAM.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes. I used llama.cpp and qwen3.5 122b as an advanced grep tool for exploring the codebase. All of the code comes from me (as little as there is).

@krampenschiesser

Author

I am seeing some problems with the tensor splits when increasing the granularity. I will try to implement the discovery of additional tensors during the std::vector initialization. This will include passing the user data and model through to the ggml_backend_meta_get_split_state function, so that I can ask the model about additional tensors by tensor name.
I will post some logs tomorrow regarding the core issue with the graph.

krampenschiesser force-pushed the fix--TP--stepfun-3.7-8+gpus branch from 22c7460 to 6df847c on June 16, 2026 07:45
krampenschiesser changed the title from "Fix 24486: TP: allows the usage of 8,9,10 gpus for stepfun" to "Fix 24486: TP: allows the usage of 4-10 gpus for stepfun" on Jun 16, 2026
@krampenschiesser

Author

This is much better.
It works in all configurations I could test (4-10 GPUs), and probably with 2, 3, and more than 10 GPUs too.
The tensor splits look better too.
It should be ready for review.
@JohannesGaessler please take a look. I don't know whether it is OK to add more model-specific exceptions to llama-model.cpp.

krampenschiesser marked this pull request as ready for review on June 16, 2026 07:48


Development

Successfully merging this pull request may close these issues.

Eval bug: TP: Stepfun 3.7 does not work with uneven splits (GGML_META_DEBUG=1)
