glm5-fp4-gb300-dynamo-sglang: extend 8k1k low-lat sweep with 1p17d topology by Ankur-singh · Pull Request #1583 · SemiAnalysisAI/InferenceX

Ankur-singh · 2026-05-28T22:13:05Z

Summary

Mirrors NVIDIA/srt-slurm#175 (zip_override_8k1k_lowlat update for recipes/gb300-fp4/glm5.yaml):

Adds a 5th low-lat topology 8k1k_stp_lowlat_4.yaml with decode_nodes / decode_workers = 17 (1p17d, 18 nodes total).
Lowers per-zip-index decode max-running-requests and cuda-graph-max-bs from a flat 4096 to 128 / 64 / 32 / 16 / 1 across lowlat_0..4.
Drops the per-recipe benchmark.concurrencies from "1024" to "128" / "64" / "32" / "16" / "12" to match.
Keeps nvidia-master.yaml in sync: each glm5-fp4-gb300-dynamo-sglang 8k1k lowlat entry's conc-list now matches its recipe; a new conc-list: [12] / decode.num-worker: 17 entry is appended for the 1p17d topology.

Note

Low Risk
Benchmark and Slurm recipe tuning only; no runtime application or auth changes.

Overview
Aligns GB300 FP4 GLM-5 8k1k disaggregated low-latency STP coverage with NVIDIA/srt-slurm#175: per-topology concurrency and decode batch limits instead of a flat 1024/4096 sweep.

For the four existing 1p3d → 1p15d recipes (8k1k_stp_lowlat_0–3), conc-list in nvidia-master.yaml drops from 1024 to 128 / 64 / 32 / 16, and each recipe’s decode max-running-requests, cuda-graph-max-bs, and benchmark.concurrencies are lowered to the same values (from 4096/1024).

A fifth topology (1p17d, 18 nodes) is added via new 8k1k_stp_lowlat_4.yaml (17 decode workers) with concurrency 12 and decode limits 1, plus a matching master search-space entry pointing at 8k1k_stp_lowlat_4.yaml. perf-changelog.yaml documents the change under glm5-fp4-gb300-dynamo-sglang.

^{Reviewed by Cursor Bugbot for commit cdafe29. Bugbot is set up for automated code reviews on this repo. Configure here.}

…pology Mirrors NVIDIA/srt-slurm#175: adds a 5th 8k1k_stp_lowlat_4 recipe with decode_nodes/workers=17, and lowers per-zip-index decode max-running-requests / cuda-graph-max-bs from a flat 4096 to 128/64/32/16/1 across lowlat_0..4. Benchmark concurrencies follow suit: 128/64/32/16/12. nvidia-master.yaml conc-list updated to match for each of the five 1p{3,5,9,15,17}d entries.

github-actions · 2026-05-28T22:13:14Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude

LGTM — mechanical benchmark recipe / config update mirroring NVIDIA/srt-slurm#175; only finding is a trivial pr-link: TBD placeholder flagged inline.

Extended reasoning...

Overview

This PR is benchmark-only: it mirrors upstream NVIDIA/srt-slurm#175 by (1) adding a 5th low-latency topology (8k1k_stp_lowlat_4.yaml, 1p17d / 18 nodes), (2) lowering decode max-running-requests / cuda-graph-max-bs / benchmark.concurrencies per zip-index across lowlat_0..4 from a uniform 4096/1024 to 128/64/32/16/1 and 128/64/32/16/12, and (3) updating nvidia-master.yaml conc-list values plus appending a new entry for the 1p17d topology. perf-changelog.yaml gets a single entry documenting the update.

Security risks

None. No application/runtime code, no auth, no permissions, no data path, no dependency or container-image changes. Slurm recipe and benchmark-sweep tuning only.

Level of scrutiny

Low. The four edited recipe files are single-number tweaks; the new lowlat_4.yaml is a near-verbatim sibling of lowlat_0..3 (same base: block, same prefill config, only the topology-specific decode_nodes / decode_workers / decode override differ). The five conc-list edits in nvidia-master.yaml line up 1:1 with the per-recipe benchmark.concurrencies values, and the new 1p17d entry follows the existing search-space schema. Each value can be eyeballed against the upstream PR.

Other factors

The one finding — pr-link: TBD in the new changelog entry — is a self-referential metadata placeholder with zero runtime impact, posted inline. Nothing else stood out: conc-list matches recipe concurrency for every topology; num-worker matches decode_workers; the new recipe's decode override is internally consistent with the upstream zip-index pattern.

claude · 2026-05-28T22:21:29Z

+    - glm5-fp4-gb300-dynamo-sglang
+  description:
+    - "Update GB300 FP4 GLM-5 8k1k low-latency sweep to mirror NVIDIA/srt-slurm#175: add a 5th 1p17d topology (decode_nodes/workers=17), and lower decode max-running-requests / cuda-graph-max-bs / benchmark concurrency per-zip-index from a flat 4096/1024 to 128/64/32/16/1 (mrr & cuda-graph) and 128/64/32/16/12 (concurrency)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1583


🟡 perf-changelog.yaml line 3208 has pr-link: TBD instead of https://github.com/SemiAnalysisAI/InferenceX/pull/1583. Every prior entry in the file uses the canonical https://github.com/SemiAnalysisAI/InferenceX/pull/<N> form, so the placeholder breaks the changelog's link-back convention and any URL-scraping automation. Trivial swap — just substitute the PR number.

Extended reasoning...

What is wrong

The new perf-changelog entry added by this PR (lines 3204–3208 of perf-changelog.yaml at HEAD) ends with:

pr-link: TBD

rather than the canonical form used by every other entry in the file, e.g. the immediately-preceding entry on line 3202:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1579

Since this is PR #1583, the value should be https://github.com/SemiAnalysisAI/InferenceX/pull/1583.

Addressing the refutation

One verifier objected that the <pr-diff> block embedded in the review prompt already shows the substituted URL, so there is nothing to fix. I checked the committed state directly:

$ git log -1 --format='%H %s' 0b73f198e48b1e8fdae5acbe8373bc14971283bd glm5-fp4-gb300-dynamo-sglang: extend 8k1k low-lat sweep with 1p17d topology $ git show HEAD -- perf-changelog.yaml | tail -8 +- config-keys: + - glm5-fp4-gb300-dynamo-sglang + description: + - "Update GB300 FP4 GLM-5 8k1k low-latency sweep ..." + pr-link: TBD +

HEAD (0b73f19) is the PR commit itself, and git show HEAD — i.e. the actual diff this commit introduces — adds pr-link: TBD. The URL in the prompt's <pr-diff> block does not match the committed content; the file as it would be merged has TBD. Three independent verifiers reached the same conclusion by reading the committed file directly. Per the standing guidance to "trust what you observe now" when state conflicts, the committed file is authoritative here.

Step-by-step proof

Open perf-changelog.yaml at line 3208 in the working tree.

Observe the literal string pr-link: TBD.

Run git grep -n "pr-link: TBD" perf-changelog.yaml — exactly one hit, line 3208.

Run git grep -nE "pr-link: https://github.com" perf-changelog.yaml | wc -l — ~30 hits, every other entry uses the URL form.

The PR number is 1583 (from PR metadata), so the intended value is https://github.com/SemiAnalysisAI/InferenceX/pull/1583.

Impact

No runtime, benchmark, or correctness impact — this is a changelog-metadata field. Consequences:

Breaks the file's convention that every entry links back to its PR.

Any tooling that scrapes pr-link: values to build a changelog page, link-check, or cross-reference releases will either skip this entry or emit a TBD placeholder.

Future readers of the changelog cannot click through to find the originating PR.

Fix

Replace line 3208:

- pr-link: TBD + pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1583

github-actions · 2026-05-29T11:24:47Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26605445495
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26605445495

github-actions · 2026-05-29T18:22:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26605445495
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26605445495

functionstackx · 2026-05-29T23:10:06Z

/reuse-sweep-run

github-actions · 2026-05-29T23:10:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26666839643
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26666839643

Ankur-singh requested a review from a team May 28, 2026 22:13

Ankur-singh requested review from jgangani and kedarpotdar-nv as code owners May 28, 2026 22:13

github-project-automation Bot added this to InferenceMAX Board May 28, 2026

perf-changelog: set PR link to #1583

8cb4972

Ankur-singh added the full-sweep-enabled label May 28, 2026

claude Bot reviewed May 28, 2026

View reviewed changes

Merge branch 'main' into glm5-gb300-lowlat-update

cdafe29

functionstackx merged commit e0cd8f7 into main May 29, 2026
5 of 6 checks passed

functionstackx deleted the glm5-gb300-lowlat-update branch May 29, 2026 23:10

github-project-automation Bot moved this to Done in InferenceMAX Board May 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

glm5-fp4-gb300-dynamo-sglang: extend 8k1k low-lat sweep with 1p17d topology#1583

glm5-fp4-gb300-dynamo-sglang: extend 8k1k low-lat sweep with 1p17d topology#1583
functionstackx merged 3 commits into
mainfrom
glm5-gb300-lowlat-update

Ankur-singh commented May 28, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot May 28, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

functionstackx commented May 29, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Ankur-singh commented May 28, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude Bot May 28, 2026

Choose a reason for hiding this comment

What is wrong

Addressing the refutation

Step-by-step proof

Impact

Fix

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

functionstackx commented May 29, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Ankur-singh commented May 28, 2026 •

edited by cursor Bot

Loading