Updating dsv4 b200 vllm version by wzhao18 · Pull Request #1384 · SemiAnalysisAI/InferenceX

wzhao18 · 2026-05-15T04:36:56Z

Summary

Bump dsv4-fp4-b200-vllm to vLLM v0.22.0 with sweep and launcher adjustments.

Changes

- Image: vllm/vllm-openai:v0.21.0 → v0.22.0
- ISL 1024 DP-attn sweep: split the single conc 256–4096 range into 256–1024 plus a dedicated 4096 point
- Launcher (dsv4_fp4_b200_vllm.sh):
  - Remove --gpu-memory-utilization 0.85 on the DP-attn path (use the vLLM default)
  - Remove the special case that capped BENCHMARK_MAX_MODEL_LEN to 4096 for ISL/OSL 1024/1024
  - Extract --max-cudagraph-capture-size into a MAX_CUDAGRAPH_CAPTURE_SIZE variable (value unchanged: 2048)
- Perf changelog: record the image bump to v0.22.0
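The launcher changes can be summarized as a minimal sketch of the flags it passes to vllm serve afterward, written in Python rather than the launcher's shell for readability. The function name build_serve_args and the MODEL_PATH / 8192 inputs are hypothetical; only --max-model-len, --max-cudagraph-capture-size, and --gpu-memory-utilization are real vLLM serve flags, and the launcher's parallelism and DP-attention flags are deliberately left out.

```python
# Value stated in the PR; the launcher now keeps it in MAX_CUDAGRAPH_CAPTURE_SIZE.
MAX_CUDAGRAPH_CAPTURE_SIZE = 2048


def build_serve_args(model: str, max_model_len: int) -> list[str]:
    """Hypothetical Python rendition of the post-PR launcher flags.

    - No --gpu-memory-utilization flag is emitted, so vLLM uses its default.
    - max_model_len is passed through unchanged; the old special case that
      forced 4096 for ISL/OSL 1024/1024 is gone.
    - Parallelism and DP-attention flags are omitted from this sketch.
    """
    return [
        "vllm", "serve", model,
        "--max-model-len", str(max_model_len),
        "--max-cudagraph-capture-size", str(MAX_CUDAGRAPH_CAPTURE_SIZE),
    ]


# MODEL_PATH and 8192 are placeholder values, not the launcher's real inputs.
args = build_serve_args("MODEL_PATH", 8192)
print(" ".join(args))
```

Because no --gpu-memory-utilization is emitted, vLLM falls back to its own default (0.9 in recent vLLM releases) instead of the previous 0.85.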

Note

Medium Risk
Changes benchmark serving flags and sweep coverage for a large MoE model on B200; mis-tuned concurrency or memory settings could skew perf results without affecting production apps.

Overview
Bumps dsv4-fp4-b200-vllm to vLLM v0.22.0 and aligns the B200 single-node launcher and sweep with that stack.

The ISL 1024 DP-attention search space no longer uses one wide conc 256–4096 band; it is split into 256–1024 plus a separate 4096 point. dsv4_fp4_b200_vllm.sh drops --gpu-memory-utilization 0.85 on the DP-attn path, removes the 1024/1024 cap that forced BENCHMARK_MAX_MODEL_LEN to 4096, and wires --max-cudagraph-capture-size through MAX_CUDAGRAPH_CAPTURE_SIZE (still 2048). perf-changelog.yaml documents the image bump.
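To make the search-space split concrete, here is a small Python sketch that expands a concurrency band into sweep points. It assumes, hypothetically, that the sweep doubles concurrency from the band start up to conc-end; the real sweep generator may step differently, in which case the set of dropped points changes. Under the doubling assumption, the split keeps 4096 but no longer visits 2048.

```python
def expand_conc(start: int, end: int) -> list[int]:
    """Expand a concurrency band into sweep points, doubling each step.

    The doubling rule is an assumption for illustration only.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    points = []
    conc = start
    while conc <= end:
        points.append(conc)
        conc *= 2
    return points


# Before: one wide band.
old_points = expand_conc(256, 4096)
# After: a narrower band plus a dedicated 4096 point.
new_points = expand_conc(256, 1024) + expand_conc(4096, 4096)

print("old:", old_points)      # old: [256, 512, 1024, 2048, 4096]
print("new:", new_points)      # new: [256, 512, 1024, 4096]
print("dropped:", sorted(set(old_points) - set(new_points)))  # dropped: [2048]
```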

Reviewed by Cursor Bugbot for commit 5d83d39. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-15T04:37:04Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first, before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review from, and get approval from, the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-15T04:39:15Z

Claude finished @wzhao18's task in 1m 46s.

Review of PR #1384

- Gather context and read changed files
- Validate master config changes
- Validate perf-changelog.yaml
- Validate benchmark script
- Post review summary

Summary

One issue to address:

🟡 Image tag dropped -cu130 suffix: The image changed from v0.20.0-cu130 → v0.20.2 (no -cu130). Since this config runs on B200 GPUs and was previously pinned to the CUDA 13.0 build, and the sibling dsv4-fp4-b200-vllm-mtp config still uses v0.20.0-cu130, please confirm this is intentional. If a v0.20.2-cu130 tag exists, it should likely be used instead.

The benchmark script changes (dynamic MAX_CUDAGRAPH_CAPTURE_SIZE based on ISL, removal of --gpu-memory-utilization 0.85 for DP attention) and the perf-changelog entry all look correct. Note that GMU_ARGS is now always empty but is still referenced in the vllm serve command; this is not a bug, but it could be cleaned up.
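Whether an always-empty GMU_ARGS is harmless depends on how the launcher expands it. The script itself is not shown in this thread, so the quoting is an assumption: an unquoted $GMU_ARGS disappears during shell word splitting, while a quoted "$GMU_ARGS" would pass an explicit empty argument to vllm serve. The sketch below uses Python's shlex.split (POSIX mode) as a stand-in for that word splitting, substituting the variable into a hypothetical command string first, since shlex performs no variable expansion itself.

```python
import shlex

GMU_ARGS = ""  # always empty after this PR

# Unquoted $GMU_ARGS: the empty value vanishes during word splitting.
unquoted = shlex.split(
    f"vllm serve MODEL_PATH {GMU_ARGS} --max-cudagraph-capture-size 2048"
)
# Quoted "$GMU_ARGS": an explicit empty-string argument survives.
quoted = shlex.split(
    f'vllm serve MODEL_PATH "{GMU_ARGS}" --max-cudagraph-capture-size 2048'
)

print(unquoted)  # ['vllm', 'serve', 'MODEL_PATH', '--max-cudagraph-capture-size', '2048']
print(quoted)    # ['vllm', 'serve', 'MODEL_PATH', '', '--max-cudagraph-capture-size', '2048']
```

If the launcher uses the unquoted form, removing GMU_ARGS is purely cosmetic, which matches the review's "not a bug" verdict.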

github-actions · 2026-05-15T05:00:16Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25900553590
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25900553590

github-actions · 2026-05-15T15:30:40Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25921236779
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25921236779

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 11d0585.

claude

Additional findings (outside the current diff; the PR may have been updated during review):

🟡 perf-changelog.yaml:2490-2494: The perf-changelog entry at line 2493 says "Update vLLM image from v0.20.0 to v0.20.2", but the actual diff at .github/configs/nvidia-master.yaml:1737 swaps the image to vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822, a pinned nightly digest, not a v0.20.2 release. Please update the changelog to record the nightly digest (e.g., "Update vLLM image from v0.20.0-cu130 to nightly-626fa9bba5663a5cf6a870debf031ee344ddb822") so that future perf-regression bisects can find the actual image.
Extended reasoning

What the bug is. The new entry in perf-changelog.yaml (lines 2490–2494) records the change as:

"Update vLLM image from v0.20.0 to v0.20.2"

But the actual image change in this PR, at .github/configs/nvidia-master.yaml:1737, is:
```diff
-  image: vllm/vllm-openai:v0.20.0-cu130
+  image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822
```
The deployed image is a nightly build pinned to a specific upstream commit SHA, not the stable v0.20.2 release the changelog claims. The PR description itself acknowledges this ("pinned nightly digest; perf changelog notes v0.20.0 → v0.20.2"). An earlier review comment on this line also asked about the dropped -cu130 suffix; that comment is now stale, because the latest commit replaced the v0.20.2 tag with the nightly digest, but the changelog text was never updated to match.

Why it matters. perf-changelog.yaml is the bisection trail for performance regressions. Recording a stable-looking version string when the deployed artifact is actually a nightly pinned to a commit hash defeats traceability: a future engineer investigating a regression that lands at this commit will pull vllm/vllm-openai:v0.20.2 and either fail (the tag may not exist or may have moved) or test a completely different binary than what actually ran. The nightly digest is the only thing that uniquely identifies the artifact, so it is exactly the value that needs to be in the changelog.

Why existing precedent supports the fix. Other entries in this very file consistently preserve the full tag for cross-variant updates — e.g., Update SGLang image from v0.5.9-cu130 to v0.5.11-cu130 at lines 2481 and 2487, immediately preceding this entry. The bar for tag fidelity is already established in the file; the vLLM entry is the outlier.

Impact. Documentation and traceability only; there is no runtime behavior change. But the entire purpose of this file is post-hoc perf-regression diagnosis, so an inaccurate entry undermines its only function.

How to fix. Change line 2493 of perf-changelog.yaml from
```yaml
    - "Update vLLM image from v0.20.0 to v0.20.2"
```
to something that names the actual deployed artifact, e.g.
```yaml
    - "Update vLLM image from v0.20.0-cu130 to nightly-626fa9bba5663a5cf6a870debf031ee344ddb822"
```
Step-by-step proof.
1. Open .github/configs/nvidia-master.yaml at line 1737 on this branch. It reads: image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822.
2. The PR diff shows the prior value was vllm/vllm-openai:v0.20.0-cu130. So the literal artifact transition is v0.20.0-cu130 → nightly-626fa9bba5663a5cf6a870debf031ee344ddb822 — neither side is v0.20.2.
3. Open perf-changelog.yaml at the newly added block (lines 2488–2494). The description string is "Update vLLM image from v0.20.0 to v0.20.2" — both the source tag (drops -cu130) and the destination tag (claims v0.20.2) are wrong relative to (1).
4. A future bisect at this commit that reads only the changelog and pulls vllm/vllm-openai:v0.20.2 will not get the same binary that produced these benchmark numbers; only the nightly SHA reproduces the run.
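The traceability check in the proof above can be automated. The following is a minimal sketch, not part of the repository: the helpers image_tag and changelog_mentions_tag are hypothetical, and the config line is parsed with plain string handling rather than a YAML library. It extracts the tag from the configured image and checks that the changelog entry names it as a whole token, so the original entry fails and the suggested fix passes.

```python
import re


def image_tag(config_text: str) -> str:
    """Return the tag of the first `image: repo:tag` line (hypothetical helper).

    Assumes the reference carries an explicit tag; a registry host with a
    port and no tag would be mis-parsed by this simple rsplit.
    """
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("image:"):
            ref = stripped[len("image:"):].strip()
            return ref.rsplit(":", 1)[1]
    raise ValueError("no image line found")


def changelog_mentions_tag(entry: str, tag: str) -> bool:
    """True if `tag` appears as a whole token, so "v0.20" does not match "v0.20.2"."""
    pattern = rf"(?<![\w.-]){re.escape(tag)}(?![\w.-])"
    return re.search(pattern, entry) is not None


config = "  image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822\n"
tag = image_tag(config)

old_entry = "Update vLLM image from v0.20.0 to v0.20.2"
fixed_entry = f"Update vLLM image from v0.20.0-cu130 to {tag}"

print(changelog_mentions_tag(old_entry, tag))    # False
print(changelog_mentions_tag(fixed_entry, tag))  # True
```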

github-actions · 2026-05-29T05:37:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26614901433
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26614901433

github-actions · 2026-05-29T16:19:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26648723870
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26648723870

wzhao18 · 2026-05-29T16:20:14Z

Sweep successful

@kedarpotdar-nv ready for review

wzhao18 · 2026-05-29T16:23:18Z

Actually, given that vLLM v0.22.0 was just released, I will switch to that version instead.

github-actions · 2026-05-29T16:23:39Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26648892007
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26648892007

github-actions · 2026-05-29T16:27:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26648916984
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26648916984

github-actions · 2026-05-29T19:08:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26649066318
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26649066318

wzhao18 · 2026-05-29T19:13:49Z

@claude please add a succinct, bullet-point PR description for posterity

wzhao18 · 2026-05-29T19:14:09Z

@cquil11 @kedarpotdar-nv Ready for merge. Thanks!

Klaud-Cold · 2026-05-29T19:14:13Z

Claude finished @wzhao18's task in 38s.

Updating PR Description

- Gather context from current branch state
- Draft succinct bullet-point PR description
- Update PR description

PR description updated with a concise bullet-point summary covering the image bump, sweep changes, and launcher adjustments.

functionstackx · 2026-05-29T23:09:42Z

/reuse-sweep-run

github-actions · 2026-05-29T23:10:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26666826820
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26666826820

Try updating b200 dsv4

0b4ed80

github-project-automation Bot added this to InferenceMAX Board May 15, 2026

add changelog

ba46c77

wzhao18 added the full-sweep-enabled label May 15, 2026

wzhao18 marked this pull request as ready for review May 15, 2026 04:38

wzhao18 requested a review from a team May 15, 2026 04:38

wzhao18 requested review from jgangani and kedarpotdar-nv as code owners May 15, 2026 04:38

claude Bot reviewed May 15, 2026

Comment thread .github/configs/nvidia-master.yaml Outdated

claude Bot reviewed May 15, 2026

Comment thread benchmarks/single_node/dsv4_fp4_b200_vllm.sh

Comment thread perf-changelog.yaml

wzhao18 closed this May 15, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board May 15, 2026

wzhao18 reopened this May 15, 2026

Set MAX_CUDAGRAPH_CAPTURE_SIZE to 2048 unconditionally

c5d239f

wzhao18 closed this May 15, 2026

wzhao18 reopened this May 26, 2026

wzhao18 closed this May 26, 2026

cursor Bot reviewed May 26, 2026

Comment thread benchmarks/single_node/dsv4_fp4_b200_vllm.sh

Update Docker image for dsv4-fp4-b200-vllm

11d0585

wzhao18 reopened this May 29, 2026

cursor Bot reviewed May 29, 2026

Comment thread benchmarks/single_node/dsv4_fp4_b200_vllm.sh

Merge branch 'main' into wzhao/update-dsv4-b200

3db0363

claude Bot reviewed May 29, 2026

wzhao18 changed the title from "[WIP] Updating dsv4 b200 vllm" to "Updating dsv4 b200 vllm" May 29, 2026

wzhao18 changed the title from "Updating dsv4 b200 vllm" to "Updating dsv4 b200 vllm version" May 29, 2026

Update vLLM image tag in perf-changelog.yaml

715429d

Updated the vLLM image tag to specify the nightly version.

wzhao18 added 2 commits May 29, 2026 12:22

Update Docker image tag for dsv4-fp4-b200-vllm

0a84685

Update vLLM image tag to v0.22.0

00c24f3

Update conc-end values in nvidia-master.yaml

527d0fb

Merge branch 'main' into wzhao/update-dsv4-b200

5d83d39

functionstackx merged commit 0618646 into main May 29, 2026
4 of 6 checks passed

functionstackx deleted the wzhao/update-dsv4-b200 branch May 29, 2026 23:10
