Updating dsv4 b200 vllm version #1384
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you!
PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Claude finished @wzhao18's task in 1m 46s
Review of PR #1384
Summary
One issue to address: 🟡 Image tag dropped
The benchmark script changes (dynamic
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25900553590
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25921236779
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 11d0585.
Additional findings (outside current diff; the PR may have been updated during review):
- 🟡 `perf-changelog.yaml:2490-2494`: The perf-changelog entry at line 2493 says `Update vLLM image from v0.20.0 to v0.20.2`, but the actual diff at `.github/configs/nvidia-master.yaml:1737` swaps the image to `vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822`, a pinned nightly digest, not a v0.20.2 release. Please update the changelog to record the nightly digest (e.g., `Update vLLM image from v0.20.0-cu130 to nightly-626fa9bba5663a5cf6a870debf031ee344ddb822`) so future perf-regression bisects can find the actual image.
What the bug is. The new entry in `perf-changelog.yaml` (lines 2490–2494) records the change as: "Update vLLM image from v0.20.0 to v0.20.2"
But the actual image change in this PR, at `.github/configs/nvidia-master.yaml:1737`, is:
`- image: vllm/vllm-openai:v0.20.0-cu130`
`+ image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822`
The deployed image is a nightly build pinned to a specific upstream commit SHA, not the stable `v0.20.2` release the changelog claims. The PR description itself acknowledges this (pinned nightly digest; perf changelog notes v0.20.0 → v0.20.2). An earlier review comment on this line also asked about the `-cu130` suffix being dropped; that comment is now stale, since the latest commit replaced the v0.20.2 tag with the nightly digest, but the changelog text was never updated to match.
Why it matters. `perf-changelog.yaml` is the bisection trail for performance regressions. Recording a stable-looking version string when the deployed artifact is actually a nightly pinned to a commit hash defeats traceability: a future engineer investigating a regression that lands at this commit will pull `vllm/vllm-openai:v0.20.2` and either fail (the tag may not exist or may have moved) or test a completely different binary from the one that actually ran. The nightly digest is the only thing that uniquely identifies the artifact, so it is exactly the value that needs to be in the changelog.
Why existing precedent supports the fix. Other entries in this very file consistently preserve the full tag for cross-variant updates, e.g., `Update SGLang image from v0.5.9-cu130 to v0.5.11-cu130` at lines 2481 and 2487, immediately preceding this entry. The bar for tag fidelity is already established in the file; the vLLM entry is the outlier.
Impact. Documentation/traceability only; there is no runtime behavior change. But the entire purpose of this file is post-hoc perf-regression diagnosis, so an inaccurate entry undermines its only function.
How to fix. Change line 2493 of `perf-changelog.yaml` from `- "Update vLLM image from v0.20.0 to v0.20.2"` to something that names the actual deployed artifact, e.g.:
`- "Update vLLM image from v0.20.0-cu130 to nightly-626fa9bba5663a5cf6a870debf031ee344ddb822"`
Step-by-step proof.
1. Open `.github/configs/nvidia-master.yaml` at line 1737 on this branch. It reads: `image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822`.
2. The PR diff shows the prior value was `vllm/vllm-openai:v0.20.0-cu130`. So the literal artifact transition is `v0.20.0-cu130` → `nightly-626fa9bba5663a5cf6a870debf031ee344ddb822`; neither side is `v0.20.2`.
3. Open `perf-changelog.yaml` at the newly added block (lines 2488–2494). The description string is `"Update vLLM image from v0.20.0 to v0.20.2"`: both the source tag (drops `-cu130`) and the destination tag (claims `v0.20.2`) are wrong relative to (1).
4. A future bisect at this commit that reads only the changelog and pulls `vllm/vllm-openai:v0.20.2` will not get the same binary that produced these benchmark numbers; only the nightly SHA reproduces the run.
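The traceability argument above lends itself to a mechanical check. The following is a minimal sketch, not anything that exists in this repo: it assumes the config line has the shape `image: <repo>:<tag>` (no registry port) and that the changelog entry is a single description string. `image_tag` and `changelog_names_tags` are hypothetical helpers.

```python
import re

# Assumes a config line shaped like "image: <repo>:<tag>" with no registry port.
IMAGE_LINE = re.compile(r"^\s*image:\s*(?P<repo>[^\s:]+):(?P<tag>\S+)\s*$")


def image_tag(config_line: str) -> str:
    """Return the tag from an 'image: <repo>:<tag>' line (hypothetical helper)."""
    match = IMAGE_LINE.match(config_line)
    if match is None:
        raise ValueError(f"not an image line: {config_line!r}")
    return match.group("tag")


def changelog_names_tags(description: str, old_line: str, new_line: str) -> bool:
    """True only if the description names both full tags as whole words.

    Whole-word matching means a tag such as "v0.20.0" is not satisfied by a
    description that only mentions "v0.20.0-cu130", as a substring test would be.
    """
    words = set(description.replace('"', " ").split())
    return image_tag(old_line) in words and image_tag(new_line) in words


if __name__ == "__main__":
    old = "image: vllm/vllm-openai:v0.20.0-cu130"
    new = "image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822"
    # The entry as written names neither full tag, so the check fails.
    print(changelog_names_tags("Update vLLM image from v0.20.0 to v0.20.2", old, new))
    # The suggested wording names both exact tags, so the check passes.
    print(changelog_names_tags(
        "Update vLLM image from v0.20.0-cu130 to "
        "nightly-626fa9bba5663a5cf6a870debf031ee344ddb822", old, new))
```

Matching on whole tags rather than version prefixes is what makes the check fail for the current entry: `v0.20.0` is not the deployed source tag, and `v0.20.2` never ran at all.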
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26614901433
Updated the vLLM image tag to specify the nightly version.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26648723870
@kedarpotdar-nv ready for review
Actually, given that vLLM v0.22.0 was just released, I'll switch to that version instead.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26648892007
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26648916984
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26649066318
@claude add a succinct, bullet-point PR description for posterity
@cquil11 @kedarpotdar-nv Ready for merge. Thanks!
Claude finished @wzhao18's task in 38s
Updating PR Description
PR description updated with a concise bullet-point summary covering the image bump, sweep changes, and launcher adjustments.
/reuse-sweep-run
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26666826820


Summary
Bump dsv4-fp4-b200-vllm to vLLM v0.22.0 with sweep and launcher adjustments.
Changes
- Image: `vllm/vllm-openai:v0.21.0` → `v0.22.0`
- Sweep (ISL 1024, DP-attn): split the `conc 256–4096` range into `256–1024` + a dedicated `4096` point
- Launcher (`dsv4_fp4_b200_vllm.sh`):
  - Drop `--gpu-memory-utilization 0.85` on the DP-attn path (use vLLM default)
  - Remove the override that forced `BENCHMARK_MAX_MODEL_LEN` to 4096 for ISL/OSL 1024/1024
  - Move `--max-cudagraph-capture-size` into a `MAX_CUDAGRAPH_CAPTURE_SIZE` variable (value unchanged: 2048)
Note
Medium Risk
Changes benchmark serving flags and sweep coverage for a large MoE model on B200; mis-tuned concurrency or memory settings could skew perf results without affecting production apps.
Overview
Bumps dsv4-fp4-b200-vllm to vLLM v0.22.0 and aligns the B200 single-node launcher and sweep with that stack.
The ISL 1024 DP-attention search space no longer uses one wide conc 256–4096 band; it is split into 256–1024 plus a separate 4096 point.
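The effect of the split can be illustrated with a small calculation. This sketch assumes, hypothetically, that each concurrency band is swept by doubling from its start to its end; the actual sweep schema in the config is not shown in this thread, and `doubling_sweep` is an illustrative helper, not part of the repo.

```python
from typing import List


def doubling_sweep(start: int, end: int) -> List[int]:
    """Concurrency points from start to end inclusive, doubling each step.

    The doubling schedule is an assumption made for illustration only.
    """
    if start <= 0 or end < start:
        raise ValueError("expected 0 < start <= end")
    points = []
    conc = start
    while conc <= end:
        points.append(conc)
        conc *= 2
    return points


# Before: a single wide band.
before = doubling_sweep(256, 4096)
# After: a narrower band plus a dedicated 4096 point.
after = doubling_sweep(256, 1024) + doubling_sweep(4096, 4096)

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
    print("no longer sampled:", sorted(set(before) - set(after)))
```

Under that doubling assumption, the old band covers 256, 512, 1024, 2048, and 4096, while the new layout covers 256, 512, 1024, and 4096, so 2048 is no longer sampled in this search space.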
`dsv4_fp4_b200_vllm.sh` drops `--gpu-memory-utilization 0.85` on the DP-attn path, removes the 1024/1024 cap that forced `BENCHMARK_MAX_MODEL_LEN` to 4096, and wires `--max-cudagraph-capture-size` through `MAX_CUDAGRAPH_CAPTURE_SIZE` (still 2048). `perf-changelog.yaml` documents the image bump.
Reviewed by Cursor Bugbot for commit 5d83d39. Bugbot is set up for automated code reviews on this repo.
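The launcher changes summarized above can be modeled in a few lines to compare old and new DP-attn behavior. This is a hedged sketch, not the launcher itself (which is a shell script). The function names are hypothetical, and two details are assumptions: that `BENCHMARK_MAX_MODEL_LEN` is forwarded as vLLM's `--max-model-len`, and that `MAX_CUDAGRAPH_CAPTURE_SIZE` is read from the environment with a 2048 default.

```python
from typing import List, Mapping

# Value the PR states is unchanged; the new launcher holds it in a variable.
DEFAULT_CAPTURE_SIZE = "2048"


def old_dp_attn_args(env: Mapping[str, str], isl: int, osl: int) -> List[str]:
    """Hypothetical model of the DP-attn flags before this PR."""
    max_len = env.get("BENCHMARK_MAX_MODEL_LEN")
    if isl == 1024 and osl == 1024:
        max_len = "4096"  # the 1024/1024 cap this PR removes
    args = ["--gpu-memory-utilization", "0.85",
            "--max-cudagraph-capture-size", DEFAULT_CAPTURE_SIZE]
    if max_len is not None:
        # Assumption: BENCHMARK_MAX_MODEL_LEN is forwarded as --max-model-len.
        args += ["--max-model-len", max_len]
    return args


def new_dp_attn_args(env: Mapping[str, str]) -> List[str]:
    """Hypothetical model of the DP-attn flags after this PR."""
    # --gpu-memory-utilization is omitted, so vLLM's default applies.
    capture = env.get("MAX_CUDAGRAPH_CAPTURE_SIZE", DEFAULT_CAPTURE_SIZE)
    args = ["--max-cudagraph-capture-size", capture]
    max_len = env.get("BENCHMARK_MAX_MODEL_LEN")
    if max_len is not None:
        # No ISL/OSL special case: the configured length is used as-is.
        args += ["--max-model-len", max_len]
    return args
```

For example, with `BENCHMARK_MAX_MODEL_LEN=8192` at ISL/OSL 1024/1024, the old model yields `--max-model-len 4096` and pins memory utilization to 0.85, while the new model passes 8192 through unchanged and leaves memory utilization to vLLM.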