Updating dsv4 b200 vllm version #1384

Merged: functionstackx merged 10 commits into main from wzhao/update-dsv4-b200 on May 29, 2026

Conversation

wzhao18 (Collaborator) commented May 15, 2026

Summary

Bump dsv4-fp4-b200-vllm to vLLM v0.22.0 with sweep and launcher adjustments.

Changes

  • Image: vllm/vllm-openai:v0.21.0 → v0.22.0
  • ISL 1024 DP-attn sweep: split single conc 256–4096 range into 256–1024 + dedicated 4096 point
  • Launcher (dsv4_fp4_b200_vllm.sh):
    • Remove --gpu-memory-utilization 0.85 on DP-attn path (use vLLM default)
    • Remove special-case capping BENCHMARK_MAX_MODEL_LEN to 4096 for ISL/OSL 1024/1024
    • Extract --max-cudagraph-capture-size into MAX_CUDAGRAPH_CAPTURE_SIZE variable (value unchanged: 2048)
  • Perf changelog: record image bump to v0.22.0
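
To make the sweep change concrete, here is a minimal sketch of the concurrency points before and after the split. It assumes the sweep doubles the concurrency at each step between its bounds, which this PR does not state. `conc_points` is a hypothetical helper, not part of the repo's sweep tooling.

```python
def conc_points(lo: int, hi: int) -> list:
    """Concurrency values from lo to hi inclusive, doubling each step (assumed step rule)."""
    points = []
    c = lo
    while c <= hi:
        points.append(c)
        c *= 2
    return points

# Before: one wide band for the ISL 1024 DP-attn sweep.
before = conc_points(256, 4096)

# After: a 256-1024 band plus a dedicated 4096 point.
after = conc_points(256, 1024) + conc_points(4096, 4096)

# Points covered before the split but not after it.
dropped = sorted(set(before) - set(after))
print("before:", before)
print("after: ", after)
print("dropped:", dropped)
```

Under the doubling assumption, the split keeps both endpoints and skips only the intermediate point between 1024 and 4096. With a different step rule the dropped set would differ.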

Note

Medium Risk
Changes benchmark serving flags and sweep coverage for a large MoE model on B200; mis-tuned concurrency or memory settings could skew perf results without affecting production apps.

Overview
Bumps dsv4-fp4-b200-vllm to vLLM v0.22.0 and aligns the B200 single-node launcher and sweep with that stack.

The ISL 1024 DP-attention search space no longer uses one wide conc 256–4096 band; it is split into 256–1024 plus a separate 4096 point. dsv4_fp4_b200_vllm.sh drops --gpu-memory-utilization 0.85 on the DP-attn path, removes the 1024/1024 cap that forced BENCHMARK_MAX_MODEL_LEN to 4096, and wires --max-cudagraph-capture-size through MAX_CUDAGRAPH_CAPTURE_SIZE (still 2048). perf-changelog.yaml documents the image bump.
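
The launcher changes can be pictured roughly as follows. This is a Python rendering for illustration only: the real launcher is the shell script dsv4_fp4_b200_vllm.sh, and `build_serve_args`, its `gpu_memory_utilization` parameter, and `MODEL_PLACEHOLDER` are hypothetical. Only the two flag names and the 2048 value come from this PR.

```python
from typing import List, Optional

# Value stated in the PR; previously passed inline, now held in one named variable.
MAX_CUDAGRAPH_CAPTURE_SIZE = 2048

def build_serve_args(model: str,
                     gpu_memory_utilization: Optional[float] = None) -> List[str]:
    """Assemble an illustrative `vllm serve` argument list."""
    args = [
        "vllm", "serve", model,
        "--max-cudagraph-capture-size", str(MAX_CUDAGRAPH_CAPTURE_SIZE),
    ]
    # Old DP-attn path: pass 0.85 explicitly. New path: omit the flag
    # entirely so vLLM falls back to its own default.
    if gpu_memory_utilization is not None:
        args += ["--gpu-memory-utilization", str(gpu_memory_utilization)]
    return args

old_args = build_serve_args("MODEL_PLACEHOLDER", gpu_memory_utilization=0.85)
new_args = build_serve_args("MODEL_PLACEHOLDER")
print(" ".join(new_args))
```

Leaving the flag out, rather than passing an empty value, is also why an always-empty argument variable in the shell script would be harmless: it expands to nothing on the command line.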

Reviewed by Cursor Bugbot for commit 5d83d39.

github-actions (Contributor) commented:

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first, before we merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from their respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@wzhao18 wzhao18 marked this pull request as ready for review May 15, 2026 04:38
@wzhao18 wzhao18 requested a review from a team May 15, 2026 04:38
claude Bot (Contributor) commented May 15, 2026

Claude finished @wzhao18's task in 1m 46s.


Review of PR #1384

  • Gather context and read changed files
  • Validate master config changes
  • Validate perf-changelog.yaml
  • Validate benchmark script
  • Post review summary

Summary

One issue to address:

🟡 Image tag dropped -cu130 suffix (comment): The image changed from v0.20.0-cu130 → v0.20.2 (no -cu130). Since this runs on B200 GPUs requiring CUDA 13.0, and the sibling dsv4-fp4-b200-vllm-mtp config still uses v0.20.0-cu130, please confirm this is intentional. If a v0.20.2-cu130 tag exists, it should likely be used instead.

The benchmark script changes (dynamic MAX_CUDAGRAPH_CAPTURE_SIZE based on ISL, removal of --gpu-memory-utilization 0.85 for DP-attention) and the perf-changelog entry all look correct. Note that GMU_ARGS is now always empty but still referenced in the vllm serve command — not a bug, but could be cleaned up.

Comment thread .github/configs/nvidia-master.yaml Outdated
Comment thread benchmarks/single_node/dsv4_fp4_b200_vllm.sh
Comment thread perf-changelog.yaml
@wzhao18 wzhao18 closed this May 15, 2026
@wzhao18 wzhao18 reopened this May 26, 2026
@wzhao18 wzhao18 closed this May 26, 2026
Comment thread benchmarks/single_node/dsv4_fp4_b200_vllm.sh
@wzhao18 wzhao18 reopened this May 29, 2026
cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 11d0585.

Comment thread benchmarks/single_node/dsv4_fp4_b200_vllm.sh
claude Bot (Contributor) left a comment

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 perf-changelog.yaml:2490-2494 — The perf-changelog entry at line 2493 says Update vLLM image from v0.20.0 to v0.20.2, but the actual diff at .github/configs/nvidia-master.yaml:1737 swaps the image to vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822 — a pinned nightly digest, not a v0.20.2 release. Please update the changelog to record the nightly digest (e.g., Update vLLM image from v0.20.0-cu130 to nightly-626fa9bba5663a5cf6a870debf031ee344ddb822) so future perf-regression bisects can find the actual image.

    Extended reasoning...

    What the bug is. The new entry in perf-changelog.yaml (lines 2490–2494) records the change as:

    "Update vLLM image from v0.20.0 to v0.20.2"

    But the actual image change in this PR, at .github/configs/nvidia-master.yaml:1737, is:

    -  image: vllm/vllm-openai:v0.20.0-cu130
    +  image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822

    The deployed image is a nightly build pinned to a specific upstream commit SHA, not the stable v0.20.2 release the changelog claims. The PR description itself acknowledges this (pinned nightly digest; perf changelog notes v0.20.0 → v0.20.2). An earlier review comment on this line also asked about the -cu130 suffix being dropped — that comment is now stale, since the latest commit replaced the v0.20.2 tag with the nightly digest, but the changelog text was never updated to match.

    Why it matters. perf-changelog.yaml is the bisection trail for performance regressions. Recording a stable-looking version string when the deployed artifact is actually a nightly pinned to a commit hash defeats traceability: a future engineer investigating a regression that lands at this commit will pull vllm/vllm-openai:v0.20.2 and either fail (the tag may not exist or may have moved) or test a completely different binary than what actually ran. The nightly digest is the only thing that uniquely identifies the artifact, so it is exactly the value that needs to be in the changelog.

    Why existing precedent supports the fix. Other entries in this very file consistently preserve the full tag for cross-variant updates — e.g., Update SGLang image from v0.5.9-cu130 to v0.5.11-cu130 at lines 2481 and 2487, immediately preceding this entry. The bar for tag fidelity is already established in the file; the vLLM entry is the outlier.

    Impact. Documentation/traceability only — no runtime behavior change. But the entire purpose of this file is post-hoc perf-regression diagnosis, so an inaccurate entry undermines its only function.

    How to fix. Change line 2493 of perf-changelog.yaml from

        - "Update vLLM image from v0.20.0 to v0.20.2"

    to something that names the actual deployed artifact, e.g.

        - "Update vLLM image from v0.20.0-cu130 to nightly-626fa9bba5663a5cf6a870debf031ee344ddb822"

    Step-by-step proof.

    1. Open .github/configs/nvidia-master.yaml at line 1737 on this branch. It reads: image: vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822.
    2. The PR diff shows the prior value was vllm/vllm-openai:v0.20.0-cu130. So the literal artifact transition is v0.20.0-cu130 → nightly-626fa9bba5663a5cf6a870debf031ee344ddb822; neither side is v0.20.2.
    3. Open perf-changelog.yaml at the newly added block (lines 2488–2494). The description string is "Update vLLM image from v0.20.0 to v0.20.2" — both the source tag (drops -cu130) and the destination tag (claims v0.20.2) are wrong relative to (1).
    4. A future bisect at this commit that reads only the changelog and pulls vllm/vllm-openai:v0.20.2 will not get the same binary that produced these benchmark numbers; only the nightly SHA reproduces the run.
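
A check along the following lines could catch this kind of drift automatically. It is a minimal sketch, assuming the changelog description and the configured image are available as plain strings. `image_tag` and `changelog_names_image` are hypothetical helpers, not existing repo tooling.

```python
def image_tag(image: str) -> str:
    """Return the tag of an image reference like 'repo/name:tag', or '' if untagged."""
    _, sep, tag = image.rpartition(":")
    # A '/' after the last ':' means the colon belonged to a registry port, not a tag.
    if not sep or "/" in tag:
        return ""
    return tag

def changelog_names_image(description: str, image: str) -> bool:
    """True if the description contains the image's exact tag as a whole token."""
    tag = image_tag(image)
    # Whole-token match so that 'v0.20.0' does not match 'v0.20.0-cu130'.
    tokens = {tok.strip('",.;()') for tok in description.split()}
    return bool(tag) and tag in tokens

deployed = "vllm/vllm-openai:nightly-626fa9bba5663a5cf6a870debf031ee344ddb822"
stale = "Update vLLM image from v0.20.0 to v0.20.2"
fixed = ("Update vLLM image from v0.20.0-cu130 to "
         "nightly-626fa9bba5663a5cf6a870debf031ee344ddb822")

print(changelog_names_image(stale, deployed))  # the entry as written
print(changelog_names_image(fixed, deployed))  # the suggested replacement
```

The whole-token comparison is a deliberate choice: a plain substring test would accept a changelog that says v0.20.0 when the deployed tag is v0.20.0-cu130, which is the other mismatch described in step 3.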

@wzhao18 wzhao18 changed the title from "[WIP] Updating dsv4 b200 vllm" to "Updating dsv4 b200 vllm" May 29, 2026
@wzhao18 wzhao18 changed the title from "Updating dsv4 b200 vllm" to "Updating dsv4 b200 vllm version" May 29, 2026
Updated the vLLM image tag to specify the nightly version.

wzhao18 (Collaborator, Author) commented May 29, 2026

Sweep successful.
[Screenshot 2026-05-29 at 12 19 43 PM]

@kedarpotdar-nv ready for review

wzhao18 (Collaborator, Author) commented May 29, 2026

Actually, given that vLLM v0.22.0 was just released, I will switch to that version instead.

wzhao18 (Collaborator, Author) commented May 29, 2026

@claude add a succinct, bullet-point PR description for posterity

wzhao18 (Collaborator, Author) commented May 29, 2026

@cquil11 @kedarpotdar-nv Ready for merge. Thanks!

Klaud-Cold (Collaborator) commented May 29, 2026

Claude finished @wzhao18's task in 38s.


Updating PR Description

  • Gather context from current branch state
  • Draft succinct bullet-point PR description
  • Update PR description

PR description updated with a concise bullet-point summary covering the image bump, sweep changes, and launcher adjustments.

functionstackx (Collaborator) commented:

/reuse-sweep-run

@functionstackx functionstackx merged commit 0618646 into main May 29, 2026
4 of 6 checks passed
@functionstackx functionstackx deleted the wzhao/update-dsv4-b200 branch May 29, 2026 23:10