perftest: fix reducescatter latency size reporting by jasha64 · Pull Request #93 · NVIDIA/nvshmem

jasha64 · 2026-06-15T14:34:37Z

This fixes incorrect device reducescatter perftest results caused by RUN_ITERS_OP declaring a loop-local num_elems after the kernel argument lists had already captured the outer variable. The benchmark was printing increasing message sizes while repeatedly timing the initial element count, which produced nearly constant latencies and inflated bandwidth values. Reusing the existing num_elems keeps the kernel arguments, size table, and reported metrics in sync.
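The shadowing problem described above can be reproduced in a few lines of host-only C++. This is a minimal sketch, not the perftest source: `launch`, `run_shadowed`, and `run_fixed` are hypothetical names, and it assumes the kernel argument list stores the address of `num_elems`, as a CUDA-style `void *args[]` array would.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel launch: records the element count
// that the argument list points at when the "kernel" is launched.
static void launch(void **args, std::vector<std::size_t> &seen) {
    seen.push_back(*static_cast<std::size_t *>(args[0]));
}

// Buggy pattern: the loop declares its own num_elems, which shadows the
// outer variable whose address is already stored in args.
std::vector<std::size_t> run_shadowed(std::size_t min_elems, std::size_t max_elems) {
    assert(min_elems > 0);  // doubling from zero would never terminate
    std::size_t num_elems = min_elems;
    void *args[] = {&num_elems};  // captures the OUTER num_elems
    std::vector<std::size_t> seen;
    for (std::size_t num_elems = min_elems; num_elems <= max_elems; num_elems *= 2) {
        launch(args, seen);  // still reads the outer, never-updated value
    }
    return seen;
}

// Fixed pattern: the loop reuses the outer num_elems, so the argument
// list observes every size change.
std::vector<std::size_t> run_fixed(std::size_t min_elems, std::size_t max_elems) {
    assert(min_elems > 0);
    std::size_t num_elems = min_elems;
    void *args[] = {&num_elems};
    std::vector<std::size_t> seen;
    for (num_elems = min_elems; num_elems <= max_elems; num_elems *= 2) {
        launch(args, seen);
    }
    return seen;
}
```

With `min_elems = 1` and `max_elems = 8`, `run_shadowed` launches with 1 element four times, while `run_fixed` launches with 1, 2, 4, and 8 elements. Compiling with `-Wshadow` on GCC or Clang flags the buggy declaration.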

Before this change, the reducescatter_latency perftest reported implausibly large bandwidth values, as in the output below. This pull request fixes that.

#device_reducescatter
size(B)     count     type      redop     scope     latency(us)       algbw(GB/s)   busbw(GB/s) 
8           1         int64     sum       b         2.140800          0.004         0.000       
16          2         int64     sum       b         1.910400          0.008         0.000       
32          4         int64     sum       b         1.923200          0.017         0.000       
64          8         int64     sum       b         1.955200          0.033         0.000       
128         16        int64     sum       b         1.907200          0.067         0.000       
256         32        int64     sum       b         1.907200          0.134         0.000       
512         64        int64     sum       b         1.900800          0.269         0.000       
1024        128       int64     sum       b         1.897600          0.540         0.000       
2048        256       int64     sum       b         1.916800          1.068         0.000       
4096        512       int64     sum       b         1.910400          2.144         0.000       
8192        1024      int64     sum       b         1.913600          4.281         0.000       
16384       2048      int64     sum       b         1.920000          8.533         0.000       
32768       4096      int64     sum       b         1.913600          17.124        0.000       
65536       8192      int64     sum       b         1.913600          34.247        0.000       
131072      16384     int64     sum       b         1.904000          68.840        0.000       
262144      32768     int64     sum       b         1.980800          132.342       0.000       
524288      65536     int64     sum       b         1.916800          273.523       0.000       
1048576     131072    int64     sum       b         2.016000          520.127       0.000       
2097152     262144    int64     sum       b         1.990400          1053.633      0.000       
4194304     524288    int64     sum       b         1.948800          2152.250      0.000       
8388608     1048576   int64     sum       b         1.926400          4354.552      0.000       
16777216    2097152   int64     sum       b         1.948800          8608.999      0.000       
33554432    4194304   int64     sum       b         2.038400          16461.161     0.000       
67108864    8388608   int64     sum       b         1.900800          35305.590     0.000
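The algbw column above is consistent with dividing the reported size by the measured latency, with 1 GB taken as 10^9 bytes. That formula is inferred from the printed numbers rather than taken from the perftest source, and `algbw_gbps` is a hypothetical helper name. A short sketch shows why a constant latency of about 1.9 us makes the reported bandwidth double with every row:

```cpp
#include <cassert>
#include <cstddef>

// Assumed formula (inferred from the table): bytes / seconds, in units
// of 1e9 bytes per second.
double algbw_gbps(std::size_t size_bytes, double latency_us) {
    assert(latency_us > 0.0);  // a zero latency would divide by zero
    const double seconds = latency_us * 1e-6;
    return static_cast<double>(size_bytes) / seconds / 1e9;
}
```

For the last row, `algbw_gbps(67108864, 1.9008)` gives roughly 35305.59 GB/s, matching the table. Since the kernel was only ever timed on the initial element count, that figure reflects the label printed for the row, not the data actually moved.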

perftest: fix reducescatter latency size reporting

4cb1ebd

Avoid shadowing num_elems in the for loop so the timed kernel launches use the current message size instead of repeatedly measuring the initial element count.

Signed-off-by: jasha64 <yijunma@student.ethz.ch>

jasha64 wants to merge 1 commit into NVIDIA:devel from jasha64:devel