Revert AVX512-optimized is_ascii (#22) #23
Merged
Yore Benchmarks
| Benchmark suite | Current: a94753a | Previous: 65ad00d | Ratio |
|---|---|---|---|
| decode_checked/mostly_ascii/8 | 30 ns/iter (± 10) | 34 ns/iter (± 12) | 0.88 |
| decode_checked/mostly_ascii/64 | 62 ns/iter (± 7) | 67 ns/iter (± 8) | 0.93 |
| decode_checked/mostly_ascii/256 | 142 ns/iter (± 13) | 147 ns/iter (± 42) | 0.97 |
| decode_checked/mostly_ascii/512 | 243 ns/iter (± 20) | 240 ns/iter (± 20) | 1.01 |
| decode_checked/mostly_ascii/1024 | 446 ns/iter (± 29) | 419 ns/iter (± 29) | 1.06 |
| decode_checked/mostly_ascii/2048 | 887 ns/iter (± 37) | 911 ns/iter (± 123) | 0.97 |
| decode_checked/mostly_ascii/4096 | 1971 ns/iter (± 110) | 1998 ns/iter (± 71) | 0.99 |
| decode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| decode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| decode_checked/ascii/256 | 11 ns/iter (± 0) | 10 ns/iter (± 0) | 1.10 |
| decode_checked/ascii/512 | 14 ns/iter (± 0) | 16 ns/iter (± 0) | 0.88 |
| decode_checked/ascii/1024 | 27 ns/iter (± 0) | 26 ns/iter (± 0) | 1.04 |
| decode_checked/ascii/2048 | 48 ns/iter (± 0) | 52 ns/iter (± 0) | 0.92 |
| decode_checked/ascii/4096 | 93 ns/iter (± 0) | 92 ns/iter (± 1) | 1.01 |
| decode_checked/extended/8 | 34 ns/iter (± 1) | 37 ns/iter (± 3) | 0.92 |
| decode_checked/extended/64 | 85 ns/iter (± 3) | 87 ns/iter (± 2) | 0.98 |
| decode_checked/extended/256 | 254 ns/iter (± 21) | 266 ns/iter (± 11) | 0.95 |
| decode_checked/extended/512 | 483 ns/iter (± 9) | 493 ns/iter (± 8) | 0.98 |
| decode_checked/extended/1024 | 909 ns/iter (± 6) | 910 ns/iter (± 10) | 1.00 |
| decode_checked/extended/2048 | 1768 ns/iter (± 15) | 1765 ns/iter (± 42) | 1.00 |
| decode_checked/extended/4096 | 3489 ns/iter (± 34) | 3481 ns/iter (± 45) | 1.00 |
| decode_lossy/all_bad/8 | 39 ns/iter (± 2) | 48 ns/iter (± 5) | 0.81 |
| decode_lossy/all_bad/64 | 62 ns/iter (± 1) | 64 ns/iter (± 1) | 0.97 |
| decode_lossy/all_bad/256 | 174 ns/iter (± 2) | 178 ns/iter (± 3) | 0.98 |
| decode_lossy/all_bad/512 | 329 ns/iter (± 10) | 333 ns/iter (± 10) | 0.99 |
| decode_lossy/all_bad/1024 | 626 ns/iter (± 34) | 646 ns/iter (± 27) | 0.97 |
| decode_lossy/all_bad/2048 | 1223 ns/iter (± 16) | 1272 ns/iter (± 18) | 0.96 |
| decode_lossy/all_bad/4096 | 2440 ns/iter (± 24) | 2491 ns/iter (± 41) | 0.98 |
| decode_lossy/mostly_ascii/8 | 39 ns/iter (± 14) | 48 ns/iter (± 19) | 0.81 |
| decode_lossy/mostly_ascii/64 | 64 ns/iter (± 6) | 66 ns/iter (± 7) | 0.97 |
| decode_lossy/mostly_ascii/256 | 143 ns/iter (± 13) | 154 ns/iter (± 17) | 0.93 |
| decode_lossy/mostly_ascii/512 | 231 ns/iter (± 20) | 246 ns/iter (± 20) | 0.94 |
| decode_lossy/mostly_ascii/1024 | 408 ns/iter (± 24) | 431 ns/iter (± 30) | 0.95 |
| decode_lossy/mostly_ascii/2048 | 777 ns/iter (± 35) | 810 ns/iter (± 46) | 0.96 |
| decode_lossy/mostly_ascii/4096 | 1500 ns/iter (± 49) | 1571 ns/iter (± 49) | 0.95 |
| encode_checked/mostly_ascii/8 | 41 ns/iter (± 16) | 43 ns/iter (± 17) | 0.95 |
| encode_checked/mostly_ascii/64 | 128 ns/iter (± 5) | 139 ns/iter (± 2) | 0.92 |
| encode_checked/mostly_ascii/256 | 442 ns/iter (± 9) | 481 ns/iter (± 41) | 0.92 |
| encode_checked/mostly_ascii/512 | 872 ns/iter (± 16) | 939 ns/iter (± 50) | 0.93 |
| encode_checked/mostly_ascii/1024 | 1687 ns/iter (± 34) | 2005 ns/iter (± 109) | 0.84 |
| encode_checked/mostly_ascii/2048 | 3438 ns/iter (± 54) | 4090 ns/iter (± 116) | 0.84 |
| encode_checked/mostly_ascii/4096 | 6672 ns/iter (± 166) | 7311 ns/iter (± 309) | 0.91 |
| encode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| encode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| encode_checked/ascii/256 | 12 ns/iter (± 0) | 10 ns/iter (± 1) | 1.20 |
| encode_checked/ascii/512 | 16 ns/iter (± 0) | 14 ns/iter (± 0) | 1.14 |
| encode_checked/ascii/1024 | 25 ns/iter (± 0) | 24 ns/iter (± 0) | 1.04 |
| encode_checked/ascii/2048 | 50 ns/iter (± 0) | 49 ns/iter (± 0) | 1.02 |
| encode_checked/ascii/4096 | 93 ns/iter (± 4) | 89 ns/iter (± 0) | 1.04 |
| encode_checked/extended/8 | 54 ns/iter (± 0) | 53 ns/iter (± 0) | 1.02 |
| encode_checked/extended/64 | 190 ns/iter (± 5) | 202 ns/iter (± 1) | 0.94 |
| encode_checked/extended/256 | 789 ns/iter (± 9) | 705 ns/iter (± 7) | 1.12 |
| encode_checked/extended/512 | 1366 ns/iter (± 24) | 1375 ns/iter (± 19) | 0.99 |
| encode_checked/extended/1024 | 2710 ns/iter (± 26) | 2726 ns/iter (± 20) | 0.99 |
| encode_checked/extended/2048 | 5389 ns/iter (± 334) | 5400 ns/iter (± 49) | 1.00 |
| encode_checked/extended/4096 | 10698 ns/iter (± 559) | 10749 ns/iter (± 34) | 1.00 |
| encode_lossy/all_bad/8 | 49 ns/iter (± 1) | 54 ns/iter (± 12) | 0.91 |
| encode_lossy/all_bad/64 | 215 ns/iter (± 8) | 247 ns/iter (± 1) | 0.87 |
| encode_lossy/all_bad/256 | 801 ns/iter (± 7) | 877 ns/iter (± 141) | 0.91 |
| encode_lossy/all_bad/512 | 1572 ns/iter (± 12) | 1717 ns/iter (± 24) | 0.92 |
| encode_lossy/all_bad/1024 | 3119 ns/iter (± 218) | 3412 ns/iter (± 64) | 0.91 |
| encode_lossy/all_bad/2048 | 6186 ns/iter (± 39) | 6764 ns/iter (± 131) | 0.91 |
| encode_lossy/all_bad/4096 | 12311 ns/iter (± 113) | 13475 ns/iter (± 61) | 0.91 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Yore Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.
| Benchmark suite | Current: a94753a | Previous: 65ad00d | Ratio |
|---|---|---|---|
| decode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| decode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| encode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| encode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| encode_checked/ascii/256 | 12 ns/iter (± 0) | 10 ns/iter (± 1) | 1.20 |
| encode_checked/ascii/512 | 16 ns/iter (± 0) | 14 ns/iter (± 0) | 1.14 |
| encode_checked/extended/256 | 789 ns/iter (± 9) | 705 ns/iter (± 7) | 1.12 |
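
For readers checking the alert by hand, the Ratio column appears to be the current time divided by the previous time, with rows flagged when that ratio exceeds 1.10. The sketch below reproduces that arithmetic; the function names are hypothetical, and the strict `>` comparison is an assumption, since the tool's actual rule is not shown in this thread.

```rust
/// Ratio as it appears in the tables above: current time over previous time.
/// (Hypothetical helper, not part of github-action-benchmark.)
fn ratio(current_ns: f64, previous_ns: f64) -> f64 {
    current_ns / previous_ns
}

/// Returns true when the ratio exceeds the alert threshold.
/// Assumption: the comparison is strict, so a ratio equal to the
/// threshold is not reported.
fn exceeds_threshold(current_ns: f64, previous_ns: f64, threshold: f64) -> bool {
    ratio(current_ns, previous_ns) > threshold
}
```

With the values above, `exceeds_threshold(789.0, 705.0, 1.10)` is true (encode_checked/extended/256), while `exceeds_threshold(243.0, 240.0, 1.10)` is false (decode_checked/mostly_ascii/512). Several flagged rows differ by only 1 to 2 ns at integer resolution, so small absolute changes produce large ratios on the shortest benchmarks.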
Reverts the custom AVX512 `is_ascii()` from #22 (commit 65ad00d).

That implementation was a workaround for poor codegen from stdlib's `is_ascii()` under `-C target-cpu=native` on AVX512 CPUs (~30x slowdown). The optimization has since been contributed to and merged into Rust std, so the workaround is no longer needed: falling back to stdlib `is_ascii()` now gets the fast path for free.

Changes

- `src/simd.rs`
- `is_ascii()` calls in `decoder/complete.rs`, `decoder/incomplete.rs`, and `encoder.rs`
- `simd` module from `lib.rs`

The `nix.yml` `fail-threshold` bump (120% → 130%) from #22 is kept.

Build verified locally with `cargo build`.
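
As an illustration of what relying on the stdlib check looks like at a call site, here is a minimal sketch. The function `ascii_fast_path` is hypothetical and is not taken from this crate's `decoder` or `encoder` modules; only `<[u8]>::is_ascii()` and `std::str::from_utf8` are real standard-library APIs.

```rust
/// Hypothetical fast path: if every byte is ASCII, the input can be
/// viewed as a &str without any per-byte table lookup.
/// Returns None when a non-ASCII byte is present, so the caller can
/// fall through to the slower code-page path.
fn ascii_fast_path(bytes: &[u8]) -> Option<&str> {
    // Standard-library check on byte slices; no custom SIMD module needed.
    if bytes.is_ascii() {
        // ASCII is a subset of UTF-8, so this conversion succeeds.
        std::str::from_utf8(bytes).ok()
    } else {
        None
    }
}
```

For example, `ascii_fast_path(b"hello")` yields `Some("hello")`, and `ascii_fast_path(&[0x68, 0x80])` yields `None`. Whether the stdlib call picks up the improved codegen depends on the toolchain version in use, which this thread does not state.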