Revert AVX512-optimized is_ascii (#22) #23

Merged
bonega merged 1 commit into master from revert-avx512-is-ascii
Jun 9, 2026

@bonega bonega commented Jun 9, 2026

Owner

Reverts the custom AVX512 is_ascii() from #22 (commit 65ad00d).

That implementation was a workaround for poor codegen from stdlib's is_ascii() under -C target-cpu=native on AVX512 CPUs (~30x slowdown). The optimization has since been contributed to and merged into Rust std, so the workaround is no longer needed — falling back to stdlib is_ascii() now gets the fast path for free.

Changes

  • Remove src/simd.rs
  • Restore stdlib is_ascii() calls in decoder/complete.rs, decoder/incomplete.rs, and encoder.rs
  • Drop the simd module from lib.rs
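As an illustration of what "restoring stdlib is_ascii() calls" means at a call site, here is a minimal sketch of an ASCII fast path built on the standard library's `<[u8]>::is_ascii()`. The function `decode_latin1` is hypothetical and is not taken from this crate's decoder; it only shows the general shape of borrowing ASCII input unchanged and falling back to a per-byte path otherwise.

```rust
use std::borrow::Cow;

/// Hypothetical Latin-1 decoder (an assumption for illustration, not this
/// crate's API) with an ASCII fast path.
pub fn decode_latin1(bytes: &[u8]) -> Cow<'_, str> {
    // Standard-library check on byte slices; no custom SIMD module needed.
    if bytes.is_ascii() {
        // Pure ASCII is already valid UTF-8, so borrow it without copying.
        return Cow::Borrowed(std::str::from_utf8(bytes).expect("ASCII is valid UTF-8"));
    }
    // Slow path: each Latin-1 byte maps to the Unicode scalar of the same value.
    Cow::Owned(bytes.iter().map(|&b| char::from(b)).collect())
}
```

With this shape, any improvement to `is_ascii()` in std benefits the caller without crate-side changes.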

The nix.yml fail-threshold bump (120% → 130%) from #22 is kept.

Build verified locally with cargo build.

@github-actions github-actions Bot left a comment

Yore Benchmarks

| Benchmark suite | Current: a94753a | Previous: 65ad00d | Ratio |
|---|---|---|---|
| decode_checked/mostly_ascii/8 | 30 ns/iter (± 10) | 34 ns/iter (± 12) | 0.88 |
| decode_checked/mostly_ascii/64 | 62 ns/iter (± 7) | 67 ns/iter (± 8) | 0.93 |
| decode_checked/mostly_ascii/256 | 142 ns/iter (± 13) | 147 ns/iter (± 42) | 0.97 |
| decode_checked/mostly_ascii/512 | 243 ns/iter (± 20) | 240 ns/iter (± 20) | 1.01 |
| decode_checked/mostly_ascii/1024 | 446 ns/iter (± 29) | 419 ns/iter (± 29) | 1.06 |
| decode_checked/mostly_ascii/2048 | 887 ns/iter (± 37) | 911 ns/iter (± 123) | 0.97 |
| decode_checked/mostly_ascii/4096 | 1971 ns/iter (± 110) | 1998 ns/iter (± 71) | 0.99 |
| decode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| decode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| decode_checked/ascii/256 | 11 ns/iter (± 0) | 10 ns/iter (± 0) | 1.10 |
| decode_checked/ascii/512 | 14 ns/iter (± 0) | 16 ns/iter (± 0) | 0.88 |
| decode_checked/ascii/1024 | 27 ns/iter (± 0) | 26 ns/iter (± 0) | 1.04 |
| decode_checked/ascii/2048 | 48 ns/iter (± 0) | 52 ns/iter (± 0) | 0.92 |
| decode_checked/ascii/4096 | 93 ns/iter (± 0) | 92 ns/iter (± 1) | 1.01 |
| decode_checked/extended/8 | 34 ns/iter (± 1) | 37 ns/iter (± 3) | 0.92 |
| decode_checked/extended/64 | 85 ns/iter (± 3) | 87 ns/iter (± 2) | 0.98 |
| decode_checked/extended/256 | 254 ns/iter (± 21) | 266 ns/iter (± 11) | 0.95 |
| decode_checked/extended/512 | 483 ns/iter (± 9) | 493 ns/iter (± 8) | 0.98 |
| decode_checked/extended/1024 | 909 ns/iter (± 6) | 910 ns/iter (± 10) | 1.00 |
| decode_checked/extended/2048 | 1768 ns/iter (± 15) | 1765 ns/iter (± 42) | 1.00 |
| decode_checked/extended/4096 | 3489 ns/iter (± 34) | 3481 ns/iter (± 45) | 1.00 |
| decode_lossy/all_bad/8 | 39 ns/iter (± 2) | 48 ns/iter (± 5) | 0.81 |
| decode_lossy/all_bad/64 | 62 ns/iter (± 1) | 64 ns/iter (± 1) | 0.97 |
| decode_lossy/all_bad/256 | 174 ns/iter (± 2) | 178 ns/iter (± 3) | 0.98 |
| decode_lossy/all_bad/512 | 329 ns/iter (± 10) | 333 ns/iter (± 10) | 0.99 |
| decode_lossy/all_bad/1024 | 626 ns/iter (± 34) | 646 ns/iter (± 27) | 0.97 |
| decode_lossy/all_bad/2048 | 1223 ns/iter (± 16) | 1272 ns/iter (± 18) | 0.96 |
| decode_lossy/all_bad/4096 | 2440 ns/iter (± 24) | 2491 ns/iter (± 41) | 0.98 |
| decode_lossy/mostly_ascii/8 | 39 ns/iter (± 14) | 48 ns/iter (± 19) | 0.81 |
| decode_lossy/mostly_ascii/64 | 64 ns/iter (± 6) | 66 ns/iter (± 7) | 0.97 |
| decode_lossy/mostly_ascii/256 | 143 ns/iter (± 13) | 154 ns/iter (± 17) | 0.93 |
| decode_lossy/mostly_ascii/512 | 231 ns/iter (± 20) | 246 ns/iter (± 20) | 0.94 |
| decode_lossy/mostly_ascii/1024 | 408 ns/iter (± 24) | 431 ns/iter (± 30) | 0.95 |
| decode_lossy/mostly_ascii/2048 | 777 ns/iter (± 35) | 810 ns/iter (± 46) | 0.96 |
| decode_lossy/mostly_ascii/4096 | 1500 ns/iter (± 49) | 1571 ns/iter (± 49) | 0.95 |
| encode_checked/mostly_ascii/8 | 41 ns/iter (± 16) | 43 ns/iter (± 17) | 0.95 |
| encode_checked/mostly_ascii/64 | 128 ns/iter (± 5) | 139 ns/iter (± 2) | 0.92 |
| encode_checked/mostly_ascii/256 | 442 ns/iter (± 9) | 481 ns/iter (± 41) | 0.92 |
| encode_checked/mostly_ascii/512 | 872 ns/iter (± 16) | 939 ns/iter (± 50) | 0.93 |
| encode_checked/mostly_ascii/1024 | 1687 ns/iter (± 34) | 2005 ns/iter (± 109) | 0.84 |
| encode_checked/mostly_ascii/2048 | 3438 ns/iter (± 54) | 4090 ns/iter (± 116) | 0.84 |
| encode_checked/mostly_ascii/4096 | 6672 ns/iter (± 166) | 7311 ns/iter (± 309) | 0.91 |
| encode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| encode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| encode_checked/ascii/256 | 12 ns/iter (± 0) | 10 ns/iter (± 1) | 1.20 |
| encode_checked/ascii/512 | 16 ns/iter (± 0) | 14 ns/iter (± 0) | 1.14 |
| encode_checked/ascii/1024 | 25 ns/iter (± 0) | 24 ns/iter (± 0) | 1.04 |
| encode_checked/ascii/2048 | 50 ns/iter (± 0) | 49 ns/iter (± 0) | 1.02 |
| encode_checked/ascii/4096 | 93 ns/iter (± 4) | 89 ns/iter (± 0) | 1.04 |
| encode_checked/extended/8 | 54 ns/iter (± 0) | 53 ns/iter (± 0) | 1.02 |
| encode_checked/extended/64 | 190 ns/iter (± 5) | 202 ns/iter (± 1) | 0.94 |
| encode_checked/extended/256 | 789 ns/iter (± 9) | 705 ns/iter (± 7) | 1.12 |
| encode_checked/extended/512 | 1366 ns/iter (± 24) | 1375 ns/iter (± 19) | 0.99 |
| encode_checked/extended/1024 | 2710 ns/iter (± 26) | 2726 ns/iter (± 20) | 0.99 |
| encode_checked/extended/2048 | 5389 ns/iter (± 334) | 5400 ns/iter (± 49) | 1.00 |
| encode_checked/extended/4096 | 10698 ns/iter (± 559) | 10749 ns/iter (± 34) | 1.00 |
| encode_lossy/all_bad/8 | 49 ns/iter (± 1) | 54 ns/iter (± 12) | 0.91 |
| encode_lossy/all_bad/64 | 215 ns/iter (± 8) | 247 ns/iter (± 1) | 0.87 |
| encode_lossy/all_bad/256 | 801 ns/iter (± 7) | 877 ns/iter (± 141) | 0.91 |
| encode_lossy/all_bad/512 | 1572 ns/iter (± 12) | 1717 ns/iter (± 24) | 0.92 |
| encode_lossy/all_bad/1024 | 3119 ns/iter (± 218) | 3412 ns/iter (± 64) | 0.91 |
| encode_lossy/all_bad/2048 | 6186 ns/iter (± 39) | 6764 ns/iter (± 131) | 0.91 |
| encode_lossy/all_bad/4096 | 12311 ns/iter (± 113) | 13475 ns/iter (± 61) | 0.91 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Yore Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.

| Benchmark suite | Current: a94753a | Previous: 65ad00d | Ratio |
|---|---|---|---|
| decode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| decode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| encode_checked/ascii/8 | 10 ns/iter (± 0) | 9 ns/iter (± 0) | 1.11 |
| encode_checked/ascii/64 | 9 ns/iter (± 0) | 8 ns/iter (± 0) | 1.13 |
| encode_checked/ascii/256 | 12 ns/iter (± 0) | 10 ns/iter (± 1) | 1.20 |
| encode_checked/ascii/512 | 16 ns/iter (± 0) | 14 ns/iter (± 0) | 1.14 |
| encode_checked/extended/256 | 789 ns/iter (± 9) | 705 ns/iter (± 7) | 1.12 |
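
For readers checking the flagged rows by hand, the Ratio column is consistent with current time divided by previous time, and a row is reported when that ratio exceeds the 1.10 threshold. The sketch below reproduces that arithmetic. The function names and the strict `>` comparison are assumptions for illustration, not github-action-benchmark's actual code, and the table shows rounded values, so borderline rows may not reproduce exactly.

```rust
/// Ratio as shown in the table: current time divided by previous time
/// (values above 1.0 mean the current commit is slower).
pub fn ratio(current_ns: f64, previous_ns: f64) -> f64 {
    current_ns / previous_ns
}

/// Hypothetical alert check: flag a benchmark when its ratio strictly
/// exceeds the threshold (1.10 in the alert above).
pub fn is_flagged(current_ns: f64, previous_ns: f64, threshold: f64) -> bool {
    ratio(current_ns, previous_ns) > threshold
}
```

For example, `is_flagged(10.0, 9.0, 1.10)` is true (ratio about 1.11), while `is_flagged(27.0, 26.0, 1.10)` is false (ratio about 1.04). Note that the flagged ASCII rows differ by only 1 to 2 ns per iteration.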

This comment was automatically generated by workflow using github-action-benchmark.

@bonega bonega enabled auto-merge (squash) June 9, 2026 19:33
@bonega bonega merged commit c6a2ee5 into master Jun 9, 2026
5 checks passed