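PR #175219 added tryCompressVPMovPattern, which replaces the EVEX vpmovb2m + kmovd pair with a single VEX vpmovmskb. However, VEX encoding can't access ymm16-31. No register class check is performed, so if the original vpmovb2m read an extended register (ymm16-31), the compressed VEX instruction can't encode that register and the resulting code is wrong.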
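A minimal sketch of the missing legality condition, written as a standalone model rather than real LLVM code. The function names and the 0..31 register numbering are hypothetical. It relies on one encoding fact: VEX provides a 4-bit register field (VEX.R/VEX.B plus the 3 ModRM bits), so only ymm0-15 are reachable, while EVEX adds the 5th bit needed for ymm16-31.

```cpp
// Hypothetical standalone model, NOT LLVM code: ymm registers numbered 0..31.
// VEX encodes a vector register in 4 bits (VEX.R/VEX.B + 3 ModRM bits),
// so only ymm0-15 are reachable. EVEX adds a 5th bit for ymm16-31.
constexpr bool isVEXEncodable(unsigned YmmIdx) { return YmmIdx < 16; }

// Hypothetical legality check for the fold
//   vpmovb2m %ymmN, %kM ; kmovd %kM, %r32  ->  vpmovmskb %ymmN, %r32
// The rewrite is only valid if ymmN can be encoded with a VEX prefix.
constexpr bool canCompressToVPMOVMSKB(unsigned SrcYmmIdx) {
  return isVEXEncodable(SrcYmmIdx);
}

// Compile-time checks of the guard.
static_assert(canCompressToVPMOVMSKB(0), "ymm0 is VEX-encodable");
static_assert(canCompressToVPMOVMSKB(15), "ymm15 is VEX-encodable");
static_assert(!canCompressToVPMOVMSKB(16), "ymm16 needs EVEX");
static_assert(!canCompressToVPMOVMSKB(23), "ymm23 needs EVEX");
```

In the actual pass, the equivalent check would presumably inspect the source operand's physical register (its encoding or register class) before rewriting; the exact LLVM helper to use is left open here.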
This causes miscompilations (incorrect output) in a complex test in a Halide PR (halide/Halide#8925). It only shows up in complex tests because simple ones are unlikely to ever use ymm16-31; the register allocator typically only reaches for those registers under high vector register pressure. Claude provided me with the MIR repro below, which is hopefully useful. I don't speak MIR myself, so I can't vouch for it. I can provide the much longer .ll file that triggers the bug if necessary.
```
# RUN: llc -mtriple=x86_64-- -mcpu=skylake-avx512 -run-pass=x86-compress-evex %s -o /dev/null -verify-machineinstrs
--- |
  define void @test(ptr %p) { ret void }
...
---
name: test
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $ymm23, $rdi
    $k3 = VPMOVB2MZ256kr killed $ymm23
    $edx = KMOVDrk killed $k3
    MOV32mr $rdi, 1, $noreg, 0, $noreg, killed $edx
    RET 0
...
```
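To illustrate why this produces wrong output rather than just an odd encoding, here is a small standalone C++ model of the repro's two code paths. It is a sketch under an explicit ASSUMPTION not confirmed by the repro: that the compressed VEX instruction silently drops the 5th register bit, so $ymm23 (0b10111) would be read as ymm7 (0b0111). The types and function names (Ymm, RegFile, evexPair, vexCompressed, demoRepro) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: 32 ymm registers, each 32 bytes wide.
using Ymm = std::array<uint8_t, 32>;
using RegFile = std::array<Ymm, 32>;

// Gather the sign bit of each of the 32 bytes. This is what both
// vpmovb2m (into a mask register) and vpmovmskb (into a GPR) compute.
uint32_t signBits(const Ymm &V) {
  uint32_t Bits = 0;
  for (unsigned I = 0; I < 32; ++I)
    Bits |= static_cast<uint32_t>(V[I] >> 7) << I;
  return Bits;
}

// Original EVEX pair: $k3 = VPMOVB2MZ256kr $ymmN ; $edx = KMOVDrk $k3
uint32_t evexPair(const RegFile &R, unsigned Src) {
  uint32_t K3 = signBits(R[Src]); // vpmovb2m: mask bit i = sign bit of byte i
  return K3;                      // kmovd: copy the 32 mask bits to a GPR
}

// Compressed VEX form, ASSUMING the 5th register bit is silently dropped.
uint32_t vexCompressed(const RegFile &R, unsigned Src) {
  return signBits(R[Src & 0xF]); // ymm23 & 0xF == ymm7
}

// Example mirroring the repro: ymm23 has every sign bit set, ymm7 is zero.
void demoRepro() {
  RegFile R{};                            // all registers zeroed
  R[23].fill(0x80);                       // every byte of ymm23 is negative
  assert(evexPair(R, 23) == 0xFFFFFFFFu); // original code: all 32 bits set
  assert(vexCompressed(R, 23) == 0u);     // compressed code reads ymm7 instead
}
```

For sources in ymm0-15 the two paths agree, which is consistent with the bug only appearing once the register allocator starts using ymm16-31.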