For some inputs with >= 4 limbs, the result of Jacobi symbol computation may be incorrect. This seems to be an inherent issue with using the optimized binary GCD for this purpose. When the top limbs of the compacted representation match but the full-precision inputs do not, the numerator may become negative mid-batch computation. The terms are made positive after the batch completes, but there is no good way to correct the Jacobi symbol sign at this point.
There aren't a lot of constant-time alternatives here besides just using classic binary GCD (slow for large inputs). At the moment the variable-time implementation has the same issue, but that's easier to switch to another method, GMP uses Lehmer's algorithm I believe.
#[test]
fn jacobi_mismatch() {
use crate::{Odd, U64, U128, U256, Uint};
fn check<const LIMBS: usize>(a: Uint<LIMBS>, b: Odd<Uint<LIMBS>>) {
let (gcd_1, jacobi_neg_1) = b.classic_bingcd_(&a);
let (gcd_2, jacobi_neg_2) = b
.optimized_bingcd_::<{ U64::BITS }, { U64::LIMBS }, { U128::LIMBS }>(
&a,
U64::BITS - 2,
);
assert_eq!(gcd_1, gcd_2);
assert_eq!(jacobi_neg_1, jacobi_neg_2);
}
check(
U256::from_be_hex("0000000000000002108DEAFCB180F023912BEED0186CEEAD593A8507B7DA4E9B"),
Odd::<U256>::from_be_hex(
"0000000000000002108DEAFCB180F023912BEED0186CEEED593A8507B7DA4E9B",
),
);
}
For some inputs with >= 4 limbs, the result of Jacobi symbol computation may be incorrect. This seems to be an inherent issue with using the optimized binary GCD for this purpose. When the top limbs of the compacted representation match but the full-precision inputs do not, the numerator may become negative mid-batch computation. The terms are made positive after the batch completes, but there is no good way to correct the Jacobi symbol sign at this point.
There aren't a lot of constant-time alternatives here besides just using classic binary GCD (slow for large inputs). At the moment the variable-time implementation has the same issue, but that's easier to switch to another method, GMP uses Lehmer's algorithm I believe.