[API] Support np.where via ILKernelGenerator #604

@Nucs

Overview

Add SIMD optimization for np.where(condition, x, y) using ILKernelGenerator to improve performance for contiguous arrays.

Problem

The current np.where(condition, x, y) implementation uses NDIterator-based sequential access in every case. For large contiguous arrays, this is significantly slower than SIMD-optimized code, which is the approach NumPy takes internally through vectorized operations.

Proposal

Add a SIMD fast path using Vector256.ConditionalSelect while keeping the iterator fallback for non-contiguous arrays.
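To make the proposal concrete, here is a minimal sketch of what Vector256.ConditionalSelect does on a single vector of eight ints. It is illustrative only and is not the code from ILKernelGenerator.Where.cs; the class and method names (ConditionalSelectDemo, Select) are hypothetical, and it assumes .NET 7 or later for the cross-platform Vector256 helpers. The select is bitwise, so each mask lane is expected to be either all ones or all zeros.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical demo class, not part of NumSharp.
static class ConditionalSelectDemo
{
    public static int[] Select()
    {
        Vector256<int> x = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
        Vector256<int> y = Vector256.Create(-1, -2, -3, -4, -5, -6, -7, -8);

        // A lane of -1 (all bits set) picks from x, a lane of 0 picks from y.
        Vector256<int> mask = Vector256.Create(-1, 0, -1, 0, 0, 0, -1, -1);

        // Computes (x & mask) | (y & ~mask) lane by lane.
        Vector256<int> selected = Vector256.ConditionalSelect(mask, x, y);

        var result = new int[Vector256<int>.Count];
        selected.CopyTo(result);
        return result; // 1, -2, 3, -4, -5, -6, 7, 8
    }
}
```

Because the operation works on bits rather than on booleans, a mask lane holding 1 instead of -1 would only pass through the lowest bit of x, which is why the bool array cannot be fed to ConditionalSelect directly.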

Implementation

  • Create ILKernelGenerator.Where.cs with SIMD helpers
  • Add bool mask expansion (1-byte bools → vector masks whose lane width matches the element size, e.g. 4 or 8 bytes)
  • Support all 11 SIMD-capable dtypes via SIMD path
  • Support Decimal via iterator fallback (16 bytes, not vectorizable)
  • Modify np.where.cs to dispatch to SIMD path when eligible
  • Add comprehensive tests

Dtype Support

All 12 NumSharp types are supported:

| Type | Path | Reason |
|------|------|--------|
| Boolean, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double | SIMD | 1-8 byte types, vectorizable |
| Decimal | Iterator | 16 bytes, not vectorizable |

SIMD Eligibility Criteria

bool canSimd = ILKernelGenerator.Enabled &&
               outType != NPTypeCode.Decimal &&
               cond.typecode == NPTypeCode.Boolean &&
               cond.Shape.IsContiguous &&
               xArr.Shape.IsContiguous &&
               yArr.Shape.IsContiguous;

Bool Mask Expansion Challenge

The condition array is bool[] (1 byte per element), but x/y can be any dtype (1-8 bytes):

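| Type | Element Size (bytes) | Elements per Vector256 | Bools to Load |
|------|----------------------|------------------------|---------------|
| byte | 1 | 32 | 32 |
| short/ushort/char | 2 | 16 | 16 |
| int/float | 4 | 8 | 8 |
| long/double | 8 | 4 | 4 |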

Solution: Load N bools, expand each one to a full-width lane (all ones for true, all zeros for false) to form an N-element mask, then apply ConditionalSelect.
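The expansion step can be sketched for the 4-byte case as follows. This is one possible approach under stated assumptions, not the implementation from commit 3162df0c: the names WhereSketch, ExpandMask8, and Where are hypothetical, the code assumes .NET 7 or later, and it works on spans rather than NumSharp arrays. Eight bools are read as one 8-byte value, widened from bytes to 32-bit lanes, and compared against zero so that any nonzero bool byte becomes an all-ones lane.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical sketch, not part of NumSharp.
static class WhereSketch
{
    // Expands exactly 8 bools (1 byte each) into an 8-lane int mask:
    // all ones where the bool is true, all zeros where it is false.
    public static Vector256<int> ExpandMask8(ReadOnlySpan<bool> cond)
    {
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(cond);
        ulong packed = MemoryMarshal.Read<ulong>(bytes);            // 8 bools in one load
        Vector128<byte> b = Vector128.CreateScalar(packed).AsByte();
        Vector128<ushort> w16 = Vector128.WidenLower(b);            // 8 x 16-bit lanes
        Vector256<uint> w32 = Vector256.Create(
            Vector128.WidenLower(w16),                              // lanes 0-3
            Vector128.WidenUpper(w16));                             // lanes 4-7
        Vector256<uint> isFalse = Vector256.Equals(w32, Vector256<uint>.Zero);
        return (~isFalse).AsInt32();
    }

    // dst[i] = cond[i] ? x[i] : y[i] for equally sized contiguous inputs.
    public static void Where(ReadOnlySpan<bool> cond, ReadOnlySpan<int> x,
                             ReadOnlySpan<int> y, Span<int> dst)
    {
        if (x.Length != cond.Length || y.Length != cond.Length || dst.Length != cond.Length)
            throw new ArgumentException("All inputs must have the same length.");

        int width = Vector256<int>.Count; // 8 lanes
        int i = 0;
        for (; i + width <= cond.Length; i += width)
        {
            Vector256<int> mask = ExpandMask8(cond.Slice(i, width));
            Vector256<int> vx = Vector256.Create(x.Slice(i, width));
            Vector256<int> vy = Vector256.Create(y.Slice(i, width));
            Vector256.ConditionalSelect(mask, vx, vy).CopyTo(dst.Slice(i, width));
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < cond.Length; i++)
            dst[i] = cond[i] ? x[i] : y[i];
    }
}
```

The 8-byte case would follow the same pattern with one more widening step and four lanes per vector, while the 1-byte case needs no widening at all, only the comparison against zero.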

Evidence

Implemented in commit 3162df0c. All 83 tests pass:

  • 36 existing np.where tests
  • 21 battle tests
  • 26 new SIMD correctness tests

Scope / Non-goals

  • Broadcast arrays: stay on the iterator path (a broadcast dimension has stride 0, so the array is not contiguous)
  • Non-bool conditions: stay on the iterator path (they first need a truthiness conversion to bool)
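To illustrate why broadcast arrays fall outside the fast path, here is a small sketch of a C-order contiguity test over a shape and its element strides. It is a hypothetical helper written for this explanation and is not NumSharp's Shape.IsContiguous; it assumes strides are expressed in elements rather than bytes.

```csharp
// Hypothetical helper, not NumSharp's Shape.IsContiguous.
static class ContiguityCheck
{
    // Returns true when the strides (in elements) describe a dense
    // row-major layout for the given shape.
    public static bool IsCContiguous(int[] shape, int[] strides)
    {
        int expected = 1;
        for (int d = shape.Length - 1; d >= 0; d--)
        {
            if (shape[d] == 1) continue;              // stride of a length-1 axis is irrelevant
            if (strides[d] != expected) return false; // stride 0 (broadcast) fails here
            expected *= shape[d];
        }
        return true;
    }
}
```

For a 3x4 array, strides of (4, 1) pass the check, whereas a row broadcast to 3x4 has strides of (0, 1) and is rejected, so it would take the iterator path.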

Related Files

  • src/NumSharp.Core/Backends/Kernels/ILKernelGenerator.Where.cs
  • src/NumSharp.Core/APIs/np.where.cs
  • test/NumSharp.UnitTest/Backends/Kernels/WhereSimdTests.cs

Labels

  • NumPy 2.x Compliance: Aligns behavior with NumPy 2.x (NEPs, breaking changes)
  • core: Internal engine: Shape, Storage, TensorEngine, iterators
  • enhancement: New feature or request
