Releases: SciSharp/NumSharp

NumSharp 0.50.0 - Long Indexing Release

12 Apr 04:20

This release introduces Int64/long indexing: a complete architectural migration that enables arrays with more than 2.1 billion elements (beyond the int32 index limit, so larger than 2GB even for byte arrays), along with comprehensive NumPy 2.x type system alignment, new type introspection APIs, and the Python container protocol.

Installable via NuGet

dotnet add package NumSharp --version 0.50.0-prerelease
dotnet add package NumSharp.Bitmap --version 0.50.0-prerelease

TL;DR

  • Int64/Long Indexing: Full migration from int to long across Shape, NDArray, Storage, Iterators, and ILKernelGenerator - ndarrays >2GB now supported
  • 12 New Type APIs: np.can_cast, np.promote_types, np.result_type, np.min_scalar_type, np.common_type, np.issubdtype, np.finfo, np.iinfo, np.isreal, np.iscomplex, np.isrealobj, np.iscomplexobj
  • 6 Comparison Functions: np.equal, np.not_equal, np.less, np.greater, np.less_equal, np.greater_equal
  • 4 Logical Functions: np.logical_and, np.logical_or, np.logical_not, np.logical_xor
  • Container Protocol: __contains__, __len__, __iter__, __getitem__, __setitem__ - NumPy-compatible iteration
  • New NDArray Methods: tolist(), item() for NumPy parity
  • NumPy 2.x Type System: np.arange() returns Int64, NPTypeHierarchy encoding NumPy's exact type tree, Bool NOT under Number
  • np.frombuffer() Rewrite: Full NumPy signature with count, offset, big-endian support, IntPtr/void* overloads, view semantics
  • 0D Scalar Arrays: np.array(5) now creates 0D arrays (matching NumPy)
  • np.arange() Fixes: Negative step, integer arithmetic, inlined type-specific loops, full NumPy parity.
  • np.any/np.all: 0D array support with axis parameter
  • Random API Alignment (#582): Parameter names match NumPy, np.shuffle fixed
  • Empty Array Handling: Proper NaN returns for mean/std/var on empty arrays
  • NaN Sorting: np.unique now sorts NaN to end (matches NumPy)
  • ValueType to Object Migration: All scalar returns now object (NumPy alignment), discarded usages of ValueType
  • UnmanagedSpan: Ported from dotnet/runtime for Span-like semantics with long length
  • Operator Cleanup: 74% reduction in NDArray.Primitive.cs (150 → 40 overloads)
  • 600+ Battle Tests: All validated against actual NumPy 2.x output
  • 145 Test Fixes: 71 for Int64 alignment + 74 previously failing tests now passing
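
To illustrate the "NaN Sorting" item above, here is a minimal pure-Python sketch of the ordering that np.unique is described as producing (distinct values in ascending order, with NaN placed last). The function name unique_nan_last is hypothetical and this is not NumSharp's implementation, only a model of the documented result.

```python
import math

def unique_nan_last(values):
    """Return the distinct values in ascending order, with one NaN (if any) last."""
    # NaN never compares equal to itself, so it is handled separately.
    non_nan = sorted({v for v in values if not math.isnan(v)})
    has_nan = any(math.isnan(v) for v in values)
    return non_nan + [math.nan] if has_nan else non_nan

result = unique_nan_last([2.0, math.nan, math.inf, 1.0, -math.inf, math.nan, 1.0])
print(result)  # [-inf, 1.0, 2.0, inf, nan]
```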

Changes and Fixes

  • np.arange(10, 0, -2): Before returned [9, 7, 5, 3, 1], now correctly returns [10, 8, 6, 4, 2]
  • np.arange(0, 5, 0.5, int32): Before returned [0,0,1,1,2,2,3,3,4,4], now correctly returns [0,0,0,0,0,0,0,0,0,0] (NumPy behavior)
  • np.any(0D_array, axis=0): Before threw ArgumentException, now returns 0D bool scalar
  • np.all(0D_array, axis=-1): Before threw ArgumentException, now returns 0D bool scalar
  • Contains([1,2], array([1,2,3])): Before returned False, now throws IncorrectShapeException (matches NumPy)
  • np.shuffle axis parameter: Removed non-existent axis param, now matches NumPy legacy API
  • np.random.standard_normal: Fixed typo (stardard_normal → standard_normal)
  • Scalar broadcast assignment: Fixed cross-dtype conversion failure
    • Root cause: AsOrMakeGeneric<T>() called new NDArray<T>(astype(...)) which triggered implicit scalar → size constructor
    • Fix: Use .Storage to pass storage directly, avoiding implicit conversion
  • Fancy indexing dtypes: Now supports all integer dtypes (Int16, Int32, Int64), not just Int32
    • Added NormalizeIndexArray() helper that keeps Int32/Int64 as-is, converts smaller types to Int64
    • Throws IndexOutOfRangeException for non-integer types (float, decimal)
  • NDArray.ToString(): Output formatting is now identical to NumPy's.
  • np.mean([]): Returns NaN (was throwing or returning 0)
  • np.mean(zeros((0,3)), axis=0): Returns [NaN, NaN, NaN]
  • np.mean(zeros((0,3)), axis=1): Returns empty array []
  • np.std/var single element: Returns NaN when ddof >= size
  • Empty comparison: All 6 comparison operators now return empty boolean arrays (was returning scalar)
  • np.unique NaN sorting: NaN now sorts to end (matches NumPy: [-inf, 1, 2, inf, nan])
  • ArgMax/ArgMin NaN: First NaN always wins (NaN takes precedence over any value)
  • Single-element axis reduction: Changed Storage.Alias() and squeeze_fast() to return copies (was sharing memory)
  • Clip mixed-dtype: Fixed bug where int32 min/max arrays were read as int64
  • np.invert(bool): Now uses logical NOT (!x) instead of bitwise NOT (~x)
  • np.square(int): Preserves integer dtype instead of promoting to double
  • np.negate(bool): Removed buggy linear-indexing path, now routes through ExecuteUnaryOp
  • Fixed ATan2 non-contiguous array handling by adding np.broadcast_arrays() and .copy() materialization
  • Fixed ATan2 wrong pointer type (byte*) for x operand in all non-byte cases
  • finfo: Use MathF.BitIncrement for float eps (was using Math.BitIncrement which only works on double)
  • issctype: Properly reject string type (was returning true for typeof(string))
  • NDArray.unique(): Fixed for long indexing support
  • np.repeat: Fixed dtype handling and long count support
  • np.random.choice: Fixed for long population sizes
  • np.argmax/argmin IL fix: Removed Conv_I4 instruction that truncated long indices to int32
  • ILKernel loop counters: Fixed numerous int32 overflow issues
  • TransformOffset calculations: Fixed for >2GB arrays
  • SIMD helper functions: Fixed for long indexing
  • AVX2 gather: Added stride check (falls back to scalar for stride > int.MaxValue)
  • np.random: Parameter names now match NumPy (size, a, b, p, d0)
  • np.random() added as alias for uniform distribution
  • np.shuffle removed non-existent axis parameter
  • ValueType to Object Migration
    • All scalar return types migrated from ValueType to object
    • NPTypeCode.GetDefaultValue() now returns object
    • All operators migrated to NumPy-aligned object pattern
    • NDArray null checks converted from == null to is null pattern
  • Operator Overload Cleanup
    • NDArray.Primitive.cs: 159 → 42 lines (74% reduction)
    • ~150 explicit scalar overloads → ~40 object-based overloads
    • Added missing implicit operator NDArray(byte)
    • Changed ushort from explicit to implicit
  • Implicit Scalar Conversion
    • (int)ndarray_float64 now works via Converts.ChangeType
    • scalar → NDArray: implicit (safe, creates 0-d array)
    • NDArray → scalar: explicit (requires 0-d, throws IncorrectShapeException)
    • Matches NumPy's int(arr), float(arr), bool(arr) pattern
  • All == null checks changed to is null, because == now returns NDArray<bool>, as in NumPy
  • All != null changed to is not null
  • Type System Consolidation
    • can_cast derived from promotion tables (replaced 80+ lines of switch cases)
    • Single source of truth: NPTypeHierarchy
    • Removed duplicate TypeKind enum and category helper methods
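
The two np.arange results listed above can be reproduced with a small pure-Python model. This is a sketch of the NumPy semantics as we understand them, not NumSharp's code: the element count is ceil((stop - start) / step), and when a dtype is requested, the increment is taken as the difference between the first two values after casting. The name arange_model and its cast parameter are hypothetical.

```python
import math

def arange_model(start, stop, step=1, cast=None):
    """Model of arange: values start, start + step, ... excluding stop."""
    count = max(0, math.ceil((stop - start) / step))
    if cast is None:
        return [start + i * step for i in range(count)]
    # With an explicit dtype, derive the increment from the first two cast values.
    first = cast(start)
    delta = cast(start + step) - first
    return [first + i * delta for i in range(count)]

print(arange_model(10, 0, -2))          # [10, 8, 6, 4, 2]
print(arange_model(0, 5, 0.5, cast=int))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

In the second call int(0.5) is 0, so the increment collapses to zero and every element equals the first, which is why ten zeros come back.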

Detailed Breakdown

Contents

Int64/Long Indexing Support

Complete migration from int to long indexing across the entire codebase, enabling arrays larger than 2.1 billion elements (~2GB for byte arrays, ~16GB for doubles).
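
A short Python sketch of why a 32-bit size field is not enough: the element count of a large shape exceeds int.MaxValue and wraps to a negative number when stored in an int32. ctypes.c_int32 is used only to imitate 32-bit wraparound, and the shape is an arbitrary example, not taken from the release.

```python
import ctypes

INT32_MAX = 2**31 - 1  # same value as C#'s int.MaxValue

def element_count(shape):
    """Product of all dimensions (Python ints never overflow)."""
    n = 1
    for dim in shape:
        n *= dim
    return n

def as_int32(value):
    """Imitate storing a value in a 32-bit signed integer (wraps silently)."""
    return ctypes.c_int32(value).value

shape = (50_000, 50_000)
n = element_count(shape)
print(n)             # 2500000000
print(n > INT32_MAX)  # True
print(as_int32(n))   # -1794967296
print(n * 8)         # 20000000000 bytes if every element is a double
```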

Core Type Changes

  • Shape.dimensions: int[] -> long[]
  • Shape.strides: int[] -> long[]
  • Shape.size: int -> long
  • Shape.offset: int -> long
  • NDArray.size: int -> long
  • NDArray.len: int -> long
  • All NDArray indexers: int -> long
  • ArraySlice<T>: int indexing -> long indexing
  • UnmanagedMemoryBlock<T>: int indexing -> long indexing
  • UnmanagedStorage: int indexing -> long indexing
  • NDIterator coordinates: int[] -> long[]
  • MultiIterator: int offsets -> long offsets
  • np.nonzero(): Returns NDArray<long>[] instead of NDArray<int>[]
  • np.argmax/argmin: Returns long indices

ILKernelGenerator Migration (20+ files)

All ILKernelGenerator partial classes updated for long loop counters and offsets:

  • ILKernelGenerator.Binary.cs - Loop counters to long
  • ILKernelGenerator.Reduction.cs - Index variables to long
  • ILKernelGenerator.Reduction.Axis.cs - Axis iteration with long
  • ILKernelGenerator.Reduction.Axis.Simd.cs - SIMD paths with long
  • ILKernelGenerator.Reduction.Axis.NaN.cs - NaN handling with long
  • ILKernelGenerator.Reduction.Axis.Arg.cs - ArgMax/ArgMin with long
  • ILKernelGenerator.Reduction.Axis.VarStd.cs - Variance/StdDev with long
  • ILKernelGenerator.Reduction.NaN.cs - NEW NaN reductions IL generation
  • ILKernelGenerator.Scan.cs - CumSum/CumProd with long indices
  • ILKernelGenerator.MatMul.cs - Matrix dimensions to long
  • ILKernelGenerator.Clip.cs - TransformOffset calculations
  • ILKernelGenerator.Masking.cs - Boolean masking with long
  • ILKernelGenerator.Masking.Boolean.cs - Boolean operations with long
  • ILKernelGenerator.Masking.NaN.cs - NaN masking with long
  • ILKernelGenerator.Masking.VarStd.cs - Variance masking with long

New Infrastructure

  • UnmanagedSpan<T> - Span-like type with long length, ported from Span in dotnet/runtime
  • ReadOnlyUnmanagedSpan<T> - Read-only variant
  • UnmanagedSpanExtensions - Extension methods for Span parity
  • UnmanagedSpanHelpers - SIMD-optimized value type methods
  • `UnmanagedSpanHelpers.T...
NumSharp 0.41.0-prerelease

23 Mar 22:05

This prerelease introduces the IL Kernel Generator, a complete architectural overhaul that replaces ~600K lines of Regen-generated template code with ~19K lines of runtime IL generation.
It delivers large performance improvements, comprehensive NumPy 2.x alignment, and significantly cleaner, more maintainable code.

Installation

dotnet add package NumSharp --version 0.41.0-prerelease

Or via Package Manager:

Install-Package NumSharp -Version 0.41.0-prerelease

TL;DR

  • IL Kernel Generator: Runtime IL emission replaces 600K lines of Regen templates with 19K lines
  • SIMD everywhere: Vector128/256/512 with runtime detection across all operations
  • 35 new functions: nansum/prod/min/max/mean/var/std, cbrt, floor_divide, left/right_shift, deg2rad, rad2deg, cumprod, count_nonzero, isnan, isfinite, isinf, isclose, invert, reciprocal, square, trunc, plus comparison and logical modules
  • Operators fixed: ==, !=, <, >, <=, >=, &, |, ^
  • np.comparison module: np.equal(), np.not_equal(), np.less(), np.greater(), np.less_equal(), np.greater_equal()
  • np.logical module: np.logical_and(), np.logical_or(), np.logical_not(), np.logical_xor()
  • NDArray<T> operators: Typed &, |, ^ for generic arrays (resolves NDArray<bool> ambiguity)
  • Math functions rewritten: sin, cos, tan, exp, log, sqrt, abs, sign, floor, ceil, etc.
  • 60+ bug fixes: np.negative, np.positive, np.unique, np.dot, np.matmul, np.abs, np.argmax/min, np.mean, np.std/var, np.cumsum, np.nonzero, np.all/any, np.clip, and more
  • MatMul 35-100x faster: Cache-blocked SIMD achieving 20+ GFLOPS
  • Boolean indexing rewrite: SIMD fast path with CountTrue/CopyMasked
  • Axis reductions rewrite: AVX2 gather, NaN-aware, proper keepdims and empty array handling
  • Single-threaded execution: Deterministic and non-blocking; all use of Parallel.* removed, with SIMD compensating for the lost thread-level parallelism
  • Architecture cleanup: Broadcasting in Shape struct, TensorEngine routing, static ILKernelGenerator
  • np.random aligned (#582): Parameter names match NumPy, Shape overloads added
  • DecimalMath internalized (#588): Removed embedded third-party code
  • NEP50 compliant: NumPy 2.x type promotion rules
  • Benchmark infrastructure: SIMD vs scalar comparison suite
  • DefaultEngine dispatch layer: BinaryOp, BitwiseOp, CompareOp, ReductionOp, UnaryOp
  • +4,200 unit tests: our own, plus tests migrated from Python/NumPy to C#.
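
As a rough illustration of what "cache-blocked" means in the MatMul item above, the following pure-Python sketch multiplies matrices tile by tile so that each small block of the operands is reused while it is still hot in cache. It shows the loop structure only; the function name and block size are assumptions, and NumSharp's kernel additionally uses SIMD, which this sketch does not model.

```python
def matmul_blocked(a, b, block=2):
    """Multiply a (n x k) by b (k x m) using block x block tiles."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay within one tile.
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_blocked(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```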

Contents

Section | Highlights
Summary | 106 commits, -533K lines, 3,907 tests
IL Kernel Generator | 27 files, SIMD V128/256/512
Architecture | Static ILKernelGenerator, TensorEngine routing
New NumPy Functions (35) | nansum, isnan, cumprod, etc.
Critical Bug Fixes | negative, unique, dot, linspace, intp
Operator Rewrites | ==, !=, <, >, &, | now work
Boolean Indexing Rewrite | SIMD fast path, 76 battle tests
Slicing Improvements | Broadcast stride=0 preserved
Performance Improvements | MatMul 35-100x, 20+ GFLOPS
Code Reduction | 99% binary, 98% MatMul, 97% Dot
Infrastructure Changes | NativeMemory, static kernels
API Alignment | random() params aligned with NumPy
New Test Files (68) | 34 kernel, 8 NumPy, 4 linalg, 76 boolean
Known Issues | 52 OpenBugs excluded
Installation | dotnet add package NumSharp

Summary

Metric | Value
Commits | 106
Files Changed | 558
Lines Added | +72,635
Lines Deleted | -605,976
Net Change | -533K lines
Test Results | 3,907 passed, 52 OpenBugs, 11 skipped

Detailed Breakdown

IL Kernel Generator

Runtime IL generation via System.Reflection.Emit.DynamicMethod replaces static Regen templates.

Kernel Files (27 new files)

  • ILKernelGenerator.cs - Core infrastructure, SIMD detection (Vector128/256/512)
  • ILKernelGenerator.Binary.cs - Add, Sub, Mul, Div, BitwiseAnd/Or/Xor
  • ILKernelGenerator.MixedType.cs - Mixed-type ops with type promotion
  • ILKernelGenerator.Unary.cs - Negate, Abs, Sqrt, Sin, Cos, Exp, Log, Sign
  • ILKernelGenerator.Comparison.cs - ==, !=, <, >, <=, >= returning bool arrays
  • ILKernelGenerator.Reduction.cs - Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any
  • ILKernelGenerator.Reduction.Axis.Simd.cs - AVX2 gather for axis reductions
  • ILKernelGenerator.Scan.cs - CumSum, CumProd with SIMD
  • ILKernelGenerator.Shift.cs - LeftShift, RightShift
  • ILKernelGenerator.MatMul.cs - Cache-blocked SIMD matrix multiply
  • ILKernelGenerator.Clip.cs, .Modf.cs, .Masking.cs - Specialized ops

Execution Paths

  1. SimdFull - Contiguous + SIMD-capable dtype → Vector loop + scalar tail
  2. ScalarFull - Contiguous + non-SIMD dtype (Decimal) → Scalar loop
  3. General - Strided/broadcast → Coordinate-based iteration
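
The three paths above can be sketched in Python as a selection function plus the "vector loop + scalar tail" pattern used by SimdFull. This is an illustrative model under stated assumptions: select_path, add_simd_full and the width of 4 lanes are hypothetical, and list slices stand in for SIMD registers.

```python
def select_path(contiguous, simd_capable):
    """Pick an execution path from the two properties named in the list above."""
    if not contiguous:
        return "General"      # strided/broadcast: coordinate-based iteration
    return "SimdFull" if simd_capable else "ScalarFull"

def add_simd_full(a, b, width=4):
    """Element-wise add: full-width chunks first, then a scalar tail."""
    out = [0] * len(a)
    vec_end = len(a) - len(a) % width
    for i in range(0, vec_end, width):          # "vector" loop
        out[i:i + width] = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
    for i in range(vec_end, len(a)):            # scalar tail for the remainder
        out[i] = a[i] + b[i]
    return out

print(select_path(True, True))    # SimdFull
print(select_path(True, False))   # ScalarFull
print(select_path(False, True))   # General
print(add_simd_full(list(range(10)), [1] * 10))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```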

Infrastructure

  • KernelKey.cs, KernelOp.cs, KernelSignatures.cs - Kernel dispatch
  • SimdMatMul.cs - SIMD matrix multiplication helpers
  • TypeRules.cs - NEP50 type promotion rules
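
The notes do not show how the kernel dispatch files are used. As a loose analogy only, a dispatch layer of this kind typically builds a kernel the first time an (operation, dtype) pair is requested and reuses it afterwards. The Python sketch below uses closures where NumSharp emits IL, and every name in it (get_kernel, _KERNELS, _OPS) is hypothetical.

```python
import operator

_OPS = {"add": operator.add, "mul": operator.mul}
_KERNELS = {}  # cache: (op name, dtype) -> specialised function

def get_kernel(op, dtype):
    """Return a cached element-wise kernel, building it on first use."""
    key = (op, dtype)
    kernel = _KERNELS.get(key)
    if kernel is None:
        fn = _OPS[op]

        def kernel(a, b):
            # Specialised for one operation and one result type.
            return [dtype(fn(x, y)) for x, y in zip(a, b)]

        _KERNELS[key] = kernel
    return kernel

add_f = get_kernel("add", float)
print(add_f([1, 2], [3, 4]))               # [4.0, 6.0]
print(get_kernel("add", float) is add_f)   # True: second lookup hits the cache
```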

Architecture

Clean separation of concerns:

Component | Design
ILKernelGenerator | Static class (27 partial files), internal to DefaultEngine
TensorEngine | All np.* ops route through abstract methods
Shape.Broadcasting | Pure shape math in Shape struct (456 lines)
ArgMin/ArgMax | Unified IL kernel with NaN-aware + Boolean semantics
DecimalMath | Internal utility (~403 lines) for Sqrt, Pow, ATan2, Exp, Log

Single-Threaded Execution

All computation is single-threaded with no Parallel.For usage. This provides:

  • Deterministic behavior - Same inputs always produce same outputs in same order
  • Non-blocking execution - No thread synchronization overhead
  • Simplified debugging - Stack traces are straightforward
  • SIMD compensation - Vector128/256/512 intrinsics provide parallelism at the CPU level

Broadcasting External to Engine

Broadcasting logic (Shape.Broadcasting.cs) is pure shape math with no engine dependencies:

  • Shape.AreBroadcastable() - Check if shapes can broadcast
  • Shape.Broadcast() - Compute broadcast result shape and strides
  • Shape.ResolveReturnShape() - Determine output shape for operations
  • DefaultEngine delegates all broadcasting to Shape.* methods
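
The "pure shape math" involved can be sketched with the standard NumPy broadcasting rule: shapes are aligned from the right, and each pair of dimensions must be equal or one of them must be 1. Broadcast dimensions get stride 0, so the same element is reread instead of copied. The Python functions below are a hypothetical model of that rule, not the signatures of the Shape.* methods listed above.

```python
def broadcast_shape(*shapes):
    """Compute the broadcast result shape, or raise ValueError if incompatible."""
    ndim = max(len(s) for s in shapes)
    result = []
    for axis in range(ndim):
        dim = 1
        for s in shapes:
            idx = axis - (ndim - len(s))   # right-align shorter shapes
            d = s[idx] if idx >= 0 else 1
            if d != 1:
                if dim != 1 and dim != d:
                    raise ValueError(f"shapes {shapes} are not broadcastable")
                dim = d
        result.append(dim)
    return tuple(result)

def broadcast_strides(shape, strides, target):
    """Strides for viewing `shape` as `target`; stretched axes get stride 0."""
    pad = len(target) - len(shape)
    out = []
    for axis, t in enumerate(target):
        if axis < pad:
            out.append(0)
        else:
            d = shape[axis - pad]
            out.append(strides[axis - pad] if d == t else 0)
    return tuple(out)

target = broadcast_shape((3, 1), (4,))
print(target)                                      # (3, 4)
print(broadcast_strides((3, 1), (1, 1), target))   # (1, 0)
print(broadcast_strides((4,), (1,), target))       # (0, 1)
```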

DecimalMath (#588)

Replaced embedded third-party DecimalEx.cs (~1061 lines) with minimal internal DecimalMath.cs (~403 lines) containing only the functions NumSharp actually uses: Sqrt, Pow, ATan2, Exp, Log, Log10, ATan.

TensorEngine Abstract Methods

Compare, NotEqual, Less, LessEqual, Greater, GreaterEqual, BitwiseAnd, BitwiseOr, BitwiseXor, LeftShift, RightShift, Power(NDArray, NDArray), FloorDivide, Truncate, Reciprocal, Square, Cbrt, Invert, Deg2Rad, Rad2Deg, IsInf, ReduceCumMul, Any, NanSum, NanProd, NanMin, NanMax, BooleanMask

DefaultEngine Dispatch Files (IL kernel integration)

File | Functions
DefaultEngine.BinaryOp.cs | np.add, np.subtract, np.multiply, np.divide, np.mod, np.power
DefaultEngine.BitwiseOp.cs | np.bitwise_and, np.bitwise_or, np.bitwise_xor, &, |, ^
DefaultEngine.CompareOp.cs | np.equal, np.not_equal, np.less, np.greater, np.less_equal, np.greater_equal
DefaultEngine.ReductionOp.cs | np.sum, np.prod, np.min, np.max, np.mean, np.std, np.var, np.argmax, np.argmin
DefaultEngine.UnaryOp.cs | np.abs, np.negative, np.sqrt, np.sin, np.cos, np.exp, np.log, np.sign, etc.

Implementation Files

Default.Any.cs, Default.BooleanMask.cs, Default.Reduction.Nan.cs, Shape.Broadcasting.cs


New NumPy Functions (35)

NaN-Aware Reductions (7)

Function | Description
np.nansum | Sum ignoring NaN
np.nanprod | Product ignoring NaN
np.nanmin | Minimum ignoring NaN
np.nanmax | Maximum ignoring NaN
np.nanmean | Mean ignoring NaN
np.nanvar | Variance ignoring NaN
np.nanstd | Standard deviation ignoring NaN
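
A minimal Python sketch of what "ignoring NaN" means for two of these reductions: a plain sum is poisoned by a single NaN, while the NaN-aware versions drop NaN values first. The functions here are illustrative stand-ins written for this note, not the NumSharp implementations.

```python
import math

def nansum(values):
    """Sum of the values that are not NaN (0 if none remain)."""
    return sum(v for v in values if not math.isnan(v))

def nanmean(values):
    """Mean of the values that are not NaN; NaN if none remain."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan

data = [1.0, math.nan, 3.0]
print(sum(data))      # nan
print(nansum(data))   # 4.0
print(nanmean(data))  # 2.0
```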

Math Operations (8)

Function | Description
np.cbrt | Cube root
np.floor_divide | Integer division
np.reciprocal | Element-wise 1/x
np.trunc | Truncate to integer
np.invert | Bitwise NOT
np.square | Element-wise square
np.cumprod | Cumulative product
np.count_nonzero | Count non-zero elements

Bitwise & Trigonometric (4)

Function | Description
np.left_shift | Bitwise left shift
np.right_shift | Bitwise right shift
np.deg2rad | Degrees to radians
np.rad2deg | Radians to degrees

Logic & Validation (4) - Previously returned null

Function | Description
np.isnan | Test element-wise for NaN
np.isfinite | Test element-wise for finiteness
np.isinf | Test element-wise for infinity
np.isclose | Element-wise comparison within tolerance
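
For reference, NumPy defines the tolerance test behind isclose as |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The scalar Python sketch below follows that formula; it assumes NumSharp matches NumPy here, which the notes imply but do not spell out, and it treats NaN as never close (NumPy's default equal_nan=False).

```python
import math

def isclose(a, b, rtol=1e-05, atol=1e-08):
    """Scalar model of NumPy's isclose: |a - b| <= atol + rtol * |b|."""
    if math.isnan(a) or math.isnan(b):
        return False
    if math.isinf(a) or math.isinf(b):
        return a == b          # infinities match only with the same sign
    return abs(a - b) <= atol + rtol * abs(b)

print(isclose(1.0, 1.000001))          # True
print(isclose(1.0, 1.001))             # False
print(isclose(1e-9, 0.0))              # True (absolute tolerance applies)
print(isclose(math.nan, math.nan))     # False
```

Note that the formula is asymmetric in a and b, unlike Python's own math.isclose, so the two should not be treated as interchangeable.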

Operators (2) - Previously returned null

Operator | Description
operator & | Bitwise/logical AND with broadcasting
operator | | Bitwise/logical OR with broadcasting

Comparison Functions (6) - New named AP...

v0.4.0-alpha1

14 Feb 10:14

Pre-release

NumSharp v0.4.0-alpha1

See #538 for information.

NuGet

No NuGet release for this preview version.

What's Changed

  • Enabled NDArray boolean comparisons for LessThan, GreaterThan, and … by @Rikki-Tavi in #395
  • Added data types in np.frombuffer in #425
  • F# in README by @dsyme in #432
  • Added support for user defined decimal precision for np.around() and TensorEngine.Round() by @shashi4u in #453
  • NumSharp.Bitmap support for odd sized bitmaps with odd sized bytes per pixel by @AmbachtIT in #460
  • Fixing the consistency of seed in the random choice. by @bojake in #489
  • (Logics):add high performance logical AND function with axis an… by @zhuoshui-AI in #525
  • Upgrade target frameworks to net8.0;net10.0 by @Nucs in #532
  • Add GitHub Actions CI/CD pipeline by @Nucs in #534
  • Fix: skip Bitmap tests on non-Windows CI by @Nucs in #535
  • docs: relocate website to docs/website/ by @Nucs in #557
  • docs: move docfx_project to docs/website-src by @Nucs in #558
  • feat(docs): upgrade to DocFX v2 modern template by @Nucs in #562

New Contributors

Many of the contributors' merges were piggybacked onto this release, which was probably not entirely intentional.

Full Changelog: 0.20.5...v0.4.0-alpha1

v0.20.5

31 Dec 16:34

  • NDArray.Indexing: Rewrite of the getter mechanism, NDArray getter now supports combining 'NDArray, Slice, string, int, bool' in the same slice.
  • NDArray.Indexing: Added support for indexing with an unmanaged array of indices: ndarray[int* pointer, int length], nd.GetData(int*, int), etc.
  • NDArray.Broadcasting: fixed multiple issues.
  • NDArray.Slicing: Added support for slicing a broadcasted NDArray.
  • Added NPTypeCode.Float as an alias to NPTypeCode.Single
  • Extending NPY and fixing NPZ (Thanks Matthew Moloney)
  • Added NDArray.AsOrMakeGeneric()
  • Added np.nonzero, np.maximum, np.minimum, np.all, np.any
  • Arrays.cs: optimized the performance of Arrays.Slice
  • NDArray.FromMultiDimArray: Fixed #367
  • np.clip: Added @out argument
  • Added np.array(IEnumerable) and np.array(IEnumerable, int size) which is faster.
  • np.broadcast_to: added additional overloads.

v0.20.4

05 Oct 16:06

Changes

  • Added np.transpose, np.swapaxes, ndarray.T, np.moveaxis, np.rollaxis, np.size, np.copyto.
  • Added np.ceil, np.arccos, np.floor, np.modf, np.square, np.round, np.sign, np.arcsin, np.arctan.
  • Added np.random.*: beta, gamma, bernoulli, binomial, lognormal, normal, poisson, chisquare, geometric.
  • Added support for np.newaxis, ... (ellipsis) in a slice.
  • Performance optimization for np.array, np.linspace, Randomizer class and all np.random.* methods.

Bug Fixes

  • ndarray.view copying when it shouldn't.
  • Fixed a couple of ambiguous methods.

Obsoletion

  • nd.Unsafe.Shape is now obsolete in favor of nd.Shape.

Special thanks to @henon and @deepakkumar1984 for PRing a great portion of this release.

v0.20.3

28 Sep 14:57

Breaking Changes

  • NumSharp.Backends.NPTypeCode moved to NumSharp.NPTypeCode.

v0.10-slice

28 Jul 12:37

release signed assembly v0.10.6.

v0.7 works with TensorFlow.NET

31 Jan 04:04

v0.7-tensorflow

Merge branch 'master' of https://github.com/Oceania2018/NumSharp

v0.6 Supports LAPACK

22 Dec 14:45
727fd41

Merge pull request #162 from dotChris90/master

Extend doc and generated new API docs

v0.5-dtype

05 Dec 02:38

release v0.5