Releases: SciSharp/NumSharp
NumSharp 0.50.0 - Long Indexing Release
This release introduces Int64/Long Indexing - a complete architectural migration enabling arrays larger than 2.1 billion elements (>2GB), along with comprehensive NumPy 2.x type system alignment, new type introspection APIs, and the Python container protocol.
Installable via NuGet
dotnet add package NumSharp --version 0.50.0-prerelease
dotnet add package NumSharp.Bitmap --version 0.50.0-prerelease
TL;DR
- Int64/Long Indexing: Full migration from `int` to `long` across Shape, NDArray, Storage, Iterators, and ILKernelGenerator - ndarrays >2GB now supported
- 12 New Type APIs: `np.can_cast`, `np.promote_types`, `np.result_type`, `np.min_scalar_type`, `np.common_type`, `np.issubdtype`, `np.finfo`, `np.iinfo`, `np.isreal`, `np.iscomplex`, `np.isrealobj`, `np.iscomplexobj`
- 6 Comparison Functions: `np.equal`, `np.not_equal`, `np.less`, `np.greater`, `np.less_equal`, `np.greater_equal`
- 4 Logical Functions: `np.logical_and`, `np.logical_or`, `np.logical_not`, `np.logical_xor`
- Container Protocol: `__contains__`, `__len__`, `__iter__`, `__getitem__`, `__setitem__` - NumPy-compatible iteration
- New NDArray Methods: `tolist()`, `item()` for NumPy parity
- NumPy 2.x Type System: `np.arange()` returns Int64, `NPTypeHierarchy` encoding NumPy's exact type tree, Bool NOT under Number
- `np.frombuffer()` Rewrite: Full NumPy signature with `count`, `offset`, big-endian support, `IntPtr`/`void*` overloads, view semantics
- 0D Scalar Arrays: `np.array(5)` now creates 0D arrays (matching NumPy)
- `np.arange()` Fixes: Negative step, integer arithmetic, inlined type-specific loops, full NumPy parity
- `np.any`/`np.all`: 0D array support with axis parameter
- Random API Alignment (#582): Parameter names match NumPy, `np.shuffle` fixed
- Empty Array Handling: Proper NaN returns for mean/std/var on empty arrays
- NaN Sorting: `np.unique` now sorts NaN to end (matches NumPy)
- ValueType to Object Migration: All scalar returns now `object` (NumPy alignment), discarded usages of ValueType
- UnmanagedSpan: Ported from dotnet/runtime for Span-like semantics with `long` length
- Operator Cleanup: 74% reduction in NDArray.Primitive.cs (150 → 40 overloads)
- 600+ Battle Tests: All validated against actual NumPy 2.x output
- 145 Test Fixes: 71 for Int64 alignment + 74 previously failing tests now passing
Changes and Fixes
- `np.arange(10, 0, -2)`: Before returned `[9, 7, 5, 3, 1]`, now correctly returns `[10, 8, 6, 4, 2]`
- `np.arange(0, 5, 0.5, int32)`: Before returned `[0,0,1,1,2,2,3,3,4,4]`, now correctly returns `[0,0,0,0,0,0,0,0,0,0]` (NumPy behavior)
- `np.any(0D_array, axis=0)`: Before threw `ArgumentException`, now returns 0D bool scalar
- `np.all(0D_array, axis=-1)`: Before threw `ArgumentException`, now returns 0D bool scalar
- `Contains([1,2], array([1,2,3]))`: Before returned `False`, now throws `IncorrectShapeException` (matches NumPy)
- `np.shuffle` axis parameter: Removed non-existent `axis` param, now matches NumPy legacy API
- `np.random.standard_normal`: Fixed typo (`stardard_normal` → `standard_normal`)
- Scalar broadcast assignment: Fixed cross-dtype conversion failure
  - Root cause: `AsOrMakeGeneric<T>()` called `new NDArray<T>(astype(...))` which triggered implicit scalar → size constructor
  - Fix: Use `.Storage` to pass storage directly, avoiding implicit conversion
- Fancy indexing dtypes: Now supports all integer dtypes (Int16, Int32, Int64), not just Int32
  - Added `NormalizeIndexArray()` helper that keeps Int32/Int64 as-is, converts smaller types to Int64
  - Throws `IndexOutOfRangeException` for non-integer types (float, decimal)
- `NDArray.ToString()` output is now formatted identically to NumPy's.
- `np.mean([])`: Returns `NaN` (was throwing or returning 0)
- `np.mean(zeros((0,3)), axis=0)`: Returns `[NaN, NaN, NaN]`
- `np.mean(zeros((0,3)), axis=1)`: Returns empty array `[]`
- `np.std/var` single element: Returns `NaN` with `ddof >= size`
- Empty comparison: All 6 comparison operators now return empty boolean arrays (was returning scalar)
- `np.unique` NaN sorting: NaN now sorts to end (matches NumPy: `[-inf, 1, 2, inf, nan]`)
- `ArgMax/ArgMin` NaN: First NaN always wins (NaN takes precedence over any value)
- Single-element axis reduction: Changed `Storage.Alias()` and `squeeze_fast()` to return copies (was sharing memory)
- Clip mixed-dtype: Fixed bug where int32 min/max arrays were read as int64
- `np.invert(bool)`: Now uses logical NOT (`!x`) instead of bitwise NOT (`~x`)
- `np.square(int)`: Preserves integer dtype instead of promoting to double
- `np.negate(bool)`: Removed buggy linear-indexing path, now routes through `ExecuteUnaryOp`
- Fixed ATan2 non-contiguous array handling by adding `np.broadcast_arrays()` and `.copy()` materialization
- Fixed ATan2 wrong pointer type (byte*) for x operand in all non-byte cases
- finfo: Use `MathF.BitIncrement` for float eps (was using `Math.BitIncrement` which only works on double)
- issctype: Properly reject string type (was returning true for `typeof(string)`)
- `NDArray.unique()`: Fixed for long indexing support
- `np.repeat`: Fixed dtype handling and long count support
- `np.random.choice`: Fixed for long population sizes
- `np.argmax/argmin` IL fix: Removed `Conv_I4` instruction that truncated long indices to int32
- ILKernel loop counters: Fixed numerous int32 overflow issues
- `TransformOffset` calculations: Fixed for >2GB arrays
- SIMD helper functions: Fixed for long indexing
- AVX2 gather: Added stride check (falls back to scalar for stride > int.MaxValue)
- `np.random` parameter names now match NumPy (`size`, `a`, `b`, `p`, `d0`)
- `np.random()` added as alias for uniform distribution
- `np.shuffle`: removed non-existent axis parameter
- ValueType to Object Migration
  - All scalar return types migrated from `ValueType` to `object`
  - `NPTypeCode.GetDefaultValue()` now returns `object`
  - All operators migrated to NumPy-aligned object pattern
  - NDArray null checks converted from `== null` to `is null` pattern
- Operator Overload Cleanup
  - `NDArray.Primitive.cs`: 159 → 42 lines (74% reduction)
  - ~150 explicit scalar overloads → ~40 object-based overloads
  - Added missing `implicit operator NDArray(byte)`
  - Changed `ushort` from explicit to implicit
- Implicit Scalar Conversion
  - `(int)ndarray_float64` now works via `Converts.ChangeType`
  - `scalar → NDArray`: implicit (safe, creates 0-d array)
  - `NDArray → scalar`: explicit (requires 0-d, throws `IncorrectShapeException`)
  - Matches NumPy's `int(arr)`, `float(arr)`, `bool(arr)` pattern
  - All `== null` changed to `is null` (because `==` now returns `NDArray<bool>`, as NumPy does)
  - All `!= null` changed to `is not null`
- Type System Consolidation
  - `can_cast` derived from promotion tables (replaced 80+ lines of switch cases)
  - Single source of truth: `NPTypeHierarchy`
  - Removed duplicate `TypeKind` enum and category helper methods
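The two `np.arange` fixes above follow NumPy's reference behavior: the element count is `ceil((stop - start) / step)`, and for an integer dtype the first two values are truncated to integers, with their difference becoming the effective step. A minimal pure-Python sketch of that rule follows. The helper name `arange_sketch` and its `as_int` flag are hypothetical and only illustrate the expected results; this is not NumSharp's implementation.

```python
import math

def arange_sketch(start, stop, step, as_int=False):
    """Pure-Python sketch of NumPy-style arange semantics (hypothetical helper)."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # Element count is ceil((stop - start) / step), clamped at zero.
    length = max(0, math.ceil((stop - start) / step))
    if not as_int:
        return [start + i * step for i in range(length)]
    # Integer dtype: the first two values are truncated toward zero and
    # their difference becomes the effective step.
    first = int(start)
    delta = int(start + step) - first
    return [first + i * delta for i in range(length)]

print(arange_sketch(10, 0, -2))               # [10, 8, 6, 4, 2]
print(arange_sketch(0, 5, 0.5, as_int=True))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With a step of 0.5 and an integer dtype, the truncated step is 0, which is why the corrected result is ten zeros rather than a rising sequence.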
Detailed Breakdown
Contents
- Int64/Long Indexing
- NumPy 2.x Type System
- Container Protocol
- New APIs
- Changes and Fixes
- Performance
- Test Improvements
Int64/Long Indexing Support
Complete migration from int to long indexing across the entire codebase, enabling arrays larger than 2.1 billion elements (~2GB for byte arrays, ~16GB for doubles).
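The size figures quoted here follow directly from the range of a signed 32-bit index. The arithmetic can be checked with a short sketch; the constant and function names are illustrative only.

```python
INT32_MAX = 2**31 - 1   # largest value of a C# int
INT64_MAX = 2**63 - 1   # largest value of a C# long
GIB = 2**30

def max_bytes(index_max, itemsize):
    """Largest buffer, in bytes, addressable with the given maximum element count."""
    return index_max * itemsize

print(INT32_MAX)                                 # 2147483647
print(round(max_bytes(INT32_MAX, 1) / GIB, 2))   # 2.0  (byte elements)
print(round(max_bytes(INT32_MAX, 8) / GIB, 2))   # 16.0 (double elements)
```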
Core Type Changes
- `Shape.dimensions`: `int[]` -> `long[]`
- `Shape.strides`: `int[]` -> `long[]`
- `Shape.size`: `int` -> `long`
- `Shape.offset`: `int` -> `long`
- `NDArray.size`: `int` -> `long`
- `NDArray.len`: `int` -> `long`
- All NDArray indexers: `int` -> `long`
- `ArraySlice<T>`: `int` indexing -> `long` indexing
- `UnmanagedMemoryBlock<T>`: `int` indexing -> `long` indexing
- `UnmanagedStorage`: `int` indexing -> `long` indexing
- `NDIterator` coordinates: `int[]` -> `long[]`
- `MultiIterator`: `int` offsets -> `long` offsets
- `np.nonzero()`: Returns `NDArray<long>[]` instead of `NDArray<int>[]`
- `np.argmax/argmin`: Returns `long` indices
ILKernelGenerator Migration (20+ files)
All ILKernelGenerator partial classes updated for long loop counters and offsets:
- `ILKernelGenerator.Binary.cs` - Loop counters to `long`
- `ILKernelGenerator.Reduction.cs` - Index variables to `long`
- `ILKernelGenerator.Reduction.Axis.cs` - Axis iteration with `long`
- `ILKernelGenerator.Reduction.Axis.Simd.cs` - SIMD paths with `long`
- `ILKernelGenerator.Reduction.Axis.NaN.cs` - NaN handling with `long`
- `ILKernelGenerator.Reduction.Axis.Arg.cs` - ArgMax/ArgMin with `long`
- `ILKernelGenerator.Reduction.Axis.VarStd.cs` - Variance/StdDev with `long`
- `ILKernelGenerator.Reduction.NaN.cs` - NEW NaN reductions IL generation
- `ILKernelGenerator.Scan.cs` - CumSum/CumProd with `long` indices
- `ILKernelGenerator.MatMul.cs` - Matrix dimensions to `long`
- `ILKernelGenerator.Clip.cs` - TransformOffset calculations
- `ILKernelGenerator.Masking.cs` - Boolean masking with `long`
- `ILKernelGenerator.Masking.Boolean.cs` - Boolean operations with `long`
- `ILKernelGenerator.Masking.NaN.cs` - NaN masking with `long`
- `ILKernelGenerator.Masking.VarStd.cs` - Variance masking with `long`
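To illustrate why 32-bit loop counters and offsets are unsafe once an array exceeds 2^31 - 1 elements (for example the `Conv_I4` truncation mentioned in the fixes), the sketch below emulates an unchecked conversion of a 64-bit index to a signed 32-bit value. The helper name `to_int32` is hypothetical; it models two's-complement truncation and is not code from NumSharp.

```python
def to_int32(value):
    """Emulate unchecked truncation of an integer to signed 32 bits (two's complement)."""
    value &= 0xFFFFFFFF                       # keep only the low 32 bits
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value

index = 3_000_000_000          # a valid element index in an array with more than 2^31 elements
print(to_int32(index))         # -1294967296: the truncated index is negative
print(to_int32(2**31 - 1))     # 2147483647: values within int range are unchanged
```

A truncated index like this would address the wrong element or fall outside the buffer, which is why the counters and offsets are kept as `long` end to end.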
New Infrastructure
- `UnmanagedSpan<T>` - Ported from dotnet/runtime Span - Span-like with `long` length
- `ReadOnlyUnmanagedSpan<T>` - Read-only variant
- `UnmanagedSpanExtensions` - Extension methods for Span parity
- `UnmanagedSpanHelpers` - SIMD-optimized value type methods
- `UnmanagedSpanHelpers.T...
NumSharp 0.41.0-prerelease
This prerelease introduces the IL Kernel Generator, a complete architectural overhaul that replaces ~600K lines of Regen-generated template code with ~19K lines of runtime IL generation.
This delivers large performance improvements (MatMul is 35-100x faster), comprehensive NumPy 2.x alignment, and significantly cleaner, more maintainable code.
Installation
dotnet add package NumSharp --version 0.41.0-prerelease
Or via Package Manager:
Install-Package NumSharp -Version 0.41.0-prerelease
TL;DR
- IL Kernel Generator: Runtime IL emission replaces 600K lines of Regen templates with 19K lines
- SIMD everywhere: Vector128/256/512 with runtime detection across all operations
- 35 new functions: nansum/prod/min/max/mean/var/std, cbrt, floor_divide, left/right_shift, deg2rad, rad2deg, cumprod, count_nonzero, isnan, isfinite, isinf, isclose, invert, reciprocal, square, trunc, plus comparison and logical modules
- Operators fixed: `==`, `!=`, `<`, `>`, `<=`, `>=`, `&`, `|`, `^`
- np.comparison module: `np.equal()`, `np.not_equal()`, `np.less()`, `np.greater()`, `np.less_equal()`, `np.greater_equal()`
- np.logical module: `np.logical_and()`, `np.logical_or()`, `np.logical_not()`, `np.logical_xor()`
- NDArray<T> operators: Typed `&`, `|`, `^` for generic arrays (resolves `NDArray<bool>` ambiguity)
- Math functions rewritten: sin, cos, tan, exp, log, sqrt, abs, sign, floor, ceil, etc.
- 60+ bug fixes: np.negative, np.positive, np.unique, np.dot, np.matmul, np.abs, np.argmax/min, np.mean, np.std/var, np.cumsum, np.nonzero, np.all/any, np.clip, and more
- MatMul 35-100x faster: Cache-blocked SIMD achieving 20+ GFLOPS
- Boolean indexing rewrite: SIMD fast path with CountTrue/CopyMasked
- Axis reductions rewrite: AVX2 gather, NaN-aware, proper keepdims and empty array handling
- Single-threaded execution: Deterministic, non-blocking (SIMD compensates for parallelism); removed use of `Parallel.*`
- Architecture cleanup: Broadcasting in Shape struct, TensorEngine routing, static ILKernelGenerator
- np.random aligned (#582): Parameter names match NumPy, Shape overloads added
- DecimalMath internalized (#588): Removed embedded third-party code
- NEP50 compliant: NumPy 2.x type promotion rules
- Benchmark infrastructure: SIMD vs scalar comparison suite
- DefaultEngine dispatch layer: BinaryOp, BitwiseOp, CompareOp, ReductionOp, UnaryOp
- +4,200 unit tests, our own and migrated from python/numpy to C#.
Contents
| Section | Highlights |
|---|---|
| Summary | 106 commits, -533K lines, 3,907 tests |
| IL Kernel Generator | 27 files, SIMD V128/256/512 |
| Architecture | Static ILKernelGenerator, TensorEngine routing |
| New NumPy Functions (35) | nansum, isnan, cumprod, etc. |
| Critical Bug Fixes | negative, unique, dot, linspace, intp |
| Operator Rewrites | `==`, `!=`, `<`, `>`, `&`, `\|` now work |
| Boolean Indexing Rewrite | SIMD fast path, 76 battle tests |
| Slicing Improvements | Broadcast stride=0 preserved |
| Performance Improvements | MatMul 35-100x, 20+ GFLOPS |
| Code Reduction | 99% binary, 98% MatMul, 97% Dot |
| Infrastructure Changes | NativeMemory, static kernels |
| API Alignment | random() params aligned with NumPy |
| New Test Files (68) | 34 kernel, 8 NumPy, 4 linalg, 76 boolean |
| Known Issues | 52 OpenBugs excluded |
| Installation | dotnet add package NumSharp |
Summary
| Metric | Value |
|---|---|
| Commits | 106 |
| Files Changed | 558 |
| Lines Added | +72,635 |
| Lines Deleted | -605,976 |
| Net Change | -533K lines |
| Test Results | 3,907 passed, 52 OpenBugs, 11 skipped |
Detailed Breakdown
IL Kernel Generator
Runtime IL generation via System.Reflection.Emit.DynamicMethod replaces static Regen templates.
Kernel Files (27 new files)
- `ILKernelGenerator.cs` - Core infrastructure, SIMD detection (Vector128/256/512)
- `ILKernelGenerator.Binary.cs` - Add, Sub, Mul, Div, BitwiseAnd/Or/Xor
- `ILKernelGenerator.MixedType.cs` - Mixed-type ops with type promotion
- `ILKernelGenerator.Unary.cs` - Negate, Abs, Sqrt, Sin, Cos, Exp, Log, Sign
- `ILKernelGenerator.Comparison.cs` - ==, !=, <, >, <=, >= returning bool arrays
- `ILKernelGenerator.Reduction.cs` - Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any
- `ILKernelGenerator.Reduction.Axis.Simd.cs` - AVX2 gather for axis reductions
- `ILKernelGenerator.Scan.cs` - CumSum, CumProd with SIMD
- `ILKernelGenerator.Shift.cs` - LeftShift, RightShift
- `ILKernelGenerator.MatMul.cs` - Cache-blocked SIMD matrix multiply
- `ILKernelGenerator.Clip.cs`, `.Modf.cs`, `.Masking.cs` - Specialized ops
Execution Paths
- SimdFull - Contiguous + SIMD-capable dtype → Vector loop + scalar tail
- ScalarFull - Contiguous + non-SIMD dtype (Decimal) → Scalar loop
- General - Strided/broadcast → Coordinate-based iteration
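The three execution paths above can be read as a simple dispatch rule, and the SimdFull path as a chunked loop followed by a scalar remainder loop. The sketch below models both in plain Python. The function names, the `lanes=4` width, and the use of list slices in place of hardware vectors are all assumptions made for illustration; the real kernels are emitted as IL at runtime.

```python
def select_path(contiguous, simd_capable):
    """Pick an execution path name following the three cases listed above."""
    if not contiguous:
        return "General"                       # strided/broadcast input
    return "SimdFull" if simd_capable else "ScalarFull"

def add_simd_full(a, b, lanes=4):
    """Emulate a vector loop over full chunks followed by a scalar tail."""
    n = len(a)
    out = [0] * n
    vector_end = n - (n % lanes)               # last index covered by full chunks
    for i in range(0, vector_end, lanes):      # "vector" loop: one chunk per step
        out[i:i + lanes] = [x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
    for i in range(vector_end, n):             # scalar tail for the remainder
        out[i] = a[i] + b[i]
    return out

print(select_path(contiguous=True, simd_capable=False))   # ScalarFull
print(add_simd_full(list(range(10)), [1] * 10))           # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```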
Infrastructure
- `KernelKey.cs`, `KernelOp.cs`, `KernelSignatures.cs` - Kernel dispatch
- `SimdMatMul.cs` - SIMD matrix multiplication helpers
- `TypeRules.cs` - NEP50 type promotion rules
Architecture
Clean separation of concerns:
| Component | Design |
|---|---|
| `ILKernelGenerator` | Static class (27 partial files), internal to DefaultEngine |
| `TensorEngine` | All np.* ops route through abstract methods |
| `Shape.Broadcasting` | Pure shape math in Shape struct (456 lines) |
| `ArgMin/ArgMax` | Unified IL kernel with NaN-aware + Boolean semantics |
| `DecimalMath` | Internal utility (~403 lines) for Sqrt, Pow, ATan2, Exp, Log |
Single-Threaded Execution
All computation is single-threaded with no Parallel.For usage. This provides:
- Deterministic behavior - Same inputs always produce same outputs in same order
- Non-blocking execution - No thread synchronization overhead
- Simplified debugging - Stack traces are straightforward
- SIMD compensation - Vector128/256/512 intrinsics provide parallelism at the CPU level
Broadcasting External to Engine
Broadcasting logic (Shape.Broadcasting.cs) is pure shape math with no engine dependencies:
- `Shape.AreBroadcastable()` - Check if shapes can broadcast
- `Shape.Broadcast()` - Compute broadcast result shape and strides
- `Shape.ResolveReturnShape()` - Determine output shape for operations
- `DefaultEngine` delegates all broadcasting to `Shape.*` methods
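"Pure shape math" here means the result shape and strides can be computed from the input shapes alone, without touching any data. The sketch below shows the standard NumPy broadcasting rule in plain Python, with stretched dimensions given a stride of 0. The function names `broadcast_shapes` and `broadcast_strides` are hypothetical and are not the signatures of the `Shape.*` methods listed above.

```python
def broadcast_shapes(a, b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)      # left-pad the shorter shape with 1s
    b = (1,) * (ndim - len(b)) + tuple(b)
    result = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(result)

def broadcast_strides(shape, strides, target):
    """Strides for viewing `shape` as `target`; stretched dimensions get stride 0."""
    pad = len(target) - len(shape)
    shape = (1,) * pad + tuple(shape)
    strides = (0,) * pad + tuple(strides)
    return tuple(0 if s == 1 and t != 1 else st
                 for s, st, t in zip(shape, strides, target))

target = broadcast_shapes((3, 1), (4,))
print(target)                                   # (3, 4)
print(broadcast_strides((4,), (1,), target))    # (0, 1)
```

Strides in this sketch are counted in elements rather than bytes; a stride of 0 makes every index along that dimension read the same element, so no data is copied.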
DecimalMath (#588)
Replaced embedded third-party DecimalEx.cs (~1061 lines) with minimal internal DecimalMath.cs (~403 lines) containing only the functions NumSharp actually uses: Sqrt, Pow, ATan2, Exp, Log, Log10, ATan.
TensorEngine Abstract Methods
Compare, NotEqual, Less, LessEqual, Greater, GreaterEqual, BitwiseAnd, BitwiseOr, BitwiseXor, LeftShift, RightShift, Power(NDArray, NDArray), FloorDivide, Truncate, Reciprocal, Square, Cbrt, Invert, Deg2Rad, Rad2Deg, IsInf, ReduceCumMul, Any, NanSum, NanProd, NanMin, NanMax, BooleanMask
DefaultEngine Dispatch Files (IL kernel integration)
| File | Functions |
|---|---|
| `DefaultEngine.BinaryOp.cs` | np.add, np.subtract, np.multiply, np.divide, np.mod, np.power |
| `DefaultEngine.BitwiseOp.cs` | np.bitwise_and, np.bitwise_or, np.bitwise_xor, `&`, `\|`, `^` |
| `DefaultEngine.CompareOp.cs` | np.equal, np.not_equal, np.less, np.greater, np.less_equal, np.greater_equal |
| `DefaultEngine.ReductionOp.cs` | np.sum, np.prod, np.min, np.max, np.mean, np.std, np.var, np.argmax, np.argmin |
| `DefaultEngine.UnaryOp.cs` | np.abs, np.negative, np.sqrt, np.sin, np.cos, np.exp, np.log, np.sign, etc. |
Implementation Files
Default.Any.cs, Default.BooleanMask.cs, Default.Reduction.Nan.cs, Shape.Broadcasting.cs
New NumPy Functions (35)
NaN-Aware Reductions (7)
| Function | Description |
|---|---|
| `np.nansum` | Sum ignoring NaN |
| `np.nanprod` | Product ignoring NaN |
| `np.nanmin` | Minimum ignoring NaN |
| `np.nanmax` | Maximum ignoring NaN |
| `np.nanmean` | Mean ignoring NaN |
| `np.nanvar` | Variance ignoring NaN |
| `np.nanstd` | Standard deviation ignoring NaN |
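"Ignoring NaN" means the NaN entries are dropped before the reduction is applied. A minimal sketch of the reference semantics for two of these functions, in plain Python, is shown here. It follows NumPy's behavior (an all-NaN sum is 0.0, an all-NaN mean is NaN) and is not NumSharp's IL kernel.

```python
import math

def nansum(values):
    """Sum of the non-NaN values; an all-NaN or empty input sums to 0.0."""
    return math.fsum(v for v in values if not math.isnan(v))

def nanmean(values):
    """Mean of the non-NaN values; NaN when no such values remain."""
    kept = [v for v in values if not math.isnan(v)]
    return math.fsum(kept) / len(kept) if kept else math.nan

data = [1.0, math.nan, 3.0]
print(nansum(data))    # 4.0
print(nanmean(data))   # 2.0
```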
Math Operations (8)
| Function | Description |
|---|---|
| `np.cbrt` | Cube root |
| `np.floor_divide` | Integer division |
| `np.reciprocal` | Element-wise 1/x |
| `np.trunc` | Truncate to integer |
| `np.invert` | Bitwise NOT |
| `np.square` | Element-wise square |
| `np.cumprod` | Cumulative product |
| `np.count_nonzero` | Count non-zero elements |
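One detail worth spelling out for `np.floor_divide`: in NumPy it rounds the quotient toward negative infinity, whereas C-style integer division (including C#'s `/` on integers) rounds toward zero. The two differ only when the operands have opposite signs. The sketch below contrasts them in plain Python; both helper names are illustrative.

```python
def floor_divide(a, b):
    """Quotient rounded toward negative infinity (Python's // operator)."""
    return a // b

def truncating_divide(a, b):
    """Quotient rounded toward zero, as C-style integer division does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

print(floor_divide(-7, 2))        # -4
print(truncating_divide(-7, 2))   # -3
print(floor_divide(7, 2))         # 3 (same as truncation for same-sign operands)
```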
Bitwise & Trigonometric (4)
| Function | Description |
|---|---|
| `np.left_shift` | Bitwise left shift |
| `np.right_shift` | Bitwise right shift |
| `np.deg2rad` | Degrees to radians |
| `np.rad2deg` | Radians to degrees |
Logic & Validation (4) - Previously returned null
| Function | Description |
|---|---|
| `np.isnan` | Test element-wise for NaN |
| `np.isfinite` | Test element-wise for finiteness |
| `np.isinf` | Test element-wise for infinity |
| `np.isclose` | Element-wise comparison within tolerance |
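For reference, NumPy defines "within tolerance" for `isclose` as `|a - b| <= atol + rtol * |b|`, with defaults `rtol=1e-05` and `atol=1e-08`. The scalar sketch below follows that documented formula, including the handling of NaN and infinities; it is an illustration of the NumPy rule, not NumSharp's code.

```python
import math

def isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Scalar sketch of NumPy's isclose rule: |a - b| <= atol + rtol * |b|."""
    if math.isnan(a) or math.isnan(b):
        # NaN only compares close to NaN, and only when equal_nan is set.
        return equal_nan and math.isnan(a) and math.isnan(b)
    if math.isinf(a) or math.isinf(b):
        # Infinities are close only to an infinity of the same sign.
        return a == b
    return abs(a - b) <= atol + rtol * abs(b)

print(isclose(1.0, 1.0 + 1e-9))                         # True
print(isclose(1.0, 1.1))                                # False
print(isclose(math.nan, math.nan))                      # False
print(isclose(math.nan, math.nan, equal_nan=True))      # True
```

The tolerance is scaled by `|b|` only, so the comparison is not symmetric in its arguments.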
Operators (2) - Previously returned null
| Operator | Description |
|---|---|
| `operator &` | Bitwise/logical AND with broadcasting |
| `operator \|` | Bitwise/logical OR with broadcasting |
Comparison Functions (6) - New named AP...
v0.4.0-alpha1
NumSharp v0.4.0-alpha1
See #538 for information.
NuGet
No NuGet release for this preview version.
What's Changed
- Enabled NDArray boolean comparisons for LessThan, GreaterThan, and … by @Rikki-Tavi in #395
- Added data types in np.frombuffer. in #425
- F# in README by @dsyme in #432
- Added support for user defined decimal precision for np.around() and TensorEngine.Round() by @shashi4u in #453
- NumSharp.Bitmap support for odd sized bitmaps with odd sized bytes per pixel by @AmbachtIT in #460
- Fixing the consistency of seed in the random choice. by @bojake in #489
- (Logics):add high performance logical AND function with axis an… by @zhuoshui-AI in #525
- Upgrade target frameworks to net8.0;net10.0 by @Nucs in #532
- Add GitHub Actions CI/CD pipeline by @Nucs in #534
- Fix: skip Bitmap tests on non-Windows CI by @Nucs in #535
- docs: relocate website to docs/website/ by @Nucs in #557
- docs: move docfx_project to docs/website-src by @Nucs in #558
- feat(docs): upgrade to DocFX v2 modern template by @Nucs in #562
New Contributors
Many of the contributors' merges were piggybacked onto this release, which was probably not entirely intentional.
- @Rikki-Tavi made their first contribution in #395
- @dsyme made their first contribution in #432
- @shashi4u made their first contribution in #453
- @AmbachtIT made their first contribution in #460
- @bojake made their first contribution in #489
- @zhuoshui-AI made their first contribution in #525
Full Changelog: 0.20.5...v0.4.0-alpha1
v0.20.5
- NDArray.Indexing: Rewrite of the getter mechanism, NDArray getter now supports combining 'NDArray, Slice, string, int, bool' in the same slice.
- NDArray.Indexing: Added support for indexing with unmanaged array of indices: ndarray[int* pointer, int length], nd.GetData(int*, int), etc..
- NDArray.Broadcasting: fixed multiple issues.
- NDArray.Slicing: Added support for slicing a broadcasted NDArray.
- Added NPTypeCode.Float as an alias to NPTypeCode.Single
- Extending NPY and fixing NPZ (Thanks Matthew Moloney)
- Added NDArray.AsOrMakeGeneric()
- Added np.nonzero, np.maximum, np.minimum, np.all, np.any
- Arrays.cs: performance-optimized Arrays.Slice
- NDArray.FromMultiDimArray: Fixed #367
- np.clip: Added @out argument
- Added np.array(IEnumerable) and np.array(IEnumerable, int size) which is faster.
- np.broadcast_to: added additional overloads.
v0.20.4
Changes
- Added np.transpose, np.swapaxes, ndarray.T, np.moveaxis, np.rollaxis, np.size, np.copyto.
- Added np.ceil, np.arccos, np.floor, np.modf, np.square, np.round, np.sign, np.arcsin, np.arctan.
- Added np.random.*: beta, gamma, bernoulli, binomial, lognormal, normal, poisson, chisquare, geometric.
- Added support for `np.newaxis`, `...` (ellipsis) in a slice.
- Performance optimization for np.array, np.linspace, Randomizer class and all np.random.* methods.
Bug Fixes
- ndarray.view copying when it shouldn't.
- couple of ambiguous methods
Obsoletion
- `nd.Unsafe.Shape` is now obsolete in favor of `nd.Shape`.
Special thanks to @henon and @deepakkumar1984 for PRing a great portion of this release.
v0.20.3
v0.10-slice
release signed assembly v0.10.6.
v0.7 works with TensorFlow.NET
v0.7-tensorflow Merge branch 'master' of https://github.com/Oceania2018/NumSharp
v0.6 Supports LAPACK
Merge pull request #162 from dotChris90/master Extend doc and generated new API docs
v0.5-dtype
release v0.5