[Random, Alignment] Refactor np.random.* to match NumPy parity #602

Merged
Nucs merged 62 commits into master from npalign on Apr 12, 2026
Conversation

Nucs (Member) commented Apr 11, 2026

Summary

Major refactor of NumSharp's random sampling module to achieve 1-to-1 parity with NumPy 2.x.

  • MT19937 RNG Engine: Replaced .NET's subtractive generator with the Mersenne Twister (MT19937), producing sequences identical to NumPy's for the same seed
  • 25 New Distributions: weibull, standard_cauchy, vonmises, f, logseries, standard_normal, standard_t, triangular, dirichlet, gumbel, hypergeometric, laplace, logistic, multinomial, multivariate_normal, negative_binomial, noncentral_chisquare, noncentral_f, pareto, power, rayleigh, standard_exponential, standard_gamma, wald, zipf
  • 5 Array Splitting Functions: np.split, np.array_split, np.hsplit, np.vsplit, np.dsplit
  • API Standardization: int[] (non-params) + params long[] size overloads across all distributions
  • Multivariate Normal Fix: SVD-based algorithm with NumPy eigenvector sign convention

Stats

  • 49 commits
  • 100 files changed
  • +20,930 / -666 lines
  • 38 new test files

Breaking Changes

| Change | Migration |
| --- | --- |
| RNG sequences differ with the same seed | Re-generate expected values |
| `rand(5, 10)` no longer compiles | Use `rand(5L, 10L)` or `new Shape(5, 10)` |
| Scalar returns are 0D arrays | Add index `[0]` or use `.GetValue()` |

Related

Test plan

  • Build passes
  • Random sampling tests pass (rand, beta, gumbel verified)
  • Full CI run after merge

Nucs added 30 commits April 11, 2026 07:47
- np.split: splits array into equal sub-arrays (raises ArgumentException if not divisible)
- np.array_split: splits array into sub-arrays allowing unequal division
- Supports int (number of sections) and int[] (split indices)
- Supports axis parameter for multi-dimensional arrays
- Follows NumPy's swap-slice-swap approach for axis handling
- Returns views (not copies) matching NumPy behavior
- Full test coverage, verified against NumPy 2.4.2 behavior

NumPy reference: numpy/lib/_shape_base_impl.py (v2.4.2)
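The equal/unequal division rule above can be sketched in plain Python (illustrative names, not the C# implementation; NumPy's np.split additionally raises when the division is unequal):

```python
# Sketch of NumPy's array_split sizing rule: the first (length % sections)
# sub-arrays get one extra element.
def array_split_sizes(length, sections):
    base, extra = divmod(length, sections)
    return [base + 1] * extra + [base] * (sections - extra)

def array_split(seq, sections):
    out, start = [], 0
    for size in array_split_sizes(len(seq), sections):
        out.append(seq[start:start + size])
        start += size
    return out

print(array_split(list(range(8)), 3))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

This matches numpy.array_split(np.arange(8), 3), which yields sub-arrays of sizes 3, 3, 2.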
Weibull distribution implementation matching NumPy 2.x behavior:

- weibull(a, size=None) - NumPy-aligned with optional size
- weibull(a, Shape size) - Shape overload
- weibull(a, int size) - 1D convenience
- weibull(a, params int[] size) - multi-dimensional

Features:
- Inverse transform method: X = (-ln(1-U))^(1/a)
- Validates a >= 0 (throws ArgumentException for negative)
- a=0 returns all zeros (NumPy behavior)
- a=1 reduces to exponential distribution (mean=1)
- Returns float64 dtype

Tests cover:
- Scalar and array returns
- Shape overloads
- Validation (negative a throws, a=0 returns zeros)
- Statistical properties (a=1 exponential, a=2 mean~0.886)
- Non-negative output values
- Heavy tail behavior (small a) and concentrated values (large a)
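The inverse-transform method above can be sketched in Python (a stdlib-only illustration of the formula, not the C# code):

```python
import math
import random

# Sketch of the inverse-transform Weibull sampler:
# X = (-ln(1 - U))**(1/a), U ~ Uniform(0, 1); a = 0 yields all zeros.
def weibull_sample(a, rng):
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    return (-math.log(1.0 - rng.random())) ** (1.0 / a)

rng = random.Random(0)
samples = [weibull_sample(2.0, rng) for _ in range(50_000)]
print(sum(samples) / len(samples))  # ~ Gamma(1 + 1/2) ~ 0.886
```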
Standard Cauchy (Lorentz) distribution with loc=0, scale=1.

Implementation:
- standard_cauchy(Shape? size = null) - NumPy-aligned with optional size
- standard_cauchy(Shape size) - Shape overload
- standard_cauchy(int size) - 1D convenience
- standard_cauchy(params int[] size) - multi-dimensional

Algorithm:
- Inverse transform: X = tan(pi * (U - 0.5)) where U ~ Uniform(0,1)

Properties (verified in tests):
- Median = 0 (mean/variance undefined due to heavy tails)
- Interquartile range = 2 (Q1=-1, Q3=1)
- Heavy tails produce extreme values
- Returns float64 dtype

Tests cover:
- Scalar and array returns
- Shape overloads
- Statistical properties (median, IQR)
- Heavy tail behavior
- Reproducibility with seed
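The inverse transform and the median/IQR properties above can be checked with a stdlib-only Python sketch (illustrative, not the C# implementation):

```python
import math
import random

# Inverse-transform standard Cauchy: X = tan(pi * (U - 0.5)).
# Median 0, quartiles at -1 and +1, so IQR = 2.
def standard_cauchy(rng):
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(42)
xs = sorted(standard_cauchy(rng) for _ in range(100_001))
median = xs[50_000]
q1, q3 = xs[25_000], xs[75_000]
print(median, q3 - q1)  # median ~ 0, IQR ~ 2
```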
…ibution)

- Draw samples from the von Mises distribution on [-pi, pi]
- Matches NumPy 2.4.2 algorithm from distributions.c exactly:
  - kappa < 1e-8: uniform distribution on circle
  - 1e-5 <= kappa <= 1e6: rejection sampling algorithm
  - kappa > 1e6: wrapped normal approximation
- Supports mu (mean direction) and kappa (concentration) parameters
- kappa >= 0 required (raises ArgumentException otherwise)
- Full test coverage including edge cases and statistical properties

NumPy reference: numpy/random/src/distributions/distributions.c (v2.4.2)
F distribution (Fisher distribution) for ANOVA tests.

Implementation:
- f(dfnum, dfden, Shape? size = null) - NumPy-aligned
- f(dfnum, dfden, Shape size) - Shape overload
- f(dfnum, dfden, int size) - 1D convenience
- f(dfnum, dfden, params int[] size) - multi-dimensional

Algorithm:
- F = (chi2_num * dfden) / (chi2_den * dfnum)
- Reuses existing chisquare() which uses gamma()
- Added GammaSample/MarsagliaSample helpers for scalar generation

Validation:
- dfnum > 0 required (throws ArgumentException)
- dfden > 0 required (throws ArgumentException)

Properties (verified):
- Mean = dfden/(dfden-2) for dfden > 2
- All values positive
- Returns float64 dtype

Tests: 13 covering scalar/array, validation, statistics
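The chi-square ratio construction above can be sketched with Python's stdlib gamma sampler (chi2(k) is Gamma(k/2, scale=2); names are illustrative):

```python
import random

# F = (chi2_num * dfden) / (chi2_den * dfnum)
def f_sample(dfnum, dfden, rng):
    chi2_num = rng.gammavariate(dfnum / 2.0, 2.0)
    chi2_den = rng.gammavariate(dfden / 2.0, 2.0)
    return (chi2_num * dfden) / (chi2_den * dfnum)

rng = random.Random(1)
xs = [f_sample(5, 10, rng) for _ in range(200_000)]
print(sum(xs) / len(xs))  # ~ dfden / (dfden - 2) = 1.25
```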
Implements the logarithmic series distribution following NumPy 2.4.2 behavior
exactly. The distribution returns positive integers k >= 1 with probability
P(k) = -p^k / (k * ln(1-p)).

Implementation details:
- Uses NumPy's rejection sampling algorithm from distributions.c
- Parameter validation: p must be in range [0, 1), throws for NaN
- Returns int64 dtype (matches NumPy on 64-bit systems)
- Edge cases: p=0 returns all 1s, small p produces mostly 1s

Algorithm (from NumPy's random_logseries, with U, V ~ Uniform(0, 1)):
1. r = log(1 - p)
2. Draw V; if V >= p, return 1
3. Draw U; compute q = 1 - exp(r * U)
4. If V <= q^2, return floor(1 + log(V)/log(q))
5. If V >= q, return 1
6. Otherwise return 2

Tests cover:
- Scalar and array output shapes
- Small/high p value distribution characteristics
- Validation (negative p, p=1, p>1, NaN)
- Statistical mean approximation to theoretical value
- All values >= 1 constraint

Verified against NumPy 2.4.2 output via Python battletest.
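The rejection steps above translate to a short stdlib-only Python sketch (illustrative, not the shipped C# code; the V == 0 and k < 1 retries mirror NumPy's floating-point guards):

```python
import math
import random

def logseries_sample(p, rng):
    r = math.log(1.0 - p)
    while True:
        v = rng.random()
        if v >= p:
            return 1
        u = rng.random()
        q = 1.0 - math.exp(r * u)
        if v <= q * q:
            if v == 0.0:
                continue  # log(0) undefined; retry like NumPy does
            k = int(1.0 + math.log(v) / math.log(q))
            if k >= 1:
                return k
            continue  # floating-point round-down; retry
        return 1 if v >= q else 2

rng = random.Random(7)
xs = [logseries_sample(0.5, rng) for _ in range(100_000)]
print(min(xs), sum(xs) / len(xs))  # all >= 1, mean ~ 1/ln(2) ~ 1.4427
```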
Add standard_normal(Shape? size = null) overload to NumPyRandom:
- Returns 0-d NDArray scalar when size is null (matches NumPy behavior)
- Uses existing NextGaussian() helper for Box-Muller transform
- Delegates to normal(0, 1.0, size) for array generation
- Verified: randn, standard_normal, and normal(0,1) produce identical results

Statistical verification:
- mean ~= 0 (tested: 0.0098)
- std ~= 1 (tested: 0.9933)
- variance ~= 1 (tested: 0.9867)

Tests: 15 test cases covering shapes, statistics, reproducibility
- Implements Student's t distribution matching NumPy 2.4.2 API
- Supports df (degrees of freedom) parameter
- Supports size parameter for array output
- Includes unit tests
- Implements triangular distribution matching NumPy 2.4.2 API
- Parameters: left, mode, right for triangular shape
- Supports size parameter for array output
- Includes unit tests validating shape, dtype, and bounds
Exhaustive inventory of all public APIs exposed by NumPy 2.4.2 as np.*
covering constants, data types, array creation, manipulation, math
functions, statistics, linear algebra, random sampling, FFT, and more.

This serves as the authoritative reference for API parity work.
Documents all 142 np.* APIs in NumSharp 0.41.x with status:
- 118 working, 12 partial, 12 broken/stub
- Covers array creation, math, statistics, manipulation, random sampling
- Tracks known behavioral differences from NumPy 2.x
- Lists missing functions and TODO items for future work
…ions

- hsplit: splits horizontally (column-wise) along axis 1 for 2D+, axis 0 for 1D
- vsplit: splits vertically (row-wise) along axis 0, requires ndim >= 2
- dsplit: splits along depth (axis 2), requires ndim >= 3

All support both integer (equal division) and indices array overloads.
Delegate to existing np.split with appropriate axis parameter.
NumPy-aligned parameter validation and error messages.
Comprehensive tests based on NumPy 2.4.2 behavior:
- hsplit: 1D/2D/3D arrays, indices overload, 0D error handling
- vsplit: 2D/3D arrays, indices overload, 1D error handling
- dsplit: 3D/4D arrays, indices overload, 1D/2D error handling

Verifies shape correctness and element values match NumPy output.
Draw samples from the Dirichlet distribution - a distribution over vectors
that sum to 1. Used for Bayesian inference and topic modeling.

Algorithm: Draw Y_i ~ Gamma(alpha_i, 1), then normalize X = Y / sum(Y).
Supports double[] and NDArray alpha, Shape or int[] size parameters.
Output shape is (*size, k) where k is length of alpha.
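The gamma-normalize construction above is a few lines in stdlib Python (an illustrative sketch, not the C# implementation):

```python
import random

# Draw Y_i ~ Gamma(alpha_i, 1), then normalize X = Y / sum(Y).
def dirichlet_sample(alpha, rng):
    ys = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(ys)
    return [y / total for y in ys]

rng = random.Random(3)
draws = [dirichlet_sample([2.0, 3.0, 5.0], rng) for _ in range(20_000)]
m0 = sum(d[0] for d in draws) / len(draws)
print(m0)  # E[X_0] = alpha_0 / sum(alpha) = 0.2
```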
Draw samples from the Gumbel distribution used to model distribution
of maximum/minimum values. Common in extreme value statistics.

PDF: p(x) = (1/scale) * exp(-(x-loc)/scale) * exp(-exp(-(x-loc)/scale))
Mean = loc + scale * γ (Euler-Mascheroni ≈ 0.5772)
Algorithm matches NumPy's random_gumbel in distributions.c.
Draw samples from hypergeometric distribution - models drawing without
replacement (e.g., white/black marbles from an urn).

Parameters: ngood, nbad (populations), nsample (draws).
Returns: number of "good" items in sample.
Mean = nsample * ngood / (ngood + nbad)
Uses complement optimization for large samples.
Draw samples from Laplace distribution - similar to Gaussian but with
sharper peak and fatter tails. Represents difference between two
independent exponential random variables.

PDF: f(x; μ, λ) = (1/(2λ)) * exp(-|x - μ| / λ)
Algorithm matches NumPy's random_laplace in distributions.c.
Draw samples from logistic distribution - similar to normal but with
heavier tails. Used in extreme value problems, finance, growth modeling.

PDF: f(x; μ, s) = exp(-(x-μ)/s) / (s * (1 + exp(-(x-μ)/s))^2)
Mean = loc, Variance = scale^2 * π^2 / 3
Uses inverse transform: X = loc + scale * ln(U / (1 - U))
Draw samples from multinomial distribution - multivariate generalization
of binomial. Models n experiments where each yields one of k outcomes.

Algorithm from NumPy: for each category i, draw binomial with
adjusted probability, subtract from remaining trials.
Output shape is (*size, k) where k is length of pvals.
Each row sums to n.
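The conditional-binomial decomposition above can be sketched in Python (the Bernoulli-sum binomial is a deliberately naive stand-in, fine for a sketch but slow for large n; names are illustrative):

```python
import random

def binomial_sample(n, p, rng):
    # Naive Bernoulli-sum binomial, O(n) per draw.
    return sum(1 for _ in range(n) if rng.random() < p)

def multinomial_sample(n, pvals, rng):
    counts, remaining, prob_left = [], n, 1.0
    for p in pvals[:-1]:
        # Each category draws from the trials still unassigned,
        # with its probability renormalized by what is left.
        k = binomial_sample(remaining, p / prob_left, rng) if remaining else 0
        counts.append(k)
        remaining -= k
        prob_left -= p
    counts.append(remaining)  # last category absorbs the rest
    return counts

rng = random.Random(5)
draws = [multinomial_sample(100, [0.2, 0.3, 0.5], rng) for _ in range(2_000)]
print(draws[0], sum(draws[0]))  # each row sums to n = 100
```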
Draw samples from N-dimensional normal distribution specified by
mean vector and covariance matrix.

Algorithm: Cholesky decomposition L = cholesky(cov), then
X = mean + L @ Z where Z ~ N(0, I).
Supports check_valid modes: "warn", "raise", "ignore".
Includes fallback for nearly positive-semidefinite matrices.
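A minimal 2x2 sketch of the Cholesky construction this commit message describes (the PR summary notes the final implementation moved to an SVD-based variant; all names here are illustrative):

```python
import math
import random

# L such that L @ L.T == cov, for a 2x2 positive-definite cov.
def cholesky_2x2(cov):
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

# X = mean + L @ Z, where Z ~ N(0, I).
def mvn_sample(mean, cov, rng):
    L = cholesky_2x2(cov)
    z = [rng.gauss(0, 1), rng.gauss(0, 1)]
    return [mean[i] + L[i][0] * z[0] + L[i][1] * z[1] for i in range(2)]

rng = random.Random(11)
xs = [mvn_sample([1.0, -2.0], [[2.0, 0.6], [0.6, 1.0]], rng) for _ in range(50_000)]
m0 = sum(x[0] for x in xs) / len(xs)
m1 = sum(x[1] for x in xs) / len(xs)
print(m0, m1)  # ~ (1, -2)
```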
Draw samples from negative binomial - models number of failures
before n successes with probability p per trial.

Mean = n * (1-p) / p, Variance = n * (1-p) / p^2
Algorithm: Gamma-Poisson mixture (Y ~ Gamma, X ~ Poisson(Y)).
Includes Marsaglia-Tsang gamma sampling and Knuth/rejection Poisson.
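The Gamma-Poisson mixture above, sketched with stdlib Python (Knuth's Poisson in place of the rejection sampler; fine for modest lambda, and not the C# code):

```python
import math
import random

def poisson_knuth(lam, rng):
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= limit:
            return k - 1

# Y ~ Gamma(n, scale=(1-p)/p), then X ~ Poisson(Y):
# X counts failures before the n-th success.
def negative_binomial_sample(n, p, rng):
    y = rng.gammavariate(n, (1.0 - p) / p)
    return poisson_knuth(y, rng)

rng = random.Random(9)
xs = [negative_binomial_sample(4, 0.4, rng) for _ in range(100_000)]
print(sum(xs) / len(xs))  # ~ n(1-p)/p = 6
```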
Draw samples from noncentral chi-square - generalization of chi-square
with non-centrality parameter. Mean = df + nonc.

NumPy algorithm:
- nonc == 0: return chisquare(df)
- df > 1: return chisquare(df-1) + (N(0,1) + sqrt(nonc))^2
- df <= 1: return chisquare(df + 2*Poisson(nonc/2))
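The three branches above, sketched in stdlib Python (chi2(k) via Gamma(k/2, 2), Knuth's Poisson as an illustrative helper):

```python
import math
import random

def chisquare(df, rng):
    return rng.gammavariate(df / 2.0, 2.0)

def poisson_knuth(lam, rng):
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= limit:
            return k - 1

def noncentral_chisquare(df, nonc, rng):
    if nonc == 0.0:
        return chisquare(df, rng)
    if df > 1.0:
        n = rng.gauss(0, 1) + math.sqrt(nonc)
        return chisquare(df - 1.0, rng) + n * n
    return chisquare(df + 2.0 * poisson_knuth(nonc / 2.0, rng), rng)

rng = random.Random(13)
xs = [noncentral_chisquare(3.0, 2.0, rng) for _ in range(100_000)]
print(sum(xs) / len(xs))  # ~ df + nonc = 5
```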
Draw samples from noncentral F distribution - used for power analysis
in hypothesis testing when null hypothesis is false.

Algorithm from NumPy:
t = noncentral_chisquare(dfnum, nonc) * dfden
return t / (chisquare(dfden) * dfnum)
Draw samples from Pareto II (Lomax) distribution - not classical Pareto.
Used for modeling heavy-tailed phenomena like wealth distribution.

PDF: f(x; a) = a / (1 + x)^(a+1) for x >= 0
Mean = 1/(a-1) for a > 1
Uses inverse transform: X = (1 / U^(1/a)) - 1
Draw samples in [0, 1] from power distribution with exponent a-1.
Also known as power function distribution - inverse of Pareto.

PDF: P(x; a) = a * x^(a-1) for 0 <= x <= 1, a > 0
Special case of Beta distribution.
Uses inverse transform: U^(1/a)
Draw samples from Rayleigh distribution - models wind speed when
East/North components have identical zero-mean Gaussian distributions.

PDF: P(x; scale) = (x/scale^2) * exp(-x^2/(2*scale^2))
Mean = scale * sqrt(π/2) ≈ 1.253 * scale
Algorithm: scale * sqrt(-2 * log(1 - U))
Draw samples from standard exponential distribution (scale=1).
Equivalent to exponential(scale=1.0, size=size).

Mean = 1, Variance = 1
Uses inverse transform: X = -log(1 - U) where U ~ Uniform(0, 1)
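The last four distributions (pareto, power, rayleigh, standard_exponential) are all one-line inverse transforms; in illustrative Python, with U ~ Uniform(0, 1):

```python
import math
import random

def pareto(a, u):               # Lomax: (1/U)^(1/a) - 1
    return u ** (-1.0 / a) - 1.0

def power(a, u):                # U^(1/a), in [0, 1]
    return u ** (1.0 / a)

def rayleigh(scale, u):         # scale * sqrt(-2 ln(1 - U))
    return scale * math.sqrt(-2.0 * math.log(1.0 - u))

def standard_exponential(u):    # -ln(1 - U)
    return -math.log(1.0 - u)

rng = random.Random(17)
xs = [standard_exponential(rng.random()) for _ in range(100_000)]
print(sum(xs) / len(xs))  # ~ 1
```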
Draw samples from standard Gamma distribution (scale=1).
For different scale, multiply result: scale * standard_gamma(shape).

PDF: p(x) = x^(shape-1) * e^(-x) / Gamma(shape)
shape=0 returns all zeros.
Reuses SampleStandardGamma() from np.random.standard_t.cs.
Draw samples from Wald (inverse Gaussian) distribution - first studied
in relation to Brownian motion. Approaches Gaussian as scale → ∞.

PDF: P(x;μ,λ) = √(λ/(2πx³)) * exp(-λ(x-μ)²/(2μ²x))
Algorithm from NumPy's random_wald in distributions.c.
Draw samples from Zipf distribution - models word frequency in texts,
city sizes, and many power-law phenomena.

PMF: p(k) = k^(-a) / ζ(a) where ζ is Riemann zeta function
Requires a > 1. Returns positive integers (long).
Uses rejection sampling algorithm from NumPy's random_zipf.
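The rejection sampler above follows Devroye's construction; a stdlib-only Python sketch (illustrative, not the C# implementation):

```python
import math
import random

def zipf_sample(a, rng):
    am1 = a - 1.0
    b = 2.0 ** am1
    while True:
        u = 1.0 - rng.random()          # u in (0, 1], avoids u == 0
        v = rng.random()
        x = math.floor(u ** (-1.0 / am1))   # always >= 1
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return int(x)

rng = random.Random(23)
xs = [zipf_sample(4.0, rng) for _ in range(100_000)]
print(min(xs), sum(xs) / len(xs))  # all >= 1, mean ~ zeta(3)/zeta(4) ~ 1.11
```

Note that a must exceed 1 for the PMF to normalize, and the sample mean is only finite for a > 2.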

Development

Successfully merging this pull request may close these issues.

[Random, Alignment] Refactor np.random.* to match NumPy parity
