[Random, Alignment] Refactor np.random.* to match NumPy parity #602

Merged
Nucs merged 62 commits into master from npalign on Apr 12, 2026
Conversation

Nucs (Member) commented Apr 11, 2026

Summary

Major refactor of NumSharp's random sampling module to achieve 1-to-1 parity with NumPy 2.x.

  • MT19937 RNG Engine: Replaced .NET's subtractive generator with the Mersenne Twister (MT19937), producing sequences identical to NumPy's for the same seed
  • 25 New Distributions: weibull, standard_cauchy, vonmises, f, logseries, standard_normal, standard_t, triangular, dirichlet, gumbel, hypergeometric, laplace, logistic, multinomial, multivariate_normal, negative_binomial, noncentral_chisquare, noncentral_f, pareto, power, rayleigh, standard_exponential, standard_gamma, wald, zipf
  • 5 Array Splitting Functions: np.split, np.array_split, np.hsplit, np.vsplit, np.dsplit
  • API Standardization: int[] (non-params) + params long[] size overloads across all distributions
  • Multivariate Normal Fix: SVD-based algorithm with NumPy eigenvector sign convention

Stats

  • 49 commits
  • 100 files changed
  • +20,930 / -666 lines
  • 38 new test files

Breaking Changes

| Change | Migration |
| --- | --- |
| RNG sequences differ with the same seed | Re-generate expected values |
| `rand(5, 10)` no longer compiles | Use `rand(5L, 10L)` or `new Shape(5, 10)` |
| Scalar returns are 0D arrays | Add index `[0]` or use `.GetValue()` |

Related

Test plan

  • Build passes
  • Random sampling tests pass (rand, beta, gumbel verified)
  • Full CI run after merge

Nucs added 30 commits April 11, 2026 07:47
- np.split: splits array into equal sub-arrays (raises ArgumentException if not divisible)
- np.array_split: splits array into sub-arrays allowing unequal division
- Supports int (number of sections) and int[] (split indices)
- Supports axis parameter for multi-dimensional arrays
- Follows NumPy's swap-slice-swap approach for axis handling
- Returns views (not copies) matching NumPy behavior
- Full test coverage, verified against NumPy 2.4.2 behavior

NumPy reference: numpy/lib/_shape_base_impl.py (v2.4.2)
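The equal/unequal division rule above can be sketched in plain Python (illustrative names, not the C# implementation; NumPy's np.split additionally raises when the division is unequal):

```python
# Sketch of NumPy's array_split sizing rule: the first (length % sections)
# sub-arrays get one extra element.
def array_split_sizes(length, sections):
    base, extra = divmod(length, sections)
    return [base + 1] * extra + [base] * (sections - extra)

def array_split(seq, sections):
    out, start = [], 0
    for size in array_split_sizes(len(seq), sections):
        out.append(seq[start:start + size])
        start += size
    return out

print(array_split(list(range(8)), 3))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

This matches numpy.array_split(np.arange(8), 3), which yields sub-arrays of sizes 3, 3, 2.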
Weibull distribution implementation matching NumPy 2.x behavior:

- weibull(a, size=None) - NumPy-aligned with optional size
- weibull(a, Shape size) - Shape overload
- weibull(a, int size) - 1D convenience
- weibull(a, params int[] size) - multi-dimensional

Features:
- Inverse transform method: X = (-ln(1-U))^(1/a)
- Validates a >= 0 (throws ArgumentException for negative)
- a=0 returns all zeros (NumPy behavior)
- a=1 reduces to exponential distribution (mean=1)
- Returns float64 dtype

Tests cover:
- Scalar and array returns
- Shape overloads
- Validation (negative a throws, a=0 returns zeros)
- Statistical properties (a=1 exponential, a=2 mean~0.886)
- Non-negative output values
- Heavy tail behavior (small a) and concentrated values (large a)
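The inverse-transform method above can be sketched in Python (a stdlib-only illustration of the formula, not the C# code):

```python
import math
import random

# Sketch of the inverse-transform Weibull sampler:
# X = (-ln(1 - U))**(1/a), U ~ Uniform(0, 1); a = 0 yields all zeros.
def weibull_sample(a, rng):
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    return (-math.log(1.0 - rng.random())) ** (1.0 / a)

rng = random.Random(0)
samples = [weibull_sample(2.0, rng) for _ in range(50_000)]
print(sum(samples) / len(samples))  # ~ Gamma(1 + 1/2) ~ 0.886
```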
Standard Cauchy (Lorentz) distribution with loc=0, scale=1.

Implementation:
- standard_cauchy(Shape? size = null) - NumPy-aligned with optional size
- standard_cauchy(Shape size) - Shape overload
- standard_cauchy(int size) - 1D convenience
- standard_cauchy(params int[] size) - multi-dimensional

Algorithm:
- Inverse transform: X = tan(pi * (U - 0.5)) where U ~ Uniform(0,1)

Properties (verified in tests):
- Median = 0 (mean/variance undefined due to heavy tails)
- Interquartile range = 2 (Q1=-1, Q3=1)
- Heavy tails produce extreme values
- Returns float64 dtype

Tests cover:
- Scalar and array returns
- Shape overloads
- Statistical properties (median, IQR)
- Heavy tail behavior
- Reproducibility with seed
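The inverse transform and the median/IQR properties above can be checked with a stdlib-only Python sketch (illustrative, not the C# implementation):

```python
import math
import random

# Inverse-transform standard Cauchy: X = tan(pi * (U - 0.5)).
# Median 0, quartiles at -1 and +1, so IQR = 2.
def standard_cauchy(rng):
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(42)
xs = sorted(standard_cauchy(rng) for _ in range(100_001))
median = xs[50_000]
q1, q3 = xs[25_000], xs[75_000]
print(median, q3 - q1)  # median ~ 0, IQR ~ 2
```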
…ibution)

- Draw samples from the von Mises distribution on [-pi, pi]
- Matches NumPy 2.4.2 algorithm from distributions.c exactly:
  - kappa < 1e-8: uniform distribution on circle
  - 1e-5 <= kappa <= 1e6: rejection sampling algorithm
  - kappa > 1e6: wrapped normal approximation
- Supports mu (mean direction) and kappa (concentration) parameters
- kappa >= 0 required (raises ArgumentException otherwise)
- Full test coverage including edge cases and statistical properties

NumPy reference: numpy/random/src/distributions/distributions.c (v2.4.2)
F distribution (Fisher distribution) for ANOVA tests.

Implementation:
- f(dfnum, dfden, Shape? size = null) - NumPy-aligned
- f(dfnum, dfden, Shape size) - Shape overload
- f(dfnum, dfden, int size) - 1D convenience
- f(dfnum, dfden, params int[] size) - multi-dimensional

Algorithm:
- F = (chi2_num * dfden) / (chi2_den * dfnum)
- Reuses existing chisquare() which uses gamma()
- Added GammaSample/MarsagliaSample helpers for scalar generation

Validation:
- dfnum > 0 required (throws ArgumentException)
- dfden > 0 required (throws ArgumentException)

Properties (verified):
- Mean = dfden/(dfden-2) for dfden > 2
- All values positive
- Returns float64 dtype

Tests: 13 covering scalar/array, validation, statistics
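The chi-square ratio construction above can be sketched with Python's stdlib gamma sampler (chi2(k) is Gamma(k/2, scale=2); names are illustrative):

```python
import random

# F = (chi2_num * dfden) / (chi2_den * dfnum)
def f_sample(dfnum, dfden, rng):
    chi2_num = rng.gammavariate(dfnum / 2.0, 2.0)
    chi2_den = rng.gammavariate(dfden / 2.0, 2.0)
    return (chi2_num * dfden) / (chi2_den * dfnum)

rng = random.Random(1)
xs = [f_sample(5, 10, rng) for _ in range(200_000)]
print(sum(xs) / len(xs))  # ~ dfden / (dfden - 2) = 1.25
```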
Implements the logarithmic series distribution following NumPy 2.4.2 behavior
exactly. The distribution returns positive integers k >= 1 with probability
P(k) = -p^k / (k * ln(1-p)).

Implementation details:
- Uses NumPy's rejection sampling algorithm from distributions.c
- Parameter validation: p must be in range [0, 1), throws for NaN
- Returns int64 dtype (matches NumPy on 64-bit systems)
- Edge cases: p=0 returns all 1s, small p produces mostly 1s

Algorithm (from NumPy's random_logseries, with U, V ~ Uniform(0, 1)):
1. r = log(1 - p)
2. Draw V; if V >= p, return 1
3. Draw U; compute q = 1 - exp(r * U)
4. If V <= q^2, return floor(1 + log(V)/log(q))
5. If V >= q, return 1
6. Otherwise return 2

Tests cover:
- Scalar and array output shapes
- Small/high p value distribution characteristics
- Validation (negative p, p=1, p>1, NaN)
- Statistical mean approximation to theoretical value
- All values >= 1 constraint

Verified against NumPy 2.4.2 output via Python battletest.
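The rejection steps above translate to a short stdlib-only Python sketch (illustrative, not the shipped C# code; the V == 0 and k < 1 retries mirror NumPy's floating-point guards):

```python
import math
import random

def logseries_sample(p, rng):
    r = math.log(1.0 - p)
    while True:
        v = rng.random()
        if v >= p:
            return 1
        u = rng.random()
        q = 1.0 - math.exp(r * u)
        if v <= q * q:
            if v == 0.0:
                continue  # log(0) undefined; retry like NumPy does
            k = int(1.0 + math.log(v) / math.log(q))
            if k >= 1:
                return k
            continue  # floating-point round-down; retry
        return 1 if v >= q else 2

rng = random.Random(7)
xs = [logseries_sample(0.5, rng) for _ in range(100_000)]
print(min(xs), sum(xs) / len(xs))  # all >= 1, mean ~ 1/ln(2) ~ 1.4427
```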
Add standard_normal(Shape? size = null) overload to NumPyRandom:
- Returns 0-d NDArray scalar when size is null (matches NumPy behavior)
- Uses existing NextGaussian() helper for Box-Muller transform
- Delegates to normal(0, 1.0, size) for array generation
- Verified: randn, standard_normal, and normal(0,1) produce identical results

Statistical verification:
- mean ~= 0 (tested: 0.0098)
- std ~= 1 (tested: 0.9933)
- variance ~= 1 (tested: 0.9867)

Tests: 15 test cases covering shapes, statistics, reproducibility
- Implements Student's t distribution matching NumPy 2.4.2 API
- Supports df (degrees of freedom) parameter
- Supports size parameter for array output
- Includes unit tests
- Implements triangular distribution matching NumPy 2.4.2 API
- Parameters: left, mode, right for triangular shape
- Supports size parameter for array output
- Includes unit tests validating shape, dtype, and bounds
Exhaustive inventory of all public APIs exposed by NumPy 2.4.2 as np.*
covering constants, data types, array creation, manipulation, math
functions, statistics, linear algebra, random sampling, FFT, and more.

This serves as the authoritative reference for API parity work.
Documents all 142 np.* APIs in NumSharp 0.41.x with status:
- 118 working, 12 partial, 12 broken/stub
- Covers array creation, math, statistics, manipulation, random sampling
- Tracks known behavioral differences from NumPy 2.x
- Lists missing functions and TODO items for future work
…ions

- hsplit: splits horizontally (column-wise) along axis 1 for 2D+, axis 0 for 1D
- vsplit: splits vertically (row-wise) along axis 0, requires ndim >= 2
- dsplit: splits along depth (axis 2), requires ndim >= 3

All support both integer (equal division) and indices array overloads.
Delegate to existing np.split with appropriate axis parameter.
NumPy-aligned parameter validation and error messages.
Comprehensive tests based on NumPy 2.4.2 behavior:
- hsplit: 1D/2D/3D arrays, indices overload, 0D error handling
- vsplit: 2D/3D arrays, indices overload, 1D error handling
- dsplit: 3D/4D arrays, indices overload, 1D/2D error handling

Verifies shape correctness and element values match NumPy output.
Draw samples from the Dirichlet distribution - a distribution over vectors
that sum to 1. Used for Bayesian inference and topic modeling.

Algorithm: Draw Y_i ~ Gamma(alpha_i, 1), then normalize X = Y / sum(Y).
Supports double[] and NDArray alpha, Shape or int[] size parameters.
Output shape is (*size, k) where k is length of alpha.
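The gamma-normalize construction above is a few lines in stdlib Python (an illustrative sketch, not the C# implementation):

```python
import random

# Draw Y_i ~ Gamma(alpha_i, 1), then normalize X = Y / sum(Y).
def dirichlet_sample(alpha, rng):
    ys = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(ys)
    return [y / total for y in ys]

rng = random.Random(3)
draws = [dirichlet_sample([2.0, 3.0, 5.0], rng) for _ in range(20_000)]
m0 = sum(d[0] for d in draws) / len(draws)
print(m0)  # E[X_0] = alpha_0 / sum(alpha) = 0.2
```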
Draw samples from the Gumbel distribution used to model distribution
of maximum/minimum values. Common in extreme value statistics.

PDF: p(x) = (1/scale) * exp(-(x-loc)/scale) * exp(-exp(-(x-loc)/scale))
Mean = loc + scale * γ (Euler-Mascheroni ≈ 0.5772)
Algorithm matches NumPy's random_gumbel in distributions.c.
Draw samples from hypergeometric distribution - models drawing without
replacement (e.g., white/black marbles from an urn).

Parameters: ngood, nbad (populations), nsample (draws).
Returns: number of "good" items in sample.
Mean = nsample * ngood / (ngood + nbad)
Uses complement optimization for large samples.
Draw samples from Laplace distribution - similar to Gaussian but with
sharper peak and fatter tails. Represents difference between two
independent exponential random variables.

PDF: f(x; μ, λ) = (1/(2λ)) * exp(-|x - μ| / λ)
Algorithm matches NumPy's random_laplace in distributions.c.
Draw samples from logistic distribution - similar to normal but with
heavier tails. Used in extreme value problems, finance, growth modeling.

PDF: f(x; μ, s) = exp(-(x-μ)/s) / (s * (1 + exp(-(x-μ)/s))^2)
Mean = loc, Variance = scale^2 * π^2 / 3
Uses inverse transform: X = loc + scale * ln(U / (1 - U))
Draw samples from multinomial distribution - multivariate generalization
of binomial. Models n experiments where each yields one of k outcomes.

Algorithm from NumPy: for each category i, draw binomial with
adjusted probability, subtract from remaining trials.
Output shape is (*size, k) where k is length of pvals.
Each row sums to n.
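The conditional-binomial decomposition above can be sketched in Python (the Bernoulli-sum binomial is a deliberately naive stand-in, fine for a sketch but slow for large n; names are illustrative):

```python
import random

def binomial_sample(n, p, rng):
    # Naive Bernoulli-sum binomial, O(n) per draw.
    return sum(1 for _ in range(n) if rng.random() < p)

def multinomial_sample(n, pvals, rng):
    counts, remaining, prob_left = [], n, 1.0
    for p in pvals[:-1]:
        # Each category draws from the trials still unassigned,
        # with its probability renormalized by what is left.
        k = binomial_sample(remaining, p / prob_left, rng) if remaining else 0
        counts.append(k)
        remaining -= k
        prob_left -= p
    counts.append(remaining)  # last category absorbs the rest
    return counts

rng = random.Random(5)
draws = [multinomial_sample(100, [0.2, 0.3, 0.5], rng) for _ in range(2_000)]
print(draws[0], sum(draws[0]))  # each row sums to n = 100
```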
Draw samples from N-dimensional normal distribution specified by
mean vector and covariance matrix.

Algorithm: Cholesky decomposition L = cholesky(cov), then
X = mean + L @ Z where Z ~ N(0, I).
Supports check_valid modes: "warn", "raise", "ignore".
Includes fallback for nearly positive-semidefinite matrices.
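A minimal 2x2 sketch of the Cholesky construction this commit message describes (the PR summary notes the final implementation moved to an SVD-based variant; all names here are illustrative):

```python
import math
import random

# L such that L @ L.T == cov, for a 2x2 positive-definite cov.
def cholesky_2x2(cov):
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

# X = mean + L @ Z, where Z ~ N(0, I).
def mvn_sample(mean, cov, rng):
    L = cholesky_2x2(cov)
    z = [rng.gauss(0, 1), rng.gauss(0, 1)]
    return [mean[i] + L[i][0] * z[0] + L[i][1] * z[1] for i in range(2)]

rng = random.Random(11)
xs = [mvn_sample([1.0, -2.0], [[2.0, 0.6], [0.6, 1.0]], rng) for _ in range(50_000)]
m0 = sum(x[0] for x in xs) / len(xs)
m1 = sum(x[1] for x in xs) / len(xs)
print(m0, m1)  # ~ (1, -2)
```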
Draw samples from negative binomial - models number of failures
before n successes with probability p per trial.

Mean = n * (1-p) / p, Variance = n * (1-p) / p^2
Algorithm: Gamma-Poisson mixture (Y ~ Gamma, X ~ Poisson(Y)).
Includes Marsaglia-Tsang gamma sampling and Knuth/rejection Poisson.
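The Gamma-Poisson mixture above, sketched with stdlib Python (Knuth's Poisson in place of the rejection sampler; fine for modest lambda, and not the C# code):

```python
import math
import random

def poisson_knuth(lam, rng):
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= limit:
            return k - 1

# Y ~ Gamma(n, scale=(1-p)/p), then X ~ Poisson(Y):
# X counts failures before the n-th success.
def negative_binomial_sample(n, p, rng):
    y = rng.gammavariate(n, (1.0 - p) / p)
    return poisson_knuth(y, rng)

rng = random.Random(9)
xs = [negative_binomial_sample(4, 0.4, rng) for _ in range(100_000)]
print(sum(xs) / len(xs))  # ~ n(1-p)/p = 6
```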
Draw samples from noncentral chi-square - generalization of chi-square
with non-centrality parameter. Mean = df + nonc.

NumPy algorithm:
- nonc == 0: return chisquare(df)
- df > 1: return chisquare(df-1) + (N(0,1) + sqrt(nonc))^2
- df <= 1: return chisquare(df + 2*Poisson(nonc/2))
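The three branches above, sketched in stdlib Python (chi2(k) via Gamma(k/2, 2), Knuth's Poisson as an illustrative helper):

```python
import math
import random

def chisquare(df, rng):
    return rng.gammavariate(df / 2.0, 2.0)

def poisson_knuth(lam, rng):
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= limit:
            return k - 1

def noncentral_chisquare(df, nonc, rng):
    if nonc == 0.0:
        return chisquare(df, rng)
    if df > 1.0:
        n = rng.gauss(0, 1) + math.sqrt(nonc)
        return chisquare(df - 1.0, rng) + n * n
    return chisquare(df + 2.0 * poisson_knuth(nonc / 2.0, rng), rng)

rng = random.Random(13)
xs = [noncentral_chisquare(3.0, 2.0, rng) for _ in range(100_000)]
print(sum(xs) / len(xs))  # ~ df + nonc = 5
```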
Draw samples from noncentral F distribution - used for power analysis
in hypothesis testing when null hypothesis is false.

Algorithm from NumPy:
t = noncentral_chisquare(dfnum, nonc) * dfden
return t / (chisquare(dfden) * dfnum)
Draw samples from Pareto II (Lomax) distribution - not classical Pareto.
Used for modeling heavy-tailed phenomena like wealth distribution.

PDF: f(x; a) = a / (1 + x)^(a+1) for x >= 0
Mean = 1/(a-1) for a > 1
Uses inverse transform: X = (1 / U^(1/a)) - 1
Draw samples in [0, 1] from power distribution with exponent a-1.
Also known as power function distribution - inverse of Pareto.

PDF: P(x; a) = a * x^(a-1) for 0 <= x <= 1, a > 0
Special case of Beta distribution.
Uses inverse transform: U^(1/a)
Draw samples from Rayleigh distribution - models wind speed when
East/North components have identical zero-mean Gaussian distributions.

PDF: P(x; scale) = (x/scale^2) * exp(-x^2/(2*scale^2))
Mean = scale * sqrt(π/2) ≈ 1.253 * scale
Algorithm: scale * sqrt(-2 * log(1 - U))
Draw samples from standard exponential distribution (scale=1).
Equivalent to exponential(scale=1.0, size=size).

Mean = 1, Variance = 1
Uses inverse transform: X = -log(1 - U) where U ~ Uniform(0, 1)
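The last four distributions (pareto, power, rayleigh, standard_exponential) are all one-line inverse transforms; in illustrative Python, with U ~ Uniform(0, 1):

```python
import math
import random

def pareto(a, u):               # Lomax: (1/U)^(1/a) - 1
    return u ** (-1.0 / a) - 1.0

def power(a, u):                # U^(1/a), in [0, 1]
    return u ** (1.0 / a)

def rayleigh(scale, u):         # scale * sqrt(-2 ln(1 - U))
    return scale * math.sqrt(-2.0 * math.log(1.0 - u))

def standard_exponential(u):    # -ln(1 - U)
    return -math.log(1.0 - u)

rng = random.Random(17)
xs = [standard_exponential(rng.random()) for _ in range(100_000)]
print(sum(xs) / len(xs))  # ~ 1
```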
Draw samples from standard Gamma distribution (scale=1).
For different scale, multiply result: scale * standard_gamma(shape).

PDF: p(x) = x^(shape-1) * e^(-x) / Gamma(shape)
shape=0 returns all zeros.
Reuses SampleStandardGamma() from np.random.standard_t.cs.
Draw samples from Wald (inverse Gaussian) distribution - first studied
in relation to Brownian motion. Approaches Gaussian as scale → ∞.

PDF: P(x;μ,λ) = √(λ/(2πx³)) * exp(-λ(x-μ)²/(2μ²x))
Algorithm from NumPy's random_wald in distributions.c.
Draw samples from Zipf distribution - models word frequency in texts,
city sizes, and many power-law phenomena.

PMF: p(k) = k^(-a) / ζ(a) where ζ is Riemann zeta function
Requires a > 1. Returns positive integers (long).
Uses rejection sampling algorithm from NumPy's random_zipf.
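The rejection sampler above follows Devroye's construction; a stdlib-only Python sketch (illustrative, not the C# implementation):

```python
import math
import random

def zipf_sample(a, rng):
    am1 = a - 1.0
    b = 2.0 ** am1
    while True:
        u = 1.0 - rng.random()          # u in (0, 1], avoids u == 0
        v = rng.random()
        x = math.floor(u ** (-1.0 / am1))   # always >= 1
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return int(x)

rng = random.Random(23)
xs = [zipf_sample(4.0, rng) for _ in range(100_000)]
print(min(xs), sum(xs) / len(xs))  # all >= 1, mean ~ zeta(3)/zeta(4) ~ 1.11
```

Note that a must exceed 1 for the PMF to normalize, and the sample mean is only finite for a > 2.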

Development

Successfully merging this pull request may close these issues.

[Random, Alignment] Refactor np.random.* to match NumPy parity
