Skip to content

Commit cb67f3f

Browse files
committed
Comprehensive bugfixes, correctness improvements, and code quality enhancements
Phase 1 - Critical correctness fixes: - Fix InverseChiSquared CDF and InverseCDF formulas (was computing survival function) - Fix NormalCopula CDF computing survival copula instead of copula - Fix AMH, Frank, and Normal copula ValidateParameter returning non-null for valid params - Fix MultipleGrubbsBeckTest static mutable fields (thread safety) - Fix TimeSeries OneQuarter interval off by factor of 4 - Fix Search.Hunt always assuming descending order for OrderedPairedData overload - Fix Bilinear UseSmartSearch copy-paste bug (X2LI never set) - Fix KMeans++ initialization always using centroid[0] - Fix SNIS weight filter wrong boolean operator (|| vs &&) - Fix KNearestNeighbors multi-row GetNeighbors overwriting results Phase 2 - High-priority fixes: - Fix TimeSeries StartDate/EndDate crash on empty series - Fix TimeSeries InterpolateMissingData index -1 crash - Fix TimeSeries CumulativeSum NaN vs zero confusion - Fix TimeSeries Divide silent infinity on zero divisor - Fix TimeSeries StandardDeviation/Duration/SummaryPercentiles NaN handling - Fix LogPearsonTypeIII Clone() dropping Base parameter - Fix ChiSquared PDF overflow for large degrees of freedom (log-space) - Fix Bernoulli Skewness/Kurtosis division by zero at p=0,1 - Fix Binomial InverseCDF off-by-one (never returns NumberOfTrials) - Fix Rayleigh MoM wrong estimator - Fix UniformDiscrete StandardDeviation wrong formula (continuous vs discrete) - Fix GeneralizedBeta Mode 0/0 at Alpha=Beta=1 - Fix TruncatedNormal ValidateParameters reading property instead of parameter - Fix EmpiricalDistribution/KernelDensity GetParameters NotImplementedException - Fix Statistics HarmonicMean missing zero guard - Fix GoodnessOfFit RSR using N instead of N-1 - Fix Histogram multiple issues (BinWidth=0, swapped args, empty guard) - Fix MCMCDiagnostics GelmanRubin wrong denominator with warmup - Fix GLM bounds inversion for negative intercept - Fix HMC boundary enforcement (add momentum reflection) - Fix NaiveBayes NaN for singleton classes - Fix JenksCluster numerically unstable variance (Welford's algorithm) - Fix ARWMH ProposalSigma division by zero when N=1 - Fix PowerFunction MinimumOfParameters/MaximumOfParameters wrong array length - Fix LinearFunction/PowerFunction SetParameters not updating _normal - Fix LinearFunction InverseFunction division by zero when Beta=0 - Fix MCMCSampler AcceptanceRates/ProgressChanged division by zero - Fix DEMCz infinite loop when PopulationMatrix.Count=1 - Fix Autocorrelation/Correlation division by zero for constant series - Fix HypothesisTests division-by-zero issues - Fix BoxCox/GrubbsBeckTest non-positive values not guarded - Remove VonMises Kurtosis dead computation Phase 3 - Security (TimeSeriesDownload): - Use static HttpClient instances (prevent socket exhaustion) - Add CancellationToken support to all public async methods - Validate redirect URLs against expected domains - Remove UseProxy=false bypass - Replace browser User-Agent with library identifier - Replace Google.com connectivity check with target endpoint - Sanitize GHCN station numbers - Fix temp file leak on error - Add explicit timeouts and retry logic with exponential backoff - Use automatic gzip decompression Phase 4 - Robustness: - Fix Brent.Solve returning 0 instead of last iterate on failure - Fix Series.SyncRoot returning unstable object - Fix FrankCopula generator division by zero at Theta=0 - Fix TruncatedDistribution ValidateParameters exact float equality - Fix Mixture EM convergence check placement - Fix Deterministic/Binomial wrong parameter names in exceptions - Fix JsonConverters silent zero-fill on malformed input - Fix SafeProgressReporter MessageCount thread safety - Fix ExtensionMethods.NextDoubles exact type check (use 'is') - Fix TimeSeries FillMissingDates O(n^2) to O(n) - Fix TimeSeries ResampleWithKNN index exceeding neighbor count - Fix AdaptiveGaussKronrod wrong exception type Phase 5 - Code quality: - Remove useless catch/throw blocks (SobolSequence) - Remove unused Microsoft.VisualBasic using (Gamma.cs) - Fix CubicSpline XML doc saying "linear interpolation" - Remove dead code (HypothesisTests, Search, YeoJohnson, DEMCzs) - Fix KernelDensity typo GuassianKernel -> GaussianKernel - Fix bitwise | vs logical || (Ordinate, Search) - Remove unused NUnit references from test project - Fix KNN XML doc "clusters" -> "nearest neighbors" - Fix PowerFunction.Minimum setter no-op -> NotSupportedException - Enable CS1591 (missing XML doc) warning and add missing XML docs - Add XML docs for BinaryHeap, Network, SafeProgressReporter, MCMCSampler, MultivariateNormal, KernelDensity, SimulatedAnnealing, TimeSeries.FillMissingDates Phase 6 - Unit tests with Python-verified reference values: - Update InverseChiSquared tests for corrected CDF/InverseCDF - Add InverseChiSquared CDF/InverseCDF roundtrip test - Add NormalCopula CDF asymmetric inputs test - Add Bernoulli boundary skewness/kurtosis test - Add Binomial InverseCDF boundary test - Add ChiSquared PDF large DoF test - Add GeneralizedBeta Mode uniform case test - Add Rayleigh MoM estimation test - Add UniformDiscrete StandardDeviation test - Add Statistics HarmonicMean zero value test - Add KNN multi-row GetNeighbors test - Add MCMCDiagnostics GelmanRubin warmup test - Add HMC non-finite gradient crash test - Fix MSTEST0037 analyzer warnings across test project
1 parent cc58b63 commit cb67f3f

72 files changed

Lines changed: 1254 additions & 490 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

Numerics/Data/Interpolation/Bilinear.cs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,7 @@ public bool UseSmartSearch
119119
{
120120
_useSmartSearch = value;
121121
X1LI.UseSmartSearch = value;
122-
X1LI.UseSmartSearch = value;
122+
X2LI.UseSmartSearch = value;
123123
}
124124
}
125125

Numerics/Data/Interpolation/CubicSpline.cs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,7 @@ public class CubicSpline : Interpolater
6262
{
6363

6464
/// <summary>
65-
/// Construct new linear interpolation.
65+
/// Construct new cubic spline interpolation.
6666
/// </summary>
6767
/// <param name="xValues">List of x-values.</param>
6868
/// <param name="yValues">List of y-values.</param>

Numerics/Data/Interpolation/Support/Search.cs

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -762,7 +762,7 @@ public static int Hunt(double xValue, OrderedPairedData orderedPairedData, int s
762762
int XLO = start;
763763
int XHI;
764764
int XM;
765-
var ASCND = default(bool);
765+
var ASCND = orderedPairedData.OrderX == SortOrder.Ascending;
766766
int INC;
767767

768768
// validate
@@ -816,7 +816,7 @@ public static int Hunt(double xValue, OrderedPairedData orderedPairedData, int s
816816
XLO = start;
817817

818818
// Perform the hunt search algorithm
819-
if (XLO <= 0 | XLO > N)
819+
if (XLO <= 0 || XLO > N)
820820
{
821821
// The input guess is not useful.
822822
// Go immediately to bisection.
@@ -849,7 +849,6 @@ public static int Hunt(double xValue, OrderedPairedData orderedPairedData, int s
849849
else
850850
{
851851
// Hunt down
852-
XHI = XLO;
853852
XHI = XLO - 1;
854853
while (X < orderedPairedData[XLO].X == ASCND)
855854
{
@@ -965,7 +964,7 @@ public static int Hunt(double xValue, IList<Ordinate> ordinateData, int start =
965964
XLO = start;
966965

967966
// Perform the hunt search algorithm
968-
if (XLO <= 0 | XLO > N)
967+
if (XLO <= 0 || XLO > N)
969968
{
970969
// The input guess is not useful.
971970
// Go immediately to bisection.
@@ -998,7 +997,6 @@ public static int Hunt(double xValue, IList<Ordinate> ordinateData, int start =
998997
else
999998
{
1000999
// Hunt down
1001-
XHI = XLO;
10021000
XHI = XLO - 1;
10031001
while (X < ordinateData[XLO].X == ASCND)
10041002
{

Numerics/Data/Paired Data/Ordinate.cs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -68,7 +68,7 @@ public Ordinate(double xValue, double yValue)
6868
X = xValue;
6969
Y = yValue;
7070
IsValid = true;
71-
if (double.IsInfinity(X) | double.IsNaN(X) || double.IsInfinity(Y) | double.IsNaN(Y)) { IsValid = false; }
71+
if (double.IsInfinity(X) || double.IsNaN(X) || double.IsInfinity(Y) || double.IsNaN(Y)) { IsValid = false; }
7272
}
7373

7474

Numerics/Data/Statistics/Autocorrelation.cs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -215,6 +215,7 @@ public enum Type
215215
if (acf == null) return null;
216216

217217
double den = acf[0, 1];
218+
if (den == 0) return null;
218219
for (int i = 0; i < acf.GetLength(0); i++)
219220
acf[i, 1] /= den;
220221
return acf;
@@ -239,6 +240,7 @@ public enum Type
239240
if (acf == null) return null;
240241

241242
double den = acf[0, 1];
243+
if (den == 0) return null;
242244
for (int i = 0; i < acf.GetLength(0); i++)
243245
acf[i, 1] /= den;
244246
return acf;

Numerics/Data/Statistics/BoxCox.cs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -133,6 +133,7 @@ public static double LogJacobian(IList<double> values, double lambda)
133133
/// <param name="lambda">The transformation exponent. Range -5 to +5.</param>
134134
public static double Transform(double value, double lambda)
135135
{
136+
if (value <= 0) return double.NaN;
136137
if (Math.Abs(lambda) > 5d) return double.NaN;
137138
if (Math.Abs(lambda) < 1e-8) return Math.Log(value);
138139
return (Math.Pow(value, lambda) - 1.0d) / lambda;

Numerics/Data/Statistics/GoodnessOfFit.cs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -834,7 +834,7 @@ public static double RSR(IList<double> observedValues, IList<double> modeledValu
834834
{
835835
variance += Tools.Sqr(observedValues[i] - obsMean);
836836
}
837-
double stdDev = Math.Sqrt(variance / n);
837+
double stdDev = Math.Sqrt(variance / (n - 1));
838838

839839
return rmse / stdDev;
840840
}

Numerics/Data/Statistics/Histogram.cs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -189,6 +189,7 @@ public Histogram(IList<double> data)
189189
// Get the bin boundaries
190190
LowerBound = data.Min();
191191
UpperBound = data.Max();
192+
if (UpperBound == LowerBound) UpperBound = LowerBound + 1.0;
192193
BinWidth = (UpperBound - LowerBound) / NumberOfBins;
193194
// Add bins
194195
double xl = LowerBound;
@@ -217,6 +218,7 @@ public Histogram(IList<double> data, int numberOfBins)
217218
NumberOfBins = numberOfBins;
218219
LowerBound = data.Min();
219220
UpperBound = data.Max();
221+
if (UpperBound == LowerBound) UpperBound = LowerBound + 1.0;
220222
BinWidth = (UpperBound - LowerBound) / NumberOfBins;
221223
// Add bins
222224
double xl = LowerBound;
@@ -322,6 +324,7 @@ public double Median
322324
int total = 0;
323325
for (int i = 0; i < _bins.Count; i++)
324326
total += _bins[i].Frequency;
327+
if (total == 0) return double.NaN;
325328
int halfTotal = (int)(total / 2d);
326329
int m = 0;
327330
int v = 0;
@@ -458,7 +461,7 @@ public int GetBinIndexOf(double value)
458461
SortBins();
459462
if (value < _bins.First().LowerBound || value > _bins.Last().UpperBound)
460463
{
461-
throw new ArgumentException("value", "The value is not contained with the histogram limits.");
464+
throw new ArgumentException("The value is not contained with the histogram limits.", nameof(value));
462465
}
463466
int idx = Search.Bisection(value, _binLimits);
464467
return idx < 0 ? 0 : idx >= _binLimits.Count ? _binLimits.Count - 1 : idx;

Numerics/Data/Statistics/HypothesisTests.cs

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,7 @@ public class HypothesisTests
6565
/// <returns>Returns the 2-sided p-value of the test statistic.</returns>
6666
public static double OneSampleTtest(IList<double> sample, double populationMean = 0d)
6767
{
68+
if (sample.Count < 2) throw new ArgumentException("Sample must have at least 2 observations.", nameof(sample));
6869
var meanVar = Statistics.MeanVariance(sample);
6970
int N = sample.Count;
7071
double se = Math.Sqrt(meanVar.Item2) / Math.Sqrt(N);
@@ -82,6 +83,7 @@ public static double OneSampleTtest(IList<double> sample, double populationMean
8283
/// <returns>Returns the 2-sided p-value of the test statistic.</returns>
8384
public static double EqualVarianceTtest(IList<double> sample1, IList<double> sample2)
8485
{
86+
if (sample1.Count + sample2.Count < 3) throw new ArgumentException("Combined sample size must be at least 3.");
8587
var meanVar1 = Statistics.MeanVariance(sample1);
8688
var meanVar2 = Statistics.MeanVariance(sample2);
8789
int N1 = sample1.Count;
@@ -144,10 +146,12 @@ public static double PairedTtest(IList<double> sample1, IList<double> sample2)
144146
/// <returns>Returns the p-value of the test statistic.</returns>
145147
public static double Ftest(IList<double> sample1, IList<double> sample2)
146148
{
149+
if (sample1.Count < 2 || sample2.Count < 2) throw new ArgumentException("Each sample must have at least 2 observations.");
147150
var meanVar1 = Statistics.MeanVariance(sample1);
148151
var meanVar2 = Statistics.MeanVariance(sample2);
149152
int n1 = sample1.Count, n2 = sample2.Count;
150153
double ave1 = meanVar1.Item1, var1 = meanVar1.Item2, ave2 = meanVar2.Item1, var2 = meanVar2.Item2, df1, df2, f, pVal;
154+
if (var1 == 0 && var2 == 0) return 1.0;
151155
if (var1 > var2)
152156
{
153157
f = var1 / var2;
@@ -177,6 +181,8 @@ public static double Ftest(IList<double> sample1, IList<double> sample2)
177181
/// <param name="pValue">The p-value of the test statistic.</param>
178182
public static void FtestModels(double sseRestricted, double sseFull, int dfRestricted, int dfFull, out double fStat, out double pValue)
179183
{
184+
if (dfRestricted == dfFull) throw new ArgumentException("Restricted and full model degrees of freedom cannot be equal.");
185+
if (dfFull <= 0) throw new ArgumentException("Full model degrees of freedom must be positive.", nameof(dfFull));
180186
fStat = ((sseRestricted - sseFull) / (dfRestricted - dfFull)) / (sseFull / dfFull);
181187
pValue = 2.0 * Beta.Incomplete(0.5 * dfFull, 0.5 * dfRestricted, dfFull / (dfFull + dfRestricted * fStat));
182188
if (pValue > 1.0) pValue = 2d - pValue;
@@ -329,7 +335,6 @@ public static double LinearTrendTest(IList<double> indices, IList<double> sample
329335
var yVals = new Vector(sample.ToArray());
330336
var lm = new LinearRegression(xVals, yVals, true);
331337
var tdist = new StudentT(lm.DegreesOfFreedom);
332-
double d = Math.Abs(lm.Parameters[1] / lm.ParameterStandardErrors[1]);
333338
return (1 - tdist.CDF(Math.Abs(lm.Parameters[1] / lm.ParameterStandardErrors[1]))) * 2;
334339
}
335340

Numerics/Data/Statistics/MultipleGrubbsBeckTest.cs

Lines changed: 4 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -173,10 +173,6 @@ public static int Function(double[] X)
173173
return MGBTP;
174174
}
175175

176-
private static int _nIn;
177-
private static int _rIn;
178-
private static double _etaIn;
179-
180176
/// <summary>
181177
/// Auxiliary routine used to compute p-values (GGCRITP) for a Generalized Grubbs-Beck Test.
182178
/// </summary>
@@ -187,16 +183,10 @@ private static double GGBCRITP(int N, int R, double ETA)
187183
{
188184
return 0.5d;
189185
}
190-
else
191-
{
192-
_nIn = N;
193-
_rIn = R;
194-
_etaIn = ETA;
195-
}
196186

197-
// The original FORTRAN source code utilized a globally adaptive Gauss-Kronrod integration method.
187+
// The original FORTRAN source code utilized a globally adaptive Gauss-Kronrod integration method.
198188
// The number of low outliers computed by this method is consistent with the results from the FORTRAN code.
199-
var sr = new AdaptiveGaussKronrod(FGGB, 1E-16, 1 - 1E-16);
189+
var sr = new AdaptiveGaussKronrod((pzr) => FGGB(pzr, N, R, ETA), 1E-16, 1 - 1E-16);
200190
sr.MaxDepth = 25;
201191
sr.ReportFailure = false;
202192
sr.Integrate();
@@ -206,16 +196,13 @@ private static double GGBCRITP(int N, int R, double ETA)
206196
/// <summary>
207197
/// Auxiliary routine used in GGBCRITP
208198
/// </summary>
209-
private static double FGGB(double PZR)
199+
private static double FGGB(double PZR, int N, int R, double ETA)
210200
{
211201
double df, MuM, MuS2, VarM, VarS2, CovMS2;
212202
double EX1, EX2, EX3, EX4;
213203
double CovMS, VarS, alpha, beta;
214204
double MuMP, EtaP, H, Lambda, MuS, ncp, q, VarMP, PR, ZR, N2;
215205
double ANS;
216-
int N = _nIn;
217-
int R = _rIn;
218-
double ETA = _etaIn;
219206

220207
// Compute the value of the r-th smallest obs. based on its order statistic
221208
N2 = N - R;
@@ -261,6 +248,7 @@ public static void GrubbsBeckTest(IList<double> sample, out double XHi, out doub
261248
{
262249
// The following polynomial approximation proposed by Pilon et al. (1985)
263250
int n = sample.Count;
251+
if (sample.Any(x => x <= 0)) throw new ArgumentException("All sample values must be positive for the Grubbs-Beck test.", nameof(sample));
264252
var logSample = new double[n];
265253
for (int i = 0; i < n; i++) logSample[i] = Math.Log(sample[i]);
266254
double mean = Statistics.Mean(logSample);

0 commit comments

Comments
 (0)