
Commit 3b902f8

Updating technical content of docs and improving navigation
1 parent 0b12ef6 commit 3b902f8

20 files changed

Lines changed: 3042 additions & 186 deletions

docs/data/interpolation.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Data and Interpolation
 
-[← Previous: Hypothesis Tests](../statistics/hypothesis-tests.md) | [Back to Index](../index.md) | [Next: Linear Regression →](regression.md)
+[← Previous: ODE Solvers](../mathematics/ode-solvers.md) | [Back to Index](../index.md) | [Next: Linear Regression →](regression.md)
 
 Interpolation is the process of estimating values between known data points. Given a set of $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, interpolation constructs a function $p(x)$ that passes through all the data points, i.e., $p(x_i) = y_i$ for all $i$. This is distinct from regression, which fits a function that approximates the data while minimizing some error criterion.
 
@@ -360,4 +360,4 @@ for (double t = 0; t <= 1; t += 0.2)
 
 ---
 
-[← Previous: Hypothesis Tests](../statistics/hypothesis-tests.md) | [Back to Index](../index.md) | [Next: Linear Regression →](regression.md)
+[← Previous: ODE Solvers](../mathematics/ode-solvers.md) | [Back to Index](../index.md) | [Next: Linear Regression →](regression.md)

docs/data/time-series.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Time Series
 
-[← Previous: Linear Regression](regression.md) | [Back to Index](../index.md) | [Next: Random Generation](../sampling/random-generation.md)
+[← Previous: Linear Regression](regression.md) | [Back to Index](../index.md) | [Next: Descriptive Statistics](../statistics/descriptive.md)
 
 The ***Numerics*** library provides a comprehensive `TimeSeries` class for working with time-indexed data. This class supports regular and irregular time intervals, statistical operations, transformations, and analysis methods essential for hydrological and environmental data.
 
@@ -721,4 +721,4 @@ for (int m = 1; m <= 12; m++)
 
 ---
 
-[← Previous: Linear Regression](regression.md) | [Back to Index](../index.md) | [Next: Random Generation](../sampling/random-generation.md)
+[← Previous: Linear Regression](regression.md) | [Back to Index](../index.md) | [Next: Descriptive Statistics](../statistics/descriptive.md)

docs/distributions/multivariate.md

Lines changed: 213 additions & 21 deletions
Large diffs are not rendered by default.

docs/distributions/parameter-estimation.md

Lines changed: 190 additions & 27 deletions
@@ -27,7 +27,7 @@ public enum ParameterEstimationMethod
 | **Method of Moments** | Simple, fast | Inefficient, biased for small samples | Quick estimates, stable parameters |
 | **Method of Percentiles** | Intuitive, robust | Less efficient | Expert judgment, special cases |
 
-**Recommendation for Hydrological Applications:** L-Moments are recommended by USGS [[1]](#1) for flood frequency analysis due to superior performance with small samples and robustness to outliers.
+**Recommendation for Hydrological Applications:** L-moments are recommended by USGS [[1]](#1) for flood frequency analysis due to superior performance with small samples and robustness to outliers.
 
 ## Using the Estimate() Method
 
@@ -78,7 +78,7 @@ double[] data = { 12500, 15300, 11200, 18700, 14100, 16800, 13400, 17200 };
 // Step 1: Compute L-moments from data
 double[] lMoments = Statistics.LinearMoments(data);
 
-Console.WriteLine("Sample L-Moments:");
+Console.WriteLine("Sample L-moments:");
 Console.WriteLine($"  λ₁ (mean):       {lMoments[0]:F2}");
 Console.WriteLine($"  λ₂ (L-scale):    {lMoments[1]:F2}");
 Console.WriteLine($"  τ₃ (L-skewness): {lMoments[2]:F4}");
@@ -146,12 +146,58 @@ L-moments are linear combinations of order statistics that provide robust altern
 - Hydrological applications
 - Extreme value analysis
 
+### Mathematical Formulation
+
+L-moments are defined through probability-weighted moments (PWMs). For a random variable $X$ with CDF $F(x)$, the probability-weighted moments are:
+
+```math
+\beta_r = E\left[X \cdot F(X)^r\right] = \int_0^1 x(F) \cdot F^r \, dF, \quad r = 0, 1, 2, \ldots
+```
+
+The first four L-moments are linear combinations of the PWMs:
+
+```math
+\lambda_1 = \beta_0
+```
+
+```math
+\lambda_2 = 2\beta_1 - \beta_0
+```
+
+```math
+\lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0
+```
+
+```math
+\lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0
+```
+
+The L-moment ratios, which are dimensionless and bounded, are defined as:
+
+```math
+\tau = \frac{\lambda_2}{\lambda_1} \quad \text{(L-CV)}, \qquad \tau_3 = \frac{\lambda_3}{\lambda_2} \quad \text{(L-skewness)}, \qquad \tau_4 = \frac{\lambda_4}{\lambda_2} \quad \text{(L-kurtosis)}
+```
+
+L-skewness is bounded in $[-1, 1]$ and L-kurtosis in $[\frac{1}{4}(5\tau_3^2 - 1),\; 1]$, unlike conventional skewness and kurtosis which are unbounded. This boundedness makes L-moment ratios more interpretable and stable.
+
+**Sample estimation.** Given a sorted sample $x_{1:n} \leq x_{2:n} \leq \cdots \leq x_{n:n}$, the unbiased sample PWM estimators are:
+
+```math
+b_r = \frac{1}{n}\sum_{j=r+1}^{n} \frac{\binom{j-1}{r}}{\binom{n-1}{r}} \, x_{j:n}, \quad r = 0, 1, 2, \ldots
+```
+
+The `Statistics.LinearMoments()` method computes these sample PWMs and returns the array $[\lambda_1,\; \lambda_2,\; \tau_3,\; \tau_4]$.
+
+**Why L-moments are preferred for small samples.** Conventional moments involve powers of deviations from the mean, so a single extreme observation can dominate the skewness or kurtosis estimate. L-moments use only linear combinations of order statistics, which makes them far more robust to outliers and nearly unbiased even for samples as small as $n = 10$. For hydrological applications where sample sizes are often 30--60 years of annual data, this robustness is critical.
+
+**L-moment ratio diagrams.** Plotting sample L-skewness ($\tau_3$) against L-kurtosis ($\tau_4$) and comparing to the theoretical curves of candidate distributions is a powerful tool for distribution identification. Each distribution family traces a distinct curve (or point) in L-moment ratio space, making visual comparison straightforward [[2]](#2).
+
 ### Properties of L-Moments
 
-1. **More robust** than conventional moments - less influenced by outliers
+1. **More robust** than conventional moments -- less influenced by outliers
 2. **Less biased** for small samples
-3. **More efficient** - smaller sampling variance
-4. **Bounded** - L-moment ratios are bounded, unlike conventional moments
+3. **More efficient** -- smaller sampling variance
+4. **Bounded** -- L-moment ratios are bounded, unlike conventional moments
 5. **Nearly unbiased** even for very small samples (n = 10)
 
 ### Computing L-Moments
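The sample PWM and L-moment formulas in the section above can be sanity-checked with a short standalone sketch. The helper names (`SampleLMoments`, `Binom`) are hypothetical illustrations of the formulas, not the library's `Statistics.LinearMoments()` implementation:

```csharp
using System;
using System.Linq;

// Binomial coefficient C(m, k), computed iteratively to avoid factorials.
static double Binom(int m, int k)
{
    double c = 1;
    for (int i = 0; i < k; i++) c = c * (m - i) / (i + 1);
    return c;
}

// Hypothetical sketch of the sample PWM / L-moment formulas shown above.
// Returns [λ1, λ2, τ3, τ4], the same layout the docs describe.
static double[] SampleLMoments(double[] data)
{
    var x = data.OrderBy(v => v).ToArray();
    int n = x.Length;
    var b = new double[4]; // b0..b3: sample probability-weighted moments
    for (int r = 0; r < 4; r++)
    {
        double sum = 0;
        // b_r = (1/n) Σ_{j=r+1..n} [C(j-1,r)/C(n-1,r)] x_{j:n}
        for (int j = r + 1; j <= n; j++)
            sum += Binom(j - 1, r) / Binom(n - 1, r) * x[j - 1];
        b[r] = sum / n;
    }
    double l1 = b[0];
    double l2 = 2 * b[1] - b[0];
    double l3 = 6 * b[2] - 6 * b[1] + b[0];
    double l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0];
    return new[] { l1, l2, l3 / l2, l4 / l2 };
}

// For the symmetric sample {1..5}: λ1 = 3, λ2 = 1, τ3 = 0, τ4 = 0.
var lm = SampleLMoments(new double[] { 1, 2, 3, 4, 5 });
Console.WriteLine($"λ1={lm[0]:F2}, λ2={lm[1]:F2}, τ3={lm[2]:F4}, τ4={lm[3]:F4}");
```

A symmetric sample is a convenient check because its odd-order L-moments must vanish exactly.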
@@ -213,7 +259,53 @@ Console.WriteLine($"{"Sample",-12} | {sampleLM[2],11:F4} | {sampleLM[3],11:F4}")
 
 ## Maximum Likelihood Estimation
 
-MLE finds parameters that maximize the likelihood of observing the data [[3]](#3):
+Maximum Likelihood Estimation (MLE) finds the parameter values that make the observed data most probable under the assumed model [[3]](#3).
+
+### Mathematical Formulation
+
+Given independent observations $x_1, x_2, \ldots, x_n$ from a distribution with PDF $f(x|\boldsymbol{\theta})$, the likelihood function is the joint probability of the data viewed as a function of the parameters:
+
+```math
+L(\boldsymbol{\theta} \,|\, \mathbf{x}) = \prod_{i=1}^{n} f(x_i \,|\, \boldsymbol{\theta})
+```
+
+Because a product of many small densities underflows in floating point, optimization is performed on the log-likelihood:
+
+```math
+\ell(\boldsymbol{\theta}) = \sum_{i=1}^{n} \log f(x_i \,|\, \boldsymbol{\theta})
+```
+
+The MLE is the parameter vector that maximizes the log-likelihood:
+
+```math
+\hat{\boldsymbol{\theta}}_{\text{MLE}} = \underset{\boldsymbol{\theta}}{\text{argmax}} \; \ell(\boldsymbol{\theta})
+```
+
+For some distributions (e.g., Normal, Exponential), the MLE has a closed-form solution. For most distributions used in hydrology (GEV, LP3, Weibull), the optimization must be solved numerically. The library uses constrained optimization with initial values derived from L-moment estimates.
+
+**Fisher Information and standard errors.** The Fisher Information matrix quantifies the curvature of the log-likelihood surface at the maximum:
+
+```math
+\mathcal{I}(\boldsymbol{\theta}) = -E\left[\frac{\partial^2 \ell}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}^T}\right]
+```
+
+Under regularity conditions, the MLE is asymptotically normal [[5]](#5), so for large $n$:
+
+```math
+\hat{\boldsymbol{\theta}} \;\overset{\text{approx}}{\sim}\; N\left(\boldsymbol{\theta},\; \mathcal{I}(\boldsymbol{\theta})^{-1}\right)
+```
+
+This provides approximate standard errors for each parameter:
+
+```math
+\text{SE}(\hat{\theta}_j) \approx \sqrt{\left[\mathcal{I}(\hat{\boldsymbol{\theta}})^{-1}\right]_{jj}}
+```
+
+**Strengths:** Asymptotically efficient (achieves the lowest possible variance among consistent estimators), asymptotically unbiased, invariant under reparameterization, provides a natural framework for model comparison via AIC and BIC.
+
+**Weaknesses:** Requires numerical optimization that may fail to converge, sensitive to outliers, can be biased and inefficient for small samples, requires specification of the full probability model.
+
+### Using MLE
 
 ```cs
 using Numerics.Distributions;
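A closed-form case makes the definitions above concrete. This is a hypothetical standalone sketch, not a Numerics API: for the Exponential($\lambda$) distribution the MLE is $\hat{\lambda} = 1/\bar{x}$, and the Fisher Information $\mathcal{I}(\lambda) = n/\lambda^2$ gives the asymptotic standard error $\text{SE}(\hat{\lambda}) \approx \hat{\lambda}/\sqrt{n}$:

```csharp
using System;
using System.Linq;

double[] x = { 2.1, 0.7, 3.5, 1.2, 0.4, 2.8, 1.9, 0.9 };
int n = x.Length;

// Closed-form MLE for the Exponential rate: λ̂ = 1 / sample mean
double lambdaHat = 1.0 / x.Average();

// Log-likelihood ℓ(λ) = n·log λ − λ·Σxᵢ
double LogLik(double lam) => n * Math.Log(lam) - lam * x.Sum();

// λ̂ should yield a higher log-likelihood than any nearby value of λ
Console.WriteLine($"λ̂ = {lambdaHat:F4}");
Console.WriteLine($"ℓ(λ̂)     = {LogLik(lambdaHat):F4}");
Console.WriteLine($"ℓ(0.9 λ̂) = {LogLik(0.9 * lambdaHat):F4}");
Console.WriteLine($"ℓ(1.1 λ̂) = {LogLik(1.1 * lambdaHat):F4}");

// Asymptotic standard error from the Fisher Information I(λ) = n/λ²
double se = lambdaHat / Math.Sqrt(n);
Console.WriteLine($"SE(λ̂) ≈ {se:F4}");
```

Perturbing $\hat{\lambda}$ in either direction and watching the log-likelihood drop is a cheap way to verify that a numerical optimizer has actually found a maximum.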
@@ -275,7 +367,49 @@ catch (Exception ex)
 
 ## Method of Moments
 
-MOM matches sample moments with theoretical moments:
+The Method of Moments (MOM) is the oldest and simplest approach to parameter estimation. The core idea is to equate sample moments to the corresponding theoretical moments of the distribution and solve for the unknown parameters.
+
+### Mathematical Formulation
+
+Given a sample $x_1, x_2, \ldots, x_n$, the first four sample moments are the mean, standard deviation, skewness, and excess kurtosis:
+
+```math
+\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
+```
+
+```math
+s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
+```
+
+```math
+\hat{\gamma} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3
+```
+
+```math
+\hat{\kappa} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}
+```
+
+The `Statistics.ProductMoments()` method returns these four quantities as the array $[\bar{x},\; s,\; \hat{\gamma},\; \hat{\kappa}]$.
+
+MOM estimation sets the theoretical moments equal to the sample moments and solves for the distribution parameters. For a two-parameter distribution, only the first two moments (mean and standard deviation) are needed. For three-parameter distributions, skewness is also required.
+
+**Example: Normal distribution.** The Normal($\mu$, $\sigma$) has $E[X] = \mu$ and $\text{SD}[X] = \sigma$. Equating sample to theoretical moments yields:
+
+```math
+\hat{\mu} = \bar{x}, \quad \hat{\sigma} = s
+```
+
+**Example: Gamma distribution.** The Gamma($\kappa$, $\theta$) has $E[X] = \kappa\theta$ and $\text{Var}[X] = \kappa\theta^2$. Solving for the parameters:
+
+```math
+\hat{\kappa} = \frac{\bar{x}^2}{s^2}, \quad \hat{\theta} = \frac{s^2}{\bar{x}}
+```
+
+**Strengths:** Simple, closed-form solutions, always produces estimates, computationally fast.
+
+**Weaknesses:** Not statistically efficient (higher variance than MLE), can produce invalid parameters for skewed distributions, estimates are sensitive to outliers because conventional moments give disproportionate weight to extreme values.
+
+### Using Method of Moments
 
 ```cs
 double[] data = { 100, 105, 98, 110, 95, 102, 108, 97, 103, 106 };
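The Gamma formulas above can be worked through by hand. This is a hypothetical standalone sketch of $\hat{\kappa} = \bar{x}^2/s^2$ and $\hat{\theta} = s^2/\bar{x}$, not the library's `Estimate()` implementation:

```csharp
using System;
using System.Linq;

double[] data = { 100, 105, 98, 110, 95, 102, 108, 97, 103, 106 };
int n = data.Length;

double mean = data.Average();
// Unbiased sample variance s², matching the formula in the text
double s2 = data.Sum(v => (v - mean) * (v - mean)) / (n - 1);

double kappaHat = mean * mean / s2;   // Gamma shape κ̂ = x̄²/s²
double thetaHat = s2 / mean;          // Gamma scale θ̂ = s²/x̄

// By construction, the fitted Gamma reproduces the sample moments exactly:
// κ̂·θ̂ = x̄ and κ̂·θ̂² = s²
Console.WriteLine($"κ̂ = {kappaHat:F3}, θ̂ = {thetaHat:F4}");
Console.WriteLine($"fitted mean = {kappaHat * thetaHat:F2}, fitted variance = {kappaHat * thetaHat * thetaHat:F2}");
```

The round-trip check at the end (fitted moments equal sample moments) is exactly what "matching moments" means, and it holds for any data set.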
@@ -298,6 +432,53 @@ Console.WriteLine($" Sample mean = {moments[0]:F2}");
 Console.WriteLine($"  Sample std dev = {moments[1]:F2}");
 ```
 
+## Method of Percentiles
+
+The Method of Percentiles (also called quantile matching, or least-squares quantile fitting) estimates parameters by matching theoretical quantiles of the distribution to empirical quantiles computed from the data.
+
+### Mathematical Formulation
+
+Given a sorted sample $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$, each observation is assigned a plotting position $p_i$ that estimates $F(x_{(i)})$. A common choice is the Weibull plotting position:
+
+```math
+p_i = \frac{i}{n + 1}
+```
+
+The parameters $\boldsymbol{\theta}$ are then chosen so that the theoretical quantile function (inverse CDF) matches the observed data as closely as possible. For a distribution with quantile function $F^{-1}(p;\,\boldsymbol{\theta})$, the parameters minimize the sum of squared differences:
+
+```math
+\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta}}{\text{argmin}} \sum_{i=1}^{n} \left[x_{(i)} - F^{-1}(p_i;\,\boldsymbol{\theta})\right]^2
+```
+
+For a two-parameter distribution, it is sufficient to select two percentiles (e.g., the median and the 84th percentile) and solve the resulting system of two equations:
+
+```math
+F^{-1}(p_j;\,\boldsymbol{\theta}) = x_{(j)}, \quad j \in \{j_1,\, j_2\}
+```
+
+**Strengths:** Intuitive and easy to visualize, always produces estimates, moderately robust to outliers in the tails, useful when expert judgment suggests specific quantile targets.
+
+**Weaknesses:** Uses only selected data points or gives equal weight to all quantiles (not statistically efficient), lower precision than MLE or L-moments for most distributions.
+
+## Estimation Method Comparison
+
+The choice of estimation method depends on sample size, data quality, and application requirements. The following table summarizes the key trade-offs:
+
+| Method | Efficiency | Small Samples | Robustness | Complexity | Best For |
+|--------|------------|---------------|------------|------------|----------|
+| **MOM** | Low | Fair | Low | Simple | Quick estimates, stable distributions |
+| **L-Moments** | Moderate--High | Excellent | High | Moderate | Hydrological data, small samples |
+| **MLE** | Highest (asymptotic) | Poor--Fair | Low | Complex | Large samples, model comparison |
+| **Percentiles** | Low | Fair | Moderate | Simple | Visual fitting, expert judgment |
+
+### Rules of Thumb
+
+- **n < 50:** Prefer L-moments. With small samples, robustness matters more than asymptotic efficiency, and L-moment estimates are nearly unbiased.
+- **n > 100:** MLE becomes competitive and provides standard errors via Fisher Information, enabling confidence intervals and hypothesis tests.
+- **Skewed distributions:** L-moments substantially outperform MOM, because conventional skewness estimates are highly variable for small samples.
+- **US flood frequency analysis:** L-moments are recommended by USGS Bulletin 17C [[1]](#1). The Expected Moments Algorithm (EMA) extends the framework to handle censored and historical data.
+- **Model selection:** When comparing candidate distributions, MLE enables the use of information criteria (AIC, BIC) for objective model ranking.
+
 ## Distribution-Specific Estimation
 
 ### Log-Pearson Type III (USGS Bulletin 17C)
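The two-percentile matching idea above has a closed form whenever the quantile function is invertible in the parameters. This hypothetical sketch (not a Numerics API) uses the Gumbel distribution, whose quantile function is $F^{-1}(p) = \mu - \beta \ln(-\ln p)$; matching two quantiles gives a linear system in $(\mu, \beta)$, verified here by a round trip from known parameters:

```csharp
using System;

// Gumbel quantile function F⁻¹(p) = μ − β·ln(−ln p)
double Q(double mu, double beta, double p) => mu - beta * Math.Log(-Math.Log(p));

double p1 = 0.5, p2 = 0.9;

// Generate two "observed" quantiles from known parameters...
double muTrue = 100.0, betaTrue = 20.0;
double x1 = Q(muTrue, betaTrue, p1);
double x2 = Q(muTrue, betaTrue, p2);

// ...then solve the two matching equations x_j = μ − β·ln(−ln p_j) for (μ, β)
double betaHat = (x2 - x1) / (Math.Log(-Math.Log(p1)) - Math.Log(-Math.Log(p2)));
double muHat = x1 + betaHat * Math.Log(-Math.Log(p1));

// The round trip recovers μ = 100, β = 20 exactly (up to rounding)
Console.WriteLine($"μ̂ = {muHat:F2}, β̂ = {betaHat:F2}");
```

With real data, $x_1$ and $x_2$ would be empirical quantiles taken at the chosen plotting positions rather than values generated from known parameters.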
@@ -501,26 +682,6 @@ foreach (var (name, dist) in candidates)
 }
 ```
 
-## Estimation with Censored Data
-
-For data with detection limits or censoring:
-
-```cs
-// Low flows below detection limit (left-censored)
-double detectionLimit = 5.0;
-var observed = data.Where(x => x >= detectionLimit).ToArray();
-int nCensored = data.Length - observed.Length;
-
-Console.WriteLine($"Observed: {observed.Length}, Censored: {nCensored}");
-
-// Fit using only observed values
-var lognormal = new LogNormal();
-lognormal.Estimate(observed, ParameterEstimationMethod.MethodOfMoments);
-
-// Note: This is a simple approach. For formal censored data analysis,
-// use MLE with censored likelihood (requires custom implementation)
-```
-
 ## Tips and Best Practices
 
 ### 1. Sample Size Requirements
@@ -631,6 +792,8 @@ Console.WriteLine("Fitted with historical information included");
 
 <a id="4">[4]</a> Rao, A. R., & Hamed, K. H. (2000). *Flood Frequency Analysis*. CRC Press.
 
+<a id="5">[5]</a> Casella, G., & Berger, R. L. (2002). *Statistical Inference* (2nd ed.). Duxbury/Thomson.
+
 ---
 
 [← Previous: Univariate Distributions](univariate.md) | [Back to Index](../index.md) | [Next: Uncertainty Analysis →](uncertainty-analysis.md)
