| 1 | +# single-statistics |
| 2 | + |
| 3 | +[](https://crates.io/crates/single-statistics) |
| 4 | +[](https://docs.rs/single-statistics) |
| 5 | +[](LICENSE.md) |
| 6 | + |
| 7 | +A specialized Rust library for statistical analysis of single-cell data, part of the single-rust ecosystem. |
| 8 | + |
| 9 | +## Overview |
| 10 | + |
| 11 | +`single-statistics` provides statistical methods for the biological analysis of single-cell data, focusing on differential expression analysis, marker gene identification, and related statistical tests. The crate builds on the foundations provided by `single-algebra` and implements biologically relevant statistical approaches optimized for sparse single-cell data. |
| 12 | + |
| 13 | +## Features |
| 14 | + |
| 15 | +- **Differential Expression Analysis** |
| 16 | + - Parametric tests (Student's t-test, Welch's t-test) |
| 17 | + - Non-parametric tests (Mann-Whitney U test) |
| 18 | + - Effect size calculations |
| 19 | + - Parallel implementation for performance |
| 20 | + |
| 21 | +- **Multiple Testing Correction** |
| 22 | + - Bonferroni correction |
| 23 | + - Benjamini-Hochberg (FDR) |
| 24 | + - Benjamini-Yekutieli |
| 25 | + - Holm-Bonferroni |
| 26 | + - Storey's q-value |
| 27 | + |
| 28 | +- **Statistical Framework** |
| 29 | + - Generic interfaces for statistical tests |
| 30 | + - Support for sparse matrix representations |
| 31 | + - Type-safe operations via traits |
| 32 | + |
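To make the multiple testing corrections listed above more concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment using only the standard library. The function name `benjamini_hochberg` is hypothetical and chosen for illustration; it is not taken from the crate's API, and the crate's own implementation may differ.

```rust
/// Benjamini-Hochberg adjusted p-values.
/// Illustrative sketch only; `benjamini_hochberg` is a hypothetical name,
/// not the crate's API.
fn benjamini_hochberg(p_values: &[f64]) -> Vec<f64> {
    let m = p_values.len();

    // Indices of the p-values, sorted in ascending order of p-value.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by(|&a, &b| p_values[a].total_cmp(&p_values[b]));

    let mut adjusted = vec![0.0; m];
    let mut running_min = 1.0_f64;

    // Walk from the largest p-value down to the smallest. Each p-value is
    // scaled by m / rank, and the running minimum keeps the adjusted values
    // monotone and capped at 1.
    for (rank, &idx) in order.iter().enumerate().rev() {
        let scaled = p_values[idx] * m as f64 / (rank + 1) as f64;
        running_min = running_min.min(scaled);
        adjusted[idx] = running_min;
    }
    adjusted
}
```

The adjusted values are returned in the same order as the input, so they can be indexed by gene in the same way as the raw p-values.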
| 33 | +## Getting Started |
| 34 | + |
| 35 | +Add the crate to your `Cargo.toml`: |
| 36 | + |
| 37 | +```toml |
| 38 | +[dependencies] |
| 39 | +single-statistics = "0.1.0" |
| 40 | +``` |
| 41 | + |
| 42 | +## Example Usage |
| 43 | + |
| 44 | +```rust |
| 45 | +use nalgebra_sparse::CsrMatrix; |
| 46 | +use single_statistics::testing::{Alternative, MatrixStatTests, TestMethod, TTestType}; |
| 47 | + |
| 48 | +fn main() -> anyhow::Result<()> { |
| 49 | + // Create or load your expression matrix (genes x cells) |
| 50 | + let expression_matrix: CsrMatrix<f64> = todo!("create or load your expression matrix"); |
| 51 | + |
| 52 | + // Define groups (e.g., cell types, conditions) |
| 53 | + let group_ids = vec![0, 0, 0, 1, 1, 1]; |
| 54 | + |
| 55 | + // Run differential expression analysis |
| 56 | + let results = expression_matrix.differential_expression( |
| 57 | + &group_ids, |
| 58 | + TestMethod::TTest(TTestType::Welch) |
| 59 | + )?; |
| 60 | + |
| 61 | + // Get significantly differentially expressed genes |
| 62 | + let significant_genes = results.significant_indices(0.05); |
| 63 | + println!("Found {} significant genes", significant_genes.len()); |
| 64 | + |
| 65 | + // Access statistics, p-values, and effect sizes |
| 66 | + if let Some(effect_sizes) = &results.effect_sizes { |
| 67 | + for &gene_idx in significant_genes.iter() { |
| 68 | + println!( |
| 69 | + "Gene {}: statistic = {}, p-value = {}, adjusted p-value = {}, effect size = {}", |
| 70 | + gene_idx, |
| 71 | + results.statistics[gene_idx], |
| 72 | + results.p_values[gene_idx], |
| 73 | + results.adjusted_p_values.as_ref().unwrap()[gene_idx], |
| 74 | + effect_sizes[gene_idx] |
| 75 | + ); |
| 76 | + } |
| 77 | + } |
| 78 | + |
| 79 | + Ok(()) |
| 80 | +} |
| 81 | +``` |
| 82 | + |
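The example above selects `TestMethod::TTest(TTestType::Welch)`. As a rough illustration of what a Welch test computes for one gene, the sketch below derives the t statistic and the Welch-Satterthwaite degrees of freedom from two groups of expression values. The helper names `mean_var` and `welch_t` are hypothetical and are not part of the crate. Turning the statistic into a p-value additionally requires the Student's t distribution, which is omitted here.

```rust
/// Sample mean and unbiased sample variance (assumes at least two values).
fn mean_var(x: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
/// Hypothetical helper for illustration; not the crate's API.
/// Returns `None` if a group has fewer than two values or if both
/// groups have zero variance.
fn welch_t(a: &[f64], b: &[f64]) -> Option<(f64, f64)> {
    if a.len() < 2 || b.len() < 2 {
        return None;
    }
    let (mean_a, var_a) = mean_var(a);
    let (mean_b, var_b) = mean_var(b);
    let (n_a, n_b) = (a.len() as f64, b.len() as f64);

    // Per-group squared standard errors; unlike Student's t-test,
    // the variances are not pooled.
    let se_a = var_a / n_a;
    let se_b = var_b / n_b;
    let se2 = se_a + se_b;
    if se2 == 0.0 {
        return None;
    }

    let t = (mean_a - mean_b) / se2.sqrt();
    let df = se2 * se2 / (se_a * se_a / (n_a - 1.0) + se_b * se_b / (n_b - 1.0));
    Some((t, df))
}
```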
| 83 | +## Integration with the single-rust Ecosystem |
| 84 | + |
| 85 | +`single-statistics` is designed to work seamlessly with other components of the single-rust ecosystem: |
| 86 | + |
| 87 | +- **single-algebra**: Core algebraic operations for single-cell data |
| 88 | +- **single-clustering**: Algorithms for clustering cells |
| 89 | +- **single-utilities**: Common utilities for the ecosystem |
| 90 | + |
| 91 | +## Scope |
| 92 | + |
| 93 | +This crate focuses specifically on statistics related to differential expression and marker gene identification. It implements robust, efficient algorithms optimized for sparse data, providing statistical foundations for higher-level analyses in the single-cell domain. |
| 94 | + |
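One way to read "optimized for sparse data" is that per-gene summary statistics can be computed from the stored non-zero entries alone, without densifying the matrix. The sketch below shows this idea for the mean and variance of one gene. The function `sparse_mean_var` is a hypothetical name used for illustration and is not taken from the crate.

```rust
/// Mean and unbiased variance of one gene across `n_cells` cells, given
/// only its explicitly stored (non-zero) values.
/// Hypothetical helper for illustration; not the crate's API.
fn sparse_mean_var(nonzero: &[f64], n_cells: usize) -> Option<(f64, f64)> {
    if n_cells < 2 || nonzero.len() > n_cells {
        return None;
    }
    let n = n_cells as f64;

    // Implicit zeros contribute nothing to the sum, so the mean only
    // needs the stored values and the total cell count.
    let mean = nonzero.iter().sum::<f64>() / n;

    // Squared deviations of the stored entries...
    let ss_nonzero: f64 = nonzero.iter().map(|v| (v - mean).powi(2)).sum();
    // ...plus the implicit zeros, each of which deviates from the mean
    // by exactly `mean`.
    let n_zero = (n_cells - nonzero.len()) as f64;
    let var = (ss_nonzero + n_zero * mean * mean) / (n - 1.0);

    Some((mean, var))
}
```

The cost is proportional to the number of stored entries rather than the number of cells, which matters when most entries of the expression matrix are zero.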
| 95 | +Features in scope: |
| 96 | +- Statistical tests relevant to single-cell RNA-seq analysis |
| 97 | +- Implementations of various hypothesis testing methods |
| 98 | +- Multiple testing correction |
| 99 | +- Effect size calculations |
| 100 | + |
| 101 | +Features out of scope (available in other crates): |
| 102 | +- General matrix statistics (in `single-algebra`) |
| 103 | +- Basic QC metrics computation (in `single-algebra`) |
| 104 | +- Plotting/visualization |
| 105 | +- Clustering algorithms (in `single-clustering`) |
| 106 | +- Batch correction |
| 107 | + |
| 108 | +## Contributing |
| 109 | + |
| 110 | +Contributions are welcome! Please feel free to submit a Pull Request. |
| 111 | + |
| 112 | +## License |
| 113 | + |
| 114 | +This project is licensed under the BSD 3-Clause License - see the [LICENSE.md](LICENSE.md) file for details. |