diff --git a/AGENTS.md b/AGENTS.md index 108414a..2c117e8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -13,6 +13,8 @@ > [preseq](https://github.com/smithlabcode/preseq) `lc_extrap`) > - **samtools-compatible** flagstat, idxstats, and full stats output > - **Gene body coverage** profiling with Qualimap-compatible output +> - **bigWig** genome coverage tracks (nf-core/rnaseq-compatible; bedtools +> `genomecov` + UCSC `bedClip` semantics) > > Binary crate (`rustqc`), Rust edition 2021. @@ -63,7 +65,7 @@ src/ io.rs — Shared I/O utilities (gzip-transparent file reading) gtf.rs — GTF annotation file parser (with configurable attribute extraction) rna/ - mod.rs — Re-exports all submodules (dupradar, featurecounts, rseqc, bam_flags, cpp_rng, preseq, qualimap) + mod.rs — Re-exports all submodules (dupradar, featurecounts, rseqc, bam_flags, cpp_rng, preseq, qualimap, bigwig) bam_flags.rs — BAM flag constants cpp_rng.rs — C++ RNG FFI shim for preseq bootstrap reproducibility dupradar/ @@ -76,6 +78,10 @@ src/ mod.rs — Re-exports output output.rs — featureCounts-format output & biotype counting preseq.rs — preseq lc_extrap library complexity extrapolation + bigwig/ + mod.rs — Re-exports accumulator and output + accumulator.rs — bedtools genomecov difference-array coverage accumulation + output.rs — bedClip semantics + bigWig writing via bigtools qualimap/ mod.rs — Re-exports all Qualimap modules accumulator.rs — Gene body coverage accumulation logic @@ -120,8 +126,9 @@ dupRadar duplicate rate analysis, featureCounts-compatible gene counting, all 8 RSeQC-equivalent tools (bam_stat, infer_experiment, read_duplication, read_distribution, junction_annotation, junction_saturation, inner_distance, TIN), preseq library complexity extrapolation, samtools-compatible outputs -(flagstat, idxstats, stats), and gene body coverage profiling with -Qualimap-compatible output. 
The GTF parser extracts transcript-level +(flagstat, idxstats, stats), gene body coverage profiling with +Qualimap-compatible output, and nf-core/rnaseq-compatible bigWig coverage +tracks. The GTF parser extracts transcript-level structure (exons + CDS features) to derive all data needed by every tool. Individual tools can be disabled via the YAML config file (each has an `enabled` @@ -255,6 +262,8 @@ To prepare a release: | `rayon` | Data parallelism | | `rand` / `rand_chacha` | Reproducible random sampling | | `flate2` | Gzip decompression (annotation files) | +| `bigtools` | bigWig file writing | +| `tokio` | Async runtime for bigtools | ## Duplicate Marking Validation @@ -294,7 +303,7 @@ forwarded to `count_reads()` as the `skip_dup_check: bool` parameter). `infer_experiment:`, `read_duplication:`, `read_distribution:`, `junction_annotation:`, `junction_saturation:`, `inner_distance:`, `tin:`). Each has an `enabled: bool` toggle and tool-specific parameter overrides. CLI flags take precedence over config values. -- Under `rna:`, there are also sections for `preseq:`, `qualimap:`, +- Under `rna:`, there are also sections for `preseq:`, `qualimap:`, `bigwig:`, `flagstat:`, `idxstats:`, and `samtools_stats:`. Each has an `enabled: bool` toggle. Preseq has additional parameters: `max_extrap`, `step_size`, `n_bootstraps`, `confidence_level`, `seed`, `max_terms`, `defects`. 
TIN has `sample_size` and diff --git a/CHANGELOG.md b/CHANGELOG.md index 2b67cce..6712497 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,15 @@ # RustQC Changelog +## Unreleased + +### Features + +- Add bigWig genome coverage tracks in the existing BAM streaming pass, matching nf-core/rnaseq `bedtools genomecov` + UCSC `bedClip` + `bedGraphToBigWig` output ([#112](https://github.com/seqeralabs/RustQC/issues/112)) + +### Documentation + +- Document bigWig coverage tracks on the docs site, README, AGENTS.md, and related markdown files + ## [Version 0.2.1](https://github.com/seqeralabs/RustQC/releases/tag/v0.2.1) - 2026-04-09 ### Bug fixes diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 181de96..65b30ad 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -86,7 +86,7 @@ src/ gtf.rs GTF annotation file parser rna/ mod.rs Re-exports all submodules (bam_flags, cpp_rng, dupradar, - featurecounts, preseq, qualimap, rseqc) + featurecounts, preseq, qualimap, bigwig, rseqc) bam_flags.rs BAM flag constants cpp_rng.rs C++ RNG FFI shim for preseq bootstrap reproducibility dupradar/ @@ -99,6 +99,10 @@ src/ mod.rs Re-exports output output.rs featureCounts-format output and biotype counting preseq.rs preseq lc_extrap library complexity extrapolation + bigwig/ + mod.rs Re-exports accumulator and output + accumulator.rs bedtools genomecov coverage accumulation + output.rs bedClip semantics + bigWig writing qualimap/ mod.rs Re-exports accumulator, coverage, index, output, plots, report accumulator.rs Gene body coverage accumulator diff --git a/Cargo.lock b/Cargo.lock index da20215..a3844c5 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -88,6 +88,37 @@ version = "1.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8" +[[package]] +name = "bigtools" +version = "0.5.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"af1b9bbf6596d602e472a23ed5aa5d611fb04f14e7772226fb61c720e806202e" +dependencies = [ + "bincode", + "byteorder", + "byteordered", + "crossbeam-channel", + "crossbeam-utils", + "futures", + "index_list", + "itertools 0.10.5", + "libdeflater", + "serde", + "smallvec", + "tempfile", + "thiserror 1.0.69", + "tokio", +] + +[[package]] +name = "bincode" +version = "1.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b1f45e9417d87227c7a56d22e471c6206462cba514c7590c09aff4cf6d1ddcad" +dependencies = [ + "serde", +] + [[package]] name = "bindgen" version = "0.69.5" @@ -97,7 +128,7 @@ dependencies = [ "bitflags 2.11.0", "cexpr", "clang-sys", - "itertools", + "itertools 0.12.1", "lazy_static", "lazycell", "proc-macro2", @@ -151,6 +182,15 @@ version = "1.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b" +[[package]] +name = "byteordered" +version = "0.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bbf2cd9424f5ff404aba1959c835cbc448ee8b689b870a9981c76c0fd46280e6" +dependencies = [ + "byteorder", +] + [[package]] name = "bzip2-sys" version = "0.1.13+1.0.8" @@ -373,6 +413,15 @@ dependencies = [ "cfg-if", ] +[[package]] +name = "crossbeam-channel" +version = "0.5.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "82b8f8f868b36967f9606790d1903570de9ceaf870a7bf9fbbd3016d636a2cb2" +dependencies = [ + "crossbeam-utils", +] + [[package]] name = "crossbeam-deque" version = "0.8.6" @@ -535,6 +584,22 @@ version = "1.0.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f" +[[package]] +name = "errno" +version = "0.3.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "39cab71617ae0d63f51a36d69f866391735b51691dbda63cf6f96d042b63efeb" +dependencies = [ + "libc", + 
"windows-sys 0.61.2", +] + +[[package]] +name = "fastrand" +version = "2.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9f1f227452a390804cdb637b74a86990f2a7d7ba4b7d5693aac9b4dd6defd8d6" + [[package]] name = "fdeflate" version = "0.3.7" @@ -653,6 +718,94 @@ dependencies = [ "quick-error", ] +[[package]] +name = "futures" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8b147ee9d1f6d097cef9ce628cd2ee62288d963e16fb287bd9286455b241382d" +dependencies = [ + "futures-channel", + "futures-core", + "futures-executor", + "futures-io", + "futures-sink", + "futures-task", + "futures-util", +] + +[[package]] +name = "futures-channel" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "07bbe89c50d7a535e539b8c17bc0b49bdb77747034daa8087407d655f3f7cc1d" +dependencies = [ + "futures-core", + "futures-sink", +] + +[[package]] +name = "futures-core" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d" + +[[package]] +name = "futures-executor" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "baf29c38818342a3b26b5b923639e7b1f4a61fc5e76102d4b1981c6dc7a7579d" +dependencies = [ + "futures-core", + "futures-task", + "futures-util", +] + +[[package]] +name = "futures-io" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cecba35d7ad927e23624b22ad55235f2239cfa44fd10428eecbeba6d6a717718" + +[[package]] +name = "futures-macro" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-sink" +version = "0.3.32" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "c39754e157331b013978ec91992bde1ac089843443c49cbc7f46150b0fad0893" + +[[package]] +name = "futures-task" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" + +[[package]] +name = "futures-util" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +dependencies = [ + "futures-channel", + "futures-core", + "futures-io", + "futures-macro", + "futures-sink", + "futures-task", + "memchr", + "pin-project-lite", + "slab", +] + [[package]] name = "getrandom" version = "0.2.17" @@ -896,6 +1049,12 @@ dependencies = [ "png", ] +[[package]] +name = "index_list" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "30141a73bc8a129ac1ce472e33f45af3e2091d86b3479061b9c2f92fdbe9a28c" + [[package]] name = "indexmap" version = "2.13.0" @@ -927,6 +1086,15 @@ version = "1.70.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a6cb138bb79a146c1bd460005623e142ef0181e3d0219cb493e02f7d08a35695" +[[package]] +name = "itertools" +version = "0.10.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b0fd2260e829bddf4cb6ea802289de2f86d6a7a690192fbe91b3f46e0f2c8473" +dependencies = [ + "either", +] + [[package]] name = "itertools" version = "0.12.1" @@ -1016,6 +1184,24 @@ version = "0.2.184" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "48f5d2a454e16a5ea0f4ced81bd44e4cfc7bd3a507b61887c99fd3538b28e4af" +[[package]] +name = "libdeflate-sys" +version = "1.25.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72753e0008ea87963d2f0770042d0df7abe51fafbb8dcaf618ac440f2f1fec0a" +dependencies = [ + "cc", +] + +[[package]] +name = "libdeflater" +version = 
"1.25.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d1ee41cf6fb1bb6030dfb59ffb7bc01ab26aade44142084c87f0fc7a1658fe71" +dependencies = [ + "libdeflate-sys", +] + [[package]] name = "libloading" version = "0.8.9" @@ -1054,6 +1240,12 @@ version = "1.2.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bfae20f6b19ad527b550c223fddc3077a547fc70cda94b9b566575423fd303ee" +[[package]] +name = "linux-raw-sys" +version = "0.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32a66949e030da00e8c7d4434b251670a91556f4144941d37452769c25d58a53" + [[package]] name = "litemap" version = "0.8.1" @@ -1198,6 +1390,12 @@ version = "2.3.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220" +[[package]] +name = "pin-project-lite" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd" + [[package]] name = "pkg-config" version = "0.3.32" @@ -1475,11 +1673,25 @@ dependencies = [ "semver 1.0.27", ] +[[package]] +name = "rustix" +version = "1.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6fe4565b9518b83ef4f91bb47ce29620ca828bd32cb7e408f0062e9930ba190" +dependencies = [ + "bitflags 2.11.0", + "errno", + "libc", + "linux-raw-sys", + "windows-sys 0.61.2", +] + [[package]] name = "rustqc" version = "0.2.1" dependencies = [ "anyhow", + "bigtools", "cc", "clap", "coitrees", @@ -1502,6 +1714,7 @@ dependencies = [ "serde", "serde_json", "serde_yaml_ng", + "tokio", ] [[package]] @@ -1605,6 +1818,12 @@ version = "0.3.9" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "703d5c7ef118737c72f1af64ad2f6f8c5e1921f818cdcb97b8fe6fc69bf66214" +[[package]] +name = "slab" +version = "0.4.12" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5" + [[package]] name = "smallvec" version = "1.15.1" @@ -1658,6 +1877,19 @@ dependencies = [ "syn", ] +[[package]] +name = "tempfile" +version = "3.27.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32497e9a4c7b38532efcdebeef879707aa9f794296a4f0244f6f69e9bc8574bd" +dependencies = [ + "fastrand", + "getrandom 0.4.2", + "once_cell", + "rustix", + "windows-sys 0.61.2", +] + [[package]] name = "thiserror" version = "1.0.69" @@ -1708,6 +1940,15 @@ dependencies = [ "zerovec", ] +[[package]] +name = "tokio" +version = "1.52.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fc7f01b389ac15039e4dc9531aa973a135d7a4135281b12d7c1bc79fd57fffe" +dependencies = [ + "pin-project-lite", +] + [[package]] name = "ttf-parser" version = "0.20.0" diff --git a/Cargo.toml b/Cargo.toml index ebf02f0..b3155e4 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -64,6 +64,10 @@ indicatif = "0.17" number_prefix = "0.4" serde_json = "1" +# bigWig writing (bedGraphToBigWig equivalent) +bigtools = { version = "0.5", default-features = false, features = ["write"] } +tokio = { version = "1", features = ["rt-multi-thread"] } + [build-dependencies] cc = "1" diff --git a/README.md b/README.md index f576fc7..65dc431 100644 --- a/README.md +++ b/README.md @@ -28,7 +28,7 @@

- Benchmark: RustQC ~14m 54s vs traditional tools ~15h 34m sequential (dupRadar + featureCounts + 8 RSeQC tools incl. TIN + preseq + samtools + Qualimap) + Benchmark: RustQC ~14m 54s vs traditional tools ~15h 34m sequential (dupRadar + featureCounts + 8 RSeQC tools incl. TIN + preseq + samtools + Qualimap + bigWig)

@@ -52,11 +52,12 @@ It currently includes: | TIN | [RSeQC](https://rseqc.sourceforge.net/#tin-py) `tin.py` | Transcript Integrity Number | | preseq | [preseq](http://smithlabresearch.org/software/preseq/) `lc_extrap` | Library complexity extrapolation | | Qualimap rnaseq | [Qualimap](http://qualimap.conesalab.org/) `rnaseq` | Gene body coverage, read origin, strand specificity | +| bigWig | [nf-core/rnaseq](https://nf-co.re/rnaseq) `bedtools genomecov` + UCSC utilities | Genome-wide coverage tracks (combined and per-strand) | | flagstat | [samtools](http://www.htslib.org/) `flagstat` | Alignment flag summary | | idxstats | [samtools](http://www.htslib.org/) `idxstats` | Per-chromosome read counts | | stats | [samtools](http://www.htslib.org/) `stats` | Full samtools stats output including all histogram sections | -All outputs are format- and numerically identical to the upstream tools, and compatible with [MultiQC](https://multiqc.info/) for reporting. +Most outputs are format- and numerically identical to the upstream tools, and compatible with [MultiQC](https://multiqc.info/) for reporting. bigWig coverage intervals match [bedtools](https://github.com/arq5x/bedtools2) `genomecov` exactly; binary `.bigWig` files use a different encoder but decode to the same coverage (see [bigWig docs](https://seqeralabs.github.io/RustQC/rna/bigwig/)). 
## Quick start @@ -84,7 +85,7 @@ See the [documentation](https://seqeralabs.github.io/RustQC/) for full usage det ## Use as a Rust library -The crate is also published as a library, so the QC analysis modules (GTF parsing, dupRadar, featureCounts, RSeQC, Qualimap, preseq, samtools-style outputs) can be embedded into other Rust programs: +The crate is also published as a library, so the QC analysis modules (GTF parsing, dupRadar, featureCounts, RSeQC, Qualimap, bigWig, preseq, samtools-style outputs) can be embedded into other Rust programs: ```toml [dependencies] diff --git a/docs/astro.config.mjs b/docs/astro.config.mjs index 4e1c2f4..01a7fca 100644 --- a/docs/astro.config.mjs +++ b/docs/astro.config.mjs @@ -68,6 +68,7 @@ export default defineConfig({ { label: "featureCounts", slug: "rna/featurecounts" }, { label: "RSeQC", slug: "rna/rseqc" }, { label: "Qualimap", slug: "rna/qualimap" }, + { label: "bigWig", slug: "rna/bigwig" }, { label: "Preseq", slug: "rna/preseq" }, { label: "Samtools", slug: "rna/samtools" }, ], diff --git a/docs/src/content/docs/about/credits.mdx b/docs/src/content/docs/about/credits.mdx index 171086e..75bb8a8 100644 --- a/docs/src/content/docs/about/credits.mdx +++ b/docs/src/content/docs/about/credits.mdx @@ -98,6 +98,22 @@ Qualimap rnaseq report. - Website: [qualimap.conesalab.org](http://qualimap.conesalab.org/) - Publication: [doi.org/10.1093/bioinformatics/bts503](https://doi.org/10.1093/bioinformatics/bts503) +## bigWig coverage tracks + +RustQC produces nf-core/rnaseq-compatible genome-wide bigWig coverage tracks, +matching the parameters used by [bedtools](https://github.com/arq5x/bedtools2) +`genomecov` and UCSC `bedClip` / `bedGraphToBigWig`. See the +[bigWig](../rna/bigwig/) page for output files, strandedness behaviour, and +validation details. + +> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing +> genomic features. *Bioinformatics*. 2010;26(6):841-842. 
+ +- bedtools: [github.com/arq5x/bedtools2](https://github.com/arq5x/bedtools2) +- Publication: [doi.org/10.1093/bioinformatics/btq033](https://doi.org/10.1093/bioinformatics/btq033) +- bigWig format (UCSC): Kent WJ, et al. The human genome browser at UCSC. + *Genome Research*. 2002;12(6):996-1006. + ## Key dependencies RustQC is built with the following open-source Rust libraries: @@ -114,6 +130,8 @@ RustQC is built with the following open-source Rust libraries: | [flate2](https://github.com/rust-lang/flate2-rs) | Gzip decompression for annotation files | | [serde](https://github.com/serde-rs/serde) | YAML configuration deserialization | | [indexmap](https://github.com/indexmap-rs/indexmap) | Insertion-order-preserving maps | +| [bigtools](https://github.com/jackh726/bigtools) | bigWig and BigBed file I/O | +| [tokio](https://github.com/tokio-rs/tokio) | Async runtime (bigWig writing) | | [env_logger](https://github.com/rust-cli/env_logger) | Logging output | ## RustQC diff --git a/docs/src/content/docs/getting-started/introduction.mdx b/docs/src/content/docs/getting-started/introduction.mdx index e65f65c..63deb95 100644 --- a/docs/src/content/docs/getting-started/introduction.mdx +++ b/docs/src/content/docs/getting-started/introduction.mdx @@ -13,6 +13,7 @@ RustQC is a fast quality control toolkit for sequencing data, written in Rust. I - [preseq](https://github.com/smithlabcode/preseq): library complexity extrapolation (lc_extrap) - [samtools](http://www.htslib.org/): flagstat, idxstats, and full stats output including all histogram sections - [Qualimap](http://qualimap.conesalab.org/): gene body coverage profiling and RNA-seq QC summary +- [nf-core/rnaseq](https://nf-co.re/rnaseq) bigWig: genome-wide coverage tracks (`bedtools genomecov` + UCSC utilities) ## Why RustQC? 
@@ -41,6 +42,7 @@ Given a duplicate-marked BAM file and a GTF annotation, `rustqc rna` runs all of | dupRadar | [dupRadar](https://bioconductor.org/packages/dupRadar/) | RNA-seq PCR duplicate rate analysis | | featureCounts | [featureCounts](http://subread.sourceforge.net/) | Gene-level read counting and biotypes | | Qualimap rnaseq | [Qualimap](http://qualimap.conesalab.org/) rnaseq | Gene body coverage and RNA-seq QC | +| bigWig | [nf-core/rnaseq](https://nf-co.re/rnaseq) `bedtools genomecov` | Genome-wide coverage tracks | | preseq | [preseq](http://smithlabresearch.org/software/preseq/) `lc_extrap` | Library complexity extrapolation | | flagstat | `samtools flagstat` | Alignment flag statistics | | idxstats | `samtools idxstats` | Per-chromosome read counts | diff --git a/docs/src/content/docs/getting-started/quickstart.md b/docs/src/content/docs/getting-started/quickstart.md index 1f3dd84..99afe7a 100644 --- a/docs/src/content/docs/getting-started/quickstart.md +++ b/docs/src/content/docs/getting-started/quickstart.md @@ -7,7 +7,7 @@ A basic RustQC analysis from install to results. ## RNA-seq duplicate analysis -Run all RNA-seq QC analyses (dupRadar, featureCounts, RSeQC tools including TIN, Qualimap, preseq, and samtools) in a single pass: +Run all RNA-seq QC analyses (dupRadar, featureCounts, RSeQC tools including TIN, Qualimap, bigWig coverage tracks, preseq, and samtools) in a single pass: ```bash rustqc rna sample.markdup.bam --gtf genes.gtf -p -o results/ @@ -27,7 +27,8 @@ This command: ### Output -Output files are organized into subdirectories by tool group. +Output files are organized into subdirectories by tool group +(for example `dupradar/`, `rseqc/`, `qualimap/`, `bigwig/`, `preseq/`, and `samtools/`). Files are generally named the same as their upstream tool equivalents. This means that MultiQC should find them and report them as if they were created by the original tool. 
diff --git a/docs/src/content/docs/index.mdx b/docs/src/content/docs/index.mdx index 54afb33..24c2170 100644 --- a/docs/src/content/docs/index.mdx +++ b/docs/src/content/docs/index.mdx @@ -27,6 +27,7 @@ export const base = import.meta.env.BASE_URL; featureCounts,{" "} RSeQC,{" "} Qualimap,{" "} + nf-core/rnaseq bigWig,{" "} preseq, and{" "} samtools in Rust, delivering matching results much faster. @@ -163,7 +164,8 @@ export const base = import.meta.env.BASE_URL; Produces identical or near-identical output to R's dupRadar and featureCounts, plus faithful reimplementations of 8 RSeQC tools (including TIN), preseq, - and samtools. See [benchmark details](./rna/benchmark-details/). + samtools, and nf-core/rnaseq-compatible bigWig coverage tracks. + See [benchmark details](./rna/benchmark-details/). Processes ~186M reads in 14m 54s on AWS vs ~15h 34m of sequential tool @@ -171,7 +173,7 @@ export const base = import.meta.env.BASE_URL; workflow. - Replaces 14+ separate QC tool invocations with a single CLI command. + Replaces 15+ separate QC tool invocations with a single CLI command. One task to stage data for. One pass through the BAM file. Minimal staging and I/O overhead. @@ -183,5 +185,5 @@ export const base = import.meta.env.BASE_URL; ## Credits -RustQC reimplements established tools including dupRadar, featureCounts, RSeQC, preseq, samtools, and Qualimap. +RustQC reimplements established tools including dupRadar, featureCounts, RSeQC, preseq, samtools, Qualimap, and nf-core/rnaseq bigWig coverage tracks. See the [Credits & Citations](./about/credits/) page for the full list and how to cite them. diff --git a/docs/src/content/docs/rna/benchmark-details.mdx b/docs/src/content/docs/rna/benchmark-details.mdx index 8594e00..a3ec3e0 100644 --- a/docs/src/content/docs/rna/benchmark-details.mdx +++ b/docs/src/content/docs/rna/benchmark-details.mdx @@ -45,7 +45,8 @@ across chromosomes. 
Per-tool comparison tables and known differences are on the individual tool pages: [dupRadar](../dupradar/), [featureCounts](../featurecounts/), [RSeQC](../rseqc/), -[preseq](../preseq/), [Samtools](../samtools/), [Qualimap](../qualimap/). +[preseq](../preseq/), [Samtools](../samtools/), [Qualimap](../qualimap/), +[bigWig](../bigwig/). ## Upstream tool versions @@ -59,6 +60,7 @@ RustQC output was validated against these specific versions: | preseq | 3.2.0 | | samtools | 1.22.1 | | Qualimap | 2.3 | +| bedtools | 2.31.1 | Exact container tags and commands are in the [RustQC-benchmarks](https://github.com/seqeralabs/RustQC-benchmarks) Nextflow pipeline. diff --git a/docs/src/content/docs/rna/bigwig.mdx b/docs/src/content/docs/rna/bigwig.mdx new file mode 100644 index 0000000..ed29b50 --- /dev/null +++ b/docs/src/content/docs/rna/bigwig.mdx @@ -0,0 +1,149 @@ +--- +title: bigWig Coverage Tracks +description: Genome-wide bigWig coverage tracks compatible with nf-core/rnaseq, generated in the RustQC streaming pass. +--- + +import { Aside, FileTree } from "@astrojs/starlight/components"; + + + +RustQC generates **bigWig** genome coverage tracks during the `rustqc rna` +single-pass BAM scan. The implementation replicates the parameters used by +[nf-core/rnaseq](https://nf-co.re/rnaseq) for its `bigwig/` output directory. + +## What it replaces + +In nf-core/rnaseq, each sample typically runs three sequential steps per strand: + +1. `bedtools genomecov` on the final genome BAM → sorted bedGraph +2. `bedClip` with chromosome sizes → clipped bedGraph +3. `bedGraphToBigWig` → `.bigWig` file + +RustQC collapses this into the existing BAM read loop. Coverage is accumulated +with the same `bedtools genomecov` logic (`-split`, optional `-du` and +`-strand`), clipped to chromosome boundaries (UCSC `bedClip` semantics), and +written as bigWig via the [`bigtools`](https://github.com/jackh726/bigtools) +crate. 
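The accumulate, bedGraph, and clip steps described above can be sketched in Rust as follows. This is a minimal illustration, not RustQC's `accumulator.rs` or `output.rs`: `ChromCoverage`, `BedGraphRecord`, and `clip_to_chrom` are hypothetical names, the sparse `BTreeMap` difference array is an assumption, and the clipping step assumes `bedClip`'s default behaviour of dropping (not truncating) records that run past the chromosome end.

```rust
use std::collections::BTreeMap;

/// One bedGraph record: 0-based, half-open [start, end) at a constant depth.
#[derive(Debug, PartialEq)]
pub struct BedGraphRecord {
    pub start: u64,
    pub end: u64,
    pub depth: u32,
}

/// Sparse difference array for one chromosome: position -> net depth change.
#[derive(Default)]
pub struct ChromCoverage {
    deltas: BTreeMap<u64, i64>,
}

impl ChromCoverage {
    /// Record one aligned block [start, end). With `-split`, each aligned
    /// segment between CIGAR N/D operations is passed in as its own block.
    pub fn add_block(&mut self, start: u64, end: u64) {
        if start >= end {
            return;
        }
        *self.deltas.entry(start).or_insert(0) += 1;
        *self.deltas.entry(end).or_insert(0) -= 1;
    }

    /// Sweep positions in order and emit runs of non-zero depth, merging
    /// adjacent runs of equal depth. Zero-depth gaps are skipped, as with `-bg`.
    pub fn to_bedgraph(&self) -> Vec<BedGraphRecord> {
        let mut out: Vec<BedGraphRecord> = Vec::new();
        let mut depth: i64 = 0;
        let mut prev: Option<u64> = None;
        for (&pos, &delta) in &self.deltas {
            if let Some(start) = prev {
                if depth > 0 && pos > start {
                    push_run(&mut out, start, pos, depth as u32);
                }
            }
            depth += delta;
            prev = Some(pos);
        }
        out
    }
}

/// Append a run, extending the previous record if it is contiguous and equal.
fn push_run(out: &mut Vec<BedGraphRecord>, start: u64, end: u64, depth: u32) {
    if let Some(last) = out.last_mut() {
        if last.end == start && last.depth == depth {
            last.end = end;
            return;
        }
    }
    out.push(BedGraphRecord { start, end, depth });
}

/// Assumed bedClip default: drop any record that extends past the chromosome end.
pub fn clip_to_chrom(records: Vec<BedGraphRecord>, chrom_len: u64) -> Vec<BedGraphRecord> {
    records.into_iter().filter(|r| r.end <= chrom_len).collect()
}
```

In this sketch each read's blocks go into `add_block`; when the sorted BAM moves on to the next chromosome, `to_bedgraph` followed by `clip_to_chrom` yields the intervals that would be handed to the bigWig writer. The sparse map keeps memory proportional to the number of block boundaries; a dense `Vec` indexed by position is the other common choice for a difference array.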
+ + + ## Output files + +All bigWig files are written to a `bigwig/` subdirectory under the output +directory. Use `--flat-output` to write files directly to the top-level output +directory instead. + + +- bigwig/ + - sample.bigWig Combined strand-agnostic coverage (all libraries) + - sample.forward.bigWig Forward-genome-strand coverage (stranded libraries only) + - sample.reverse.bigWig Reverse-genome-strand coverage (stranded libraries only) + + +File naming matches nf-core/rnaseq: `{sample_id}.bigWig`, +`{sample_id}.forward.bigWig`, and `{sample_id}.reverse.bigWig`. + +### Combined track (all libraries) + +**File:** `{sample_id}.bigWig` + +Equivalent to nf-core `BEDTOOLS_GENOMECOV_COMBINED`: + +```bash +bedtools genomecov -ibam sample.bam -split -bg +``` + +### Per-strand tracks (stranded libraries only) + +When `--stranded forward` or `--stranded reverse` is set (or configured in +YAML), RustQC also writes forward- and reverse-genome-strand tracks using +`-split -du -strand +/-` with strand labels swapped for reverse libraries, matching +nf-core/rnaseq `bigwig.config`. + +| Library type | Track file | bedtools strand filter | +| ------------ | ---------- | ---------------------- | +| `forward` | `{sample_id}.forward.bigWig` | `-strand +` | +| `forward` | `{sample_id}.reverse.bigWig` | `-strand -` | +| `reverse` | `{sample_id}.forward.bigWig` | `-strand -` | +| `reverse` | `{sample_id}.reverse.bigWig` | `-strand +` | + +For **unstranded** libraries, only the combined `{sample_id}.bigWig` is produced. +Per-strand tracks are omitted because strand-specific filtering is not +meaningful without a known library protocol.
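The strand routing in the library-type table above can be expressed as a small function. This sketch assumes `-du` flips the strand of the second mate in a pair so that both mates count on the same genome strand; the enums and function names are hypothetical and are not taken from the RustQC source.

```rust
/// Library protocol as given by `--stranded`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Strandedness {
    Unstranded,
    Forward,
    Reverse,
}

/// Per-strand output file a read contributes to.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum StrandTrack {
    Forward, // {sample_id}.forward.bigWig
    Reverse, // {sample_id}.reverse.bigWig
}

/// Genome strand seen by the `-strand` filter. Assumes `-du` flips the
/// strand of the second mate so both mates of a pair land on one strand.
pub fn filter_strand_is_minus(is_reverse: bool, is_paired: bool, is_read2: bool) -> bool {
    if is_paired && is_read2 {
        !is_reverse
    } else {
        is_reverse
    }
}

/// Route one read to a per-strand track following the library-type table.
/// Returns `None` for unstranded libraries, which get no per-strand tracks.
pub fn track_for_read(
    lib: Strandedness,
    is_reverse: bool,
    is_paired: bool,
    is_read2: bool,
) -> Option<StrandTrack> {
    let minus = filter_strand_is_minus(is_reverse, is_paired, is_read2);
    match (lib, minus) {
        (Strandedness::Unstranded, _) => None,
        // forward library: `-strand +` -> forward file, `-strand -` -> reverse file
        (Strandedness::Forward, false) => Some(StrandTrack::Forward),
        (Strandedness::Forward, true) => Some(StrandTrack::Reverse),
        // reverse library: labels swapped
        (Strandedness::Reverse, true) => Some(StrandTrack::Forward),
        (Strandedness::Reverse, false) => Some(StrandTrack::Reverse),
    }
}
```

The combined track is independent of this routing: every read contributes to it, and no strand flip is applied there because it uses no `-strand` filter.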
+ + + ## Parameters + +RustQC matches nf-core/rnaseq defaults: + +| Parameter | nf-core/rnaseq | RustQC | +| --------- | -------------- | ------ | +| Normalization | None (`scale=1`) | None | +| Splicing | `-split` | `-split` (CIGAR N/D block splitting) | +| PE mate strand flip | `-du` on per-strand tracks only | `-du` on per-strand tracks only | +| MAPQ filter | None (all mapped reads) | None (all mapped reads) | +| Clipping | `bedClip` with chrom sizes | Equivalent clipping before write | + +Input must be a **coordinate-sorted** BAM (or SAM/CRAM). No GTF annotation is +used for coverage calculation itself, but `--gtf` is still required by the +`rustqc rna` command. + +## Validation + +Coverage intervals match [bedtools](https://github.com/arq5x/bedtools2) +`genomecov` v2.31.1 with UCSC `bedClip` semantics. Integration tests decode +both RustQC and reference bigWig files with UCSC `bigWigToBedGraph` and compare +all bedGraph intervals exactly. + +The **binary `.bigWig` files are not bit-identical** to UCSC `bedGraphToBigWig` +output because RustQC encodes them with `bigtools` rather than the UCSC +utility. Decoded coverage values and interval boundaries are identical; only +the on-disk compression and index structure differ. + +## Configuration + +bigWig generation is enabled by default. Disable with: + +```yaml +rna: + bigwig: + enabled: false +``` + +See the [Configuration](../usage/configuration/#bigwig) page for details. + +## nf-core/rnaseq integration + +To use RustQC bigWig output in place of the nf-core/rnaseq bigWig subworkflow, +run `rustqc rna` on the final genome BAM and publish the `bigwig/` directory +to `{outdir}/{aligner}/bigwig/` (the path where nf-core/rnaseq publishes its +bigWig tracks for downstream consumers). + +Set `--stranded` to match the library protocol so per-strand tracks are +generated with the correct nf-core naming and strand filters. + +## References + +- **bedtools:** Quinlan AR, Hall IM.
BEDTools: a flexible suite of utilities + for comparing genomic features. *Bioinformatics*. 2010;26(6):841-842. + [bedtools on GitHub](https://github.com/arq5x/bedtools2) +- **bigWig format:** Kent WJ, et al. The human genome browser at UCSC. + *Genome Research*. 2002;12(6):996-1006. +- **bigtools:** Huey J, Abdennur N. Bigtools: a high-performance BigWig and + BigBed library in Rust. *Bioinformatics*. 2024. + [bigtools on GitHub](https://github.com/jackh726/bigtools) diff --git a/docs/src/content/docs/usage/cli-reference.mdx b/docs/src/content/docs/usage/cli-reference.mdx index deca6b4..8e4cf2c 100644 --- a/docs/src/content/docs/usage/cli-reference.mdx +++ b/docs/src/content/docs/usage/cli-reference.mdx @@ -16,7 +16,7 @@ RNA-seq quality control: duplicate rate analysis (dupRadar equivalent), featureCounts-compatible read counting with biotype summaries, 8 RSeQC-equivalent tools (bam_stat, infer_experiment, read_duplication, read_distribution, junction_annotation, junction_saturation, inner_distance, -TIN), Qualimap RNA-seq QC, preseq library complexity, and samtools-compatible +TIN), Qualimap RNA-seq QC, bigWig genome coverage tracks, preseq library complexity, and samtools-compatible outputs. ### Synopsis @@ -54,7 +54,7 @@ Path to a GTF gene annotation file (plain or gzip-compressed). **Required.** The GTF must contain `exon` features with a `gene_id` attribute. Transcript-level structure (exon blocks, CDS features) is extracted automatically and used by all analyses: dupRadar, featureCounts, all 8 RSeQC tools (including TIN), -Qualimap, preseq, and samtools. +Qualimap, bigWig coverage tracks, preseq, and samtools. Gzip compression is detected automatically by inspecting the file header (magic bytes), so the `.gz` extension is not required.
@@ -79,6 +79,11 @@ Library strandedness for strand-aware read counting: **Default:** `unstranded` +For **bigWig coverage tracks**, stranded libraries also produce +`{sample}.forward.bigWig` and `{sample}.reverse.bigWig` using +nf-core/rnaseq-compatible strand filters. Unstranded libraries receive only +`{sample}.bigWig`. See [bigWig](../rna/bigwig/) for details. +
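A hypothetical helper showing which bigWig files a run would produce, based on the naming described here and on the `bigwig/` subdirectory and `--flat-output` behaviour from the bigWig page. The function name and the string-valued strandedness argument are illustrative assumptions, not RustQC's API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: bigWig paths written for one sample.
/// `stranded` is the `--stranded` value ("unstranded", "forward", "reverse").
pub fn bigwig_paths(outdir: &Path, sample: &str, stranded: &str, flat_output: bool) -> Vec<PathBuf> {
    // Files go under `bigwig/` unless `--flat-output` is set.
    let dir = if flat_output { outdir.to_path_buf() } else { outdir.join("bigwig") };
    // The combined track is always written.
    let mut names = vec![format!("{sample}.bigWig")];
    // Per-strand tracks are written only for stranded libraries.
    if matches!(stranded, "forward" | "reverse") {
        names.push(format!("{sample}.forward.bigWig"));
        names.push(format!("{sample}.reverse.bigWig"));
    }
    names.into_iter().map(|name| dir.join(name)).collect()
}
```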