Thanks for RustQC - the single-pass design is awesome!
Potential gap: biotype % rRNA substantially undercounts rRNA on human GRCh38, because it relies on GTF biotype assignment (vs. RSeQC's split_bam.py, which counts reads overlapping a BED file)
What's happening
On GRCh38 / GENCODE v38 STAR-aligned BAMs, RustQC's biotype % rRNA comes out ≈0 across all WTS samples, while RSeQC split_bam (reads overlapping an rRNA-region BED) reports several percent on the same BAMs. The featureCounts summary shows why:
- Multi-mappers are dropped (featureCounts default). rDNA is a high-copy repeat, so most rRNA reads multi-map → Unassigned_MultiMapping.
- The 45S rDNA (18S/5.8S/28S) isn't annotated in GENCODE GRCh38: it sits in the unassembled acrocentric arms, and only a few dozen rRNA genes exist in the annotation, mostly 5S + pseudogenes. So uniquely-mapped rRNA reads → Unassigned_NoFeatures.
So biotype counting structurally can't see rRNA on a stock GRCh38/GENCODE setup; interval overlap catches it. The practical impact is that the reported % rRNA is near zero when true residual rRNA is ~10%, so the metric can't be used to judge depletion.
What would be really awesome
- An interval/BED-based rRNA mode (like RSeQC split_bam) — most robust for rRNA quantification.
- Options to count multi-mapping/multi-overlapping reads in the biotype pass (the featureCounts -M/-O equivalents, which RustQC doesn't currently expose).
- At minimum, a docs note that biotype % rRNA under-reports where the rDNA isn't annotated?
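To make the first request concrete, here is a rough sketch of interval-based counting in plain Python on toy data. It is not RustQC's or RSeQC's actual code. The names `build_index`, `overlaps` and `rrna_fraction` are made up for illustration, reads are simplified to (chrom, start, end) tuples rather than BAM records, and whether multi-mapping alignments are counted once per read or once per alignment is left open.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(bed_intervals):
    """Merge BED-style intervals (0-based, half-open) per chromosome.

    Returns {chrom: (starts, ends)} with sorted, non-overlapping intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_intervals:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:  # overlapping or touching: extend
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if the half-open span [start, end) overlaps any indexed interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval that begins before the read ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def rrna_fraction(reads, index):
    """Fraction of reads overlapping the rRNA intervals (0.0 if no reads)."""
    total = hits = 0
    for chrom, start, end in reads:
        total += 1
        hits += overlaps(index, chrom, start, end)
    return hits / total if total else 0.0


# Toy rRNA BED (hypothetical coordinates) and toy aligned reads.
rrna_bed = [("chr1", 100, 200), ("chr1", 150, 300), ("chrUn", 0, 50)]
reads = [
    ("chr1", 90, 110),   # overlaps the merged chr1 interval
    ("chr1", 300, 350),  # starts exactly where the interval ends: no overlap
    ("chrUn", 10, 20),   # overlaps the chrUn interval
    ("chr2", 5, 15),     # chromosome not in the BED
]
index = build_index(rrna_bed)
fraction = rrna_fraction(reads, index)
```

The point of the sketch is that the check depends only on alignment coordinates, so it works even where the GTF has no rRNA gene at that locus. It looks cheap enough to run per record inside the existing single pass.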
RSeQC covers it but doesn't have the speed of the single-pass call in Rust. Happy to share details, test a fix or help in any way.
Environment: RustQC 0.2.1 · GRCh38 · GENCODE v38 · STAR, paired-end.