
Commit 8543087

Updated README
1 parent 1e334e8 commit 8543087

1 file changed

Lines changed: 44 additions & 7 deletions

File tree

README.md

@@ -1,5 +1,5 @@
1-
BamHash
2-
=======
1+
# BamHash
2+
33

44
Hash BAM and FASTQ files to verify data integrity
55

@@ -9,12 +9,49 @@ All the hash values are summed up so the result is independent of the ordering w
99
The result can be compared to verify that the pair of FASTQ files contain the same read
1010
information as the aligned BAM file.
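
The order-independence described above can be illustrated with a minimal sketch. It assumes each read is hashed with MD5 and that the first 8 bytes of each digest are added into a 64-bit running sum; the field separator, the digest truncation, and the names `read_hash` and `checksum` are assumptions made for illustration, not BamHash's actual implementation.

```python
import hashlib

MASK64 = (1 << 64) - 1  # assumed width of the running sum


def read_hash(name: str, seq: str, qual: str) -> int:
    """Hash one read with MD5 and keep the first 8 bytes as an integer."""
    record = f"{name}\n{seq}\n{qual}".encode("ascii")  # assumed layout
    digest = hashlib.md5(record).digest()
    return int.from_bytes(digest[:8], "little")


def checksum(reads) -> int:
    """Sum the per-read hashes modulo 2**64; addition is commutative,
    so the result does not depend on the order of the reads."""
    total = 0
    for name, seq, qual in reads:
        total = (total + read_hash(name, seq, qual)) & MASK64
    return total


reads = [
    ("r1/1", "ACGT", "IIII"),
    ("r1/2", "TTGA", "IIHH"),
    ("r2/1", "GGCA", "HHII"),
]

# The same reads in a different order give the same checksum.
print(checksum(reads) == checksum(list(reversed(reads))))
```

Because the per-read values are only added together, a sorted BAM file and the unsorted FASTQ files it was built from would yield the same sum under this scheme, provided they hold the same reads.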
1111

12-
Manuscript
13-
==========
12+
## Manuscript
13+
14+
15+
In preparation. A preprint is available on [bioRxiv](http://biorxiv.org/content/early/2015/03/03/015867).
16+
17+
## Usage
18+
19+
BamHash provides three executables, one for each file type. Running any of them with `--help` displays a detailed help message.
20+
21+
### Common options
22+
23+
All programs work with sets of reads. Each read is made up of a read name, a sequence, and quality information. All of these components go into the hash, but the read name or the quality information can be ignored if necessary. This is useful when a pipeline has mangled the read names, quantized the quality values, or altered the quality scores during realignment.
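
As a rough illustration of why ignoring a component helps, the sketch below hashes the same reads with and without the quality string. The hashing scheme (MD5 per read, first 8 bytes summed modulo 2**64) and the names `read_key`, `checksum`, `use_name`, and `use_quality` are hypothetical and are not BamHash's own code or option names.

```python
import hashlib


def read_key(name, seq, qual, use_name=True, use_quality=True):
    """Build the string that is hashed for one read (assumed layout)."""
    parts = [seq]
    if use_name:
        parts.insert(0, name)
    if use_quality:
        parts.append(qual)
    return "\n".join(parts)


def checksum(reads, **opts):
    """Order-independent sum of per-read MD5 values (illustrative only)."""
    total = 0
    for read in reads:
        digest = hashlib.md5(read_key(*read, **opts).encode("ascii")).digest()
        total = (total + int.from_bytes(digest[:8], "little")) % 2**64
    return total


original = [("r1", "ACGT", "IHGF"), ("r2", "TTGA", "FGHI")]
# Same reads after a hypothetical pipeline step quantized the qualities.
quantized = [("r1", "ACGT", "IIFF"), ("r2", "TTGA", "FFII")]

# With quality included the checksums differ; without it they agree.
print(checksum(original) == checksum(quantized))
print(checksum(original, use_quality=False)
      == checksum(quantized, use_quality=False))
```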
24+
25+
The default mode is to assume paired end reads. If you have single end reads you can supply the `--no-paired` option.
26+
27+
The debug option `-d` prints the information and hash value of each read individually. This can be helpful if BamHash is not cooperating with your pipeline.
28+
29+
Multiline FASTA and FASTQ files are supported, as is gzipped input for both formats.
30+
31+
### BAM
32+
33+
~~~
34+
bamhash_checksum_bam [OPTIONS] <in.bam> <in2.bam> ...
35+
~~~
36+
37+
processes a number of BAM files. BAM files are assumed to contain paired end reads. If you run with `--no-paired` it treats all reads as single end and displays a warning if any read is marked as "second in pair" in the BAM file.
38+
39+
### FASTQ
40+
41+
~~~
42+
bamhash_checksum_fastq [OPTIONS] <in1.fastq.gz> [in2.fastq.gz ... ]
43+
~~~
44+
45+
processes a number of FASTQ files. FASTQ files are assumed to contain paired end reads, such that the first two files contain the first pair of reads, etc. If the read names in the two files of a pair do not match, the program exits with failure.
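
The pairing check can be sketched as follows. This is a simplified illustration that assumes four-line FASTQ records and treats a trailing `/1` or `/2` as a mate suffix to be ignored; how BamHash itself parses records and compares names is not specified here, and the helper names are hypothetical.

```python
import io
from itertools import zip_longest


def fastq_names(handle):
    """Yield read names, assuming exactly four lines per FASTQ record."""
    lines = [line.strip() for line in handle if line.strip()]
    for header in lines[0::4]:
        # Drop the leading '@' and any description after whitespace.
        yield header[1:].split()[0]


def strip_mate(name):
    """Remove a trailing /1 or /2 mate suffix (assumed convention)."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def pairs_match(fq1, fq2):
    """Return True if both files list the same read names in the same order."""
    for a, b in zip_longest(fastq_names(fq1), fastq_names(fq2)):
        if a is None or b is None or strip_mate(a) != strip_mate(b):
            return False
    return True


first = "@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCA\n+\nIIII\n"
second = "@r1/2\nTTGA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n"
mismatch = "@r1/2\nTTGA\n+\nIIII\n@r3/2\nCCAT\n+\nIIII\n"

print(pairs_match(io.StringIO(first), io.StringIO(second)))
print(pairs_match(io.StringIO(first), io.StringIO(mismatch)))
```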
46+
47+
### FASTA
48+
49+
~~~
50+
bamhash_checksum_fasta [OPTIONS] <in1.fasta> [in2.fasta ... ]
51+
~~~
1452

15-
In preperation.
53+
processes a number of FASTA files. All FASTA files are assumed to contain single end reads with no quality information. To compare the result to a BAM file, run `bamhash_checksum_bam --no-paired --no-quality`.
1654

17-
Compiling
18-
=========
55+
## Compiling
1956

2057
The only external dependency is on OpenSSL for the MD5 implementation.
