Commit d5eb8dc

added qc
1 parent 18cb11f commit d5eb8dc

1 file changed

Lines changed: 29 additions & 4 deletions

qc.md

@@ -53,18 +53,43 @@ Scythe can be run minimally with:
5353

5454
`scythe -a adapter_file.fasta -o trimmed_sequences.fastq sequences.fastq`
5555

56-
Trim the adapters in both your read files!
56+
Try to trim the adapters in both your read files!
5757

5858
## Sickle
5959

60-
https://github.com/najoshi/sickle
60+
Most modern sequencing technologies produce reads whose quality deteriorates towards the 3'-end, and some towards the 5'-end as well. Incorrectly called bases in both regions negatively impact assemblies, mapping, and downstream bioinformatics analyses.
6161

6262
We will trim each read individually down to the good quality part to keep the bad part from interfering with downstream applications.
6363

64-
and set the quality score to 25. This means the trimmer will work its way from both ends of each read, cutting away any bases with a quality score < 25.
64+
To do so, we will use sickle, a tool that uses sliding windows along with quality and length thresholds to determine when quality is sufficiently low to trim the 3'-end of reads and when it is sufficiently high to trim the 5'-end of reads. It also discards reads that fall below a length threshold.
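The sliding-window idea described above can be sketched in a few lines of bash. This is a simplified illustration, not sickle's actual implementation: the function name `trim_3prime_length`, the window size of 5, and the rule of cutting at the first low-quality base inside the first failing window are all assumptions made for the example. Quality strings are assumed to be Sanger-encoded (Phred+33).

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of sickle): print how many bases to keep
# after a simplified 3'-end sliding-window trim.
#   $1 = quality string (Sanger / Phred+33), $2 = threshold, $3 = window size
trim_3prime_length() {
    local qual=$1 threshold=$2 window=$3
    local len=${#qual} i j start sum code
    local -a q=()

    # Decode each quality character: Phred score = ASCII code - 33.
    for ((i = 0; i < len; i++)); do
        printf -v code '%d' "'${qual:i:1}"
        q[i]=$((code - 33))
    done

    # Slide the window from the 5'-end towards the 3'-end.
    for ((start = 0; start + window <= len; start++)); do
        sum=0
        for ((j = start; j < start + window; j++)); do
            sum=$((sum + q[j]))
        done
        # Mean quality below threshold: cut at the first low base in this window.
        if ((sum < threshold * window)); then
            for ((j = start; j < start + window; j++)); do
                if ((q[j] < threshold)); then
                    echo "$j"
                    return
                fi
            done
        fi
    done

    # No failing window (or read shorter than the window): keep the whole read.
    echo "$len"
}

# Five bases at Q40 ('I') followed by five at Q2 ('#').
trim_3prime_length 'IIIII#####' 25 5   # prints 5
```

Reads shorter than the window are returned untrimmed in this sketch; a real trimmer would also apply the length threshold afterwards.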
6565

66+
First, install sickle:
6667

67-
What did the trimming do to the per-base sequence quality, the per sequence quality scores and the sequence length distribution?
68+
```
69+
git clone https://github.com/najoshi/sickle.git
70+
cd sickle
71+
make
72+
```
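The commands that follow call `sickle` directly, which only works if the shell can find the binary. A minimal sketch, assuming `make` left the `sickle` executable in the cloned directory and that you are still inside it:

```shell
# Assumption: the current directory is the cloned sickle repository,
# and `make` placed the sickle binary here.
export PATH="$PWD:$PATH"
```

Alternatively, call the program as `./sickle` from inside that directory.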
73+
74+
Sickle has two modes, one for single-end and one for paired-end reads: `sickle se` and `sickle pe`.
75+
76+
Running sickle by itself will print the help:
77+
78+
`sickle`
79+
80+
Running sickle with either the "se" or "pe" command will give help specific to that command. Since we have paired-end reads:
81+
82+
`sickle pe`
83+
84+
Set the quality threshold to 25 with `-q 25`. This means the trimmer will work its way in from both ends of each read, trimming where the quality in the sliding window falls below 25.
85+
86+
```
87+
sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
88+
-o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
89+
-s trimmed_singles_file.fastq -q 25
90+
```
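To see how many reads survived trimming, the record counts of the input and output files can be compared. A minimal sketch, assuming every FASTQ record spans exactly four lines (no wrapped sequences); `count_reads` is a hypothetical helper, and the file names are the placeholders from the command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count FASTQ records, assuming four lines per record.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Report counts for whichever of the placeholder files exist.
for f in input_file1.fastq input_file2.fastq \
         trimmed_output_file1.fastq trimmed_output_file2.fastq \
         trimmed_singles_file.fastq; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_reads "$f")"
    fi
done
```

The two paired output files should report the same number of reads; reads whose mate was discarded end up in the singles file.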
91+
92+
What did the trimming do to the per-base sequence quality, the per-sequence quality scores, and the sequence length distribution? Run FastQC again to find out.
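As a quick complement to the FastQC report, the sequence length distribution can also be tabulated on the command line. A minimal sketch, assuming four-line FASTQ records; `length_distribution` is a hypothetical helper and the file name in the comment is the placeholder used earlier.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "length count" pairs for the reads in a FASTQ file.
# Assumes four lines per record, so the sequence is line 2 of each group.
length_distribution() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Example call on a trimmed file (placeholder name):
#   length_distribution trimmed_output_file1.fastq
# FastQC itself can be rerun on both trimmed files:
#   fastqc trimmed_output_file1.fastq trimmed_output_file2.fastq
```

Before trimming, every read typically has the same length; afterwards the table should show a spread of shorter lengths.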
6893

6994
What is the sequence duplication levels graph about? Why should you care about a high level of duplication, and why is the level of duplication very low for this data?
7095
