added qc

HadrienG · HadrienG · commit d5eb8dca5086 · 2017-02-10T09:30:02.000+01:00
diff --git a/qc.md b/qc.md
@@ -53,18 +53,43 @@ Scythe can be run minimally with:
 
 `scythe -a adapter_file.fasta -o trimmed_sequences.fastq sequences.fastq`
 
-Trim the adapters in both your read files!
+Try to trim the adapters in both your read files!
 
 ## Sickle
 
-https://github.com/najoshi/sickle
+Most modern sequencing technologies produce reads whose quality deteriorates towards the 3'-end, and some towards the 5'-end as well. Incorrectly called bases in both regions negatively impact assemblies, mapping, and downstream bioinformatics analyses.
 
 We will trim each read individually down to the good quality part to keep the bad part from interfering with downstream applications.
 
-and set the quality score to 25. This means the trimmer will work its way from both ends of each read, cutting away any bases with a quality score < 25.
+To do so, we will use sickle. Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is low enough to trim the 3'-end of reads, and when quality is high enough to trim the 5'-end of reads. It will also discard reads that fall below a length threshold.
 
+First, install sickle:
 
-What did the trimming do to the per-base sequence quality, the per sequence quality scores and the sequence length distribution?
+```
+git clone https://github.com/najoshi/sickle.git
+cd sickle
+make
+```
+
+Sickle has two modes: `sickle se` for single-end reads and `sickle pe` for paired-end reads.
+
+Running sickle by itself will print the help:
+
+`sickle`
+
+Running sickle with either the "se" or "pe" command will print help specific to that command. Since we have paired-end reads:
+
+`sickle pe`
+
+Set the quality threshold to 25 with `-q 25`. This means the trimmer will work its way in from both ends of each read, cutting away the regions where the average quality within the sliding window is below 25.
+
+```
+sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
+-o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
+-s trimmed_singles_file.fastq -q 25
+```
+
+What did the trimming do to the per-base sequence quality, the per-sequence quality scores, and the sequence length distribution? Run FastQC again to find out.
 
 What is the sequence duplication levels graph about? Why should you care about a high level of duplication, and why is the level of duplication very low for this data?
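
As a rough complement to re-running FastQC after the trimming step in this change, the read count and read-length distribution of a FASTQ file can also be checked directly from the shell. This is only a minimal sketch: it assumes uncompressed FASTQ files with exactly four lines per record, and the function names `count_reads` and `length_distribution` are illustrative, not part of sickle or FastQC.

```shell
# Count reads, assuming exactly four lines per FASTQ record
# (no wrapped sequence or quality lines).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Print "length count" pairs, one per observed read length,
# sorted by increasing length. Line 2 of each record is the sequence.
length_distribution() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}
```

For example, `count_reads input_file1.fastq` and `count_reads trimmed_output_file1.fastq` could be compared to see how many reads were discarded, and `length_distribution trimmed_output_file1.fastq` gives a plain-text view of the lengths left after trimming. The file names are the placeholders from the sickle command in the diff.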