
Commit 77a77c4

Author: Jonas Ohlsson
fix typos. add instructions for how to install scythe and sickle into $PATH.
Parent: 26927e1

1 file changed: 14 additions and 8 deletions


qc.md

````diff
@@ -7,7 +7,7 @@ The sequencing was done as paired-end 2x150bp.
 
 ## Downloading the data
 
-The Raw data were deposited at the European nucleotide archive, under the accession number SRR957824, go the the ENA [website](http://www.ebi.ac.uk/ena) and search for the run with the accession SRR957824. Download the two fastq files associated with the run:
+The raw data were deposited at the European Nucleotide Archive, under the accession number SRR957824. Go to the ENA [website](http://www.ebi.ac.uk/ena) and search for the run with the accession SRR957824. Download the two fastq files associated with the run:
 
 ```
 wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR957/SRR957824/SRR957824_1.fastq.gz
````
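The two `wget` URLs in this hunk share a predictable layout: `vol1/fastq/`, the first six characters of the run accession, the accession itself, then `<accession>_<mate>.fastq.gz`. Here is a minimal POSIX shell sketch that builds both URLs. It assumes that layout holds, which is only the case for six-digit accessions like SRR957824; ENA adds an extra directory level for longer accessions, and this sketch does not handle that. `ena_fastq_url` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build the ENA FTP URL for one mate of a paired-end run.
# Assumes a six-digit run accession (e.g. SRR957824), stored under
# vol1/fastq/<first 6 characters>/<accession>/.
ena_fastq_url() {
    acc=$1      # run accession, e.g. SRR957824
    mate=$2     # 1 or 2
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s/%s_%s.fastq.gz\n' \
        "$prefix" "$acc" "$acc" "$mate"
}

# Print the wget command for both mates; drop the leading `echo`
# to actually download the files.
for mate in 1 2; do
    echo wget "$(ena_fastq_url SRR957824 "$mate")"
done
```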
````diff
@@ -18,18 +18,18 @@ wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR957/SRR957824/SRR957824_2.fastq.gz
 
 To check the quality of the sequence data we will use a tool called FastQC. With this you can check things like read length distribution, quality distribution across the read length, sequencing artifacts and much more.
 
-FastQC has a graphical interface and can be downloaded and ran on a Windows or LINUX computer without installation. It is available [here](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
+FastQC has a graphical interface and can be downloaded and run on a Windows or Linux computer without installation. It is available [here](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
 
-However, FastQC is also available as a command line utility on the training server you are using. You can load the module and execute the program as follow:
+However, FastQC is also available as a command line utility on the training server you are using. You can load the module and execute the program as follows:
 
 ```
-module load fastqc
+module load FastQC
 fastqc $read1 $read2
 ```
 
 which will produce both a .zip archive containing all the plots, and a html document for you to look at the result in your browser.
 
-Open the html file with your favourite web browser, and try to interpret them
+Open the html file with your favourite web browser, and try to interpret them.
 
 Pay special attention to the per base sequence quality and sequence length distribution. Explanations for the various quality modules can be found [here](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/). Also, have a look at examples of a [good](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html) and a [bad](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc.html) illumina read set for comparison.
 
````
````diff
@@ -49,7 +49,9 @@ cd scythe
 make all
 ```
 
-Then, copy or move "scythe" to a directory in your $PATH.
+Then, copy or move "scythe" to a directory in your $PATH, for example like this:
+
+`cp scythe $HOME/bin/`
 
 Scythe can be run minimally with:
 
````
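`cp scythe $HOME/bin/` only helps if `$HOME/bin` exists and is already listed in `$PATH`, which is not the case on every system. The sketch below uses a hypothetical helper, `install_to_bin`, that creates the directory, copies the binary, and prepends the directory to `$PATH` for the current shell session if needed. The same helper works for the sickle binary.

```shell
#!/bin/sh
# Hypothetical helper: copy a built binary into a personal bin directory
# (default $HOME/bin) and make sure that directory is on $PATH.
install_to_bin() {
    src=$1
    dest=${2:-$HOME/bin}
    mkdir -p "$dest" || return 1
    cp "$src" "$dest/" || return 1
    case ":$PATH:" in
        *":$dest:"*) ;;                        # already on $PATH
        *) PATH="$dest:$PATH"; export PATH ;;  # current session only
    esac
}

# Usage, from the scythe directory after `make all`:
#   install_to_bin scythe
# To keep the change in new sessions, add this line to ~/.bashrc
# (assuming bash is your login shell):
#   export PATH="$HOME/bin:$PATH"
```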
````diff
@@ -73,6 +75,10 @@ cd sickle
 make
 ```
 
+Copy sickle to a directory in your $PATH:
+
+`cp sickle $HOME/bin/`
+
 Sickle has two modes to work with both paired-end and single-end reads: sickle se and sickle pe.
 
 Running sickle by itself will print the help:
````
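Sickle's paired-end mode takes both mates and writes three outputs: one file per mate, plus a file of reads whose partner was discarded. The sketch below wraps this in a hypothetical `trim_pair` helper. The flags (`-f`, `-r`, `-o`, `-p`, `-s`, and `-t` for the quality encoding) are assumed from the sickle README, so check them against the help that `sickle` prints. `-t sanger` assumes Phred+33 qualities (Illumina 1.8 and later). The `_trimmed`/`_singles` file names are our own choice. The sketch also assumes your sickle build accepts gzipped input; if it does not, decompress the files first. Setting `SICKLE="echo sickle"` prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: quality-trim one read pair with `sickle pe`.
# Override SICKLE (e.g. SICKLE="echo sickle") for a dry run.
trim_pair() {
    r1=$1
    r2=$2
    base1=$(basename "$r1" .fastq.gz)   # e.g. SRR957824_1
    base2=$(basename "$r2" .fastq.gz)   # e.g. SRR957824_2
    ${SICKLE:-sickle} pe -t sanger \
        -f "$r1" -r "$r2" \
        -o "${base1}_trimmed.fastq" \
        -p "${base2}_trimmed.fastq" \
        -s "${base1%_1}_singles.fastq"
}

# Usage:
#   trim_pair SRR957824_1.fastq.gz SRR957824_2.fastq.gz
```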
````diff
@@ -95,8 +101,8 @@ What did the trimming do to the per-base sequence quality, the per sequence qual
 
 What is the sequence duplication levels graph about? Why should you care about a high level of duplication, and why is the level of duplication very low for this data?
 
-Based on the FastQC report, there seems to be a population of shorter reads that are technical artefacts. We will ignore them for now as they will not interfere with our analysis.
+Based on the FastQC report, there seems to be a population of shorter reads that are technical artifacts. We will ignore them for now as they will not interfere with our analysis.
 
 ## Extra exercises
 
-Perform quality control on the extra datasets given by your instructors
+Perform quality control on the extra datasets given by your instructors.
````
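The hunk above mentions a population of shorter reads seen in the FastQC report. You can also check for that population outside FastQC by tabulating read lengths directly. In a FASTQ file every record is four lines and the sequence is the second, so every fourth line starting at line 2 is a read. The sketch below does this with a hypothetical helper, `read_length_counts`. It assumes plain four-line FASTQ records, either uncompressed or gzipped.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <length>" for every read length in a
# FASTQ file (gzipped if the name ends in .gz), shortest reads first.
read_length_counts() {
    case $1 in
        *.gz) gzip -cd "$1" ;;
        *) cat "$1" ;;
    esac |
        awk 'NR % 4 == 2 { n[length($0)]++ }
             END { for (len in n) print n[len], len }' |
        sort -k2,2n
}

# Usage:
#   read_length_counts SRR957824_1.fastq.gz | head   # the shortest reads
```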