Commit bfee467

started assembly tuto
1 parent eb9bda9 commit bfee467

1 file changed

Lines changed: 42 additions & 0 deletions

assembly.md

@@ -0,0 +1,42 @@
# Whole-genome de-novo Assembly

In this practical we will assemble the genome of Mycoplasma genitalium, a bacterium whose genome was published in 1995 by Fraser et al. in Science.

## Getting the data

Go to the European Nucleotide Archive (ENA) and search for the run ERR486840.

Download the two FASTQ files associated with the run.

## Quality control?

How many reads are in each FASTQ file? What is the read length?
Does the data need trimming or other filtering? If so, trim or filter the reads before continuing.
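
The first two questions can be answered with a few lines of Python. The following is a minimal sketch, assuming the usual four-line FASTQ record layout (no wrapped sequence lines); the function names `open_fastq` and `fastq_stats` are made up for this example, and the file name in the usage note is an assumption about how the download is named.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, "rt")


def fastq_stats(lines):
    """Return (number of reads, Counter mapping read length -> count).

    Assumes each record spans exactly four lines:
    header, sequence, '+', quality string.
    """
    n_reads = 0
    lengths = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            n_reads += 1
            lengths[len(line.strip())] += 1
    return n_reads, lengths
```

It could be used as `with open_fastq("ERR486840_1.fastq.gz") as fh: print(fastq_stats(fh))`, once for each of the two files. If the length counter holds more than one value, the reads do not all have the same length.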

Find the genome size of M. genitalium in the abstract of the Fraser paper.
Based on the expected genome size, the read length and the number of reads, what average coverage do you expect to get from these FASTQ files?
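
The expected average coverage is the total number of sequenced bases divided by the genome size, that is, number of reads times read length divided by genome size. A small sketch of that arithmetic follows; the numbers in the example call are placeholders, not the values for this dataset.

```python
def expected_coverage(n_reads, read_length, genome_size):
    """Average coverage = total sequenced bases / genome size.

    n_reads must count the reads from both FASTQ files of a paired-end run.
    """
    return n_reads * read_length / genome_size


# Placeholder numbers only: 2 x 100,000 reads of 100 bp on a 1 Mbp genome.
print(expected_coverage(n_reads=200_000, read_length=100, genome_size=1_000_000))
```

Replace the placeholders with the read count and read length you measured and with the genome size from the abstract.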

## De-novo assembly

We will use the SPAdes assembler to assemble the genome of our bacterium.

Running SPAdes produces several output files. The scaffolds are written in FASTA format.
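
The tutorial does not yet show the command itself. One possible way to launch a paired-end SPAdes run from Python is sketched below. It assumes `spades.py` is on the `PATH`; the read file names and the output directory are hypothetical, and only the basic `-1`, `-2` and `-o` options are used.

```python
import subprocess


def spades_command(reads_1, reads_2, out_dir):
    """Build a basic paired-end SPAdes command line as a list of arguments."""
    return ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]


def run_spades(cmd):
    """Run the assembler and raise an error if it exits with a failure."""
    subprocess.run(cmd, check=True)


# Hypothetical file names for the trimmed reads and the output directory.
cmd = spades_command("trimmed_1.fastq.gz", "trimmed_2.fastq.gz", "assembly_trimmed")
print(" ".join(cmd))
# run_spades(cmd)  # uncomment to actually start the assembly
```

After a successful run, the output directory should contain `contigs.fasta` and `scaffolds.fasta` among other files.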

How well do the total consensus size and the coverage of the assembly correspond to your earlier estimate?
What is the N50 of the assembly? What does this mean?
How many contigs did the assembly produce in total?
How many contigs are longer than 500 bp? What is the N50 of those contigs only?
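
N50 is the length of the contig at which, going from the longest contig to the shortest, the running total first reaches half of the total assembly length. The sketch below computes it from a FASTA file read line by line; the function names are made up for this example, and the length filter keeps contigs of at least `min_length` bases (change `>=` to `>` for a strict "longer than").

```python
def contig_lengths(fasta_lines):
    """Return the length of every sequence in a FASTA file, in file order."""
    lengths = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a new record starts
        elif line and lengths:
            lengths[-1] += len(line)  # sequences may span several lines
    return lengths


def n50(lengths, min_length=0):
    """N50 of the contigs that are at least min_length long (0 if none)."""
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:
            return length
    return 0


print(n50([80, 70, 50, 40, 30, 20]))
```

For a real assembly, something like `with open("assembly_trimmed/contigs.fasta") as fh: lengths = contig_lengths(fh)` would supply the lengths (the path is hypothetical), and `len(lengths)` gives the total number of contigs.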

Perform more assemblies with the following options:

Raw reads (no trimming, but with adapters removed), letting SPAdes do the QC.

## Comparing assemblies

Use QUAST to compare the assemblies.

### Comparing to the reference?

Get the reference fasta here

## Fixing misassemblies?
