1+ # Whole-genome de-novo Assembly
2+
3+ In this practical we will assemble the genome of M. genitalium, a bacterium whose genome sequence was published in Science in 1995 by Fraser et al.
4+
5+ ## Getting the data
6+
7+ Go to ENA, and search for the run ERR486840.
8+
9+ Download the 2 fastq files associated with the run.
10+
11+ ## Quality control?
12+
13+ How many reads are in each fastq file? What is the read length?
14+ Does the data need trimming or other filtering? If so, do it.
15+
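One way to answer the read-count and read-length questions is a small script. The sketch below assumes a standard FASTQ file with exactly four lines per record (no wrapped sequences); the function name `fastq_stats` is made up for this example, and it reads gzip-compressed files when the name ends in `.gz`.

```python
import gzip
from collections import Counter


def fastq_stats(path):
    """Return (number_of_reads, Counter of read lengths) for one FASTQ file.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = 0
    lengths = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if i % 4 == 1:
                n_reads += 1
                lengths[len(line.strip())] += 1
    return n_reads, lengths


# Example call (the file name is an assumption about how the download is named):
# n_reads, lengths = fastq_stats("ERR486840_1.fastq.gz")
# print(n_reads, lengths.most_common(3))
```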
16+ Find the genome size of M. genitalium in the Fraser paper abstract.
17+ Based on the expected genome size, the read length and the number of reads – what average coverage do you expect to get from these fastq files?
18+
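The expected average coverage is the total number of sequenced bases divided by the genome size. A minimal sketch of that arithmetic follows; the function name is hypothetical and the numbers are illustrative placeholders, not the values for this run or this genome.

```python
def expected_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases / genome size.

    For paired-end data, n_reads is the total over both fastq files.
    """
    return n_reads * read_length / genome_size


# Placeholder numbers only: 1,000,000 reads of 100 bp on a 500,000 bp genome.
print(expected_coverage(1_000_000, 100, 500_000))  # 200.0
```

Substitute the read count and read length you measured and the genome size from the paper to get your own estimate.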
19+ ## De-novo assembly
20+
21+ We will be using the SPAdes assembler to assemble our bacterium.
22+
23+ This will produce a series of outputs. The scaffolds will be in fasta format.
24+
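The practical does not show the assembler call itself, so here is one possible way to drive it from Python. This sketch assumes SPAdes is installed and `spades.py` is on the `PATH`; the helper names, the input file names and the output directory are assumptions for illustration. `-1`, `-2` and `-o` are the SPAdes options for the two paired-end files and the output directory, and `--only-assembler` skips the read error correction step. The scaffolds are written to `scaffolds.fasta` in the output directory.

```python
import shutil
import subprocess


def spades_command(reads_1, reads_2, out_dir, only_assembler=False):
    """Build a SPAdes command line for one paired-end library."""
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if only_assembler:
        # Skip SPAdes' own read error correction and run assembly only.
        cmd.append("--only-assembler")
    return cmd


def run_if_installed(cmd):
    """Run the command if its executable is on the PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH")
        return False
    subprocess.run(cmd, check=True)
    return True


# File and directory names below are assumed, adjust them to your download.
cmd = spades_command("ERR486840_1.fastq.gz", "ERR486840_2.fastq.gz", "spades_out")
print(" ".join(cmd))
# run_if_installed(cmd)  # uncomment to launch the assembly
```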
25+ How well do the total consensus size and the coverage of the assembly correspond to your earlier estimates?
26+ What is the N50 of the assembly? What does this mean?
27+ How many contigs in total did the assembly produce?
28+ How many contigs are longer than 500 bp? What is the N50 of those contigs only?
29+
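The N50 is the length of the contig at which, going from the longest contig downwards, the running total first reaches half of the total assembly size. The sketch below computes it from a list of contig lengths and shows how a 500 bp cutoff changes the answer. The function names and the toy lengths are made up for illustration; `contig_lengths` is a simple FASTA reader that assumes a well-formed file.

```python
def contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy contig lengths, not real assembly output.
toy = [1000, 700, 600, 450, 450, 400, 400, 400, 300, 300]
long_only = [n for n in toy if n > 500]
print(len(toy), n50(toy))              # 10 450
print(len(long_only), n50(long_only))  # 3 700
```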
30+ Perform more assemblies with the following options:
31+
32+ Raw reads (no trimming, but with adapters removed), letting SPAdes do the QC.
33+
34+ ## Comparing assemblies
35+
36+ Compare the assemblies with QUAST.
37+
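A possible way to assemble the QUAST call from Python is sketched below. It assumes QUAST is installed with `quast.py` on the `PATH`; the helper name and all file and directory names are placeholders. `-o` sets the output directory and `-r` supplies an optional reference genome.

```python
def quast_command(assemblies, out_dir, reference=None):
    """Build a QUAST command line comparing one or more assembly FASTA files."""
    cmd = ["quast.py", *assemblies, "-o", out_dir]
    if reference is not None:
        # With a reference, QUAST can also report reference-based metrics.
        cmd += ["-r", reference]
    return cmd


# Directory and file names are assumptions for illustration.
cmd = quast_command(
    ["spades_trimmed/scaffolds.fasta", "spades_raw/scaffolds.fasta"],
    "quast_out",
)
print(" ".join(cmd))
```

The printed command can be pasted into a terminal or passed to `subprocess.run(cmd, check=True)`.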
38+ ### Comparing to the reference?
39+
40+ Get the reference fasta here
41+
42+ ## Fixing misassemblies?