De novo genome assembly steps:
Different de novo genome assemblers use different algorithms to find overlaps between DNA reads, then arrange and merge the reads into larger contiguous sequences called contigs. SPAdes and Velvet are examples of assemblers that use the de Bruijn graph approach: the reads are broken down into smaller k-mers (sequences of k nucleotides), and a de Bruijn graph is constructed in which nodes represent k-mers and edges represent overlaps of k-1 nucleotides.
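To make the graph idea concrete, here is a minimal Python sketch that follows the definition above (nodes are k-mers, edges are k-1 overlaps). The function names and the toy reads are made up for illustration. Real assemblers such as SPAdes and Velvet use far more compact data structures and also handle sequencing errors and reverse complements, which this sketch ignores.

```python
from collections import defaultdict

def kmers(read, k):
    """Yield every k-mer (substring of length k) in a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def build_de_bruijn(reads, k):
    """Map each k-mer node to the set of k-mers it overlaps by k-1 bases."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    # Index nodes by their (k-1)-base prefix so overlaps are found by lookup
    # rather than by comparing every k-mer with every other k-mer.
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)
    graph = {}
    for node in nodes:
        # Edge u -> v: the last k-1 bases of u equal the first k-1 bases of v.
        graph[node] = set(by_prefix.get(node[1:], ()))
    return graph

# Two hypothetical overlapping reads, for illustration only.
toy_reads = ["ATGGCGT", "GGCGTGC"]
graph = build_de_bruijn(toy_reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

An assembler then walks paths through such a graph to spell out contigs. The choice of k is a trade-off: small k-mers connect more reads but blur repeats, while large k-mers resolve repeats but need higher coverage.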
What are the practical steps for genome assembly?
1- Download sequence reads:
As a rule of thumb, the more data (sequencing coverage) you have, the better the assembly tends to be.
- Short reads (Illumina):
Reads generated by Illumina sequencing can be downloaded from the NCBI Sequence Read Archive (SRA). They are usually provided in FASTQ format.
You can download them with the command-line tool wget. For paired-end sequencing, the download produces 2 gzip-compressed (.gz) files: one for the forward reads and one for the reverse reads.
- Long reads (PacBio):
PacBio sequencing produces long reads, which make genome assembly easier because a single read can span a repetitive region.
You can download the dataset from the website https://lnkd.in/dMiY5StP.
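Once the paired-end files are downloaded, a quick sanity check is to confirm that the forward and reverse files contain the same reads in the same order. The sketch below is one way to do that in Python. It assumes plain 4-line FASTQ records, and the function names and file names are hypothetical.

```python
import gzip
from itertools import zip_longest

def read_fastq(path):
    """Yield (read_id, sequence, quality) from a gzip-compressed FASTQ file.

    Assumes the common 4-line record layout: header, sequence, '+', quality.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip()
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip()
            # Drop the leading '@' and any description after the first space.
            yield header[1:].split()[0], seq, qual

def base_name(read_id):
    """Strip an older-style '/1' or '/2' mate suffix from a read name."""
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id

def count_pairs(forward_path, reverse_path):
    """Count read pairs, raising ValueError if the two files are out of sync."""
    n = 0
    pairs = zip_longest(read_fastq(forward_path), read_fastq(reverse_path))
    for fwd, rev in pairs:
        if fwd is None or rev is None:
            raise ValueError("forward and reverse files have different read counts")
        if base_name(fwd[0]) != base_name(rev[0]):
            raise ValueError(f"read name mismatch at pair {n + 1}")
        n += 1
    return n

# Hypothetical usage (file names are placeholders):
# print(count_pairs("sample_1.fastq.gz", "sample_2.fastq.gz"))
```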
2- Evaluate the input reads:
- Run FastQC to view the quality scores of the sequence reads.
- Then trim low-quality bases and reads using Trimmomatic.
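To see what these quality scores mean, here is a small Python sketch that decodes a FASTQ quality string and cuts low-quality bases from the 3' end of a read. It assumes the Phred+33 encoding used by current Illumina data. The function names and the threshold of 20 are illustrative choices. This only shows the idea behind trailing-base trimming and is not how Trimmomatic is implemented.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_line]

def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)

def trim_3prime(seq, quality_line, min_q=20, offset=33):
    """Remove bases from the 3' end until one with score >= min_q is reached."""
    scores = phred_scores(quality_line, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality_line[:end]

# Hypothetical read: six high-quality bases followed by two poor ones.
seq, qual = "ACGTACGT", "IIIIII#!"
print(phred_scores(qual))
print(trim_3prime(seq, qual))
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base is wrong, and a score of 30 to 1 in 1000, which is why thresholds in that range are common.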
3- Assemble the genome using Minia, Velvet, SPAdes or any other suitable assembler. Each assembler has its own parameters.
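As an example of assembler-specific parameters, the sketch below builds a SPAdes command line for a paired-end run from Python. The flags used (-1, -2, -o, -k, -t) are SPAdes options, but the helper function, the file names and the k-mer sizes are hypothetical, and other assemblers take different arguments. The command is only printed here, not executed.

```python
import shlex

def spades_command(forward, reverse, out_dir, kmer_sizes=None, threads=None):
    """Build the argument list for a paired-end SPAdes run."""
    cmd = ["spades.py", "-1", forward, "-2", reverse, "-o", out_dir]
    if kmer_sizes:
        # SPAdes expects odd k-mer sizes, given as a comma-separated list.
        if any(k % 2 == 0 for k in kmer_sizes):
            raise ValueError("k-mer sizes must be odd")
        cmd += ["-k", ",".join(str(k) for k in sorted(kmer_sizes))]
    if threads:
        cmd += ["-t", str(threads)]
    return cmd

# Placeholder file names and settings, for illustration only.
cmd = spades_command("trimmed_1.fastq.gz", "trimmed_2.fastq.gz",
                     "spades_out", kmer_sizes=[21, 33, 55], threads=4)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```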
4- Check the quality of the assembly using QUAST.
Note that if a reference genome is available for the organism under study, you can compare the assembled genome to it by aligning the assembly against the reference with BWA and visualising the alignment in IGV.
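One of the metrics QUAST reports in step 4 is N50: the contig length such that contigs of that length or longer contain at least half of the assembled bases. Here is a minimal Python sketch of that calculation. The function name and the contig lengths are made up for illustration, and QUAST itself reports many more statistics.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    return 0

# Hypothetical assembly with five contigs (total 280 bases).
print(n50([100, 80, 50, 30, 20]))  # prints 80
```

A higher N50 generally means a more contiguous assembly, but it says nothing about correctness, which is where the reference comparison comes in.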
#Bioinformatics #genomeassembly