🧬✨ Bioinformatics Workflow of Next-Generation Sequencing (NGS) ✨🧬
#BioinfoTuesdays #NGS

NGS technologies, such as #Illumina and #IonTorrent, have unlocked the possibility of deeper #genome exploration by simultaneously analysing millions of genomic regions in multiple samples. However, to derive valuable insights from the raw data output, proper processing and subsequent analysis are required. Do you know the standard steps to process NGS raw data? Let's dive in! 🧬

🔍 Base Calling: The raw signal generated during sequencing is converted to nucleotide sequences by the instrument and stored in a #FastQ file, alongside their corresponding quality scores.
✂️ Quality Control: This step is crucial to ensure #high-quality data. Sequence quality, read length, and base composition, among other factors, are assessed. Adaptors added during library preparation are removed, and if necessary, low-quality sequences are trimmed, generating a trimmed FastQ file.
🎯 Read Alignment: The filtered reads (nucleotide sequences) are #aligned to a reference genome/transcriptome to identify the likely origin of the observed sequences. This information is recorded in a Sequence Alignment Map (SAM) file, which can be converted into a Binary Alignment Map (BAM) file to save space.
🔍 Variant Calling: By comparing aligned reads to the reference sequence, genetic variations such as single nucleotide polymorphisms (#SNPs) and insertions/deletions (#InDels) can be identified. The Variant Call Format (VCF) is commonly used to store these variant calls.
📝 Annotation: It's time to gather more #information about the identified variants, such as corresponding #genes, functional impact predictions, and allele #frequencies. Annotation data is often stored in tabular formats like Comma-Separated Values (CSV), Microsoft Excel (XLSX), or Tab-Separated Values (TSV), facilitating easy integration with other tools and databases.
📈 Downstream Analysis: Finally, we interpret the variant data, unveiling biological #insights. This includes statistical analyses, pathway enrichment studies, and genotype-phenotype #associations. Results are typically recorded in tabular formats for further interpretation and visualisation.

Are you interested in learning more about bioinformatics? Share your thoughts and experiences in the comments below! And don't forget to follow me for more insights into bioinformatics and genomics. 💻🔍

#bioinformatics #genetics #genomics #biology #biotechnology #lifesciences #dna #sequencing #ngs #alignment #sciencecommunication #bioinformatica #genetica #genomica #biologia #biotecnologia
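
For anyone who wants to see what the Base Calling and Quality Control steps look like in practice, here is a minimal Python sketch. It assumes the common Phred+33 quality encoding and a quality cutoff of 20, neither of which is stated in the post. The record is made up, and the helper names `phred_scores` and `trim_3prime` are hypothetical. Real pipelines rely on dedicated QC and trimming tools, so treat this as an illustration only.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(sequence, quality_line, min_quality=20):
    """Drop trailing bases whose Phred score is below min_quality (assumed cutoff)."""
    scores = phred_scores(quality_line)
    end = len(scores)
    # Walk back from the 3' end until a base passes the cutoff.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


# A made-up FASTQ record: header, sequence, separator, quality string.
record = ["@read1", "ACGTACGTAC", "+", "IIIIIIII#!"]
seq, qual = trim_3prime(record[1], record[3])
print(seq, qual)
```

Here the last two bases carry very low quality scores, so they are trimmed and the first eight bases are kept.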
Every day, I realize that what I know is little compared to what I should know. I am willing to learn more. Let me know the possibilities. Thanks
Interesting! Thanks for sharing 🙏
Interesting stuff!