Thassia Carratto’s Post

Bioinformatician | Genomics Data Scientist | NGS | Python | R | Computational Biologist

🧬✨ Bioinformatics Workflow of Next-Generation Sequencing (NGS) ✨🧬

#BioinfoTuesdays #NGS

NGS technologies, such as #Illumina and #IonTorrent, have opened the door to deeper #genome exploration by sequencing millions of DNA fragments in parallel, covering many genomic regions across multiple samples at once. However, turning the raw data output into valuable insights requires proper processing and subsequent analysis. Do you know the standard steps to process raw NGS data? Let's dive in! 🧬

🔍 Base Calling: The raw signal generated during sequencing (fluorescence on Illumina, pH changes on Ion Torrent) is converted into nucleotide sequences (reads) by the instrument's software and stored in #FASTQ files, together with a quality score for every base.

✂️ Quality Control: This step is crucial to ensure high-quality data. Per-base quality, read length, and base composition, among other metrics, are assessed. Adaptors added during library preparation are removed and, if necessary, low-quality bases are trimmed from the reads, producing a trimmed FASTQ file.

🎯 Read Alignment: The filtered reads are #aligned to a reference genome or transcriptome to identify the most likely origin of each sequence. The alignments are recorded in a Sequence Alignment/Map (SAM) file, which is usually converted into its compressed binary version, the BAM file, to save space.

🔍 Variant Calling: By comparing the aligned reads with the reference sequence, genetic variants such as single nucleotide polymorphisms (#SNPs) and insertions/deletions (#InDels) are identified. These variant calls are commonly stored in Variant Call Format (VCF) files.

📝 Annotation: Next, we gather more #information about each identified variant, such as the affected #genes, predicted functional impact, and population allele #frequencies. Annotations are often added to the VCF itself or exported to tabular formats such as Tab-Separated Values (TSV), Comma-Separated Values (CSV), or Microsoft Excel (XLSX), which makes them easy to integrate with other tools and databases.
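To make the FASTQ output of base calling more concrete, here is a minimal Python sketch (standard library only) that reads FASTQ records and decodes their quality strings into Phred scores. It assumes unwrapped four-line records and the Phred+33 encoding used by current Illumina pipelines; the function names are illustrative and not taken from any particular tool.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # read identifier (header line without the leading '@')
    seq: str   # base calls
    qual: str  # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII#5\n")
    for rec in read_fastq(demo):
        print(rec.name, phred_scores(rec.qual))  # read1 [40, 40, 2, 20]
```

A score of 20 means roughly a 1-in-100 chance that the base call is wrong, which is why Q20 and Q30 are common thresholds in QC reports from tools such as FastQC.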
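The Quality Control step (adaptor removal plus quality trimming) can be sketched in a few lines. This is a simplified illustration, not the algorithm of cutadapt, Trimmomatic or fastp: it only accepts exact adaptor matches (no mismatches) and trims low-quality bases from the 3' end. The adaptor shown is the start of the commonly used Illumina TruSeq adapter, and the threshold of Q20 is an assumed example value.

```python
from typing import Tuple

# Example adaptor: the start of the common Illumina TruSeq adapter sequence.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read at an exact adaptor match (no mismatches allowed).

    First looks for the full adaptor anywhere in the read; otherwise checks
    whether the read's 3' end is a prefix of the adaptor (partial read-through).
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Try the longest partial overlap first, down to min_overlap bases.
    for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            cut = len(seq) - overlap
            return seq[:cut], qual[:cut]
    return seq, qual


def trim_trailing_low_quality(seq: str, qual: str, threshold: int = 20,
                              offset: int = 33) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    read = "ACGTACGTTT" + ADAPTER + "AA"
    s, q = trim_adapter(read, "I" * len(read))
    print(s)  # ACGTACGTTT
    s, q = trim_trailing_low_quality("ACGTAC", "IIII##")
    print(s)  # ACGT
```

Adaptor removal runs first because adaptor bases can have perfectly good quality scores and would otherwise survive quality trimming.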
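For the Read Alignment step, this minimal sketch parses one line of a SAM file and uses the CIGAR string to work out where the alignment ends on the reference. It follows the SAM specification's 11 mandatory tab-separated columns and its rule that only the M, D, N, = and X operations consume reference bases; header lines starting with '@' are assumed to have been skipped already. For real BAM files, a library such as pysam or the samtools command-line tools would be the usual choice.

```python
import re
from typing import NamedTuple

# CIGAR operations, and the subset that advances along the reference.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    seq: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory columns of one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[9])


def is_unmapped(rec: SamRecord) -> bool:
    """FLAG bit 0x4 marks a read that did not align."""
    return bool(rec.flag & 0x4)


def reference_end(rec: SamRecord) -> int:
    """1-based inclusive end of the alignment on the reference.

    Insertions (I) and clipping (S, H) do not advance along the reference.
    """
    span = sum(int(n) for n, op in CIGAR_RE.findall(rec.cigar)
               if op in REF_CONSUMING)
    return rec.pos + span - 1


if __name__ == "__main__":
    line = "read1\t0\tchr1\t100\t60\t5M2I3M1D4M\t*\t0\t0\tACGTAAACGTACGT\tIIIIIIIIIIIIII"
    rec = parse_sam_line(line)
    print(rec.rname, rec.pos, reference_end(rec), is_unmapped(rec))
    # chr1 100 112 False
```

Here the 14-base read spans 13 reference bases (5M + 3M + 1D + 4M), because the 2I insertion exists only in the read.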
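Once a caller (for example GATK or bcftools) has written its VCF, each data line can be read and classified as in the Variant Calling step. The sketch below is a rough illustration: it reads only the first five VCF columns and classifies alleles by length, without the left-alignment and normalisation that real pipelines apply before comparing variants.

```python
from typing import List, NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int         # 1-based position of the first REF base
    ref: str
    alts: List[str]  # ALT may list several comma-separated alleles


def parse_vcf_line(line: str) -> Optional[Variant]:
    """Parse CHROM, POS, REF and ALT from a VCF data line; skip header lines."""
    if line.startswith("#"):
        return None  # '##' meta-information lines and the '#CHROM' header
    fields = line.rstrip("\n").split("\t")
    return Variant(fields[0], int(fields[1]), fields[3], fields[4].split(","))


def classify(ref: str, alt: str) -> str:
    """Rough variant class from allele lengths (no normalisation performed)."""
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"  # missing, spanning-deletion, symbolic or breakend allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "MNP"  # same-length multi-base substitution


if __name__ == "__main__":
    line = "chr1\t12345\t.\tA\tG,AT\t50\tPASS\t."
    v = parse_vcf_line(line)
    print(v.chrom, v.pos, [classify(v.ref, a) for a in v.alts])
    # chr1 12345 ['SNP', 'InDel']
```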
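The Annotation step often boils down to overlapping variant positions with known features and writing the result as a table. This sketch uses a small hypothetical gene table (the names and coordinates are placeholders, not real annotations) and writes a TSV with Python's csv module. Dedicated annotators such as VEP, SnpEff or ANNOVAR add far richer information, such as consequence predictions and population frequencies.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple

# Hypothetical gene intervals (chrom, start, end, name), 1-based inclusive.
# Placeholder values for illustration only, not real genome coordinates.
GENES: List[Tuple[str, int, int, str]] = [
    ("chr1", 1000, 5000, "GENE_A"),
    ("chr1", 4000, 9000, "GENE_B"),
    ("chr2", 200, 800, "GENE_C"),
]


def overlapping_genes(chrom: str, pos: int) -> List[str]:
    """Return names of genes whose interval contains the variant position."""
    return [name for c, start, end, name in GENES
            if c == chrom and start <= pos <= end]


def write_annotation_tsv(variants: Iterable[Tuple[str, int, str, str]],
                         out: TextIO) -> None:
    """Write one TSV row per variant with its overlapping genes ('.' if none)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "pos", "ref", "alt", "genes"])
    for chrom, pos, ref, alt in variants:
        genes = overlapping_genes(chrom, pos)
        writer.writerow([chrom, pos, ref, alt, ",".join(genes) or "."])


if __name__ == "__main__":
    buf = io.StringIO()
    write_annotation_tsv([("chr1", 4500, "A", "G"), ("chr2", 50, "C", "T")], buf)
    print(buf.getvalue(), end="")
```

The linear scan over GENES is fine for a toy table; for genome-scale annotation an interval index or a tool such as bedtools is the usual approach.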
📈 Downstream Analysis: Finally, we interpret the variant data, unveiling biological #insights. This includes statistical analyses, pathway enrichment studies, and genotype-phenotype #associations. Results are typically recorded in tabular formats for further interpretation and visualisation.

Are you interested in learning more about bioinformatics? Share your thoughts and experiences in the comments below! And don't forget to follow me for more insights into bioinformatics and genomics. 💻🔍

#bioinformatics #genetics #genomics #biology #biotechnology #lifesciences #dna #sequencing #ngs #alignment #sciencecommunication #bioinformatica #genetica #genomica #biologia #biotecnologia
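As a small taste of the genotype-phenotype association work mentioned under Downstream Analysis, this sketch computes an allelic odds ratio for one variant from VCF-style genotype strings in cases and controls. It is a simplified illustration under stated assumptions: missing alleles are skipped and all non-reference alleles are pooled as "alt". A real study would add a significance test (for example scipy.stats.fisher_exact on the same 2x2 allele counts), covariates and multiple-testing correction.

```python
import re
from typing import Iterable, Optional, Tuple


def allele_counts(genotypes: Iterable[str]) -> Tuple[int, int]:
    """Count reference ('0') and non-reference alleles in GT strings like '0/1'.

    Missing alleles ('.') are skipped; all non-zero alleles are pooled as
    'alt', which is a simplification for multi-allelic sites.
    """
    ref = alt = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):  # unphased '/' or phased '|'
            if allele == ".":
                continue
            if allele == "0":
                ref += 1
            else:
                alt += 1
    return ref, alt


def allelic_odds_ratio(case_gts: Iterable[str],
                       control_gts: Iterable[str]) -> Optional[float]:
    """OR = (case_alt * control_ref) / (case_ref * control_alt).

    Returns None when the denominator is zero and the ratio is undefined.
    """
    case_ref, case_alt = allele_counts(case_gts)
    ctrl_ref, ctrl_alt = allele_counts(control_gts)
    denominator = case_ref * ctrl_alt
    if denominator == 0:
        return None
    return (case_alt * ctrl_ref) / denominator


if __name__ == "__main__":
    cases = ["0/1", "1/1", "0|1"]
    controls = ["0/0", "0/0", "0/1", "./."]
    print(allelic_odds_ratio(cases, controls))  # 10.0
```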

Henry (Harry) Kemp

ML Ecosystem Consultant - Driving Business Growth

7mo

Interesting stuff!

Olalekan Kemiki

MANAGER AT BABCOCK MOLECULAR AND TISSUE CULTURE LABORATORY

6mo

Every day, I realize how little I know compared with what I should know. I am willing to learn more. Let me know the possibilities. Thanks!

Matakone Moise

MSc. in Clinical Biology ||Looking for PhD scholarship || AMR and One Health Approach enthusiast

6mo

Interesting! thanks for sharing 🙏
