ScaiDigest Volume 6: Variational autoencoders (VAEs) in biology
Variational autoencoders (VAEs) are driving a mini-revolution in single-cell data analysis. Today, they are used in applications ranging from the integration of multimodal data (as in tools like multiVI and totalVI) to data augmentation for training neural networks (as in CeLEry).
Generally, these tools build upon the foundations laid by methods like scVI, which uses deep neural networks within a Bayesian framework to enable the probabilistic modeling of single-cell transcriptomics data. Developed by the same group behind scVI, totalVI is a framework for the end-to-end joint analysis of multi-omic CITE-seq data, using a probabilistic model to represent the data as a composite of biological and technical (e.g. batch) factors. It handles a diverse array of tasks common in single-cell analysis, such as dimensionality reduction, dataset integration, and differential expression testing, thereby offering a comprehensive solution for analyzing single-cell multi-omic data.
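To make the idea of "neural networks within a Bayesian framework" more concrete, here is a minimal sketch of the objective a count-based VAE optimizes: a reconstruction term minus a KL penalty (the evidence lower bound, or ELBO). It is an illustration under stated assumptions, not scVI's implementation. The function names, the tiny linear "decoder", the plain negative binomial likelihood, and the use of the observed total count as library size are all choices made for this example; scVI itself uses trained neural networks and its own likelihood choices.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def nb_log_likelihood(x, mean, theta):
    """Log-probability of count x under a negative binomial
    parameterized by its mean and inverse dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mean))
        + x * math.log(mean / (theta + mean))
    )


def softmax(values):
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode(z, weights, library_size):
    """Hypothetical linear decoder: latent z -> expected count per gene.
    A real model would use a trained neural network here."""
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in weights]
    return [library_size * p for p in softmax(logits)]


def elbo_single_sample(counts, mu, log_var, weights, theta, rng):
    """One-sample Monte Carlo estimate of the ELBO for a single cell."""
    z = reparameterize(mu, log_var, rng)
    # Assumption for this sketch: library size is the observed total count.
    means = decode(z, weights, library_size=sum(counts))
    reconstruction = sum(nb_log_likelihood(x, m, theta) for x, m in zip(counts, means))
    return reconstruction - kl_to_standard_normal(mu, log_var)


# Toy cell with 4 genes and a 2-dimensional latent space (made-up numbers).
rng = random.Random(0)
counts = [3, 0, 12, 1]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8], [0.0, 0.1]]
value = elbo_single_sample(
    counts, mu=[0.2, -0.1], log_var=[-1.0, -0.5], weights=weights, theta=2.0, rng=rng
)
print(value)
```

In a trained model, `mu` and `log_var` would come from an encoder network applied to the cell's counts, and training would adjust the encoder and decoder to maximize this quantity averaged over cells.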
A recent example of a novel tool leveraging the unique capabilities of variational autoencoders is scPoli, a semi-supervised conditional VAE that can learn sample and cell representations of single-cell data, for use in data integration, label transfer, and atlas mapping. scPoli can also perform multi-scale analysis, offering insights into both cell-level and sample-level data representations, which is crucial for going beyond single-cell integration and actually performing sample annotation.
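As an illustration of what label transfer in a learned latent space can look like, the sketch below assigns each unlabeled query cell the label of the nearest class "prototype" (the mean latent vector of the reference cells carrying that label). This is a generic nearest-prototype classifier written for this example; the function names and toy coordinates are hypothetical, and it is not scPoli's API or a faithful reproduction of its training procedure.

```python
import math
from collections import defaultdict


def compute_prototypes(latents, labels):
    """Average the latent vectors of the reference cells for each label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(latents, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def transfer_label(query, prototypes):
    """Return the label whose prototype is closest (Euclidean) to the query cell."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-D latent coordinates for labeled reference cells (made-up numbers).
reference_latents = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.2, 2.9]]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]

prototypes = compute_prototypes(reference_latents, reference_labels)
predicted = transfer_label([0.1, 0.3], prototypes)
print(predicted)
```

In practice the latent coordinates would come from the trained model's encoder, and a distance threshold could be added so that query cells far from every prototype are flagged as unknown rather than forced into a reference label.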
Tools such as flowVI (flow cytometry), scVI-3D (single-cell Hi-C), and SIMVI (spatial omics) have extended the application of these techniques to other common types of single-cell biological data as well. flowVI, in particular, allows for both the imputation of missing data and the integration of multiple datasets.
VAE-based tools are thus playing a major role in advancing single-cell analysis: they enable comprehensive and nuanced interpretations of cellular heterogeneity and function while simultaneously accounting for technical effects. This puts them in a position to significantly advance our understanding of complex biological systems.