Catalogue Search | MBRL

Assessing single-cell transcriptomic variability through density-preserving data visualization

by Cho, Hyunghoon; Narayan, Ashwin; Berger, Bonnie in 631/114, 631/114/2164, 631/337/2019

2021

Nonlinear data visualization methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), summarize the complex transcriptomic landscape of single cells in two or three dimensions, but they neglect the local density of data points in the original space, often resulting in misleading visualizations where densely populated subsets of cells are given more visual space than warranted by their transcriptional diversity in the dataset. Here we present den-SNE and densMAP, which are density-preserving visualization tools based on t-SNE and UMAP, respectively, and demonstrate their ability to accurately incorporate information about transcriptomic variability into the visual interpretation of single-cell RNA sequencing data. Applied to recently published datasets, our methods reveal significant changes in transcriptomic variability in a range of biological processes, including heterogeneity in transcriptomic variability of immune cells in blood and tumor, human immune cell specialization and the developmental trajectory of Caenorhabditis elegans. Our methods are readily applicable to visualizing high-dimensional data in other scientific domains. den-SNE and densMAP enhance single-cell transcriptomic data visualization by incorporating density information.

Journal Article

Emerging technologies towards enhancing privacy in genomic data sharing

by Cho, Hyunghoon; Berger, Bonnie in Access control, Animal Genetics and Genomics, Bioinformatics

2019

As the scale of genomic and health-related data explodes and our understanding of these data matures, the privacy of the individuals behind the data is increasingly at stake. Traditional approaches to protect privacy have fundamental limitations. Here we discuss emerging privacy-enhancing technologies that can enable broader data sharing and collaboration in genomics research.

Journal Article

CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks

by Zhong, Ellen D; Bepler, Tristan; Davis, Joseph H in Algorithms, Artificial neural networks, Datasets

2021

Cryo-electron microscopy (cryo-EM) single-particle analysis has proven powerful in determining the structures of rigid macromolecules. However, many imaged protein complexes exhibit conformational and compositional heterogeneity that poses a major challenge to existing three-dimensional reconstruction methods. Here, we present cryoDRGN, an algorithm that leverages the representation power of deep neural networks to directly reconstruct continuous distributions of 3D density maps and map per-particle heterogeneity of single-particle cryo-EM datasets. Using cryoDRGN, we uncovered residual heterogeneity in high-resolution datasets of the 80S ribosome and the RAG complex, revealed a new structural state of the assembling 50S ribosome, and visualized large-scale continuous motions of a spliceosome complex. CryoDRGN contains interactive tools to visualize a dataset’s distribution of per-particle variability, generate density maps for exploratory analysis, extract particle subsets for use with other tools and generate trajectories to visualize molecular motions. CryoDRGN is open-source software freely available at http://cryodrgn.csail.mit.edu. CryoDRGN is an unsupervised machine learning algorithm that reconstructs continuous distributions of three-dimensional density maps from heterogeneous single-particle cryo-EM data.

Journal Article

Topaz-Denoise: general deep denoising models for cryoEM and cryoET

by Noble, Alex J.; Kelley, Kotaro; Bepler, Tristan in 101/28, 631/114/1305, 631/535/1258/1259

2020

Cryo-electron microscopy (cryoEM) is becoming the preferred method for resolving protein structures. Low signal-to-noise ratio (SNR) in cryoEM images reduces the confidence and throughput of structure determination during several steps of data processing, resulting in impediments such as missing particle orientations. Denoising cryoEM images can not only improve downstream analysis but also accelerate the time-consuming data collection process by allowing lower electron dose micrographs to be used for analysis. Here, we present Topaz-Denoise, a deep learning method for reliably and rapidly increasing the SNR of cryoEM images and cryoET tomograms. By training on a dataset composed of thousands of micrographs collected across a wide range of imaging conditions, we are able to learn models capturing the complexity of the cryoEM image formation process. The general model we present is able to denoise new datasets without additional training. Denoising with this model improves micrograph interpretability and allows us to solve 3D single particle structures of clustered protocadherin, an elongated particle with previously elusive views. We then show that low dose collection, enabled by Topaz-Denoise, improves downstream analysis in addition to reducing data collection time. We also present a general 3D denoising model for cryoET. Topaz-Denoise and pre-trained general models are now included in Topaz. We expect that Topaz-Denoise will be of broad utility to the cryoEM community for improving micrograph and tomogram interpretability and accelerating analysis. The low signal-to-noise ratio (SNR) in cryoEM images can make the first steps in cryoEM structure determination challenging, particularly for non-globular and small proteins. Here, the authors present Topaz-Denoise, a deep learning based method for micrograph denoising that significantly increases the SNR of cryoEM images and cryoET tomograms, which helps to accelerate the cryoEM pipeline.

Journal Article

SCA: recovering single-cell heterogeneity through information-based dimensionality reduction

by DeMeo, Benjamin; Berger, Bonnie in Animal Genetics and Genomics, Binomial distribution, Bioinformatics

2023

Dimensionality reduction summarizes the complex transcriptomic landscape of single-cell datasets for downstream analyses. Current approaches favor large cellular populations defined by many genes, at the expense of smaller and more subtly defined populations. Here, we present surprisal component analysis (SCA), a technique that newly leverages the information-theoretic notion of surprisal for dimensionality reduction to promote more meaningful signal extraction. For example, SCA uncovers clinically important cytotoxic T-cell subpopulations that are indistinguishable using existing pipelines. We also demonstrate that SCA substantially improves downstream imputation. SCA’s efficient information-theoretic paradigm has broad applications to the study of complex biological tissues in health and disease.

Journal Article

Global alignment of multiple protein interaction networks with application to functional orthology detection

by Singh, Rohit; Xu, Jinbo; Berger, Bonnie in Algorithms, Animals, Blasts

2008

Protein-protein interactions (PPIs) and their networks play a central role in all biological processes. Akin to the complete sequencing of genomes and their comparative analysis, complete descriptions of interactomes and their comparative analysis are fundamental to a deeper understanding of biological processes. A first step in such an analysis is to align two or more PPI networks. Here, we introduce an algorithm, IsoRank, for global alignment of multiple PPI networks. The guiding intuition here is that a protein in one PPI network is a good match for a protein in another network if their respective sequences and neighborhood topologies are a good match. We encode this intuition as an eigenvalue problem in a manner analogous to Google's PageRank method. Using IsoRank, we compute a global alignment of the Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Homo sapiens PPI networks. We demonstrate that incorporating PPI data in ortholog prediction results in improvements over existing sequence-only approaches and over predictions from local alignments of the yeast and fly networks. Previous methods have been effective at identifying conserved, localized network patterns across pairs of networks. This work takes the further step of performing a global alignment of multiple PPI networks. It simultaneously uses sequence similarity and network data and, unlike previous approaches, explicitly models the tradeoff inherent in combining them. We expect IsoRank, with its simultaneous handling of node similarity and network similarity, to be applicable across many scientific domains.
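The PageRank-style intuition in this abstract can be illustrated with a short sketch for two networks. It is not the authors' implementation: the function name `isorank_scores`, the default `alpha`, the normalization and the stopping rule are all assumptions chosen for illustration, and the step that turns scores into an actual alignment is left out. The score of a pair (i, j) is a weighted mix of the scores of its neighbor pairs and a sequence-similarity prior, iterated to a fixed point.

```python
import numpy as np

def isorank_scores(adj1, adj2, seq_sim, alpha=0.6, iters=200, tol=1e-10):
    """Illustrative pairwise match scores R (n1 x n2) by power iteration.

    adj1, adj2 : symmetric 0/1 adjacency matrices of the two networks.
    seq_sim    : non-negative sequence-similarity matrix (n1 x n2).
    alpha      : assumed weight on network topology versus sequence.
    """
    adj1 = np.asarray(adj1, dtype=float)
    adj2 = np.asarray(adj2, dtype=float)
    # Each node spreads its score equally over its neighbours,
    # so divide every column by that node's degree.
    deg1 = adj1.sum(axis=0)
    deg2 = adj2.sum(axis=0)
    p1 = adj1 / np.where(deg1 > 0, deg1, 1.0)
    p2 = adj2 / np.where(deg2 > 0, deg2, 1.0)
    # Sequence prior, scaled to sum to one.
    e = np.asarray(seq_sim, dtype=float)
    e = e / e.sum()
    r = np.full_like(e, 1.0 / e.size)
    for _ in range(iters):
        # Neighbour-pair contribution plus sequence prior.
        new = alpha * (p1 @ r @ p2.T) + (1 - alpha) * e
        new /= new.sum()
        converged = np.abs(new - r).max() < tol
        r = new
        if converged:
            break
    return r

if __name__ == "__main__":
    # Two identical three-node paths with an uninformative sequence prior.
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(isorank_scores(path, path, np.ones((3, 3))))
```

With the two identical paths above, the pair of central nodes receives the highest score, which matches the intuition that nodes with similar neighborhoods should be matched.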

Journal Article

Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing

by Shajii, Ariya; Cho, Hyunghoon; Berger, Bonnie in Animal Genetics and Genomics, Bioinformatics, Biomedical and Life Sciences

2023

Secure multiparty computation (MPC) is a cryptographic tool that allows computation on sensitive biomedical data without revealing private information to the involved entities. Here, we introduce Sequre, an easy-to-use, high-performance framework for developing MPC applications. Sequre offers a set of automatic compile-time optimizations that significantly improve the performance of MPC applications and incorporates the syntax of the Python programming language to facilitate rapid application development. We demonstrate its usability and performance on various bioinformatics tasks, showing up to 3–4 times increased speed over the existing pipelines with 7-fold reductions in codebase sizes.
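As a rough illustration of how parties can compute on data without seeing it, the sketch below uses additive secret sharing, one common building block of MPC. It is a generic textbook construction in plain Python and does not show Sequre's API or its optimizations; the modulus `PRIME` and the helper names `share`, `add_shared` and `reconstruct` are hypothetical.

```python
import secrets

# Assumed modulus for the illustration; all arithmetic is done mod PRIME.
PRIME = 2**61 - 1

def share(value, n_parties=3):
    """Split value into n_parties random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no values are revealed."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]

def reconstruct(shares):
    """Combine all shares to recover the underlying value."""
    return sum(shares) % PRIME

if __name__ == "__main__":
    total = add_shared(share(12), share(30))
    print(reconstruct(total))  # prints 42
```

Any single share is a uniformly random number, so a party holding one share learns nothing about the input; only the combined result is opened at the end.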

Journal Article

Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities

by Narayan, Ashwin; Hie, Brian L.; Singh, Rohit in Agreements, Animal Genetics and Genomics, Bioinformatics

2021

A complete understanding of biological processes requires synthesizing information across heterogeneous modalities, such as age, disease status, or gene expression. Technological advances in single-cell profiling have enabled researchers to assay multiple modalities simultaneously. We present Schema, which uses a principled metric learning strategy that identifies informative features in a modality to synthesize disparate modalities into a single coherent interpretation. We use Schema to infer cell types by integrating gene expression and chromatin accessibility data; demonstrate informative data visualizations that synthesize multiple modalities; perform differential gene expression analysis in the context of spatial variability; and estimate evolutionary pressure on peptide sequences.

Journal Article

An integrative approach to ortholog prediction for disease-focused and other functional studies

by Bergwitz, Clemens; Hu, Yanhui; Vinayagam, Arunachalam in Algorithms, Amino acids, Animals

2011

Background: Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. Results: We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). Conclusions: DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.

Journal Article

Carnelian uncovers hidden functional patterns across diverse study populations from whole metagenome sequencing reads

by Nazeen, Sumaiya; Berger, Bonnie; Yu, Yun William in Alignment-free binning, Animal Genetics and Genomics, Annotations

2020

Microbial populations exhibit functional changes in response to different ambient environments. Although whole metagenome sequencing promises enough raw data to study those changes, existing tools are limited in their ability to directly compare microbial metabolic function across samples and studies. We introduce Carnelian, an end-to-end pipeline for metabolic functional profiling uniquely suited to finding functional trends across diverse datasets. Carnelian can find shared metabolic pathways and concordant functional dysbioses, and can distinguish Enzyme Commission (EC) terms missed by existing methodologies. We demonstrate Carnelian’s effectiveness on type 2 diabetes, Crohn’s disease, Parkinson’s disease, and industrialized and non-industrialized gut microbiome cohorts.

Journal Article
