Catalogue Search | MBRL

Universal annotation of the human genome through integration of over a thousand epigenomic datasets

by Vu, Ha; Ernst, Jason in Animal Genetics and Genomics, Annotations, Bioinformatics

2022

Background: Genome-wide maps of chromatin marks, such as histone modifications and open chromatin sites, provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative “stacked modeling” approach was previously suggested, in which chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, this approach had previously been applied only for small-scale, specialized purposes. Large-scale applications of stacked modeling had posed scalability challenges.

Results: Using a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome from over 1000 datasets covering more than 100 cell types; we denote the learned model the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use to characterize each state. Compared with per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell-type-specific activity and are more predictive of the locations of external genomic annotations.

Conclusions: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect it to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome.
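
The central idea of stacked modeling is that every (cell type, mark) dataset becomes one more feature of a single model. In the per-cell-type approach, by contrast, each cell type gets its own model. The following is a minimal Python sketch of only this data-preparation step. It assumes binarized (0/1) calls per genomic bin, as ChromHMM uses. The cell-type and mark names and the `stack_cell_types` helper are hypothetical, and model training itself is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: cell type -> mark -> 0/1 call for each genomic bin.
BinarizedData = Dict[str, Dict[str, List[int]]]


def stack_cell_types(data: BinarizedData) -> Tuple[List[str], List[List[int]]]:
    """Concatenate every (cell type, mark) track into one feature vector per bin.

    A single model trained on these stacked vectors would assign one
    "universal" state per bin instead of one state per bin per cell type.
    """
    labels: List[str] = []
    tracks: List[List[int]] = []
    for cell_type in sorted(data):
        for mark in sorted(data[cell_type]):
            labels.append(f"{cell_type}:{mark}")
            tracks.append(data[cell_type][mark])
    if not tracks:
        raise ValueError("no tracks supplied")
    n_bins = len(tracks[0])
    if any(len(track) != n_bins for track in tracks):
        raise ValueError("all tracks must cover the same genomic bins")
    # Transpose: row i holds every dataset's call at bin i.
    matrix = [[track[i] for track in tracks] for i in range(n_bins)]
    return labels, matrix


example = {
    "K562": {"H3K4me3": [1, 0, 0], "H3K27ac": [1, 1, 0]},
    "HepG2": {"H3K4me3": [1, 0, 1]},
}
labels, matrix = stack_cell_types(example)
print(labels)  # ['HepG2:H3K4me3', 'K562:H3K27ac', 'K562:H3K4me3']
print(matrix)  # [[1, 1, 1], [0, 1, 0], [1, 0, 0]]
```

Each row is then one multi-dimensional observation. With over 1000 datasets, every bin carries over 1000 binary features, which hints at why scaling the model was a concern.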

Journal Article

ChromHMM: automating chromatin-state discovery and characterization

by Kellis, Manolis; Ernst, Jason in 631/1647/2210/2211, 631/1647/48, Algorithms

2012

ChromHMM outputs both the learned chromatin-state model parameters and the chromatin-state assignment for each genomic position. The learned emission and transition parameters are returned in both text and image formats (Fig. 1). Chromatin states are automatically ordered so that states with similar emission parameters or proximal genomic locations are grouped together, although a user-specified ordering can also be used (Supplementary Figs. 12 and Supplementary Note). ChromHMM supports studying the likely biological role of each chromatin state through its enrichment in diverse external annotations and experimental data. These enrichments are shown as heat maps and tables (Fig. 1), both for direct genomic overlap and at various distances from a chromatin state.
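
Overlap enrichment of this kind is commonly expressed as a fold enrichment. It is the fraction of a state's bins covered by an annotation, divided by the fraction of all bins covered by that annotation. The following minimal Python sketch illustrates the ratio only and is not ChromHMM's own implementation. It assumes per-bin state labels and a per-bin boolean annotation, both of which are hypothetical inputs here.

```python
import math
from typing import Sequence


def fold_enrichment(states: Sequence[str], annotated: Sequence[bool], state: str) -> float:
    """Fold enrichment of one chromatin state for a binary external annotation.

    Ratio of the fraction of the state's bins that carry the annotation to the
    fraction of all bins that carry it; 1.0 means no enrichment.
    """
    if len(states) != len(annotated):
        raise ValueError("states and annotation must cover the same bins")
    n_bins = len(states)
    in_state = sum(1 for s in states if s == state)
    in_annotation = sum(1 for a in annotated if a)
    overlap = sum(1 for s, a in zip(states, annotated) if s == state and a)
    if in_state == 0 or in_annotation == 0:
        return math.nan  # undefined when either set is empty
    return (overlap / in_state) / (in_annotation / n_bins)


# Hypothetical per-bin state labels and annotation.
states = ["E1", "E1", "E2", "E2", "E2", "E1", "E3", "E3"]
cpg_island = [True, True, False, False, True, False, False, False]
for s in ("E1", "E2", "E3"):
    print(s, round(fold_enrichment(states, cpg_island, s), 3))
```

In this toy input, E1 comes out at 16/9 ≈ 1.78 (enriched), E2 at 8/9 ≈ 0.89, and E3 at 0.0. The annotation name `cpg_island` is illustrative only.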

Journal Article

Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues

by Kellis, Manolis; Ernst, Jason in 631/114/2398, 631/61/212/177, Agriculture

2015

Large-scale epigenomic profiles are predicted from experimental data using multiple regression tree models. With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.
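
The following is a deliberately simplified Python sketch of the ensemble idea. It uses depth-one regression trees (stumps) bagged over bootstrap resamples, standing in for the full regression trees ChromImpute trains. The features are assumed to be signal values at the same genomic position from related datasets, for example other marks in the target sample or the target mark in other samples. The data, the function names, and the bagging scheme are hypothetical choices, not ChromImpute's actual feature set or training procedure.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

# (feature index, threshold, prediction if x[j] <= threshold, prediction otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[float]) -> Stump:
    """Fit a depth-one regression tree by minimizing squared error."""
    best: Stump = (0, float("inf"), mean(y), mean(y))  # fallback: constant prediction
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = mean(left), mean(right)
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_bagged_stumps(X, y, n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap resample of the training positions."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def impute(stumps: List[Stump], x: Sequence[float]) -> float:
    """Imputed signal = average prediction over the ensemble."""
    return mean(predict_stump(s, x) for s in stumps)


# Hypothetical training data: rows are positions, columns are related datasets.
X = [[0.0, 5.0], [0.1, 3.0], [0.9, 4.0], [1.0, 6.0]]
y = [0.0, 0.0, 10.0, 10.0]
model = fit_bagged_stumps(X, y, n_trees=25, seed=1)
print(impute(model, [1.0, 5.0]), impute(model, [0.0, 5.0]))
```

Averaging many weak trees smooths out the error of any single split. This is the general motivation for an ensemble over a single tree.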

Journal Article

Discovery and characterization of chromatin states for systematic annotation of the human genome

by Kellis, Manolis; Ernst, Jason in 631/114, 631/1647/2210/2211, 631/61/212/177

2010

Which of the possible combinations of epigenetic marks have biological significance is a major question in epigenetics. Analyzing data from human T cells, Ernst and Kellis discover 51 distinct, recurring combinations of histone modifications that can be correlated with the functional annotations of the underlying DNA sequences. A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal 'chromatin states' in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.
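
Assigning chromatin states with a multivariate HMM can be sketched with Viterbi decoding. Viterbi is used here for simplicity; per-bin posterior decoding is a common alternative, and the paper's decoding choice is not stated above. The sketch assumes binarized marks that are independent given the state, a product of Bernoulli emissions following ChromHMM's binarized design. It uses made-up two-state parameters, whereas the model described above has 51 learned states. Parameter learning is not shown.

```python
import math
from typing import List, Sequence


def viterbi_bernoulli(
    obs: Sequence[Sequence[int]],
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> List[int]:
    """Most likely state path for binarized marks (all probabilities must be > 0).

    obs[i][m]   : 1 if mark m is present in bin i, else 0
    init[k]     : probability of starting in state k
    trans[k][t] : probability of moving from state k to state t
    emit[k][m]  : probability that mark m is present in state k
    """
    n_states = len(init)

    def log_emit(k: int, x: Sequence[int]) -> float:
        # Marks are independent given the state: sum of Bernoulli log-probabilities.
        return sum(math.log(p if xi else 1.0 - p) for p, xi in zip(emit[k], x))

    scores = [math.log(init[k]) + log_emit(k, obs[0]) for k in range(n_states)]
    backpointers: List[List[int]] = []
    for x in obs[1:]:
        pointers, new_scores = [], []
        for target in range(n_states):
            # Best predecessor state for entering `target` at this bin.
            k_best = max(range(n_states), key=lambda k: scores[k] + math.log(trans[k][target]))
            pointers.append(k_best)
            new_scores.append(scores[k_best] + math.log(trans[k_best][target]) + log_emit(target, x))
        backpointers.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: scores[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical: state 0 favors mark 0 (promoter-like), state 1 favors mark 1.
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.1, 0.9]]
observations = [[1, 0], [1, 0], [0, 1], [0, 1]]
path = viterbi_bernoulli(observations, init, trans, emit)
print(path)  # [0, 0, 1, 1]
```

The decoded path switches state once, at the bin where the observed mark combination changes.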

Journal Article

Universal chromatin state annotation of the mouse genome

by Vu, Ha; Ernst, Jason in Animal Genetics and Genomics, Animals, Annotations

2023

A previous large-scale application of the “stacked modeling” approach for chromatin state discovery provided a single “universal” chromatin state annotation of the human genome, based jointly on data from many cell and tissue types. Here, we produce an analogous chromatin state annotation for mouse based on 901 datasets assaying 14 chromatin marks in 26 cell or tissue types. To characterize each chromatin state, we relate the states to external annotations and compare them to analogously defined human states. We expect the universal chromatin state annotation for mouse to be a useful resource for studying the genome of this key model organism.

Journal Article

ChromActivity: integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types

by Dincer, Tevfik Umut; Ernst, Jason in Animal Genetics and Genomics, Annotations, Bioinformatics

2025

We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through the integration of multiple epigenomic maps and various functional characterization datasets. Based on available epigenomic data, ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types. For each cell type, it then produces two outputs. The first is a set of ChromScoreHMM genome annotations, based on the combinatorial and spatial patterns within these predictions. The second is a set of ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.

Journal Article

Identifying associations of de novo noncoding variants with autism through integration of gene expression, sequence, and sex information

by Ernst, Jason; Li, Runjia in Animal Genetics and Genomics, Autism, Autism Spectrum Disorder - genetics

2025

Background: Whole-genome sequencing (WGS) data have facilitated the genome-wide identification of rare noncoding variants. However, elucidating these variants' associations with complex diseases remains challenging. A previous study used a deep-learning-based framework and reported a significant brain-related association signal for autism spectrum disorder (ASD). This signal was detected from de novo noncoding variants in the Simons Simplex Collection (SSC) WGS cohort.

Results: We revisit the reported significant brain-related ASD association signal attributed to deep learning, and we show that local GC content can capture similar association signals. We further show that the association signal appears to be driven by variants from male proband-female sibling pairs that are upstream of their assigned genes. We then develop Expression Neighborhood Sequence Association Study (ENSAS), which uses gene expression correlations and sequence information to identify phenotype-associated variant sets more systematically. Applying ENSAS to the same set of de novo variants, we identify gene expression-based neighborhoods that show a significant ASD association signal and are enriched for synapse-related Gene Ontology terms. For these top neighborhoods, we also identify chromatin state annotations of variants that are predictive of proband-sibling differences in local GC content.

Conclusions: Overall, our work simplifies a previously reported ASD signal and provides new insights into associations of noncoding de novo mutations with ASD. We also present a new analytical framework for understanding the disease impact of de novo mutations, which is applicable to other phenotypes.
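
Local GC content is simple to compute, and the study reports that it captures an association signal similar to the deep-learning one. The following minimal Python sketch assumes 0-based variant positions on a reference string and a hypothetical symmetric window of `flank` bases on each side. The window size used in the study is not stated here, and the helper names are illustrative. The difference of means is only a simplified stand-in for the proband-sibling comparison.

```python
import math
from statistics import mean
from typing import Sequence


def local_gc_content(sequence: str, position: int, flank: int) -> float:
    """GC fraction of the window [position - flank, position + flank] (0-based),
    clipped at the sequence ends and ignoring non-ACGT characters such as N."""
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    bases = [b for b in sequence[start:end].upper() if b in "ACGT"]
    if not bases:
        return math.nan  # no informative bases in the window
    return sum(b in "GC" for b in bases) / len(bases)


def proband_minus_sibling_gc(sequence: str, proband_positions: Sequence[int],
                             sibling_positions: Sequence[int], flank: int) -> float:
    """Mean local GC around proband variants minus that around sibling variants."""
    proband = mean(local_gc_content(sequence, p, flank) for p in proband_positions)
    sibling = mean(local_gc_content(sequence, p, flank) for p in sibling_positions)
    return proband - sibling


reference = "AAAAGGGGCCCCTTTT"  # toy sequence, not real genome data
print(local_gc_content(reference, 6, 2))  # 1.0
print(proband_minus_sibling_gc(reference, [6], [1], 2))  # 1.0
```

A window made up entirely of N bases yields NaN. That NaN propagates into the means, so such variants would need to be filtered out first.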

Journal Article

ChromGene: gene-based modeling of epigenomic data

by Jaroszewicz, Artur; Ernst, Jason in Animal Genetics and Genomics, Annotations, Bioinformatics

2023

Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses. We present ChromGene, a method based on a mixture of learned hidden Markov models, to annotate genes based on multiple epigenomic maps across the gene body and flanks. We provide ChromGene assignments for over 100 cell and tissue types. We characterize the mixture components in terms of gene expression, constraint, and other gene annotations. The ChromGene method and annotations will provide a useful resource for gene-based epigenomic analyses.
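
In a mixture of HMMs, one gene's observations (bins across the gene body and flanks) can be scored under every component HMM with the forward algorithm. The gene can then be assigned to the component with the highest posterior probability. The following minimal Python sketch works under those assumptions, with independent Bernoulli marks and strictly positive probabilities. The component parameters, mixture weights, and function names are hypothetical, and ChromGene's actual component structure and training are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component = (initial probs, transition matrix, emission probs).
HMM = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs: Sequence[Sequence[int]], hmm: HMM) -> float:
    """log P(obs | hmm) via the forward algorithm with independent Bernoulli marks."""
    init, trans, emit = hmm
    n_states = len(init)

    def log_emit(k: int, x: Sequence[int]) -> float:
        return sum(math.log(p if xi else 1.0 - p) for p, xi in zip(emit[k], x))

    alpha = [math.log(init[k]) + log_emit(k, obs[0]) for k in range(n_states)]
    for x in obs[1:]:
        # alpha[j] = log P(obs up to this bin, state j at this bin)
        alpha = [
            logsumexp([alpha[k] + math.log(trans[k][j]) for k in range(n_states)]) + log_emit(j, x)
            for j in range(n_states)
        ]
    return logsumexp(alpha)


def component_posteriors(obs: Sequence[Sequence[int]], weights: Sequence[float],
                         components: Sequence[HMM]) -> List[float]:
    """P(component c | one gene's observations) for each mixture component."""
    log_joint = [math.log(w) + forward_loglik(obs, hmm) for w, hmm in zip(weights, components)]
    total = logsumexp(log_joint)
    return [math.exp(v - total) for v in log_joint]


# Hypothetical one-mark, one-state components: mark usually present vs usually absent.
active_like: HMM = ([1.0], [[1.0]], [[0.9]])
silent_like: HMM = ([1.0], [[1.0]], [[0.1]])
gene_bins = [[1], [1], [1]]
posteriors = component_posteriors(gene_bins, [0.5, 0.5], [active_like, silent_like])
print(posteriors)  # first component gets ~0.9986
```

Working in log space with `logsumexp` avoids numerical underflow when a gene spans many bins.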

Journal Article

Learning a genome-wide score of human–mouse conservation at the functional genomics level

by Kwon, Soo Bin; Ernst, Jason in 631/114, 631/208/177, 631/208/212/748

2021

Identifying genomic regions with functional genomic properties that are conserved between human and mouse is an important challenge in the context of mouse model studies. To address this, we develop a method to learn a score of evidence of conservation at the functional genomics level by integrating information from a compendium of epigenomic, transcription factor binding, and transcriptomic data from human and mouse. The method, Learning Evidence of Conservation from Integrated Functional genomic annotations (LECIF), trains neural networks to generate this score for the human and mouse genomes. The resulting LECIF score highlights human and mouse regions with shared functional genomic properties and captures correspondence of biologically similar human and mouse annotations. Analysis with independent datasets shows the score also highlights loci associated with similar phenotypes in both species. LECIF will be a resource for mouse model studies by identifying loci whose functional genomic properties are likely conserved. Understanding conserved functional genomic properties between human and mouse provides important context for mouse model studies. Here, the authors present a genome-wide conservation score integrating epigenomic, transcription factor binding, and transcriptomic data from mouse and human genomes.

Journal Article

CMImpute: cross-species and tissue imputation of species-level DNA methylation samples across mammalian species

by Horvath, Steve; Maciejewski, Emily; Ernst, Jason in Animal Genetics and Genomics, Animals, Bioinformatics

2025

The large-scale application of the mammalian methylation array has substantially expanded the availability of DNA methylation data in mammalian species. However, this data captures only a small portion of species-tissue combinations. To address this, we develop CMImpute (Cross-species Methylation Imputation), a method based on a conditional variational autoencoder, to impute DNA methylation representing species-tissue combinations. We demonstrate that CMImpute achieves strong sample-wise correlation between imputed and observed values. Using CMImpute and data from 348 species and 59 tissue types, we impute methylation data for 19,786 new species-tissue combinations. We expect CMImpute will be a useful resource for DNA methylation analyses.

Journal Article
