Catalogue Search | MBRL

Visualizing the structure of RNA-seq expression data using grade of membership models

by Hsiao, Chiaowen Joyce; Stephens, Matthew; Dey, Kushal K. in Animal models, Biology and Life Sciences, Cluster analysis

2017

Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents having words from multiple "topics". Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically related tissues and identifies distinctively expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes, from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.
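
To make the model concrete, here is a minimal sketch of fitting a grade of membership (multinomial topic) model to a samples-by-genes count matrix. It assumes a plain EM algorithm without Dirichlet priors (a pLSA-style fit) and uses a hypothetical helper name, `fit_gom`; it is not the CountClust implementation. Row n of `L` holds sample n's membership proportions across K clusters, and each row of `F` is one cluster's gene-frequency profile.

```python
import numpy as np


def fit_gom(counts, k, n_iter=200, seed=0):
    """Fit a grade of membership (multinomial topic) model by plain EM.

    Hypothetical helper for illustration; not CountClust's algorithm.
    counts -- N x G array of non-negative read counts; every sample and
              every gene should have at least one read.
    Returns (L, F, trace):
      L     -- N x K membership proportions, rows sum to 1
      F     -- K x G cluster gene-frequency profiles, rows sum to 1
      trace -- multinomial log-likelihood (up to a constant) per iteration
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n, g = counts.shape
    L = rng.dirichlet(np.ones(k), size=n)  # random initial memberships
    F = rng.dirichlet(np.ones(g), size=k)  # random initial gene profiles
    trace = []
    for _ in range(n_iter):
        # Gene frequencies implied by the current mixture, N x G.
        P = np.maximum(L @ F, 1e-300)
        trace.append(float(np.sum(counts * np.log(P))))
        R = counts / P  # observed counts relative to mixture frequencies
        # EM update: expected counts attributed to each cluster, renormalized.
        L_new = L * (R @ F.T)
        F_new = F * (L.T @ R)
        L = L_new / L_new.sum(axis=1, keepdims=True)
        F = F_new / F_new.sum(axis=1, keepdims=True)
    return L, F, trace
```

Because each update is an EM step, the recorded log-likelihood should never decrease; the fitted `L` is what a structure plot of memberships would display.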

Journal Article

Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

by Engreitz, Jesse M.; Weissbrod, Omer; Pasaniuc, Bogdan in 631/208/200, 631/208/205/2138, 692/699

2022

Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk. Our optimal combined S2G strategy (cS2G) included seven constituent S2G strategies and achieved a precision of 0.75 and a recall of 0.33, more than doubling the recall of any individual strategy. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 5,095 causal SNP–gene–disease triplets (with S2G-derived functional interpretation) with high confidence. We further applied cS2G to provide an empirical assessment of disease omnigenicity; we determined that the top 1% of genes explained roughly half of the SNP heritability linked to all genes and that gene-level architectures vary with variant allele frequency. A heritability-based framework for evaluation of SNP-to-gene linking methods is used to construct an optimal, combined approach and applied to 49 traits. Analysis of trait omnigenicity suggests gene-level architecture varies depending on variant frequency.
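
The omnigenicity summary above (the share of gene-linked SNP heritability explained by the top 1% of genes) is a simple concentration statistic. The sketch below computes it from a vector of per-gene heritability estimates. The helper name `top_share` and its inputs are hypothetical, and estimating per-gene heritability itself, the hard part that the abstract attributes to cS2G, is not shown.

```python
import numpy as np


def top_share(h2_per_gene, top_fraction=0.01):
    """Fraction of total per-gene heritability explained by the top genes.

    Hypothetical helper: h2_per_gene holds non-negative per-gene
    heritability estimates; top_fraction is the share of genes counted
    as "top" (rounded to the nearest integer, at least one gene).
    """
    h2 = np.sort(np.asarray(h2_per_gene, dtype=float))[::-1]  # descending
    n_top = max(1, int(round(top_fraction * h2.size)))
    return float(h2[:n_top].sum() / h2.sum())
```

For example, if one gene out of 100 carries as much heritability as the other 99 combined, the top 1% explains half of the total.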

Journal Article

Polygenic architecture of rare coding variation across 394,783 exomes

by Nadig, Ajay; Weiner, Daniel J.; Neale, Benjamin M. in 38/39, 45/23, 45/43

2023

Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes (refs. 1–3). However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear (ref. 4). Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes (ref. 5). Rare coding variants (allele frequency < 1 × 10⁻³) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average, much less than common variants, and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10⁻⁵). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder (refs. 6, 7) is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification. An analysis of rare coding variants across 22 common traits and diseases indicates that these variants will contribute substantially to biological insights but modestly to population risk stratification.
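
Burden heritability starts from a per-gene burden score. A minimal sketch, assuming genotypes coded as 0/1/2 alternate-allele counts, that the alternate allele is the rare one, and a hypothetical variant-to-gene label vector: each individual's burden for a gene is the sum of their alleles at qualifying variants, where qualifying means a sample allele frequency below a threshold such as the ultrarare cutoff of 1 × 10⁻⁵ quoted above. The helper `gene_burden` is illustrative only, and estimating heritability from these scores is not shown.

```python
import numpy as np


def gene_burden(genotypes, variant_gene, af_threshold=1e-5):
    """Per-gene burden: count of qualifying rare alleles per individual.

    Hypothetical helper for illustration.
    genotypes    -- N x V array of alternate-allele counts (0, 1, 2),
                    assuming the alternate allele is the rare one
    variant_gene -- length-V sequence of gene labels
    af_threshold -- keep variants with sample allele frequency below this
    Returns (genes, B) where B is an N x len(genes) integer matrix.
    """
    genotypes = np.asarray(genotypes)
    variant_gene = np.asarray(variant_gene)
    n = genotypes.shape[0]
    af = genotypes.sum(axis=0) / (2.0 * n)  # sample allele frequency
    keep = (af > 0) & (af < af_threshold)   # observed and rare enough
    genes = sorted(set(variant_gene[keep].tolist()))
    B = np.zeros((n, len(genes)), dtype=int)
    for j, gene in enumerate(genes):
        cols = keep & (variant_gene == gene)
        B[:, j] = genotypes[:, cols].sum(axis=1)
    return genes, B
```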

Journal Article

Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements

by Kawakami, Eiryo; Price, Alkes L.; Sugishita, Hiroki in 45/15, 45/43, 631/208

2020

Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data. A resource of cell-type-specific IMPACT regulatory annotations improves the trans-ancestry portability of polygenic risk scores by prioritizing variants enriched for trait heritability.
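
The portability gain above is reported as a relative increase in R². As a hedged illustration: a polygenic risk score is a weighted sum of allele counts, R² is taken here as the squared Pearson correlation between score and phenotype, and the relative increase compares R² with and without variant prioritization. The `mask` argument stands in for a prioritized-variant set; all names are hypothetical, and this is not the paper's IMPACT-based prioritization pipeline.

```python
import numpy as np


def prs(genotypes, weights, mask=None):
    """Polygenic risk score: weighted sum of allele counts per individual.

    mask (optional boolean vector) keeps only prioritized variants by
    zeroing the weights of all others.
    """
    w = np.asarray(weights, dtype=float)
    if mask is not None:
        w = np.where(mask, w, 0.0)
    return np.asarray(genotypes, dtype=float) @ w


def r2(score, phenotype):
    """Squared Pearson correlation between a score and a phenotype."""
    return float(np.corrcoef(score, phenotype)[0, 1] ** 2)


def relative_increase(r2_new, r2_old):
    """Relative change in R^2; 0.499 corresponds to a 49.9% increase."""
    return (r2_new - r2_old) / r2_old
```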

Journal Article

SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests

by Zhao, Zhangchen; Bi, Wenjian; Neale, Benjamin M. in 45/43, 631/114/794, 631/208/205/2138

2022

Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) ≤ 1%, but inflation is observed in variance component set-based tests when restricting to variants with MAF ≤ 0.1% or 0.01%. Here, we propose SAIGE-GENE+ with greatly improved type I error control and computational efficiency to facilitate rare variant tests in large-scale data. We further show that incorporating multiple MAF cutoffs and functional annotations can improve power and thus uncover new gene–phenotype associations. In the analysis of UKBB whole exome sequencing data for 30 quantitative and 141 binary traits, SAIGE-GENE+ identified 551 gene–phenotype associations. SAIGE-GENE+ performs set-based rare variant association tests with improved type I error control and computational efficiency by collapsing ultra-rare variants and conducting multiple tests corresponding to different minor allele frequency cutoffs and annotations.
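
Two ingredients named in the abstract can be sketched briefly: collapsing ultra-rare variants into a single pseudo-marker, and combining the p-values from tests run at several MAF cutoffs or annotations. This sketch assumes a minor allele count (MAC) cutoff of 10 for "ultra-rare", takes each individual's maximum genotype across the collapsed variants, and combines p-values with the Cauchy combination formula. Treat all three as illustrative assumptions rather than a description of SAIGE-GENE+'s exact implementation; the function names are hypothetical.

```python
import numpy as np


def collapse_ultra_rare(genotypes, mac_cutoff=10):
    """Replace variants with minor allele count <= mac_cutoff by one marker.

    genotypes -- N x V minor-allele counts (0, 1, 2). The collapsed
    marker is each individual's maximum genotype across the ultra-rare
    variants; the remaining variants are kept unchanged.
    """
    G = np.asarray(genotypes)
    mac = G.sum(axis=0)            # minor allele count per variant
    ultra = mac <= mac_cutoff
    kept = G[:, ~ultra]
    if not ultra.any():
        return kept
    collapsed = G[:, ultra].max(axis=1, keepdims=True)
    return np.hstack([kept, collapsed])


def cauchy_combine(pvalues, weights=None):
    """Cauchy combination of p-values, each strictly between 0 and 1.

    T = sum_i w_i * tan((0.5 - p_i) * pi) with weights summing to 1;
    the combined p-value is 0.5 - arctan(T) / pi.
    """
    p = np.asarray(pvalues, dtype=float)
    if weights is None:
        w = np.ones(p.size)
    else:
        w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    return float(0.5 - np.arctan(t) / np.pi)
```

With a single p-value the combination returns that p-value unchanged, which is a quick sanity check on the formula.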

Journal Article

A new sequence logo plot to highlight enrichment and depletion

by Xie, Dongyue; Stephens, Matthew; Dey, Kushal K. in Algorithms, Amino acid sequence, Amino acids

2018

Background: Sequence logo plots have become a standard graphical tool for visualizing sequence motifs in DNA, RNA or protein sequences. However, standard logo plots primarily highlight enrichment of symbols and may fail to highlight interesting depletions. Current alternatives that try to highlight depletion often produce visually cluttered logos.

Results: We introduce a new sequence logo plot, the EDLogo plot, that highlights both enrichment and depletion while minimizing visual clutter. We provide an easy-to-use and highly customizable R package, Logolas, to produce a range of logo plots, including EDLogo plots. This software also allows elements in the logo plot to be strings of characters, rather than single characters, extending the range of applications beyond the usual DNA, RNA or protein sequences. In addition, the software includes new Empirical Bayes methods to stabilize estimates of enrichment and depletion, and thus better highlight the most significant patterns in data. We illustrate our methods and software on applications to transcription factor binding site motifs, protein sequence alignments and cancer mutation signature profiles.

Conclusions: Our new EDLogo plots and flexible software implementation can help data analysts visualize both enrichment and depletion of characters (DNA sequence bases, amino acids, etc.) across a wide range of applications.
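
The enrichment/depletion idea can be illustrated at a single motif position: compare each symbol's frequency with a background frequency on the log scale, then center the log-ratios at their median, so symbols above the median are drawn above the axis (enriched) and those below it are drawn below the axis (depleted). This is a simplified sketch assuming base-2 logs and median centering; it omits Logolas's Empirical Bayes stabilization and may differ in detail from its exact EDLogo scaling. `ed_heights` is a hypothetical name.

```python
import numpy as np


def ed_heights(p, q):
    """Signed symbol heights at one position (simplified EDLogo-style).

    p -- symbol frequencies at the position (all > 0)
    q -- background frequencies for the same symbols (all > 0)
    Positive values indicate enrichment, negative values depletion.
    """
    r = np.log2(np.asarray(p, dtype=float) / np.asarray(q, dtype=float))
    return r - np.median(r)  # center at the median log-ratio
```

Centering at the median keeps symbols that sit near the background from producing tall bars, which is one way to limit the visual clutter the abstract mentions.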

Journal Article

Single-cell multi-ome regression models identify functional and disease-associated enhancers and enable chromatin potential analysis

by Wong, Wilfred; Hartemink, Alexander J.; Leslie, Christina S. in 631/114/794, 631/208/176, 631/208/457

2024

We present a gene-level regulatory model, single-cell ATAC + RNA linking (SCARlink), which predicts single-cell gene expression and links enhancers to target genes using multi-ome (scRNA-seq and scATAC–seq co-assay) sequencing data. The approach uses regularized Poisson regression on tile-level accessibility data to jointly model all regulatory effects at a gene locus, avoiding the limitations of pairwise gene–peak correlations and dependence on peak calling. SCARlink outperformed existing gene scoring methods for imputing gene expression from chromatin accessibility across high-coverage multi-ome datasets while giving comparable or improved performance on low-coverage datasets. Shapley value analysis on trained models identified cell-type-specific gene enhancers that are validated by promoter capture Hi-C and are 11× to 15× and 5× to 12× enriched in fine-mapped eQTLs and fine-mapped genome-wide association study (GWAS) variants, respectively. We further show that SCARlink-predicted and observed gene expression vectors provide a robust way to compute a chromatin potential vector field to enable developmental trajectory analysis. Single-cell ATAC + RNA linking (SCARlink) predicts gene expression by jointly modeling local tiled chromatin accessibility using regularized Poisson regression on multi-ome data. SCARlink predictions can be used to identify cell-type-specific enhancers and perform chromatin potential analysis.
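
The core model described above is a regularized Poisson regression of one gene's expression on tile-level accessibility around that gene. The sketch below fits such a model with an L2 penalty using Newton's method with step halving. The penalty type, solver, intercept handling, and the name `fit_poisson_l2` are assumptions for illustration; this is not SCARlink's actual training procedure.

```python
import numpy as np


def fit_poisson_l2(X, y, alpha=1e-3, max_iter=50, tol=1e-8):
    """L2-penalized Poisson regression with log E[y] = b + X @ w.

    X -- N x T matrix (e.g. per-cell accessibility in T tiles)
    y -- length-N non-negative counts (e.g. per-cell expression of a gene)
    Minimizes mean(exp(eta) - y * eta) + alpha / 2 * ||w||^2, where
    eta = b + X @ w; the intercept b is not penalized. Returns (b, w).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, t = X.shape
    Z = np.hstack([np.ones((n, 1)), X])   # prepend an intercept column
    theta = np.zeros(t + 1)
    theta[0] = np.log(y.mean() + 1e-12)   # start at the mean rate
    pen = np.full(t + 1, alpha)
    pen[0] = 0.0                          # no penalty on the intercept

    def objective(th):
        eta = Z @ th
        return np.mean(np.exp(eta) - y * eta) + 0.5 * np.sum(pen * th**2)

    f = objective(theta)
    for _ in range(max_iter):
        mu = np.exp(Z @ theta)
        grad = Z.T @ (mu - y) / n + pen * theta
        hess = (Z.T * mu) @ Z / n + np.diag(pen)
        step = np.linalg.solve(hess, grad)  # Newton direction
        s = 1.0
        while s > 1e-10:                    # halve until the objective drops
            new = theta - s * step
            f_new = objective(new)
            if f_new <= f:
                break
            s /= 2.0
        else:
            break                           # no improving step; stop here
        done = f - f_new < tol
        theta, f = new, f_new
        if done:
            break
    return theta[0], theta[1:]
```

Fitting all tiles jointly, rather than correlating each tile with expression separately, is what lets the coefficients of correlated neighbouring tiles be estimated together.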

Journal Article

Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics

by Montoro, Daniel T.; Engreitz, Jesse M.; Price, Alkes L. in 631/208/199, 631/208/205/2138, 631/208/212/2019

2022

Genome-wide association studies provide a powerful means of identifying loci and genes contributing to disease, but in many cases, the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. In the present study, we introduce sc-linker, a framework for integrating single-cell RNA-sequencing, epigenomic SNP-to-gene maps and genome-wide association study summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. The inferred disease enrichments recapitulated known biology and highlighted notable cell–disease relationships, including γ-aminobutyric acid-ergic neurons in major depressive disorder, a disease-dependent M-cell program in ulcerative colitis and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease-dependent immune cell-type programs were associated, whereas only disease-dependent epithelial cell programs were prominent, suggesting a role in disease response rather than initiation. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease. sc-linker is an analysis framework that combines genome-wide association study summary statistics, epigenomics and single-cell transcriptomes to identify disease-critical cell types and cellular processes across tissues and states.

Journal Article

Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data

by Wang, Bruce; Pasaniuc, Bogdan; Price, Alkes L. in 631/114/794, 631/208/199, 631/208/205/2138

2022

Single-cell RNA sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce single-cell disease relevance score (scDRS), an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs). We applied scDRS to 74 diseases/traits and 1.3 million single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type–disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels. Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes. scDRS associates individual cells in scRNA-seq with disease by scoring single-cell transcriptomes using GWAS gene signatures. Applied to 74 GWAS and 1.3 million single-cell profiles, scDRS identifies specific cellular subpopulations associated with these diseases.
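
The central idea above, scoring each cell for excess expression across GWAS disease genes, can be sketched as comparing a cell's mean expression over the disease gene set with the same statistic over random control gene sets of equal size. This is a deliberately simplified sketch: the real scDRS weights genes by GWAS signal and matches control genes on expression mean and variance, whereas here controls are unmatched random draws. `disease_score` is a hypothetical name.

```python
import numpy as np


def disease_score(expr, disease_idx, n_ctrl=200, seed=0):
    """Per-cell excess expression of a disease gene set (simplified).

    expr        -- cells x genes matrix of normalized expression
    disease_idx -- column indices of disease-associated genes
    Returns (z, pval): a z-score of each cell's disease-gene mean against
    n_ctrl random equal-size control gene sets, and an empirical p-value
    (1 + #controls >= observed) / (n_ctrl + 1).
    """
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[1]
    k = len(disease_idx)
    observed = expr[:, disease_idx].mean(axis=1)  # one value per cell
    ctrl = np.column_stack([
        expr[:, rng.choice(n_genes, size=k, replace=False)].mean(axis=1)
        for _ in range(n_ctrl)
    ])  # cells x n_ctrl
    z = (observed - ctrl.mean(axis=1)) / ctrl.std(axis=1, ddof=1)
    pval = (1 + (ctrl >= observed[:, None]).sum(axis=1)) / (n_ctrl + 1)
    return z, pval
```

Because the score is computed per cell rather than per annotated cluster, a subpopulation of high-scoring cells can stand out even when it shares a cell-type label with low-scoring cells.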

Journal Article

Leveraging single-cell ATAC-seq and RNA-seq to identify disease-critical fetal and adult brain cell types

by Jagadeesh, Karthik; Kim, Samuel S.; Kellis, Manolis in 38/43, 45/43, 631/114

2024

Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and studies integrating GWAS with scRNA-seq have shown promise, but studies integrating GWAS with scATAC-seq have been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases/traits (average N = 298K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified disease-critical fetal (respectively adult) brain cell types for 22 (respectively 23) of 28 traits using scATAC-seq, and for 8 (respectively 17) of 28 traits using scRNA-seq. Significant scATAC-seq enrichments included fetal photoreceptor cells for major depressive disorder, fetal ganglion cells for BMI, fetal astrocytes for ADHD, and adult VGLUT2 excitatory neurons for schizophrenia. Our findings improve our understanding of brain-related diseases/traits and inform future analyses. This study analyzed data from human cells assayed using single-cell technologies, together with data associating genetic variants to disease, to identify fetal and adult brain cell types whose biology critically influences the etiology of disease.

Journal Article
