Catalogue Search | MBRL
56 result(s) for "Dey, Kushal K."
Visualizing the structure of RNA-seq expression data using grade of membership models
by Hsiao, Chiaowen Joyce; Stephens, Matthew; Dey, Kushal K.
in Animal models; Biology and Life Sciences; Cluster analysis
2017
Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents having words from multiple "topics". Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications, we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically related tissues and identifies distinctively expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes, from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.
Journal Article
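The grade of membership model summarized above can be sketched generatively: each sample n has membership proportions l_nk over K clusters, each cluster k has its own gene-expression profile θ_kg, the sample's expected gene proportions are the mixture Σ_k l_nk θ_kg, and read counts are drawn multinomially from that mixture. The Python below is a minimal, hypothetical illustration of this assumed formulation only; the function names and toy numbers are our own, it is not CountClust's interface (CountClust is an R/Bioconductor package), and it simulates data rather than fitting the model.

```python
import random

def mix_profiles(memberships, theta):
    """Expected gene proportions per sample: p[n][g] = sum_k memberships[n][k] * theta[k][g].

    memberships: N x K rows summing to 1 (grade of membership of sample n in cluster k).
    theta:       K x G rows summing to 1 (relative expression of gene g in cluster k).
    """
    n_clusters, n_genes = len(theta), len(theta[0])
    return [
        [sum(row[k] * theta[k][g] for k in range(n_clusters)) for g in range(n_genes)]
        for row in memberships
    ]

def simulate_counts(memberships, theta, depth, seed=0):
    """Draw `depth` reads per sample from a multinomial with the mixed proportions."""
    rng = random.Random(seed)
    counts = []
    for probs in mix_profiles(memberships, theta):
        row = [0] * len(probs)
        # Each draw picks one gene index with probability probs[g].
        for g in rng.choices(range(len(probs)), weights=probs, k=depth):
            row[g] += 1
        counts.append(row)
    return counts

# Hypothetical 2-cluster, 3-gene example.
theta = [[0.7, 0.2, 0.1],   # cluster 1 mostly expresses gene 0
         [0.1, 0.2, 0.7]]   # cluster 2 mostly expresses gene 2
memberships = [[1.0, 0.0],  # sample fully in cluster 1
               [0.5, 0.5],  # admixed sample, half in each cluster
               [0.0, 1.0]]  # sample fully in cluster 2
profiles = mix_profiles(memberships, theta)
counts = simulate_counts(memberships, theta, depth=500)
print(counts)
```

The admixed middle sample gets expected proportions of about 0.4, 0.2 and 0.4, which a hard clustering could only represent by forcing it into one cluster. Fitting the model runs in the opposite direction: estimating the memberships and θ from observed counts.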
Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements
2020
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.
A resource of cell-type-specific IMPACT regulatory annotations improves the trans-ancestry portability of polygenic risk scores by prioritizing variants enriched for trait heritability.
Journal Article
Polygenic architecture of rare coding variation across 394,783 exomes
2023
Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes [1–3]. However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear [4]. Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes [5]. Rare coding variants (allele frequency < 1 × 10⁻³) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average, much less than common variants, and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10⁻⁵). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder [6,7] is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification.
An analysis of rare coding variants across 22 common traits and diseases indicates that these variants will contribute substantially to biological insights but modestly to population risk stratification.
Journal Article
SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests
2022
Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) ≤ 1%, but inflation is observed in variance component set-based tests when restricting to variants with MAF ≤ 0.1% or 0.01%. Here, we propose SAIGE-GENE+ with greatly improved type I error control and computational efficiency to facilitate rare variant tests in large-scale data. We further show that incorporating multiple MAF cutoffs and functional annotations can improve power and thus uncover new gene–phenotype associations. In the analysis of UKBB whole exome sequencing data for 30 quantitative and 141 binary traits, SAIGE-GENE+ identified 551 gene–phenotype associations.
SAIGE-GENE+ performs set-based rare variant association tests with improved type I error control and computational efficiency by collapsing ultra-rare variants and conducting multiple tests corresponding to different minor allele frequency cutoffs and annotations.
Journal Article
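The collapsing and multiple-cutoff strategy described for SAIGE-GENE+ can be sketched in Python. This is a hypothetical illustration, not SAIGE-GENE+ code. It assumes that genotypes are coded as minor-allele dosages, that variants with minor allele count (MAC) ≤ 10 count as ultra-rare and are merged into one pseudo-marker by taking each individual's maximum dosage, and that per-cutoff p-values are combined with the Cauchy combination rule. The association tests themselves (burden, SKAT, SKAT-O) are omitted, and the cohort and variants are made up.

```python
import math

def build_marker_set(variants, n_individuals, maf_cutoff, mac_threshold=10):
    """Select variants with MAF <= maf_cutoff, collapsing ultra-rare ones.

    variants: dict of variant id -> list of minor-allele dosages (0/1/2),
              one entry per individual (assumed already coded as minor allele).
    Variants with MAC <= mac_threshold are merged into a single pseudo-marker
    whose dosage per individual is the maximum across those variants.
    """
    kept, ultra_rare = {}, []
    for vid, dosages in variants.items():
        mac = sum(dosages)
        maf = mac / (2 * n_individuals)
        if maf > maf_cutoff:
            continue  # too common for this cutoff
        if mac <= mac_threshold:
            ultra_rare.append(dosages)
        else:
            kept[vid] = dosages
    if ultra_rare:
        kept["collapsed_ultra_rare"] = [max(col) for col in zip(*ultra_rare)]
    return kept

def cauchy_combination(pvalues, weights=None):
    """Combine p-values from several tests with the Cauchy combination rule."""
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(stat / sum(weights)) / math.pi

def make_dosages(n, carriers):
    """Hypothetical helper: n individuals, `carriers` maps index -> dosage."""
    row = [0] * n
    for i, d in carriers.items():
        row[i] = d
    return row

N = 20_000  # hypothetical cohort size (40,000 alleles)
variants = {
    "v_common": make_dosages(N, {i: 1 for i in range(1000)}),  # MAF 0.025
    "v_rare": make_dosages(N, {i: 1 for i in range(300)}),     # MAF 0.0075
    "v_lowfreq": make_dosages(N, {i: 1 for i in range(30)}),   # MAF 0.00075
    "v_ur1": make_dosages(N, {0: 1, 1: 1}),                    # MAC 2
    "v_ur2": make_dosages(N, {1: 1, 2: 1}),                    # MAC 2
}
# One marker set per MAF cutoff (1%, 0.1%, 0.01%), as in the abstract.
sets = {cut: build_marker_set(variants, N, cut) for cut in (0.01, 0.001, 0.0001)}
for cut, markers in sets.items():
    print(cut, sorted(markers))
```

Keeping one marker set per cutoff yields one test per cutoff; combining their p-values then avoids committing to a single MAF threshold in advance.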
A new sequence logo plot to highlight enrichment and depletion
by Xie, Dongyue; Stephens, Matthew; Dey, Kushal K.
in Algorithms; Amino acid sequence; Amino acids
2018
Background
Sequence logo plots have become a standard graphical tool for visualizing sequence motifs in DNA, RNA or protein sequences. However, standard logo plots primarily highlight enrichment of symbols, and may fail to highlight interesting depletions. Current alternatives that try to highlight depletion often produce visually cluttered logos.
Results
We introduce a new sequence logo plot, the EDLogo plot, that highlights both enrichment and depletion, while minimizing visual clutter. We provide an easy-to-use and highly customizable R package, Logolas, to produce a range of logo plots, including EDLogo plots. This software also allows elements in the logo plot to be strings of characters, rather than a single character, extending the range of applications beyond the usual DNA, RNA or protein sequences. In addition, the software includes new Empirical Bayes methods to stabilize estimates of enrichment and depletion, and thus better highlight the most significant patterns in data. We illustrate our methods and software on applications to transcription factor binding site motifs, protein sequence alignments and cancer mutation signature profiles.
Conclusions
Our new EDLogo plots and flexible software implementation can help data analysts visualize both enrichment and depletion of characters (DNA sequence bases, amino acids, etc.) across a wide range of applications.
Journal Article
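The enrichment/depletion idea behind EDLogo can be illustrated for a single logo position. The sketch below assumes that each symbol's score is the log ratio of its observed to background frequency, centered by subtracting the median across symbols, so that positive scores would be stacked above the axis and negative ones below it. This is a simplified reading for illustration, not Logolas code, and it omits the Empirical Bayes stabilization the package provides; the frequencies are hypothetical.

```python
import math
from statistics import median

def ed_scores(freqs, background):
    """Median-centred log ratios for one logo position.

    freqs, background: dict symbol -> probability (all values must be > 0).
    Positive scores mark enrichment (drawn above the axis),
    negative scores mark depletion (drawn below it).
    """
    ratios = {s: math.log(freqs[s] / background[s]) for s in freqs}
    centre = median(ratios.values())
    return {s: r - centre for s, r in ratios.items()}

position = {"A": 0.55, "C": 0.20, "G": 0.20, "T": 0.05}  # hypothetical column
uniform = {s: 0.25 for s in "ACGT"}                      # assumed background
scores = ed_scores(position, uniform)
print(scores)
```

Here A is enriched and T is strongly depleted. A standard information-content logo would show T only as a tiny letter, whereas the centered scores place it clearly below the axis.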
Leveraging single-cell ATAC-seq and RNA-seq to identify disease-critical fetal and adult brain cell types
2024
Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and studies integrating GWAS with scRNA-seq have shown promise, but studies integrating GWAS with scATAC-seq have been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases/traits (average N = 298K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified disease-critical fetal (respectively adult) brain cell types for 22 (respectively 23) of 28 traits using scATAC-seq, and for 8 (respectively 17) of 28 traits using scRNA-seq. Significant scATAC-seq enrichments included fetal photoreceptor cells for major depressive disorder, fetal ganglion cells for BMI, fetal astrocytes for ADHD, and adult VGLUT2 excitatory neurons for schizophrenia. Our findings improve our understanding of brain-related diseases/traits and inform future analyses.
This study analyzed data from human cells assayed using single-cell technologies, together with data associating genetic variants with disease, to identify fetal and adult brain cell types whose biology critically influences the etiology of disease.
Journal Article
Regional influences on community structure across the tropical-temperate divide
by Mohan, Dhananjai; White, Alexander E.; Stephens, Matthew
in 631/158/2450; 631/158/670; 631/158/851
2019
Many models to explain the differences in the flora and fauna of tropical and temperate regions assume that whole clades are restricted to the tropics. We develop methods to assess the extent to which biotas are geographically discrete, and find that transition zones between regions occupied by tropical-associated or temperate-associated biotas are often narrow, suggesting a role for freezing temperatures in partitioning global biotas. Across the steepest tropical-temperate gradient in the world, that of the Himalaya, bird communities below and above the freezing line are largely populated by different tropical and temperate biotas with links to India and Southeast Asia, or to China, respectively. The importance of the freezing line is retained when clades rather than species are considered, reflecting confinement of different clades to one or another climate zone. The reality of the sharp tropical-temperate boundary adds credence to the argument that exceptional species richness in the tropics reflects species accumulation over time, with limited transgressions of species and clades into the temperate.
Multiple drivers maintain unique species assemblages at multiple biogeographic scales. Here, the authors show that the freezing line is a key barrier generating evolutionary differences in temperate and tropical bird communities across a steep elevational gradient in the Himalaya.
Journal Article
Evaluating the informativeness of deep learning annotations for human complex diseases
2020
Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.
Deep learning models have shown great promise in predicting regulatory effects from DNA sequence. Here the authors evaluate sequence-based epigenomic deep learning models and conclude that these models are not yet ready to inform our knowledge of human disease.
Journal Article
Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease
2020
Despite considerable progress on pathogenicity scores prioritizing variants for Mendelian disease, little is known about the utility of these scores for common disease. Here, we assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease and improve upon existing scores. We first apply stratified linkage disequilibrium (LD) score regression to evaluate published pathogenicity scores across 41 common diseases and complex traits (average N = 320K). Several of the resulting annotations are informative for common disease, even after conditioning on a broad set of functional annotations. We then improve upon published pathogenicity scores by developing AnnotBoost, a machine learning framework to impute and denoise pathogenicity scores using a broad set of functional annotations. AnnotBoost substantially increases the informativeness for common disease of both previously uninformative and previously informative pathogenicity scores, implying that Mendelian and common disease variants share similar properties. The boosted scores also produce improvements in heritability model fit and in classifying disease-associated, fine-mapped SNPs. Our boosted scores may improve fine-mapping and candidate gene discovery for common disease.
Pathogenicity scores are instrumental in prioritizing variants for Mendelian disease, yet their application to common disease is largely unexplored. Here, the authors assess the utility of pathogenicity scores for 41 complex traits and develop a framework to improve their informativeness for common disease.
Journal Article
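The two preceding records (the DeepSEA/Basenji evaluation and AnnotBoost) both apply stratified LD score regression. A minimal sketch of its core regression follows, assuming the published model E[χ²_j] = N Σ_c τ_c ℓ(j, c) + N a + 1, where ℓ(j, c) is SNP j's LD score with annotation c and the fitted intercept absorbs N a + 1. The function names and toy data are hypothetical. The real ldsc software also uses regression weights, block-jackknife standard errors and derived quantities such as enrichment, all omitted here.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sldsc_ols(chi2, ld_scores, sample_size):
    """Unweighted least-squares fit of E[chi2_j] = N * sum_c tau_c * l(j, c) + intercept.

    ld_scores: one row per SNP, one column per annotation c.
    Returns (taus, intercept).
    """
    rows = [[sample_size * l for l in ls] + [1.0] for ls in ld_scores]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, chi2)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations (X^T X) beta = X^T y
    return coef[:-1], coef[-1]

# Hypothetical noise-free data generated from known parameters.
n = 100
true_tau, true_intercept = [0.002, 0.005], 1.05
ld = [[1 + j / 10, 1 + (7 * j % 13) / 10] for j in range(50)]
chi2 = [n * sum(t * l for t, l in zip(true_tau, ls)) + true_intercept for ls in ld]
taus, intercept = sldsc_ols(chi2, ld, n)
print(taus, intercept)
```

Because all annotations enter one joint regression, each τ_c measures the contribution of annotation c conditional on the others, which is the sense in which both abstracts describe annotations as "conditionally" informative.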
Correction: Visualizing the structure of RNA-seq expression data using grade of membership models
2017
[This corrects the article DOI: 10.1371/journal.pgen.1006599.].
Journal Article