Catalogue Search | MBRL

Efficient variance components analysis across millions of genomes

by Pasaniuc, Bogdan; Wu, Yue; Burch, Kathryn S. in 631/114/2415; 631/208/729; Alleles

2020

While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low-frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD), consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders. Variance components analysis may be used for a variety of applications including heritability estimation and association mapping. Here, the authors present a computationally efficient method, scalable to extremely large GWAS datasets, and use it for heritability analysis of 22 traits from UK Biobank.
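
As a rough illustration of what fitting a single variance component involves, the sketch below computes a cross-product Haseman-Elston (method-of-moments) estimate of SNP-heritability from a small genotype matrix. This is a classical baseline, not the authors' multi-component method: it builds the full N x N genetic relationship matrix (GRM) explicitly, which is exactly the quadratic cost a method aimed at a million individuals has to avoid. The function names and the 0/1/2 genotype encoding are assumptions made for illustration.

```python
import statistics


def standardized_grm(genotypes):
    """Return the GRM K = Z Z^T / M for an N x M list of 0/1/2 genotypes.

    Each SNP column is centred and scaled to unit (population) variance;
    monomorphic SNPs carry no information and are skipped.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        if sd > 0:
            columns.append([(g - mu) / sd for g in col])
    m = len(columns)  # assumes at least one polymorphic SNP
    # O(N^2 * M) time and O(N^2) memory: infeasible at biobank scale.
    return [[sum(c[i] * c[k] for c in columns) / m for k in range(n)]
            for i in range(n)]


def he_regression_h2(grm, phenotype):
    """Cross-product Haseman-Elston estimate of SNP-heritability.

    With y standardized, E[y_i * y_j] = h2 * K_ij for i != j, so h2 is the
    no-intercept least-squares slope of y_i * y_j on K_ij over pairs i < j.
    """
    mu = statistics.fmean(phenotype)
    sd = statistics.pstdev(phenotype)
    y = [(v - mu) / sd for v in phenotype]
    num = den = 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            num += grm[i][j] * y[i] * y[j]
            den += grm[i][j] ** 2
    return num / den
```

Partitioning heritability across annotations would, in this simplified picture, use one GRM per SNP set and solve a small system of moment equations instead of a single ratio; doing that without ever materializing the GRMs is the scaling problem the abstract addresses.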

Journal Article

Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification

by Lapinska, Sandra; Vilhjálmsson, Bjarni; Ding, Yi in 45/43; 631/114; 631/208/212

2022

Although the cohort-level accuracy of polygenic risk scores (PRSs), estimates of genetic value at the individual level, has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual’s PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated ‘white British’), we observe large variances in individual PRS estimates, which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses. Analysis of real traits in the UK Biobank demonstrates that large uncertainty in polygenic risk score (PRS) estimates at the individual level impacts the interpretation of subsequent analyses such as PRS-based stratification.
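
The top-decile statistic above can be made concrete. Assuming a Bayesian PRS method has already produced posterior samples of each individual's PRS (the list-of-lists input layout and the function name are hypothetical), the sketch takes the posterior mean as the point estimate, finds the top-decile cut-off, and reports the share of top-decile individuals whose 95% central credible interval lies entirely above that cut-off.

```python
import statistics


def top_decile_ci_fraction(posterior_samples, level=0.95):
    """Share of top-decile individuals whose credible interval is fully in the top decile.

    posterior_samples: one list of sampled PRS values per individual
    (hypothetical layout, e.g. draws saved from a Bayesian PRS sampler).
    """
    # Point estimate for each individual: the posterior mean.
    means = [statistics.fmean(draws) for draws in posterior_samples]
    # Top-decile cut-off on the point estimates (the last of 9 decile cut points).
    threshold = statistics.quantiles(means, n=10)[-1]
    tail = (1.0 - level) / 2.0
    in_top = contained = 0
    for draws, mean in zip(posterior_samples, means):
        if mean < threshold:
            continue
        in_top += 1
        ordered = sorted(draws)
        # Empirical lower endpoint of the central credible interval. The top
        # decile is unbounded above, so only this endpoint needs checking.
        lower = ordered[int(tail * (len(ordered) - 1))]
        if lower >= threshold:
            contained += 1
    return contained / in_top if in_top else 0.0
```

In a toy simulation where posterior spread is comparable to the spread of true scores, almost no top-decile individual passes this check; when posterior spread is negligible, nearly all do. That contrast is qualitatively the issue the abstract raises for PRS-based stratification.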

Journal Article

Integrative genomic analyses identify susceptibility genes underlying COVID-19 hospitalization

by Wendt, Frank R.; Lu, Zeyun; Pasaniuc, Bogdan in 631/208; 631/208/205; 692/699/255/2514

2021

Despite rapid progress in characterizing the role of host genetics in SARS-CoV-2 infection, there is limited understanding of the genes and pathways that contribute to COVID-19. Here, we integrate a genome-wide association study of COVID-19 hospitalization (7,885 cases and 961,804 controls from the COVID-19 Host Genetics Initiative) with mRNA expression, splicing, and protein levels (n = 18,502). We identify 27 genes related to inflammation and coagulation pathways whose genetically predicted expression was associated with COVID-19 hospitalization. We functionally characterize the 27 genes using phenome- and laboratory-wide association scans in Vanderbilt Biobank (n = 85,460) and identify coagulation-related clinical symptoms as well as immunologic and blood-cell-related biomarkers. We replicate these findings across trans-ethnic studies and observe consistent effects in individuals of diverse ancestral backgrounds in Vanderbilt Biobank, pan-UK Biobank, and Biobank Japan. Our study highlights and reconfirms putative causal genes impacting COVID-19 severity and symptomatology through the host inflammatory response. Genome-wide association studies of COVID-19 have identified genetic loci affecting disease severity, but the mechanisms remain to be fully described. Here, the authors use genetically predicted transcriptome, splicing and proteome data to identify potential genes and pathways underlying COVID-19 severity.
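
Testing genetically predicted expression against a trait is often done from GWAS summary statistics with a weighted z-score of the form popularized by summary-statistic TWAS tools such as FUSION: z = (w . z_GWAS) / sqrt(w^T R w), where w are SNP weights from an expression prediction model and R is the SNP correlation (LD) matrix. The abstract does not say which exact statistic the authors used, so treat this as a generic sketch; the function name and toy inputs are hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: (w . z) / sqrt(w^T R w).

    weights: per-SNP expression-prediction weights w (hypothetical input)
    gwas_z:  per-SNP GWAS z-scores for the same SNPs and allele coding
    ld:      SNP-by-SNP correlation matrix R as a list of lists
    """
    if not (len(weights) == len(gwas_z) == len(ld)):
        raise ValueError("weights, z-scores and LD must cover the same SNPs")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of w . z under the null, where the z-scores are N(0, R).
    variance = sum(
        weights[i] * ld[i][j] * weights[j]
        for i in range(len(weights))
        for j in range(len(weights))
    )
    return numerator / math.sqrt(variance)
```

The LD term matters: with weights [1, 1] and z-scores [1, 1], independent SNPs give sqrt(2), while two perfectly correlated SNPs give 1.0, because they carry one signal rather than two.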

Journal Article

Optimized design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis

by Mandric, Igor; Pasaniuc, Bogdan; Satija, Rahul in 631/114/2397; 631/114/2415; 631/208/480

2020

Single-cell RNA sequencing (scRNA-Seq) is a compelling approach to directly and simultaneously measure cellular composition and state, which can otherwise only be estimated by applying deconvolution methods to bulk RNA-Seq measurements. However, it has not yet become a widely used tool in population-scale analyses because of its prohibitively high cost. Here we show that, given the same budget, the statistical power of cell-type-specific expression quantitative trait loci (eQTL) mapping can be increased through low-coverage per-cell sequencing of more samples rather than high-coverage sequencing of fewer samples. Using simulations starting from one of the largest available real single-cell RNA-Seq datasets (120 individuals), we also show that multiple experimental designs with different numbers of samples, cells per sample and reads per cell can have similar statistical power, and that choosing an appropriate design can yield large cost savings, especially when multiplexed workflows are considered. Finally, we provide a practical approach for selecting cost-effective designs that maximize cell-type-specific eQTL power, available as a web tool. Single-cell RNA sequencing can be a powerful approach to characterizing cell composition in a population of cells but is thought to be too expensive for population-scale analyses. Here, the authors show how lower coverage of more samples can increase the power to detect cell-type-specific eQTL.
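
The budget trade-off can be illustrated with simple arithmetic. Under the simplifying assumption that cost is driven only by the total number of sequencing reads (ignoring per-sample library preparation, which is where multiplexing helps), spreading a fixed read budget over more samples leaves fewer reads per cell. The function name and the numbers below are hypothetical, and the sketch does not model eQTL power, which is what the authors' web tool estimates.

```python
def coverage_tradeoff(total_reads, cells_per_sample, sample_counts):
    """Reads per cell obtainable for each sample count under a fixed read budget.

    Assumes cost scales only with total reads (a hypothetical simplification).
    Returns a list of (n_samples, reads_per_cell) pairs.
    """
    table = []
    for n_samples in sample_counts:
        total_cells = n_samples * cells_per_sample
        table.append((n_samples, total_reads // total_cells))
    return table


if __name__ == "__main__":
    # Hypothetical budget: 6 billion reads, 1,000 cells profiled per sample.
    for n, rpc in coverage_tradeoff(6_000_000_000, 1_000, [120, 240, 480]):
        print(f"{n:>4} samples -> {rpc:>6,} reads per cell")
```

Quadrupling the number of samples cuts per-cell coverage by a factor of four here; whether that trade is worthwhile depends on how eQTL power responds to reads per cell, which is the question the study answers with simulations.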

Journal Article

Admixed and single-continental genome segments of the same ancestry have distinct linkage disequilibrium patterns

by Hou, Kangcheng; Lee, Hanbin; Pasaniuc, Bogdan in Admixture; ancestry; Animal Genetics and Genomics

2025

Background: Admixed populations offer valuable insight into the genetic architecture of complex traits. Many studies have proposed methods for genome-wide association studies (GWAS) in admixed populations, and various simulation studies have evaluated their performance. In this work, we compare recently proposed methods for admixed GWAS from a different direction: a population-genetic viewpoint.

Results: Our theoretical approach directly and mathematically compares the power of the methods, given that the causal variant is tested. This is done by deriving the variance formulas of the methods from the population-genetic admixture model. Our results analytically confirm the previous observation that the standard GWAS test is more powerful than the alternative tests because it leverages allele frequency heterogeneity, which the alternatives do not. As a by-product, we obtain a simple method to improve the power of multi-degree-of-freedom tests using only summary statistics. We further investigate the case in which the causal variant is not directly known but is detected through tagging variants in linkage disequilibrium (LD). The analysis shows that a genetic segment from admixed genomes may exhibit LD patterns distinct from those of its single-continental counterpart of the same ancestry.

Conclusions: While the classic admixture model is successful in predicting GWAS power, its popular extension in the literature falls short in explaining the LD patterns found in simulations and real data, warranting an improved model for LD in admixed genomes.

Journal Article

Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits

by Johnson, Ruth; Hou, Kangcheng; Pasaniuc, Bogdan in Algorithms; Biology and Life Sciences; Blood pressure

2021

The number of variants that have a non-zero effect on a trait (i.e., polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by variation in the number of causal SNPs.

Journal Article

Temporally distinct 3D multi-omic dynamics in the developing human brain

by Pasaniuc, Bogdan; Paredes, Mercedes F.; Zhang, Martin Jinye in 14/1; 14/32; 38/23

2024

The human hippocampus and prefrontal cortex play critical roles in learning and cognition [1,2], yet the dynamic molecular characteristics of their development remain enigmatic. Here we investigated the epigenomic and three-dimensional chromatin conformational reorganization during the development of the hippocampus and prefrontal cortex, using more than 53,000 joint single-nucleus profiles of chromatin conformation and DNA methylation generated by single-nucleus methyl-3C sequencing (snm3C-seq3) [3]. The remodelling of DNA methylation is temporally separated from chromatin conformation dynamics. Using single-cell profiling and multimodal single-molecule imaging approaches, we found that short-range chromatin interactions are enriched in neurons, whereas long-range interactions are enriched in glial cells and non-brain tissues. We reconstructed the regulatory programs of cell-type development and differentiation, finding that putatively causal common variants for schizophrenia strongly overlap with chromatin loop-connected, cell-type-specific regulatory regions. Our data provide multimodal resources for studying gene regulatory dynamics in brain development and demonstrate that single-cell three-dimensional multi-omics is a powerful approach for dissecting neuropsychiatric risk loci. Using a single-nucleus multi-omics approach, a study jointly profiles the reorganization of the epigenome and the three-dimensional chromatin conformation during the development of the human hippocampus and prefrontal cortex.

Journal Article

Calibrated prediction intervals for polygenic scores across diverse contexts

by Shi, Zhuozheng; Ding, Yi; Boulier, Kristin in 45/23; 45/43; 631/114

2024

Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can impact PGS accuracy with magnitudes similar to that of genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (that is, include the trait with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need to account for context information in PGS-based prediction across diverse populations. CalPred is a framework that adjusts polygenic score (PGS) prediction intervals based on joint modeling of multiple contexts, such as age, sex and genetic ancestry. PGS show pervasive context-specific accuracy, suggesting that accounting for this will improve portability across contexts.
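
Calibration in this sense can be checked empirically: within each context group, the share of individuals whose observed trait falls inside their 90% prediction interval should be close to 0.90. The sketch below computes that per-group coverage. It is a diagnostic only, not CalPred's joint model, and the record layout and context labels are hypothetical.

```python
from collections import defaultdict


def coverage_by_context(records):
    """Empirical prediction-interval coverage within each context group.

    records: iterable of (context, observed, lower, upper) tuples.
    Returns {context: fraction of observed values inside [lower, upper]}.
    """
    inside = defaultdict(int)
    total = defaultdict(int)
    for context, observed, lower, upper in records:
        total[context] += 1
        if lower <= observed <= upper:
            inside[context] += 1
    return {ctx: inside[ctx] / total[ctx] for ctx in total}


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    records = []
    # Hypothetical contexts whose residual SD differs; both use the same
    # +/-1.645 half-width, which is a 90% interval only when residual SD is 1.
    for context, residual_sd in (("context A", 1.0), ("context B", 2.0)):
        for _ in range(5_000):
            prediction = rng.gauss(0.0, 1.0)
            observed = prediction + rng.gauss(0.0, residual_sd)
            records.append((context, observed, prediction - 1.645, prediction + 1.645))
    print(coverage_by_context(records))
```

With these settings context A comes out near 0.90 while context B falls to roughly 0.59, so a single interval width is miscalibrated for B; widening intervals per context is the kind of adjustment the abstract describes, which CalPred performs by modelling all contexts jointly rather than group by group.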

Journal Article

Polygenic scoring accuracy varies across the genetic ancestry continuum

by Ding, Yi; Boulier, Kristin; Pasaniuc, Bogdan in 45/43; 631/114/2415; 631/208

2023

Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled ‘homogeneous’ genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of −0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs. Using two large biobank datasets, a study shows that the accuracy of polygenic scores decreases as a function of relatedness at the individual level when modelling genetic ancestry as a continuum.
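
A continuous genetic distance and its deciles can be sketched as follows. The assumption here, not stated in the abstract, is that GD is the Euclidean distance of an individual's top genetic principal components (PCs) from the centroid of the PGS training individuals; individuals are then ranked into GD deciles so that accuracy can be compared between the closest and furthest deciles. Function names are hypothetical.

```python
import math


def genetic_distance(pcs, training_pcs):
    """Euclidean distance of each individual's PC vector from the centroid
    of the training individuals' PC vectors (assumed GD definition)."""
    n_train = len(training_pcs)
    n_pcs = len(training_pcs[0])
    centroid = [sum(row[k] for row in training_pcs) / n_train for k in range(n_pcs)]
    return [math.dist(row, centroid) for row in pcs]


def decile_labels(values):
    """Rank-based decile for each value: 0 = closest, 9 = furthest."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, index in enumerate(order):
        labels[index] = rank * 10 // len(values)
    return labels
```

Within each decile one would then compute PGS accuracy (for example, the squared correlation between PGS and phenotype) and compare deciles, as in the closest-versus-furthest comparison reported above.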

Journal Article

On powerful GWAS in admixed populations

by Hou, Kangcheng; Pasaniuc, Bogdan; Bhattacharya, Arjun in 45/43; 631/208/1516; 631/208/457

2021

Most importantly, by using such tests, GWAS in individuals with African American ancestry attain superior power relative to GWAS in ancestrally homogeneous populations, such as Europeans or Africans [3-5]. [...] when allelic effects are similar across ancestries, correcting for local ancestry is expected to impair statistical power for GWAS discovery compared to global ancestry adjustment [9] and is more useful as a localization tool in post-GWAS fine-mapping [3]. Since SNP1 and Tractor-M2 are analogous to disease mapping in ancestrally homogeneous populations [3,5], it follows that admixed populations can offer increased power for disease mapping compared to ancestrally homogeneous populations. [...] GWAS in admixed populations attain improved power for discovery over homogeneous populations in either scenario (similar or different ancestry-specific allelic effects), thus further supporting the need for larger genomic studies in such populations. In this study, we showed that disease mapping in admixed populations is well powered when allelic effects are similar across ancestries, whereas Atkinson et al. [2] showcased the power gains from two-d.f. tests in the presence of effect size heterogeneity by ancestry [2,3,5]. Since the true extent of heterogeneity in causal allelic effects across ancestries is currently unknown [11-15], we recommend careful consideration of the balance between expected allelic effect size heterogeneity across ancestries and association power when selecting a statistical test for GWAS in admixed populations.

Journal Article