36 result(s) for "Hou, Kangcheng"
Efficient variance components analysis across millions of genomes
While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low-frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD), consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders. Variance components analysis may be used for a variety of applications including heritability estimation and association mapping. Here, the authors present a computationally efficient method, scalable to extremely large GWAS datasets, and use it for heritability analysis of 22 traits from UK Biobank.
Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification
Although the cohort-level accuracy of polygenic risk scores (PRSs; estimates of genetic value at the individual level) has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual’s PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated ‘white British’), we observe large variances in individual PRS estimates which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses. Analysis of real traits in the UK Biobank demonstrates that large uncertainty in polygenic risk score (PRS) estimates at the individual level impacts the interpretation of subsequent analyses such as PRS-based stratification.
Integrative genomic analyses identify susceptibility genes underlying COVID-19 hospitalization
Despite rapid progress in characterizing the role of host genetics in SARS-CoV-2 infection, there is limited understanding of genes and pathways that contribute to COVID-19. Here, we integrate a genome-wide association study of COVID-19 hospitalization (7,885 cases and 961,804 controls from COVID-19 Host Genetics Initiative) with mRNA expression, splicing, and protein levels (n = 18,502). We identify 27 genes related to inflammation and coagulation pathways whose genetically predicted expression was associated with COVID-19 hospitalization. We functionally characterize the 27 genes using phenome- and laboratory-wide association scans in Vanderbilt Biobank (n = 85,460) and identify coagulation-related clinical symptoms, immunologic, and blood-cell-related biomarkers. We replicate these findings across trans-ethnic studies and observe consistent effects in individuals of diverse ancestral backgrounds in Vanderbilt Biobank, pan-UK Biobank, and Biobank Japan. Our study highlights and reconfirms putative causal genes impacting COVID-19 severity and symptomology through the host inflammatory response. Genome-wide association studies of COVID-19 have identified genetic loci affecting disease severity, but the mechanisms remain to be fully described. Here, the authors use genetically predicted transcriptome, splicing and proteome data to identify potential genes and pathways underlying COVID-19 severity.
Optimized design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis
Single-cell RNA-sequencing (scRNA-Seq) is a compelling approach to directly and simultaneously measure cellular composition and state, which can otherwise only be estimated by applying deconvolution methods to bulk RNA-Seq estimates. However, it has not yet become a widely used tool in population-scale analyses, due to its prohibitively high cost. Here we show that given the same budget, the statistical power of cell-type-specific expression quantitative trait loci (eQTL) mapping can be increased through low-coverage per-cell sequencing of more samples rather than high-coverage sequencing of fewer samples. We use simulations starting from one of the largest available real single-cell RNA-Seq datasets, from 120 individuals, to also show that multiple experimental designs with different numbers of samples, cells per sample and reads per cell could have similar statistical power, and that choosing an appropriate design can yield large cost savings, especially when multiplexed workflows are considered. Finally, we provide a practical approach for selecting cost-effective designs for maximizing cell-type-specific eQTL power, which is available in the form of a web tool. Single cell RNA-sequencing can be a powerful approach to characterizing cell composition in a population of cells but is thought to be too expensive for population-scale analyses. Here, the authors show how lower coverage of more samples can increase the power to detect cell-type-specific eQTL.
Admixed and single-continental genome segments of the same ancestry have distinct linkage disequilibrium patterns
Background: Admixed populations offer valuable insight into the genetic architecture of complex traits. Many studies have proposed methods for genome-wide association study (GWAS) in admixed populations and various simulation studies have evaluated their performance. In this work, we propose another direction of comparison of recently proposed methods for admixed GWAS from a population genetic viewpoint. Results: Our theoretical approach mathematically and directly compares the power of methods given that the causal variant is tested. This is done by deriving the variance formula of the methods from the population genetic admixture model. Our results analytically confirm the previous observation that the standard GWAS test is more powerful than alternative tests because it leverages allele frequency heterogeneity, which the alternatives do not. As a by-product, we obtain a simple method to improve the power of multi-degrees-of-freedom tests using only summary statistics. We further investigate the problem when the causal variant is not directly known but is detected by tagging variants in linkage disequilibrium (LD). The analysis shows that a genetic segment from admixed genomes may exhibit distinct LD patterns from the single-continental counterpart of the same ancestry. Conclusions: While the classic admixture model is successful in predicting GWAS power, its popular extension in the literature falls short in explaining the LD patterns found in simulations and real data, warranting an improved model for LD in admixed genomes.
Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits
The number of variants that have a non-zero effect on a trait (i.e., polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.
Temporally distinct 3D multi-omic dynamics in the developing human brain
The human hippocampus and prefrontal cortex play critical roles in learning and cognition [1, 2], yet the dynamic molecular characteristics of their development remain enigmatic. Here we investigated the epigenomic and three-dimensional chromatin conformational reorganization during the development of the hippocampus and prefrontal cortex, using more than 53,000 joint single-nucleus profiles of chromatin conformation and DNA methylation generated by single-nucleus methyl-3C sequencing (snm3C-seq3) [3]. The remodelling of DNA methylation is temporally separated from chromatin conformation dynamics. Using single-cell profiling and multimodal single-molecule imaging approaches, we have found that short-range chromatin interactions are enriched in neurons, whereas long-range interactions are enriched in glial cells and non-brain tissues. We reconstructed the regulatory programs of cell-type development and differentiation, finding putatively causal common variants for schizophrenia strongly overlapping with chromatin loop-connected, cell-type-specific regulatory regions. Our data provide multimodal resources for studying gene regulatory dynamics in brain development and demonstrate that single-cell three-dimensional multi-omics is a powerful approach for dissecting neuropsychiatric risk loci. Using a single-nucleus multi-omics approach, a study jointly profiles the reorganization of the epigenome and the three-dimensional chromatin conformation during the development of the human hippocampus and prefrontal cortex.
Calibrated prediction intervals for polygenic scores across diverse contexts
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can impact PGS accuracy with magnitudes similar to that of genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (include the trait with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need to account for context information in PGS-based prediction across diverse populations. CalPred is a framework that adjusts polygenic score (PGS) prediction intervals based on joint modeling of multiple contexts, such as age, sex and genetic ancestry. PGS show pervasive context-specific accuracy, suggesting that accounting for this will improve portability across contexts.
Polygenic scoring accuracy varies across the genetic ancestry continuum
Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled ‘homogeneous’ genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of −0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs. Using two large biobank datasets, a study shows that the accuracy of polygenic scores decreases as a function of relatedness at the individual level when modelling genetic ancestry as a continuum.
On powerful GWAS in admixed populations
Most importantly, by using such tests, GWAS in individuals with African American ancestry attain superior power relative to GWAS in ancestrally homogeneous populations, such as Europeans or Africans [3-5]. [...] when allelic effects are similar across ancestries, correcting for local ancestry is expected to impair statistical power for GWAS discovery compared to global ancestry adjustment [9] and is more useful as a localization tool in post-GWAS fine-mapping [3]. Since SNP1 and Tractor-M2 are analogous to disease mapping in ancestrally homogeneous populations [3, 5], it follows that admixed populations can offer increased power for disease mapping compared to ancestrally homogeneous populations. [...] GWAS in admixed populations attain improved power for discovery over homogeneous populations in either scenario (similar or different ancestry-specific allelic effects), thus further supporting the need for larger genomic studies in such populations. In this study, we showed that disease mapping in admixed populations is well powered when allelic effects are similar across ancestries, whereas Atkinson et al. [2] showcased the power gains from two d.f. tests in the presence of effect size heterogeneity by ancestry [2, 3, 5]. Since the true extent of heterogeneity in causal allelic effects across ancestries is currently unknown [11-15], we recommend careful consideration of the balance between expected allelic effect size heterogeneity across ancestries and association power when selecting a statistical test for GWAS in admixed populations.