Catalogue Search | MBRL

Dissecting the genetics of complex traits using summary association statistics

by Price, Alkes L.; Pasaniuc, Bogdan in 631/208/205/2138, 631/208/726/649, 631/208/729/743

2017

Key Points: Summary association statistics from genome-wide association studies (GWAS) are widely available in large sample sizes across hundreds of complex traits. Analyses of such data can yield important insights, motivating the development of new statistical methods in this area. Single-variant association analysis (including meta-analyses, conditional association and imputation) can be performed effectively using summary association data. These methods often rely on linkage disequilibrium (LD) information from population reference panels. Summary association data can be used to perform gene-based association tests to identify genes influencing complex traits. In particular, expression quantitative trait loci (eQTLs) can be integrated to identify genes whose expression levels influence complex traits, and rare variant association tests can aggregate evidence of association across multiple rare variants in a gene. Statistical fine-mapping of the causal variant (or variants) at GWAS loci can be performed using summary association data, leveraging the strength of association, functional genomic annotations and differences in LD patterns across populations. It is becoming increasingly clear that most complex traits and common diseases have a large number of causal variants with small effects. Summary association statistics can be used to understand these polygenic architectures and to leverage them for polygenic risk prediction. Summary association statistics have broad utility in cross-trait analyses, including detecting pleiotropic effects and inferring genetic correlations between traits. Pleiotropic effects can be used in Mendelian randomization analyses to draw inferences about causal relationships among traits.

Investigating the genetic basis of complex traits and diseases using individual-level genetic data from genome-wide association studies is often hampered by privacy concerns and logistical considerations. Here, the authors review recent statistical methods that leverage summary association data, which are widely available and can circumvent these issues.

During the past decade, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
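A fixed-effect inverse-variance-weighted meta-analysis is one concrete example of a single-variant analysis that needs only summary statistics. It combines each study's effect estimate and standard error for the same variant. The sketch below assumes the studies are independent and report effects for the same allele. The function name `ivw_meta` is hypothetical and is not taken from any tool described in the review.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-study effect estimates (same effect allele in every study).
    ses:   matching standard errors.
    Returns (pooled beta, pooled standard error, z-score).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / se_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Two equally precise studies: the pooled estimate is their average,
# and the pooled standard error is smaller by a factor of sqrt(2).
print(ivw_meta([0.1, 0.3], [0.1, 0.1]))
```

The computation uses only the per-study betas and standard errors, which is why this kind of meta-analysis works directly from summary association data.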

Journal Article

Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies

by Kichaev, Gleb; Yang, Wen-Yun; Pasaniuc, Bogdan in Algorithms, Annotations, Biology and Life Sciences

2014

Standard statistical approaches for prioritizing variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that must be selected to capture 90% of the causal variants from an average of 13.3 SNPs per locus (with the next-best performing strategy) to 10.4. Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulated data. We validate our findings using a large-scale meta-analysis of four blood lipid traits and find that the relative probability of causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in these data.
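A deliberately simplified sketch can illustrate the core idea of combining association strength with annotation-based priors. It differs from the framework described above in three ways. It assumes a single causal variant per locus. It uses a Wakefield-style approximate Bayes factor computed from z-scores. It also treats the annotation weights as given rather than estimating them. All function names, the prior variance of 25 and the example weights are hypothetical.

```python
import math


def log_abf(z, prior_var=25.0):
    """Log approximate Bayes factor (Wakefield-style) on the z-score scale.

    prior_var is the assumed prior variance of the causal effect, measured in
    units of its standard error (a hypothetical default)."""
    shrink = prior_var / (1.0 + prior_var)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink


def posteriors(z_scores, annotations, weights, prior_var=25.0):
    """Posterior causal probabilities assuming exactly one causal variant.

    The log prior of variant i is sum_k weights[k] * annotations[i][k]
    (up to a constant), so annotated variants gain prior weight."""
    logs = [
        sum(w * a for w, a in zip(weights, ann)) + log_abf(z, prior_var)
        for z, ann in zip(z_scores, annotations)
    ]
    top = max(logs)  # subtract the max before exponentiating to avoid overflow
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def credible_set(post, level=0.9):
    """Smallest set of top-ranked variants whose posteriors sum to >= level."""
    order = sorted(range(len(post)), key=lambda i: post[i], reverse=True)
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += post[i]
        if cum >= level:
            break
    return chosen


z = [5.0, 4.8, 1.0]
ann = [[0], [1], [0]]  # variant 1 lies in a hypothetical exon annotation
print(credible_set(posteriors(z, ann, weights=[0.0])))  # → [0, 1]
print(credible_set(posteriors(z, ann, weights=[2.0])))  # → [1, 0]
```

Up-weighting the annotation moves variant 1 to the top of the ranking even though its z-score is slightly lower. This is the kind of reprioritization the abstract describes, shown here under much stronger assumptions.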

Journal Article

Using Extended Genealogy to Estimate Components of Heritability for 23 Quantitative and Dichotomous Traits

by Pasaniuc, Bogdan; Price, Alkes L.; Bhatia, Gaurav in Biology, Cardiovascular disease, Epigenetic inheritance

2013

Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays.

Journal Article

Distinguishing genetic correlation from causation across 52 diseases and complex traits

by Price, Alkes L.; O’Connor, Luke J. in 45/43, 631/208, 631/208/205/2138

2018

Mendelian randomization, a method to infer causal relationships, is confounded by genetic correlations reflecting shared etiology. We developed a model in which a latent causal variable mediates the genetic correlation; trait 1 is partially genetically causal for trait 2 if it is strongly genetically correlated with the latent causal variable, quantified using the genetic causality proportion. We fit this model using the mixed fourth moments $E(\alpha_1^2 \alpha_1 \alpha_2)$ and $E(\alpha_2^2 \alpha_1 \alpha_2)$ of marginal effect sizes for each trait; if trait 1 is causal for trait 2, then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1 \alpha_2$), but not vice versa. In simulations, our method avoided false positives due to genetic correlations, unlike Mendelian randomization. Across 52 traits (average n = 331,000), we identified 30 causal relationships with high genetic causality proportion estimates. Novel findings included a causal effect of low-density lipoprotein on bone mineral density, consistent with clinical trials of statins in osteoporosis. This study presents a new latent causal variable (LCV) model that distinguishes between genetic correlation and causation. Applying LCV to genome-wide association summary statistics for 52 traits identified genetically causal effects for 59 pairs of traits.
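The asymmetry behind the two mixed fourth moments can be seen on a toy example. The sketch below simply computes the empirical moments $E(\alpha_1^2 \alpha_1 \alpha_2)$ and $E(\alpha_2^2 \alpha_1 \alpha_2)$ across SNPs, after rescaling each trait's effects to mean square 1. It assumes per-SNP effects are given directly and ignores LD, sampling noise and the corrections the actual LCV estimator applies. It is an illustration of the moments only, not the LCV fitting procedure, and the function name is hypothetical.

```python
import math


def mixed_fourth_moments(alpha1, alpha2):
    """Empirical E(a1^2 * a1 * a2) and E(a2^2 * a1 * a2) across SNPs.

    Each trait's effects are first rescaled to mean square 1 so the two
    moments are on a comparable scale. LD and sampling noise are ignored."""
    def standardize(a):
        scale = math.sqrt(sum(x * x for x in a) / len(a))
        return [x / scale for x in a]

    a1, a2 = standardize(alpha1), standardize(alpha2)
    n = len(a1)
    m1 = sum(x ** 3 * y for x, y in zip(a1, a2)) / n  # E(a1^2 * a1 * a2)
    m2 = sum(y ** 3 * x for x, y in zip(a1, a2)) / n  # E(a2^2 * a1 * a2)
    return m1, m2


# Toy example: SNPs 0 and 1 act on trait 1 and, through it, on trait 2;
# SNPs 2 and 3 affect trait 2 only.
print(mixed_fourth_moments([2.0, -2.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0]))
```

In this toy case, the first moment (√2) is twice the second (√2/2). SNPs with large trait-1 effects carry correlated trait-2 effects, but the trait-2-specific SNPs do not affect trait 1. When one trait's effects are an exact multiple of the other's, the two moments are equal.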

Journal Article

Population Structure and Eigenanalysis

by Price, Alkes L.; Patterson, Nick; Reich, David in Computer Simulation - statistics & numerical data, Eigenfunctions, Eukaryotes

2006

Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon in the ability to detect structure in genetic data, which emerges from the statistical theory we use: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like $F_{ST}$) below a threshold is essentially undetectable, but slightly above the threshold, detection is easy. This means that we can predict the dataset size needed to detect structure.
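A back-of-the-envelope calculation makes the phase change concrete. For two equally sized populations, the random-matrix argument places the detection threshold at roughly $F_{ST} \approx 1/\sqrt{nm}$, where $n$ is the number of markers and $m$ the number of samples. The sketch below assumes that scaling, which is our summary of the theory and ignores LD and unequal population sizes. It then inverts the relation to predict the number of markers needed. Both function names are hypothetical.

```python
import math


def fst_threshold(n_markers, n_samples):
    """Approximate F_ST below which two equal-sized populations are undetectable."""
    return 1.0 / math.sqrt(n_markers * n_samples)


def markers_needed(fst, n_samples):
    """Invert F_ST = 1 / sqrt(n * m) to get the marker count n at the threshold."""
    return 1.0 / (fst ** 2 * n_samples)


print(fst_threshold(100_000, 1_000))  # 1e-4 for 100,000 markers and 1,000 samples
print(markers_needed(0.001, 1_000))   # about 1,000 markers when F_ST = 0.001
```

Because the threshold depends only on the product $nm$ in this approximation, doubling the number of samples has the same effect as doubling the number of markers.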

Journal Article

Reference-based phasing using the Haplotype Reference Consortium panel

by McCarthy, Shane; Finucane, Hilary K.; Price, Alkes L. in 631/114/794, 631/208, 631/208/205/2138

2016

Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and a ∼20× improvement in speed compared to a competing method, SHAPEIT2. Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes–based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
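Eagle2's data structure builds on the positional Burrows-Wheeler transform (PBWT). The sketch below shows only the basic PBWT sort, Durbin's positional prefix arrays. At each site, haplotypes are kept ordered by their reversed prefixes, so haplotypes that share long matches ending at that site sit next to each other. It is not Eagle2's own data structure, and the function name is hypothetical.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the PBWT positional prefix arrays a_0, a_1, ..., a_K.

    haplotypes: equal-length lists of 0/1 alleles. a_k lists haplotype
    indices sorted by their reversed prefixes over sites 0..k-1."""
    n_sites = len(haplotypes[0])
    order = list(range(len(haplotypes)))
    arrays = [order[:]]
    for k in range(n_sites):
        # Stable partition by the allele at site k: zeros first, then ones,
        # each keeping the order from the previous site.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        arrays.append(order[:])
    return arrays


haps = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
print(pbwt_prefix_arrays(haps)[2])  # → [3, 0, 2, 1]
```

Each update is a single stable partition, so building all arrays takes time proportional to the number of haplotypes times the number of sites. This linear scaling is what makes PBWT-style structures attractive for large reference panels.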

Journal Article

Fast and accurate long-range phasing in a UK Biobank cohort

by Palamara, Pier Francesco; Price, Alkes L.; Loh, Po-Ru in 631/114/794, 631/208, Accuracy

2016

Po-Ru Loh, Pier Francesco Palamara and Alkes Price develop a new long-range phasing method, Eagle, that harnesses long, shared identical-by-descent tracts and can be applied to large outbred populations. They use Eagle to phase samples from the UK Biobank and find that it is faster and has better accuracy than existing methods. Recent work has leveraged the extensive genotyping of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here we develop a fast and accurate LRP method, Eagle, that extends this paradigm to populations with much smaller proportions of genotyped samples by harnessing long (>4-cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N ≈ 150,000 samples (0.2% of the British population) from the UK Biobank, and we determined that it is 1–2 orders of magnitude faster than existing methods while achieving similar or better phasing accuracy (switch error rate ≈ 0.3%, corresponding to perfect phase in a majority of 10-Mb segments). We also observed that, when used within an imputation pipeline, Eagle prephasing improved downstream imputation accuracy in comparison to prephasing in batches using existing methods, as necessary to achieve comparable computational cost.
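The switch error rate quoted above measures how often the inferred phase flips relative to the truth between consecutive heterozygous sites. The sketch assumes both inputs list one haplotype's alleles at heterozygous sites only. It normalizes by the number of adjacent site pairs, which is a common convention. Published tools may differ in details such as how missing sites are handled, and the function name is hypothetical.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    Both arguments give one haplotype's alleles (0/1) at heterozygous sites.
    Only changes in agreement matter, so a globally flipped haplotype has no
    switch errors."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same heterozygous sites")
    if len(true_hap) < 2:
        return 0.0
    agree = [t == s for t, s in zip(true_hap, inferred_hap)]
    switches = sum(1 for prev, cur in zip(agree, agree[1:]) if prev != cur)
    return switches / (len(agree) - 1)


# One switch between the 2nd and 3rd heterozygous sites, over 3 adjacent pairs.
print(switch_error_rate([0, 1, 0, 1], [0, 1, 1, 0]))
```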

Journal Article

Identifying loci with different allele frequencies among cases of eight psychiatric disorders using CC-GWAS

by Peyrot, Wouter J.; Price, Alkes L. in 631/208, 631/208/205, 692/699/476

2021

Psychiatric disorders are highly genetically correlated, but little research has been conducted on the genetic differences between disorders. We developed a new method (case–case genome-wide association study; CC-GWAS) to test for differences in allele frequency between cases of two disorders using summary statistics from the respective case–control GWAS, transcending current methods that require individual-level data. Simulations and analytical computations confirm that CC-GWAS is well powered with effective control of type I error. We applied CC-GWAS to publicly available summary statistics for schizophrenia, bipolar disorder, major depressive disorder and five other psychiatric disorders. CC-GWAS identified 196 independent case–case loci, including 72 CC-GWAS-specific loci that were not significant at the genome-wide level in the input case–control summary statistics; two of the CC-GWAS-specific loci implicate the genes KLF6 and KLF16 (from the Krüppel-like family of transcription factors), which have been linked to neurite outgrowth and axon regeneration. CC-GWAS loci replicated convincingly in applications to datasets with independent replication data. Identification of the genetic differences between two different disorders has been hampered by a need for individual-level data from cases of both disorders. CC-GWAS enables the comparison of allele frequencies among cases of two disorders using case–control GWAS summary statistics.
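CC-GWAS targets the allele-frequency difference between cases of two disorders. For intuition, the sketch below shows the direct individual-level comparison that CC-GWAS is designed to avoid: a two-proportion z-test on case allele frequencies. It assumes diploid genotypes and independent case groups. CC-GWAS itself instead combines the two case–control summary statistics, which this sketch does not attempt, and the function name is hypothetical.

```python
import math


def case_case_allele_z(freq_a, n_a, freq_b, n_b):
    """Two-proportion z-score for an allele-frequency difference between cases.

    freq_a, freq_b: allele frequency in cases of disorder A and disorder B.
    n_a, n_b: numbers of cases (diploid, so 2 * n alleles per group)."""
    alleles_a, alleles_b = 2 * n_a, 2 * n_b
    pooled = (freq_a * alleles_a + freq_b * alleles_b) / (alleles_a + alleles_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / alleles_a + 1 / alleles_b))
    return (freq_a - freq_b) / se


# 10,000 cases per disorder, frequencies 0.30 vs 0.25: z is roughly 11.2.
print(case_case_allele_z(0.30, 10_000, 0.25, 10_000))
```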

Journal Article

Mixed-model association for biobank-scale datasets

by Kichaev, Gleb; Loh, Po-Ru; Schoech, Armin P. in 45/43, 631/114/794, 631/208/205/2138

2018

[...] we note two caveats regarding mixed-model analysis of binary traits. [...] conditioning on genome-wide signal can produce loss of power under case-control ascertainment (refs. 2,3); specialized LMM methods are needed for modeling this scenario at scale. BOLT-LMM association statistics computed in this study are currently available for public download at http://data.broadinstitute.org/alkesgroup/UKBB/ and have been submitted to the UK Biobank Data Showcase.

Po-Ru Loh (1,2), Gleb Kichaev (3), Steven Gazal (2,4), Armin P. Schoech (2,4,5) and Alkes L. Price (2,4,5). Affiliations: 1 Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA. 2 Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 3 Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA. 4 Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Journal Article

Advantages and pitfalls in the application of mixed-model association methods

by Price, Alkes L.; Visscher, Peter M.; Goddard, Michael E. in 631/114, 631/208/205, 631/208/457

2014

Alkes Price, Peter Visscher and colleagues provide recommendations on the application of mixed-linear-model association methods across a range of study designs. Mixed linear models are emerging as a method of choice for conducting genetic association studies in humans and other organisms. The advantages of the mixed-linear-model association (MLMA) method include the prevention of false positive associations due to population or relatedness structure and an increase in power obtained through the application of a correction that is specific to this structure. An underappreciated point is that MLMA can also increase power in studies without sample structure by implicitly conditioning on associated loci other than the candidate locus. Numerous variations on the standard MLMA approach have recently been published, with a focus on reducing computational cost. These advances provide researchers applying MLMA methods with many options to choose from, but we caution that MLMA methods are still subject to potential pitfalls. Here we describe and quantify the advantages and pitfalls of MLMA methods as a function of study design and provide recommendations for the application of these methods in practical settings.
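The standard single-variant mixed-linear-model association (MLMA) test can be written as follows, in our own notation. This is the generic formulation with several assumptions:

- the phenotype $y$ and candidate-SNP genotype $x$ are mean-centred;
- covariates are omitted;
- $K$ is a genetic relationship matrix computed from genome-wide SNPs;
- the variance components are treated as known.

Individual MLMA methods differ in how they estimate $\sigma_g^2$ and $\sigma_e^2$ and in how they build $K$.

```latex
\begin{aligned}
y &= x\beta + g + \varepsilon, \qquad
  g \sim N(0, \sigma_g^2 K), \quad \varepsilon \sim N(0, \sigma_e^2 I), \\
V &= \sigma_g^2 K + \sigma_e^2 I, \\
\hat{\beta} &= \frac{x^\top V^{-1} y}{x^\top V^{-1} x}, \qquad
\chi^2 = \frac{\hat{\beta}^2}{\operatorname{Var}(\hat{\beta})}
       = \frac{\left(x^\top V^{-1} y\right)^2}{x^\top V^{-1} x}.
\end{aligned}
```

$K$ is computed from SNPs across the genome, so the random effect $g$ absorbs part of the signal at other associated loci. This is one way to see the implicit conditioning on associated loci other than the candidate locus.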

Journal Article