Catalogue Search | MBRL

SumHer better estimates the SNP heritability of complex traits from summary statistics

by Speed, Doug , Balding, David J. in 631/114 , 631/208 , 631/208/205/2138

2019

We present SumHer, software for estimating confounding bias, SNP heritability, enrichments of heritability and genetic correlations using summary statistics from genome-wide association studies. The key difference between SumHer and the existing software LD Score Regression (LDSC) is that SumHer allows the user to specify the heritability model. We apply SumHer to results from 24 large-scale association studies (average sample size 121,000) using our recommended heritability model. We show that these studies tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci was under-reported by about a quarter. We also estimate enrichments for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further six categories with above threefold enrichment. By contrast, our analysis using SumHer finds that none of the categories have enrichment above twofold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data. SumHer is a software for estimating SNP heritability from summary statistics using heritability models. Applying SumHer to publicly available results for 24 GWAS provides an improved understanding of the genetic architecture of complex traits.

Journal Article

Evaluating and improving heritability models using summary statistics

by Holmes, John , Speed, Doug , Balding, David J. in 631/208/205 , 692/308/2056 , Agriculture

2020

There is currently much debate regarding the best model for how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I model, the authors of LD Score Regression recommend the Baseline LD model, and we have recommended the LDAK model. Here we provide a statistical framework for assessing heritability models using summary statistics from genome-wide association studies. Based on 31 studies of complex human traits (average sample size 136,000), we show that the Baseline LD model is more realistic than other existing heritability models, but that it can be improved by incorporating features from the LDAK model. Our framework also provides a method for estimating the selection-related parameter α from summary statistics. We find strong evidence (P < 1 × 10⁻⁶) of negative genome-wide selection for traits, including height, systolic blood pressure and college education, and that the impact of selection is stronger inside functional categories, such as coding SNPs and promoter regions. Assessing heritability models using summary statistics from genome-wide association studies of 31 human traits shows that the Baseline LD model is realistic and can be improved by incorporating features from the LDAK model.

Journal Article

A tutorial on statistical methods for population association studies

by Balding, David J. in Agriculture , Animal Genetics and Genomics , Biomedical and Life Sciences

2006

Key Points Although population association studies are not new, there remain many areas of disagreement over appropriate statistical analyses. This article provides an overview of statistical methods, including areas of controversy and ongoing developments. It does not consider family-based association studies, nor linkage or admixture studies. I first cover analyses that are preliminary to association testing: testing for Hardy–Weinberg equilibrium; imputing missing genotype data; inferring haplotype from genotype data; measures of linkage disequilibrium and estimates of recombination rates; and choosing tag SNPs. Among tests of association, I cover case–control, quantitative and ordered phenotypes, and analyses that are based on single SNPs, multiple SNPs and haplotypes. There is a discussion of issues that are relevant to genome-wide association studies. I discuss Genomic Control and other approaches to the problem of population stratification. I give particular attention to the problem of multiple testing, and discuss both frequentist and Bayesian approaches to addressing the problem. Identifying polymorphisms that are overrepresented in disease cases versus controls would seem to be a straightforward process, but genetic association studies are notoriously riddled with complex analysis problems. This article outlines these statistical issues and provides some guidance to overcoming them. Although genetic association studies have been with us for many years, even for the simplest analyses there is little consensus on the most appropriate statistical procedures. Here I give an overview of statistical approaches to population association studies, including preliminary analyses (Hardy–Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association. 
My goal is to outline the key methods with a brief discussion of problems (population structure and multiple testing), avenues for solutions and some ongoing developments.

Journal Article

Relatedness in the post-genomic era: is it still useful?

by Speed, Doug , Balding, David J. in 631/1647/2217 , 631/208 , Agriculture

2015

Key Points Relatedness is a fundamental concept in everyday life and in quantitative genetics. It has a central role in efforts to understand genetic mechanisms and in predicting phenotypes, as well as in population, evolutionary and forensic genetics. Traditionally, the relatedness of two individuals was measured in terms of the fraction of genome they share IBD (identity-by-descent), which is defined as inheritance from a recent common ancestor, but there are many approaches to interpreting 'recent'. A better viewpoint is given by coalescent theory: the time since the most recent common ancestor for two individuals varies along the genome and can take an essentially continuous range of possible values. There are now many different ways to measure the genetic similarity between pairs of individuals using genome-wide single-nucleotide polymorphism (SNP) data. The binary IBD versus non-IBD distinction provides a simple approximation but gives an inadequate representation of reality compared with the precision offered by the extensive data sets available nowadays. We argue that, for many applications, traditional concepts of relatedness are no longer required; instead, models and analyses can be based directly on genome similarity. There is no one best measure of genome similarity, but different measures can be evaluated on their performance in specific applications. Relatedness has traditionally been defined using pedigree-based measures, but these have serious deficiencies. With genome-wide data, SNP-based measures can now be used to directly measure genome similarity, a more useful concept than relatedness. This Review outlines ways to evaluate measures of genome similarity. Relatedness is a fundamental concept in genetics but is surprisingly hard to define in a rigorous yet useful way. 
Traditional relatedness coefficients specify expected genome sharing between individuals in pedigrees, but actual genome sharing can differ considerably from these expected values, which in any case vary according to the pedigree that happens to be available. Nowadays, we can measure genome sharing directly from genome-wide single-nucleotide polymorphism (SNP) data; however, there are many such measures in current use, and we lack good criteria for choosing among them. Here, we review SNP-based measures of relatedness and criteria for comparing them. We discuss how useful pedigree-based concepts remain today and highlight opportunities for further advances in quantitative genetics, with a focus on heritability estimation and phenotype prediction.

Journal Article

Reevaluation of SNP heritability in complex human traits

by Cai, Na , Johnson, Michael R , Nejentsev, Sergey in 631/114/794 , 631/208/205/2138 , 692/308/2056

2017

By analyzing imputed genetic data for 42 human traits, Doug Speed and colleagues derive a model that describes how heritability varies with minor allele frequency, linkage disequilibrium and genotype certainty. Using this model, they show that common SNPs contribute substantially more heritability than previously thought. SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (s.d. 3%) higher than those obtained from the widely used software GCTA and 25% (s.d. 2%) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.

Journal Article

Bayesian statistical methods for genetic association studies

by Balding, David J. , Stephens, Matthew in Agriculture , Animal Genetics and Genomics , Bayes Theorem

2009

Key Points p-values are commonly used as summaries of evidence for association between a genetic variant and phenotype, but they have an important limitation in that they are unable to quantify how confident one should be that a given SNP is truly associated with a phenotype. Bayesian methods provide an alternative approach to assessing associations. We show that Bayesian analyses are not too difficult and can be rewarding: for example, unlike p-values, a Bayesian probability of association is comparable across SNPs and across studies. For a Bayesian analysis of single-SNP association in a case–control study, we discuss genetic models that can form an alternative to the null hypothesis of no association, in addition to effect-size distributions for the parameters of these models. An alternative Bayesian analysis derives a posterior distribution for effect size, without reference to a null hypothesis. We give an example of a multi-SNP Bayesian analysis for fine-scale mapping and discuss Bayesian approaches to multiple testing and meta-analysis. Broad guidelines are suggested for editors and reviewers of Bayesian analyses. Bayesian analyses are increasingly being used in genetics, particularly in the context of genome-wide association studies. This article provides a guide to using Bayesian analyses for assessing single-SNP associations and highlights the advantages of these methods compared with standard frequentist analyses. Bayesian statistical methods have recently made great inroads into many areas of science, and this advance is now extending to the assessment of association between genetic variants and disease or other phenotypes. We review these methods, focusing on single-SNP tests in genome-wide association studies. We discuss the advantages of the Bayesian approach over classical (frequentist) approaches in this setting and provide a tutorial on basic analysis steps, including practical guidelines for appropriate prior specification.
We demonstrate the use of Bayesian methods for fine mapping in candidate regions, discuss meta-analyses and provide guidance for refereeing manuscripts that contain Bayesian analyses.

Journal Article

A Genome-Wide Association Study of the Metabolic Syndrome in Indian Asian Men

by Zabaneh, Delilah , Balding, David J. in Adult , Adults , Aged

2010

We conducted a two-stage genome-wide association study to identify common genetic variation altering risk of the metabolic syndrome and related phenotypes in Indian Asian men, who have a high prevalence of these conditions. In Stage 1, approximately 317,000 single nucleotide polymorphisms were genotyped in 2700 individuals, from which 1500 SNPs were selected to be genotyped in a further 2300 individuals. Selection for inclusion in Stage 1 was based on four metabolic syndrome component traits: HDL-cholesterol, plasma glucose and Type 2 diabetes, abdominal obesity measured by waist to hip ratio, and diastolic blood pressure. Association was tested with these four traits and a composite metabolic syndrome phenotype. Four SNPs reaching significance level p < 5 × 10⁻⁷ and with posterior probability of association > 0.8 were found in genes CETP and LPL, associated with HDL-cholesterol. These associations have already been reported in Indian Asians and in Europeans. Five additional loci harboured SNPs significant at p < 10⁻⁶ and posterior probability > 0.5 for HDL-cholesterol, type 2 diabetes or diastolic blood pressure. Our results suggest that the primary genetic determinants of metabolic syndrome are the same in Indian Asians as in other populations, despite the higher prevalence. Further, we found little evidence of a common genetic basis for metabolic syndrome traits in our sample of Indian Asian men.

Journal Article

Correction: How convincing is a matching Y-chromosome profile?

by Andersen, Mikkel M. , Balding, David J.

2026

[This corrects the article DOI: 10.1371/journal.pgen.1007028.].

Journal Article

Epigenome-wide association studies for common human diseases

by Down, Thomas A. , Balding, David J. , Rakyan, Vardhman K. in 631/208/205/2138 , 631/208/726/649 , Agriculture

2011

Key Points Epigenetic variation affects genome function and hence can contribute to common disease. To establish a possible link requires systematic studies, such as the proposed epigenome-wide association studies (EWASs). Of the many epigenetic marks, DNA methylation (DNAm) is the most stable and accessible and therefore ideally suited for EWASs. In principle, EWASs should be equally successful as genome-wide association studies (GWASs) for the identification of disease-associated variations. However, there are fundamental differences between GWASs and EWASs that need to be considered for appropriate study design. The key differences for EWASs are tissue specificity and the possibility that some epigenetic changes may occur downstream of the disease process. Both considerations affect the type of cohorts and samples that should be analyzed. Technologies for EWASs are readily available for both array- and sequencing-based platforms but many of the computational and statistical analysis methods remain to be developed. At this early stage, it is challenging to predict the possible effect of DNAm variation. However, if it does exist and if the right study design is used, then much more than the 'low-hanging fruit' should be detectable in fewer samples than are required for a typical GWAS, based on simulations assuming a conservative methylation odds ratio. Technological advances now allow large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation. Such epigenome-wide association studies present novel opportunities, but, as discussed here, they also create new challenges that are not encountered in genome-wide association studies. Despite the success of genome-wide association studies (GWASs) in identifying loci associated with common diseases, a substantial proportion of the causality remains unexplained. 
Recent advances in genomic technologies have placed us in a position to initiate large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation. Such epigenome-wide association studies (EWASs) present novel opportunities but also create new challenges that are not encountered in GWASs. We discuss EWAS design, cohort and sample selections, statistical significance and power, confounding factors and follow-up studies. We also discuss how integration of EWASs with GWASs can help to dissect complex GWAS haplotypes for functional analysis.

Journal Article

Population Structure and Cryptic Relatedness in Genetic Association Studies

by Astle, William , Balding, David J. in Alleles , Ancestry , ascertainment

2009

We review the problem of confounding in genetic association studies, which arises principally because of population structure and cryptic relatedness. Many treatments of the problem consider only a simple "island" model of population structure. We take a broader approach, which views population structure and cryptic relatedness as different aspects of a single confounder: the unobserved pedigree defining the (often distant) relationships among the study subjects. Kinship is therefore a central concept, and we review methods of defining and estimating kinship coefficients, both pedigree-based and marker-based. In this unified framework we review solutions to the problem of population structure, including family-based study designs, genomic control, structured association, regression control, principal components adjustment and linear mixed models. The last solution makes the most explicit use of the kinships among the study subjects, and has an established role in the analysis of animal and plant breeding studies. Recent computational developments mean that analyses of human genetic association data are beginning to benefit from its powerful tests for association, which protect against population structure and cryptic kinship, as well as intermediate levels of confounding by the pedigree.

Journal Article
