Catalogue Search | MBRL

Detecting Long-Term Balancing Selection Using Allele Frequency Correlation

by Siewert, Katherine M , Voight, Benjamin F in Alleles , Balancing , Biology

2017

Balancing selection occurs when multiple alleles are maintained in a population, which can result in their preservation over long evolutionary time periods. A characteristic signature of this long-term balancing selection is an excess number of intermediate frequency polymorphisms near the balanced variant. However, the expected distribution of allele frequencies at these loci has not been extensively detailed, and therefore existing summary statistic methods do not explicitly take it into account. Using simulations, we show that new mutations which arise in close proximity to a site targeted by balancing selection accumulate at frequencies nearly identical to that of the balanced allele. In order to scan the genome for balancing selection, we propose a new summary statistic, β, which detects these clusters of alleles at similar frequencies. Simulation studies show that compared with existing summary statistics, our measure has improved power to detect balancing selection, and is reasonably powered in non-equilibrium demographic models and under a range of recombination and mutation rates. We compute β on 1000 Genomes Project data to identify loci potentially subjected to long-term balancing selection in humans. We report two balanced haplotypes—localized to the genes WFS1 and CADM2—that are strongly linked to association signals for complex traits. Our approach is computationally efficient and applicable to species that lack appropriate outgroup sequences, allowing for well-powered analysis of selection in the wide variety of species for which population data are rapidly being generated.

Journal Article

From Mouse to Human: Evolutionary Genomics Analysis of Human Orthologs of Essential Genes

by Voight, Benjamin F. , Bućan, Maja , Georgi, Benjamin in Animals , Autism , Biology

2013

Understanding the core set of genes that are necessary for basic developmental functions is one of the central goals in biology. Studies in model organisms identified a significant fraction of essential genes through the analysis of null-mutations that lead to lethality. Recent large-scale next-generation sequencing efforts have provided unprecedented data on genetic variation in humans. However, evolutionary and genomic characteristics of human essential genes have never been directly studied on a genome-wide scale. Here we use detailed phenotypic resources available for the mouse and deep genomics sequencing data from human populations to characterize patterns of genetic variation and mutational burden in a set of 2,472 human orthologs of known essential genes in the mouse. Consistent with the action of strong, purifying selection, these genes exhibit comparatively reduced levels of sequence variation, a skew in allele frequency toward rarer variants, and increased conservation across the primate and rodent lineages relative to the remainder of genes in the genome. In individual genomes we observed ~12 rare mutations within essential genes predicted to be damaging. Consistent with the hypothesis that mutations in essential genes are risk factors for neurodevelopmental disease, we show that de novo variants in patients with Autism Spectrum Disorder are more likely to occur in this collection of genes. While incomplete, our set of human orthologs shows characteristics fully consistent with essential function in humans and thus provides a resource to inform and facilitate interpretation of sequence data in studies of human disease.

Journal Article

Genetics of height and risk of atrial fibrillation: A Mendelian randomization study

by Voight, Benjamin F. , Verma, Shefali S. , Hyman, Matthew C. in Adult , Aged , Apnea

2020

Observational studies have identified height as a strong risk factor for atrial fibrillation, but this finding may be limited by residual confounding. We aimed to examine genetic variation in height within the Mendelian randomization (MR) framework to determine whether height has a causal effect on risk of atrial fibrillation. In summary-level analyses, MR was performed using summary statistics from genome-wide association studies of height (GIANT/UK Biobank; 693,529 individuals) and atrial fibrillation (AFGen; 65,446 cases and 522,744 controls), finding that each 1-SD increase in genetically predicted height increased the odds of atrial fibrillation (odds ratio [OR] 1.34; 95% CI 1.29 to 1.40; p = 5 × 10⁻⁴²). This result remained consistent in sensitivity analyses with MR methods that make different assumptions about the presence of pleiotropy, and when accounting for the effects of traditional cardiovascular risk factors on atrial fibrillation. Individual-level phenome-wide association studies of height and a height genetic risk score were performed among 6,567 European-ancestry participants of the Penn Medicine Biobank (median age at enrollment 63 years, interquartile range 55-72; 38% female; recruitment 2008-2015), confirming prior observational associations between height and atrial fibrillation. Individual-level MR confirmed that each 1-SD increase in height increased the odds of atrial fibrillation, including adjustment for clinical and echocardiographic confounders (OR 1.89; 95% CI 1.50 to 2.40; p = 0.007). The main limitations of this study include potential bias from pleiotropic effects of genetic variants, and lack of generalizability of individual-level findings to non-European populations. In this study, we observed evidence that height is likely a positive causal risk factor for atrial fibrillation.
Further study is needed to determine whether risk prediction tools including height or anthropometric risk factors can be used to improve screening and primary prevention of atrial fibrillation, and whether biological pathways involved in height may offer new targets for treatment of atrial fibrillation.
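
As a reading aid, the summary-level odds ratio quoted in this abstract (OR 1.34; 95% CI 1.29 to 1.40) can be put on the log-odds scale with standard arithmetic. The sketch below is not the authors' analysis code: it assumes the interval is a symmetric Wald interval on the log scale, and the function name `log_or_and_se` is hypothetical. Because the published values are rounded, the derived quantities are approximate.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Return (log odds ratio, approximate standard error).

    Assumes a symmetric Wald-type 95% interval on the log scale,
    so the interval width equals 2 * z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return log_or, se


# Values quoted in the abstract for the summary-level analysis.
log_or, se = log_or_and_se(1.34, 1.29, 1.40)
z = log_or / se  # approximate Wald z-score implied by the rounded values

print(f"log(OR) = {log_or:.4f}, SE = {se:.4f}, z = {z:.1f}")
```

A z-score of this size corresponds to a very small p-value, which is in line with the order of magnitude reported in the abstract; an exact match is not expected given the rounding of the published interval.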

Journal Article

Regularized sequence-context mutational trees capture variation in mutation rates across the human genome

by Voight, Benjamin F. , Conery, Mitchell , Auerbach, Benjamin J. in Analysis , Bayes Theorem , Bayesian analysis

2023

Germline mutation is the mechanism by which genetic variation in a population is created. Inferences derived from mutation rate models are fundamental to many population genetics methods. Previous models have demonstrated that nucleotides flanking polymorphic sites–the local sequence context–explain variation in the probability that a site is polymorphic. However, limitations to these models exist as the size of the local sequence context window expands. These include a lack of robustness to data sparsity at typical sample sizes, lack of regularization to generate parsimonious models and lack of quantified uncertainty in estimated rates to facilitate comparison between models. To address these limitations, we developed Baymer, a regularized Bayesian hierarchical tree model that captures the heterogeneous effect of sequence contexts on polymorphism probabilities. Baymer implements an adaptive Metropolis-within-Gibbs Markov Chain Monte Carlo sampling scheme to estimate the posterior distributions of sequence-context based probabilities that a site is polymorphic. We show that Baymer accurately infers polymorphism probabilities and well-calibrated posterior distributions, robustly handles data sparsity, appropriately regularizes to return parsimonious models, and scales computationally at least up to 9-mer context windows. We demonstrate application of Baymer in three ways–first, identifying differences in polymorphism probabilities between continental populations in the 1000 Genomes Phase 3 dataset, second, in a sparse data setting to examine the use of polymorphism models as a proxy for de novo mutation probabilities as a function of variant age, sequence context window size, and demographic history, and third, comparing model concordance between different great ape species. We find a shared context-dependent mutation rate architecture underlying our models, enabling a transfer-learning inspired strategy for modeling germline mutations. 
In summary, Baymer is an accurate polymorphism probability estimation algorithm that automatically adapts to data sparsity at different sequence context levels, thereby making efficient use of the available data.

Journal Article

The relationship between circulating lipids and breast cancer risk: A Mendelian randomization study

by Voight, Benjamin F. , Johnson, Kelsey E. , Maxwell, Kara N. in Adult , Biology and Life Sciences , Biomarkers

2020

A number of epidemiological and genetic studies have attempted to determine whether levels of circulating lipids are associated with risks of various cancers, including breast cancer (BC). However, it remains unclear whether a causal relationship exists between lipids and BC. If alteration of lipid levels also reduced risk of BC, this could present a target for disease prevention. This study aimed to assess a potential causal relationship between genetic variants associated with plasma lipid traits (high-density lipoprotein, HDL; low-density lipoprotein, LDL; triglycerides, TGs) with risk for BC using Mendelian randomization (MR). Data from genome-wide association studies in up to 215,551 participants from the Million Veteran Program (MVP) were used to construct genetic instruments for plasma lipid traits. The effect of these instruments on BC risk was evaluated using genetic data from the BCAC (Breast Cancer Association Consortium) based on 122,977 BC cases and 105,974 controls. Using MR, we observed that a 1-standard-deviation genetically determined increase in HDL levels is associated with an increased risk for all BCs (HDL: OR [odds ratio] = 1.08, 95% confidence interval [CI] = 1.04-1.13, P < 0.001). Multivariable MR analysis, which adjusted for the effects of LDL, TGs, body mass index (BMI), and age at menarche, corroborated this observation for HDL (OR = 1.06, 95% CI = 1.03-1.10, P = 4.9 × 10⁻⁴) and also found a relationship between LDL and BC risk (OR = 1.03, 95% CI = 1.01-1.07, P = 0.02). We did not observe a difference in these relationships when stratified by breast tumor estrogen receptor (ER) status.
We repeated this analysis using genetic variants independent of the leading association at core HDL pathway genes and found that these variants were also associated with risk for BCs (OR = 1.11, 95% CI = 1.06-1.16, P = 1.5 × 10⁻⁶), including locus-specific associations at ABCA1 (ATP Binding Cassette Subfamily A Member 1), APOE-APOC1-APOC4-APOC2 (Apolipoproteins E, C1, C4, and C2), and CETP (Cholesteryl Ester Transfer Protein). In addition, we found evidence that genetic variation at the ABO locus is associated with both lipid levels and BC. Through multiple statistical approaches, we minimized and tested for the confounding effects of pleiotropy and population stratification on our analysis; however, the possible existence of residual pleiotropy and stratification remains a limitation of this study. We observed that genetically elevated plasma HDL and LDL levels appear to be associated with increased BC risk. Future studies are required to understand the mechanism underlying this putative causal relationship, with the goal of developing potential therapeutic strategies aimed at altering the cholesterol-mediated effect on BC risk.

Journal Article

A single genetic locus controls both expression of DPEP1/CHMP1A and kidney disease development via ferroptosis

by Voight, Benjamin F. , Ma, Ziyuan , Miao, Zhen in 38/1 , 38/88 , 38/91

2021

Genome-wide association studies (GWAS) have identified loci for kidney disease, but the causal variants, genes, and pathways remain unknown. Here we identify two kidney disease genes, Dipeptidase 1 (DPEP1) and Charged Multivesicular Body Protein 1A (CHMP1A), via the triangulation of kidney function GWAS, human kidney expression, and methylation quantitative trait loci. Using single-cell chromatin accessibility and genome editing, we fine map the region that controls the expression of both genes. Mouse genetic models demonstrate the causal roles of both genes in kidney disease. Cellular studies indicate that both Dpep1 and Chmp1a are important regulators of a single pathway, ferroptosis, and lead to kidney disease development via altering cellular iron trafficking. Identifying causal variants and genes is an essential step in interpreting GWAS loci. Here, the authors investigate a kidney disease GWAS locus with functional genomics data, CRISPR editing and mouse experiments to identify DPEP1 and CHMP1A as putative kidney disease genes via ferroptosis.

Journal Article

Testing for an Unusual Distribution of Rare Variants

by Voight, Benjamin F. , Devlin, Bernie , Kathiresan, Sekar in Algorithms , Analysis of Variance , Autism

2011

Technological advances make it possible to use high-throughput sequencing as a primary discovery tool of medical genetics, specifically for assaying rare variation. Still this approach faces the analytic challenge that the influence of very rare variants can only be evaluated effectively as a group. A further complication is that any given rare variant could have no effect, could increase risk, or could be protective. We propose here the C-alpha test statistic as a novel approach for testing for the presence of this mixture of effects across a set of rare variants. Unlike existing burden tests, C-alpha, by testing the variance rather than the mean, maintains consistent power when the target set contains both risk and protective variants. Through simulations and analysis of case/control data, we demonstrate good power relative to existing methods that assess the burden of rare variants in individuals.
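
The contrast drawn in this abstract, testing the variance rather than the mean, can be illustrated with a small sketch. It is a simplified reading of a C-alpha-style statistic and not the authors' implementation: it assumes that, under the null, the number of copies of each variant seen in cases is binomial with success probability `p0` (the fraction of cases in the sample), it omits any special handling of singletons, and the function names are hypothetical.

```python
import math


def binom_pmf(u, n, p):
    """Binomial probability of u successes in n trials."""
    return math.comb(n, u) * p**u * (1 - p) ** (n - u)


def c_alpha_z(case_counts, total_counts, p0=0.5):
    """Z-score for excess variance in per-variant case counts.

    case_counts[i]  : copies of variant i observed in cases
    total_counts[i] : copies of variant i observed in cases + controls
    p0              : expected fraction in cases under the null
    """
    t = 0.0    # sum of (observed squared deviation - binomial variance)
    var = 0.0  # null variance of t, summed over variants
    for y, n in zip(case_counts, total_counts):
        null_var = n * p0 * (1 - p0)
        t += (y - n * p0) ** 2 - null_var
        var += sum(
            ((u - n * p0) ** 2 - null_var) ** 2 * binom_pmf(u, n, p0)
            for u in range(n + 1)
        )
    return t / math.sqrt(var)


def upper_tail_p(z):
    """One-sided p-value from a standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Toy data: two variants enriched in cases, two enriched in controls.
# Cases hold 16 of 32 copies overall, so a mean-based burden comparison
# sees no difference, while the variance-based statistic is large.
z_mixed = c_alpha_z([8, 0, 7, 1], [8, 8, 8, 8])
z_null = c_alpha_z([4, 4, 4, 4], [8, 8, 8, 8])
print(z_mixed, upper_tail_p(z_mixed), z_null)
```

In the toy data the risk and protective variants cancel in the total count, which is the situation the abstract describes as problematic for burden tests.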

Journal Article

Mapping the genetic architecture of human traits to cell types in the kidney identifies mechanisms of disease and potential treatments

by Sheng, Xin , Duffin, Kevin L. , Voight, Benjamin F. in 631/208/176 , 631/208/200 , 692/699/1585/104

2021

The functional interpretation of genome-wide association studies (GWAS) is challenging due to the cell-type-dependent influences of genetic variants. Here, we generated comprehensive maps of expression quantitative trait loci (eQTLs) for 659 microdissected human kidney samples and identified cell-type-eQTLs by mapping interactions between cell type abundances and genotypes. By partitioning heritability using stratified linkage disequilibrium score regression to integrate GWAS with single-cell RNA sequencing and single-nucleus assay for transposase-accessible chromatin with high-throughput sequencing data, we prioritized proximal tubules for kidney function and endothelial cells and distal tubule segments for blood pressure pathogenesis. Bayesian colocalization analysis nominated more than 200 genes for kidney function and hypertension. Our study clarifies the mechanism of commonly used antihypertensive and renal-protective drugs and identifies drug repurposing opportunities for kidney disease. Cell-type-specific eQTL maps in the human kidney generated from the analysis of over 600 microdissected kidney samples, together with single-cell RNA sequencing and single-nucleus ATAC-seq, prioritize cell types influencing kidney function, hypertension and other traits.

Journal Article

Keen on the tenure track job, are you? Know these things, you should

by Voight, Benjamin F. in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2019

Success along the tenure track requires more than hard work and long hours. Here, the experiences of a recently tenured professor are distilled into a collection of tips to assist others along the path.

Journal Article

Loss-of-function mutations in SLC30A8 protect against type 2 diabetes

by Masson, Gisli , Thorleifsson, Gudmar , Mohlke, Karen L in 45/23 , 631/208/205 , 631/208/514

2014

David Altshuler and colleagues report genotyping or sequencing of ∼150,000 individuals from several population-based cohorts, identifying 12 rare protein-truncating variants in SLC30A8, encoding a pancreatic islet zinc transporter. Carriers of these rare protein-truncating variants in SLC30A8 show reduced risk of type 2 diabetes and reduced glucose levels. Loss-of-function mutations protective against human disease provide in vivo validation of therapeutic targets [1, 2, 3], but none have yet been described for type 2 diabetes (T2D). Through sequencing or genotyping of ∼150,000 individuals across 5 ancestry groups, we identified 12 rare protein-truncating variants in SLC30A8, which encodes an islet zinc transporter (ZnT8) [4] and harbors a common variant (p.Trp325Arg) associated with T2D risk and glucose and proinsulin levels [5, 6, 7]. Collectively, carriers of protein-truncating variants had 65% reduced T2D risk (P = 1.7 × 10⁻⁶), and non-diabetic Icelandic carriers of a frameshift variant (p.Lys34Serfs*50) demonstrated reduced glucose levels (−0.17 s.d., P = 4.6 × 10⁻⁴). The two most common protein-truncating variants (p.Arg138* and p.Lys34Serfs*50) individually associate with T2D protection and encode unstable ZnT8 proteins. Previous functional study of SLC30A8 suggested that reduced zinc transport increases T2D risk [8, 9], and phenotypic heterogeneity was observed in mouse Slc30a8 knockouts [10, 11, 12, 13, 14, 15]. In contrast, loss-of-function mutations in humans provide strong evidence that SLC30A8 haploinsufficiency protects against T2D, suggesting ZnT8 inhibition as a therapeutic strategy in T2D prevention.

Journal Article
