Catalogue Search | MBRL

Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

by Dey, Rounak , Fritsche, Lars G. , Wolford, Brooke N. in 45/43 , 631/208/205/2138 , 639/705/531

2018

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for >1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness. SAIGE (Scalable and Accurate Implementation of GEneralized mixed model) is a generalized mixed model association test that can efficiently analyze large data sets while controlling for unbalanced case-control ratios and sample relatedness, as shown by applying SAIGE to the UK Biobank data for >1,400 binary phenotypes.
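The saddlepoint idea mentioned in this abstract can be illustrated with a minimal sketch. It is not SAIGE's implementation: it assumes independent samples, a score statistic S = sum of g_i * (Y_i - mu_i) with Y_i ~ Bernoulli(mu_i), and known null probabilities mu_i, and it ignores the mixed-model and relatedness adjustments. The function names and the bisection bracket are illustrative assumptions.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def saddlepoint_upper_tail(g, mu, s):
    """Approximate P(S > s), S = sum_i g_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i).

    Uses the cumulant generating function K(t) of S and a
    Lugannani-Rice / Barndorff-Nielsen type tail formula.
    """
    mean = sum(gi * m for gi, m in zip(g, mu))

    def K(t):
        return sum(math.log(1.0 - m + m * math.exp(gi * t))
                   for gi, m in zip(g, mu)) - t * mean

    def K1(t):  # first derivative of K
        total = 0.0
        for gi, m in zip(g, mu):
            e = m * math.exp(gi * t)
            total += gi * e / (1.0 - m + e)
        return total - mean

    def K2(t):  # second derivative of K
        total = 0.0
        for gi, m in zip(g, mu):
            e = m * math.exp(gi * t)
            p = e / (1.0 - m + e)
            total += gi * gi * p * (1.0 - p)
        return total

    # Solve K'(t) = s by bisection; K' is increasing in t.
    lo, hi = -20.0, 20.0  # assumed bracket, adequate for small genotype codes
    if not (K1(lo) < s < K1(hi)):
        raise ValueError("s is outside the range handled by this sketch")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if K1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    # Near the mean the formula is numerically unstable; use the normal tail.
    if abs(t) < 1e-10:
        return normal_upper_tail(s / math.sqrt(K2(0.0)))

    w = math.copysign(math.sqrt(2.0 * (t * s - K(t))), t)
    v = t * math.sqrt(K2(t))
    return normal_upper_tail(w + math.log(v / w) / w)


if __name__ == "__main__":
    # Hypothetical unbalanced setting: 1,000 samples, 1% expected case rate.
    g = [1.0] * 1000
    mu = [0.01] * 1000
    print(saddlepoint_upper_tail(g, mu, 15.0))
    print(normal_upper_tail(15.0 / math.sqrt(1000 * 0.01 * 0.99)))
```

In this toy setting the normal approximation gives a much smaller tail probability than the saddlepoint approximation, which is the kind of miscalibration under case-control imbalance that the abstract describes.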

Journal Article

Exploring and visualizing large-scale genetic associations by using PheWeb

by Fritsche, Lars G. , Boehnke, Michael , Taliun, Daniel in 631/208/205/2138 , 692/308/2056 , Agriculture

2020

The ability of investigators to explore their own data by alternating between these two view types is an increasingly common feature of large-scale association analyses. [...] we developed PheWeb, an easy-to-use open-source web-based tool for visualizing, navigating and sharing GWAS and PheWAS results. The regional view (Supplementary Fig. 4) highlights that rs4975616 and several of its proxies are in the GWAS Catalog and are associated with various cancers (for example, lung cancer, pancreatic cancer and basal cell carcinoma), thus suggesting a broad role of the locus in cancer susceptibility. Interestingly, for the other top loci (both on chromosome 8), the regional and PheWAS views (rs2976384 in JRK and PSCA, Fig. 1b,c; rs10094872 near ???, Supplementary Figs. 6 and 7) distinctly convey that these loci are not associated with skin and lung cancers, but are instead associated with gastric and urinary traits such as duodenal ulcer, urinary tract infection and pancreatic cancer. [...] PheWeb is as useful as the data and results behind it, but we expect that these results will be much more useful when they are accessible.

Journal Article

On cross-ancestry cancer polygenic risk scores

by Salvatore, Maxwell , Fritsche, Lars G. , Mukherjee, Bhramar in Biology and Life Sciences , Blood pressure , Blood tests

2021

Polygenic risk scores (PRS) can provide useful information for personalized risk stratification and disease risk assessment, especially when combined with non-genetic risk factors. However, their construction depends on the availability of summary statistics from genome-wide association studies (GWAS) independent from the target sample. For best compatibility, it was reported that GWAS and the target sample should match in terms of ancestries. Yet, GWAS, especially in the field of cancer, often lack diversity and are predominated by European ancestry. This bias is a limiting factor in PRS research. By using electronic health records and genetic data from the UK Biobank, we contrast the utility of breast and prostate cancer PRS derived from external European-ancestry-based GWAS across African, East Asian, European, and South Asian ancestry groups. We highlight differences in the PRS distributions of these groups that are amplified when PRS methods condense hundreds of thousands of variants into a single score. While European-GWAS-derived PRS were not directly transferrable across ancestries on an absolute scale, we establish their predictive potential when considering them separately within each group. For example, the top 10% of the breast cancer PRS distributions within each ancestry group each revealed significant enrichments of breast cancer cases compared to the bottom 90% (odds ratio of 2.81 [95% CI: 2.69, 2.93] in European, 2.88 [1.85, 4.48] in African, 2.60 [1.25, 5.40] in East Asian, and 2.33 [1.55, 3.51] in South Asian individuals). Our findings highlight a compromise solution for PRS research to compensate for the lack of diversity in well-powered European GWAS efforts while recruitment of diverse participants in the field catches up.
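A top-10% versus bottom-90% comparison of this kind can be sketched as an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. This is only an illustration: the abstract does not state how the reported odds ratios were estimated (they may come from covariate-adjusted models), and the counts below are hypothetical, not taken from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a: cases in the top PRS group      b: controls in the top PRS group
    c: cases in the remaining group    d: controls in the remaining group
    z: normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for one ancestry group.
    print(odds_ratio_wald_ci(a=30, b=70, c=100, d=800))
```

The interval widens as any cell count shrinks, which is consistent with the wider intervals the abstract reports for the smaller ancestry groups.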

Journal Article

Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis

by Salvatore, Maxwell , Clark-Boucher, Dylan , Smith, Jennifer A. in Analysis , Biology and Life Sciences , Computer and Information Sciences

2022

Since the beginning of the Coronavirus Disease 2019 (COVID-19) pandemic, a focus of research has been to identify risk factors associated with COVID-19-related outcomes, such as testing and diagnosis, and use them to build prediction models. Existing studies have used data from digital surveys or electronic health records (EHRs), but very few have linked the two sources to build joint predictive models. In this study, we used survey data on 7,054 patients from the Michigan Genomics Initiative biorepository to evaluate how well self-reported data could be integrated with electronic records for the purpose of modeling COVID-19-related outcomes. We observed that among survey respondents, self-reported COVID-19 diagnosis captured a larger number of cases than the corresponding EHRs, suggesting that self-reported outcomes may be better than EHRs for distinguishing COVID-19 cases from controls. In the modeling context, we compared the utility of survey- and EHR-derived predictor variables in models of survey-reported COVID-19 testing and diagnosis. We found that survey-derived predictors produced uniformly stronger models than EHR-derived predictors (likely due to their specificity, temporal proximity, and breadth) and that combining predictors from both sources offered no consistent improvement compared to using survey-based predictors alone. Our results suggest that, even though general EHRs are useful in predictive models of COVID-19 outcomes, they may not be essential in those models when rich survey data are already available. The two data sources together may offer better prediction for COVID severity, but we did not have enough severe cases in the survey respondents to assess that hypothesis in our study.

Journal Article

Incorporating functional annotation with bilevel continuous shrinkage for polygenic risk prediction

by Kim, Na Yeon , Fritsche, Lars G. , Zhuang, Yongwen in Algorithms , Annotations , Bayes Theorem

2024

Background: Genetic variants can contribute differently to trait heritability by their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal variants, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. It is possible that the annotation group level effect is also sparse. However, the number of PRS methods that incorporate both annotation information and shrinkage on effect sizes is limited. We propose a PRS method, PRSbils, which utilizes the functional annotation information with a bilevel continuous shrinkage prior to accommodate the varying genetic architectures both on the variant-specific level and on the functional annotation level. Results: We conducted simulation studies and investigated the predictive performance in settings with different genetic architectures. Results indicated that when there was a relatively large variability of group-wise heritability contribution, the gain in prediction performance from the proposed method was on average 8.0% higher AUC compared to the benchmark method PRS-CS. The proposed method also yielded higher predictive performance compared to PRS-CS in settings with different overlapping patterns of annotation groups and obtained on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real-world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI), and the Korean Genome and Epidemiology Study (KoGES)), and two sources of annotations: ANNOVAR, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG), and demonstrated that the proposed method holds the potential for improving predictive performance by incorporating functional annotations.
Conclusions: By utilizing a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve the performance of genetic risk prediction. The software is available at https://github.com/styvon/PRSbils.

Journal Article

Systemic Complement Activation in Age-Related Macular Degeneration

by Weber, Bernhard H. F. , Fritsche, Lars G. , Fimmers, Rolf in Age related diseases , Aged , Aged, 80 and over

2008

Dysregulation of the alternative pathway (AP) of complement cascade has been implicated in the pathogenesis of age-related macular degeneration (AMD), the leading cause of blindness in the elderly. To further test the hypothesis that defective control of complement activation underlies AMD, parameters of complement activation in blood plasma were determined together with disease-associated genetic markers in AMD patients. Plasma concentrations of activation products C3d, Ba, C3a, C5a, SC5b-9, substrate proteins C3, C4, factor B and regulators factor H and factor D were quantified in patients (n = 112) and controls (n = 67). Subjects were analyzed for single nucleotide polymorphisms in factor H (CFH), factor B-C2 (BF-C2) and complement C3 (C3) genes which were previously found to be associated with AMD. All activation products, especially markers of chronic complement activation Ba and C3d (p < 0.001), were significantly elevated in AMD patients compared to controls. Similar alterations were observed in factor D, but not in C3, C4 or factor H. Logistic regression analysis revealed better discriminative accuracy of a model that is based only on complement activation markers Ba, C3d and factor D compared to a model based on genetic markers of the complement system within our study population. In both the controls' and AMD patients' group, the protein markers of complement activation were correlated with CFH haplotypes. This study is the first to show systemic complement activation in AMD patients. This suggests that AMD is a systemic disease with local disease manifestation at the ageing macula. Furthermore, the data provide evidence for an association of systemic activation of the alternative complement pathway with genetic variants of CFH that were previously linked to AMD susceptibility.

Journal Article

Retinal transcriptome and eQTL analyses identify genes associated with age-related macular degeneration

by Ratnapriya, Rinki , Battle, Alexis , Fritsche, Lars G. in 631/208/200 , 692/308/2056 , 692/699/375/365

2019

Genome-wide association studies (GWAS) have identified genetic variants at 34 loci contributing to age-related macular degeneration (AMD) [1–3]. We generated transcriptional profiles of postmortem retinas from 453 controls and cases at distinct stages of AMD and integrated retinal transcriptomes, covering 13,662 protein-coding and 1,462 noncoding genes, with genotypes at more than 9 million common SNPs for expression quantitative trait loci (eQTL) analysis of a tissue not included in Genotype-Tissue Expression (GTEx) and other large datasets [4,5]. Cis-eQTL analysis identified 10,474 genes under genetic regulation, including 4,541 eQTLs detected only in the retina. Integrated analysis of AMD-GWAS with eQTLs ascertained likely target genes at six reported loci. Using transcriptome-wide association analysis (TWAS), we identified three additional genes, RLBP1, HIC1 and PARP12, after Bonferroni correction. Our studies expand the genetic landscape of AMD and establish the Eye Genotype Expression (EyeGEx) database as a resource for post-GWAS interpretation of multifactorial ocular traits. The authors transcriptionally profiled postmortem retinas from 453 age-related macular degeneration (AMD) cases and controls. Integration of AMD GWAS with eQTL analysis and TWAS identified several AMD-associated genes.

Journal Article

Uncovering associations between pre-existing conditions and COVID-19 Severity: A polygenic risk score approach across three large biobanks

by Salvatore, Maxwell , Fritsche, Lars G. , Kundu, Ritoban in Bias , Biobanks , Biological Specimen Banks

2023

To overcome the limitations associated with the collection and curation of COVID-19 outcome data in biobanks, this study proposes the use of polygenic risk scores (PRS) as reliable proxies of COVID-19 severity across three large biobanks: the Michigan Genomics Initiative (MGI), UK Biobank (UKB), and NIH All of Us. The goal is to identify associations between pre-existing conditions and COVID-19 severity. Drawing on a sample of more than 500,000 individuals from the three biobanks, we conducted a phenome-wide association study (PheWAS) to identify associations between a PRS for COVID-19 severity, derived from a genome-wide association study on COVID-19 hospitalization, and clinical pre-existing, pre-pandemic phenotypes. We performed cohort-specific PRS PheWAS and a subsequent fixed-effects meta-analysis. The current study uncovered 23 pre-existing conditions significantly associated with the COVID-19 severity PRS in cohort-specific analyses, of which 21 were observed in the UKB cohort and two in the MGI cohort. The meta-analysis yielded 27 significant phenotypes predominantly related to obesity, metabolic disorders, and cardiovascular conditions. After adjusting for body mass index, several clinical phenotypes, such as hypercholesterolemia and gastrointestinal disorders, remained associated with an increased risk of hospitalization following COVID-19 infection. By employing PRS as a proxy for COVID-19 severity, we corroborated known risk factors and identified novel associations between pre-existing clinical phenotypes and COVID-19 severity. Our study highlights the potential value of using PRS when actual outcome data may be limited or inadequate for robust analyses.
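The fixed-effects meta-analysis step named in this abstract is commonly done by inverse-variance weighting of cohort-specific estimates. The sketch below assumes that approach (the abstract does not specify the exact estimator or software), and the effect sizes and standard errors are hypothetical placeholders rather than study results.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-cohort effect estimates (e.g. log odds ratios per PRS unit)
    ses:   matching standard errors
    Returns the pooled estimate, its standard error, and a two-sided
    p-value from the normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal length")
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Hypothetical estimates for one phenotype in three cohorts.
    print(fixed_effects_meta([0.10, 0.14, 0.08], [0.03, 0.05, 0.06]))
```

Because the pooled standard error is smaller than any single cohort's, a phenotype can reach significance in the meta-analysis without doing so in any one cohort, which is consistent with the meta-analysis reporting more significant phenotypes than the cohort-specific analyses.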

Journal Article

Biobank-driven genomic discovery yields new insight into atrial fibrillation biology

by Dey, Rounak , Wolford, Brooke N. , Abecasis, Gonçalo R. in 45/22 , 45/23 , 45/43

2018

To identify genetic variation underlying atrial fibrillation, the most common cardiac arrhythmia, we performed a genome-wide association study of >1,000,000 people, including 60,620 atrial fibrillation cases and 970,216 controls. We identified 142 independent risk variants at 111 loci and prioritized 151 functional candidate genes likely to be involved in atrial fibrillation. Many of the identified risk variants fall near genes where more deleterious mutations have been reported to cause serious heart defects in humans (GATA4, MYH6, NKX2-5, PITX2, TBX5) [1], or near genes important for striated muscle function and integrity (for example, CFL2, MYH7, PKP2, RBM20, SGCG, SSPN). Pathway and functional enrichment analyses also suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an ‘atrial cardiomyopathy’ [2], either during fetal heart development or as a response to stress in the adult heart. Large-scale association analyses identify 142 independent risk variants for atrial fibrillation. Pathway and functional enrichment analyses suggest that many of the putative risk genes act via cardiac structural remodeling.

Journal Article

Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts

by Fritsche, Lars G. , Bi, Wenjian , Neale, Benjamin M. in 631/208 , 631/208/205 , Agriculture

2020

With very large sample sizes, biobanks provide an exciting opportunity to identify genetic components of complex traits. To analyze rare variants, region-based multiple-variant aggregate tests are commonly used to increase power for association tests. However, because of the substantial computational cost, existing region-based tests cannot analyze hundreds of thousands of samples while accounting for confounders such as population stratification and sample relatedness. Here we propose a scalable generalized mixed-model region-based association test, SAIGE-GENE, that is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples and can account for unbalanced case–control ratios for binary traits. Through extensive simulation studies and analysis of the HUNT study with 69,716 Norwegian samples and the UK Biobank data with 408,910 White British samples, we show that SAIGE-GENE can efficiently analyze large-sample data (N > 400,000) with type I error rates well controlled. SAIGE-GENE is a scalable generalized mixed-model region-based association test that can analyze large datasets while accounting for sample relatedness and unbalanced case–control ratios for binary traits.

Journal Article
