Catalogue Search | MBRL

Computationally efficient whole-genome regression for quantitative and binary traits

by Reid, Jeffrey; Mbatchou, Joelle; Benner, Christian in 45/43, 631/114/794, 631/208/205

2021

Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case–control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals. REGENIE is a whole-genome regression method based on ridge regression that enables highly parallelized analysis of quantitative and binary traits in biobank-scale data with reduced computational requirements.

Journal Article

Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank

by Hofmeister, Robin J.; Salerno, William J.; Thornton, Timothy A. in 45/43, 631/208/514/1948, 692/308/2056

2024

Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies. We find that WGS and WES combined with arrays and imputation (WES + IMP) have the largest association yield. Although WGS results in an approximately fivefold increase in the total number of assayed variants over WES + IMP, the number of detected signals differed by only 1% for both single-variant and gene-based association analyses. Given that WES + IMP typically results in savings of lab and computational time and resources expended per sample, we evaluate the potential benefits of applying WES + IMP to larger samples. When we extend our WES + IMP analyses to 468,169 UK Biobank individuals, we observe an approximately fourfold increase in association signals with the threefold increase in sample size. We conclude that prioritizing WES + IMP and large sample sizes rather than contemporary short-read WGS alternatives will maximize the number of discoveries in genetic association studies. Comparison of association signals in UK Biobank using different strategies for assessing genetic variation shows that whole-exome sequencing combined with array genotyping and imputation offers similar performance to whole-genome sequencing at a reduced cost.

Journal Article

GlycReSoft: A Software Package for Automated Recognition of Glycans from LC/MS Data

by Tan, Yan; Benson, Gary; Hu, Han in Adducts, Adhesive strength, Animals

2012

Glycosylation modifies the physicochemical properties and protein binding functions of glycoconjugates. These modifications are biosynthesized in the endoplasmic reticulum and Golgi apparatus by a series of enzymatic transformations that are under complex control. As a result, mature glycans on a given site are heterogeneous mixtures of glycoforms. This gives rise to a spectrum of adhesive properties that strongly influences interactions with binding partners and resultant biological effects. In order to understand the roles glycosylation plays in normal and disease processes, efficient structural analysis tools are necessary. In the field of glycomics, liquid chromatography/mass spectrometry (LC/MS) is used to profile the glycans present in a given sample. This technology enables comparison of glycan compositions and abundances among different biological samples, e.g., normal versus disease or normal versus mutant. Manual analysis of the glycan profiling LC/MS data is extremely time-consuming, and efficient software tools are needed to eliminate this bottleneck. In this work, we have developed a tool to computationally model LC/MS data to enable efficient profiling of glycans. Using LC/MS data deconvoluted by Decon2LS/DeconTools, we built a list of unique neutral masses corresponding to candidate glycan compositions summarized over their various charge states, adducts and range of elution times. Our work aims to provide confident identification of true compounds in complex data sets that are not amenable to manual interpretation. This capability is an essential part of glycomics workflows. We demonstrate this tool, GlycReSoft, using an LC/MS dataset on tissue-derived heparan sulfate oligosaccharides. The software, code and a test data set are publicly archived under an open source license.

Journal Article

Global Gene Expression Analysis of Murine Limb Development

by Taher, Leila; Loots, Gabriela G.; Ovcharenko, Ivan in Analysis, Animals, BASIC BIOLOGICAL SCIENCES

2011

Detailed information about stage-specific changes in gene expression is crucial for understanding the gene regulatory networks underlying development and the various signal transduction pathways contributing to morphogenesis. Here we describe the global gene expression dynamics during early murine limb development, when cartilage, tendons, muscle, joints, vasculature and nerves are specified and the musculoskeletal system of limbs is established. We used whole-genome microarrays to identify genes with differential expression at 5 stages of limb development (E9.5 to 13.5), during fore- and hind-limb patterning. We found that the onset of limb formation is characterized by an up-regulation of transcription factors, which is followed by a massive activation of genes during E10.5 and E11.5 which levels off at later time points. Among the 3520 genes identified as significantly up-regulated in the limb, we find ~30% to be novel, dramatically expanding the repertoire of candidate genes likely to function in the limb. Hierarchical and stage-specific clustering identified expression profiles that are likely to correlate with functional programs during limb development and further characterization of these transcripts will provide new insights into specific tissue patterning processes. Here, we provide for the first time a comprehensive analysis of developmentally regulated genes during murine limb development, and provide some novel insights into the expression dynamics governing limb morphogenesis.

Journal Article

MicroRNAs and essential components of the microRNA processing machinery are not encoded in the genome of the ctenophore Mnemiopsis leidyi

by Baxevanis, Andreas D; Ryan, Joseph F; Schnitzler, Christine E in Algae, Animal Genetics and Genomics, Animals

2012

Background: MicroRNAs play a vital role in the regulation of gene expression and have been identified in every animal with a sequenced genome examined thus far, except for the placozoan Trichoplax. The genomic repertoires of metazoan microRNAs have become increasingly endorsed as phylogenetic characters and drivers of biological complexity. Results: In this study, we report the first investigation of microRNAs in a species from the phylum Ctenophora. We use short RNA sequencing and the assembled genome of the lobate ctenophore Mnemiopsis leidyi to show that this species appears to lack any recognizable microRNAs, as well as the nuclear proteins Drosha and Pasha, which are critical to canonical microRNA biogenesis. This finding represents the first reported case of a metazoan lacking a Drosha protein. Conclusions: Recent phylogenomic analyses suggest that Mnemiopsis may be the earliest branching metazoan lineage. If this is true, then the origins of canonical microRNA biogenesis and microRNA-mediated gene regulation may postdate the last common metazoan ancestor. Alternatively, canonical microRNA functionality may have been lost independently in the lineages leading to both Mnemiopsis and the placozoan Trichoplax, suggesting that microRNA functionality was not critical until much later in metazoan evolution.

Journal Article

Evolutionary profiling reveals the heterogeneous origins of classes of human disease genes: implications for modeling disease genetics in animals

by Moreland, R Travis; Baxevanis, Andreas D; Havlak, Paul in Animal diseases, Animal models, Animal species

2014

Background: The recent expansion of whole-genome sequence data available from diverse animal lineages provides an opportunity to investigate the evolutionary origins of specific classes of human disease genes. Previous studies have observed that human disease genes are of particularly ancient origin. While this suggests that many animal species have the potential to serve as feasible models for research on genes responsible for human disease, it is unclear whether this pattern has meaningful implications and whether it prevails for every class of human disease. Results: We used a comparative genomics approach encompassing a broad phylogenetic range of animals with sequenced genomes to determine the evolutionary patterns exhibited by human genes associated with different classes of disease. Our results support previous claims that most human disease genes are of ancient origin but, more importantly, we also demonstrate that several specific disease classes have a significantly large proportion of genes that emerged relatively recently within the metazoans and/or vertebrates. An independent assessment of the synonymous to non-synonymous substitution rates of human disease genes found in mammals reveals that disease classes that arose more recently also display unexpected rates of purifying selection between their mammalian and human counterparts. Conclusions: Our results reveal the heterogeneity underlying the evolutionary origins of (and selective pressures on) different classes of human disease genes. For example, some disease gene classes appear to be of uncommonly recent (i.e., vertebrate-specific) origin and, as a whole, have been evolving at a faster rate within mammals than the majority of disease classes having more ancient origins. The novel patterns that we have identified may provide new insight into cases where studies using traditional animal models were unable to produce results that translated to humans. Conversely, we note that the larger set of disease classes do have ancient origins, suggesting that many non-traditional animal models have the potential to be useful for studying many human disease genes. Taken together, these findings emphasize why model organism selection should be done on a disease-by-disease basis, with evolutionary profiles in mind.

Journal Article

Analysis of rare genetic variation underlying cardiometabolic diseases and traits among 200,000 individuals in the UK Biobank

by Pirruccello, James P.; Weng, Lu-Chen; Halford, Jennifer L. in 631/208/457, 631/208/514/1948, 692/308/2056

2022

Cardiometabolic diseases are the leading cause of death worldwide. Despite a known genetic component, our understanding of these diseases remains incomplete. Here, we analyzed the contribution of rare variants to 57 diseases and 26 cardiometabolic traits, using data from 200,337 UK Biobank participants with whole-exome sequencing. We identified 57 gene-based associations, with broad replication of novel signals in Geisinger MyCode. There was a striking risk associated with mutations in known Mendelian disease genes, including MYBPC3, LDLR, GCK, PKD1 and TTN. Many genes showed independent convergence of rare and common variant evidence, including an association between GIGYF1 and type 2 diabetes. We identified several large effect associations for height and 18 unique genes associated with blood lipid or glucose levels. Finally, we found that between 1.0% and 2.4% of participants carried rare potentially pathogenic variants for cardiometabolic disorders. These findings may facilitate studies aimed at therapeutics and screening of these common disorders. Analysis of whole-exome sequencing data from over 200,000 individuals in the UK Biobank provides new insights into the contribution of rare variants to cardiometabolic diseases and traits.

Journal Article

Genome-wide association analyses highlight etiological differences underlying newly defined subtypes of diabetes

by Brosnan, Julia; Dwivedi, Om Prakash; Prasad, Rashmi B. in 45/43, 692/699, 692/699/2743/137

2021

Type 2 diabetes has been reproducibly clustered into five subtypes with different disease progression and risk of complications; however, etiological differences are unknown. We used genome-wide association and genetic risk score (GRS) analysis to compare the underlying genetic drivers. Individuals from the Swedish ANDIS (All New Diabetics In Scania) study were compared to individuals without diabetes; the Finnish DIREVA (Diabetes register in Vasa) and Botnia studies were used for replication. We show that subtypes differ with regard to family history of diabetes and association with GRS for diabetes-related traits. The severe insulin-resistant subtype was uniquely associated with GRS for fasting insulin but not with variants in the TCF7L2 locus or GRS reflecting insulin secretion. Further, an SNP (rs10824307) near LRMDA was uniquely associated with mild obesity-related diabetes. Therefore, we conclude that the subtypes have partially distinct genetic backgrounds indicating etiological differences. Genome-wide association and genetic risk score analyses highlight differences in genetic architecture across five subtypes of diabetes.

Journal Article

GWAS of serum ALT and AST reveals an association of SLC30A10 Thr95Ile with hypermanganesemia symptoms

by Haslett, Patrick A. J.; Ferreira, Manuel A. R.; Parker, Margaret M. in 45/43, 631/208/205/2138, 692/698/2741/288/2032

2021

Understanding mechanisms of hepatocellular damage may lead to new treatments for liver disease, and genome-wide association studies (GWAS) of alanine aminotransferase (ALT) and aspartate aminotransferase (AST) serum activities have proven useful for investigating liver biology. Here we report 100 loci associating with both enzymes, using GWAS across 411,048 subjects in the UK Biobank. The rare missense variant SLC30A10 Thr95Ile (rs188273166) associates with the largest elevation of both enzymes, and this association replicates in the DiscovEHR study. SLC30A10 excretes manganese from the liver to the bile duct, and rare homozygous loss of function causes the syndrome hypermanganesemia with dystonia-1 (HMNDYT1) which involves cirrhosis. Consistent with hematological symptoms of hypermanganesemia, SLC30A10 Thr95Ile carriers have increased hematocrit and risk of iron deficiency anemia. Carriers also have increased risk of extrahepatic bile duct cancer. These results suggest that genetic variation in SLC30A10 adversely affects more individuals than patients with diagnosed HMNDYT1. Circulating liver enzymes, like alanine aminotransferase (ALT) and aspartate aminotransferase (AST), are highly heritable and predictive of disease. Here, the authors perform a genome-wide association study on ALT and AST, revealing a rare variant in SLC30A10 associated with elevated ALT and AST.

Journal Article

The impact of common and rare genetic variants on bradyarrhythmia development

by Shah, Svati H.; Soliman, Elsayed Z.; Wang, Xin in 631/208/205/2138, 692/699/75/29, Agriculture

2025

To broaden our understanding of bradyarrhythmias and conduction disease, we performed common variant genome-wide association analyses in up to 1.3 million individuals and rare variant burden testing in 460,000 individuals for sinus node dysfunction (SND), distal conduction disease (DCD) and pacemaker (PM) implantation. We identified 13, 31 and 21 common variant loci for SND, DCD and PM, respectively. Four well-known loci (SCN5A/SCN10A, CCDC141, TBX20 and CAMK2D) were shared for SND and DCD, while others were more specific for SND or DCD. SND and DCD showed a moderate genetic correlation (r_g = 0.63). Cardiomyocyte-expressed genes were enriched for contributions to DCD heritability. Rare-variant analyses implicated LMNA for all bradyarrhythmia phenotypes, SMAD6 and SCN5A for DCD and TTN, MYBPC3 and SCN5A for PM. These results show that variation in multiple genetic pathways (for example, ion channel function, cardiac developmental programs, sarcomeric structure and cellular homeostasis) appears critical to the development of bradyarrhythmias. Genome-wide analyses identify variants associated with sinus node dysfunction, distal conduction disease and pacemaker implantation, implicating ion channel function, cardiac developmental programs and sarcomeric structure in bradyarrhythmia susceptibility.

Journal Article
