19 result(s) for "Mbatchou, Joelle"
Computationally efficient whole-genome regression for quantitative and binary traits
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case–control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals. REGENIE is a whole-genome regression method based on ridge regression that enables highly parallelized analysis of quantitative and binary traits in biobank-scale data with reduced computational requirements.
Exome sequencing and analysis of 454,787 UK Biobank participants
A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10^−11. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene–trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale. Whole-exome sequencing analysis of 454,787 individuals in the UK Biobank is used to examine the association of protein-coding variants with nearly 4,000 health-related traits, identifying 564 distinct genes with significant trait associations.
BRASS: Permutation methods for binary traits in genetic association studies with structured samples
In genetic association analysis of complex traits, permutation testing can be a valuable tool for assessing significance when the distribution of the test statistic is unknown or not well-approximated. This commonly arises, e.g., in tests of gene-set, pathway or genome-wide significance, or when the statistic is formed by machine learning or data adaptive methods. Existing applications include eQTL mapping, association testing with rare variants, inclusion of admixed individuals in genetic association analysis, and epistasis detection among many others. For genetic association testing in samples with population structure and/or relatedness, use of naive permutation can lead to inflated type 1 error. To address this in quantitative traits, the MVNpermute method was developed. However, for association mapping of a binary trait, the relationship between the mean and variance makes both naive permutation and the MVNpermute method invalid. We propose BRASS, a permutation method for binary traits, for use in association mapping in structured samples. In addition to modeling structure in the sample, BRASS allows for covariates, ascertainment and simultaneous testing of multiple markers, and it accommodates a wide range of test statistics. In simulation studies, we compare BRASS to other permutation and resampling-based methods in a range of scenarios that include population structure, familial relatedness, ascertainment and phenotype model misspecification. In these settings, we demonstrate the superior control of type 1 error by BRASS compared to the other 6 methods considered. We apply BRASS to assess genome-wide significance for association analyses in domestic dog for elbow dysplasia (ED) and idiopathic epilepsy (IE). For both traits we detect previously identified associations, and in addition, for ED, we detect significant association with a SNP on chromosome 35 that was not detected by previous analyses, demonstrating the potential of the method.
Heiomasia, a new genus in the lichen-forming family Graphidaceae (Ascomycota: Lecanoromycetes: Ostropales) with disjunct distribution in Southeastern North America and Southeast Asia
Heiomasia is a new genus that includes two sterile species producing unique isidia-like structures for vegetative dispersal. Their systematic position was clarified using molecular analysis of the small subunit of the mitochondrial ribosomal DNA (mtSSU). The two species form a strongly supported clade occupying a somewhat isolated position within Graphidaceae s.lat. The new genus is based on H. sipmanii (Aptroot, Lücking & Rivas Plata) Nelsen, Lücking & Rivas Plata comb. nov., a species from Southeast Asia with disc-shaped isidia-like structures, originally described in the genus Herpothallon. The second species, H. seaveyorum Nelsen & Lücking spec. nov., is known only from Florida; it produces robust, sausage-shaped isidia-like structures.
Common and rare variant associations with clonal haematopoiesis phenotypes
Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes [1–5]. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes [6], our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications. Exome sequence data from 628,388 individuals was used to identify 24 risk loci in 40,208 carriers of clonal haematopoiesis of indeterminate potential and link them to other conditions including COVID-19, cardiovascular disease and cancer.
Genotyping, sequencing and analysis of 140,000 adults from Mexico City
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent. Genotype and exome sequencing of 150,000 participants and whole-genome sequencing of 9,950 selected individuals recruited into the Mexico City Prospective Study constitute a valuable, publicly available resource of non-European sequencing data.
Germline Mutations in CIDEB and Protection against Liver Disease
Exome sequencing in hundreds of thousands of persons may enable the identification of rare protein-coding genetic variants associated with protection from human diseases like liver cirrhosis, providing a strategy for the discovery of new therapeutic targets. We performed a multistage exome sequencing and genetic association analysis to identify genes in which rare protein-coding variants were associated with liver phenotypes. We conducted in vitro experiments to further characterize associations. The multistage analysis involved 542,904 persons with available data on liver aminotransferase levels, 24,944 patients with various types of liver disease, and 490,636 controls without liver disease. We found that rare coding variants in , , , and were associated with increased aminotransferase levels and an increased risk of liver disease. We also found that variants in , which encodes a structural protein found in hepatic lipid droplets, had a protective effect. The burden of rare predicted loss-of-function variants plus missense variants in (combined carrier frequency, 0.7%) was associated with decreased alanine aminotransferase levels (beta per allele, -1.24 U per liter; 95% confidence interval [CI], -1.66 to -0.83; P = 4.8×10 ) and with 33% lower odds of liver disease of any cause (odds ratio per allele, 0.67; 95% CI, 0.57 to 0.79; P = 9.9×10 ). Rare coding variants in were associated with a decreased risk of liver disease across different underlying causes and different degrees of severity, including cirrhosis of any cause (odds ratio per allele, 0.50; 95% CI, 0.36 to 0.70). Among 3599 patients who had undergone bariatric surgery, rare coding variants in were associated with a decreased nonalcoholic fatty liver disease activity score (beta per allele in score units, -0.98; 95% CI, -1.54 to -0.41 [scores range from 0 to 8, with higher scores indicating more severe disease]). 
In human hepatoma cell lines challenged with oleate, small interfering RNA knockdown prevented the buildup of large lipid droplets. Rare germline mutations in conferred substantial protection from liver disease. (Funded by Regeneron Pharmaceuticals.)
High heritability of ascending aortic diameter and trans-ancestry prediction of thoracic aortic disease
Enlargement of the aorta is an important risk factor for aortic aneurysm and dissection, a leading cause of morbidity in the developed world. Here we performed automated extraction of ascending aortic diameter from cardiac magnetic resonance images of 36,021 individuals from the UK Biobank, followed by genome-wide association. We identified lead variants across 41 loci, including genes related to cardiovascular development (HAND2, TBX20) and Mendelian forms of thoracic aortic disease (ELN, FBN1). A polygenic score significantly predicted prevalent risk of thoracic aortic aneurysm and the need for surgical intervention for patients with thoracic aneurysm across multiple ancestries within the UK Biobank, FinnGen, the Penn Medicine Biobank and the Million Veterans Program (MVP). Additionally, we highlight the primary causal role of blood pressure in reducing aortic dilation using Mendelian randomization. Overall, our findings provide a roadmap for using genetic determinants of human anatomy to understand cardiovascular development while improving prediction of diseases of the thoracic aorta. Trans-ancestry genome-wide analyses identify multiple loci associated with ascending aortic diameter. A polygenic score constructed from these loci predicted prevalent risk of thoracic aortic aneurysm in independent populations.
Assessments of Significance for Genetic Association Analysis in Structured Samples
In this dissertation, we develop methods to address several problems that arise in the assessment of significance for genetic association analysis of complex traits in structured samples. In Chapter 2, we focus on phenotype resampling methods for binary trait analysis. We develop BRASS, a permutation-based approach to testing association between a binary trait and an arbitrary predictor in samples with population structure and/or related individuals. BRASS is applicable in various contexts, including (1) correction for multiple comparisons when testing for region-wide or genome-wide significance, and (2) assessment of significance for tests that combine test statistics that perform well in different scenarios. Previous methods are applicable only to analysis of a quantitative trait and do not perform well for a binary trait. BRASS allows for covariates, ascertainment and simultaneous testing of multiple markers, and it does not place strong restrictions on the test statistic used. We use an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model methods, and we use a combination of principal components and a genetic relatedness matrix to account for sample structure. In simulation studies, we demonstrate that BRASS maintains correct control of type 1 error. We illustrate the proposed approach in two genome-wide analyses of binary traits in domestic dog. In Chapter 3, we focus on assessment of significance in genetic association analysis of single or multi-dimensional phenotypes where we consider test statistics of a certain form, allow association to be tested with single or multiple genetic markers simultaneously, and where there is population structure and/or relatedness.
Existing approaches that can be used in this context are either computationally burdensome (permutation-based approaches), or do not perform well in settings such as small samples, high-dimensional traits, or misspecified phenotype model (asymptotic approximations based on prospective models), or require an assumption of second-order exchangeability of individuals’ genotypes, possibly after correction for ancestry-informative covariates (existing moment-matching methods for detecting association of two matrices). We develop JASPER, which can be viewed as an extension of existing moment-matching methods for detecting association of two matrices, to allow very general population structure and relatedness in the sample. JASPER can be used for a reasonably broad class of test statistics currently used in genetic association analysis, including most linear mixed model-based score tests and kernel-based test statistics. Notable features of JASPER are that it (1) is insensitive to misspecification of the phenotype model, (2) does not require knowledge of the distribution of the test statistic under the null hypothesis, (3) allows population structure, related individuals, covariates, ascertainment, rare variants, and multiple traits, and (4) with rare variant mapping, it does not require knowledge of the correlation structure among the rare variants. Through simulation studies, we demonstrate that JASPER properly controls type 1 error in the presence of sample structure and can provide substantial power gains compared to large-sample-based assessments of significance. JASPER is applied in a study of the genetic regulation of gene expression levels within biological pathways in data from the Framingham Heart Study.