Catalogue Search | MBRL

Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize

by Buckler, Edward S. , Ramstein, Guillaume P. in Accuracy , Angiosperms , Animal Genetics and Genomics

2022

Background Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations. Results Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants. Conclusions Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach—Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC)—could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse ( https://doi.org/10.25739/hybz-2957 ).

Journal Article

Share this book

Add to My Shelf

Breaking the curse of dimensionality to identify causal variants in Breeding 4

by Ramstein, Guillaume P , Buckler, Edward S , Jensen, Sarah E in Data processing , Genetic transformation , Genome editing

2019

In the past, plant breeding has undergone three major transformations and is currently transitioning to a new technological phase, Breeding 4. This phase is characterized by the development of methods for biological design of plant varieties, including transformation and gene editing techniques directed toward causal loci. The application of such technologies will require to reliably estimate the effect of loci in plant genomes by avoiding the situation where the number of loci assayed (p) surpasses the number of plant genotypes (n). Here, we discuss approaches to avoid this curse of dimensionality (n ≪ p), which will involve analyzing intermediate phenotypes such as molecular traits and component traits related to plant morphology or physiology. Because these approaches will rely on novel data types such as DNA sequences and high-throughput phenotyping images, Breeding 4 will call for analyses that are complementary to traditional quantitative genetic studies, being based on machine learning techniques which make efficient use of sequence and image data. In this article, we will present some of these techniques and their application for prioritizing causal loci and developing improved varieties in Breeding 4.

Journal Article

Share this book

Add to My Shelf

Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize

by Khaipho-Burch, Merritt , Buckler, Edward S. , Ramstein, Guillaume P. in Biology and Life Sciences , Corn , Engineering and Technology

2021

Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), studied their transferability across tissues, and evaluated genomic prediction models within and across populations. HARE focuses on tightly linked cis acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferrable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels–a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel)–for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy as compared to measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.

Journal Article

Share this book

Add to My Shelf

Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence

by Buckler, Edward S. , Washburn, Jacob D. , Kremling, Karl A. in 3' Untranslated regions , Abundance , Base Sequence - genetics

2019

Deep learning methodologies have revolutionized prediction in many fields and show potential to do the same in molecular biology and genetics. However, applying these methods in their current forms ignores evolutionary dependencies within biological systems and can result in false positives and spurious conclusions. We developed two approaches that account for evolutionary relatedness in machine learning models: (i) gene-family–guided splitting and (ii) ortholog contrasts. The first approach accounts for evolution by constraining model training and testing sets to include different gene families. The second approach uses evolutionarily informed comparisons between orthologous genes to both control for and leverage evolutionary divergence during the training process. The two approaches were explored and validated within the context of mRNA expression level prediction and have the area under the ROC curve (auROC) values ranging from 0.75 to 0.94. Model weight inspections showed biologically interpretable patterns, resulting in the hypothesis that the 3′ UTR is more important for fine-tuning mRNA abundance levels while the 5′ UTR is more important for large-scale changes.

Journal Article

Share this book

Add to My Shelf

Predicting phenotypes from genetic, environment, management, and historical data using CNNs

by Ramstein Guillaume , Reeves, Timothy , O’Briant Patrick in Neural networks , Phenotypes

2021

Key MessageConvolutional Neural Networks (CNNs) can perform similarly or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided.Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications to agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MPL) and Convolutional Neural Networks (CNN) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and performed slightly worse than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared to trial data alone. Saliency map analysis indicated the CNN has “learned” to prioritize many factors of known agricultural importance.

Journal Article

Share this book

Add to My Shelf

Haplotype associated RNA expression

by Ramstein, Guillaume P , Khaipho-Burch, Merritt , Buckler, Edward S in Corn , Gene expression , Genetic aspects

2021

Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), studied their transferability across tissues, and evaluated genomic prediction models within and across populations. HARE focuses on tightly linked cis acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferrable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels-a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel)-for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy as compared to measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.

Journal Article

Share this book

Add to My Shelf

High-resolution genome-wide association study pinpoints metal transporter and chelator genes involved in the genetic control of element levels in maize grain

by Ramstein, Guillaume P , Gore, Michael A , Li, Xiaowei in Genomes , Molybdenum

2021

Despite its importance to plant function and human health, the genetics underpinning element levels in maize grain remain largely unknown. Through a genome-wide association study in the maize Ames panel of nearly 2,000 inbred lines that was imputed with ∼7.7 million SNP markers, we investigated the genetic basis of natural variation for the concentration of 11 elements in grain. Novel associations were detected for the metal transporter genes rte2 (rotten ear2) and irt1 (iron-regulated transporter1) with boron and nickel, respectively. We also further resolved loci that were previously found to be associated with one or more of five elements (copper, iron, manganese, molybdenum, and/or zinc), with two metal chelator and five metal transporter candidate causal genes identified. The nas5 (nicotianamine synthase5) gene involved in the synthesis of nicotianamine, a metal chelator, was found associated with both zinc and iron and suggests a common genetic basis controlling the accumulation of these two metals in the grain. Furthermore, moderate predictive abilities were obtained for the 11 elemental grain phenotypes with two whole-genome prediction models: Bayesian Ridge Regression (0.33–0.51) and BayesB (0.33–0.53). Of the two models, BayesB, with its greater emphasis on large-effect loci, showed ∼4–10% higher predictive abilities for nickel, molybdenum, and copper. Altogether, our findings contribute to an improved genotype-phenotype map for grain element accumulation in maize.

Journal Article

Share this book

Add to My Shelf

Extensions of BLUP Models for Genomic Prediction in Heterogeneous Populations: Application in a Diverse Switchgrass Sample

by Ramstein, Guillaume P , Casler, Michael D in Calibration

2019

Genomic prediction is a useful tool to accelerate genetic gain in selection using DNA marker information. However, this technology typically relies on standard prediction procedures, such as genomic BLUP, that are not designed to accommodate population heterogeneity resulting from differences in marker effects across populations. In this study, we assayed different prediction procedures to capture marker-by-population interactions in genomic prediction models. Prediction procedures included genomic BLUP and two kernel-based extensions of genomic BLUP which explicitly accounted for population heterogeneity. To model population heterogeneity, dissemblance between populations was either depicted by a unique coefficient (as previously reported), or a more flexible function of genetic distance between populations (proposed herein). Models under investigation were applied in a diverse switchgrass sample under two validation schemes: whole-sample calibration, where all individuals except selection candidates are included in the calibration set, and cross-population calibration, where the target population is entirely excluded from the calibration set. First, we showed that using fixed effects, from principal components or putative population groups, appeared detrimental to prediction accuracy, especially in cross-population calibration. Then we showed that modeling population heterogeneity by our proposed procedure resulted in highly significant improvements in model fit. In such cases, gains in accuracy were often positive. These results suggest that population heterogeneity may be parsimoniously captured by kernel methods. However, in cases where improvement in model fit by our proposed procedure is null-to-moderate, ignoring heterogeneity should probably be preferred due to the robustness and simplicity of the standard genomic BLUP model.

Journal Article

Share this book

Add to My Shelf

Utilizing evolutionary conservation to detect deleterious mutations and improve genomic prediction in cassava

by Buckler, Edward S. , Long, Evan M. , Romay, M. Cinta in Accuracy , Amino acids , Cassava

2023

Cassava (Manihot esculenta) is an annual root crop which provides the major source of calories for over half a billion people around the world. Since its domestication ~10,000 years ago, cassava has been largely clonally propagated through stem cuttings. Minimal sexual recombination has led to an accumulation of deleterious mutations made evident by heavy inbreeding depression. To locate and characterize these deleterious mutations, and to measure selection pressure across the cassava genome, we aligned 52 related Euphorbiaceae and other related species representing millions of years of evolution. With single base-pair resolution of genetic conservation, we used protein structure models, amino acid impact, and evolutionary conservation across the Euphorbiaceae to estimate evolutionary constraint. With known deleterious mutations, we aimed to improve genomic evaluations of plant performance through genomic prediction. We first tested this hypothesis through simulation utilizing multi-kernel GBLUP to predict simulated phenotypes across separate populations of cassava. Simulations showed a sizable increase of prediction accuracy when incorporating functional variants in the model when the trait was determined by<100 quantitative trait loci (QTL). Utilizing deleterious mutations and functional weights informed through evolutionary conservation, we saw improvements in genomic prediction accuracy that were dependent on trait and prediction. We showed the potential for using evolutionary information to track functional variation across the genome, in order to improve whole genome trait prediction. We anticipate that continued work to improve genotype accuracy and deleterious mutation assessment will lead to improved genomic assessments of cassava clones.

Journal Article

Share this book

Add to My Shelf

Genomic prediction in a small barley population can benefit from training on related populations

by Jensen, Jens Due , Olesen, Lotte , Ramstein, Guillaume P in Barley

2025

Genomic prediction (GP) has shown to be a valuable tool for genetic improvement in breeding programs but requires large training populations in order to build robust models. This is difficult to obtain for newly established breeding programs. Here, we aimed to overcome this challenge by combining datasets from 4 different barley breeding programs, utilizing up to 12 years of data to increase prediction accuracy in a more recently established 6-rowed winter (6RW) barley breeding program. By allowing data to accumulate in a breeding program as the years progress, we investigated when GP accuracy in 6RW benefitted from external populations. To do this, we focused on several parameters: training population size, choice of model for multipopulation GP (univariate versus multivariate), the key trait under investigation (grain yield, plant height, or rust resistance), and genetic distance between populations. We found that in the early stages of a breeding program, prediction of the 6RW population could benefit from inclusion of an external population, but the advantage depended on the specific population and trait under investigation. However, when data from all 4 years were available, multipopulation GP generally performed similarly to within-population GP. Additionally, when comparing multivariate and univariate models for multipopulation GP, the multivariate model often performed significantly worse, despite strong genetic correlations between the populations involved. This was especially the case when data were sparse and the model required estimation of numerous parameters from a small number of observations. Altogether, our results suggest that multipopulation GP is beneficial only in the very early stages of new breeding programs, emphasizing its relevance for newly established breeding programs or new breeding goals, especially for related populations.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter