Catalogue Search | MBRL
8,485 result(s) for "genomic predictions"
Can Deep Learning Improve Genomic Prediction of Complex Human Traits?
by de los Campos, Gustavo; Pérez-Enciso, Miguel; Bellot, Pau
in Artificial intelligence, Artificial neural networks, Bayesian analysis
2018
The genetic analysis of complex traits does not escape the current excitement around artificial intelligence, including a renewed interest in “deep learning” (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). However, the performance of DL for genomic prediction of complex human traits has not been comprehensively tested. To provide an evaluation of MLPs and CNNs, we used data from distantly related white Caucasian individuals (n ∼100k individuals, m ∼500k SNPs, and k = 1000) of the interim release of the UK Biobank. We analyzed a total of five phenotypes: height, bone heel mineral density, body mass index, systolic blood pressure, and waist–hip ratio, with genomic heritabilities ranging from ∼0.20 to 0.70. After hyperparameter optimization using a genetic algorithm, we considered several configurations, from shallow to deep learners, and compared the predictive performance of MLPs and CNNs with that of Bayesian linear regressions across sets of SNPs (from 10k to 50k) that were preselected using single-marker regression analyses. For height, a highly heritable phenotype, all methods performed similarly, although CNNs were slightly but consistently worse. For the rest of the phenotypes, the performance of some CNNs was comparable to or slightly better than that of linear methods. Performance of MLPs was highly dependent on SNP set and phenotype. In all, over the range of traits evaluated in this study, CNN performance was competitive with linear models, but we did not find any case where DL outperformed the linear model by a sizable margin. We suggest that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic-based problems in order for CNNs to be competitive with linear models.
Journal Article
Optimising Genomic Selection in Wheat: Effect of Marker Density, Population Size and Population Structure on Prediction Accuracy
2018
Genomic selection applied to plant breeding enables earlier estimates of a line’s performance and significant reductions in generation interval. Several factors affecting prediction accuracy should be well understood if breeders are to harness genomic selection to its full potential. We used a panel of 10,375 bread wheat (Triticum aestivum) lines genotyped with 18,101 SNP markers to investigate the effect and interaction of training set size, population structure and marker density on genomic prediction accuracy. Through assessing the effect of training set size we showed the rate at which prediction accuracy increases is slower beyond approximately 2,000 lines. The structure of the panel was assessed via principal component analysis and K-means clustering, and its effect on prediction accuracy was examined through a novel cross-validation analysis according to the K-means clusters and breeding cohorts. Here we showed that accuracy can be improved by increasing the diversity within the training set, particularly when relatedness between training and validation sets is low. The breeding cohort analysis revealed that traits with higher selection pressure (lower allelic diversity) can be more accurately predicted by including several previous cohorts in the training set. The effect of marker density and its interaction with population structure was assessed for marker subsets containing between 100 and 17,181 markers. This analysis showed that response to increased marker density is largest when using a diverse training set to predict between poorly related material. These findings represent a significant resource for plant breeders and contribute to the collective knowledge on the optimal structure of calibration panels for genomic prediction.
Journal Article
Multimodal deep learning methods enhance genomic prediction of wheat breeding
by Piñera, Francisco; Rivera, Carolina; Montesinos-López, Abelardo
in Accuracy, Deep learning, Genomics
2023
While several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes in plant breeding research, few methods have linked genomics and phenomics (imaging). Deep learning (DL) neural networks have been developed to increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype–environment interaction (GE); however, unlike conventional GP models, DL has not been investigated for when genomics is linked with phenomics. In this study we used 2 wheat data sets (DS1 and DS2) to compare a novel DL method with conventional GP models. Models fitted for DS1 were GBLUP, gradient boosting machine (GBM), support vector regression (SVR) and the DL method. Results indicated that for 1 year, DL provided better GP accuracy than results obtained by the other models. However, GP accuracy obtained for other years indicated that the GBLUP model was slightly superior to the DL. DS2 is comprised only of genomic data from wheat lines tested for 3 years, 2 environments (drought and irrigated) and 2–4 traits. DS2 results showed that when predicting the irrigated environment with the drought environment, DL had higher accuracy than the GBLUP model in all analyzed traits and years. When predicting drought environment with information on the irrigated environment, the DL model and GBLUP model had similar accuracy. The DL method used in this study is novel and presents a strong degree of generalization as several modules can potentially be incorporated and concatenated to produce an output for a multi-input data structure.
Journal Article
Multi-trait genomic prediction using in-season physiological parameters increases prediction accuracy of complex traits in US wheat
by McBreen, Jordan; Khan, Naeem; Bai, Guihua
in Accuracy, Analysis, Animal Genetics and Genomics
2022
Background
Recently, genomic selection (GS) has emerged as an important tool for plant breeders to select superior genotypes. Multi-trait (MT) prediction models provide an opportunity to improve the predictive ability for expensive and labor-intensive traits. In this study, we assessed the potential use of an MT genomic prediction model by incorporating two physiological traits (canopy temperature, CT, and normalized difference vegetation index, NDVI) to predict 5 complex primary traits (harvest index, HI; grain yield, GY; grain number, GN; spike partitioning index, SPI; fruiting efficiency, FE) using two cross-validation schemes, CV1 and CV2.
Results
In this study, we evaluated 236 wheat genotypes in two locations in 2 years. The wheat genotypes were genotyped using a genotyping-by-sequencing approach, which generated 27,466 SNPs. The MT-CV2 (multi-trait cross-validation 2) model improved predictive ability by 4.8 to 138.5% compared to ST-CV1 (single-trait cross-validation 1). However, the predictive ability of MT-CV1 was not significantly different from that of the ST-CV1 model.
Conclusions
The study showed that the genomic prediction of complex traits such as HI, GN, and GY can be improved when correlated secondary traits (cheaper and easier phenotyping) are used. MT genomic selection could accelerate breeding cycles and improve genetic gain for complex traits in wheat and other crops.
Journal Article
Multi-Trait Multi-Environment Genomic Prediction of Agronomic Traits in Advanced Breeding Lines of Winter Wheat
2021
Genomic prediction is a promising approach for accelerating the genetic gain of complex traits in wheat breeding. However, increasing the prediction accuracy (PA) of genomic prediction (GP) models remains a challenge in the successful implementation of this approach. Multivariate models have shown promise when evaluated using diverse panels of unrelated accessions; however, limited information is available on their performance in advanced breeding trials. Here, we used multivariate GP models to predict multiple agronomic traits using 314 advanced and elite breeding lines of winter wheat evaluated in 10 site-year environments. We evaluated a multi-trait (MT) model with two cross-validation schemes representing different breeding scenarios (CV1, prediction of completely unphenotyped lines; and CV2, prediction of partially phenotyped lines for correlated traits). Moreover, extensive data from multi-environment trials (METs) were used to cross-validate a Bayesian multi-trait multi-environment (MTME) model that integrates the analysis of multiple-traits, such as G × E interaction. The MT-CV2 model outperformed all the other models for predicting grain yield with significant improvement in PA over the single-trait (ST-CV1) model. The MTME model performed better for all traits, with average improvement over the ST-CV1 reaching up to 19, 71, 17, 48, and 51% for grain yield, grain protein content, test weight, plant height, and days to heading, respectively. Overall, the empirical analyses elucidate the potential of both the MT-CV2 and MTME models when advanced breeding lines are used as a training population to predict related preliminary breeding lines. Further, we evaluated the practical application of the MTME model in the breeding program to reduce phenotyping cost using a sparse testing design. This showed that complementing METs with GP can substantially enhance resource efficiency. Our results demonstrate that multivariate GS models have a great potential in implementing GS in breeding programs.
Journal Article
Multi-trait, Multi-environment Deep Learning Modeling for Genomic-Enabled Prediction of Plant Traits
by Montesinos-López, Osval A; Hernández-Suárez, Carlos M; Crossa, José
in Accuracy, Deep learning, Genotype & phenotype
2018
Multi-trait and multi-environment data are common in animal and plant breeding programs. However, what is lacking are more powerful statistical models that can exploit the correlation between traits to improve prediction accuracy in the context of genomic selection (GS). Multi-trait models are more complex than univariate models and usually require more computational resources, but they are preferred because they can exploit the correlation between traits, which often helps improve prediction accuracy. For this reason, in this paper we explore the power of multi-trait deep learning (MTDL) models in terms of prediction accuracy. The prediction performance of MTDL models was compared to the performance of the Bayesian multi-trait and multi-environment (BMTME) model proposed by Montesinos-López et al. (2016), which is a multi-trait version of the genomic best linear unbiased prediction (GBLUP) univariate model. Both models were evaluated with predictors with and without the genotype×environment interaction term. The prediction performance of both models was evaluated in terms of Pearson’s correlation using cross-validation. We found that the best predictions in two of the three data sets were obtained under the BMTME model, but in general the predictions of both models, BMTME and MTDL, were similar. Among models without the genotype×environment interaction, the MTDL model was the best, while among models with genotype×environment interaction, the BMTME model was superior. These results indicate that the MTDL model is very competitive for performing predictions in the context of GS, with the important practical advantage that it requires fewer computational resources than the BMTME model.
Journal Article
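Note on the evaluation metric: the abstract above scores models by Pearson's correlation under cross-validation. A minimal sketch of that procedure follows. It is not the authors' code: ridge regression is swapped in as a simple stand-in for the BMTME and MTDL models, and the simulated marker matrix, the function names and the penalty `lam` are all hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression stand-in (assumption): fit on centered training data, predict test lines."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta

def cv_pearson(X, y, k=5, lam=1.0, seed=0):
    """Return one Pearson correlation per fold between predicted and observed values."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # hold out the current fold
        pred = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx], lam)
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return np.array(scores)

# Simulated data (hypothetical): 100 lines, 20 markers coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(100, 20)).astype(float)
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=100)
scores = cv_pearson(X, y)
print("mean Pearson correlation across folds:", scores.mean())
```

Averaging the per-fold correlations gives a single accuracy figure of the kind such studies typically report.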
Genomic basis for drought resistance in European beech forests threatened by climate change
by Markus Pfenninger; Nico Blüthgen; Cosima Caliendo
in Acclimatization, Acclimatization - genetics, Amino acids
2021
In the course of global climate change, Central Europe is experiencing more frequent and prolonged periods of drought. The drought years 2018 and 2019 affected European beeches (Fagus sylvatica L.) differently: even in the same stand, drought-damaged trees neighboured healthy trees, suggesting that the genotype rather than the environment was responsible for this conspicuous pattern. We used this natural experiment to study the genomic basis of drought resistance with Pool-GWAS. Contrasting the extreme phenotypes identified 106 significantly associated single-nucleotide polymorphisms (SNPs) throughout the genome. Most annotated genes with associated SNPs (>70%) were previously implicated in the drought reaction of plants. Non-synonymous substitutions led either to a functional amino acid exchange or premature termination. A non-parametric machine learning approach on 98 validation samples yielded 20 informative loci which allowed an 88% prediction probability of the drought phenotype. Drought resistance in European beech is a moderately polygenic trait that should respond well to natural selection, selective management, and breeding.
Climate change is having a serious impact on many ecosystems. In the summer of 2018 and 2019, around two thirds of European beech trees were damaged or killed by extreme drought. It is critical to keep these beech woods healthy, as they are central to the survival of over 6,000 other species of animals and plants.
The level of damage caused by the drought varied between forests. However, not all the trees in each forest responded in the same way, with severely damaged trees often sitting next to fully healthy ones. This suggests that the genetic make-up of each tree determines how well it can adapt to drought rather than its local environment.
To investigate this further, Pfenninger et al. studied the genome of over 400 European beech trees from the Hesse region in Germany. The samples came from pairs of neighbouring trees that had responded differently to the droughts. The analysis found more than 80 parts of the genome that differed between healthy and damaged trees.
Pfenninger et al. then used this information to create a genetic test which can quickly and inexpensively predict how well an individual beech tree might survive in a drought. Applying this test to another 92 trees revealed that it can reliably detect which ones were healthy and which ones were damaged.
Beech forests are typically managed by private owners, agencies or breeders that could use this genetic test to select and reproduce trees that are better adapted to drought. The goal now is to develop the test so that it can be used more widely to manage European beech trees and potentially other species.
Journal Article
Genome-wide associations of sweetpotato metabolites enhance genomic prediction and identify genes in metabolic and regulatory pathways
2025
Global sweetpotato production is increasing due to its health benefits, including high levels of complex carbohydrates and bioactive compounds. To explore the genetic basis of carbohydrates and carotenoids, we conducted a genome-wide association study (GWAS) using diverse sweetpotato accessions, two decades of phenotypic data, and 252,975 dosage-based SNPs and INDELs. Our findings confirmed a negative correlation between dry matter and β-carotene and identified interconnected metabolic pathways regulating multiple traits. Notably, phytoene synthase, involved in carotene biosynthesis, was associated with dry matter. Other pathways linked to these traits include carbohydrate metabolism, cell wall modification, phosphate starvation, stress response, and flowering regulation. To evaluate the breeding potential of GWAS-assisted genomic prediction (GWABLUP), we found that the 500 top GWAS hits used for genomic prediction significantly enhanced predictive ability (PA) for six out of nine traits, improving PA by up to 6.7% to 15.9% compared to the Genomic Best Linear Unbiased Prediction (GBLUP), which utilized 41,551 and 500 markers, respectively. The best PA across traits ranged from 20.9% to 60.6%, with both additive and dominance effects playing an important role. Model selection, guided by resample model inclusion probability (RMIP), during GWABLUP and after each GWAS iteration typically yielded the highest PA. These results provide valuable insights for breeding strategies aimed at optimizing agronomic traits and addressing market demands for diverse value-added products.
Journal Article
A Multivariate Poisson Deep Learning Model for Genomic Prediction of Count Data
by Montesinos-López, Abelardo; Lozano-Ramirez, Nerida; Montesinos-López, Osval Antonio
in Deep learning, Neural networks
2020
The paradigm called genomic selection (GS) is a revolutionary way of developing new plants and animals. This is a predictive methodology, since it uses learning methods to perform its task. Unfortunately, there is no universal model that can be used for all types of predictions; for this reason, specific methodologies are required for each type of output (response variables). Since there is a lack of efficient methodologies for multivariate count data outcomes, in this paper, a multivariate Poisson deep neural network (MPDN) model is proposed for the genomic prediction of various count outcomes simultaneously. The MPDN model uses the negative log-likelihood of a Poisson distribution as its loss function, the rectified linear unit (ReLU) activation function in the hidden layers to capture nonlinear patterns, and the exponential activation function in the output layer to produce outputs on the same scale as the counts. The proposed MPDN model was compared to conventional generalized Poisson regression models and univariate Poisson deep learning models in two experimental data sets of count data. We found that the proposed MPDN outperformed univariate Poisson deep neural network models, but did not outperform, in terms of prediction, the univariate generalized Poisson regression models. All deep learning models were implemented with TensorFlow as the back-end and Keras as the front-end, which allows implementing these models on moderate and large data sets, which is a significant advantage over previous GS models for multivariate count data.
Journal Article
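Note on the loss function: the abstract above describes a network with ReLU hidden layers, an exponential output activation and a Poisson negative log-likelihood loss. A minimal NumPy sketch of the forward pass and loss follows. It is not the authors' TensorFlow/Keras implementation: the single hidden layer, the layer sizes, the random weights and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(X, W1, b1, W2, b2):
    """One ReLU hidden layer, then an exponential output so predicted rates are positive."""
    hidden = relu(X @ W1 + b1)
    return np.exp(hidden @ W2 + b2)

def poisson_nll(y, rate):
    """Mean negative Poisson log-likelihood, dropping log(y!) (constant in the weights)."""
    return float(np.mean(rate - y * np.log(rate)))

# Hypothetical toy setup: 8 lines, 5 markers coded 0/1/2, two count traits.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(8, 5)).astype(float)
W1 = rng.normal(scale=0.1, size=(5, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 2))
b2 = np.zeros(2)
y = rng.poisson(2.0, size=(8, 2))

rate = forward(X, W1, b1, W2, b2)  # one predicted rate per line and trait
loss = poisson_nll(y, rate)
print("Poisson negative log-likelihood:", loss)
```

The exponential output keeps every predicted rate strictly positive, which the logarithm in the loss requires.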
A marker weighting approach for enhancing within-family accuracy in genomic prediction
2024
Genomic selection is revolutionizing plant breeding. However, its practical implementation is still very challenging, since predicted values do not necessarily have high correspondence to the observed phenotypic values. When the goal is to predict within-family, it is not always possible to obtain reasonable accuracies, which are of paramount importance for improving the selection process. For this reason, in this research, we propose the Adversaria-Boruta (AB) method, which combines the virtues of the adversarial validation (AV) method and the Boruta feature selection method. The AB method operates primarily by minimizing the disparity between training and testing distributions. This is accomplished by reducing the weight assigned to markers that display the most significant differences between the training and testing sets. The AB method therefore builds a weighted genomic relationship matrix that is implemented with the genomic best linear unbiased predictor (GBLUP) model. The proposed AB method is compared, using 12 real data sets, with the GBLUP model that uses a nonweighted genomic relationship matrix. Our results show that the proposed AB method outperforms the GBLUP by 8.6, 19.7, and 9.8% in terms of Pearson’s correlation, mean square error, and normalized root mean square error, respectively. Our results support that the proposed AB method is a useful tool to improve the prediction accuracy of a complete family; however, we encourage other investigators to evaluate the AB method to increase the empirical evidence of its potential.
Journal Article
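Note on the weighted relationship matrix: the abstract above says marker weights are folded into a genomic relationship matrix used by GBLUP, but it does not give the formula. The sketch below shows one common way to weight such a matrix, G = Z diag(w) Zᵀ / Σw, with Z the column-centered marker matrix. That formula, the function name `weighted_grm` and the example weights are assumptions, not the AB method itself, and how AB derives its weights is not reproduced here.

```python
import numpy as np

def weighted_grm(M, w):
    """Weighted genomic relationship matrix (assumed form): Z diag(w) Z^T / sum(w)."""
    Z = M - M.mean(axis=0)          # center each marker column
    return (Z * w) @ Z.T / w.sum()  # Z * w scales column j by w[j]

# Hypothetical data: 6 lines, 10 markers coded 0/1/2.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(6, 10)).astype(float)

w_equal = np.ones(10)               # equal weights: the unweighted matrix
w_down = np.ones(10)
w_down[:3] = 0.1                    # downweight three markers (illustrative values)

G_unweighted = weighted_grm(M, w_equal)
G_weighted = weighted_grm(M, w_down)
print(G_weighted.shape)
```

With all weights equal, this reduces to the unweighted matrix Z Zᵀ / m, and a marker given weight zero drops out of the matrix entirely.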