Catalogue Search | MBRL

Technical note: Impact of pedigree depth on convergence of single-step genomic BLUP in a purebred swine population

by Lourenco, D A L; Bradford, H L; Chen, C Y in Algorithms, Animals, Genome - genetics

2017

In genomic evaluations, it is desirable to have low computing cost while retaining high accuracy of evaluation for young animals. When the population is large but only a few animals have phenotypes, especially for low heritability traits, the convergence rate of BLUP or single-step genomic BLUP (ssGBLUP) can be very slow. This study investigates the effect of pedigree truncation on convergence rate and solutions of ssGBLUP for data exhibiting slow convergence. The data consisted of 216,000, 221,000, 732,000, and 579,000 phenotypes on 4 traits. Heritabilities were less than 0.1 for 2 traits and greater than 0.2 for the other 2 traits. The full pedigree consisted of 2.4 million animals. Genotypes were available for 33,000 animals and consisted of 60,000 SNP. Two bivariate animal models were fit using pedigree-based BLUP or ssGBLUP. Either a regular inverse or the algorithm for proven and young (APY) inverse was used for the genomic relationship matrix. Different pedigree depths were analyzed, including the full pedigree and 1 to 5 ancestral generations. Pedigree depths were defined as n ancestral generations for animals with phenotypes. The number of animals in the reduced pedigrees varied from 226,000 and 760,000 for 1 generation to 228,000 and 767,000 for 5 generations. Genomic EBV (GEBV) for genotyped animals had correlations greater than 0.99 between runs with the full and reduced pedigrees with 2 to 5 generations. A single generation of pedigree was not sufficient to obtain the same GEBV as the full pedigree. The convergence rate was the worst with the full pedigree and generally improved with reduced pedigrees. Using ssGBLUP with the APY inverse improved convergence without affecting accuracy. Pedigree truncation and the APY inverse are important tools for reducing the computational cost of implementing ssGBLUP.
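The pedigree-depth definition in this abstract (n ancestral generations traced back from animals with phenotypes) can be illustrated with a small sketch. This is not the authors' software; the function name `truncate_pedigree`, the dictionary layout, and the toy animal ids are assumptions made for illustration, and only phenotyped animals are used as starting points.

```python
from collections import deque


def truncate_pedigree(pedigree, phenotyped, n_generations):
    """Return the set of animals kept when tracing n ancestral generations.

    pedigree: dict mapping animal id -> (sire id, dam id); None marks an
        unknown parent (hypothetical layout chosen for this sketch).
    phenotyped: iterable of animal ids that have records.
    """
    kept = set(phenotyped)
    # Breadth-first search, so each ancestor is first reached at its
    # smallest distance from any phenotyped animal.
    frontier = deque((animal, 0) for animal in kept)
    while frontier:
        animal, depth = frontier.popleft()
        if depth == n_generations:
            continue  # do not trace beyond the requested depth
        for parent in pedigree.get(animal, (None, None)):
            if parent is not None and parent not in kept:
                kept.add(parent)
                frontier.append((parent, depth + 1))
    return kept


# Toy pedigree: E has records; C and D are its parents, and so on.
toy_pedigree = {
    "E": ("C", "D"),
    "C": ("A", "B"),
    "D": (None, "B"),
    "A": ("G1", "G2"),
}
one_generation = truncate_pedigree(toy_pedigree, ["E"], 1)
two_generations = truncate_pedigree(toy_pedigree, ["E"], 2)
```

With one generation only E and its parents are retained; each extra generation adds the next layer of ancestors, which is why the reduced pedigrees in the abstract grow only slightly from 1 to 5 generations.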

Journal Article

Multi-trait and multi-environment genomic prediction enhances yield components improvement in durum wheat

by Puglisi, Damiano; Vitale, Paolo; Crossa, José in gene-based relationship matrix, genomic selection, GxE interaction

2026

Durum wheat [Triticum turgidum L. ssp. durum (Desf.) Husn.] is a staple crop for the pasta and semolina industries, particularly in Mediterranean and semi-arid regions where climate variability poses major challenges to yield stability. This study evaluates the performance of single-environment (SE), multi-trait (MT), multi-environment (ME), and multi-trait–multi-environment (MTME) genomic prediction models across seven key traits, namely grain number per spike, grain weight per spike, number of spikelets per spike, spike length, spike weight, heading date, and plant height. Using genomic (G) and target gene-based (G2) relationship matrices with two cross-validation scenarios (CV1 and CV2), MTME models achieved the highest prediction accuracies, particularly under CV2 and sowing-by-season grouping. Modeling G2 information improved predictions for morpho-phenological traits (i.e., heading date and plant height), confirming the utility of functional allele data for capturing gene effects. MTME models effectively leveraged inter-trait and inter-environment covariance, providing biologically realistic predictions of genotype performance across simulated Mediterranean environments. These findings establish MTME genomic prediction as a powerful and scalable framework for climate-resilient durum wheat improvement, supporting predictive and data-driven breeding pipelines aimed at enhancing genetic gain and stability across years and environments.

Journal Article

Fast computation of the eigensystem of genomic similarity matrices

by Silverman, Edwin K.; Prokopenko, Dmitry; Lange, Christoph in Accuracy, Algebra, Algorithms

2024

The computation of a similarity measure for genomic data is a standard tool in computational genetics. The principal components of such matrices are routinely used to correct for biases due to confounding by population stratification, for instance in linear regressions. However, calculating a similarity matrix and its singular value decomposition (SVD) is computationally intensive. The contribution of this article is threefold. First, we demonstrate that the calculation of three matrices (called the covariance matrix, the weighted Jaccard matrix, and the genomic relationship matrix) can be reformulated in a unified way which allows for the application of a randomized SVD algorithm, which is faster than the traditional computation. The fast SVD algorithm we present is adapted from an existing randomized SVD algorithm and ensures that all computations are carried out in sparse matrix algebra. The algorithm only assumes that row-wise and column-wise subtraction and multiplication of a vector with a sparse matrix are available, operations that are efficiently implemented in common sparse matrix packages. An exception is the so-called Jaccard matrix, which does not have a structure applicable for the fast SVD algorithm. Second, an approximate Jaccard matrix is introduced to which the fast SVD computation is applicable. Third, we establish guaranteed theoretical bounds on the accuracy (in L2 norm and angle) between the principal components of the Jaccard matrix and the ones of our proposed approximation, thus putting the proposed Jaccard approximation on a solid mathematical foundation, and derive the theoretical runtime of our algorithm. We illustrate that the approximation error is low in practice and empirically verify the theoretical runtime scalings on both simulated data and data of the 1000 Genomes Project.
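The reliance on sparse matrix-vector products can be illustrated with a generic trick: a column-centered matrix can be multiplied by a vector without ever forming the dense centered matrix, because (X - 1 mu^T) v = X v - 1 (mu^T v). The sketch below is not the authors' algorithm; the dict-of-rows sparse layout and the function names `column_means` and `centered_matvec` are assumptions chosen for illustration.

```python
def column_means(rows, n_cols):
    """Column means of a sparse matrix stored as a list of {column: value} dicts."""
    totals = [0.0] * n_cols
    for row in rows:
        for j, x in row.items():
            totals[j] += x
    return [t / len(rows) for t in totals]


def centered_matvec(rows, n_cols, col_means, v):
    """Compute (X - 1 * col_means^T) @ v using only the nonzeros of X."""
    # The centering term contributes the same scalar to every row.
    shift = sum(col_means[j] * v[j] for j in range(n_cols))
    return [sum(x * v[j] for j, x in row.items()) - shift for row in rows]


# Toy 3 x 3 matrix with three nonzero entries.
sparse_rows = [{1: 2.0}, {0: 1.0}, {2: 1.0}]
means = column_means(sparse_rows, 3)
product = centered_matvec(sparse_rows, 3, means, [1.0, 1.0, 1.0])
```

A randomized SVD only needs products of this kind, so the cost stays proportional to the number of nonzero genotype entries rather than to the full matrix size.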

Journal Article

The Effect of Linkage Disequilibrium and Family Relationships on the Reliability of Genomic Prediction

by Calus, Mario P L; Wientjes, Yvonne C J; Veerkamp, Roel F in accuracy, angus cattle, Animals

2013

Although the concept of genomic selection relies on linkage disequilibrium (LD) between quantitative trait loci and markers, reliability of genomic predictions is strongly influenced by family relationships. In this study, we investigated the effects of LD and family relationships on reliability of genomic predictions and the potential of deterministic formulas to predict reliability using population parameters in populations with complex family structures. Five groups of selection candidates were simulated by taking different information sources from the reference population into account: (1) allele frequencies, (2) LD pattern, (3) haplotypes, (4) haploid chromosomes, and (5) individuals from the reference population, thereby having real family relationships with reference individuals. Reliabilities were predicted using genomic relationships among 529 reference individuals and their relationships with selection candidates and with a deterministic formula where the number of effective chromosome segments (Me) was estimated based on genomic and additive relationship matrices for each scenario. At a heritability of 0.6, reliabilities based on genomic relationships were 0.002 ± 0.0001 (allele frequencies), 0.022 ± 0.001 (LD pattern), 0.018 ± 0.001 (haplotypes), 0.100 ± 0.008 (haploid chromosomes), and 0.318 ± 0.077 (family relationships). At a heritability of 0.1, relative differences among groups were similar. For all scenarios, reliabilities were similar to predictions with a deterministic formula using estimated Me. So, reliabilities can be predicted accurately using empirically estimated Me and level of relationship with reference individuals has a much higher effect on the reliability than linkage disequilibrium per se. Furthermore, accumulated length of shared haplotypes is more important in determining the reliability of genomic prediction than the individual shared haplotype length.

Journal Article

snpReady: a tool to assist breeders in genomic analysis

by Fritsche-Neto, Roberto; Galli, Giovanni; de Oliveira Couto, Evellyn Giselly in Biomedical and Life Sciences, Biotechnology, computer software

2018

The snpReady R package is a new instrument developed to help breeders in genomic projects such as genomic prediction and association studies. This package offers three different methods to build the genomic relationship matrix, a new imputation method for missing markers based on Wright’s theory, and a population genetic overview. To this end, we implemented three functions (raw.data, G.matrix, and popgen). The tool transforms raw data from different genotyping platforms into numeric matrices and performs quality control (missing data and allele frequency). Moreover, the package generates and exports four different relationship matrices (proposed by Yang et al. (Nat Genet 42:565–569, 2010), VanRaden (J Dairy Sci 91:4414–4423, 2008), and the Gaussian kernel) depending on the purpose and software to be used in further analysis. Finally, based on the genotypic matrix, the package estimates the genetic variability, effective population size, and endogamy, among other population genetic parameters. Empirical comparisons between the proposed imputation method and other well-known approaches showed lower imputation accuracy for the proposed method, but with no significant impact on genomic prediction accuracy when a lower amount of missing data is allowed. The functions and arguments were designed to carry out the preparation of genomic datasets in a straightforward, fast, and more computationally efficient way. The package and its details are available at CRAN or http://www.github.com/italo-granato/snpReady.

Journal Article

Inexpensive Computation of the Inverse of the Genomic Relationship Matrix in Populations with Small Effective Population Size

by Misztal, Ignacy in Algorithms, Animals, Breeding

2016

Many computations with SNP data including genomic evaluation, parameter estimation, and genome-wide association studies use an inverse of the genomic relationship matrix. The cost of a regular inversion is cubic and is prohibitively expensive for large matrices. Recent studies in cattle demonstrated that the inverse can be computed in almost linear time by recursion on any subset of ∼10,000 individuals. The purpose of this study is to present a theory of why such a recursion works and its implication for other populations. Assume that, because of a small effective population size, the additive information in a genotyped population has a small dimensionality, even with a very large number of SNP markers. That dimensionality is visible as a limited number of effective SNP effects, independent chromosome segments, or the rank of the genomic relationship matrix. Decompose a population arbitrarily into core and noncore individuals, with the number of core individuals equal to that dimensionality. Then, breeding values of noncore individuals can be derived by recursions on breeding values of core individuals, with coefficients of the recursion computed from the genomic relationship matrix. A resulting algorithm for the inversion called “algorithm for proven and young” (APY) has a linear computing and memory cost for noncore animals. Noninfinitesimal genetic architecture can be accommodated through a trait-specific genomic relationship matrix, possibly derived from Bayesian regressions. For populations with small effective population size, the inverse of the genomic relationship matrix can be computed inexpensively for a very large number of genotyped individuals.
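The recursion described in this abstract can be written out as follows. This is a sketch in commonly used APY notation rather than a quotation from the article: the subscripts c (core) and n (noncore), the residual vector ε, and the variance component σ_u² are notation assumed here.

```latex
% Partition of the genomic relationship matrix into core (c) and noncore (n) animals
\mathbf{G} =
\begin{bmatrix}
\mathbf{G}_{cc} & \mathbf{G}_{cn} \\
\mathbf{G}_{nc} & \mathbf{G}_{nn}
\end{bmatrix}

% Breeding values of noncore animals as a recursion on core animals
\mathbf{u}_n = \mathbf{G}_{nc}\mathbf{G}_{cc}^{-1}\mathbf{u}_c + \boldsymbol{\varepsilon},
\qquad
\operatorname{var}(\boldsymbol{\varepsilon}) = \mathbf{M}_{nn}\,\sigma_u^2,
\qquad
m_{ii} = g_{ii} - \mathbf{g}_{ic}\mathbf{G}_{cc}^{-1}\mathbf{g}_{ci}

% Resulting sparse inverse; M_nn is diagonal
\mathbf{G}_{\mathrm{APY}}^{-1} =
\begin{bmatrix}
\mathbf{G}_{cc}^{-1} & \mathbf{0} \\
\mathbf{0} & \mathbf{0}
\end{bmatrix}
+
\begin{bmatrix}
-\mathbf{G}_{cc}^{-1}\mathbf{G}_{cn} \\
\mathbf{I}
\end{bmatrix}
\mathbf{M}_{nn}^{-1}
\begin{bmatrix}
-\mathbf{G}_{nc}\mathbf{G}_{cc}^{-1} & \mathbf{I}
\end{bmatrix}
```

Only the core block is inverted directly, and because M_nn is diagonal each noncore animal adds a fixed amount of work, which is consistent with the linear cost for noncore animals stated in the abstract.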

Journal Article

Efficient Methods to Compute Genomic Predictions

by VanRaden, P.M. in alleles, animal breeding, Animal productions

2008

Efficient methods for processing genomic data were developed to increase reliability of estimated breeding values and to estimate thousands of marker effects simultaneously. Algorithms were derived and computer programs tested with simulated data for 2,967 bulls and 50,000 markers distributed randomly across 30 chromosomes. Estimation of genomic inbreeding coefficients required accurate estimates of allele frequencies in the base population. Linear model predictions of breeding values were computed by 3 equivalent methods: 1) iteration for individual allele effects followed by summation across loci to obtain estimated breeding values, 2) selection index including a genomic relationship matrix, and 3) mixed model equations including the inverse of genomic relationships. A blend of first- and second-order Jacobi iteration using 2 separate relaxation factors converged well for allele frequencies and effects. Reliability of predicted net merit for young bulls was 63% compared with 32% using the traditional relationship matrix. Nonlinear predictions were also computed using iteration on data and nonlinear regression on marker deviations; an additional (about 3%) gain in reliability for young bulls increased average reliability to 66%. Computing times increased linearly with number of genotypes. Estimation of allele frequencies required 2 processor days, and genomic predictions required <1 d per trait, and traits were processed in parallel. Information from genotyping was equivalent to about 20 daughters with phenotypic records. Actual gains may differ because the simulation did not account for linkage disequilibrium in the base population or selection in subsequent generations.
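The abstract does not spell out how the genomic relationship matrix is built. One widely used construction associated with this article is G = ZZ' / (2 Σ p_j (1 - p_j)), where Z holds genotypes centered by twice the allele frequency. The sketch below assumes 0/1/2 genotype coding and, unlike the base-population frequencies the abstract calls for, estimates allele frequencies from the sample itself; the function name `vanraden_g` is hypothetical.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 genotype codes.

    genotypes: list of rows, one per individual; each entry counts copies
        of one allele at a marker (coding assumed for this sketch).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Assumption: allele frequencies are estimated from the sample itself.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Center each marker by its expected value 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    # Scaling puts G on the same scale as the pedigree relationship matrix.
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


toy_genotypes = [
    [0, 1, 2],
    [2, 1, 0],
    [1, 1, 1],
]
g_matrix = vanraden_g(toy_genotypes)
```

The scale term is zero when every marker is monomorphic, so real data would need such markers filtered out before this step.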

Journal Article

Comparison of models for missing pedigree in single-step genomic prediction

by Misztal, Ignacy; Bermann, Matias; Masuda, Yutaka in Animal breeding, Animal Genetics and Genomics, Animals

2021

Pedigree information is often missing for some animals in a breeding program. Unknown-parent groups (UPGs) are assigned to the missing parents to avoid biased genetic evaluations. Although the use of UPGs is well established for the pedigree model, it is unclear how UPGs are integrated into the inverse of the unified relationship matrix (H-inverse) required for single-step genomic best linear unbiased prediction. A generalization of the UPG model is the metafounder (MF) model. The objectives of this study were to derive 3 H-inverses and to compare genetic trends among models with UPG and MF H-inverses using a simulated purebred population. All inverses were derived using the joint density function of the random breeding values and genetic groups. The breeding values of genotyped animals (u2) were assumed to be adjusted for UPG effects (g) using matrix Q2 as u2∗=u2+Q2g before incorporating genomic information. The Quaas–Pollak-transformed (QP) H-inverse was derived using a joint density function of u2∗ and g updated with genomic information and assuming nonzero cov(u2∗,g′). The modified QP (altered) H-inverse also assumes that the genomic information updates u2∗ and g, but cov(u2∗,g′)=0. The UPG-encapsulated (EUPG) H-inverse assumed genomic information updates the distribution of u2∗. The EUPG H-inverse had the same structure as the MF H-inverse. Fifty percent of the genotyped females in the simulation had a missing dam, and missing parents were replaced with UPGs by generation. The simulation study indicated that u2∗ and g in models using the QP and altered H-inverses may be inseparable leading to potential biases in genetic trends. Models using the EUPG and MF H-inverses showed no genetic trend biases. These 2 H-inverses yielded the same genomic EBV (GEBV). The predictive ability and inflation of GEBVs from young genotyped animals were nearly identical among models using the QP, altered, EUPG, and MF H-inverses. Although the choice of H-inverse in real applications with enough data may not result in biased genetic trends, the EUPG and MF H-inverses are to be preferred because of theoretical justification and possibility to reduce biases.

Journal Article

The Dimensionality of Genomic Information and Its Effect on Genomic Prediction

by Legarra, Andres; Masuda, Yutaka; Pocrnic, Ivan in Accuracy, Agriculture, Animals

2016

The genomic relationship matrix (GRM) can be inverted by the algorithm for proven and young (APY) based on recursion on a random subset of animals. While a regular inverse has a cubic cost, the cost of the APY inverse can be close to linear. Theory for the APY assumes that the optimal size of the subset (maximizing accuracy of genomic predictions) is due to a limited dimensionality of the GRM, which is a function of the effective population size (Ne). The objective of this study was to evaluate these assumptions by simulation. Six populations were simulated with approximate effective population size (Ne) from 20 to 200. Each population consisted of 10 nonoverlapping generations, with 25,000 animals per generation and phenotypes available for generations 1–9. The last 3 generations were fully genotyped assuming genome length L = 30. The GRM was constructed for each population and analyzed for distribution of eigenvalues. Genomic estimated breeding values (GEBV) were computed by single-step GBLUP, using either a direct or an APY inverse of GRM. The sizes of the subset in APY were set to the number of the largest eigenvalues explaining x% of variation (EIGx, x = 90, 95, 98, 99) in GRM. Accuracies of GEBV for the last generation with the APY inverse peaked at EIG98 and were slightly lower with EIG95, EIG99, or the direct inverse. Most information in the GRM is contained in ∼NeL largest eigenvalues, with no information beyond 4NeL. Genomic predictions with the APY inverse of the GRM are more accurate than by the regular inverse.
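The EIGx rule in this abstract (the number of largest eigenvalues explaining x% of the variation in the GRM) reduces to a cumulative sum over sorted eigenvalues. A minimal sketch, assuming the eigenvalues have already been computed by some eigensolver; the function name and the toy eigenvalues are made up for illustration and are not taken from the study.

```python
def n_eigenvalues_for_variance(eigenvalues, fraction):
    """Smallest number of largest eigenvalues whose sum reaches `fraction` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(ordered)


# Hypothetical spectrum: a few large eigenvalues carry most of the variation.
toy_eigenvalues = [60, 25, 8, 4, 2, 1]
eig90 = n_eigenvalues_for_variance(toy_eigenvalues, 0.90)
eig95 = n_eigenvalues_for_variance(toy_eigenvalues, 0.95)
eig98 = n_eigenvalues_for_variance(toy_eigenvalues, 0.98)
```

In the setting of the abstract, the returned count would be used as the size of the subset of animals in the APY inverse.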

Journal Article

A relationship matrix including full pedigree and genomic information

by Legarra, A; Aguilar, I; Misztal, I in Agricultural sciences, Animal productions, Animals

2009

Dense molecular markers are being used in genetic evaluation for parts of the population. This requires a two-step procedure where pseudo-data (for instance, daughter yield deviations) are computed from full records and pedigree data and later used for genomic evaluation. This results in bias and loss of information. One way to incorporate the genomic information into a full genetic evaluation is by modifying the numerator relationship matrix. A naive proposal is to substitute the relationships of genotyped animals with the genomic relationship matrix. However, this results in incoherencies because the genomic relationship matrix includes information on relationships among ancestors and descendants. In other words, using the pedigree-derived covariance between genotyped and ungenotyped individuals, with the pretense that genomic information does not exist, leads to inconsistencies. It is proposed to condition the genetic value of ungenotyped animals on the genetic value of genotyped animals via the selection index (e.g., pedigree information), and then use the genomic relationship matrix for the latter. This results in a joint distribution of genotyped and ungenotyped genetic values, with a pedigree-genomic relationship matrix H. In this matrix, genomic information is transmitted to the covariances among all ungenotyped individuals. The matrix is (semi)positive definite by construction, which is not the case for the naive approach. Numerical examples and alternative expressions are discussed. Matrix H is suitable for iteration on data algorithms that multiply a vector times a matrix, such as preconditioned conjugated gradients.
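The conditioning argument in this abstract can be written out explicitly. The block below is a sketch in commonly used single-step notation, which is assumed here rather than quoted from the record: subscript 1 denotes ungenotyped animals, subscript 2 genotyped animals, A the pedigree relationship matrix, and G the genomic relationship matrix.

```latex
% Ungenotyped breeding values conditioned on genotyped ones through the pedigree
\mathbf{u}_1 = \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{u}_2 + \mathbf{e},
\qquad
\operatorname{var}(\mathbf{e}) \propto \mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21},
\qquad
\operatorname{var}(\mathbf{u}_2) \propto \mathbf{G}

% Joint pedigree-genomic relationship matrix
\mathbf{H} =
\begin{bmatrix}
\mathbf{A}_{11} + \mathbf{A}_{12}\mathbf{A}_{22}^{-1}(\mathbf{G} - \mathbf{A}_{22})\mathbf{A}_{22}^{-1}\mathbf{A}_{21}
  & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G} \\
\mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}
  & \mathbf{G}
\end{bmatrix}
```

The upper-left block is the pedigree block A_11 plus a correction that depends on G - A_22, which is how genomic information reaches the covariances among ungenotyped animals.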

Journal Article
