Catalogue Search | MBRL

How array design creates SNP ascertainment bias

by Weigend, Annett; Simianer, Henner; Pook, Torsten in Analysis, Animal sciences, Animal welfare

2021

Single nucleotide polymorphisms (SNPs) genotyped with arrays have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole-genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in the populations involved in developing the respective array. This bias affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating, step by step, the changes in allele frequency spectra and heterozygosity estimates. We found a sequential reduction of rare alleles during the development process. This was mainly caused by the identification of SNPs in a limited set of populations and by a within-population selection of common SNPs when aiming for equidistant spacing. These effects were less severe with a larger discovery panel. Additionally, expected heterozygosity was generally severely overestimated for the ascertained SNP sets. For the original array, this overestimation was 24% higher for populations involved in the discovery process than for populations not involved. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations during SNP selection can mask this effect, but also adds uncertainty. Finally, we make suggestions for the design of specialized arrays for large-scale projects where whole-genome re-sequencing is still too expensive.
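Why dropping rare variants inflates heterozygosity can be seen with the standard expected heterozygosity of a biallelic SNP, 2p(1 - p). The sketch below is a hypothetical illustration: the allele frequencies are invented, and the minor-allele-frequency filter in `ascertain` is a deliberately crude stand-in for the discovery and equidistant-selection steps described above, not the redesign pipeline used in the study.

```python
# Sketch: keeping only common SNPs inflates mean expected heterozygosity.
# The allele frequencies are invented illustration values, not study data.


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def mean_he(freqs):
    """Average expected heterozygosity over a set of SNPs."""
    return sum(expected_heterozygosity(p) for p in freqs) / len(freqs)


def ascertain(freqs, min_maf):
    """Keep SNPs whose minor allele frequency is at least min_maf.

    A crude, hypothetical stand-in for discovery-panel and common-SNP
    filtering; real array design involves many more steps.
    """
    return [p for p in freqs if min(p, 1.0 - p) >= min_maf]


# Hypothetical 'sequence-level' frequency spectrum rich in rare variants.
wgs_freqs = [0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.35, 0.50]
array_freqs = ascertain(wgs_freqs, min_maf=0.05)

print(f"all SNPs:        mean He = {mean_he(wgs_freqs):.3f}")
print(f"ascertained set: mean He = {mean_he(array_freqs):.3f}")
```

With these made-up frequencies, mean He rises from about 0.185 to about 0.283 purely because the rare variants were removed, even though no individual's genotypes changed.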

Journal Article

MoBPS - Modular Breeding Program Simulator

by Schlather, Martin; Pook, Torsten; Simianer, Henner in Animal sciences, Breeding of animals, Cell division

2020

The R-package MoBPS provides a computationally efficient and flexible framework to simulate complex breeding programs and compare their economic and genetic impact. Simulations are performed at the level of individuals. MoBPS uses a highly efficient implementation with bit-wise data storage and matrix multiplications from the associated R-package miraculix, which allows it to handle large-scale populations. Individual haplotypes are not stored but are derived automatically from points of recombination and mutations. The modular structure of MoBPS allows users to combine rather coarse simulations, as needed to generate founder populations, with very detailed modeling of today's complex breeding programs, making use of all available biotechnologies. MoBPS provides pre-implemented functions for common breeding practices such as optimum genetic contributions and single-step GBLUP, but also allows users to replace certain steps with personalized and/or self-written solutions.
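The idea of deriving haplotypes from recombination points instead of storing them can be sketched in a few lines. This is a hypothetical illustration, not MoBPS code (MoBPS is an R package and its internal representation differs); the function `derive_haplotype` and its arguments are assumptions made for this example, and alleles are assumed to be biallelic 0/1.

```python
# Hypothetical sketch (not MoBPS code): a gamete haplotype is rebuilt on demand
# from the parent's two haplotypes, a list of recombination points and a list
# of mutated positions. In a full simulator the parental haplotypes would
# themselves be derived recursively back to stored founder haplotypes.


def derive_haplotype(parent_haplotypes, breakpoints, start_strand, mutations):
    """Return the haplotype (list of 0/1 alleles) passed on to an offspring.

    parent_haplotypes: pair of equal-length 0/1 lists (the parent's two strands)
    breakpoints: sorted marker indices at which copying switches strand
    start_strand: 0 or 1, the strand copied from the first marker onwards
    mutations: marker indices whose allele is flipped after copying
    """
    n_markers = len(parent_haplotypes[0])
    haplotype = []
    strand = start_strand
    segment_start = 0
    for segment_end in list(breakpoints) + [n_markers]:
        # Copy the current segment from the active parental strand.
        haplotype.extend(parent_haplotypes[strand][segment_start:segment_end])
        segment_start = segment_end
        strand = 1 - strand  # crossover: switch to the other strand
    for position in mutations:
        haplotype[position] = 1 - haplotype[position]  # biallelic flip
    return haplotype


parent = ([0] * 8, [1] * 8)
child = derive_haplotype(parent, breakpoints=[3], start_strand=0, mutations=[0])
print(child)
```

Only `breakpoints`, `start_strand` and `mutations` need to be kept per gamete; here the call prints `[1, 0, 0, 1, 1, 1, 1, 1]`.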

Journal Article

Analysis of different genotyping and selection strategies in laying hen breeding programs

by Pook, Torsten; Büttgen, Lisa; Simianer, Henner in Accuracy, Agriculture, Animal breeding

2025

Background: Genomic selection has become an integral component of modern animal breeding programs. It has the potential to improve the efficiency of layer breeding programs both by obtaining higher prediction accuracies and by reducing the generation interval, particularly for males, who cannot be phenotyped for sex-limited traits such as laying performance. In the current study, we investigate different strategies to reduce the generation interval, either for both sexes or only for the male side of the breeding scheme, based on stochastic simulation with the software MoBPS. Additionally, prediction accuracies are compared for varying proportions of genotyped animals and for phenotype-based, pedigree-based, and genomic selection.

Results: Selection of hens based on estimated breeding values, either pedigree-based or genomic, increased genetic gain compared to selection based on phenotypes only. Using two time-shifted subpopulations with exchange of males between them to reduce the generation interval on the male side led to significantly higher genetic gains. Reducing the generation interval for both males and females was only efficient when population sizes were maintained, which resulted in doubling the number of females to genotype and phenotype within the same time frame compared to the scenarios with longer generation intervals. Although substantially higher gains were obtained, in particular by pedigree-based selection of females and by reducing generation intervals, this led to substantially higher rates of inbreeding per year. Using a genomic instead of a pedigree-based relationship matrix in breeding value estimation not only increased genetic gains but also reduced inbreeding rates. Optimum contribution selection led to essentially the same genetic gains as selection without it, but reduced inbreeding rates. However, the differences obtained with optimum contribution selection were small overall compared to those caused by the other factors considered.

Conclusions: Reducing the generation interval on the male side by using genomic estimated breeding values was highly beneficial. Reducing the generation interval on the female side was only beneficial when a high proportion of hens was genotyped and housing capacities were increased. On the female side of a layer breeding program, selection based on pedigree-based estimated breeding values was inferior to phenotypic selection, as it resulted in a substantial increase in inbreeding rates.
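The role of the generation interval can be illustrated with the deterministic two-path breeder's equation (Rendel and Robertson), in which annual genetic gain is (i_m r_m + i_f r_f) σ_A / (L_m + L_f), with selection intensities i, accuracies r and generation intervals L for males and females. This is only a back-of-the-envelope sketch with invented parameter values; it ignores inbreeding and everything else that the stochastic MoBPS simulations in the study capture.

```python
# Deterministic two-path breeder's equation; all parameter values are invented.


def annual_genetic_gain(i_m, r_m, l_m, i_f, r_f, l_f, sigma_a):
    """Genetic gain per year: (i_m*r_m + i_f*r_f) / (l_m + l_f) * sigma_a.

    i_*: selection intensity, r_*: accuracy of the selection criterion,
    l_*: generation interval in years, for males (m) and females (f);
    sigma_a: additive genetic standard deviation of the trait.
    """
    return (i_m * r_m + i_f * r_f) / (l_m + l_f) * sigma_a


baseline = annual_genetic_gain(i_m=2.0, r_m=0.5, l_m=1.5,
                               i_f=1.0, r_f=0.5, l_f=1.5, sigma_a=1.0)
# Shorter male generation interval with a slightly lower accuracy, as might
# happen when young males are selected on genomic breeding values.
short_male = annual_genetic_gain(i_m=2.0, r_m=0.45, l_m=0.75,
                                 i_f=1.0, r_f=0.5, l_f=1.5, sigma_a=1.0)
print(f"baseline: {baseline:.3f}, shorter male interval: {short_male:.3f}")
```

Even with a lower male accuracy, halving the male interval raises the expected annual gain in this toy setting, which is the intuition behind shortening the male side of the scheme.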

Journal Article

Newly Developed MAGIC Population Allows Identification of Strong Associations and Candidate Genes for Anthocyanin Pigmentation in Eggplant

by Arrones, Andrea; Plazas, Mariola; Gramazio, Pietro in Abiotic stress, Anthocyanins, Biosynthesis

2022

Multi-parent advanced generation inter-cross (MAGIC) populations facilitate the genetic dissection of complex quantitative traits in plants and are valuable breeding materials. We report the development of the first eggplant MAGIC population (S3 Magic EGGplant InCanum, S3MEGGIC; 8-way), consisting of 420 S3 individuals developed from the intercrossing of seven cultivated eggplant (Solanum melongena) parents and one wild relative (S. incanum) parent. The S3MEGGIC recombinant population was genotyped with the eggplant 5k-probe SPET platform and phenotyped for anthocyanin presence in vegetative plant tissues (PA) and fruit epidermis (FA), and for light-insensitive anthocyanic pigmentation under the calyx (PUC). The 7,724 filtered high-confidence single-nucleotide polymorphisms (SNPs) confirmed a low residual heterozygosity (6.87%), a lack of genetic structure in the S3MEGGIC population, and no differentiation between subpopulations carrying a cultivated or a wild cytoplasm. Inference of haplotype blocks of the nuclear genome revealed an unbalanced representation of the founder genomes, suggesting cryptic selection in favour of or against specific parental genomes. Genome-wide association study (GWAS) analysis for PA, FA, and PUC detected strong associations with two myeloblastosis (MYB) genes similar to MYB113 involved in the anthocyanin biosynthesis pathway, and with a COP1 gene, which encodes a photo-regulatory protein and may be responsible for the PUC trait. Evidence was found for a duplication of an ancestral MYB113 gene with a translocation from chromosome 10 to chromosome 1 compared with the tomato genome. The parental genotypes for the three genes were consistent with the candidate genes identified in the S3MEGGIC population. Our new eggplant MAGIC population is the largest recombinant population in eggplant and is a powerful tool for eggplant genetics and breeding studies.
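A residual heterozygosity figure such as 6.87% is, in its simplest form, the share of heterozygous calls among all called genotypes. The sketch below assumes genotypes coded as the number of alternative alleles (0, 1, 2) with `None` for missing calls; the toy matrix is invented, and the study may have computed the value per individual or after additional filtering.

```python
def residual_heterozygosity(genotypes):
    """Share of non-missing genotype calls that are heterozygous.

    genotypes: list of individuals, each a list of calls coded as the number
    of alternative alleles (0, 1 or 2), with None marking a missing call.
    """
    heterozygous = 0
    called = 0
    for individual in genotypes:
        for call in individual:
            if call is None:
                continue  # missing calls do not enter the denominator
            called += 1
            if call == 1:
                heterozygous += 1
    return heterozygous / called


toy = [
    [0, 1, 2, 2],     # one heterozygous call
    [2, 2, None, 0],  # one missing call, no heterozygous calls
]
print(f"residual heterozygosity: {residual_heterozygosity(toy):.2%}")
```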

Journal Article

Improving Imputation Quality in BEAGLE for Crop and Livestock Data

by Mayer, Manfred; Simianer, Henner; Cavero, David in Genetic diversity

2020

Imputation is one of the key steps in the preprocessing and quality control protocol of any genetic study. Most imputation algorithms were originally developed for use in human genetics and are thus optimized for a high level of genetic diversity. Different versions of BEAGLE were evaluated on genetic datasets of doubled haploids of two European maize landraces, a commercial breeding line, and a diversity panel in chicken, which differ in their levels of genetic diversity and structure; these differences can be taken into account in BEAGLE by parameter tuning. Especially for phasing, BEAGLE 5.0 outperformed the newest version (5.1), which in turn also led to improved imputation. Earlier versions were far more dependent on parameter adaptation in all our tests. For all versions, the parameter ne (effective population size) had a major effect on the error rate for imputation of ungenotyped markers, reducing error rates by up to 98.5%. Further improvement was obtained by tuning the parameters affecting the structure of the haplotype cluster that is used to initialize the underlying Hidden Markov Model of BEAGLE. For the maize datasets, the number of markers with extremely high error rates was more than halved by using a flint reference genome (F7, PE0075 etc.) instead of the commonly used B73. For the chicken diversity panel, excluding genetically distant individuals from the reference panel reduced error rates for imputation of ungenotyped markers by 8.5% on average. To optimize imputation accuracy, one has to find a balance between representing as much of the genetic diversity as possible and avoiding the introduction of noise by including genetically distant individuals.
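Figures such as "reducing error rates by up to 98.5%" are relative reductions of an error rate measured on ungenotyped (masked) markers. A minimal sketch, assuming the simplest definition of the error rate as the share of mismatching genotype calls; the paper may define it differently (for example per allele), and the genotype lists below are invented.

```python
def imputation_error_rate(true_calls, imputed_calls):
    """Share of masked genotypes (coded 0/1/2) that were imputed incorrectly."""
    if len(true_calls) != len(imputed_calls):
        raise ValueError("true and imputed call lists must have equal length")
    errors = sum(t != i for t, i in zip(true_calls, imputed_calls))
    return errors / len(true_calls)


def relative_reduction(before, after):
    """Relative reduction of an error rate, e.g. 0.6 for a 60% reduction."""
    return (before - after) / before


# Invented example: true genotypes at masked positions and two imputation runs.
truth = [0, 1, 2, 2, 1, 0, 0, 2]
default_run = [0, 2, 2, 1, 1, 0, 1, 2]
tuned_run = [0, 1, 2, 2, 1, 0, 1, 2]

err_default = imputation_error_rate(truth, default_run)
err_tuned = imputation_error_rate(truth, tuned_run)
print(err_default, err_tuned, relative_reduction(err_default, err_tuned))
```

Here the hypothetical tuned run cuts the error rate from 0.375 to 0.125, a relative reduction of two thirds.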

Journal Article

HaploBlocker: Creation of Subgroup-Specific Haplotype Blocks and Libraries

by Schlather, Martin; de los Campos, Gustavo; Mayer, Manfred in Algorithms, Animals, Computational Biology

2019

The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction. We propose a novel approach (“HaploBlocker”) for defining and inferring haplotype blocks that focuses on linkage instead of the commonly used population-wide measures of linkage disequilibrium. We define a haplotype block as a sequence of genetic markers that has a predefined minimum frequency in the population, and only haplotypes with a similar sequence of markers are considered to carry that block, effectively screening a dataset for group-wise identity-by-descent. From these haplotype blocks, we construct a haplotype library that represents a large proportion of genetic variability with a limited number of blocks. Our method is implemented in the associated R-package HaploBlocker, and provides flexibility not only to optimize the structure of the obtained haplotype library for subsequent analyses, but also to handle datasets of different marker density and genetic diversity. By using haplotype blocks instead of single nucleotide polymorphisms (SNPs), local epistatic interactions can be naturally modeled, and the reduced number of parameters enables a wide variety of new methods for further genomic analyses such as genomic prediction and the detection of selection signatures. We illustrate our methodology with a dataset comprising 501 doubled haploid lines in a European maize landrace genotyped at 501,124 SNPs. With the suggested approach, we identified 2991 haplotype blocks with an average length of 2685 SNPs that together represent 94% of the dataset.
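The block definition above (a marker sequence with a minimum frequency, carried by haplotypes with a similar sequence) can be illustrated within a single window. This is a heavily simplified, hypothetical sketch and not the HaploBlocker algorithm, which works across many windows and merges and extends segments; the names `window_blocks`, `min_carriers` and `max_mismatch` are assumptions made for this example.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length allele sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def window_blocks(haplotypes, start, end, min_carriers, max_mismatch=0):
    """Simplified block search in a single window [start, end).

    Returns {allele_sequence: carrier_indices} for sequences carried (with at
    most max_mismatch differences) by at least min_carriers haplotypes.
    """
    segments = [tuple(h[start:end]) for h in haplotypes]
    blocks = {}
    # Visit candidate sequences from most to least frequent.
    for seq, _count in Counter(segments).most_common():
        # Skip near-duplicates of a sequence already accepted as a block.
        if any(hamming(seq, kept) <= max_mismatch for kept in blocks):
            continue
        carriers = [idx for idx, seg in enumerate(segments)
                    if hamming(seq, seg) <= max_mismatch]
        if len(carriers) >= min_carriers:
            blocks[seq] = carriers
    return blocks


haps = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],  # one mismatch to the first two
    [1, 1, 0, 0, 1, 0],
]
print(window_blocks(haps, start=0, end=6, min_carriers=3, max_mismatch=1))
```

The third haplotype differs from the first two at one marker but is still counted as a carrier, which is the "similar sequence" idea; the fourth is too rare to form a block on its own.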

Journal Article

How imputation can mitigate SNP ascertainment Bias

by Weigend, Annett; Simianer, Henner; Pook, Torsten in Analysis, Animal Genetics and Genomics, Animals

2021

Background: Population genetic studies based on genotyped single nucleotide polymorphisms (SNPs) are influenced by the non-random selection of the SNPs included in the genotyping arrays used. The resulting bias in the estimation of allele frequency spectra and of population genetic parameters such as heterozygosity and genetic distances, relative to whole-genome sequencing (WGS) data, is known as SNP ascertainment bias. Full correction for this bias requires detailed knowledge of the array design process, which is often not available in practice. This study suggests an alternative approach that mitigates the ascertainment bias of a large set of genotyped individuals by using information from a small set of sequenced individuals via imputation, without the need for prior knowledge of the array design.

Results: The strategy was first tested by simulating additional ascertainment bias with a set of 1566 chickens from 74 populations that were genotyped for the positions of the Affymetrix Axiom™ 580 k Genome-Wide Chicken Array. Imputation accuracy was consistently higher for populations used for SNP discovery during the simulated array design process. Reference sets with at least one individual per population in the study set led to a strong correction of ascertainment bias for estimates of expected and observed heterozygosity, Wright’s Fixation Index and Nei’s Standard Genetic Distance. In contrast, unbalanced reference sets (overrepresentation of populations compared to the study set) introduced a new bias towards the reference populations. Finally, the array genotypes were imputed to WGS using reference sets ranging from 74 individuals (one per population) to 98 individuals (including additional commercial chickens) and compared with a mixture of individually sequenced and pool-sequenced populations. The imputation reduced the slope between heterozygosity estimates from array data and WGS data from 1.94 to 1.26 when using the smaller, balanced reference panel and to 1.44 when using the larger but unbalanced reference panel. This generally supported the simulation results but was less favorable, advocating for a larger reference panel when imputing to WGS.

Conclusions: The results highlight the potential of imputation for mitigating SNP ascertainment bias but also underline the need for unbiased reference sets.
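The slope reported above summarizes how array-based heterozygosity estimates scale with WGS-based ones across populations. A minimal sketch, assuming an ordinary least-squares regression of array estimates on WGS estimates with an intercept; the abstract does not state the exact regression setup, and the per-population values below are invented.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Invented per-population expected heterozygosity values.
he_wgs = [0.10, 0.15, 0.20, 0.25]
he_array = [0.20, 0.30, 0.40, 0.50]
print(f"slope: {ols_slope(he_wgs, he_array):.2f}")
```

Under this setup a slope of 1 would mean that differences between populations are reproduced on the WGS scale, while a slope near 2, as in this toy example, means array-based differences are roughly doubled.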

Journal Article

Development and validation of a horse reference panel for genotype imputation

by Pook, Torsten; Falker-Gieske, Clemens; Reich, Paula in Accuracy, Agriculture, Analysis

2022

Background: Genotype imputation is a cost-effective method to generate sequence-level genotypes for a large number of animals. Its application can improve the power of genomic studies, provided that the accuracy of imputation is sufficiently high. The purpose of this study was to develop an optimal strategy for genotype imputation from genotyping array data to sequence level in German warmblood horses, and to investigate the effect of different factors on imputation accuracy. Publicly available whole-genome sequence data from 317 horses of 46 breeds were used for the analyses.

Results: Depending on the size and composition of the reference panel, the accuracy of imputation from medium marker density (60K) to sequence level using the software Beagle 5.1 ranged from 0.64 to 0.70 for horse chromosome 3. Generally, imputation accuracy increased with reference panel size, but dropped when genetically distant individuals were included in the panel. Imputation was most accurate when using a reference panel of multiple but related breeds and the software Beagle 5.1, which outperformed the other two programs tested, Impute 5 and Minimac 4. Genome-wide imputation for this scenario resulted in a mean accuracy of 0.66. Stepwise imputation from 60K to 670K markers and subsequently to sequence level did not improve imputation accuracy. However, imputation from higher density (670K) was considerably more accurate (about 0.90) than from medium density. Likewise, imputation in genomic regions with low marker coverage was less accurate.

Conclusions: The accuracy of imputation in horses was influenced by the size and composition of the reference panel, the marker density of the genotyping array, and the imputation software. Genotype imputation can be used to extend the limited amount of available sequence-level data from horses in order to boost the power of downstream analyses, such as genome-wide association studies or the detection of embryonic lethal variants.
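The abstract does not say how "accuracy" is defined; one common choice is the Pearson correlation between true and imputed genotype dosages, computed per marker and then averaged. The sketch below assumes that metric and invented data, and skips markers that are monomorphic in either vector because the correlation is undefined there.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (NaN if undefined)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return math.nan  # correlation undefined for monomorphic markers
    return cov / math.sqrt(vx * vy)


def mean_marker_accuracy(true_by_marker, imputed_by_marker):
    """Average per-marker correlation, ignoring markers where it is undefined."""
    values = [pearson(t, i) for t, i in zip(true_by_marker, imputed_by_marker)]
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined) if defined else math.nan


# Invented dosages (0/1/2) for two markers across five animals.
true_dosages = [[0, 1, 2, 1, 0], [2, 2, 2, 2, 2]]
imputed_dosages = [[0, 1, 2, 1, 0], [2, 2, 1, 2, 2]]
print(mean_marker_accuracy(true_dosages, imputed_dosages))
```

Whether monomorphic markers are skipped or handled differently is a design choice that can shift reported averages noticeably.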

Journal Article

Genomic prediction using information across years with epistatic models and dimension reduction via haplotype blocks

by Vojgani, Elaheh; Mayer, Manfred; Simianer, Henner in Accuracy, Analysis, Biology and Life Sciences

2023

The importance of accurate genomic prediction of phenotypes in plant breeding is undeniable, as higher prediction accuracy can increase selection responses. In this regard, epistasis models have been shown to be capable of increasing prediction accuracy, but their high computational load is challenging. In this study, we investigated the predictive ability of additive and epistasis models when utilizing haplotype blocks versus pruned sets of SNPs, by including phenotypic information from the previous growing season. This was done by treating a single biological trait in two growing seasons (2017 and 2018) as separate traits in a multi-trait model. Thus, bivariate variants of Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, and of Epistatic Random Regression BLUP (ERRBLUP) and selective Epistatic Random Regression BLUP (sERRBLUP) as epistasis models, were compared with respect to their prediction accuracies for the second year. The prediction accuracies of bivariate GBLUP, ERRBLUP and sERRBLUP were assessed for eight phenotypic traits in 471/402 doubled haploid lines of the European maize landraces Kemater Landmais Gelb/Petkuser Ferdinand Rot. The results indicate that prediction accuracies are similar when utilizing a pruned set of SNPs or haplotype blocks, while utilizing haplotype blocks substantially reduces the computational load compared to the pruned sets of SNPs. The number of interactions considered in the model was reduced from 323.5/456.4 million for the pruned SNP panel to 4.4/5.5 million for the haplotype block dataset for the Kemater and Petkuser landraces, respectively. Since the computational load scales linearly with the number of parameters in the model, this leads to a 98.9% reduction in computational time, from 13.5 hours for the pruned set of markers to 9 minutes for the haplotype block dataset. We further investigated the impact of genomic correlation, phenotypic correlation and trait heritability as factors affecting the prediction accuracy of the bivariate models, identifying the genomic correlation between years as the most influential factor. As the computational load is substantially reduced while the accuracy of genomic prediction is unchanged, the framework proposed here, which uses haplotype blocks in sERRBLUP, provides a solution for the practical implementation of sERRBLUP in real breeding programs. Furthermore, our results indicate that sERRBLUP is suitable not only for prediction across different locations but also for prediction across growing seasons.
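The reported reductions can be checked with simple arithmetic using the figures from the abstract. The pair-count helper is an illustrative assumption (all unordered pairs of distinct features); it only shows why interaction counts grow quadratically with the number of features and is not meant to reproduce the exact ERRBLUP interaction coding or the 323.5/456.4 million figures.

```python
def percent_reduction(before, after):
    """Percentage by which a quantity drops from `before` to `after`."""
    return 100.0 * (before - after) / before


def n_unordered_pairs(p):
    """Number of distinct unordered pairs among p features: p(p - 1) / 2."""
    return p * (p - 1) // 2


# Figures taken from the abstract above.
time_cut = percent_reduction(13.5 * 60, 9)    # minutes: 13.5 h -> 9 min
kemater_cut = percent_reduction(323.5, 4.4)   # million interactions
petkuser_cut = percent_reduction(456.4, 5.5)  # million interactions
print(f"time: {time_cut:.1f}%, Kemater: {kemater_cut:.1f}%, Petkuser: {petkuser_cut:.1f}%")

# Ten times fewer features gives roughly a hundred times fewer pairs.
print(n_unordered_pairs(10_000) // n_unordered_pairs(1_000))
```

The interaction counts drop by about 98.6% (Kemater) and 98.8% (Petkuser), which is broadly in line with the 98.9% reduction in run time under the stated linear scaling.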

Journal Article

Imputation of low‐density marker chip data in plant breeding: Evaluation of methods based on sugar beet

by Beissinger, Timothy; Gholami, Mahmood; Niehoff, Tobias in Accuracy, Animals, Beagle

2022

Low-density genotyping followed by imputation reduces genotyping costs while still providing high-density marker information. An increased marker density has the potential to improve the outcome of all applications based on genomic data. This study investigates techniques for imputing from 1k to 20k genomic markers in plant breeding programs, with sugar beet (Beta vulgaris L. ssp. vulgaris) as an example crop for which these are realistic marker numbers in modern breeding applications. The generally accepted 'gold standard' for imputation, Beagle 5.1, was compared with the recently developed software AlphaPlantImpute2, which is designed specifically for plant breeding. For both Beagle 5.1 and AlphaPlantImpute2, the imputation strategy as well as the imputation parameters were optimized in this study. We found that the imputation accuracy of Beagle could be tremendously improved (0.22 to 0.67) by tuning parameters, mainly by lowering the value of the effective population size parameter and increasing the number of iterations performed. Separating the phasing and imputation steps also improved accuracies when optimized parameters were used (0.67 to 0.82). We also found that the imputation accuracy of Beagle decreased when more low-density lines were included for imputation. AlphaPlantImpute2 produced very high accuracies without optimization (0.89) and was generally less responsive to optimization. Overall, AlphaPlantImpute2 performed relatively better for imputation, whereas Beagle was better for phasing. Combining both tools yielded the highest accuracies.

Core Ideas: Beagle is sensitive to parameter tuning. The best imputation accuracies were achieved by combining Beagle and AlphaPlantImpute2. Population structure influenced imputation accuracy.
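A two-step Beagle workflow (phasing first, then imputation) with a tuned effective population size and iteration count can be scripted by assembling the command lines. This is a sketch under stated assumptions: the `gt=`, `ref=`, `out=`, `ne=`, `iterations=` and `impute=` arguments follow Beagle 5.x's key=value convention but should be checked against the documentation of the version used; the file names and the values `ne=1000` and `iterations=30` are hypothetical, not the tuned settings from the study; and this is only one possible reading of "separating the phasing and imputation steps".

```python
def beagle_command(jar, gt, out, ref=None, ne=None, iterations=None, impute=None):
    """Assemble a Beagle 5.x command line as a list of arguments.

    Only arguments that are not None are added; key=value syntax follows
    Beagle's command-line convention.
    """
    cmd = ["java", "-jar", jar, f"gt={gt}", f"out={out}"]
    if ref is not None:
        cmd.append(f"ref={ref}")
    if ne is not None:
        cmd.append(f"ne={ne}")
    if iterations is not None:
        cmd.append(f"iterations={iterations}")
    if impute is not None:
        cmd.append(f"impute={str(impute).lower()}")
    return cmd


# Step 1: phase the high-density panel only (hypothetical file names and values).
phase_cmd = beagle_command("beagle.jar", gt="hd_panel.vcf.gz", out="hd_phased",
                           ne=1000, iterations=30, impute=False)
# Step 2: impute low-density lines using the phased panel (Beagle writes
# <out>.vcf.gz, so the phased panel is hd_phased.vcf.gz).
impute_cmd = beagle_command("beagle.jar", gt="ld_lines.vcf.gz", out="ld_imputed",
                            ref="hd_phased.vcf.gz", ne=1000)

print(" ".join(phase_cmd))
print(" ".join(impute_cmd))
# To run: subprocess.run(phase_cmd, check=True), then the same for impute_cmd.
```

Building the argument lists separately from running them makes it easy to loop over a grid of `ne` and `iterations` values when tuning.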

Journal Article
