Catalogue Search | MBRL

Applications of random forest feature selection for fine‐scale genetic population assignment

by Sylvester, Emma V. A. , Horne, John , Bradbury, Ian R. in Accuracy , conservation genetics , Datasets

2018

Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNP) for fine‐scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self‐assignment accuracy of at least 90% using each method to create panels of 50–700 markers. Panels of SNPs identified using random forest‐based methods performed up to 7.8 and 11.2 percentage points better than FST‐selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self‐assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using FST‐selected panels. Our results demonstrate a role for machine‐learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.

Journal Article

Variable selection strategies for genomic prediction of growth and carcass related traits in experimental Nellore cattle herds under different selection criteria

by Oliveira, Henrique N. , Valente, Júlia P. S. , Cyrillo, Joslaine N. S. G. in 631/208/1348 , 631/208/212 , 631/208/721

2025

Genomic selection (GS) has become a widely used tool in breeding programs, enhancing selection accuracy and leading to faster genetic progress. However, in small populations, GS faces challenges due to limited data and a large number of markers, potentially leading to biased predictions. Implementing feature selection strategies is essential to improve prediction accuracy and avoid overfitting. Hence, we compared the predictive ability of genomic best linear unbiased prediction (GBLUP), Bayesian B (BayesB), and elastic net (ENet) models, using all markers and feature selection via GWAS and fixation index (FST) to reduce marker numbers, for growth and ultrasound carcass traits in three Nellore cattle populations differentially selected for yearling body weight (YBW). The populations evaluated included: Nellore Control (NeC), selected for YBW; Nellore Selection (NeS), selected for maximum YBW; and Nellore Traditional (NeT), selected for maximum YBW and lower residual feed intake (RFI) since 2013. Comparing the statistical approaches using GBLUP as the reference, ENet improved prediction accuracy by 10% for growth traits and 12% for carcass traits, while BayesB showed no improvement for growth traits but achieved a 3% gain for carcass traits. When comparing models using all markers to those with variable selection, both GWAS and FST improved prediction accuracy across models, with FST outperforming GWAS in stratified populations. A stricter GWAS threshold (>1.0% explained variance), compared to a less conservative criterion (>0.5%), reduced BayesB prediction accuracy (6.8%), while slightly increasing accuracy for GBLUP (1.3%) and ENet (2.4%). Similarly, a more restrictive FST threshold (>0.2), compared to a less conservative one (>0.1), resulted in smaller gains for GBLUP (4%) and ENet (5%), but reduced BayesB accuracy (−4%).
Overall, selecting markers through GWAS and FST improves prediction accuracy for both growth and carcass traits, particularly in stratified populations. However, stricter thresholds can negatively impact accuracy, highlighting the need for optimized marker selection strategies.

Journal Article

Selection of SNP from 50K and 777K arrays to predict breed of origin in cattle

by Calus, M P L , Windig, J J , Hulsegge, B in Alleles , ancestry , Animals

2013

Reliable breed assignment can be performed with SNP. Currently, high density SNP chips are available with large numbers of SNP from which the most informative SNP can be selected for breed assignment. Several methods have been published to select the most informative SNP to distinguish among breeds. In this study, we evaluated Delta, Wright's FST, and Weir and Cockerham's FST, and extended these methods by adding a rule to avoid selection of sets of SNP in high linkage disequilibrium (LD) providing the same information. SNP that had an r2 value > 0.3 with any of the SNP already selected were discarded. The different selection methods were evaluated for both the 50K and 777K Bovine BeadChips. Animals from 4 cattle breeds (989 Holstein Friesian, 97 Groningen White headed, 137 Meuse-Rhine-Yssel, and 64 Dutch Friesian) were genotyped. After editing, 30,447 and 452,525 SNP were available for the 50K and 777K SNP chip, respectively. All selection methods showed that only a small set of SNP is needed to differentiate among the 4 Dutch cattle breeds, whereas comparison of the selection methods showed only small differences. In general, the 777K performed marginally better than the 50K BeadChip, especially at higher confidence thresholds. The rule to avoid selection of SNP in high LD reduced the required number of SNP to achieve correct breed assignment. The Global Weir and Cockerham's FST performed marginally better than other selection methods. There was little overlap in the SNP selected from the 2 BeadChips, whereas the number of SNP selected was about the same.

Journal Article

Association Analysis of SLC11A1 Polymorphisms with Somatic Cell Score in Chinese Holstein Cows

by Li, Qiuling , Li, Kaiyang , Yang, Yuze in Animal lactation , Bacterial infections , candidate genes

2025

Mastitis is an important disease limiting milk production in dairy cows. Somatic cell score is commonly used as one of the main ways to gauge the level of mastitis in dairy cows, with higher somatic cell scores usually indicating possible mastitis. However, the main molecular markers affecting somatic cell scores remain unknown. The aim of this study was to investigate the association between single nucleotide polymorphisms in the SLC11A1 gene and somatic cell score in Chinese Holstein cows. In this study, 210 Chinese Holstein cows were genotyped and potential SNPs were detected by DNA sequencing, PCR-SSCP and PCR-RFLP analysis. Two SNPs were identified in the CDS region of SLC11A1: c.723C>T and c.1144C>G. For the c.723C>T polymorphic site, two genotypes (AA, AB) were found and the genotype frequencies were 0.790 and 0.210, respectively. The results of the association analysis showed that the mean somatic cell score of the AA genotype was significantly lower than that of the AB genotype, suggesting that the A allele is a potential marker for improving mastitis resistance in Chinese Holstein cows. For the c.1144C>G polymorphic site, three genotypes (CC, CD, and DD) were found and the genotype frequencies were 0.629, 0.352 and 0.019, respectively. The association analysis revealed that the mean somatic cell score of the CC genotype was lower than that of the CD and DD genotypes; however, no significant differences were observed among the genotype groups in pair-wise comparisons. The bioinformatic analysis showed that these mutations affected the secondary and tertiary structure of SLC11A1 mRNA, suggesting that they may affect gene expression or protein translation and function. Finally, we predicted the SLC11A1 protein interaction network and found that SPI1, NOD2, TLR2 and S100A12 interacted with SLC11A1 and were reported as candidate genes associated with mastitis resistance.
The results indicated that the SNP (c.723C>T) could be a potential molecular marker for improving mastitis resistance traits in Chinese Holstein cows. We recommend further validation of this SNP in larger populations and its potential integration into breeding programs to enhance mastitis resistance in dairy cows.

Journal Article

Identification of SNPs Associated with Somatic Cell Score in Candidate Genes in Italian Holstein Friesian Bulls

by Soglia, Dominga , Sartore, Stefano , Rasero, Roberto in antibiotic resistance , Antimicrobial agents , candidate genes

2021

Mastitis is an infectious disease affecting the mammary gland, leading to inflammatory reactions and to heavy economic losses due to decreased milk production. One possible way to tackle the antimicrobial resistance issue stemming from antimicrobial therapy is to select animals with a genetic resistance to this disease. Therefore, the aim of this study was to analyze the genetic variability of the SNPs found in candidate genes related to mastitis resistance in Holstein Friesian bulls. Target regions were amplified, sequenced by Next-Generation Sequencing technology on the Illumina® MiSeq, and then analyzed to find correlations with mastitis-related phenotypes in 95 Italian Holstein bulls chosen with the aid of a selective genotyping approach. Of a total of 557 detected mutations, 61 showed different genotype distributions in the tails of the deregressed EBVs for SCS and 15 were identified as significantly associated with the phenotype using two different approaches. The significant SNPs were identified in intergenic or intronic regions of six genes known to be key components in the immune system (namely CXCR1, DCK, NOD2, MBL2, MBL1 and M-SAA3.2). These SNPs could be considered as candidates for future genetic selection for mastitis resistance, although further studies are required to assess their presence in other dairy cattle breeds and their possible negative correlation with other traits.

Journal Article

Bases to inform a genetic line of whiteleg shrimp Penaeus (Litopenaeus) vannamei of Mexican origin

by Llera-Herrera, Raúl , Perez-Enriquez, Ricardo , Magallón-Barajas, Francisco J. in Aquaculture , Breeding , Composition

2024

The whiteleg shrimp Penaeus (Litopenaeus) vannamei is one of the most relevant aquaculture species in Latin America and globally. Among several elements, the improvement of its production depends on the genetic quality of the larvae produced in commercial hatcheries. One strategy for achieving this is to set up a long-term management plan that includes the genetic establishment of a breeding population with broad genetic variability and reduced inbreeding levels, and the design of adequate management and crossbreeding schemes. Establishing the breeding population requires a detailed characterization of the genetic composition and diversity of the breeding line(s) being managed. The present study evaluated the genetic composition of six wild populations from the southern and northern coasts of the Mexican Pacific (Oaxaca, Guerrero, Nayarit, and Sinaloa) and 56 breeding lots maintained in commercial hatcheries. The genetic profiles of a low-density SNP marker panel (171 and 152 loci for the wild and hatchery-reared groups, respectively) were used to estimate genetic diversity and differentiation within and among samples. The wild populations presented significant genetic differences between southern and northern Pacific locations. Although these populations showed higher diversity levels than the cultivated stocks, the genetic pool of the total 56 lots was highly variable with low inbreeding levels. The genetic characteristics of the analyzed populations and cultivated stocks warrant the constitution of a Mexican-origin breeding line with future potential for selection to the environmental conditions of the northwestern region of Mexico.

Journal Article

Discovery of SNP markers of red shrimp Aristeus antennatus for population structure in Western Mediterranean Sea

by Estonba, Andone , Trotta, Jean Remi , Grau, Antoni Maria in Animal Genetics and Genomics , Aristeus antennatus , Biodiversity

2021

Aristeus antennatus is one of the most exploited and economically important resources for fisheries in the Western and Central Mediterranean Sea, displaying low population differentiation with mitochondrial and microsatellite markers. The recent development of Genotyping-by-Sequencing (GBS) methods may contribute to the discovery of SNPs and the assessment of genetic differences between populations of this species for fisheries management. Using samples from four geographical sites in the Western Mediterranean and Eastern Atlantic, 115,071 putative SNPs were detected. After stringent quality control measures and filtering, 232 SNP loci were retained. Finally, we selected an 80-SNP subset panel for Fluidigm Dynamic array application. The results showed significant differentiation among populations from the four sampling sites. Population assignment power and patterns of population differentiation were comparable between the two SNP panels. These markers represent a useful tool for future genetic applications in A. antennatus populations.

Journal Article

Genomic breeding value prediction: methods and procedures

by Calus, M. P. L. in accuracy , Animal breeding , Animals

2010

Animal breeding faces one of the most significant changes of the past decades – the implementation of genomic selection. Genomic selection uses dense marker maps to predict the breeding value of animals with reported accuracies that are up to 0.31 higher than those of pedigree indexes, without the need to phenotype the animals themselves, or close relatives thereof. The basic principle is that because of the high marker density, each quantitative trait locus (QTL) is in linkage disequilibrium (LD) with at least one nearby marker. The process involves putting together a reference population of animals with known phenotypes and genotypes to estimate the marker effects. Marker effects have been estimated with several different methods that generally aim at reducing the dimensions of the marker data. Nearly all reported models only included additive effects. Once the marker effects are estimated, breeding values of young selection candidates can be predicted with reported accuracies up to 0.85. Although results from simulation studies suggest that different models may yield more accurate genomic estimated breeding values (GEBVs) for different traits, depending on the underlying QTL distribution of the trait, there is so far little evidence from studies based on real data to support this. The accuracy of genomic predictions strongly depends on characteristics of the reference populations, such as number of animals, number of markers, and the heritability of the recorded phenotype. Another important factor is the relationship between animals in the reference population and the evaluated animals. The breakup of LD between markers and QTL across generations advocates frequent re-estimation of marker effects to maintain the accuracy of GEBVs at an acceptable level. Therefore, at low frequencies of re-estimating marker effects, it becomes more important that the model that estimates the marker effects capitalizes on LD information that is persistent across generations.

Journal Article

Genomic Selection Using Extreme Phenotypes and Pre-Selection of SNPs in Large Yellow Croaker (Larimichthys crocea)

by Wang, Zhiyong , Xiao, Shijun , Chen, Junwei in Animals , Biomedical and Life Sciences , Biotechnology

2016

Genomic selection (GS) is an effective method to improve predictive accuracies of genetic values. However, the high cost of genotyping limits the application of this technology in some species. Therefore, it is necessary to find methods to reduce genotyping costs in genomic selection. Large yellow croaker is one of the most commercially important marine fish species in southeast China and Eastern Asia. In this study, genotyping-by-sequencing was used to construct the libraries for NGS sequencing and identify 29,748 SNPs in the genome. Two traits, eviscerated weight (EW) and the ratio between eviscerated weight and whole body weight (REW), were chosen for study. Two strategies to reduce costs were proposed: selecting extreme phenotypes (EP) for genotyping in the reference population, or pre-selecting SNPs to construct low-density marker panels in candidates. Three methods of pre-selection of SNPs, i.e., pre-selecting SNPs by absolute effects (SE), by single marker analysis (SMA), and by fixed intervals of sequence number (EL), were studied. The results showed that using EP was a feasible method to save genotyping costs in the reference population. Heritability did not seem to have an obvious influence on the predictive abilities estimated by EP. Using SMA was the most feasible method to save genotyping costs in candidates. In addition, the combination of EP and SMA in genomic selection also showed good results, especially for the REW trait. We also described how to apply the new methods in genomic selection and compared the genotyping costs before and after using the new methods. Our study may offer a reference not only for aquatic genomic breeding but also for genomic prediction in other species, including livestock and plants.

Journal Article

The Type I Diabetes Genetics Consortium ‘Rapid Response’ family-based candidate gene study: strategy, genes selection, and main outcome

by Nierras, C , Morahan, G , Julier, C in Biomedical and Life Sciences , Biomedicine , Cancer Research

2009

Candidate gene studies have long been the principal method for identification of susceptibility genes for type I diabetes (T1D), resulting in the discovery of HLA, INS, PTPN22, CTLA4, and IL2RA. However, many of the initial studies that relied on this strategy were largely underpowered, because of the limitations in genomic information and genotyping technology, as well as the limited size of available cohorts. The Type I Diabetes Genetics Consortium (T1DGC) has established resources to re-evaluate earlier reported genes associated with T1D, using its collection of 2298 Caucasian affected sib-pair families (with 11 159 individuals). A total of 382 single-nucleotide polymorphisms (SNPs) located in 21 T1D candidate genes were selected for this study and genotyped in duplicate on two platforms, Illumina and Sequenom. The genes were chosen based on published literature as having been either ‘confirmed’ (replicated) or not (candidates). This study showed several important features of genetic association studies. First, it showed the major impact of small rates of genotyping errors on association statistics. Second, it confirmed associations at INS, PTPN22, IL2RA, IFIH1 (earlier confirmed genes), and CTLA4 (earlier confirmed, with distinct SNPs) loci. Third, it did not find evidence for an association with T1D at SUMO4, despite confirmed association in Asian populations, suggesting the potential for population-specific gene effects. Fourth, at PTPN22, there was evidence for a novel contribution to T1D risk, independent of the replicated effect of the R620W variant. Fifth, among the candidate genes selected for replication, the association of TCF7-P19T with T1D was newly replicated in this study. In summary, this study was able to replicate some genetic effects, reject others, and provide suggestions of association with several of the other candidate genes in stratified analyses (age at onset, HLA status, population of origin).
These results have generated additional interesting functional hypotheses that will require further replication in independent cohorts.

Journal Article
