Catalogue Search | MBRL

An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

by Quick, Joshua; Gangavarapu, Karthik; Brackney, Doug E. in Accuracy, Amplicon sequencing, Animal Genetics and Genomics

2019

How viruses evolve within hosts can dictate infection outcomes; however, reconstructing this process is challenging. We evaluate our multiplexed amplicon approach, PrimalSeq, to demonstrate how virus concentration, sequencing coverage, primer mismatches, and replicates influence the accuracy of measuring intrahost virus diversity. We develop an experimental protocol and computational tool, iVar, for using PrimalSeq to measure virus diversity using Illumina and compare the results to Oxford Nanopore sequencing. We demonstrate the utility of PrimalSeq by measuring Zika and West Nile virus diversity from varied sample types and show that the accumulation of genetic diversity is influenced by experimental and biological systems.

Journal Article


Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity

by Pisupati, Rahul; Burns, Robin; Rabanal, Fernando A. in Animal Genetics and Genomics, Arabidopsis, Arabidopsis - genetics

2023

Background: It is apparent that genomes harbor much structural variation that is largely undetected for technical reasons. Such variation can cause artifacts when short-read sequencing data are mapped to a reference genome. Spurious SNPs may result from mapping of reads to unrecognized duplicated regions. Calling SNPs using the raw reads of the 1001 Arabidopsis Genomes Project, we identified 3.3 million (44%) heterozygous SNPs. Given that Arabidopsis thaliana (A. thaliana) is highly selfing, and that extensively heterozygous individuals have been removed, we hypothesize that these SNPs reflect cryptic copy number variation. Results: The heterozygosity we observe consists of particular SNPs being heterozygous across individuals in a manner that strongly suggests it reflects shared segregating duplications rather than random tracts of residual heterozygosity due to occasional outcrossing. Focusing on such pseudo-heterozygosity in annotated genes, we use genome-wide association to map the position of the duplicates. We identify 2500 putatively duplicated genes and validate them using de novo genome assemblies from six lines. Specific examples included an annotated gene and nearby transposon that transpose together. We also demonstrate that cryptic structural variation produces highly inaccurate estimates of DNA methylation polymorphism. Conclusions: Our study confirms that most heterozygous SNP calls in A. thaliana are artifacts and suggests that great caution is needed when analyzing SNP data from short-read sequencing. The finding that 10% of annotated genes exhibit copy-number variation, and the realization that neither gene- nor transposon-annotation necessarily tells us what is actually mobile in the genome, suggests that future analyses based on independently assembled genomes will be very informative.

Journal Article


Duet: SNP-assisted structural variant calling and phasing using Oxford nanopore sequencing

by Zhou, Yekai; Luo, Ruibang; Leung, Amy Wing-Sze in Accuracy, Algorithms, Analysis

2022

Background: Whole genome sequencing using the long-read Oxford Nanopore Technologies (ONT) MinION sequencer provides a cost-effective option for structural variant (SV) detection in clinical applications. Despite the advantage of using long reads, however, accurate SV calling and phasing are still challenging. Results: We introduce Duet, an SV detection tool optimized for SV calling and phasing using ONT data. The tool uses novel features integrated from both SV signatures and single-nucleotide polymorphism signatures, which can accurately distinguish an SV haplotype from a false signal. Duet was benchmarked against state-of-the-art tools on multiple ONT sequencing datasets with sequencing coverage ranging from 8× to 40×. At a low sequencing coverage of 8×, Duet performs better than all other tools in SV calling, SV genotyping and SV phasing. When the sequencing coverage is higher (20× to 40×), its F1-score for SV phasing improves further relative to other tools, while its performance in SV genotyping and SV calling remains higher than that of other tools. Conclusion: Duet can perform accurate SV calling, SV genotyping and SV phasing using low-coverage ONT data, making it very useful for low-coverage genomes. It also performs well when scaled to high-coverage genomes, making it adaptable to various clinical applications. Duet is open source and is available at https://github.com/yekaizhou/duet.

Journal Article


Reidentification of hybridization events with transcriptomic data and phylogenomic study in Seabuckthorn

by Zhang, Hui; Yang, Lujie; Fang, Jing in 631/208, 631/449, 631/553

2025

Natural hybridization in sea buckthorn (Hippophae spp.) is well documented. While the parental species involved in these events have been identified, distinctions between F1 hybrids and later-generation (Fn) hybrids remain insufficiently explored, and their genetic compositions are not yet fully understood. In this study, we employed transcriptomic data and reference genomes to identify Fn hybrids in two natural hybrid populations, confirming eight individuals (including H. goniocarpa Lian. X. L. Chen et K. Sun and four members of a hybrid swarm from Qinghai, China) as F1 hybrids. These findings support the hypothesis that H. goniocarpa is not a distinct species, but rather an F1 hybrid within the genus. Additionally, we discuss limitations specific to SNP calling from transcriptomic data, such as allele-specific expression and low transcript abundance, which may lead to the misclassification of heterozygous sites as homozygous. Finally, we constructed the first phylogenomic tree of the Hippophae genus using transcriptomic data and performed a comparative analysis of interspecific relationships based on SNP and indel markers derived from the same dataset.

Journal Article


Whole-transcriptome identification of deleterious variants in candidate genes linked to bovine paratuberculosis

by Lam, Stephanie; Badia-Bringué, Gerard; Alonso-Hearn, Marta in 631/208, 631/250, 631/326

2026

RNA-Sequencing (RNA-Seq) represents a powerful approach for discovering SNPs in coding regions (cSNPs), which can alter the amino acid sequence of the encoded proteins and have predicted deleterious effects on those proteins, underlying disease susceptibility or resistance. RNA-Seq data from peripheral blood (PB) and ileocecal valve (ICV) samples collected from fourteen Holstein cattle with focal (N = 5) and diffuse (N = 5) paratuberculosis (PTB)-associated lesions and without lesions (N = 4) in gut tissues were used to identify deleterious cSNPs that were unique to each group of animals. PB and ICV samples from each animal were subjected to RNA extraction, library preparation, and paired-end RNA-Seq. The RNA-Seq reads were aligned against the bovine ARS-UCD1.2.109 reference genome using the STAR aligner, generating an average of 21,331,835 and 19,506,829 uniquely mapped reads in the PB and ICV samples, respectively. SNP calling was performed on the RNA-Seq data of each group of animals using bcftools v1.11. To ensure high-confidence cSNP calls, highly stringent SNP filtering criteria were applied: minimum read depth (≥ 10), supporting reads for the alternative allele (≥ 4), Phred score of the alternative allele (≥ 30), minor allele frequency (> 20%), maximum proportion of missing data per site (< 80%), and distance from indels (SNPs within 5 bp of insertions/deletions were excluded). Of the 856, 625, and 603 identified cSNPs that were uniquely present in the transcriptome of the control cows and cows with focal and diffuse lesions, 31, 15, and 31 variants had predicted deleterious effects, respectively. The major histocompatibility complex II gene (BOLA) was the only candidate gene affected by different predicted deleterious cSNPs in the three groups of animals. Using the candidate genes, gene set enrichment analysis (GSEA) revealed distinct biological processes and metabolic pathways associated with each group of cows. Cows without lesions showed enrichment in 11 GO terms and 6 metabolic pathways, particularly involving the BOLA, AP3B1, and CHGA genes. These leading-edge genes are linked to antigen processing and presentation, phagosome maturation, lysosome function, and intestinal immune homeostasis. Cows with focal lesions had enrichment in the negative regulation of apoptosis and cellular metabolism with two leading-edge genes, ORMD1 and KANK2. Predicted deleterious cSNPs in these leading-edge genes may help the host modulate immune responses and maintain low bacterial load during the subclinical stage of MAP infection. Finally, cows with diffuse lesions showed enrichment in 27 metabolic pathways, including Th1/Th2 cell differentiation, antigen presentation, bile secretion, and antifolate resistance. Further validation of the cSNPs and candidate genes in additional independent populations may lead to their use in SNP-based selection strategies for increasing resistance to MAP infection.
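The filtering thresholds listed in this abstract can be sketched as a single predicate. This is a minimal illustration only, not the authors' bcftools workflow: the `SnpSite` record, its field names, and `passes_filters` are hypothetical, and "within 5 bp" is assumed here to mean a distance of 5 bp or less.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpSite:
    """Hypothetical per-site summary; field names are illustrative only."""
    depth: int                            # total read depth at the site
    alt_reads: int                        # reads supporting the alternative allele
    alt_phred: float                      # Phred score of the alternative allele
    maf: float                            # minor allele frequency, as a fraction
    missing_frac: float                   # fraction of missing data at the site
    indel_distance: Optional[int] = None  # bp to nearest indel, None if unknown/far


def passes_filters(site: SnpSite) -> bool:
    """Apply the thresholds quoted in the abstract to one candidate SNP."""
    if site.depth < 10:          # minimum read depth >= 10
        return False
    if site.alt_reads < 4:       # alternative-allele support >= 4 reads
        return False
    if site.alt_phred < 30:      # Phred score >= 30
        return False
    if site.maf <= 0.20:         # minor allele frequency > 20%
        return False
    if site.missing_frac >= 0.80:  # missing data per site < 80%
        return False
    # Assumption: "within 5 bp of an indel" means distance <= 5.
    if site.indel_distance is not None and site.indel_distance <= 5:
        return False
    return True
```

For example, `passes_filters(SnpSite(12, 5, 35.0, 0.25, 0.1))` accepts the site, while lowering `depth` to 9 rejects it.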

Journal Article


UGbS-Flex, a novel bioinformatics pipeline for imputation-free SNP discovery in polyploids without a reference genome: finger millet as a case study

by Qi, Peng; Schröder, Stephan; Devos, Katrien M. in Agriculture, allotetraploidy, Analysis

2018

Background: Research on orphan crops is often hindered by a lack of genomic resources. With the advent of affordable sequencing technologies, genotyping an entire genome or, for large-genome species, a representative fraction of the genome has become feasible for any crop. Nevertheless, most genotyping-by-sequencing (GBS) methods are geared towards obtaining large numbers of markers at low sequence depth, which excludes their application in heterozygous individuals. Furthermore, bioinformatics pipelines often lack the flexibility to deal with paired-end reads or to be applied in polyploid species. Results: UGbS-Flex combines publicly available software with in-house Python and Perl scripts to efficiently call SNPs from genotyping-by-sequencing reads irrespective of the species’ ploidy level, breeding system and availability of a reference genome. Noteworthy features of the UGbS-Flex pipeline are an ability to use paired-end reads as input, an effective approach to cluster reads across samples with enhanced outputs, and maximization of SNP calling. We demonstrate use of the pipeline for the identification of several thousand high-confidence SNPs with high representation across samples in an F3-derived F2 population in the allotetraploid finger millet. Robust high-density genetic maps were constructed using the time-tested mapping program MAPMAKER, which we upgraded to run efficiently and in a semi-automated manner in a Windows Command Prompt environment. We exploited comparative GBS with one of the diploid ancestors of finger millet to assign linkage groups to subgenomes and demonstrate the presence of chromosomal rearrangements. Conclusions: The paper combines GBS protocol modifications, a novel flexible GBS analysis pipeline, UGbS-Flex, recommendations to maximize SNP identification, updated genetic mapping software, and the first high-density maps of finger millet. The modules used in the UGbS-Flex pipeline and for genetic mapping were applied to finger millet, an allotetraploid selfing species without a reference genome, as a case study. The UGbS-Flex modules, which can be run independently, are easily transferable to species with other breeding systems or ploidy levels.

Journal Article


A soybean quantitative trait locus that promotes flowering under long days is identified as FT5a, a FLOWERING LOCUS T ortholog

by Zhao, Chen; Kong, Fanjiang; Zhu, Jianghui in Flowers - genetics, Flowers - growth & development, Flowers - physiology

2016

FLOWERING LOCUS T (FT) is an important floral integrator whose functions are conserved across plant species. In soybean, two orthologs, FT2a and FT5a, play a major role in initiating flowering. Their expression in response to different photoperiods is controlled by allelic combinations at the maturity loci E1 to E4, generating variation in flowering time among cultivars. We determined the molecular basis of a quantitative trait locus (QTL) for flowering time in linkage group J (Chromosome 16). Fine-mapping delimited the QTL to a genomic region of 107 kb that harbors FT5a. We detected 15 DNA polymorphisms between parents with the early-flowering (ef) and late-flowering (lf) alleles in the promoter region, an intron, and the 3′ untranslated region of FT5a, although the FT5a coding regions were identical. Transcript abundance of FT5a was higher in near-isogenic lines for ef than in those for lf, suggesting that different transcriptional activities or mRNA stability caused the flowering time difference. Single-nucleotide polymorphism (SNP) calling from re-sequencing data for 439 cultivated and wild soybean accessions indicated that ef is a rare haplotype that is distinct from common haplotypes including lf. The ef allele at FT5a may play an adaptive role at latitudes where early flowering is desirable.

Journal Article


Patterns of cross-contamination in a multispecies population genomic project: detection, quantification, impact, and solutions

by Faivre, Nicolas; Ballenghien, Marion; Galtier, Nicolas in Acids, Alarm systems, Alignment

2017

Background: Contamination is a well-known but often neglected problem in molecular biology. Here, we investigated the prevalence of cross-contamination among 446 samples from 116 distinct species of animals, which were processed in the same laboratory and subjected to subcontracted transcriptome sequencing. Results: Using cytochrome oxidase 1 as a barcode, we identified a minimum of 782 events of between-species contamination, with approximately 80% of our samples being affected. An analysis of laboratory metadata revealed a strong effect of the sequencing center: nearly all the detected events of between-species contamination involved species that were sent the same day to the same company. We introduce new methods to address the amount of within-species, between-individual contamination, and to correct for this problem when calling genotypes from base read counts. Conclusions: We report evidence for pervasive within-species contamination in this data set, and show that classical population genomic statistics, such as synonymous diversity, the ratio of non-synonymous to synonymous diversity, inbreeding coefficient F_IT, and Tajima’s D, are sensitive to this problem to various extents. Control analyses suggest that our published results are probably robust to the problem of contamination. Recommendations on how to prevent or avoid contamination in large-scale population genomics/molecular ecology are provided based on this analysis.

Journal Article


Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP

by Quintero, Juan Camilo; Duitama, Jorge; Cruz, Daniel Felipe in Animal Genetics and Genomics, Biomedical and Life Sciences, Computational Biology

2016

Background: The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes, to generate sequencing fragments at predictable and reproducible sites. This allows genotyping of thousands of genetic markers in populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. Results: Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one-step wizard to perform parallel read alignment, variant identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls as well as calculation of summary statistics overall and per sample, and diversity statistics per site. NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variant detection such as GATK, Samtools and Tassel. Conclusions: NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics.

Journal Article


The Effect of Genome Parametrization and SNP Marker Subsetting on Genomic Selection in Autotetraploid Alfalfa

by Franguelli, Nicolò; Ferrari, Barbara; Nazzicari, Nelson in Agricultural production, alfalfa, Alleles

2024

Background: Alfalfa, the most economically important forage legume worldwide, features modest genetic progress due to long selection cycles and the extent of the non-additive genetic variance associated with its autotetraploid genome. Methods: To improve the efficiency of genomic selection in alfalfa, we explored the effects of genome parametrization (as tetraploid and diploid dosages, plus allele ratios) and SNP marker subsetting (all available SNPs, only genic regions, and only non-genic regions) on genomic regressions, together with various levels of filtering on read depth and missing rates. We used genotyping by sequencing-generated data and focused on traits of different genetic complexity, i.e., dry biomass yield in moisture-favorable (FE) and drought stress (SE) environments, leaf size, and the onset of flowering, which were assessed in 143 genotyped plants from a genetically broad European reference population and their phenotyped half-sib progenies. Results: On average, the allele ratio improved the predictive ability compared with other genome parametrizations (+7.9% vs. tetraploid dosage, +12.6% vs. diploid dosage), while using all the SNPs offered an advantage compared with any specific SNP subsetting (+3.7% vs. genic regions, +7.6% vs. non-genic regions). However, when focusing on specific traits, different combinations of genome parametrization and subsetting achieved better performances. We also released Legpipe2, an SNP calling pipeline tailored for reduced-representation sequencing (GBS, RAD) in medium-sized genotyping experiments.
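The three genome parametrizations named in this abstract (allele ratio, tetraploid dosage, diploid dosage) can be illustrated from per-site read counts. The sketch below is an assumption-laden toy, not the procedure used by the authors or by Legpipe2: the function name `parametrize` is hypothetical, the tetraploid dosage is obtained by naive rounding of the allele ratio, and the diploid dosage simply collapses all heterozygous classes into one.

```python
from typing import Optional, Tuple


def parametrize(ref_reads: int, alt_reads: int) -> Optional[Tuple[float, int, int]]:
    """Return (allele_ratio, tetraploid_dosage, diploid_dosage) for one site.

    Returns None when there are no reads, since no ratio can be computed.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        return None

    # Allele ratio: fraction of reads carrying the alternative allele.
    ratio = alt_reads / depth

    # Assumption: tetraploid dosage (0..4 alt copies) by naive rounding.
    # Real genotype callers typically use likelihood models instead.
    tetra = int(round(4 * ratio))

    # Diploid-style dosage: 0 = hom. ref, 2 = hom. alt, 1 = any heterozygote.
    if tetra == 0:
        diploid = 0
    elif tetra == 4:
        diploid = 2
    else:
        diploid = 1

    return ratio, tetra, diploid
```

With 30 reference and 10 alternative reads, `parametrize(30, 10)` gives an allele ratio of 0.25, a tetraploid dosage of 1, and a diploid dosage of 1; the allele ratio keeps the continuous read-count signal that both dosage codings discard.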

Journal Article
