Catalogue Search | MBRL

FMLRC: Hybrid long read error correction using an FM-index

by Wang, Jeremy R., McMillan, Leonard, Jones, Corbin D. in Algorithms, Bioinformatics, Biomedical and Life Sciences

2018

Background Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. Results We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Conclusion Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
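
The core data structure named in this abstract can be illustrated with a toy example. The Python sketch below is not the FMLRC implementation: it builds a multi-string BWT of a few short reads by naive suffix sorting (feasible only for tiny inputs), counts k-mers with FM-index backward search, and flags long-read k-mers that have little short-read support. The class `FMIndex`, the helper `weak_positions`, and the `min_count` threshold are hypothetical names chosen for illustration; a practical tool would also index reverse complements and use sampled occurrence tables.

```python
from collections import Counter


class FMIndex:
    """Toy FM-index over a collection of short reads (hypothetical, naive)."""

    def __init__(self, reads):
        # Join reads with '$' terminators; '$' sorts before A, C, G, T.
        text = "".join(r + "$" for r in reads)
        # Naive suffix array: O(n^2 log n), acceptable only for a toy example.
        sa = sorted(range(len(text)), key=lambda i: text[i:])
        # BWT character = the character preceding each sorted suffix.
        self.bwt = "".join(text[i - 1] for i in sa)
        counts = Counter(text)
        # C[c] = number of characters in the text strictly smaller than c.
        self.C, total = {}, 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        self.n = len(text)

    def occ(self, c, i):
        # Occurrences of c in bwt[:i]; real indexes sample these counts.
        return self.bwt.count(c, 0, i)

    def count(self, pattern):
        """Occurrences of pattern across all reads, via backward search."""
        lo, hi = 0, self.n
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ(c, lo)
            hi = self.C[c] + self.occ(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


def weak_positions(index, long_read, k, min_count):
    """Start positions of k-mers in long_read seen fewer than min_count times."""
    return [i for i in range(len(long_read) - k + 1)
            if index.count(long_read[i:i + k]) < min_count]


reads = ["ACGTAC", "CGTACG", "GTACGT"]
index = FMIndex(reads)
print(index.count("GTA"))                      # 3
print(weak_positions(index, "ACGTTCG", 3, 2))  # [2, 3, 4]
```

In the usage lines, the long read carries a T where the short reads support an A at position 4, and the flagged start positions 2, 3 and 4 are exactly the 3-mers that overlap that base; a corrector would then try to replace that weakly supported region with a sequence the short-read index does support.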

Journal Article

Saliva sampling method influences oral microbiome composition and taxa distribution associated with oral diseases

by Wolfgang, Matthew C., Roca, Cristian, Alkhateeb, Alaa A. in Biology and Life Sciences, Composition, Dental Caries

2024

Saliva is a readily accessible and inexpensive biological specimen that enables investigation of the oral microbiome, which can serve as a biomarker of oral and systemic health. There are two routine approaches to collect saliva, stimulated and unstimulated; however, there is no consensus on how sampling method influences oral microbiome metrics. In this study, we analyzed paired saliva samples (unstimulated and stimulated) from 88 individuals, aged 7–18 years. Using 16S rRNA gene sequencing, we investigated the differences in bacterial microbiome composition between sample types and determined how sampling method affects the distribution of taxa associated with untreated dental caries and gingivitis. Our analyses indicated significant differences in microbiome composition between the sample types. Both sampling methods were able to detect significant differences in microbiome composition between healthy subjects and subjects with untreated caries. However, only stimulated saliva revealed a significant association between microbiome diversity and composition in individuals with diagnosed gingivitis. Furthermore, taxa previously associated with dental caries and gingivitis were preferentially enriched in individuals with each respective disease only in stimulated saliva. Our study suggests that stimulated saliva provides a more nuanced readout of microbiome composition and taxa distribution associated with untreated dental caries and gingivitis compared to unstimulated saliva.

Journal Article

Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade

by Jones, Corbin D., Nishimura, Erin Osborne, Li, Qing in Animals, Archaea, Biological Sciences

2015

Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes.

Journal Article

Subspecific origin and haplotype diversity in the laboratory mouse

by Wang, Jeremy R, Buus, Ryan J, Bell, Timothy A in 631/1647/334/1874/345, 631/208/727/728, Agriculture

2011

Fernando Pardo-Manuel de Villena, Gary Churchill and colleagues provide a high-resolution phylogenetic map of mouse inbred strains based on comparisons to wild-caught mice. They show that the genomes of classical strains are overwhelmingly derived from Mus musculus domesticus, whereas wild-derived laboratory strains include a broad sampling of diversity from multiple subspecies with pervasive introgression. The subspecific origin, haplotype diversity and identity-by-descent map of laboratory strains can be visualized at http://msub.csbio.unc.edu/PhylogenyTool.html. Here we provide a genome-wide, high-resolution map of the phylogenetic origin of the genome of most extant laboratory mouse inbred strains. Our analysis is based on the genotypes of wild-caught mice from three subspecies of Mus musculus. We show that classical laboratory strains are derived from a few fancy mice with limited haplotype diversity. Their genomes are overwhelmingly Mus musculus domesticus in origin, and the remainder is mostly of Japanese origin. We generated genome-wide haplotype maps based on identity by descent from fancy mice and show that classical inbred strains have limited and non-randomly distributed genetic diversity. In contrast, wild-derived laboratory strains represent a broad sampling of diversity within M. musculus. Intersubspecific introgression is pervasive in these strains, and contamination by laboratory stocks has played a role in this process. The subspecific origin, haplotype diversity and identity-by-descent maps can be visualized using the Mouse Phylogeny Viewer (see the URL above).

Journal Article

Variability and bias in microbiome metagenomic sequencing: an interlaboratory study comparing experimental protocols

by Sevilla, Samantha, Gevers, Dirk, Soh, Keng in 631/326/2565/2134, 631/326/2565/2142, Bacteria - classification

2024

Several studies have documented the significant impact of methodological choices in microbiome analyses. The myriad methodological options available complicate the replication of results and generally limit the comparability of findings between independent studies that use differing techniques and measurement pipelines. Here we describe the Mosaic Standards Challenge (MSC), an international interlaboratory study designed to assess the impact of methodological variables on the results. The MSC did not prescribe methods but rather asked participating labs to analyze 7 shared reference samples (5 × human stool samples and 2 × mock communities) using their standard laboratory methods. To capture the array of methodological variables, each participating lab completed a metadata reporting sheet that included 100 different questions regarding the details of their protocol. The goal of this study was to survey the methodological landscape for microbiome metagenomic sequencing (MGS) analyses and the impact of methodological decisions on metagenomic sequencing results. A total of 44 labs participated in the MSC by submitting results (16S or WGS) along with accompanying metadata; 30 16S rRNA gene amplicon datasets and 14 WGS datasets were collected. The inclusion of two types of reference materials (human stool and mock communities) enabled analysis of both MGS measurement variability between different protocols using the biologically relevant stool samples, and MGS bias with respect to ground truth values using the DNA mixtures. Owing to the compositional nature of MGS measurements, analyses were conducted on the ratio of Firmicutes:Bacteroidetes, allowing us to directly apply common statistical methods.
The resulting analysis demonstrated that protocol choices have significant effects, including both bias of the MGS measurement associated with particular methodological choices, as well as effects on measurement robustness as observed through the spread of results between labs making similar methodological choices. In the analysis of the DNA mock communities, MGS measurement bias was observed even when there was general consensus among the participating laboratories. This study was the result of a collaborative effort that included academic, commercial, and government labs. In addition to highlighting the impact of different methodological decisions on MGS result comparability, this work also provides insights for consideration in future microbiome measurement study design.
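
Because relative abundances are closed to a constant sum, a ratio between two taxa does not depend on how the counts were normalized, which is one reason a Firmicutes:Bacteroidetes ratio is convenient to compare across labs. The minimal Python sketch below shows that invariance and expresses bias against a mock community's expected composition as a difference of log ratios. The function names, the toy counts, and the log-scale bias definition are illustrative assumptions, not the MSC analysis pipeline.

```python
import math


def relative_abundance(counts):
    """Close raw read counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def log_fb_ratio(abundances):
    """Natural log of the Firmicutes:Bacteroidetes ratio (counts or proportions)."""
    return math.log(abundances["Firmicutes"] / abundances["Bacteroidetes"])


def log_ratio_bias(observed, expected):
    """Log-scale bias of an observed F:B ratio relative to a ground-truth composition."""
    return log_fb_ratio(observed) - log_fb_ratio(expected)


# Hypothetical toy data, not values from the study.
counts = {"Firmicutes": 600, "Bacteroidetes": 300, "Proteobacteria": 100}
expected = {"Firmicutes": 0.5, "Bacteroidetes": 0.5}
print(math.isclose(log_fb_ratio(counts), log_fb_ratio(relative_abundance(counts))))  # True
print(round(log_ratio_bias(counts, expected), 4))  # 0.6931
```

A positive bias means the protocol over-represents Firmicutes relative to Bacteroidetes; in the toy data the observed ratio is twice the expected one, so the bias is ln 2.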

Journal Article

Mpox surveillance: the need for enhanced testing and genomic epidemiology

by Wang, Jeremy R in Disease transmission , Epidemics , Epidemiology

2024

Journal Article

Correcting nucleotide-specific biases in high-throughput sequencing data

by Wang, Jeremy R., Quach, Bryan, Furey, Terrence S. in Adapters, Algorithms, Assaying

2017

Background High-throughput sequence (HTS) data exhibit position-specific nucleotide biases that obscure the intended signal and reduce the effectiveness of these data for downstream analyses. These biases are particularly evident in HTS assays for identifying regulatory regions in DNA (DNase-seq, ChIP-seq, FAIRE-seq, ATAC-seq). Biases may result from many experiment-specific factors, including selectivity of DNA restriction enzymes and fragmentation method, as well as sequencing technology-specific factors, such as choice of adapters/primers and sample amplification methods. Results We present a novel method to detect and correct position-specific nucleotide biases in HTS short read data. Our method calculates read-specific weights based on aligned reads to correct the over- or underrepresentation of position-specific nucleotide subsequences, both within and adjacent to the aligned read, relative to a baseline calculated in assay-specific enriched regions. Using HTS data from a variety of ChIP-seq, DNase-seq, FAIRE-seq, and ATAC-seq experiments, we show that our weight-adjusted reads reduce the position-specific nucleotide imbalance across reads and improve the utility of these data for downstream analyses, including identification and characterization of open chromatin peaks and transcription-factor binding sites. Conclusions A general-purpose method to characterize and correct position-specific nucleotide sequence biases fills the need to recognize and deal with, in a systematic manner, binding-site preference for the growing number of HTS-based epigenetic assays. As the breadth and impact of these biases are better understood, the availability of a standard toolkit to correct them will be important.
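
The weighting idea in the Results paragraph can be sketched in a simplified form. The Python below is a hypothetical illustration, not the authors' method: it treats the reference sequence around each aligned read start as a fixed-length window, estimates per-position nucleotide frequencies across all reads, and weights each read by the product of baseline/observed frequency ratios over its window, assuming positions contribute independently. The function names and the uniform baseline in the usage lines are assumptions; the paper derives its baseline from assay-specific enriched regions.

```python
from math import prod


def position_frequencies(windows):
    """Per-position nucleotide frequencies over equal-length sequence windows."""
    freqs = []
    for j in range(len(windows[0])):
        column = [w[j] for w in windows]
        freqs.append({b: column.count(b) / len(column) for b in set(column)})
    return freqs


def read_weights(windows, baseline):
    """Weight each read by baseline/observed frequency at every window position.

    windows: sequence around each aligned read start (assumed equal length).
    baseline: background nucleotide frequencies (hypothetical input here).
    Assumes positions are independent, so per-position ratios are multiplied.
    """
    observed = position_frequencies(windows)
    return [prod(baseline[b] / observed[j][b] for j, b in enumerate(w))
            for w in windows]


baseline = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(read_weights(["AC", "AG", "CC", "GT"], baseline))  # [0.25, 0.5, 0.5, 1.0]
```

In the usage lines, A is over-represented at the first position and C at the second, so the read whose window is `AC` is down-weighted most, while `GT`, whose nucleotides occur at their baseline rate, keeps a weight of 1.0.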

Journal Article

Transcriptome profiling of pediatric extracranial solid tumors and lymphomas enables rapid low-cost diagnostic classification

by Alexander, Thomas B., Roush, Sophia M., Tomoka, Tamiwe in 631/114/1305, 631/67/2332, 631/67/69

2024

Approximately 80% of pediatric tumors occur in low- and middle-income countries (LMIC), where diagnostic tools essential for treatment decisions are often unavailable or incomplete. Development of cost-effective molecular diagnostics will help bridge the cancer diagnostic gap and ultimately improve pediatric cancer outcomes in LMIC settings. We investigated the feasibility of using nanopore whole transcriptome sequencing on formalin-fixed paraffin embedded (FFPE)-derived RNA and a composite machine learning model for pediatric solid tumor diagnosis. Transcriptome cDNA sequencing was performed on a heterogeneous set of 221 FFPE and 32 fresh frozen pediatric solid tumor and lymphoma specimens on Oxford Nanopore Technologies’ sequencing platforms. A composite machine learning model was then used to classify transcriptional profiles into clinically actionable tumor types and subtypes. In total, 95.6% and 89.7% of pediatric solid tumors and lymphoma specimens were correctly classified, respectively. 71.5% of pediatric solid tumors had prediction probabilities > 0.8 and were classified with 100% accuracy. Similarly, for lymphomas, 72.4% of samples that had prediction probabilities > 0.6 were classified with 97.6% accuracy. Additionally, FOXO1 fusion status was predicted accurately for 97.4% of rhabdomyosarcomas and MYCN amplification was predicted with 88% accuracy in neuroblastoma. Whole transcriptome sequencing from FFPE-derived pediatric solid tumor and lymphoma samples has the potential to provide clinical classification of both tissue lineage and core genomic features. Further expansion, refinement, and validation of this approach are necessary to explore whether this technology could be part of the solution to the diagnostic limitations in LMIC.

Journal Article

Genetic Architecture of Skewed X Inactivation in the Laboratory Mouse

by Pardo-Manuel de Villena, Fernando, Lenarcic, Alan B., Searle, Jeremy B. in Alleles, Animals, Architecture

2013

X chromosome inactivation (XCI) is the mammalian mechanism of dosage compensation that balances X-linked gene expression between the sexes. Early during female development, each cell of the embryo proper independently inactivates one of its two parental X chromosomes. In mice, the choice of which X chromosome is inactivated is affected by the genotype of a cis-acting locus, the X-chromosome controlling element (Xce). Xce has been localized to a 1.9 Mb interval within the X-inactivation center (Xic), yet its molecular identity and mechanism of action remain unknown. We combined genotype and sequence data for mouse stocks with detailed phenotyping of ten inbred strains and with the development of a statistical model that incorporates phenotyping data from multiple sources to disentangle the sources of XCI phenotypic variance in natural female populations. We have reduced the Xce candidate interval 10-fold to a 176 kb region located approximately 500 kb proximal to Xist. We propose that structural variation in this interval explains the presence of multiple functional Xce alleles in the genus Mus. We have identified a new allele, Xce(e), present in Mus musculus, and a possible sixth functional allele in Mus spicilegus. We have also confirmed a parent-of-origin effect on X inactivation choice and provide evidence that maternal inheritance magnifies the skewing associated with strong Xce alleles. Based on the phylogenetic analysis of 155 laboratory strains and wild mice, we conclude that Xce(a) is either a derived allele that arose concurrently with the domestication of fancy mice but prior to the derivation of most classical inbred strains, or a rare allele in the wild. Furthermore, we have found that despite the presence of multiple haplotypes in the wild, Mus musculus domesticus has only one functional Xce allele, Xce(b). Lastly, we conclude that each mouse taxon examined has a different functional Xce allele.

Journal Article

Crohn’s Disease Differentially Affects Region-Specific Composition and Aerotolerance Profiles of Mucosally Adherent Bacteria

by Wang, Jeremy R, Nix, B Darren, Furey, Terrence S in Aerobiosis, Bacteria, Basic Science Research

2020

Abstract Background The intestinal microbiota play a key role in the onset, progression, and recurrence of Crohn disease (CD). Most microbiome studies assay fecal material, which does not provide region-specific information on mucosally adherent bacteria that directly interact with host systems. Changes in luminal oxygen have been proposed as a contributor to CD dysbiosis. Methods We generated 16S rRNA data using colonic and ileal mucosal bacteria from patients with CD and without inflammatory bowel disease. We developed profiles reflecting bacterial abundance within defined aerotolerance categories. Bacterial diversity, composition, and aerotolerance profiles were compared across intestinal regions and disease phenotypes. Results Bacterial diversity decreased in CD in both the ileum and the colon. Aerotolerance profiles significantly differed between intestinal segments in patients without inflammatory bowel disease, although both were dominated by obligate anaerobes, as expected. In CD, high relative levels of obligate anaerobes were maintained in the colon and increased in the ileum. Relative abundances of similar and distinct taxa were altered in colon and ileum. Notably, several obligate anaerobes, such as Bacteroides fragilis, dramatically increased in CD in one or both intestinal segments, although specific increasing taxa varied across patients. Increased abundance of taxa from the Proteobacteria phylum was found only in the ileum. Bacterial diversity was significantly reduced in resected tissues of patients who developed postoperative disease recurrence across 2 independent cohorts, with common lower abundance of bacteria from the Bacteroides, Streptococcus, and Blautia genera. Conclusions Mucosally adherent bacteria in the colon and ileum show distinct alterations in CD that provide additional insights not revealed in fecal material.

Journal Article
