101 result(s) for "Marth, Gabor T."
MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping
MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well suited to capturing mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs), as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK-assigned and actual mapping qualities greater than 0.98. To support studies of any genome, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
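The abstract above mentions hash-based seeding ahead of Smith-Waterman alignment. As a rough illustration of that general idea only, the sketch below indexes reference k-mers in a hash table and lets each read k-mer vote for a candidate start offset. It is a generic seed-and-vote example, not MOSAIK's actual clustering scheme; the function names, the value of k and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_offsets(read, index, k):
    """Count, for each implied read start position, how many read k-mers support it.

    Returns (offset, votes) pairs sorted by decreasing vote count.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        # .get avoids inserting missing k-mers into the defaultdict
        for pos in index.get(read[i:i + k], ()):
            votes[pos - i] += 1
    return votes.most_common()


# Hypothetical toy data: the read occurs at position 4 of the reference.
reference = "TTGACCATGGCA"
read = "CCATGG"
index = build_kmer_index(reference, 3)
print(candidate_offsets(read, index, 3))
```

In a real mapper, the best-supported offsets would then be passed to a full alignment step; that step is omitted here.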
SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications
The Smith-Waterman algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools for next-generation sequencing data. Though various fast Smith-Waterman implementations have been developed, they are either designed as monolithic protein database searching tools, which do not return detailed alignments, or are embedded in other tools. These issues make reusing these efficient Smith-Waterman implementations impractical. To facilitate easy integration of the fast Single-Instruction-Multiple-Data (SIMD) Smith-Waterman algorithm into third-party software, we wrote a C/C++ library, which extends Farrar's Striped Smith-Waterman (SSW) to return alignment information in addition to the optimal Smith-Waterman score. In this library we developed a new method to generate the full optimal alignment results and a suboptimal score in linear space at little cost to efficiency. This improvement makes the fast SIMD Smith-Waterman algorithm practical for genomic applications. SSW is available both as a C/C++ software library and as a stand-alone alignment tool at: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library. The SSW library has been used in the primary read mapping tool MOSAIK, the split-read mapping program SCISSORS, the MEI detector TANGRAM, and the read-overlap graph generation program RZMBLR. The speed of each of these tools improved significantly when its ordinary or banded Smith-Waterman module was replaced with the SSW Library.
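For readers unfamiliar with the underlying recurrence, here is a minimal sketch of the Smith-Waterman local alignment score computed in linear space by keeping only two rows of the dynamic-programming matrix. It is plain scalar Python, not the SIMD striped method and not the SSW library's API; the linear gap penalty and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b.

    Uses a linear gap penalty and keeps only the previous and current
    rows, so memory is O(len(b)) instead of O(len(a) * len(b)).
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared substring "ACGT" gives four matches at 2 points each.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

This version returns only the score; recovering the alignment itself in linear space takes additional bookkeeping, which is the part the abstract describes as the library's contribution.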
SpeedSeq: ultra-fast personal genome analysis and interpretation
SpeedSeq is an open-source genome analysis platform offering very fast, accurate and comprehensive analysis of single-nucleotide and structural variants from whole-genome sequencing data. It accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.
GIGGLE: a search engine for large-scale integrated genome analysis
GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
Pyrobayes: an improved base caller for SNP discovery in pyrosequences
Previously reported applications of the 454 Life Sciences pyrosequencing technology have relied on deep sequence coverage for accurate polymorphism discovery because of frequent insertion and deletion sequence errors. Here we report a new base calling program, Pyrobayes, for pyrosequencing reads. Pyrobayes permits accurate single-nucleotide polymorphism (SNP) calling in resequencing applications, even in shallow read coverage, primarily because it produces more confident base calls than the native base calling program.
Automated size selection for short cell-free DNA fragments enriches for circulating tumor DNA and improves error correction during next generation sequencing
Circulating tumor-derived cell-free DNA (ctDNA) enables non-invasive diagnosis, monitoring, and treatment susceptibility testing in human cancers. However, accurate detection of variant alleles, particularly during untargeted searches, remains a principal obstacle to widespread application of cell-free DNA in clinical oncology. In this study, isolation of short cell-free DNA fragments is shown to enrich for tumor variants and improve correction of PCR- and sequencing-associated errors. Subfractions of the mononucleosome of circulating cell-free DNA (ccfDNA) were isolated from patients with melanoma, pancreatic ductal adenocarcinoma, and colorectal adenocarcinoma using a high-throughput-capable automated gel-extraction platform. Using a 128-gene (128 kb) custom next-generation sequencing panel, variant alleles were on average 2-fold enriched in the short fraction (median insert size: ~142 bp) compared to the original ccfDNA sample, while 0.7-fold reduced in the fraction corresponding to the principal peak of the mononucleosome (median insert size: ~167 bp). Size-selected short fractions compared to the original ccfDNA yielded significantly larger family sizes (i.e., PCR duplicates) during in silico consensus sequence interpretation via unique molecular identifiers. Increments in family size were associated with a progressive reduction of PCR and sequencing errors. Although consensus read depth also decreased at larger family sizes, the variant allele frequency in the short ccfDNA fraction remained consistent, while variant detection in the original ccfDNA was commonly lost at family sizes necessary to minimize errors. These collective findings support the automated extraction of short ccfDNA fragments to enrich for ctDNA while concomitantly reducing false positives through in silico error correction.
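The consensus step described in the abstract above, grouping reads by unique molecular identifier (UMI) and requiring a minimum family size, can be sketched roughly as follows. This is a simplified majority-vote illustration, not the pipeline used in the study; the function name, the `min_family_size` default and the toy reads are hypothetical, and reads within a family are assumed to be the same length and already aligned.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=3):
    """Build one consensus sequence per UMI family.

    reads: iterable of (umi, sequence) pairs.
    Families with fewer than min_family_size reads are discarded.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        # Majority vote at each position across the family.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical toy reads: one family of three (with a single error) and a singleton.
reads = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"), ("GGT", "TTTT")]
print(umi_consensus(reads))
```

Raising `min_family_size` suppresses more errors but discards more families, which mirrors the trade-off between error reduction and consensus read depth that the abstract reports.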
A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans
As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.
A DOC2 Protein Identified by Mutational Profiling Is Essential for Apicomplexan Parasite Exocytosis
Exocytosis is essential to the lytic cycle of apicomplexan parasites and required for the pathogenesis of toxoplasmosis and malaria. DOC2 proteins recruit the membrane fusion machinery required for exocytosis in a Ca²⁺-dependent fashion. Here, the phenotype of a Toxoplasma gondii conditional mutant impaired in host cell invasion and egress was pinpointed to a defect in secretion of the micronemes, an apicomplexan-specific organelle that contains adhesion proteins. Whole-genome sequencing identified the etiological point mutation in TgDOC2.1. A conditional allele of the orthologous gene engineered into Plasmodium falciparum was also defective in microneme secretion. However, the major effect was on invasion, suggesting that microneme secretion is dispensable for Plasmodium egress.
The stochastic nature of errors in next-generation sequencing of circulating cell-free DNA
Challenges with distinguishing circulating tumor DNA (ctDNA) from next-generation sequencing (NGS) artifacts limit variant searches to established solid tumor mutations. Here we show that early and random PCR errors are a principal source of NGS noise that persists despite duplex molecular barcoding, removal of artifacts due to clonal hematopoiesis of indeterminate potential, and suppression of patterned errors. We also demonstrate that sample duplicates are necessary to eliminate the stochastic noise associated with NGS. Integration of sample duplicates into NGS analytics may broaden ctDNA applications by removing NGS-related errors that confound identification of true very-low-frequency variants during searches for ctDNA without a priori knowledge of specific mutations to target.