Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
1,859 result(s) for "Sequence Analysis, DNA - statistics"
Integrating Hi-C links with assembly graphs for chromosome-scale assembly
2019
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, few open-source tools are available, and error rates, particularly inversions and fusions across chromosomes, remain higher than those of alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
Journal Article
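As a rough illustration of link-guided scaffolding, the toy sketch below joins contigs end-to-end in decreasing order of Hi-C link weight, refusing joins that would exhaust a contig's two free ends or create a cycle. This is only a hypothetical sketch of the general idea, not SALSA's algorithm (which additionally consults the assembly graph to reject spurious joins); the contig names and link counts are invented.

```python
# Toy greedy Hi-C scaffolding: join contigs by decreasing link weight.
def greedy_scaffold(contigs, hic_links):
    """contigs: list of contig names.
    hic_links: dict mapping frozenset({a, b}) -> Hi-C link count."""
    parent = {c: c for c in contigs}      # union-find to avoid circular joins
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    joins_left = {c: 2 for c in contigs}  # each contig has two free ends
    edges = sorted(hic_links.items(), key=lambda kv: kv[1], reverse=True)
    scaffold_edges = []
    for pair, weight in edges:
        a, b = tuple(pair)
        if joins_left[a] == 0 or joins_left[b] == 0:
            continue                      # no free end left on one contig
        if find(a) == find(b):
            continue                      # joining would create a cycle
        parent[find(a)] = find(b)
        joins_left[a] -= 1
        joins_left[b] -= 1
        scaffold_edges.append((a, b, weight))
    return scaffold_edges

links = {frozenset({"c1", "c2"}): 120,
         frozenset({"c2", "c3"}): 95,
         frozenset({"c1", "c3"}): 4}     # weak link, correctly skipped
print(greedy_scaffold(["c1", "c2", "c3"], links))
```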
Tackling the widespread and critical impact of batch effects in high-throughput data
by Baggerly, Keith; Scharpf, Robert B.; Irizarry, Rafael A.
in 631/1647/1513, 631/1647/48, Agriculture
2010
Batch effects can lead to incorrect biological conclusions but are often not taken into account. The authors show that batch effects are relevant to a range of high-throughput 'omics' data sets and are crucial to address. They also explain how batch effects can be mitigated.
High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.
Journal Article
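To make the hazard concrete, the sketch below simulates an experiment in which batch is fully confounded with the outcome, shows how the confounding can be spotted, and applies a naive per-batch mean-centering correction. The data are invented and this is not a method from the paper; real analyses use approaches such as ComBat or surrogate variable analysis.

```python
# Toy batch-effect demo: confounding check plus naive mean-centering.
import numpy as np

rng = np.random.default_rng(0)
n = 100
batches = np.repeat([0, 1], 50)
outcome = np.repeat([0, 1], 50)             # fully confounded with batch!
expr = rng.normal(0, 1, n) + 0.8 * batches  # batch shift, no true signal

# (1) confounding check: case fraction per batch should be balanced
for b in (0, 1):
    print(f"batch {b}: case fraction = {outcome[batches == b].mean():.2f}")

# (2) naive correction: subtract each batch's mean
corrected = expr.copy()
for b in (0, 1):
    corrected[batches == b] -= expr[batches == b].mean()

# After correction the spurious batch-driven 'association' disappears;
# with full confounding, any true signal would be removed too.
print("raw group difference:      ",
      expr[outcome == 1].mean() - expr[outcome == 0].mean())
print("corrected group difference:",
      corrected[outcome == 1].mean() - corrected[outcome == 0].mean())
```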
A framework for variation discovery and genotyping using next-generation DNA sequencing data
by Rivas, Manuel A; Philippakis, Anthony A; Banks, Eric
in 631/208/2489/144, 631/208/514/2254, Agriculture
2011
Mark DePristo and colleagues report an analytical framework to discover and genotype variation using whole exome and genome resequencing data from next-generation sequencing technologies. They apply these methods to low-pass population sequencing data from the 1000 Genomes Project.
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
Journal Article
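As an illustration of step (iii), base quality score recalibration, the toy below tabulates empirical mismatch rates per reported quality score at presumed non-variant sites and converts them back to Phred-scaled qualities (Q = -10 log10(error rate)). This is a simplified sketch of the concept, not the Genome Analysis Toolkit's implementation.

```python
# Toy base quality recalibration from observed mismatch rates.
import math
from collections import defaultdict

def recalibrate(observations):
    """observations: (reported_quality, was_mismatch) pairs, restricted
    to sites not known to be polymorphic."""
    counts = defaultdict(lambda: [0, 0])   # quality -> [mismatches, total]
    for q, mismatch in observations:
        counts[q][0] += int(mismatch)
        counts[q][1] += 1
    table = {}
    for q, (errs, total) in counts.items():
        rate = (errs + 1) / (total + 2)    # small pseudocount for stability
        table[q] = round(-10 * math.log10(rate))
    return table

obs = [(30, False)] * 990 + [(30, True)] * 10   # reported Q30, ~1% errors
print(recalibrate(obs))   # ~Q20: the reported quality was too optimistic
```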
Sequencing-based methods and resources to study antimicrobial resistance
by Boolchandani, Manish; D’Souza, Alaric W.; Dantas, Gautam
in Antibiotics, Antimicrobial agents, Antimicrobial resistance
2019
Antimicrobial resistance exacts high morbidity, mortality and economic costs yearly by rendering bacteria immune to antibiotics. Identifying and understanding antimicrobial resistance are imperative for clinical practice to treat resistant infections and for public health efforts to limit the spread of resistance. Technologies such as next-generation sequencing are expanding our abilities to detect and study antimicrobial resistance. This Review provides a detailed overview of antimicrobial resistance identification and characterization methods, from traditional antimicrobial susceptibility testing to recent deep-learning methods. We focus on sequencing-based resistance discovery and discuss tools and databases used in antimicrobial resistance studies.

Next-generation sequencing has improved the identification and characterization of antimicrobial resistance. Focusing on sequence-based discovery of antibiotic resistance genes, this Review discusses computational strategies and resources for resistance gene identification in genomic and metagenomic samples, including recent deep-learning approaches.
Journal Article
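One of the simplest sequence-based detection strategies covered by such reviews is k-mer matching of reads against a database of known resistance genes. The sketch below is purely hypothetical: the gene names, sequences, k value and threshold are all invented, and real tools use much longer k-mers and proper alignment.

```python
# Toy k-mer lookup of reads against a made-up resistance gene database.
K = 5  # toy value; real tools use k around 21-31

def kmers(seq, k=K):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

amr_db = {"blaTEM_toy": "ATGGCATTTTCGTGTCGCCCTTATTCC",
          "tetA_toy":   "ATGAAACCCAGTAATCTCGCGGCAACG"}
index = {gene: kmers(seq) for gene, seq in amr_db.items()}

def classify(read, min_shared=3):
    """Report genes sharing at least min_shared k-mers with the read."""
    hits = {gene: len(kmers(read) & km) for gene, km in index.items()}
    return {g: n for g, n in hits.items() if n >= min_shared}

print(classify("GCATTTTCGTGTCGCCCTTA"))   # overlaps the blaTEM_toy sequence
```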
Twelve quick steps for genome assembly and annotation in the classroom
by Chung, J. Sook; Nam, Bo-Hye; Jung, Hyungtaek
in Agricultural economics, Agricultural production, Animals
2020
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we outline 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Journal Article
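A statistic that recurs in assembly quality checks of this kind is N50: the contig length such that contigs of that length or longer contain at least half of the total assembly. A minimal sketch:

```python
# N50: sort contigs by length, accumulate until half the assembly is covered.
def n50(contig_lengths):
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length

print(n50([100, 200, 300, 400, 500]))  # 400, since 500 + 400 >= 750
```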
Statistical power and significance testing in large-scale genetic studies
2014
Key Points
Significance testing, with appropriate multiple testing correction, is currently the most convenient method for summarizing the evidence for association between a disease and a genetic variant.
Inadequate statistical power increases not only the probability of missing genuine associations but also the probability that significant associations represent false-positive findings.
Statistical power declines rapidly with decreasing allele frequency and effect size, but it can be enhanced by increasing sample size and by selecting appropriate subjects (for example, family history positive cases and 'super normal' controls).
Exome sequencing studies can often identify the mutation responsible for a Mendelian disease by filtering out common variants, synonymous variants or variants that do not co-segregate with disease, and then assigning priority to the remaining variants using bioinformatic tools.
Adequate statistical power for rare-variant association analyses in complex diseases requires the aggregation of the effects of multiple rare variants within a defined portion of the genome (for example, a set of related genes).
Various computational tools are available for calculating the statistical power of genetic studies.
This Review discusses the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow for multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics of the phenotype and the putative genetic variant, as well as the study design. Here, we review the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
Journal Article
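As a concrete example of such a power calculation, the sketch below approximates case-control power for an allele-frequency difference with a two-proportion z-test at the conventional genome-wide threshold alpha = 5e-8 (0.05 Bonferroni-corrected for roughly one million tests). This is a simplified approximation, not one of the dedicated tools the Review describes, and the sample sizes and frequencies are invented.

```python
# Approximate case-control GWAS power via a two-proportion z-test.
from math import sqrt
from scipy.stats import norm

def gwas_power(n_cases, n_controls, p_case, p_control, alpha=5e-8):
    na, nb = 2 * n_cases, 2 * n_controls          # alleles per group
    p0 = (na * p_case + nb * p_control) / (na + nb)
    se_null = sqrt(p0 * (1 - p0) * (1 / na + 1 / nb))
    se_alt = sqrt(p_case * (1 - p_case) / na +
                  p_control * (1 - p_control) / nb)
    z_crit = norm.ppf(1 - alpha / 2)               # two-sided threshold
    diff = abs(p_case - p_control)
    return norm.cdf((diff - z_crit * se_null) / se_alt)

# power drops sharply as the allele-frequency difference shrinks
print(f"{gwas_power(5000, 5000, 0.25, 0.20):.3f}")
print(f"{gwas_power(5000, 5000, 0.22, 0.20):.3f}")
```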
Systematic Evaluation of Sanger Validation of Next-Generation Sequencing Variants
by Biesecker, Leslie G; Mullikin, James C; Beck, Tyler F
in Animals, Deoxyribonucleic acid, Design
2016
Next-generation sequencing (NGS) data are used for both clinical care and clinical research. DNA sequence variants identified using NGS are often returned to patients/participants as part of clinical or research protocols. The current standard of care is to validate NGS variants using Sanger sequencing, which is costly and time-consuming.
We performed a large-scale, systematic evaluation of Sanger-based validation of NGS variants using data from the ClinSeq® project. We first used NGS data from 19 genes in 5 participants, comparing them to high-throughput Sanger sequencing results on the same samples, and found no discrepancies among 234 NGS variants. We then compared NGS variants in 5 genes from 684 participants against data from Sanger sequencing.
Of over 5,800 NGS-derived variants, 19 were not validated by Sanger data. Using newly designed sequencing primers, Sanger sequencing confirmed 17 of these NGS variants, and the remaining 2 variants had low quality scores from exome sequencing. Overall, we measured a validation rate of 99.965% for NGS variants using Sanger sequencing, which exceeds the accuracy of many existing medical tests that do not require orthogonal validation.
A single round of Sanger sequencing is more likely to incorrectly refute a true-positive variant from NGS than to correctly identify a false-positive variant from NGS. Validation of NGS-derived variants using Sanger sequencing has limited utility, and best practice standards should not include routine orthogonal Sanger validation of NGS variants.
Journal Article
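For a sense of the arithmetic behind the reported rate, the sketch below computes a Wilson score confidence interval for 5,798 confirmed variants out of 5,800 (excluding the two low-quality calls). This is an illustrative calculation only, not the study's own statistical analysis.

```python
# Wilson score interval for a binomial proportion (here, validation rate).
from math import sqrt
from scipy.stats import norm

def wilson_interval(successes, n, conf=0.95):
    z = norm.ppf(1 - (1 - conf) / 2)
    p = successes / n
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half

lo, hi = wilson_interval(5800 - 2, 5800)
print(f"validation rate 95% CI: {lo:.5f} - {hi:.5f}")
```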
Graphtyper enables population-scale genotyping using pangenome graphs
by Masson, Gisli; Halldorsson, Bjarni V; Kristmundsdottir, Snaedis
in 631/208, 631/208/212, 631/208/457
2017
Graphtyper is a fast and scalable method for variant genotyping that aligns short-read sequence data to a pangenome. Graphtyper was able to accurately genotype ∼90 million sequence variants in the whole genomes of ∼28,000 Icelanders, including those in six HLA genes.
A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.
Journal Article
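The sketch below shows, in miniature, what "representing possible haplotypes as graph paths" means: a single SNP turns the reference into two alternative paths, and reads are scored by which path they match. Everything here (sequences, reads, the exact-substring "alignment") is an invented toy, far simpler than Graphtyper.

```python
# Toy variation-aware graph: one SNP site yields two haplotype paths.
# graph for REF ...ACGT[A/G]TTCA...: shared prefix, two branches, shared suffix
PREFIX, ALLELES, SUFFIX = "ACGT", ("A", "G"), "TTCA"
paths = {allele: PREFIX + allele + SUFFIX for allele in ALLELES}

def genotype(reads):
    """Count reads exactly matching each haplotype path."""
    support = {allele: 0 for allele in ALLELES}
    for read in reads:
        for allele, hap in paths.items():
            if read in hap:             # exact match to this haplotype path
                support[allele] += 1
    return support

reads = ["CGTGT", "GTGTT", "CGTAT", "GTATT", "TGTTC"]
print(genotype(reads))   # read support per allele suggests a heterozygote
```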
Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm
2018
Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn³) to O(mn²), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
Journal Article
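The mosaicism statistic in question can be pictured as the maximum descent of a walk over recombination-informative sites, stepping +1 at sites supporting one candidate parent and -1 at sites supporting the other. The paper's contribution is an exact O(mn²) computation of the p-value; the sketch below instead estimates it by permutation, purely to illustrate the statistic, on an invented site pattern.

```python
# Maximum-descent statistic of a +1/-1 walk, with a permutation p-value.
import random

def max_descent(steps):
    height, peak, descent = 0, 0, 0
    for s in steps:
        height += s
        peak = max(peak, height)
        descent = max(descent, peak - height)   # largest drop from any peak
    return descent

def permutation_pvalue(steps, trials=10000, seed=1):
    rng = random.Random(seed)
    observed = max_descent(steps)
    shuffled = list(steps)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if max_descent(shuffled) >= observed:
            hits += 1
    return hits / trials

# 8 sites favouring parent P, then 6 favouring Q, then 8 favouring P again:
walk = [+1] * 8 + [-1] * 6 + [+1] * 8
print(max_descent(walk), permutation_pvalue(walk))
```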
SLAF-seq: An Efficient Method of Large-Scale De Novo SNP Discovery and Genotyping Using High-Throughput Sequencing
2013
Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping: specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: i) deep sequencing to ensure genotyping accuracy; ii) a reduced-representation strategy to reduce sequencing costs; iii) a pre-designed reduced-representation scheme to optimize marker efficiency; and iv) a double-barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and observed SLAFs and considerable genotyping accuracy. We also report the highest-density genetic map yet created for any organism without a reference genome sequence, in this case the common carp, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between the common carp genetic map and the zebrafish genome sequence confirmed the quality of the SLAF-seq genotyping. SLAF-seq provides a high-resolution strategy for large-scale genotyping and is generally applicable to various species and populations.
Journal Article
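The reduced-representation idea at the heart of such protocols can be sketched as an in silico digest: cut the genome at a restriction motif and keep only fragments in a target size window, so sequencing effort concentrates on a reproducible subset of loci. The motif, size window and random genome below are arbitrary toy choices, not SLAF-seq's actual scheme.

```python
# Toy in silico restriction digest with fragment size selection.
import random

def digest(genome, motif):
    """Split a sequence after each occurrence of the restriction motif."""
    fragments, start = [], 0
    pos = genome.find(motif)
    while pos != -1:
        fragments.append(genome[start:pos + len(motif)])
        start = pos + len(motif)
        pos = genome.find(motif, start)
    fragments.append(genome[start:])
    return fragments

rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(50000))
frags = digest(genome, "GAATTC")                 # EcoRI-like toy motif
selected = [f for f in frags if 200 <= len(f) <= 500]
print(f"{len(frags)} fragments, {len(selected)} in the 200-500 bp window")
```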