Catalogue Search | MBRL

SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500

by Huang, Jin , Zhou, Rongfang , Liu, Chen in Algorithms , BGISEQ-500 , Bioinformatics

2019

Background The sequencing platform BGISEQ-500 is based on DNBSEQ technology and provides high throughput with low costs. This sequencer has been widely used in various areas of scientific and clinical research. A better understanding of the sequencing process and performance of this system is essential for stabilizing the sequencing process, accurately interpreting sequencing results and efficiently solving sequencing problems. To address these concerns, a comprehensive database, SEQdata-BEACON, was constructed to accumulate the run performance data in BGISEQ-500. Results A total of 60 BGISEQ-500 instruments in the BGI-Wuhan lab were used to collect sequencing performance data. Lanes in paired-end 100 (PE100) sequencing using 10 bp barcode were chosen, and each lane was assigned a unique entry number as its identification number (ID). From November 2018 to April 2019, 2236 entries were recorded in the database containing 65 metrics about sample, yield, quality, machine state and supplies information. Using a correlation matrix, 52 numerical metrics were clustered into three groups signifying yield-quality, machine state and sequencing calibration. The distributions of the metrics also delivered information about patterns and rendered clues for further explanation or analysis of the sequencing process. Using the data of a total of 200 cycles, a linear regression model well simulated the final outputs. Moreover, the predicted final yield could be provided in the 15th cycle of the early stage of sequencing, and the corresponding R² of the 200th and 15th cycle models were 0.97 and 0.81, respectively. The model was run with the test sets obtained from May 2019 to predict the yield, which resulted in an R² of 0.96. These results indicate that our simulation model was reliable and effective.
Conclusions Data sources, statistical findings and application tools provide a constantly updated reference for BGISEQ-500 users to comprehensively understand DNBSEQ technology, solve sequencing problems and optimize run performance. These resources are available on our website http://seqBEACON.genomics.cn:443/home.html .
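As an illustrative aside, the R² figures quoted for the yield model can be read as the share of variance in final yield explained by a fitted line. The sketch below fits a one-variable least-squares line and computes R²; the function names and the five data points are hypothetical and are not taken from SEQdata-BEACON or from the authors' model, which may use more predictors.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: an early-cycle metric (x) and the final yield (y).
early_metric = [1.0, 2.0, 3.0, 4.0, 5.0]
final_yield = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(early_metric, final_yield)
predictions = [slope * x + intercept for x in early_metric]
r2 = r_squared(final_yield, predictions)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

An R² near 1 means the fitted line accounts for almost all of the variation in the observed values. Evaluating the same function on held-out runs is the usual way to judge whether a model generalises to new data.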

Journal Article

Protocol variations in run-on transcription dataset preparation produce detectable signatures in sequencing libraries

by Sigauke, Rutendo F. , Hunter, Samuel , Dowell, Robin D. in Animal Genetics and Genomics , Biomedical and Life Sciences , Data entry

2022

Background A variety of protocols exist for producing whole genome run-on transcription datasets. However, little is known about how differences between these protocols affect the signal within the resulting libraries. Results Using run-on transcription datasets generated from the same biological system, we show that a variety of GRO- and PRO-seq preparation methods leave identifiable signatures within each library. Specifically, we show that the library preparation method results in differences in quality control metrics, as well as differences in the signal distribution at the 5′ end of transcribed regions. These shifts lead to disparities in eRNA identification, but do not impact analyses aimed at inferring the key regulators involved in changes to transcription. Conclusions Run-on sequencing protocol variations result in technical signatures that can be used to identify both the enrichment and library preparation method of a particular data set. These technical signatures are batch effects that limit detailed comparisons of pausing ratios and eRNAs identified across protocols. However, these batch effects have only limited impact on our ability to infer which regulators underlie the observed transcriptional changes.

Journal Article

Early-life undernutrition induces enhancer RNA remodeling in mice liver

by Wang, Yinyu , Yi, Xianfu , Huang, Hefeng in Animal Genetics and Genomics , Animals , Biomedical and Life Sciences

2021

Background Maternal protein restriction diet (PRD) increases the risk of metabolic dysfunction in adulthood, but the mechanisms acting during the early life of offspring are still poorly understood. Apart from genetic factors, epigenetic mechanisms are crucial to offer phenotypic plasticity in response to environmental situations and transmission. Transcription of enhancer-associated noncoding RNAs (eRNAs) serves as a robust indicator of enhancer activation, and has potential roles in mediating enhancer functions and gene transcription. Results Using global run-on sequencing (GRO-seq) of nascent RNA including eRNA and total RNA sequencing data, we show that early-life undernutrition causes remodeling of enhancer activity in mouse liver. Differentially expressed nascent active genes were enriched in metabolic pathways. In addition, our work detected a large number of high-confidence enhancers based on eRNA transcription at the ages of 4 weeks and 7 weeks, respectively. Importantly, except for ~1000 remodeling enhancers, the early-life undernutrition induced instability of enhancer activity, which decreased at 4 weeks and increased in adulthood. eRNA transcription mainly contributes to the regulation of some important metabolic enzymes, suggesting a link between metabolic dysfunction and enhancer transcriptional control. We discovered a novel eRNA that is positively correlated with the expression of the circadian gene Cry1, with increased binding of the epigenetic cofactor p300. Conclusions Our study reveals novel insights into mechanisms of metabolic dysfunction. Enhancer activity in early life acts on metabolism-associated genes, leading to increased susceptibility to metabolic disorders.

Journal Article

Comparison of structural variant callers for massive whole-genome sequence data

by Kim, Jun , Kim, Seon-Young , Park, Ji-Hwan in Accuracy , Animal Genetics and Genomics , Biomedical and Life Sciences

2024

Background Detecting structural variations (SVs) at the population level using next-generation sequencing (NGS) requires substantial computational resources and processing time. Here, we compared the performances of 11 SV callers: Delly, Manta, GridSS, Wham, Sniffles, Lumpy, SvABA, Canvas, CNVnator, MELT, and INSurVeyor. These SV callers have been recently published and have been widely employed for processing massive whole-genome sequencing datasets. We evaluated the accuracy, sequence depth, running time, and memory usage of the SV callers. Results Notably, several callers exhibited better calling performance for deletions than for duplications, inversions, and insertions. Among the SV callers, Manta identified deletion SVs with better performance and efficient computing resources, and both Manta and MELT demonstrated relatively good precision regarding calling insertions. We confirmed that the copy number variation callers, Canvas and CNVnator, exhibited better performance in identifying long duplications as they employ the read-depth approach. Finally, we also verified the genotypes inferred from each SV caller using a phased long-read assembly dataset, and Manta showed the highest concordance in terms of the deletions and insertions. Conclusions Our findings provide a comprehensive understanding of the accuracy and computational efficiency of SV callers, thereby facilitating integrative analysis of SV profiles in diverse large-scale genomic datasets.

Journal Article

Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools

by Alganmi, Nofe , Abusamra, Heba in Analysis , Best practice , Bioinformatics

2023

The next-generation sequencing (NGS) technology represents a significant advance in genomics and medical diagnosis. Nevertheless, the time it takes to perform sequencing, data analysis, and variant interpretation is a bottleneck in using next-generation sequencing in precision medicine. For accurate and efficient performance in clinical diagnostic lab practice, a consistent data analysis pipeline is necessary to avoid false variant calls and achieve optimum accuracy. This study aims to compare the performance of two NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM and BWA-MEM2) and variant calling (GATK-HaplotypeCaller and DRAGEN-GATK). On Whole Exome Sequencing (WES) data, computational performance was assessed using several criteria, including mapping efficiency, variant calling performance, false positive calls rate, and time. We examined four gold-standard WES data sets: Ashkenazim father (NA24149), Ashkenazim mother (NA24143), Ashkenazim son (NA24385), and Asian son (NA25631). In addition, eighteen exome samples were analyzed based on different read counts, and coverage was used precisely in the run-time assessment. By using BWA-MEM2 and DRAGEN-GATK, this study achieved faster and more accurate detection for SNVs and indels than the standard GATK Best Practices workflow. This systematic comparison will enable the bioinformatics community to develop a more efficient and faster solution for analyzing NGS data.

Journal Article

Accelerating genomic workflows using NVIDIA Parabricks

by Gorrell, Laura M. , Engelken, Haley T. , Carlson, Thad B. in Algorithms , Amazon Web Services , Analysis

2023

Background As genome sequencing becomes better integrated into scientific research, government policy, and personalized medicine, the primary challenge for researchers is shifting from generating raw data to analyzing these vast datasets. Although much work has been done to reduce compute times using various configurations of traditional CPU computing infrastructures, Graphics Processing Units (GPUs) offer opportunities to accelerate genomic workflows by orders of magnitude. Here we benchmark one GPU-accelerated software suite called NVIDIA Parabricks on Amazon Web Services (AWS), Google Cloud Platform (GCP), and an NVIDIA DGX cluster. We benchmarked six variant calling pipelines, including two germline callers (HaplotypeCaller and DeepVariant) and four somatic callers (Mutect2, Muse, LoFreq, SomaticSniper). Results We achieved up to 65× acceleration with germline variant callers, bringing HaplotypeCaller runtimes down from 36 h to 33 min on AWS, 35 min on GCP, and 24 min on the NVIDIA DGX. Somatic callers exhibited more variation between the number of GPUs and computing platforms. On cloud platforms, GPU-accelerated germline callers resulted in cost savings compared with CPU runs, whereas some somatic callers were more expensive than CPU runs because their GPU acceleration was not sufficient to overcome the increased GPU cost. Conclusions Germline variant callers scaled well with the number of GPUs across platforms, whereas somatic variant callers exhibited more variation in the number of GPUs with the fastest runtimes, suggesting that, at least with the version of Parabricks used here, these workflows are less GPU optimized and require benchmarking on the platform of choice before being deployed at production scales. Our study demonstrates that GPUs can be used to greatly accelerate genomic workflows, thus bringing closer to grasp urgent societal advances in the areas of biosurveillance and personalized medicine.

Journal Article

LRScaf: improving draft genomes using long noisy reads

by Wu, Shigang , Ruan, Jue , Feng, Hu in Algorithms , Animal Genetics and Genomics , Arabidopsis thaliana

2019

Background The advent of third-generation sequencing (TGS) technologies opens the door to improved genome assembly. Long reads are promising for enhancing the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) technologies. To date, a few algorithms capable of improving draft assemblies have been released, including SSPACE-LongRead, OPERA-LG, SMIS, npScarf, DBG2OLC, Unicycler, and LINKS. Hybrid assembly on large genomes remains challenging, however. Results We develop a scalable and computationally efficient scaffolder, Long Reads Scaffolder (LRScaf, https://github.com/shingocat/lrscaf), that is capable of significantly boosting assembly contiguity using long reads. In this study, we summarise a comprehensive performance assessment for state-of-the-art scaffolders and LRScaf on seven organisms, i.e., E. coli, S. cerevisiae, A. thaliana, O. sativa, S. pennellii, Z. mays, and H. sapiens. LRScaf significantly improves the contiguity of draft assemblies, e.g., increasing the NGA50 value of CHM1 from 127.1 kbp to 9.4 Mbp using a 20-fold coverage PacBio dataset and the NGA50 value of NA12878 from 115.3 kbp to 12.9 Mbp using a 35-fold coverage Nanopore dataset. In addition, LRScaf generates the best contiguous NGA50 on A. thaliana, S. pennellii, Z. mays, and H. sapiens. Moreover, LRScaf has the shortest run time compared with other scaffolders, and the peak RAM of LRScaf remains practical for large genomes (e.g., 20.3 and 62.6 GB on CHM1 and NA12878, respectively). Conclusions The new algorithm, LRScaf, yields the best or, at least, moderate scaffold contiguity and accuracy in the shortest run time compared with other scaffolding algorithms. Furthermore, LRScaf provides a cost-effective way to improve contiguity of draft assemblies on large genomes.

Journal Article

Systematic and benchmarking studies of pipelines for mammal WGBS data in the novel NGS platform

by Zhang, Xin , Liu, Yong-feng , Lin, Qun-ting in Algorithms , Animals , BatMeth2

2023

Background Whole genome bisulfite sequencing (WGBS) can dissect the methylation status of 5-methylcytosine (5-mC) at nucleotide-level resolution on a genome-wide scale. It is a powerful technique for profiling the epigenome in various cell types and tissues. As a recently established next-generation sequencing (NGS) platform, GenoLab M is a promising alternative platform. However, its comprehensive evaluation for WGBS has not been reported. We sequenced two bisulfite-converted mammalian DNA samples in this research using our GenoLab M and NovaSeq 6000, respectively. Then, we systematically compared those data via four widely used WGBS tools (BSMAP, Bismark, BatMeth2, BS-Seeker2) and a new bisulfite-seq tool (BSBolt). We interrogated their computational time, genome depth and coverage, and evaluated their percentage of methylated Cs. Result Here, benchmarking a combination of pre- and post-processing methods, we found that trimming improved mapping efficiency in eight datasets. The data from two platforms uncovered ~80% of CpG sites genome-wide in the human cell line. Those data sequenced by GenoLab M achieved a far lower proportion of duplicates (~5.5%). Among pipelines, BSMAP provided an intriguing representation of 5-mC distribution at CpG sites with 5-mC levels > ~78% in datasets from human cell lines, especially in the GenoLab M. BSMAP showed advantages in running time, uniquely mapped reads percentages, genomic coverage, and quantitative accuracy. Finally, compared with the previous methylation pattern of human cell line and mouse tissue, we confirmed that the data from GenoLab M showed consistency and accuracy in methylation levels of CpG sites similar to those from NovaSeq 6000. Conclusion Together we confirmed that GenoLab M was a qualified NGS platform for WGBS with high performance. Our results showed that BSMAP was the suitable pipeline that allowed for WGBS studies on the GenoLab M platform.

Journal Article

Genomic Sequencing to Detect Cross-Breeding Quality in Dogs: An Example Studying Disorders in Sexual Development

by Cicirelli, Vincenzo , de Gennaro, Luciana , Burgio, Matteo in Animals , Breeding , Causes of

2024

Disorders of sexual development (DSDs) in dogs, similar to humans, arise from genetic mutations, gonadal differentiation, or phenotypic sex development. The French Bulldog, a breed that has seen a surge in popularity and demand, has also shown a marked increase in DSD incidence. This study aims to characterize the genetic underpinnings of DSDs in a French Bulldog named Brutus, exhibiting ambiguous genitalia and internal sexual anatomy, and to explore the impact of breeding practices on genetic diversity within the breed. We utilized a comprehensive approach combining conventional cytogenetics, molecular techniques, and deep sequencing to investigate the genetic profile of Brutus. The sequence data were compared to three other male French Bulldogs’ genome sequences with typical reproductive anatomy, including Brutus’s father and the canine reference genome (CanFam6). We found a Robertsonian fusion involving chromosome 23 previously reported in dogs as a causative mutation responsible for sex reversal syndrome. Our findings revealed a 22% mosaicism (78,XX/77,XX), the absence of the sex-determining region (SRY) gene, and the presence of 43 unique Single Nucleotide Variants (SNVs) not inherited from the father. Notably, the run of homozygosity (ROH) analysis showed Brutus has a higher number of homozygous segments compared to other Bulldogs, with a total length of these fragments 50% greater than the average, strongly suggesting this dog is the product of the mating between siblings. Although no direct causative genes for the DSD phenotype were identified, four candidate loci warrant further investigation. Our study highlighted the need for a better annotated and curated reference dog genome to define genes causative of any specific phenotype, suggests a potential genetic basis for the DSD phenotype in dogs, and underscores the consequences of uncontrolled breeding practices in French Bulldogs. 
These findings highlight the importance of implementing strategic genetic management to preserve genetic health and diversity in canine populations.

Journal Article

DREAM-Stellar: parallel and space efficient exact local alignment

by Gottlieb, Simon Gene , Aasna, Evelin , Ehrhardt, Marcel in Algorithms , Alignment , Amino acid sequence

2026

Background Searching large genomic data sets for local alignments poses a computational challenge. A particular obstacle is the handling of repetitive sequences that appear in various contexts and incur a high runtime cost. For practical homology search, it is important to develop a specific but sensitive filter. Good filters reduce the search space before alignment without missing significant matches. Results We introduce DREAM-Stellar, a parallelized, updated version of the pairwise local aligner Stellar. The new aligner, DREAM-Stellar, is composed of four steps: preprocessing the queries and references, building a data structure for distributing the queries, computing the results in parallel and finally combining them. For distributing the queries we use the IBF data structure and a new prefilter for local alignments. We present our comparison of five local aligners on simulated and real genomic data and conclude that heuristic tools like BLAST miss a large percentage of significant local alignments or "drown" them in millions of less significant matches. This new version of Stellar is up to 900 times faster on 32 parallel threads than its single-threaded predecessor and can find all alignments between a pair of genomes in minutes. With that, the runtime of DREAM-Stellar is on par with tools like BLAST. Conclusions DREAM-Stellar is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. The software is freely available for Linux and Mac OS X at https://github.com/seqan/dream-stellar

Journal Article
