Catalogue Search | MBRL

DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation

by Cheng, Albert , Rosikiewicz, Wojciech , Li, Sheng in 5-Methylcytosine - analysis , Accuracy , Animal Genetics and Genomics

2021

Background Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies. Results We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that the methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions shows room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts. Conclusions Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.

Journal Article

Understanding prokaryotic adaptation through advanced DNA methylation detection techniques

by Chen, Ziming , Ross, Elizabeth M , Ong, Chian Teng in Adaptation , Adaptation, Physiological , Adenine

2025

DNA methylation, a versatile epigenetic modification in prokaryotes, is a crucial regulator of various biological activities, such as genome defence, gene expression, and DNA repair. The most common DNA methylation form in prokaryotes is N6-methyladenine, where a methyl group is added to the adenine. Orphan and restriction-modification system methylases constitute the main methylation systems in prokaryotes. Prokaryotes can adapt to environmental fluctuations through orphan methylase regulation and phase variation of restriction-modification systems, which generate diversified methylomes that modulate the expression of genes. Modern sequencing techniques, including single-molecule real-time sequencing and Nanopore sequencing, enable the characterization of several methylation patterns simultaneously and facilitate the study of prokaryotic epigenomics. This review introduces the prokaryotic DNA methylation systems and prokaryotic adaptation through DNA methylation. Finally, we summarize the current sequencing techniques capable of characterizing methylation forms applicable to prokaryotes and their future perspectives.

Journal Article

Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity

by Pisupati, Rahul , Burns, Robin , Rabanal, Fernando A. in Animal Genetics and Genomics , Arabidopsis , Arabidopsis - genetics

2023

Background It is apparent that genomes harbor much structural variation that is largely undetected for technical reasons. Such variation can cause artifacts when short-read sequencing data are mapped to a reference genome. Spurious SNPs may result from mapping of reads to unrecognized duplicated regions. Calling SNPs using the raw reads of the 1001 Arabidopsis Genomes Project, we identified 3.3 million (44%) heterozygous SNPs. Given that Arabidopsis thaliana (A. thaliana) is highly selfing, and that extensively heterozygous individuals have been removed, we hypothesize that these SNPs reflected cryptic copy number variation. Results The heterozygosity we observe consists of particular SNPs being heterozygous across individuals in a manner that strongly suggests it reflects shared segregating duplications rather than random tracts of residual heterozygosity due to occasional outcrossing. Focusing on such pseudo-heterozygosity in annotated genes, we use genome-wide association to map the position of the duplicates. We identify 2500 putatively duplicated genes and validate them using de novo genome assemblies from six lines. Specific examples included an annotated gene and nearby transposon that transpose together. We also demonstrate that cryptic structural variation produces highly inaccurate estimates of DNA methylation polymorphism. Conclusions Our study confirms that most heterozygous SNP calls in A. thaliana are artifacts and suggests that great caution is needed when analyzing SNP data from short-read sequencing. The finding that 10% of annotated genes exhibit copy-number variation, and the realization that neither gene- nor transposon-annotation necessarily tells us what is actually mobile in the genome, suggest that future analyses based on independently assembled genomes will be very informative.

Journal Article

Assessment of nanopore RNA modification calling in human cell lines and synthetic systems

by Akeson, Stuart , Daneshvar Kakhaki, Pooria , Ghohabi Esfahani, Neda in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2026

Background Nanopore technology enables the direct sequencing of intact RNA molecules, allowing for the detection of native chemical modifications. In 2024, Oxford Nanopore Technologies updated direct RNA sequencing from the RNA002 to the RNA004 platform and released an improved basecaller (Dorado) capable of de novo detection of eight RNA modifications. We compare RNA002 and RNA004 platforms for poly(A) RNA from GM12878 and HEK293 cell lines and evaluate Dorado-based RNA modification calling. Results We compute U-to-C mismatches, previously used to identify putative pseudouridine sites, and run m6anet for identifying putative N6-methyladenosine sites. We find that Dorado identifies global and site-specific differences when compared to RNA002 methods. We examine eight RNA modifications detected by Dorado for Nanopore direct RNA sequencing data and propose an analysis strategy for curating RNA modification predictions, including thresholds for read coverage and modification occupancy, canonical RNA-based false positive correction, and comparison with orthogonal information. When comparing modification sites called by Dorado versus those documented by orthogonal datasets, we note significant discordance and we document disagreements between our results and orthogonal datasets. Conclusions The transition from RNA002 to RNA004 substantially improves sequencing accuracy and modification calling. However, Nanopore direct RNA sequencing-based RNA modification detection requires careful validation. We recommend combining Nanopore direct RNA sequencing with orthogonal methods and appropriate filtering strategies for increased confidence in modification calls.

Journal Article

Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection

by Brierley, Liam , Fowler, Anna , Jorgensen, Andrea in Analysis , Animal Genetics and Genomics , Artifact identification

2024

Background Circulating tumour DNA (ctDNA) is a subset of cell free DNA (cfDNA) released by tumour cells into the bloodstream. Circulating tumour DNA has shown great potential as a biomarker to inform treatment in cancer patients. Collecting ctDNA is minimally invasive and reflects the entire genetic makeup of a patient’s cancer. ctDNA variants in NGS data can be difficult to distinguish from sequencing and PCR artefacts due to low abundance, particularly in the early stages of cancer. Unique Molecular Identifiers (UMIs) are short sequences ligated to the sequencing library before amplification. These sequences are useful for filtering out low frequency artefacts. The utility of ctDNA as a cancer biomarker depends on accurate detection of cancer variants. Results In this study, we benchmarked six variant calling tools, including two UMI-aware callers, for their ability to call ctDNA variants. The standard variant callers tested included Mutect2, bcftools, LoFreq and FreeBayes. The UMI-aware variant callers benchmarked were UMI-VarCal and UMIErrorCorrect. We used both datasets with known variants spiked in at low frequencies, and datasets containing ctDNA, and generated synthetic UMI sequences for these datasets. Variant callers displayed different preferences for sensitivity and specificity. Mutect2 showed high sensitivity, while returning more privately called variants than any other caller in data without synthetic UMIs, an indicator of false positive variant discovery. In data encoded with synthetic UMIs, UMI-VarCal detected fewer putative false positive variants than all other callers in synthetic datasets. Mutect2 showed a balance between high sensitivity and specificity in data encoded with synthetic UMIs. Conclusions Our results indicate UMI-aware variant callers have potential to improve sensitivity and specificity in calling low frequency ctDNA variants over standard variant calling tools. There is a growing need for further development of UMI-aware variant calling tools if effective early detection methods for cancer using ctDNA samples are to be realised.

Journal Article

The evaluation of different combinations of enzyme set, aligner and caller in GBS sequencing of soybean

by Zamalutdinov, Aleksei , Boldyrev, Stepan , Ben, Cécile in Biological Techniques , Biomedical and Life Sciences , Breeding of animals

2025

Background Genotyping-by-sequencing (GBS) is a cost-effective method for large-scale genotyping, widely used across various species, particularly those with large genomes. A critical aspect of GBS lies in the selection of restriction enzymes for genome digestion and the optimization of data analysis pipelines. However, few studies have comprehensively examined the combined effects of enzyme choice and pipeline configuration. Results In this study, we created GBS libraries using three commonly used restriction enzyme combinations (HindIII-NlaIII, PstI-MspI, and ApeKI) and assessed multiple SNP-calling pipelines in 15 soybean varieties. We tested four aligners (BWA-MEM, Bowtie2, BBMap, and Strobealign) and seven SNP callers (Bcftools, Stacks, DeepVariant, FreeBayes, VarScan, BBCallVariants, and GATK). Our findings reveal that enzyme choice significantly influences the number of identified SNPs, gene localization preferences, and accuracy. Furthermore, the performance of SNP callers varied markedly in terms of SNP count, precision, recall, and false discovery rate (FDR). DeepVariant exhibited the highest accuracy, with 76.0% of its SNPs intersecting with whole-genome sequencing (WGS)-derived SNPs and an FDR of 0.0095, compared to FreeBayes, which had 47.8% intersection and an FDR of 0.6321. Conclusions Our findings underscore the importance of optimizing both enzyme selection for sequencing libraries and data analysis pipelines to ensure robust and reproducible results. This study provides a general framework for designing large-scale genotyping experiments aimed at specific quality and quantity requirements in various plant species.
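As an illustration of the caller metrics named in this abstract, the following is a minimal sketch of how precision, recall, and FDR are commonly computed from a caller's SNP set and a reference ("truth") set such as WGS-derived SNPs. The function name `evaluate_calls`, the tuple layout, and the toy data are assumptions for illustration only; the study's own definitions and filtering may differ, so this does not reproduce its reported numbers.

```python
def evaluate_calls(called, truth):
    """Compare a set of called SNPs with a truth set.

    Each SNP is a hashable key; here (chrom, pos, ref, alt) is assumed.
    Uses the textbook definitions: precision = TP / (TP + FP),
    recall = TP / (TP + FN), FDR = FP / (TP + FP).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    fdr = fp / (tp + fp) if called else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "fdr": fdr}


# Toy data, invented for this example.
called_snps = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
truth_snps = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr3", 10, "A", "C"),
}
metrics = evaluate_calls(called_snps, truth_snps)
print(metrics)
```

Under these definitions FDR equals one minus precision, so a benchmark that reports both independently is likely using a different denominator or truth set for one of them.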

Journal Article

HERON: A Novel Tool Enables Identification of Long, Weakly Enriched Genomic Domains in ChIP-seq Data

by Wilczynski, Bartek , Macioszek, Anna in Algorithms , Datasets , Experiments

2021

The explosive development of next-generation sequencing-based technologies has allowed us to take an unprecedented look at many molecular signatures of the non-coding genome. In particular, the ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing) technique is now very commonly used to assess the proteins associated with different non-coding DNA regions genome-wide. While the analysis of such data related to transcription factor binding is relatively straightforward, many modified histone variants, such as H3K27me3, are very important for the process of gene regulation but are very difficult to interpret. We propose a novel method, called HERON (HiddEn MaRkov mOdel based peak calliNg), for genome-wide data analysis that is able to detect DNA regions enriched for a certain feature, even in difficult settings of weakly enriched long DNA domains. We demonstrate the performance of our method both on simulated and experimental data.

Journal Article

The Pattern and Distribution of Induced Mutations in J. curcas Using Reduced Representation Sequencing

by Maghuly, Fatemeh , Krainer, Julie , Pabinger, Stephan in biofuel , Bioinformatics , Biosynthesis

2018

Mutagenesis in combination with Genotyping by Sequencing (GBS) is a powerful tool for introducing variation, studying gene function and identifying causal mutations underlying phenotypes of interest in crop plant genomes. About 400 million paired-end reads were obtained from 82 ethylmethane sulfonate (EMS) induced mutants and 14 wild-type accessions of J. curcas for the detection of Single Nucleotide Polymorphisms (SNPs) and Insertion/Deletions (InDels) by two different approaches (nGBS and ddGBS) on an Illumina HiSeq 2000 sequencer. Using bioinformatics analyses, 1,452 induced SNPs and InDels were identified in coding regions, which were distributed across 995 genes. The predominantly observed mutations were G/C to A/T transitions (64%), while transversions were observed at a lower frequency (36%). Regarding the effect of mutations on gene function, 18% of the mutations were located in intergenic regions. In fact, mutants with the highest number of heterozygous SNPs were found in samples treated with 0.8% EMS for 3 h. Reconstruction of the metabolic pathways showed that in total 16 SNPs were located in six KEGG pathways by nGBS and two pathways by ddGBS. The most highly represented pathways were ether-lipid metabolism and glycerophospholipid metabolism, followed by starch and sucrose metabolism by nGBS and triterpenoid biosynthesis as well as steroid biosynthesis by ddGBS. Furthermore, high genome methylation was observed in J. curcas, which might help to understand the plasticity of the genome in response to environmental factors. Finally, the results showed that continuously vegetatively propagated tissue is a fast, efficient and accurate method to dissolve chimeras, especially for long-lived plants like J. curcas. Obtained data showed that allelic variations and analyses of gene functions (gene function prediction), which control important traits, could be identified in mutant populations using nGBS and ddGBS. However, the handling of GBS data is more difficult and more challenging than the traditional TILLING strategy in mutated plants, since the genome sequence is incomplete, which makes alignment and variant analysis of target sequence reads challenging to perform and interpret. Therefore, providing a complete reference genome sequence with high quality should be a priority for any breeding program.
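The transition and transversion categories reported in this abstract can be illustrated with a short sketch. It classifies single-nucleotide substitutions by whether the two bases belong to the same chemical class (purine or pyrimidine) and tallies G/C to A/T transitions separately. The function names and the example substitutions are hypothetical and are not taken from the study's pipeline or data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    # A transition stays within one class (purine<->purine or
    # pyrimidine<->pyrimidine); a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def is_gc_to_at_transition(ref, alt):
    """True for G>A or C>T, the two strand-equivalent G/C to A/T transitions."""
    return (ref.upper(), alt.upper()) in {("G", "A"), ("C", "T")}


# Toy substitutions, invented for this example.
substitutions = [("G", "A"), ("C", "T"), ("A", "G"), ("G", "T"), ("C", "A")]
counts = Counter(classify_substitution(r, a) for r, a in substitutions)
gc_to_at = sum(is_gc_to_at_transition(r, a) for r, a in substitutions)
print(counts, gc_to_at)
```

Counting G>A and C>T together reflects that the same mutation appears as one or the other depending on which strand is reported.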

Journal Article
