Catalogue Search | MBRL
8 result(s) for "Methylation calling"
DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation
by Cheng, Albert; Rosikiewicz, Wojciech; Li, Sheng
in 5-Methylcytosine - analysis; Accuracy; Animal Genetics and Genomics
2021
Background
Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies.
Results
We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that the methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions shows room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts.
Conclusions
Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.
Journal Article
Understanding prokaryotic adaptation through advanced DNA methylation detection techniques
by Chen, Ziming; Ross, Elizabeth M; Ong, Chian Teng
in Adaptation; Adaptation, Physiological; Adenine
2025
DNA methylation, a versatile epigenetic modification in prokaryotes, is a crucial regulator of various biological activities, such as genome defence, gene expression, and DNA repair. The most common DNA methylation form in prokaryotes is N6-methyladenine, where a methyl group is added to the adenine. Orphan and restriction-modification system methylases constitute the main methylation systems in prokaryotes. Prokaryotes can adapt to environmental fluctuations through orphan methylase regulation and phase variation of restriction-modification systems, which generate diversified methylomes that modulate the expression of genes. Modern sequencing techniques, including single-molecule real-time sequencing and Nanopore sequencing, enable the characterization of several methylation patterns simultaneously and facilitate the study of prokaryotic epigenomics. This review introduces the prokaryotic DNA methylation systems and prokaryotic adaptation through DNA methylation. Finally, we summarize the current sequencing techniques capable of characterizing methylation forms applicable to prokaryotes and their future perspectives.
Journal Article
Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity
by Pisupati, Rahul; Burns, Robin; Rabanal, Fernando A.
in Animal Genetics and Genomics; Arabidopsis; Arabidopsis - genetics
2023
Background
It is apparent that genomes harbor much structural variation that is largely undetected for technical reasons. Such variation can cause artifacts when short-read sequencing data are mapped to a reference genome. Spurious SNPs may result from mapping of reads to unrecognized duplicated regions. Calling SNPs using the raw reads of the 1001 Arabidopsis Genomes Project, we identified 3.3 million (44%) heterozygous SNPs. Given that Arabidopsis thaliana (A. thaliana) is highly selfing, and that extensively heterozygous individuals have been removed, we hypothesize that these SNPs reflected cryptic copy number variation.
Results
The heterozygosity we observe consists of particular SNPs being heterozygous across individuals in a manner that strongly suggests it reflects shared segregating duplications rather than random tracts of residual heterozygosity due to occasional outcrossing. Focusing on such pseudo-heterozygosity in annotated genes, we use genome-wide association to map the position of the duplicates. We identify 2500 putatively duplicated genes and validate them using de novo genome assemblies from six lines. Specific examples included an annotated gene and nearby transposon that transpose together. We also demonstrate that cryptic structural variation produces highly inaccurate estimates of DNA methylation polymorphism.
Conclusions
Our study confirms that most heterozygous SNP calls in A. thaliana are artifacts and suggests that great caution is needed when analyzing SNP data from short-read sequencing. The finding that 10% of annotated genes exhibit copy-number variation, and the realization that neither gene- nor transposon-annotation necessarily tells us what is actually mobile in the genome, suggest that future analyses based on independently assembled genomes will be very informative.
Journal Article
Assessment of nanopore RNA modification calling in human cell lines and synthetic systems
by Akeson, Stuart; Daneshvar Kakhaki, Pooria; Ghohabi Esfahani, Neda
in Animal Genetics and Genomics; Bioinformatics; Biomedical and Life Sciences
2026
Background
Nanopore technology enables the direct sequencing of intact RNA molecules, allowing for the detection of native chemical modifications. In 2024, Oxford Nanopore Technologies updated its direct RNA sequencing platform from RNA002 to RNA004 and released an improved basecaller (Dorado) capable of de novo detection of eight RNA modifications. We compare RNA002 and RNA004 platforms for poly(A) RNA from GM12878 and HEK293 cell lines and evaluate Dorado-based RNA modification calling.
Results
We compute U-to-C mismatches, previously used to identify putative pseudouridine sites, and run m6anet for identifying putative N6-methyladenosine sites. We find that Dorado identifies global and site-specific differences when compared to RNA002 methods. We examine eight RNA modifications detected by Dorado for Nanopore direct RNA sequencing data and propose an analysis strategy for curating RNA modification predictions, including thresholds for read coverage and modification occupancy, canonical RNA-based false positive correction, and comparison with orthogonal information. When comparing modification sites called by Dorado versus those documented by orthogonal datasets, we note significant discordance and we document disagreements between our results and orthogonal datasets.
Conclusions
The transition from RNA002 to RNA004 substantially improves sequencing accuracy and modification calling. However, Nanopore direct RNA sequencing-based RNA modification detection requires careful validation. We recommend combining Nanopore direct RNA sequencing with orthogonal methods and appropriate filtering strategies for increased confidence in modification calls.
Journal Article
Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection
by Brierley, Liam; Fowler, Anna; Jorgensen, Andrea
in Analysis; Animal Genetics and Genomics; Artifact identification
2024
Background
Circulating tumour DNA (ctDNA) is a subset of cell free DNA (cfDNA) released by tumour cells into the bloodstream. Circulating tumour DNA has shown great potential as a biomarker to inform treatment in cancer patients. Collecting ctDNA is minimally invasive and reflects the entire genetic makeup of a patient’s cancer. ctDNA variants in NGS data can be difficult to distinguish from sequencing and PCR artefacts due to low abundance, particularly in the early stages of cancer. Unique Molecular Identifiers (UMIs) are short sequences ligated to the sequencing library before amplification. These sequences are useful for filtering out low frequency artefacts. The utility of ctDNA as a cancer biomarker depends on accurate detection of cancer variants.
Results
In this study, we benchmarked six variant calling tools, including two UMI-aware callers, for their ability to call ctDNA variants. The standard variant callers tested included Mutect2, bcftools, LoFreq and FreeBayes. The UMI-aware variant callers benchmarked were UMI-VarCal and UMIErrorCorrect. We used both datasets with known variants spiked in at low frequencies, and datasets containing ctDNA, and generated synthetic UMI sequences for these datasets. Variant callers displayed different preferences for sensitivity and specificity. Mutect2 showed high sensitivity, while returning more privately called variants than any other caller in data without synthetic UMIs – an indicator of false positive variant discovery. In data encoded with synthetic UMIs, UMI-VarCal detected fewer putative false positive variants than all other callers in synthetic datasets. Mutect2 showed a balance between high sensitivity and specificity in data encoded with synthetic UMIs.
Conclusions
Our results indicate UMI-aware variant callers have potential to improve sensitivity and specificity in calling low frequency ctDNA variants over standard variant calling tools. There is a growing need for further development of UMI-aware variant calling tools if effective early detection methods for cancer using ctDNA samples are to be realised.
Journal Article
The evaluation of different combinations of enzyme set, aligner and caller in GBS sequencing of soybean
by Zamalutdinov, Aleksei; Boldyrev, Stepan; Ben, Cécile
in Biological Techniques; Biomedical and Life Sciences; Breeding of animals
2025
Background
Genotype-by-sequencing (GBS) is a cost-effective method for large-scale genotyping, widely used across various species, particularly those with large genomes. A critical aspect of GBS lies in the selection of restriction enzymes for genome digestion and the optimization of data analysis pipelines. However, few studies have comprehensively examined the combined effects of enzyme choice and pipeline configuration.
Results
In this study, we created GBS libraries using three commonly used restriction enzyme combinations (HindIII-NlaIII, PstI-MspI, and ApeKI) and assessed multiple SNP-calling pipelines in 15 soybean varieties. We tested four aligners (BWA-MEM, Bowtie2, BBMap, and Strobealign) and seven SNP callers (Bcftools, Stacks, DeepVariant, FreeBayes, VarScan, BBCallVariants, and GATK). Our findings reveal that enzyme choice significantly influences the number of identified SNPs, gene localization preferences, and accuracy. Furthermore, the performance of SNP callers varied markedly in terms of SNP count, precision, recall, and false discovery rate (FDR). DeepVariant exhibited the highest accuracy, with 76.0% of its SNPs intersecting with whole-genome sequencing (WGS)-derived SNPs and an FDR of 0.0095, compared to FreeBayes, which had 47.8% intersection and an FDR of 0.6321.
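For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of the textbook definitions of precision, recall and FDR for a call set compared against a truth set. The function name `caller_metrics`, the variant-key layout and the toy data are illustrative assumptions. The record does not describe how the authors computed their figures, so this sketch is not expected to reproduce them.

```python
def caller_metrics(called, truth):
    """Compare a call set with a truth set (iterables of hashable variant keys).

    Textbook definitions, assumed here for illustration:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      FDR       = FP / (TP + FP) = 1 - precision
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # calls confirmed by the truth set
    fp = len(called - truth)   # calls absent from the truth set
    fn = len(truth - called)   # truth variants the caller missed
    n_called = tp + fp
    n_truth = tp + fn
    return {
        "precision": tp / n_called if n_called else 0.0,
        "recall": tp / n_truth if n_truth else 0.0,
        "fdr": fp / n_called if n_called else 0.0,
    }


# Toy data (hypothetical): variants keyed as (chromosome, position, ref, alt).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),  # not in the truth set, so counted as a false positive
}

metrics = caller_metrics(called, truth)
print(metrics)
```

Keying each variant by position and alleles means that a call at the right position with the wrong allele counts as both a false positive and a false negative, which is one common convention among several.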
Conclusions
Our findings underscore the importance of optimizing both enzyme selection for sequencing libraries and data analysis pipelines to ensure robust and reproducible results. This study provides a general framework for designing large-scale genotyping experiments aimed at specific quality and quantity requirements in various plant species.
Journal Article
HERON: A Novel Tool Enables Identification of Long, Weakly Enriched Genomic Domains in ChIP-seq Data
2021
The explosive development of next-generation sequencing-based technologies has allowed us to take an unprecedented look at many molecular signatures of the non-coding genome. In particular, the ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing) technique is now very commonly used to assess the proteins associated with different non-coding DNA regions genome-wide. While the analysis of such data related to transcription factor binding is relatively straightforward, many modified histone variants, such as H3K27me3, are very important for the process of gene regulation but are very difficult to interpret. We propose a novel method, called HERON (HiddEn MaRkov mOdel based peak calliNg), for genome-wide data analysis that is able to detect DNA regions enriched for a certain feature, even in difficult settings of weakly enriched long DNA domains. We demonstrate the performance of our method both on simulated and experimental data.
Journal Article
The Pattern and Distribution of Induced Mutations in J. curcas Using Reduced Representation Sequencing
2018
Mutagenesis in combination with Genotyping by Sequencing (GBS) is a powerful tool for introducing variation, studying gene function and identifying causal mutations underlying phenotypes of interest in crop plant genomes. About 400 million paired-end reads were obtained from 82 ethylmethane sulfonate (EMS) induced mutants and 14 wild-type accessions of J. curcas for the detection of Single Nucleotide Polymorphisms (SNPs) and Insertion/Deletions (InDels) by two different approaches (nGBS and ddGBS) on an Illumina HiSeq 2000 sequencer. Using bioinformatics analyses, 1,452 induced SNPs and InDels were identified in coding regions, which were distributed across 995 genes. The predominantly observed mutations were G/C to A/T transitions (64%), while transversions were observed at a lower frequency (36%). Regarding the effect of mutations on gene function, 18% of the mutations were located in intergenic regions. In fact, mutants with the highest number of heterozygous SNPs were found in samples treated with 0.8% EMS for 3 h. Reconstruction of the metabolic pathways showed that in total 16 SNPs were located in six KEGG pathways by nGBS and two pathways by ddGBS. The most highly represented pathways were ether-lipid metabolism and glycerophospholipid metabolism, followed by starch and sucrose metabolism by nGBS and triterpenoid biosynthesis as well as steroid biosynthesis by ddGBS. Furthermore, high genome methylation was observed in J. curcas, which might help to understand the plasticity of the J. curcas genome in response to environmental factors. Finally, the results showed that continuously vegetatively propagated tissue is a fast, efficient and accurate method to dissolve chimeras, especially for long-lived plants like J. curcas. Obtained data showed that allelic variations and analyses of gene functions (gene function prediction), which control important traits, could be identified in mutant populations using nGBS and ddGBS. However, the handling of GBS data is more difficult and more challenging than the traditional TILLING strategy in mutated plants, since the J. curcas genome sequence is incomplete, which makes alignment and variant analysis of target sequence reads challenging to perform and interpret. Therefore, providing a complete J. curcas reference genome sequence with high quality should be a priority for any breeding program.
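The transition/transversion split reported in this abstract rests on a standard classification: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The sketch below illustrates that classification only. The function names and the toy substitution list are hypothetical and are not taken from the study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class (both purines or both pyrimidines) means transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def substitution_fractions(substitutions):
    """Fraction of transitions and transversions in a list of (ref, alt) pairs."""
    counts = Counter(classify_substitution(r, a) for r, a in substitutions)
    total = sum(counts.values())
    if total == 0:
        return {"transition": 0.0, "transversion": 0.0}
    return {kind: counts[kind] / total for kind in ("transition", "transversion")}


# Toy input (hypothetical): G>A and C>T are the G/C to A/T transitions.
example = [("G", "A"), ("C", "T"), ("G", "A"), ("A", "T")]
fractions = substitution_fractions(example)
print(fractions)
```

InDels are outside the scope of this sketch, so the function rejects anything that is not a single-base change instead of guessing a class for it.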
Journal Article