Catalogue Search | MBRL
101 result(s) for "Luo, Ruibang"
A multi-task convolutional deep neural network for variant calling in single molecule sequencing
2019
The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5–15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
Single Molecule Sequencing (SMS) technologies generate long but noisy reads data. Here, the authors develop Clairvoyante, a deep neural network-based method for variant calling with SMS reads such as PacBio and ONT data.
Journal Article
Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports
2022
Pre-training lays the foundation for recent successes in radiograph analysis supported by deep learning. It learns transferable image representations by conducting large-scale fully- or self-supervised learning on a source domain; however, supervised pre-training requires a complex and labour-intensive two-stage human-assisted annotation process, whereas self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology called reviewing free-text reports for supervision (REFERS), which acquires free supervision signals from the original radiology reports accompanying the radiographs. The proposed approach employs a vision transformer and is designed to learn joint representations from multiple views within every patient study. REFERS outperforms its transfer learning and self-supervised learning counterparts on four well-known X-ray datasets under extremely limited supervision. Moreover, REFERS even surpasses methods based on a source domain of radiographs with human-assisted structured labels; it therefore has the potential to replace canonical pre-training methodologies.
To train machine learning models for medical imaging, large amounts of training data are needed. Zhou and colleagues instead propose a method of weak supervision which uses the information of radiology reports to learn visual features without the need for expert labelling.
Journal Article
cuteFC: regenotyping structural variants through an accurate and efficient force-calling method
2025
Long-read sequencing technologies have great potential for the comprehensive discovery of structural variations (SVs). However, accurate genotype assignment for SVs remains challenging due to unavoidable sequencing errors, limited coverage, and the complexity of SVs. Herein, we propose cuteFC, which employs self-adaptive clustering along with a multiallele-aware clustering to achieve accurate SV regenotyping through a force-calling approach. cuteFC also applies a Genome Position Scanner algorithm to improve its application efficiency. Benchmarking evaluations demonstrate that cuteFC outperforms state-of-the-art methods with 2–5% higher F1 scores and constructs a higher-quality genomic atlas with minimal computational resources. cuteFC is available at https://github.com/Meltpinkg/cuteFC and https://zenodo.org/records/14671406.
Journal Article
Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression
by Yu, Jian; Zhang, Shilai; Sun, Jingfeng
in Analysis, Animal Genetics and Genomics, Arabidopsis
2012
Background
DNA methylation plays important biological roles in plants and animals. To examine the rice genomic methylation landscape and assess its functional significance, we generated single-base resolution DNA methylome maps for Asian cultivated rice Oryza sativa ssp. japonica, indica and their wild relatives, Oryza rufipogon and Oryza nivara.
Results
The overall methylation level of rice genomes is four times higher than that of Arabidopsis. Consistent with the results reported for Arabidopsis, methylation in promoters represses gene expression while gene-body methylation generally appears to be positively associated with gene expression. Interestingly, we discovered that methylation in gene transcriptional termination regions (TTRs) can significantly repress gene expression, and the effect is even stronger than that of promoter methylation. Through integrated analysis of genomic, DNA methylomic and transcriptomic differences between cultivated and wild rice, we found that primary DNA sequence divergence is the major determinant of methylational differences at the whole genome level, but DNA methylational difference alone can only account for limited gene expression variation between the cultivated and wild rice. Furthermore, we identified a number of genes with significant difference in methylation level between the wild and cultivated rice.
Conclusions
The single-base resolution methylomes of rice obtained in this study have not only broadened our understanding of the mechanism and function of DNA methylation in plant genomes, but also provided valuable data for future studies of rice epigenetics and the epigenetic differentiation between wild and cultivated rice.
Journal Article
ClairS-TO: a deep-learning method for long-read tumor-only somatic small variant calling
2025
Accurate detection of somatic variants in tumors is of critical importance and remains challenging. Current methods typically require matched normal samples for reliable detection, which are often unavailable in real-world research and clinical scenarios. Without a matched normal sample, more proficient algorithms are required to distinguish true somatic variants from germline variants and technical artifacts. However, existing tumor-only somatic variant callers that were designed for short-read sequencing data are not able to work well with long-read data. To fill the gap, we present ClairS-TO, a deep-learning-based method for long-read tumor-only somatic variant calling. ClairS-TO uses an ensemble of two disparate neural networks trained from the same samples but for opposite tasks: how likely/not likely a candidate is a somatic variant. Benchmarks using COLO829 and HCC1395 cancer cell lines show that ClairS-TO outperforms DeepSomatic and smrest in ONT and PacBio long-read data. ClairS-TO is also applicable to short-read data and outperforms Mutect2, Octopus, Pisces, and DeepSomatic. Extensive experiments across various sequencing coverages, variant allelic fractions, and tumor purities support that ClairS-TO is a reliable tool for somatic variant discovery. ClairS-TO is open-source, available at https://github.com/HKU-BAL/ClairS-TO.
The accurate detection of somatic variants in cancer without matched normal controls remains challenging, particularly for long-read sequencing data. Here, the authors develop ClairS-TO, a deep learning method for long-read tumour-only somatic variant calling that outperforms similar algorithms and can also work with short-read sequencing data.
Journal Article
Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP
by Yu, Huijing; Luo, Ruibang; Su, Junhao
in Algorithms, Bioinformatics, Biomedical and Life Sciences
2023
Background
With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data.
Results
We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP.
Conclusions
These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications.
Journal Article
Exploring the limit of using a deep neural network on pileup data for germline variant calling
2020
Single-molecule sequencing technologies have emerged in recent years and revolutionized structural variant calling, complex genome assembly and epigenetic mark detection. However, the lack of a highly accurate small variant caller has limited these technologies from being more widely used. Here, we present Clair, the successor to Clairvoyante, a program for fast and accurate germline small variant calling, using single-molecule sequencing data. For Oxford Nanopore Technology data, Clair achieves better precision, recall and speed than several competing programs, including Clairvoyante, Longshot and Medaka. Through studying the missed variants and benchmarking intentionally overfitted models, we found that Clair may be approaching the limit of possible accuracy for germline small variant calling using pileup data and deep neural networks. Clair requires only a conventional central processing unit (CPU) for variant calling and is an open-source project available at https://github.com/HKU-BAL/Clair.
A lack of accurate and efficient variant calling methods has held back single-molecule sequencing technologies from clinical applications. The authors present a deep-learning method for fast and accurate germline small variant calling, using single-molecule sequencing data.
Journal Article
Duet: SNP-assisted structural variant calling and phasing using Oxford nanopore sequencing
2022
Background
Whole genome sequencing using the long-read Oxford Nanopore Technologies (ONT) MinION sequencer provides a cost-effective option for structural variant (SV) detection in clinical applications. Despite the advantage of using long reads, however, accurate SV calling and phasing are still challenging.
Results
We introduce Duet, an SV detection tool optimized for SV calling and phasing using ONT data. The tool uses novel features integrated from both SV signatures and single-nucleotide polymorphism signatures, which can accurately distinguish SV haplotype from a false signal. Duet was benchmarked against state-of-the-art tools on multiple ONT sequencing datasets of sequencing coverage ranging from 8× to 40×. At low sequencing coverage of 8×, Duet performs better than all other tools in SV calling, SV genotyping and SV phasing. When the sequencing coverage is higher (20× to 40×), the F1-score for SV phasing is further improved in comparison to the performance of other tools, while its performance of SV genotyping and SV calling remains higher than other tools.
Conclusion
Duet can perform accurate SV calling, SV genotyping and SV phasing using low-coverage ONT data, making it very useful for low-coverage genomes. It has great performance when scaled to high-coverage genomes, which is adaptable to various clinical applications. Duet is open source and is available at https://github.com/yekaizhou/duet.
Journal Article