Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
4 result(s) for "Twin contrastive learning"
CGRclust: Chaos Game Representation for twin contrastive clustering of unlabelled DNA sequences
by Alipour, Fatemeh; Kari, Lila; Hill, Kathleen A.
in Algorithms, Alignment, Alignment-free DNA sequence comparison
2024
Background
Traditional supervised learning methods applied to DNA sequence taxonomic classification rely on the labor-intensive and time-consuming step of labelling the primary DNA sequences. Additionally, standard DNA classification/clustering methods involve time-intensive multiple sequence alignments, which impacts their applicability to large genomic datasets or distantly related organisms. These limitations indicate a need for robust, efficient, and scalable unsupervised DNA sequence clustering methods that do not depend on sequence labels or alignment.
Results
This study proposes CGRclust, a novel combination of unsupervised twin contrastive clustering of Chaos Game Representations (CGR) of DNA sequences, with convolutional neural networks (CNNs). To the best of our knowledge, CGRclust is the first method to use unsupervised learning for image classification (herein applied to two-dimensional CGR images) for clustering datasets of DNA sequences. CGRclust overcomes the limitations of traditional sequence classification methods by leveraging unsupervised twin contrastive learning to detect distinctive sequence patterns, without requiring DNA sequence alignment or biological/taxonomic labels. CGRclust accurately clustered twenty-five diverse datasets, with sequence lengths ranging from 664 bp to 100 kbp, including mitochondrial genomes of fish, fungi, and protists, as well as viral whole genome assemblies and synthetic DNA sequences. Compared with three recent clustering methods for DNA sequences (DeLUCS, iDeLUCS, and MeShClust v3.0), CGRclust is the only method that surpasses 81.70% accuracy across all four taxonomic levels tested for mitochondrial DNA genomes of fish. Moreover, CGRclust also consistently demonstrates superior performance across all the viral genomic datasets. The high clustering accuracy of CGRclust on these twenty-five datasets, which vary significantly in terms of sequence length, number of genomes, number of clusters, and level of taxonomy, demonstrates its robustness, scalability, and versatility.
Conclusion
CGRclust is a novel, scalable, alignment-free DNA sequence clustering method that uses CGR images of DNA sequences and CNNs for twin contrastive clustering of unlabelled primary DNA sequences, achieving accuracy and performance superior or comparable to current approaches. CGRclust demonstrated enhanced reliability by consistently achieving over 80% accuracy in more than 90% of the datasets analyzed. In particular, CGRclust performed especially well in clustering viral DNA datasets, where it consistently outperformed all competing methods.
Journal Article
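The Chaos Game Representation at the core of CGRclust maps a DNA sequence to points in the unit square: each nucleotide is assigned a corner, and the point moves halfway toward the corner of each successive base, so that rasterising the points yields a k-mer frequency image. A minimal NumPy sketch — the corner assignment and resolution here are common conventions, not necessarily the paper's exact ones:

```python
import numpy as np

# One common corner assignment for CGR (the paper's orientation may differ).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Chaos game: start at the square's centre and move halfway
    toward the corner of each successive nucleotide."""
    x, y = 0.5, 0.5
    pts = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts.append((x, y))
    return pts

def cgr_image(seq, k=6):
    """Rasterise the CGR points into a 2**k x 2**k occupancy image,
    closely related to a k-mer frequency matrix (FCGR)."""
    n = 2 ** k
    img = np.zeros((n, n))
    for x, y in cgr_points(seq):
        img[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
    return img

img = cgr_image("ACGTACGTGGCCAATT", k=2)
```

Images produced this way can be fed to a CNN like any other single-channel image, which is what makes image-based contrastive clustering applicable to raw DNA sequences.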
CLOUDSPAM: Contrastive Learning On Unlabeled Data for Segmentation and Pre-Training Using Aggregated Point Clouds and MoCo
by Mahmoudi Kouhi, Reza; Giguère, Philippe; Daniel, Sylvie
in contrastive learning, Data augmentation, Datasets
2024
SegContrast first paved the way for contrastive learning on outdoor point clouds. Its original formulation targeted individual scans in applications like autonomous driving and object detection. However, mobile mapping purposes such as digital twin cities and urban planning require large-scale dense datasets to capture the full complexity and diversity present in outdoor environments. In this paper, the SegContrast method is revisited and adapted to overcome its limitations associated with mobile mapping datasets, namely the scarcity of contrastive pairs and memory constraints. To overcome the scarcity of contrastive pairs, we propose the merging of heterogeneous datasets. However, this merging is not a straightforward procedure due to the variety of size and number of points in the point clouds of these datasets. Therefore, a data augmentation approach is designed to create a vast number of segments while optimizing the size of the point cloud samples to the allocated memory. This methodology, called CLOUDSPAM, guarantees the performance of the self-supervised model for both small- and large-scale mobile mapping point clouds. Overall, the results demonstrate the benefits of utilizing datasets with a wide range of densities and class diversity. CLOUDSPAM matched the state of the art on the KITTI-360 dataset, with a 63.6% mIoU, and came in second place on the Toronto-3D dataset. Finally, CLOUDSPAM achieved competitive results against its fully supervised counterpart with only 10% of labeled data.
Journal Article
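MoCo-style pre-training, which CLOUDSPAM builds on, is driven by the InfoNCE objective: pull an embedding toward an augmented view of the same segment (the positive key) and away from other segments (negative keys). A minimal NumPy sketch of that loss for a single query — embedding size, temperature, and the toy data are illustrative, not the authors' implementation:

```python
import numpy as np

def info_nce(query, pos_key, neg_keys, tau=0.07):
    """InfoNCE loss for one query: cross-entropy over cosine
    similarities, with the positive key at index 0."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    q, kp, kn = norm(query), norm(pos_key), norm(neg_keys)
    logits = np.concatenate([[q @ kp], kn @ q]) / tau
    logits -= logits.max()          # numerical stability
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
q = rng.normal(size=8)              # toy query embedding
negs = rng.normal(size=(16, 8))     # toy negative keys
loss_easy = info_nce(q, q, negs)    # positive is an identical view
loss_hard = info_nce(q, -q, negs)   # positive points the opposite way
```

The scarcity of contrastive pairs that the paper addresses corresponds to having too few distinct segments to serve as negatives; merging heterogeneous datasets enlarges that pool.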
A Data-Driven Approach for Leveraging Inline and Offline Data to Determine the Causes of Monoclonal Antibody Productivity Reduction in the Commercial-Scale Cell Culture Process
2024
The monoclonal antibody (mAb) manufacturing process comes with high profits and high costs, and thus mAb productivity is of vital importance. However, many factors can impact the cell culture process and lead to mAb productivity reduction. Nowadays, the biopharma industry is actively employing manufacturing information systems, which enable the integration of both online data and offline data. Although the volume of data is large, related data mining studies for mAb productivity improvement are rare. Therefore, a data-driven approach is proposed in this study to leverage both the inline and offline data of the cell culture process to discover the causes of mAb productivity reduction. The approach consists of four steps, namely data preprocessing, phase division, feature extraction and fusion, and cluster comparison. First, data quality issues are solved during the data preprocessing step. Next, the inline data are divided into several phases based on the moving window k-nearest neighbor method. Then, the inline data features are extracted via functional data analysis and combined with the offline data features. Finally, the causes of mAb productivity reduction are identified using the contrasting clusters via principal component analysis method. A commercial-scale cell culture process case study is provided in this research to verify the effectiveness of the approach. Data from 35 batches were collected, and each batch contained nine inline variables and seven offline variables. The causes of mAb productivity reduction were identified to be the lack of nutrients, and recommended actions were taken according to the result, which was subsequently proven by six validation batches.
Journal Article
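The "contrasting clusters via principal component analysis" step in the abstract above is closely related to contrastive PCA (cPCA), which finds directions that vary strongly in a target group (e.g. reduced-productivity batches) but weakly in a background group (normal batches). A sketch assuming the standard cPCA formulation — the study's actual variant, variable names, and toy data here are illustrative:

```python
import numpy as np

def contrastive_pca(foreground, background, alpha=1.0, n_components=2):
    """cPCA sketch: top eigenvectors of cov(foreground) - alpha*cov(background)
    highlight variation specific to the foreground group."""
    def cov(X):
        Xc = X - X.mean(0)
        return Xc.T @ Xc / (len(X) - 1)
    diff = cov(foreground) - alpha * cov(background)
    vals, vecs = np.linalg.eigh(diff)               # ascending eigenvalues
    comps = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return (foreground - foreground.mean(0)) @ comps, comps

# Toy example: variable 0 varies only in the "reduced-productivity" group.
rng = np.random.default_rng(1)
background = rng.normal(size=(500, 3))
foreground = rng.normal(size=(500, 3))
foreground[:, 0] *= 5.0
projected, comps = contrastive_pca(foreground, background, n_components=1)
```

Because the contrast subtracts background variance, the leading component isolates the variable that distinguishes the two groups rather than the largest overall variance direction.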
Feature-Differencing-Based Self-Supervised Pre-Training for Land-Use/Land-Cover Change Detection in High-Resolution Remote Sensing Images
2024
Land-use and land-cover (LULC) change detection (CD) is a pivotal research area in remote sensing applications, posing a significant challenge due to variations in illumination, radiation, and image noise between bi-temporal images. Currently, deep learning solutions, particularly convolutional neural networks (CNNs), represent the state of the art (SOTA) for CD. However, CNN-based models require substantial amounts of annotated data, which can be both expensive and time-consuming. Conversely, acquiring a large volume of unannotated images is relatively easy. Recently, self-supervised contrastive learning has emerged as a promising method for learning from unannotated images, thereby reducing the need for annotation. However, most existing methods employ random values or ImageNet pre-trained models to initialize their encoders and lack prior knowledge tailored to the demands of CD tasks, thus constraining the performance of CD models. To address these challenges, we introduce a novel feature-differencing-based framework called Barlow Twins for self-supervised pre-training and fine-tuning in CD (BTCD). The proposed approach employs absolute feature differences to directly learn unique representations associated with regions that have changed from unlabeled bi-temporal remote sensing images in a self-supervised manner. Moreover, we introduce invariant prediction loss and change consistency regularization loss to enhance image alignment between bi-temporal images in both the decision and feature space during network training, thereby mitigating the impact of variation in radiation conditions, noise, and imaging viewpoints. We select the improved UNet++ model for fine-tuning self-supervised pre-training models and conduct experiments using two publicly available LULC CD datasets. The experimental results demonstrate that our proposed approach outperforms existing SOTA methods in terms of competitive quantitative and qualitative performance metrics.
Journal Article
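The Barlow Twins objective that BTCD builds on needs no negative pairs: it drives the cross-correlation matrix between two augmented views' embeddings toward the identity, enforcing invariance on the diagonal and decorrelation off it. A minimal NumPy sketch — the weighting `lam` and toy batch are illustrative, not the paper's settings:

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins: on-diagonal terms of the cross-correlation matrix
    enforce invariance across views; off-diagonal terms reduce redundancy."""
    z1 = (z1 - z1.mean(0)) / z1.std(0)      # standardise per dimension
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = z1.T @ z2 / z1.shape[0]             # cross-correlation matrix (D x D)
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
aligned = barlow_twins_loss(z, z)                        # two identical "views"
mismatched = barlow_twins_loss(z, rng.normal(size=(256, 8)))
```

In the change-detection setting, replacing the two views with features of bi-temporal images (and differencing them, as the paper does) lets the same objective learn change-specific representations without labels.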