Catalogue Search | MBRL

Prediction of plant phase-separating proteins using positive-unlabeled learning

by Zhao, Anwen , Tian, Yisu , Wang, Xiangfeng in Algorithms , Animal Genetics and Genomics , Arabidopsis

2026

Liquid–liquid phase separation regulates biological processes through dynamic condensates. Despite its significance, experimentally validated phase-separating proteins in plants remain limited, complicating predictions. We overcome this gap by applying positive-unlabeled learning, a semi-supervised approach optimized for imbalanced datasets. Leveraging 6,559 reported plant phase-separating proteins from eight species, we train a model integrating sequence-structural features, enabling prediction of 174,656 high-confidence candidates across 14 species. Experimental validation confirms liquid–liquid phase separation in 67.9% of the candidate proteins from Arabidopsis, rice, and maize. This positive-unlabeled framework demonstrates robust predictive power while providing open resources to advance plant phase separation research.

Journal Article

SSLA: a semi-supervised framework for real-time injection detection and anomaly monitoring in cloud-based web applications with real-world implementation and evaluation

by Sefati, Seyed Salar , Fratu, Octavian , Halunga, Simona in Anomalies , Anomaly detection , Applications programs

2025

Injection attacks and anomalies pose significant threats to the security and reliability of cloud-based web applications. Traditional detection methods, such as rule-based systems and supervised learning techniques, often struggle to adapt to evolving threats and large-scale, unstructured log data. This paper introduces a novel framework, the Semi-Supervised Log Analyzer (SSLA), designed for real-time injection detection and anomaly monitoring in cloud environments. SSLA uses semi-supervised learning to utilize both labeled and unlabeled data, reducing the reliance on extensive annotated datasets. A similarity graph is built from the log data, allowing for effective anomaly detection using graph-based methods. At the same time, privacy-preserving techniques are integrated to protect sensitive information. The proposed method is evaluated on large-scale datasets, including Hadoop Distributed File System (HDFS) and BlueGene/L (BGL) logs, demonstrating superior performance in terms of precision, recall, and scalability compared to state-of-the-art methods. SSLA achieves high detection accuracy with minimal computational overhead, ensuring reliable, real-time protection for cloud-based web applications.
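The abstract above mentions building a similarity graph from log data. A minimal sketch of that general idea follows. The Jaccard token similarity, the 0.5 threshold, the "isolated node" anomaly rule, and the sample log lines are all assumptions made for illustration; the abstract does not state which similarity measure or graph method SSLA uses.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_similarity_graph(log_lines, threshold=0.5):
    """Link log lines whose token sets have Jaccard similarity >= threshold.

    The similarity measure and threshold are illustrative assumptions.
    Returns a dict mapping each line index to the set of its neighbours.
    """
    tokens = [set(line.lower().split()) for line in log_lines]
    graph = {i: set() for i in range(len(log_lines))}
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if jaccard(tokens[i], tokens[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def isolated_nodes(graph):
    """Flag nodes without neighbours as candidate anomalies (assumed rule)."""
    return sorted(node for node, neighbours in graph.items() if not neighbours)


# Hypothetical log lines, invented for this example.
logs = [
    "GET /index.html 200",
    "GET /about.html 200",
    "GET /index.html 200",
    "POST /login?user=admin'-- 500",
]
graph = build_similarity_graph(logs, threshold=0.5)
suspects = isolated_nodes(graph)
```

Here the three ordinary requests link to one another, while the last line shares no tokens with the rest and is left isolated.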

Journal Article

Persistent Laplacian-enhanced algorithm for scarcely labeled data classification

by Merkurjev, Ekaterina , Wei, Guo-Wei , Bhusal, Gokul in Algorithms , Artificial Intelligence , Classification

2024

The success of many machine learning (ML) methods depends crucially on having large amounts of labeled data. However, obtaining enough labeled data can be expensive, time-consuming, and subject to ethical constraints for many applications. One approach that has shown tremendous value in addressing this challenge is semi-supervised learning (SSL); this technique utilizes both labeled and unlabeled data during training, often with much less labeled data than unlabeled data, which is often relatively easy and inexpensive to obtain. In fact, SSL methods are particularly useful in applications where the cost of labeling data is especially expensive, such as medical analysis, natural language processing, or speech recognition. A subset of SSL methods that have achieved great success in various domains involves algorithms that integrate graph-based techniques. These procedures are popular due to the vast amount of information provided by the graphical framework. In this work, we propose an algebraic topology-based semi-supervised method called persistent Laplacian-enhanced graph MBO by integrating persistent spectral graph theory with the classical Merriman–Bence–Osher (MBO) scheme. Specifically, we use a filtration procedure to generate a sequence of chain complexes and associated families of simplicial complexes, from which we construct a family of persistent Laplacians. Overall, it is a very efficient procedure that requires much less labeled data to perform well compared to many ML techniques, and it can be adapted for both small and large datasets. We evaluate the performance of our method on classification, and the results indicate that the technique outperforms other existing semi-supervised algorithms.

Journal Article

Active vision enhancement of new media images based on semi supervised feature fusion algorithm

by Liu, Dan in Algorithms , Color , Computer Imaging

2024

To improve the visual quality of images, an active vision enhancement method for new media images based on a semi-supervised feature fusion algorithm is proposed. First, image features are extracted from three perspectives: color, texture, and image subject. The image features are projected into the single image model, a regularization framework is established, and a multiple graphical model based on semi-supervised learning is constructed to complete the image feature fusion and determine the image enhancement strength. The HSV model is introduced to decompose the image into its three channel components H, S, and V. Through adaptive adjustment of the three channels and RGB color space conversion, active visual enhancement of the image is achieved. The test results show that the proposed method effectively avoids overexposure and local distortion and retains the clarity of image details to the greatest extent. The overall quality of the enhanced image is high.
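The HSV decomposition and conversion back to RGB described in this abstract can be sketched with Python's standard `colorsys` module. This is only an illustration of the channel round trip: the fixed `s_gain` and `v_gain` multipliers are assumptions standing in for the adaptive component adjustment, which the abstract does not specify.

```python
import colorsys


def enhance_pixel(rgb, s_gain=1.0, v_gain=1.0):
    """Scale the S and V channels of one 8-bit RGB pixel.

    s_gain and v_gain are hypothetical fixed gains; the method in the
    abstract adjusts the channels adaptively in a way not described there.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)   # decompose into H, S, V
    s = min(1.0, s * s_gain)                 # clamp to the valid range
    v = min(1.0, v * v_gain)
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)  # convert back to RGB
    return tuple(round(channel * 255) for channel in (r2, g2, b2))


unchanged = enhance_pixel((100, 50, 25))
brighter = enhance_pixel((128, 128, 128), v_gain=1.5)
clamped = enhance_pixel((200, 200, 200), v_gain=2.0)
```

Clamping S and V to 1.0 is one simple way to keep the adjusted value inside the valid range, which relates to the overexposure concern raised in the abstract.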

Journal Article

Deformable Pyramid Sparse Transformer for Semi-Supervised Driver Distraction Detection

by Zhao, Qiang , Yu, Zhichao , Lin, Yuchu in Accuracy , Algorithms , Annotations

2026

Ensuring sustained driver attention is critical for intelligent transportation safety systems; however, the performance of data-driven driver distraction detection models is often limited by the high cost of large-scale manual annotation. To address this challenge, this paper proposes an adaptive semi-supervised driver distraction detection framework based on teacher–student learning and deformable pyramid feature fusion. The framework leverages a limited amount of labeled data together with abundant unlabeled samples to achieve robust and scalable distraction detection. An adaptive pseudo-label optimization strategy is introduced, incorporating category-aware pseudo-label thresholding, delayed pseudo-label scheduling, and a confidence-weighted pseudo-label loss to dynamically balance pseudo-label quality and training stability. To enhance fine-grained perception of subtle driver behaviors, a Deformable Pyramid Sparse Transformer (DPST) module is integrated into a lightweight YOLOv11 detector, enabling precise multi-scale feature alignment and efficient cross-scale semantic fusion. Furthermore, a teacher-guided feature consistency distillation mechanism is employed to promote semantic alignment between teacher and student models at the feature level, mitigating the adverse effects of noisy pseudo-labels. Extensive experiments conducted on the Roboflow Distracted Driving Dataset demonstrate that the proposed method outperforms representative fully supervised baselines in terms of mAP@0.5 and mAP@0.5:0.95 while maintaining a balanced trade-off between precision and recall. These results indicate that the proposed framework provides an effective and practical solution for real-world driver monitoring systems under limited annotation conditions.
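As a rough illustration of category-aware pseudo-label thresholding and a confidence-weighted loss, the sketch below filters teacher predictions with per-class thresholds and then averages losses using confidence as the weight. The class names, threshold values, and the normalisation by total weight are assumptions; the abstract does not give the exact formulation.

```python
def select_pseudo_labels(predictions, class_thresholds, default_threshold=0.9):
    """Keep predictions whose confidence clears their class threshold.

    predictions: list of (label, confidence) pairs from a teacher model.
    Returns (index, label, confidence) triples for the accepted samples.
    """
    selected = []
    for index, (label, confidence) in enumerate(predictions):
        threshold = class_thresholds.get(label, default_threshold)
        if confidence >= threshold:
            selected.append((index, label, confidence))
    return selected


def confidence_weighted_loss(losses, confidences):
    """Average per-sample losses, weighting each by its confidence."""
    total_weight = sum(confidences)
    if total_weight == 0:
        return 0.0
    return sum(l * w for l, w in zip(losses, confidences)) / total_weight


# Hypothetical teacher outputs and thresholds, invented for this example.
teacher_outputs = [("phone", 0.95), ("drinking", 0.70),
                   ("phone", 0.80), ("yawning", 0.85)]
thresholds = {"phone": 0.9, "drinking": 0.6}
accepted = select_pseudo_labels(teacher_outputs, thresholds)
loss = confidence_weighted_loss([2.0, 1.0], [0.5, 1.0])
```

A lower threshold for a class lets more of its pseudo-labels through, which is one way a per-category rule can compensate for classes the teacher predicts with less confidence.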

Journal Article

Data augmentation based semi-supervised method to improve COVID-19 CT classification

by Wang, Peng , Chen, Xiangtao , Luo, Jiawei in Accuracy , Classification , Computed tomography

2023

The Coronavirus (COVID-19) outbreak of December 2019 has become a serious threat to people around the world, creating a health crisis that infected millions of people, as well as destroying the global economy. Early detection and diagnosis are essential to prevent further transmission. The detection of COVID-19 in computed tomography images is one of the important approaches to rapid diagnosis. Many different branches of deep learning have played an important role in this area, including transfer learning, contrastive learning, and ensemble strategies. However, these works require a large number of samples with expensive manual labels, so to save costs, scholars have adopted semi-supervised learning, which uses only a few labels to classify COVID-19 CT images. Nevertheless, the existing semi-supervised methods focus primarily on class imbalance and pseudo-label filtering rather than on pseudo-label generation. Accordingly, in this paper, we developed a semi-supervised classification framework based on data augmentation to classify CT images of COVID-19. We revised the classic teacher-student framework and introduced the popular data augmentation method Mixup, which widened the distribution of high confidence to improve the accuracy of selected pseudo-labels and ultimately obtain a model with better performance. For the COVID-CT dataset, our method improves precision, F1 score, accuracy, and specificity by 21.04%, 12.95%, 17.13%, and 38.29%, respectively, over the average values of other methods. For the SARS-COV-2 dataset, these increases were 8.40%, 7.59%, 9.35%, and 12.80%, respectively. For the Harvard Dataverse dataset, the increases were 17.64%, 18.89%, 19.81%, and 20.20%, respectively. The code is available at https://github.com/YutingBai99/COVID-19-SSL.
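For readers unfamiliar with Mixup, the standard operation forms a convex combination of two samples and of their labels, with a mixing coefficient usually drawn from a Beta(alpha, alpha) distribution. The sketch below shows only that basic operation on plain Python lists. How the authors integrate Mixup into their teacher-student framework is not described in the abstract, and `alpha=1.0` is an assumed default.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup(x1, y1, x2, y2, lam):
    """Blend two samples and their one-hot labels with coefficient lam."""
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y


# Two toy feature vectors with one-hot labels for a two-class problem.
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0],
                         [0.0, 1.0], [0.0, 1.0], lam=0.25)
lam = sample_lambda(alpha=1.0)
```

With `lam=0.25` the mixed sample sits three quarters of the way toward the second input, and the soft label reflects the same proportion.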

Journal Article

A Semi-Supervised Learning Framework for Classifying Colorectal Neoplasia Based on the NICE Classification

by Ni, Haoxiang , Yin, Qi , Wang, Yu in Accuracy , Classification , Clustering

2024

Labelling medical images is an arduous and costly task that necessitates clinical expertise and large numbers of qualified images. Insufficient samples can lead to underfitting during training and poor performance of supervised learning models. In this study, we aim to develop a SimCLR-based semi-supervised learning framework to classify colorectal neoplasia based on the NICE classification. First, the proposed framework was trained under self-supervised learning using a large unlabelled dataset; subsequently, it was fine-tuned on a limited labelled dataset based on the NICE classification. The model was evaluated on an independent dataset and compared with models based on supervised transfer learning and endoscopists using accuracy, Matthews correlation coefficient (MCC), and Cohen’s kappa. Finally, Grad-CAM and t-SNE were applied to visualize the models’ interpretations. A ResNet-backboned SimCLR model (accuracy of 0.908, MCC of 0.862, and Cohen’s kappa of 0.896) outperformed supervised transfer learning-based models (means: 0.803, 0.698, and 0.742) and junior endoscopists (0.816, 0.724, and 0.863), while performing only slightly worse than senior endoscopists (0.916, 0.875, and 0.944). Moreover, t-SNE showed a better clustering of ternary samples through self-supervised learning in SimCLR than through supervised transfer learning. Compared with traditional supervised learning, semi-supervised learning enables deep learning models to achieve improved performance with limited labelled endoscopic images.
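The three evaluation metrics named in this abstract can be computed from label lists as sketched below, using the standard multiclass definitions of Cohen's kappa and the Matthews correlation coefficient. The toy labels are invented for illustration and are unrelated to the study's data; returning 0.0 in degenerate cases is a choice made for this sketch.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_o = accuracy(y_true, y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / n ** 2
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def matthews_corrcoef(y_true, y_pred):
    """Multiclass MCC computed from the label marginals."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    numerator = correct * n - sum(true_counts[k] * pred_counts[k]
                                  for k in true_counts)
    denominator = sqrt((n ** 2 - sum(c ** 2 for c in pred_counts.values()))
                       * (n ** 2 - sum(c ** 2 for c in true_counts.values())))
    return 0.0 if denominator == 0 else numerator / denominator


# Toy labels, invented for this example.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
```

Both kappa and MCC discount agreement that would be expected by chance, which is why they come out lower than plain accuracy on the same predictions.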

Journal Article

Forest Disturbance Classification Under Imbalanced and Small-Sample Conditions Based on Collaborative Semi-Supervised Learning and Sample Generation

by Yan, Yan , Qu, Xinqi , Shao, Yan in Accuracy , Algorithms , Analysis

2026

Accurate and timely information on forest disturbance drivers is important for sustainable forest management, global carbon cycle accounting, and climate change response. However, forest disturbance classification is difficult due to two major challenges: limited labeled samples and highly imbalanced disturbance class distribution. In this article, a new framework for multi-type forest disturbance classification based on collaborative semi-supervised learning and sample generation was proposed. First, forest disturbance is detected using long-term remote sensing time series data and disturbance detection algorithms. Spatiotemporal, spectral and terrain features of different disturbance types are extracted. On this basis, to address the problem of imbalanced and small-sample conditions, a collaborative classification strategy is developed. Based on a small number of labeled samples, Support Vector Machine (SVM) and Random Forest (RF) are used to build dual base classifiers. A confident learning (CL) framework is applied to select high-confidence pseudo-labeled samples from unlabeled data. Then, a latent diffusion model (LDM) is introduced to generate high-fidelity pseudo-samples. This increases the sample size and balances the class distribution. Based on the augmented dataset, the dual classifiers are iteratively optimized using a co-training strategy, which improves model generalization under complex conditions. The results show that the proposed framework could generate high-quality pseudo-samples and effectively reduce class imbalance. The overall accuracy (OA) of the proposed framework reaches 93.2%, which is 5.7% and 4.4% higher than single classifier baselines, respectively. After introducing the LDM-based balancing mechanism, performance is further improved by 1.8% compared with the pure semi-supervised framework. This study provides an efficient and reliable solution for large-scale forest ecosystem monitoring.

Journal Article

Federated learning-based CT liver tumor detection using a teacher‒student SANet with semisupervised learning

by Lee, Cheng-Shun , Chain, Kai , Huang, Li-Chun in Accuracy , Annotations , Brain cancer

2025

Background: Detecting liver tumors via computed tomography (CT) scans is a critical but labor-intensive task. Extensive expert annotations are needed to train effective machine learning models. This study presents an innovative approach that leverages federated learning in combination with a teacher–student framework, an enhanced slice-aware network (SANet), and semisupervised learning (SSL) techniques to improve the CT-based liver tumor detection process while significantly reducing its labor and time costs. Methods: Federated learning enables collaborative model training to be performed across multiple institutions without sharing sensitive patient data, thus ensuring privacy and security. The teacher–student SANet framework takes advantage of both teacher and student models, with the teacher model providing reliable pseudolabels that guide the student model in a semisupervised manner. This method not only improves the accuracy of liver tumor detection but also reduces the dependence on extensively annotated datasets. Results: The proposed method was validated through simulation experiments conducted in four scenarios, and it demonstrated a model accuracy of 83%, which represents an improvement over the original locally trained models. Conclusions: This study presents a promising method for enhancing the CT-based liver tumor detection while reducing the incurred labor and time costs by utilizing federated learning, the teacher–student SANet framework, and SSL techniques. Compared with previous approaches, the proposed method achieved a model accuracy of 83%, representing a significant improvement. Trial registration: Not applicable.
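To make concrete how institutions can train collaboratively without exchanging patient data, the sketch below shows a common aggregation rule, federated averaging, in which a server combines locally trained parameters weighted by each site's sample count. The abstract does not say which aggregation rule the study uses, so FedAvg is an assumption here, and the parameter vectors and site sizes are invented.

```python
def federated_average(client_weights, client_sizes):
    """Combine client parameter vectors, weighted by local dataset size.

    Only model parameters reach the server; raw data stay at each site.
    This is the standard FedAvg rule, assumed here for illustration.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size
            for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical hospitals with 1 and 3 local training samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The site with more data pulls the global parameters closer to its own, so the result lies three quarters of the way toward the second hospital's values.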

Journal Article

Classification of acoustical signals by combining active learning strategies with semi-supervised learning schemes

by Karlos, Stamatis , Aridas, Christos , Kanas, Vasileios G. in Active learning , Algorithms , Artificial Intelligence

2023

In real-world settings, handling both labeled and unlabeled data has attracted the interest of many data scientists and machine learning engineers, leading to several demonstrations that apply data-augmenting approaches to obtain robust and sufficiently accurate learning behavior. The main reason is the abundance of unlabeled data that conventional supervised approaches ignore, which reduces the chance of enriching the final hypothesis. However, most of the proposed methods that operate on both kinds of data exploit only one category of these algorithms, without combining their strategies. Since the most popular of them for classification are Active Learning and Semi-supervised Learning approaches, we aim to design a framework that combines both and fuses their advantages within the core of the learning process. We therefore conduct an empirical evaluation of this combined approach on three problems that stem from different fields but are all tackled using acoustical signals, operating under the pool-based scenario: gender identification, emotion detection, and automatic speaker recognition. For the proposed framework, which operates on training sets of small cardinality, our results show the benefits of adopting such semi-automated approaches with respect to both the predictive accuracy achieved under reduced resource consumption and the smoothness of learning convergence. Several learners were examined in order to reach more general conclusions, and a variant of the self-training scheme was also examined.

Journal Article
