Catalogue Search | MBRL
904 result(s) for "Class imbalance"
Drug-target interaction prediction via class imbalance-aware ensemble learning
2016
Background
Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers with the purpose of predicting new undiscovered interactions. However, a key challenge regarding this data that has not yet been addressed by these methods, namely class imbalance, is potentially degrading the prediction performance. Class imbalance can be divided into two sub-problems. Firstly, the number of known interacting drug-target pairs is much smaller than that of non-interacting drug-target pairs. This imbalance ratio between interacting and non-interacting drug-target pairs is referred to as the between-class imbalance. Between-class imbalance degrades prediction performance due to the bias in prediction results towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Secondly, there are multiple types of drug-target interactions in the data, with some types having relatively fewer members (or being less represented) than others. This variation in representation of the different interaction types leads to another kind of imbalance, referred to as the within-class imbalance. In within-class imbalance, prediction results are biased towards the better represented interaction types, leading to more prediction errors in the less represented interaction types.
Results
We propose an ensemble learning method that incorporates techniques to address the issues of between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over 4 state-of-the-art methods. In addition, we simulated cases for new drugs and targets to see how our method would perform in predicting their interactions. New drugs and targets are those for which no prior interactions are known. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully.
Conclusions
Our proposed method has improved the prediction performance over the existing work, thus proving the importance of addressing problems pertaining to class imbalance in the data.
Journal Article
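The between-class ratio and within-class representation described in this abstract can be quantified with a short sketch (the `imbalance_profile` helper and the toy drug-target data are illustrative, not from the paper):

```python
from collections import Counter

def imbalance_profile(labels, interaction_types):
    """Summarize between-class and within-class imbalance.

    labels: 1 for interacting pairs (minority), 0 for non-interacting.
    interaction_types: type name for each *interacting* pair.
    """
    counts = Counter(labels)
    # Between-class imbalance: majority size over minority size.
    ratio = counts[0] / counts[1]
    # Within-class imbalance: how unevenly the interaction types are represented.
    type_counts = Counter(interaction_types)
    return ratio, dict(type_counts)

# Toy data: 8 non-interacting pairs vs 4 interacting pairs of two types.
labels = [0] * 8 + [1] * 4
types = ["enzyme", "enzyme", "enzyme", "ion-channel"]
ratio, per_type = imbalance_profile(labels, types)
print(ratio)      # 2.0
print(per_type)   # {'enzyme': 3, 'ion-channel': 1}
```

Both numbers matter here: a classifier can be biased by the 2:1 between-class ratio while simultaneously under-serving the rarer "ion-channel" interaction type.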
The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art
2021
This survey paper focuses on one of the primary issues currently challenging data mining researchers experimenting on real‐world datasets: imbalanced class distribution, which generates a bias toward the majority class due to insufficient training samples from the minority class. Current machine learning and deep learning algorithms are trained on datasets in which certain categories are insufficiently represented, while other classes have surplus samples due to the ready availability of data from those categories. Conventional solutions suggest undersampling of the majority class and/or oversampling of the minority class to balance the class distribution prior to the learning phase. Though this problem of uneven class distribution is, by and large, ignored by researchers focusing on the learning technology, a need has now arisen for incorporating balance correction and data pruning procedures within the learning process itself. This paper surveys a plethora of conventional and recent techniques that address this issue through intelligent representations of the majority- and minority-class samples given as input to the learning module. The application of nature‐inspired evolutionary algorithms to intelligent sampling is examined, as are hybrid sampling strategies that select and retain the difficult‐to‐learn samples and discard the easy‐to‐learn ones. The findings of various researchers are summarized to a logical end, and possibilities and challenges for future research directions are outlined.
Journal Article
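The conventional remedy this survey starts from, undersampling the majority class and oversampling the minority class, can be sketched in a few lines (the `rebalance` helper and the meet-at-the-average target size are illustrative assumptions, not the survey's prescription):

```python
import random
from collections import Counter

def rebalance(samples, labels, seed=0):
    """Naive rebalancing: undersample each over-represented class and
    oversample (with replacement) each under-represented class so every
    class ends up at the average class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = sum(len(v) for v in by_class.values()) // len(by_class)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)                    # undersample
        else:
            chosen = xs + rng.choices(xs, k=target - len(xs))  # oversample
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y

xs = list(range(10))
ys = [0] * 8 + [1] * 2
bx, by = rebalance(xs, ys)
print(sorted(Counter(by).values()))  # [5, 5]
```

The "intelligent" strategies the survey covers refine exactly these two steps: choosing *which* majority samples to discard and *where* to place synthetic minority samples, rather than selecting uniformly at random as above.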
Clustering-Based Oversampling Algorithm for Multi-class Imbalance Learning
2025
Multi-class imbalanced data learning faces many challenges. Its complex structural characteristics cause severe intra-class imbalance or overgeneralization in most solution strategies, which negatively affects learning. This paper proposes a clustering-based oversampling algorithm (COM) to handle multi-class imbalance learning. To avoid the loss of important information, COM clusters the minority class based on the structural characteristics of the instances, carefully accounting for rare instances and outliers by assigning a sampling weight to each cluster. Clusters with high densities are given low weights, and oversampling is then performed within clusters to avoid overgeneralization. COM avoids intra-class imbalance effectively because low-density clusters are more likely than high-density ones to be selected for synthesizing instances. Our study used the UCI and KEEL imbalanced datasets to demonstrate the effectiveness and stability of the proposed method.
Journal Article
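A minimal sketch of the density-weighted, within-cluster oversampling idea behind COM (clusters are assumed precomputed; the function name, inverse-size weighting, and toy data are illustrative, not the paper's exact algorithm):

```python
import random

def com_oversample(clusters, n_new, seed=0):
    """Sketch of density-weighted within-cluster oversampling.

    clusters: list of lists of feature vectors (minority-class clusters).
    Sparser (smaller) clusters get a higher sampling weight, and new
    instances are interpolated only between members of the same cluster,
    which avoids overgeneralizing across cluster boundaries.
    """
    rng = random.Random(seed)
    # Inverse-size weights: low-density clusters are sampled more often.
    weights = [1.0 / len(c) for c in clusters]
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choices(clusters, weights=weights)[0]
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

dense = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
sparse = [[5.0, 5.0], [5.5, 5.5]]
new_pts = com_oversample([dense, sparse], n_new=6)
```

Because interpolation never crosses cluster boundaries, no synthetic point lands in the empty region between the two clusters, which is the overgeneralization failure mode the abstract describes.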
FedNDA: Enhancing Federated Learning with Noisy Client Detection and Robust Aggregation
2025
Federated learning is a novel decentralized methodology that enables multiple clients to collaboratively train a global model while preserving the privacy of their local data. Although federated learning enhances data privacy, it faces challenges related to data quality and client behavior. A fundamental issue is the presence of noisy labels in certain clients, which damages the global model's performance. To address this problem, this paper introduces a Federated learning framework with Noisy client Detection and robust Aggregation, FedNDA. In the first stage, FedNDA detects noisy clients by analyzing the distribution of their local losses: a noisy client exhibits a loss distribution distinct from that of clean clients. To handle the class imbalance issue in local data, we utilize per-class losses instead of the total loss. We then assign each client a noisiness score, calculated as the Earth Mover's Distance between the per-class loss distribution of the client and the average distribution of all clean clients. This noisiness metric is more sensitive for detecting noisy clients than conventional metrics such as Euclidean distance or the L1 norm. The noisiness score is subsequently transferred to the server-side aggregation function, which prioritizes clean clients while reducing the influence of noisy ones. Experimental results demonstrate that FedNDA outperforms FedAvg and FedNoRo by 4.68% and 3.6% on the CIFAR-10 dataset, and by 10.65% and 0.48% on the ICH dataset, respectively, in a high-noise setting.
Journal Article
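The per-class-loss noisiness score can be illustrated with a one-dimensional Earth Mover's Distance over the class indices (a sketch assuming per-class losses are normalized into distributions first; `noisiness_score` and the toy loss profiles are illustrative names, not from the paper):

```python
def noisiness_score(client_losses, clean_avg_losses):
    """Earth Mover's Distance between two per-class loss profiles,
    treated as discrete distributions over the class indices.
    A higher score means the client's loss profile sits further from
    the clean-client average, flagging it as likely noisy."""
    def normalize(v):
        s = sum(v)
        return [x / s for x in v]
    p, q = normalize(client_losses), normalize(clean_avg_losses)
    # 1-D EMD over an ordered support: sum of |cumulative differences|.
    emd, cum = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        emd += abs(cum)
    return emd

clean = [0.2, 0.2, 0.2, 0.2, 0.2]      # uniform per-class losses
noisy = [0.05, 0.05, 0.05, 0.05, 0.8]  # one class dominates the loss
s_noisy = noisiness_score(noisy, clean)
s_clean = noisiness_score(clean, clean)
```

Using per-class losses rather than a single total loss keeps a client with one badly mislabeled minority class from hiding behind a low overall average, which is the class-imbalance point the abstract makes.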
Combined Effect of Concept Drift and Class Imbalance on Model Performance During Stream Classification
by Sattar Palli, Abdul; Alsughayyir, Aeshah; Jaafar, Jafreezal
in Classification, Data transmission, Drift
2023
Every application in a smart city environment, such as the smart grid, health monitoring, security, and surveillance, generates non-stationary data streams. Because of this, the statistical properties of the data change over time, leading to class imbalance and concept drift issues, both of which degrade model performance. Most current work has focused on developing an ensemble strategy that resolves the issue by training a new classifier on the latest data. These techniques suffer while training the new classifier if the data is imbalanced. Also, the class imbalance ratio may change greatly from one input stream to another, making the problem more complex. The existing solutions for the combined issue of class imbalance and concept drift lack an understanding of how one problem correlates with the other. This work studies the association between concept drift and the class imbalance ratio, and then demonstrates how changes in the class imbalance ratio, along with concept drift, affect the classifier's performance. We analyzed the effect of both issues on the minority and majority classes individually, conducting experiments on benchmark datasets using state-of-the-art classifiers specially designed for data stream classification. Precision, recall, F1 score, and geometric mean were used to measure performance. Our findings show that when the class imbalance and concept drift problems occur together, performance can decrease by up to 15%. Our results also show that an increase in the imbalance ratio can cause a 10% to 15% decrease in the precision scores of both the minority and majority classes. These findings may help in designing intelligent and adaptive solutions that can cope with the challenges of non-stationary data streams such as concept drift and class imbalance.
Journal Article
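The four metrics used in this study follow directly from confusion-matrix counts; a minimal sketch (the toy counts are illustrative):

```python
import math

def stream_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and geometric mean from confusion counts.
    G-mean balances sensitivity on the (minority) positive class against
    specificity on the (majority) negative class, so it stays informative
    under class imbalance where accuracy does not."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity, minority class
    specificity = tn / (tn + fp)     # majority class
    f1 = 2 * precision * recall / (precision + recall)
    gmean = math.sqrt(recall * specificity)
    return precision, recall, f1, gmean

# Imbalanced toy stream window: 50 positives vs 950 negatives.
p, r, f1, g = stream_metrics(tp=30, fp=10, fn=20, tn=940)
```

On this window, accuracy would be a flattering 97%, while recall (0.6) and G-mean expose the errors concentrated in the minority class, which is why stream-classification studies under imbalance report these metrics instead.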
A Review of Thermal Comfort in Primary Schools and Future Challenges in Machine Learning Based Prediction for Children
2022
Children differ from adults in their physiology and cognitive ability and are thus extremely vulnerable to poor classroom thermal comfort. However, very few reviews on the thermal comfort of primary school students are available. Further, children-focused surveys have not reviewed the state of the art in thermal comfort prediction using machine learning (AI/ML). Consequently, there is a need for discussion of children-specific challenges in AI/ML-based prediction. This article bridges these research gaps. It presents a comprehensive review of thermal comfort studies in primary school classrooms since 1962, considering both conventional (non-ML) studies and recent AI/ML studies performed for children, classrooms, and primary students. It also underscores the importance of AI/ML prediction by analyzing adaptive opportunities for children/students in classrooms. Thereafter, a review of AI/ML-based prediction studies is presented. Through an AI/ML case study, it demonstrates that model performance for children and adults differs markedly: the performance of classification models trained on the ASHRAE-II database and a recent primary students' dataset shows a 29% difference in thermal sensation and an 86% difference in thermal preference between adults and children. It then highlights three major children-specific AI/ML challenges, viz., "illogical votes", "multiple comfort metrics", and "extreme class imbalance". Finally, it offers several technical solutions and discusses open problems.
Journal Article
Evaluation of the Improved Extreme Learning Machine for Machine Failure Multiclass Classification
2023
Recent advancements in sensors, big data, and artificial intelligence (AI) have introduced digital transformation in the manufacturing industry, where machine maintenance has been one of the central subjects. Predictive maintenance is the latest maintenance strategy, relying on data and artificial intelligence techniques to predict machine failure and assess remaining life. However, the imbalanced nature of machine data can result in inaccurate machine failure predictions. This research uses techniques and algorithms centered on the Extreme Learning Machine (ELM) and its developments to find a suitable algorithm for overcoming imbalanced machine datasets. The dataset used in this research is Microsoft Azure for Predictive Maintenance, which has significantly imbalanced failure classes. Four improved ELM methods are evaluated in this paper: ELM with under-sampling, ELM with over-sampling, weighted-ELM, and weighted-ELM with a radial basis function (RBF) kernel and particle swarm optimization (PSO). Our simulation results show that the combination of ELM with under-sampling achieved the highest performance, with an average F1-score of 0.9541 for binary classification and 0.9555 for multiclass classification.
Journal Article
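The weighted-ELM idea, scaling the least-squares fit so minority-class errors cost more, can be sketched with NumPy (a minimal illustration; the network size, toy data, inverse-frequency weighting, and closed-form solve are assumptions, not the paper's exact configuration):

```python
import numpy as np

def weighted_elm_fit(X, y, n_hidden=20, C=1.0, seed=0):
    """Minimal weighted ELM: a fixed random hidden layer, then a
    class-weighted regularized least-squares solve for the output
    weights, so minority-class errors are penalized more heavily."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    # One-hot targets and inverse-frequency sample weights.
    classes, counts = np.unique(y, return_counts=True)
    T = (y[:, None] == classes[None, :]).astype(float)
    w = 1.0 / counts[np.searchsorted(classes, y)]
    beta = np.linalg.solve(H.T @ (H * w[:, None]) + np.eye(n_hidden) / C,
                           H.T @ (T * w[:, None]))
    return W, b, beta, classes

def weighted_elm_predict(model, X):
    W, b, beta, classes = model
    return classes[np.argmax(np.tanh(X @ W + b) @ beta, axis=1)]

# Toy imbalanced 2-class data: 40 points near (0,0), 5 near (3,3).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (5, 2))])
y = np.array([0] * 40 + [1] * 5)
model = weighted_elm_fit(X, y)
acc = (weighted_elm_predict(model, X) == y).mean()
```

The only trainable part is the output layer, which is why ELM variants are attractive for repeated retraining on maintenance data; the per-sample weights keep the ridge solution from ignoring the rare failure class.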
The class imbalance problem in deep learning
by Japkowicz, Nathalie; Corizzo, Roberto; Krawczyk, Bartosz
in Artificial Intelligence, class imbalance, Computer Science
2024
Deep learning has recently enabled machine learning (ML) to make unparalleled strides. It did so by confronting and, at least to a certain extent, successfully addressing the knowledge bottleneck that paralyzed ML and artificial intelligence for decades. The community is currently basking in deep learning's success, but a question that comes to mind is: have all of the issues previously affecting machine learning systems been solved by deep learning, or do some remain for which deep learning is not a bulletproof solution? This question, in the context of class imbalance, is the motivation for this paper. The class imbalance problem was first recognized almost three decades ago and has remained a critical challenge, at least for traditional learning approaches. Our goal is to investigate whether the tight dependency between class imbalance, concept complexity, dataset size, and classifier performance, known to exist in traditional learning systems, is alleviated in any way in deep learning approaches, and to what extent, if any, network depth and regularization can help. To answer these questions we conduct a survey of the recent literature on deep learning and the class imbalance problem, as well as a series of controlled experiments on both artificial and real-world domains. This allows us to formulate lessons learned about the impact of class imbalance on deep learning models, and to pose open challenges that should be tackled by researchers in this field.
Journal Article
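One standard remedy discussed across this literature is a class-weighted loss, where each class's contribution is scaled by its inverse frequency (an illustrative sketch of the common technique, not a method proposed by this paper):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class loss weights inversely proportional to class frequency,
    normalized so the average weight over all samples equals 1. The loss
    on a rare class is scaled up so that gradients during training are
    not dominated by the majority class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

weights = inverse_frequency_weights([0] * 90 + [1] * 10)
print(weights)  # class 0 -> ~0.56, class 1 -> 5.0
```

In a deep learning setting these values are typically passed to the loss function (e.g. as per-class weights in a cross-entropy loss), which is one of the mitigations whose effectiveness under depth and regularization this survey examines.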
A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
by Kamalov, Firuz; Atiya, Amir F.; Elreedy, Dina
in Artificial Intelligence, Computer Science, Control
2024
Class imbalance occurs when the class distribution is unequal: one class is under-represented (the minority class), while the other has significantly more samples in the data (the majority class). The class imbalance problem is prevalent in many real-world applications, and the under-represented minority class is generally the class of interest. The synthetic minority over-sampling technique (SMOTE) is considered the most prominent method for handling imbalanced data. SMOTE generates new synthetic data patterns by performing linear interpolation between minority class samples and their K nearest neighbors. However, the SMOTE-generated patterns do not necessarily conform to the original minority class distribution. This paper develops a novel theoretical analysis of the SMOTE method by deriving the probability distribution of the SMOTE-generated samples. To the best of our knowledge, this is the first work to derive a mathematical formulation for the probability distribution of SMOTE patterns. This allows us to compare the density of the generated samples with the true underlying class-conditional density, in order to assess how representative the generated samples are. The derived formula is verified by evaluating it on a number of densities and comparing against densities estimated empirically.
Journal Article
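The SMOTE generation step analyzed in this paper is easy to state in code (a minimal sketch; the parameter defaults and toy data are illustrative). Every synthetic point lies on a segment between a minority sample and one of its nearest minority neighbors, which is precisely why the generated density need not match the true class-conditional density:

```python
import math
import random

def smote(minority, k=2, n_new=4, seed=0):
    """Textbook SMOTE: each synthetic point is a random linear
    interpolation between a minority sample and one of its k nearest
    minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbors of x (excluding x itself).
        neighbors = sorted((p for p in minority if p is not x),
                           key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        t = rng.random()  # random position along the segment x -> nb
        synthetic.append([xi + t * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_pts = smote(minority)
```

All generated points are confined to the convex hull of neighbor pairs, so, for example, a minority class with a curved or multimodal density gets synthetic mass placed on chords rather than on the density itself; the paper's derived distribution makes this discrepancy precise.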
Handling imbalanced medical datasets: review of a decade of research
by Salmi, Mabrouka; Atif, Dalia; Oliva, Diego
in Algorithms, Artificial Intelligence, Best practice
2024
Machine learning and medical diagnostic studies often struggle with the issue of class imbalance in medical datasets, complicating accurate disease prediction and undermining diagnostic tools. Despite ongoing research efforts, specific characteristics of medical data frequently remain overlooked. This article comprehensively reviews advances in addressing imbalanced medical datasets over the past decade, offering a novel classification of approaches into preprocessing, learning levels, and combined techniques. We present a detailed evaluation of the medical datasets and metrics used, synthesizing the outcomes of previous research to reflect on the effectiveness of the methodologies despite methodological constraints. Our review identifies key research trends and offers speculative insights and research trajectories to enhance diagnostic performance. Additionally, we establish a consensus on best practices to mitigate persistent methodological issues, assisting the development of generalizable, reliable, and consistent results in medical diagnostics.
Journal Article