Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Item Type
      Item Type
      Clear All
      Item Type
  • Subject
      Subject
      Clear All
      Subject
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
      More Filters
      Clear All
      More Filters
      Source
    • Language
144 result(s) for "over-sampling"
Sort by:
A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
Class imbalance occurs when the class distribution is not equal. Namely, one class is under-represented (minority class), and the other class has significantly more samples in the data (majority class). The class imbalance problem is prevalent in many real world applications. Generally, the under-represented minority class is the class of interest. The synthetic minority over-sampling technique (SMOTE) method is considered the most prominent method for handling unbalanced data. The SMOTE method generates new synthetic data patterns by performing linear interpolation between minority class samples and their K nearest neighbors. However, the SMOTE generated patterns do not necessarily conform to the original minority class distribution. This paper develops a novel theoretical analysis of the SMOTE method by deriving the probability distribution of the SMOTE generated samples. To the best of our knowledge, this is the first work deriving a mathematical formulation for the SMOTE patterns’ probability distribution. This allows us to compare the density of the generated samples with the true underlying class-conditional density, in order to assess how representative the generated samples are. The derived formula is verified by computing it on a number of densities versus densities computed and estimated empirically.
Tomek Link and SMOTE Approaches for Machine Fault Classification with an Imbalanced Dataset
Data-driven methods have prominently featured in the progressive research and development of modern condition monitoring systems for electrical machines. These methods have the advantage of simplicity when it comes to the implementation of effective fault detection and diagnostic systems. Despite their many advantages, the practical implementation of data-driven approaches still faces challenges such as data imbalance. The lack of sufficient and reliable labeled fault data from machines in the field often poses a challenge in developing accurate supervised learning-based condition monitoring systems. This research investigates the use of a Naïve Bayes classifier, support vector machine, and k-nearest neighbors together with synthetic minority oversampling technique, Tomek link, and the combination of these two resampling techniques for fault classification with simulation and experimental imbalanced data. A comparative analysis of these techniques is conducted for different imbalanced data cases to determine the suitability thereof for condition monitoring on a wound-rotor induction generator. The precision, recall, and f1-score matrices are applied for performance evaluation. The results indicate that the technique combining the synthetic minority oversampling technique with the Tomek link provides the best performance across all tested classifiers. The k-nearest neighbors, together with this combination resampling technique yielded the most accurate classification results. This research is of interest to researchers and practitioners working in the area of condition monitoring in electrical machines, and the findings and presented approach of the comparative analysis will assist with the selection of the most suitable technique for handling imbalanced fault data. This is especially important in the practice of condition monitoring on electrical rotating machines, where fault data are very limited.
A comprehensive evaluation of oversampling techniques for enhancing text classification performance
Class imbalance is a common and critical challenge in text classification tasks, where the underrepresentation of certain classes often impairs the ability of classifiers to learn minority class patterns effectively. According to the “garbage in, garbage out” principle, even high-performing models may fail when trained on skewed distributions. To address this issue, this study investigates the impact of oversampling techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and thirty of its variants, on two benchmark text classification datasets: TREC and Emotions. Each dataset was vectorized using the MiniLMv2 transformer model to obtain semantically rich representations, and classification was performed using six machine learning algorithms. The balanced and imbalanced scenarios were compared in terms of F1-Score and Balanced Accuracy. This work constitutes, to the best of our knowledge, the first large-scale, systematic benchmarking of SMOTE-based oversampling methods in the context of transformer-embedded text classification. Furthermore, statistical significance of the observed performance differences was validated using the Friedman test. The results provide practical insights into the selection of oversampling techniques tailored to dataset characteristics and classifier sensitivity, supporting more robust and fair learning in imbalanced natural language processing tasks.
Enhancing Credit Card Fraud Detection: An Ensemble Machine Learning Approach
In the era of digital advancements, the escalation of credit card fraud necessitates the development of robust and efficient fraud detection systems. This paper delves into the application of machine learning models, specifically focusing on ensemble methods, to enhance credit card fraud detection. Through an extensive review of existing literature, we identified limitations in current fraud detection technologies, including issues like data imbalance, concept drift, false positives/negatives, limited generalisability, and challenges in real-time processing. To address some of these shortcomings, we propose a novel ensemble model that integrates a Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), Bagging, and Boosting classifiers. This ensemble model tackles the dataset imbalance problem associated with most credit card datasets by implementing under-sampling and the Synthetic Over-sampling Technique (SMOTE) on some machine learning algorithms. The evaluation of the model utilises a dataset comprising transaction records from European credit card holders, providing a realistic scenario for assessment. The methodology of the proposed model encompasses data pre-processing, feature engineering, model selection, and evaluation, with Google Colab computational capabilities facilitating efficient model training and testing. Comparative analysis between the proposed ensemble model, traditional machine learning methods, and individual classifiers reveals the superior performance of the ensemble in mitigating challenges associated with credit card fraud detection. Across accuracy, precision, recall, and F1-score metrics, the ensemble outperforms existing models. This paper underscores the efficacy of ensemble methods as a valuable tool in the battle against fraudulent transactions. The findings presented lay the groundwork for future advancements in the development of more resilient and adaptive fraud detection systems, which will become crucial as credit card fraud techniques continue to evolve.
Modified-generative adversarial networks for imbalance text classification
Textual data faces a significant challenge known as class imbalance. This challenge arises when the distribution of classes within a dataset is highly disproportionate. To address this problem, effective algorithms that can handle the imbalance in the data while maintaining high predictive accuracy are needed. To solve the class imbalance problem, numerous methods have been proposed, which include SMOTE (Synthetic Minority Over Sampling), ADASYN (Adaptive Synthetic Sampling), ensemble methods, etc. In this paper we propose the novel Synthetic Minority Over Sampling Technique Nominal Continuous-Generative Adversarial Network (SMOTE NC-GAN) Model to tackle the problem of imbalanced text datasets. This method provides a novel solution for addressing the class imbalance problem. It involves generating synthetic data from the minority class of dataset using SMOTE-NC and subsequently applying that data along with real data fed to the Generative Adversarial Network. By using a combination of SMOTE-NC and GAN techniques, our proposed method can generate synthetic samples that enhance the quality of minority classes in the text datasets. Our proposed model addresses the limitations of both GAN and SMOTE. Empirical findings on various benchmark datasets have demonstrated the superior performance of the SMOTE NC-GAN Model over different state-of-the-art methods. This model enhances sample quality and offers computational efficiency, making it a promising solution for handling imbalanced datasets across various domains. The classification performance is significantly improved when utilizing the SMOTE NC-GAN Model in comparison to the Synthetic Minority Oversampling Technique.
Enhancing Network Intrusion Detection Using an Ensemble Voting Classifier for Internet of Things
In the context of 6G technology, the Internet of Everything aims to create a vast network that connects both humans and devices across multiple dimensions. The integration of smart healthcare, agriculture, transportation, and homes is incredibly appealing, as it allows people to effortlessly control their environment through touch or voice commands. Consequently, with the increase in Internet connectivity, the security risk also rises. However, the future is centered on a six-fold increase in connectivity, necessitating the development of stronger security measures to handle the rapidly expanding concept of IoT-enabled metaverse connections. Various types of attacks, often orchestrated using botnets, pose a threat to the performance of IoT-enabled networks. Detecting anomalies within these networks is crucial for safeguarding applications from potentially disastrous consequences. The voting classifier is a machine learning (ML) model known for its effectiveness as it capitalizes on the strengths of individual ML models and has the potential to improve overall predictive performance. In this research, we proposed a novel classification technique based on the DRX approach that combines the advantages of the Decision tree, Random forest, and XGBoost algorithms. This ensemble voting classifier significantly enhances the accuracy and precision of network intrusion detection systems. Our experiments were conducted using the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets. The findings of our study show that the DRX-based technique works better than the others. It achieved a higher accuracy of 99.88% on the NSL-KDD dataset, 99.93% on the UNSW-NB15 dataset, and 99.98% on the CIC-IDS2017 dataset, outperforming the other methods. Additionally, there is a notable reduction in the false positive rates to 0.003, 0.001, and 0.00012 for the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets.
A Multi-Class Intrusion Detection System for DDoS Attacks in IoT Networks Using Deep Learning and Transformers
The rapid proliferation of Internet of Things (IoT) devices has significantly increased vulnerability to Distributed Denial of Service (DDoS) attacks, which can severely disrupt network operations. DDoS attacks in IoT networks disrupt communication and compromise service availability, causing severe operational and economic losses. In this paper, we present a Deep Learning (DL)-based Intrusion Detection System (IDS) tailored for IoT environments. Our system employs three architectures—Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), and Transformer-based models—to perform binary, three-class, and 12-class classification tasks on the CiC IoT 2023 dataset. Data preprocessing includes log normalization to stabilize feature distributions and SMOTE-based oversampling to mitigate class imbalance. Experiments on the CIC-IoT 2023 dataset show that, in the binary classification task, the DNN achieved 99.2% accuracy, the CNN 99.0%, and the Transformer 98.8%. In three-class classification (benign, DDoS, and non-DDoS), all models attained near-perfect performance (approximately 99.9–100%). In the 12-class scenario (benign plus 12 attack types), the DNN, CNN, and Transformer reached 93.0%, 92.7%, and 92.5% accuracy, respectively. The high precision, recall, and ROC-AUC values corroborate the efficacy and generalizability of our approach for IoT DDoS detection. Comparative analysis indicates that our proposed IDS outperforms state-of-the-art methods in terms of detection accuracy and efficiency. These results underscore the potential of integrating advanced DL models into IDS frameworks, thereby providing a scalable and effective solution to secure IoT networks against evolving DDoS threats. Future work will explore further enhancements, including the use of deeper Transformer architectures and cross-dataset validation, to ensure robustness in real-world deployments.
A review on over-sampling techniques in classification of multi-class imbalanced datasets: insights for medical problems
There has been growing attention to multi-class classification problems, particularly those challenges of imbalanced class distributions. To address these challenges, various strategies, including data-level re-sampling treatment and ensemble methods, have been introduced to bolster the performance of predictive models and Artificial Intelligence (AI) algorithms in scenarios where excessive level of imbalance is present. While most research and algorithm development have been focused on binary classification problems, in health informatics there is an increased interest in the field to address the problem of multi-class classification in imbalanced datasets. Multi-class imbalance problems bring forth more complex challenges, as a delicate approach is required to generate synthetic data and simultaneously maintain the relationship between the multiple classes. The aim of this review paper is to examine over-sampling methods tailored for medical and other datasets with multi-class imbalance. Out of 2,076 peer-reviewed papers identified through searches, 197 eligible papers were chosen and thoroughly reviewed for inclusion, narrowing to 37 studies being selected for in-depth analysis. These studies are categorised into four categories: metric, adaptive, structure-based, and hybrid approaches. The most significant finding is the emerging trend toward hybrid resampling methods that combine the strengths of various techniques to effectively address the problem of imbalanced data. This paper provides an extensive analysis of each selected study, discusses their findings, and outlines directions for future research.
LVQ-SMOTE – Learning Vector Quantization based Synthetic Minority Over–sampling Technique for biomedical data
Background Over-sampling methods based on Synthetic Minority Over-sampling Technique (SMOTE) have been proposed for classification problems of imbalanced biomedical data. However, the existing over-sampling methods achieve slightly better or sometimes worse result than the simplest SMOTE. In order to improve the effectiveness of SMOTE, this paper presents a novel over-sampling method using codebooks obtained by the learning vector quantization. In general, even when an existing SMOTE applied to a biomedical dataset, its empty feature space is still so huge that most classification algorithms would not perform well on estimating borderlines between classes. To tackle this problem, our over-sampling method generates synthetic samples which occupy more feature space than the other SMOTE algorithms. Briefly saying, our over-sampling method enables to generate useful synthetic samples by referring to actual samples taken from real-world datasets. Results Experiments on eight real-world imbalanced datasets demonstrate that our proposed over-sampling method performs better than the simplest SMOTE on four of five standard classification algorithms. Moreover, it is seen that the performance of our method increases if the latest SMOTE called MWMOTE is used in our algorithm. Experiments on datasets for β -turn types prediction show some important patterns that have not been seen in previous analyses. Conclusions The proposed over-sampling method generates useful synthetic samples for the classification of imbalanced biomedical data. Besides, the proposed over-sampling method is basically compatible with basic classification algorithms and the existing over-sampling methods.