Catalogue Search | MBRL

New hybrid ensemble method for anomaly detection in data science

by Mohamed Elmahalwy, Amina , Mousa, Hayam M. , Amin, Khalid M.

2023

Anomaly detection is a significant research area in data science. It is used to find unusual points or uncommon events in data streams, and it is gaining popularity not only in the business world but also in a variety of other fields, such as cyber security, fraud detection for financial systems, and healthcare. Detecting anomalies can also reveal new knowledge in the data. This study aims to build an effective model to protect data from these anomalies. We propose a new hybrid ensemble machine learning method that combines the predictions of two approaches, isolation forest-k-means and random forest, using majority voting. Several available datasets, including KDD Cup-99, Credit Card, Wisconsin Prognosis Breast Cancer (WPBC), Forest Cover, and Pima, were used to evaluate the proposed method. The experimental results show that the proposed model achieves the best performance in terms of receiver operating characteristic, accuracy, precision, and recall. Our approach is more efficient at detecting anomalies than other approaches. The highest accuracy achieved is 99.9%, compared with 97% without the voting method.
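The voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each detector has already produced a 0/1 label per sample (1 = anomaly). The abstract does not say how votes are formed from the component methods, so the function name `majority_vote`, the three toy voters, and the tie rule are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model labels (0 = normal, 1 = anomaly) by majority.

    `predictions` holds one label sequence per model, all the same length.
    Ties (possible with an even number of models) are resolved as
    anomaly (1); that tie rule is an assumption of this sketch.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must label the same samples")

    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        combined.append(1 if counts[1] >= counts[0] else 0)
    return combined


# Hypothetical labels from three detectors on five samples.
detector_a = [0, 1, 0, 1, 0]
detector_b = [0, 1, 1, 0, 0]
detector_c = [0, 1, 0, 0, 1]

ensemble_labels = majority_vote([detector_a, detector_b, detector_c])
print(ensemble_labels)  # [0, 1, 0, 0, 0]
```

Only the second sample is flagged, because it is the only one that at least two of the three voters label as anomalous.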

Journal Article

Unsupervised Anomaly Detection for Mineral Prospectivity Mapping Using Isolation Forest and Extended Isolation Forest Algorithms

by Hezarkhani, Ardeshir , Mirzabozorg, Seyyed Ataollah Agha Seyyed , Shirazi, Aref in Algorithms , Analysis , Anomalies

2025

Unsupervised anomaly detection algorithms have gained significant attention in the field of mineral prospectivity mapping (MPM) due to their ability to reveal hidden mineralization zones by effectively modeling complex, nonlinear relationships between exploration data and mineral deposits. This study utilizes two tree-based anomaly detection algorithms, namely, isolation forest (IF) and extended isolation forest (EIF), to enhance MPM and exploration targeting. According to the conceptual model of porphyry copper deposits, several evidence layers were generated, including fault density, multi-element geochemical signatures, proximity to various alteration types (phyllic, argillic, propylitic, and iron oxide), and proximity to intrusive rocks. These layers were integrated using IF and EIF algorithms, and their results were subsequently compared with a geological map of the study area. The comparison revealed a high degree of overlap between the identified anomalous zones and geological features, such as andesitic rocks, tuffs, rhyolites, pyroclastics, and intrusions. Additionally, quantitative assessments through prediction-area plots validated the efficacy of both models in generating prospective targets. The results highlight the significant influence of hyperparameter tuning on the accuracy of prospectivity models. Furthermore, the study demonstrates that hyperparameter tuning is more intuitive and straightforward in IF, as it provides a clear and distinct tuning pattern, whereas EIF lacks such clarity, complicating the optimization process.
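For orientation, an isolation forest converts the average path length needed to isolate a point into a score between 0 and 1, with shorter paths giving scores closer to 1. The sketch below follows the standard isolation forest scoring formula, not anything specific to this study; the subsample size of 256 and the function names are assumptions.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, subsample_size: int) -> float:
    """s = 2 ** (-E[h(x)] / c(n)); values near 1 suggest an anomaly."""
    c = average_path_length(subsample_size)
    if c == 0.0:
        raise ValueError("subsample_size must be at least 2")
    return 2.0 ** (-mean_path_length / c)


n = 256  # assumed subsample size for this illustration
# A point whose mean path length equals c(n) scores exactly 0.5.
print(anomaly_score(average_path_length(n), n))
# A point isolated after 2 splits scores higher than one needing 12.
print(anomaly_score(2.0, n) > anomaly_score(12.0, n))
```

Cells of an evidence-layer stack that are isolated quickly would receive high scores under this formula, which is what makes them candidate anomalous zones.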

Journal Article

Evaluation of Deep Isolation Forest (DIF) Algorithm for Mineral Prospectivity Mapping of Polymetallic Deposits

by Zoheir, Basem , Maghsoudi, Abbas , Beiranvand Pour, Amin in Algorithms , Analysis , Anomalies

2024

Mineral prospectivity mapping (MPM) is crucial for efficient mineral exploration, where prospective zones are identified in a cost-effective manner. This study focuses on generating prospectivity maps for hydrothermal polymetallic mineralization in the Feizabad area, in northeastern Iran, using unsupervised anomaly detection methods, i.e., isolation forest (IForest) and deep isolation forest (DIF) algorithms. As mineralization events are rare and complex, traditional approaches continue to encounter difficulties, despite advances in MPM. In this respect, unsupervised anomaly detection algorithms, which do not rely on ground truth samples, offer a suitable solution. Here, we compile geospatial datasets on the Feizabad area, which is known for its polymetallic mineralization showings. Fourteen evidence layers were created, based on the geology and mineralization characteristics of the area. Both the IForest and DIF algorithms were employed to identify areas with high mineralization potential. The DIF, which uses neural networks to handle non-linear relationships in high-dimensional data, outperformed the traditional decision tree-based IForest algorithm. The results, evaluated through a success rate curve, demonstrated that the DIF provided more accurate prospectivity maps, effectively capturing complex, non-linear relationships. This highlights the DIF algorithm’s suitability for MPM, offering significant advantages over the IForest algorithm. The present study concludes that the DIF algorithm, and similar unsupervised anomaly detection algorithms, are highly effective for MPM, making them valuable tools for both brownfield and greenfield exploration.

Journal Article

The influence of feature grouping algorithm in outlier detection with categorical data

by Veerabahu, Vidhya , Viswanathan, Rajalakshmi , Nathaniel, Sharon Femi Paul Sunder in Algorithms , Anomalies , Data analysis

2024

Outlier mining has developed rapidly in recent years, with increasing importance in fields such as banking, sensor networks, and health care. In general, anomaly detection methods are designed for numerical data and ignore categorical data. However, in real-time problems, both numerical and categorical data must be considered to obtain accurate results. Several methods are available for outlier detection in high-dimensional numerical data. In this paper, a feature grouping algorithm for anomaly detection is proposed that also considers categorical data. The algorithm correlates the features of categorical data, forms feature clusters, and detects the outliers. Features are assigned weights based on their levels of appearance, and outlier scores are determined. The performance of the feature grouping algorithm is then compared with traditional algorithms such as LOF and Isolation Forest and with state-of-the-art methods such as WATCH on UCI datasets. The experimental evaluation shows that the proposed algorithm performs comparatively better than the existing algorithms on categorical data.

Journal Article

Devising Isolation Forest-Based Method to Investigate the sRNAome of Mycobacterium tuberculosis Using sRNA-seq Data

by Aggarwal, Ritika , Venkatraman, Divya Lakshmi , Balasubramanian, Rami in Algorithms , Bacteria , Gene sequencing

2024

Small non-coding RNAs (sRNAs) regulate the synthesis of virulence factors and other pathogenic traits, which enables the bacteria to survive and proliferate after host infection. While high-throughput sequencing data have proved useful in identifying sRNAs from the intergenic regions (IGRs) of the genome, it remains a challenge to present a complete genome-wide map of the expression of the sRNAs. Moreover, existing methodologies necessitate multiple dependencies for executing their algorithm and also lack a targeted approach for the de novo sRNA identification. We developed an Isolation Forest algorithm-based method and the tool Prediction Of sRNAs using Isolation Forest for the de novo identification of sRNAs from available bacterial sRNA-seq data (http://posif.ibab.ac.in/). Using this framework, we predicted 1120 sRNAs and 46 small proteins in Mycobacterium tuberculosis. Besides, we highlight the context-dependent expression of novel sRNAs, their probable synthesis, and their potential relevance in stress response mechanisms manifested by M. tuberculosis.

Journal Article

Detecting Network Attacks on Software Configured Networks Using the Isolating Forest Algorithm

by Lavrova, D. S , Stepanov, M. D , Pavlenko, E. Yu in Accuracy , Algorithms , Denial of service attacks

2021

An approach is proposed to detect network attacks in software-defined networks. These networks are specific from a security standpoint, so a modified isolation forest algorithm is taken as the basis for network security. The results of experimental studies are presented, featuring optimal parameters for the conventional and enhanced isolation forest algorithms. Based on the results, a conclusion is drawn about the efficiency of isolation forest for detecting network attacks in software-configured networks.

Journal Article

An Ensemble Learning Based Intrusion Detection Model for Industrial IoT Security

by Benkirane, Said , Farhaoui, Yousef , Guezzaz, Azidine in Access control , Algorithms , Artificial intelligence

2023

The Industrial Internet of Things (IIoT) represents the expansion of the Internet of Things (IoT) into industrial sectors. It is designed to integrate embedded technologies into manufacturing to enhance operations. However, IIoT involves some security vulnerabilities that are more damaging than those of IoT. Accordingly, Intrusion Detection Systems (IDSs) have been developed to forestall harmful intrusions. IDSs monitor the environment to identify intrusions in real time. This study designs an intrusion detection model exploiting feature engineering and machine learning for IIoT security. We combine Isolation Forest (IF) with Pearson’s Correlation Coefficient (PCC) to reduce computational cost and prediction time. IF is used to detect and remove outliers from datasets. We apply PCC to choose the most appropriate features. PCC and IF are applied in either order (PCCIF and IFPCC). The Random Forest (RF) classifier is implemented to enhance IDS performance. For evaluation, we use the Bot-IoT and NF-UNSW-NB15-v2 datasets. RF-PCCIF and RF-IFPCC show noteworthy results with 99.98% and 99.99% Accuracy (ACC) and 6.18 s and 6.25 s prediction time on Bot-IoT, respectively. The two models also score 99.30% and 99.18% ACC and 6.71 s and 6.87 s prediction time on NF-UNSW-NB15-v2, respectively. The results indicate that our designed model has several advantages and higher performance than related models.
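One common way to use PCC for feature selection is to keep the features whose absolute correlation with the label exceeds a threshold. The abstract does not state the paper's exact selection rule, so the sketch below is an assumption: the threshold of 0.5, the function names, and the toy columns are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / math.sqrt(var_x * var_y)


def select_features(
    columns: Dict[str, Sequence[float]],
    target: Sequence[float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep feature names whose |r| with the target meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical traffic features and attack labels (1 = attack).
labels = [0, 0, 1, 1, 1, 0]
features = {
    "pkt_rate": [1, 2, 8, 9, 7, 2],   # tracks the label closely
    "ttl": [5, 5, 5, 5, 5, 5],        # constant, carries no signal
    "noise": [3, 1, 2, 3, 1, 2],      # uncorrelated with the label
}

selected = select_features(features, labels)
print(selected)  # ['pkt_rate']
```

Dropping weakly correlated columns before training is what reduces the classifier's input size and, in turn, its prediction time.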

Journal Article

Explainable Anomaly Detection Framework for Maritime Main Engine Sensor Data

by Kim, Donghyun , Lee, Jihwan , Antariksa, Gian in Algorithms , anomaly detection , Clustering

2021

In this study, we propose a data-driven approach to condition monitoring of marine engines. Although several unsupervised methods exist in the maritime industry, their common limitation is the interpretation of anomalies: they do not explain why the model classifies specific data instances as anomalous. This study combines explainable AI techniques with an anomaly detection algorithm to overcome this limitation. As an explainable AI method, this study adopts Shapley Additive exPlanations (SHAP), which is theoretically solid and compatible with any kind of machine learning algorithm. SHAP enables us to measure the marginal contribution of each sensor variable to an anomaly. Thus, one can easily specify which sensor is responsible for a specific anomaly. To illustrate our framework, an actual sensor stream collected from a cargo vessel over 10 months was analyzed. In this analysis, we performed hierarchical clustering on transformed SHAP values to interpret and group common anomaly patterns. We showed that anomaly interpretation and segmentation using SHAP values provide more useful interpretation than the case without SHAP values.
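The "marginal contribution of each sensor variable" refers to the Shapley value. As a minimal sketch, the exact Shapley value can be computed by enumerating feature subsets for a tiny model; this is the underlying definition, not the SHAP library's approximation algorithms. The sensor names and the toy scoring function are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values; cost grows as 2**n, so use only for small n."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


def toy_anomaly_score(known: FrozenSet[str]) -> float:
    """Hypothetical anomaly score given which sensor readings are known."""
    score = 0.0
    if "exhaust_temp" in known:
        score += 0.6
    if "rpm" in known:
        score += 0.1
    if "exhaust_temp" in known and "oil_pressure" in known:
        score += 0.2  # interaction term shared between the two sensors
    return score


sensors = ["exhaust_temp", "oil_pressure", "rpm"]
contrib = shapley_values(sensors, toy_anomaly_score)
for name, phi in contrib.items():
    print(f"{name}: {phi:.3f}")
```

In this toy case the contributions come out at roughly 0.7, 0.1, and 0.1, and they sum to the full score of 0.9, so `exhaust_temp` would be named as the sensor most responsible for the anomaly.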

Journal Article

Interpretable model for rockburst intensity prediction based on Shapley values-based Optuna-random forest

by Wang, Yongbing , Shen, Yaxi , Yang, Zhiquan in Accuracy , Algorithms , Data collection

2025

To address the limitation of traditional machine learning models in explaining the rockburst intensity prediction process, this study proposes an interpretable rockburst intensity prediction model. The model was developed using 350 sets of actual rockburst sample data to explore the impact of input metrics on the final rockburst intensity level. The collected data underwent pre-processing using the isolation forest algorithm and synthetic minority oversampling technique. The random forest model was optimized through 5-fold cross-validation and the Optuna framework, resulting in the establishment of an Optuna-random forest (Op-RF) model that generates decision rules through its internal decision tree, utilizing the properties of the random forest model. The model was further interpreted using the Shapley additive explanations algorithm, both locally and globally. The results demonstrate that the proposed model achieved an area under curve score of 0.984. In comparison to eight other machine learning models, the proposed Op-RF model demonstrated superior accuracy, precision, recall, and F1 score. The model provides a transparent explanation of the prediction process, linking impact characteristics to the final output. Additionally, a cloud deployment method for the rockburst intensity prediction model is provided and its effectiveness is demonstrated through engineering verification. The proposed model offers a new approach to the application of machine learning in rockburst intensity prediction.

Journal Article

LSTM Short-Term Wind Power Prediction Method Based on Data Preprocessing and Variational Modal Decomposition for Soft Sensors

by Li, Tianyu , Ma, Fanglan , Zhu, Changsheng in Accuracy , Algorithms , Analysis

2024

Soft sensors have been extensively utilized to approximate real-time power prediction in wind power generation, which is challenging to measure instantaneously. The short-term forecast of wind power aims at providing a reference for the dispatch of the intraday power grid. This study proposes a soft sensor model based on the Long Short-Term Memory (LSTM) network by combining data preprocessing with Variational Modal Decomposition (VMD) to improve wind power prediction accuracy. It does so by adopting the isolation forest algorithm for anomaly detection of the original wind power series and processing the missing data by multiple imputation. Based on the processed data samples, VMD is used to decompose and denoise the power data. The LSTM network is introduced to predict each modal component separately, and the predictions of the components are then summed to reconstruct the final wind power prediction. The experimental results show that the LSTM network using the Adam optimization algorithm has better convergence accuracy. The VMD method exhibited superior decomposition outcomes due to its inherent Wiener filter capabilities, which effectively mitigate noise and forestall modal aliasing. The Mean Absolute Percentage Error (MAPE) was reduced by 9.3508%, which indicates that the LSTM network combined with the VMD method has better prediction accuracy.
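The final two steps, summing the per-component predictions and scoring the result with MAPE, can be sketched as follows. The component values are made-up numbers standing in for per-mode LSTM outputs, and the function names are hypothetical; the decomposition and the network itself are outside the scope of this sketch.

```python
from typing import List, Sequence


def reconstruct(component_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Sum the predictions of all modal components at each time step."""
    return [sum(step) for step in zip(*component_predictions)]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical per-mode predictions for three time steps.
mode_predictions = [
    [10.0, 12.0, 11.0],   # low-frequency trend
    [1.0, -1.0, 0.5],     # mid-frequency oscillation
    [0.2, 0.1, -0.3],     # high-frequency residual
]
measured_power = [11.0, 11.5, 11.0]

forecast = reconstruct(mode_predictions)
print([round(v, 2) for v in forecast])
print(f"MAPE: {mape(measured_power, forecast):.2f}%")
```

Note that MAPE divides by the measured value, so it is undefined whenever the measured power is zero; the sketch raises an error in that case rather than choosing a workaround.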

Journal Article
