Catalogue Search | MBRL

A novel and innovative cancer classification framework through a consecutive utilization of hybrid feature selection

by Mahto, Rajul , Shah, Mohd Asif , Ahmed, Saboor Uddin in Accuracy , Algorithms , Analysis

2023

Cancer prediction at an early stage is a topic of major interest in medicine, since it allows accurate and efficient actions for successful medical treatment of cancer. Most cancer datasets contain many gene expression levels as features but few samples, so redundant features must first be eliminated to allow classification algorithms to converge faster. These features (genes) enable us to identify cancer, choose the best prescription to prevent it, and discover differences among techniques. To resolve this problem, we propose a novel hybrid technique, CSSMO-based gene selection, for cancer classification. First, we modify the fitness of spider monkey optimization (SMO) with the cuckoo search algorithm (CSA), yielding CSSMO for feature selection; this combines the benefits of both metaheuristic algorithms to discover a subset of genes that helps predict cancer at an early stage. Further, to enhance the accuracy of the CSSMO algorithm, we apply a filtering step, minimum redundancy maximum relevance (mRMR), to reduce the dimensionality of the cancer gene expression datasets. Next, these gene subsets are classified using deep learning (DL) to identify different groups or classes related to a particular cancer. Eight benchmark cancer microarray gene expression datasets have been used to analyze the performance of the proposed approach with evaluation metrics such as recall, precision, F1-score, and the confusion matrix. The proposed gene selection method with DL achieves much better classification accuracy than other existing DL and machine learning classification models on all the large cancer gene expression datasets.
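
The mRMR filtering step that recurs across these abstracts can be sketched in a few lines of Python. This is a minimal sketch of the common greedy "difference" form (relevance minus mean redundancy, both measured by mutual information), assuming the expression values have already been discretized into bins; the function names `mutual_information` and `mrmr` are illustrative, and none of the listed papers states that it uses exactly this variant.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR, difference form: repeatedly add the feature that maximizes
    I(feature; labels) minus its mean I(feature; already-selected feature)."""
    relevance = [mutual_information(f, labels) for f in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(features)):
        best_j, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy = sum(mutual_information(features[j], features[s]) for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy example (hypothetical data): gene1 duplicates gene0; gene2 is weakly relevant but not redundant.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
gene0 = [0, 0, 0, 0, 1, 1, 1, 1]
gene1 = list(gene0)
gene2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(mrmr([gene0, gene1, gene2], labels, k=2))  # [0, 2]
```

After gene 0 is chosen, its duplicate carries a redundancy penalty larger than its relevance, so the non-redundant gene 2 is selected instead. A wrapper such as CSSMO would then search only within the reduced gene set.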

Journal Article

Optimizing Gene Selection and Cancer Classification with Hybrid Sine Cosine and Cuckoo Search Algorithm

in Algorithms , Biological activity , Breast cancer

2024

Gene expression datasets offer a wide range of information about various biological processes. However, it is difficult to find the important genes in high-dimensional biological data because of the presence of redundant and unimportant ones. Numerous Feature Selection (FS) techniques have been developed to overcome this obstacle. Improving the efficacy and precision of FS methodologies is crucial for identifying significant genes in complex biological data. In this work, we present a novel approach to gene selection called the Sine Cosine and Cuckoo Search Algorithm (SCACSA). This hybrid method is designed to work with the well-known Support Vector Machine (SVM) classifier. Using a breast cancer dataset, the hybrid gene selection algorithm's performance is carefully assessed and compared with other feature selection methods. To improve the quality of the feature set, we first use minimum Redundancy Maximum Relevance (mRMR) as a filtering strategy. The hybrid SCACSA method is then used to refine and optimize the gene selection procedure. Lastly, we classify the dataset based on the chosen genes using the SVM classifier. Given the pivotal role gene selection plays in unraveling complex biological datasets, SCACSA stands out as a valuable tool for the classification of cancer datasets. The findings help medical practitioners make well-informed decisions about cancer diagnosis and provide them with a valuable tool for navigating the complex world of gene expression data.
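
The abstract does not spell out the sine cosine component of SCACSA. As a hedged illustration, the standard Sine Cosine Algorithm position update (Mirjalili's formulation) moves each coordinate toward the best solution found so far along a sine or cosine wave whose amplitude `r1` shrinks linearly over the iterations. How SCACSA hybridizes this with cuckoo search, and how continuous positions become gene subsets, are not given; the sigmoid-based `to_gene_mask` below is a common binarization choice and is used here purely as an assumption.

```python
import math
import random


def sca_step(position, best, t, max_iter, a=2.0, rng=random):
    """One Sine Cosine Algorithm update of a continuous position toward the
    best solution found so far (the 'destination point')."""
    r1 = a - t * (a / max_iter)  # shrinks linearly from a to 0: exploration -> exploitation
    new_position = []
    for x, p in zip(position, best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # how far along the sine/cosine wave to move
        r3 = rng.uniform(0.0, 2.0)            # random weight on the destination
        r4 = rng.random()                     # chooses sine or cosine with equal probability
        wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_position.append(x + r1 * wave * abs(r3 * p - x))
    return new_position


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_gene_mask(position, rng=random):
    """Assumed binarization (not from the paper): keep gene d with probability sigmoid(position[d])."""
    return [1 if rng.random() < _sigmoid(x) else 0 for x in position]


rng = random.Random(42)
position = [rng.uniform(-1.0, 1.0) for _ in range(5)]
best = [1.0, -1.0, 1.0, -1.0, 1.0]  # hypothetical best solution so far
for t in range(10):
    position = sca_step(position, best, t, max_iter=10, rng=rng)
print(to_gene_mask(position, rng=rng))
```

In a full wrapper, each mask would be scored by training the SVM on the selected genes, and `best` would be updated whenever a mask scores higher.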

Journal Article

Bearing Fault Diagnosis Using a Particle Swarm Optimization-Least Squares Wavelet Support Vector Machine Classifier

by Van, Mien , Kang, Hee Jun , Hoang, Duy Tang in Accuracy , Algorithms , Classification

2020

The bearing is one of the key components of a rotating machine. Hence, monitoring the health condition of the bearing is of paramount importance. This paper develops a novel particle swarm optimization (PSO)-least squares wavelet support vector machine (PSO-LSWSVM) classifier for bearing fault diagnosis, designed by combining PSO, a least squares procedure, and a support vector machine (SVM) based on a new wavelet kernel function. In this work, bearing fault classification is transformed into a pattern recognition problem consisting of three data processing stages. First, an information-rich dataset is built by extracting features from the signals, which are decomposed by nonlocal means (NLM) and empirical mode decomposition (EMD). Second, a minimum-redundancy maximum-relevance (mRMR) method is employed to determine a subset of features that provides optimal performance. Third, a novel classifier, LSWSVM, is proposed with the aid of PSO to provide higher classification accuracy. The key innovation of this work is a new classifier with a new type of wavelet kernel that increases the classification precision of bearing fault diagnosis. The merits of the proposed approach are demonstrated on a benchmark bearing dataset through a comprehensive comparison procedure.
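
The least squares SVM at the core of PSO-LSWSVM replaces the SVM's inequality constraints with equalities, so training reduces to solving a single linear system. The NumPy sketch below uses the well-known Morlet-type wavelet kernel as a stand-in, because the abstract does not define the paper's new kernel. The regularization `gamma` and dilation `a` are the kind of hyperparameters a PSO search would typically tune, but the abstract does not say exactly what PSO optimizes; all function names are illustrative.

```python
import numpy as np


def wavelet_kernel(X, Z, a=1.0):
    """Morlet-type wavelet kernel (a stand-in, not the paper's new kernel):
    K(x, z) = prod_d cos(1.75 * (x_d - z_d) / a) * exp(-(x_d - z_d)**2 / (2 * a**2))."""
    D = X[:, None, :] - Z[None, :, :]  # pairwise differences, shape (n_x, n_z, dims)
    return np.prod(np.cos(1.75 * D / a) * np.exp(-D**2 / (2.0 * a**2)), axis=2)


def lssvm_fit(X, y, gamma=10.0, a=1.0):
    """LS-SVM classifier for labels in {-1, +1}. Solves
    [[0, y^T], [y, Omega + I / gamma]] @ [b, alpha] = [0, 1], Omega_ij = y_i y_j K(x_i, x_j)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * wavelet_kernel(X, X, a) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, support values alpha


def lssvm_predict(X_train, y_train, b, alpha, X_new, a=1.0):
    """Decision rule: sign(sum_k alpha_k * y_k * K(x, x_k) + b)."""
    return np.sign(wavelet_kernel(X_new, X_train, a) @ (alpha * y_train) + b)


# Two hypothetical fault classes in a 2-D feature space (e.g., after mRMR selection).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, y, b, alpha, np.array([[0.05, 0.05], [3.1, 3.0]])))
```

Because every training point receives a nonzero `alpha`, LS-SVM gives up the sparsity of a standard SVM in exchange for this closed-form training step, which keeps each fitness evaluation cheap inside a PSO loop.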

Journal Article

Hyperspectral indices data fusion-based machine learning enhanced by MRMR algorithm for estimating maize chlorophyll content

by Nxumalo, Gift Siphiwe , Nagy, Attila , Bódi, Erika Budayné in Algorithms , Chlorophyll , Corn

2024

Accurate estimation of chlorophyll is essential for monitoring maize health and growth, and hyperspectral imaging provides rich data for this purpose. In this context, this paper presents an innovative method to estimate maize chlorophyll by combining hyperspectral indices and advanced machine learning models. The methodology of this study focuses on developing machine learning models that use proprietary hyperspectral indices to estimate corn chlorophyll content. Six advanced machine learning models were used, including robust linear stepwise regression, support vector machines (SVM), fine Gaussian SVM, Matern 5/2 Gaussian process regression, and a three-layer neural network. The MRMR algorithm was integrated into the process to improve feature selection by identifying the most informative spectral bands, thereby reducing data redundancy and improving model performance. The results showed significant differences in performance among the six machine learning models applied to chlorophyll estimation. Among the models, the Matern 5/2 Gaussian process regression model showed the highest prediction accuracy, achieving R² = 0.71, RMSE = 338.46 µg/g, and MAE = 264.30 µg/g on the training set. On the validation set, the Matern 5/2 Gaussian process regression model performed even better, reaching R² = 0.79, RMSE = 296.37 µg/g, and MAE = 237.12 µg/g. These metrics show that the Matern 5/2 Gaussian process regression model, combined with the MRMR algorithm to select optimal features, is highly effective in predicting corn chlorophyll content. This research has important implications for precision agriculture, particularly for real-time monitoring and management of crop health. Accurate estimation of chlorophyll allows farmers to take timely and targeted action.
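
The best-performing model above is Gaussian process regression with a Matern 5/2 covariance, evaluated with R², RMSE, and MAE. The NumPy sketch below shows the kernel formula, the GP posterior mean, and the three metrics. It is an illustration under stated assumptions: a constant prior mean equal to the training average, fixed (not fitted) hyperparameters, and synthetic data standing in for the paper's hyperspectral indices and chlorophyll measurements.

```python
import numpy as np


def matern52(X, Z, length_scale=1.0, variance=1.0):
    """Matern 5/2 covariance: variance * (1 + s + s**2 / 3) * exp(-s), with s = sqrt(5) * r / length_scale."""
    r = np.sqrt(((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))  # Euclidean distances
    s = np.sqrt(5.0) * r / length_scale
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def gp_posterior_mean(X_train, y_train, X_test, noise=1e-2, **kernel_args):
    """GP regression mean with a constant prior mean equal to the training average."""
    y_mean = y_train.mean()
    K = matern52(X_train, X_train, **kernel_args) + noise * np.eye(len(X_train))
    weights = np.linalg.solve(K, y_train - y_mean)
    return y_mean + matern52(X_test, X_train, **kernel_args) @ weights


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))


# Hypothetical data: rows are samples, columns are selected spectral indices.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 3))
y = 400.0 + 300.0 * np.sin(3.0 * X[:, 0]) + 50.0 * X[:, 1]  # synthetic "chlorophyll" values in µg/g
X_tr, y_tr, X_va, y_va = X[:20], y[:20], X[20:], y[20:]
pred = gp_posterior_mean(X_tr, y_tr, X_va, length_scale=0.5)
print(r2(y_va, pred), rmse(y_va, pred), mae(y_va, pred))
```

The Matern 5/2 kernel yields functions that are twice differentiable: smoother than the exponential kernel but less rigid than the squared exponential, which is often a reasonable fit for noisy spectral responses.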

Journal Article

A two-phase cuckoo search based approach for gene selection and deep learning classification of cancer disease using gene expression data with a novel fitness function

by Aziz, Rabia Musheer , Joshi, Amol Avinash in Accuracy , Algorithms , Cancer

2024

The early detection of cancer is of paramount importance in the medical field, as it can lead to more precise and effective interventions for successful cancer treatment. Cancer datasets typically contain numerous gene expression levels as features but only a limited number of samples. Thus, feature selection is a crucial initial step to streamline prediction algorithms. The selected features, or genes, play a pivotal role in cancer identification, treatment selection, and variation analysis among different techniques. To address this challenge, we present two novel methodologies that combine Cuckoo Search (CS) and Spider Monkey Optimization (SMO), referred to as SMOCS (Cuckoo Search followed by Spider Monkey Optimization) and CSSMO (Spider Monkey Optimization followed by Cuckoo Search). These approaches are designed to harness the strengths of both metaheuristic algorithms to identify a subset of genes that aids early-stage cancer prediction. Additionally, to enhance the accuracy of both algorithms, we employ a gene expression reduction method known as minimum Redundancy Maximum Relevance (mRMR) to reduce redundancy in cancer datasets. Subsequently, these gene subsets are classified using Deep Learning (DL) to identify distinct groups or classes associated with specific cancer types. We evaluate the performance of the proposed approaches on six different cancer datasets, assessing cancer sample classification and prediction through metrics such as Recall, Precision, F1-Score, and confusion matrix analysis. Our gene selection methods, in conjunction with DL, achieve significantly better prediction accuracy on large gene expression datasets than existing DL and machine learning models. Experimental results show that both SMOCS and CSSMO classify cancer with high prediction accuracy, but SMOCS gives higher prediction accuracy on all six datasets, with a maximum accuracy of 100%.
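
Cuckoo search, one half of both SMOCS and CSSMO, proposes new solutions with heavy-tailed Lévy flights and periodically abandons a fraction `pa` of nests. The Python sketch below is a basic continuous variant: Lévy steps generated with Mantegna's algorithm, greedy replacement, and a random-walk rebuild of discovered nests. It does not include the spider monkey stage or the papers' novel fitness function; the `sphere` objective is a placeholder, and all names and default parameters are illustrative assumptions.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Mantegna's algorithm: a heavy-tailed step, mostly small with occasional long jumps."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(fitness, dim, n_nests=15, pa=0.25, alpha=0.01, iters=200, bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` over a box with basic cuckoo search (greedy replacement)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(hi, max(lo, v))

    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    scores = [fitness(nest) for nest in nests]
    for _ in range(iters):
        best = nests[min(range(n_nests), key=lambda i: scores[i])]
        # 1) Levy flights: each cuckoo lays a new egg, step scaled by the distance to the best nest.
        for i in range(n_nests):
            candidate = [clip(x + alpha * levy_step(rng=rng) * (x - bx)) for x, bx in zip(nests[i], best)]
            f = fitness(candidate)
            if f < scores[i]:
                nests[i], scores[i] = candidate, f
        # 2) Discovery: with probability pa a nest is rebuilt by a random walk between two other nests.
        for i in range(n_nests):
            if rng.random() < pa:
                j, k = rng.randrange(n_nests), rng.randrange(n_nests)
                candidate = [clip(x + rng.random() * (xj - xk)) for x, xj, xk in zip(nests[i], nests[j], nests[k])]
                f = fitness(candidate)
                if f < scores[i]:
                    nests[i], scores[i] = candidate, f
    i_best = min(range(n_nests), key=lambda i: scores[i])
    return nests[i_best], scores[i_best]


def sphere(x):
    """Placeholder objective; a gene selection run would score a classifier on the selected genes instead."""
    return sum(v * v for v in x)


print(cuckoo_search(sphere, dim=3))
```

Because replacement is greedy, the best score never gets worse from one generation to the next; the Lévy steps supply occasional long jumps that help escape local optima.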

Journal Article

Intelligent Fault Diagnosis of Hydraulic Multi-Way Valve Using the Improved SECNN-GRU Method with mRMR Feature Selection

by Guan, Hanlin , Xiang, Jiawei , Yan, Ren in Accuracy , Algorithms , Analysis

2023

Hydraulic multi-way valves are core components widely used in engineering machinery, mining machinery, and the metallurgical industry. Because of the harsh working environment, hydraulic multi-way valves are prone to faults, and these faults are often hidden. Moreover, hydraulic multi-way valves are expensive, and repeated experiments to obtain real fault data are difficult to carry out. Therefore, fault diagnosis of hydraulic multi-way valves is not easy to achieve. To address this problem, an effective intelligent fault diagnosis method is proposed using an improved Squeeze-Excitation Convolutional Neural Network and Gated Recurrent Unit (SECNN-GRU). The effectiveness of the method is verified both on fault data generated by a simulation model of a hydraulic multi-way valve and on real data collected from an experimental platform built for a directional valve. In this method, shallow statistical features are first extracted from data containing fault information, and the features most highly correlated with the fault types are then selected using the Maximum Relevance Minimum Redundancy algorithm (mRMR). Next, spatial features are extracted by the CNN. The Squeeze-Excitation block assigns different weights to the features to obtain weighted feature vectors. Finally, the temporal features of the weighted feature vectors are extracted and fused by the GRU, and the fused features are classified. On the simulation data, the average diagnostic accuracy of the method reaches 98.94%; on the experimental data from the directional valve, it reaches 92.10% (taking sensor A1 as an example). Compared with other intelligent diagnostic algorithms, the proposed method has better stationarity and higher diagnostic accuracy, providing a feasible solution for fault diagnosis of hydraulic multi-way valves.
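
The Squeeze-Excitation step described above reweights CNN channels using a small gating network. The NumPy sketch below shows the standard SE computation on a (channels, length) feature map: average pooling over time, a two-layer bottleneck with ReLU and sigmoid, and channel-wise rescaling. The sizes, the reduction ratio `r = 4`, and the random weights are hypothetical; the abstract does not describe what the paper's "improved" SE block changes.

```python
import numpy as np


def squeeze_excitation(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on a (channels, length) feature map.
    Squeeze: average each channel over time. Excitation: FC -> ReLU -> FC -> sigmoid
    yields one weight in (0, 1) per channel. Scale: multiply each channel by its weight."""
    z = x.mean(axis=1)                         # (C,)      channel descriptors
    h = np.maximum(0.0, w1 @ z + b1)           # (C // r,) bottleneck with reduction ratio r
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # (C,)      channel weights
    return x * s[:, None], s


# Hypothetical sizes: 8 CNN channels over 16 time steps, reduction ratio r = 4.
rng = np.random.default_rng(0)
C, L, r = 8, 16, 4
x = rng.standard_normal((C, L))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
weighted, channel_weights = squeeze_excitation(x, w1, b1, w2, b2)
print(weighted.shape, np.round(channel_weights, 3))
```

In a trained network, `w1`, `b1`, `w2`, and `b2` are learned jointly with the CNN, and the weighted feature map is what the GRU would consume as a sequence.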

Journal Article

A feature selection-based oblique hyperplane for oblique random survival forests

by Oulhaj, Abderrahim , Mehedy Masud, Mohammad , Abdullahi, Aminu S. in Algorithms , Artificial intelligence , Business metrics

2026

Background Recently, several machine learning (ML) algorithms for right-censored data, including the Oblique Random Survival Forest (ORSF), have been used to develop risk prediction tools in cardiovascular disease (CVD) and oncology research. ORSF uses a hyperplane to represent a single split, as opposed to the univariate (axis-based) splits of conventional approaches such as Random Survival Forest (RSF). However, ORSF faces a hurdle in identifying the relevant features while constructing the hyperplane. Hence, we aim to propose three novel feature selection-based hyperplanes for ORSF and evaluate their predictive performance. Methods We propose three variants of ORSF: (a) ORSF-LASSO, based on LASSO regression, (b) ORSF-MRMR, based on the Minimum Redundancy Maximum Relevance (MRMR) framework, and (c) ORSF-CARS, based on the correlation-adjusted regression survival (CARS) score. Nine versions of these variants were evaluated against the Penalized Cox Proportional Hazards Model, RSF, and the original ORSF on ten public CVD and oncology datasets using Harrell’s C-index, D-Calibration, and the integrated Brier score (IBS). The models were trained using three-fold cross-validation in R version 4.2.1 and the mlr3 ecosystem. Results The newly proposed models showed high discrimination on CVD datasets, with ORSF-LASSO-min being the most consistently best-performing model. Furthermore, on CVD datasets, one of the proposed models demonstrated the lowest D-calibration compared with existing models. On oncology datasets, one or more new models outperformed existing models in three out of five datasets. ORSF-MRMR-3q, a novel model, exhibited the lowest D-calibration across two oncology datasets. The sensitivity analysis indicated that the performance of the newly proposed methods aligned with the primary analysis. Conclusion Our findings suggest that the three proposed variants have the potential to predict time-to-event outcomes in CVD and oncology prognosis research. Nonetheless, the proposed methods need to be validated in comprehensive and varied datasets with prolonged follow-up periods and across multiple health domains.
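
Harrell's C-index, the discrimination measure used in this comparison, is the fraction of comparable subject pairs that a model ranks correctly. The authors worked in R with mlr3; the plain-Python sketch below only illustrates the definition. It skips pairs with tied times, one common convention, and is not the mlr3 implementation. The data are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C for right-censored data.
    A pair (i, j) is comparable when subject i had the event (events[i] is truthy) and times[i] < times[j].
    It is concordant when the earlier failure also has the higher predicted risk; risk ties count 0.5.
    Pairs with tied times are skipped in this simple version."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Hypothetical follow-up times (years), event indicators (1 = event, 0 = censored), and model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]
print(harrell_c_index(times, events, risk))  # 1.0: every comparable pair is ranked correctly
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking; the censored subject at year 6 still appears as the later member of pairs anchored by earlier events.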

Journal Article

Automatic Life Detection Based on Efficient Features of Ground-Penetrating Rescue Radar Signals

by Gidion, Gunnar , Rupitsch, Stefan J. , Reindl, Leonhard M. in Analysis , binary classification , Creative process

2023

Good feature engineering is a prerequisite for accurate classification, especially in challenging scenarios such as using bioradar to detect the breathing of living persons trapped under building rubble. Unlike monitoring patients’ breathing through the air, the measuring conditions of a rescue bioradar are very complex. The ultimate goal of search and rescue is to determine the presence of a living person, which requires extracting representative features that distinguish measurements with a person present from those without. To address this challenge, we conducted a bioradar test scenario under laboratory conditions and decomposed the radar signal into different range intervals to derive multiple virtual scenes from the real one. We then extracted physical and statistical quantitative features that represent a measurement, aiming to find features that are robust to the complexity of rescue-radar measuring conditions, including different rubble sites, breathing rates, signal strengths, and short-duration disturbances. To this end, we used two methods, Analysis of Variance (ANOVA) and Minimum Redundancy Maximum Relevance (MRMR), to analyze the significance of the extracted features. We then trained the classification model using a linear-kernel support vector machine (SVM). As the main result of this work, we identified an optimal set of four features based on the feature ranking and the improvement in the classification accuracy of the SVM model. These four features are related to four different physical quantities and are independent of the rubble site.
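
ANOVA ranks each extracted feature by how well it separates the classes (here, person present versus absent) relative to its spread within each class. The plain-Python sketch below computes the one-way ANOVA F statistic per feature and sorts features by it. It assumes at least two classes; the feature names and values are hypothetical and are not taken from the paper.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across class labels (assumes at least two classes):
    (between-class mean square) / (within-class mean square)."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {g: sum(xs) / len(xs) for g, xs in groups.items()}
    ss_between = sum(len(xs) * (means[g] - grand_mean) ** 2 for g, xs in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, xs in groups.items() for x in xs)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(feature_columns, labels):
    """Indices of features sorted from most to least class-separating (largest F first)."""
    scores = [anova_f(col, labels) for col in feature_columns]
    return sorted(range(len(feature_columns)), key=lambda j: scores[j], reverse=True)


# Hypothetical measurements: label 1 = person present, 0 = empty scene.
labels = [0, 0, 0, 1, 1, 1]
breathing_band_power = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
unrelated_feature = [1.0, 5.0, 3.0, 5.0, 1.0, 3.0]
print(anova_f(breathing_band_power, labels))  # 24.0
print(rank_features([unrelated_feature, breathing_band_power], labels))  # [1, 0]
```

Unlike mRMR, ANOVA scores each feature in isolation, so two highly correlated features can both rank near the top; using both criteria, as the authors did, exposes that redundancy.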

Journal Article

Minimum redundancy maximum relevance (mRMR) based feature selection from endoscopic images for automatic gastrointestinal polyp detection

by Billah, Mustain , Waheed, Sajjad in Artificial neural networks , Classifiers , Codes

2020

In this paper, a computer-based system is proposed to support gastrointestinal polyp detection. It can detect and classify gastrointestinal polyps in endoscopic video. Color wavelet (CW) features and convolutional neural network (CNN) features are extracted from endoscopic video frames. A mutual-information-based feature selection technique, minimum redundancy maximum relevance (mRMR), is used to reduce the size of the feature vector. Instead of a single classifier, an ensemble classifier, Bootstrap Aggregating (Bagging), is used. The proposed system has been evaluated on different public databases and on our own datasets. The evaluation shows that the system outperforms existing methods.
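
Bagging trains many copies of a base classifier, each on a bootstrap sample drawn with replacement, and combines them by majority vote. The abstract does not name the base learner, so the plain-Python sketch below uses a simple nearest-centroid classifier as a hypothetical stand-in; the toy feature vectors and the "normal"/"polyp" labels are illustrative only.

```python
import random
from collections import Counter


def fit_nearest_centroid(X, y):
    """Stand-in base learner: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_nearest_centroid(centroids, x):
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train each base learner on a bootstrap sample (len(X) draws with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_nearest_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate by majority vote over the base learners."""
    votes = Counter(predict_nearest_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (e.g., after mRMR reduction) for video frames.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = ["normal", "normal", "normal", "polyp", "polyp", "polyp"]
models = bagging_fit(X, y)
print(bagging_predict(models, [0.5, 0.5]), bagging_predict(models, [5.5, 5.5]))
```

Bagging mainly reduces variance, so it pays off most with unstable base learners such as deep decision trees; the centroid learner here just keeps the sketch short.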

Journal Article

Enhanced Feature Engineering Symmetry Model Based on Novel Dolphin Swarm Algorithm

by Gao, Fei , Abisado, Mideth in Ablation , Accuracy , Algorithms

2025

This study addresses the challenges of high-dimensional data, such as the curse of dimensionality and feature redundancy, which can be viewed as an inherent asymmetry in the data space. To restore a balanced symmetry and build a more complete feature representation, we propose an enhanced feature engineering model (EFEM) that employs a novel dual-strategy approach. First, we present a symmetrical feature selection algorithm that combines an improved Dolphin Swarm Algorithm (DSA) with the Maximum Relevance–Minimum Redundancy (mRMR) criterion. This method not only selects an optimal, high-relevance feature subset, but also identifies the remaining features as a complementary, redundant subset. Second, an ensemble learning-based feature reconstruction algorithm is introduced to mine potential information from these redundant features. This process transforms fragmented, redundant information into a new, synthetic feature, thereby establishing a form of information symmetry with the selected optimal subset. Finally, the EFEM constructs a high-performance feature space by symmetrically integrating the optimal feature subset with the synthetic feature. The model’s superior performance is extensively validated on nine standard UCI regression datasets, with comparative analysis showing that it significantly outperforms similar algorithms and achieves an average goodness-of-fit of 0.9263. The statistical significance of this improvement is confirmed by the Wilcoxon signed-rank test. Comprehensive analyses of parameter sensitivity, robustness, convergence, and runtime, as well as ablation experiments, further validate the efficiency and stability of the proposed algorithm. The successful application of the EFEM in a real-world product demand forecasting task fully demonstrates its practical value in complex scenarios.
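
The Wilcoxon signed-rank test cited above compares two methods on paired results, such as one score per dataset. The plain-Python sketch below computes the rank sums W+ and W- (zero differences dropped, tied absolute differences given their average rank). Turning the smaller sum into a p-value is left to a table or a library routine such as `scipy.stats.wilcoxon`. The per-dataset scores are hypothetical.

```python
def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus) for paired samples a and b.
    Zero differences are dropped; tied |differences| receive their average rank."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied absolute differences.
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical per-dataset scores (in percent) for a proposed method and a baseline.
proposed = [95, 91, 88, 97, 90]
baseline = [92, 91, 85, 93, 91]
w_plus, w_minus = wilcoxon_signed_rank(proposed, baseline)
print(w_plus, w_minus)  # 9.0 1.0
```

The test statistic is min(W+, W-); a small value relative to n(n+1)/2 (here 10 over the four nonzero differences) indicates that one method wins consistently across datasets.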

Journal Article
