Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
24,688 result(s) for "Feature selection"
A review of unsupervised feature selection methods
by Martínez-Trinidad, José Fco.; Carrasco-Ochoa, J. Ariel; Solorio-Fernández, Saúl
in Algorithms; Artificial intelligence; Classification
2020
In recent years, unsupervised feature selection methods have raised considerable interest in many research areas; this is mainly due to their ability to identify and select relevant features without needing class label information. In this paper, we provide a comprehensive and structured review of the most relevant and recent unsupervised feature selection methods reported in the literature. We present a taxonomy of these methods and describe the main characteristics and the fundamental ideas they are based on. Additionally, we summarize the advantages and disadvantages of the general lines into which we have categorized the methods analyzed in this review. Moreover, an experimental comparison among the most representative methods of each approach is also presented. Finally, we discuss some important open challenges in this research area.
Journal Article
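The filter-style selection surveyed in the review above can be illustrated with the simplest label-free criterion. The sketch below is not taken from the review; the variance criterion and all names are illustrative. It ranks features by sample variance, needing no class labels, and keeps the top k:

```python
def variance_ranking(rows, k):
    """Return indices of the k highest-variance columns of a row-major matrix."""
    n = len(rows)
    d = len(rows[0])
    variances = []
    for j in range(d):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    # sort column indices by variance, highest first, and keep the top k
    return sorted(range(d), key=lambda j: variances[j], reverse=True)[:k]

data = [
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.2],
    [4.0, 5.0, 0.1],
]
print(variance_ranking(data, 2))  # -> [0, 2]: column 1 is constant and drops out
```

Structure-aware criteria such as the Laplacian score follow the same filter pattern, with a graph-based score in place of variance.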
A Review of Feature Selection and Its Methods
by Anuradha, J.; Venkatesh, B.
in Algorithms; Dimensionality Reduction (DR); Feature Extraction (FE)
2019
In the digital era, the data generated by various applications are increasing drastically, both row-wise and column-wise; this creates a bottleneck for analytics and increases the burden on machine learning algorithms used for pattern recognition. This curse of dimensionality can be handled through reduction techniques. Dimensionality Reduction (DR) can be performed in two ways, namely Feature Selection (FS) and Feature Extraction (FE). This paper presents a survey of feature selection methods, from which we conclude that most FS methods assume static data. However, since the emergence of IoT and web-based applications, data are generated dynamically and grow at a fast rate, so they are likely to be noisy, which hinders algorithm performance. As the size of the dataset increases, the scalability of FS methods is jeopardized, and existing DR algorithms do not address the issues posed by dynamic data. Using FS methods not only reduces the burden of the data but also avoids overfitting of the model.
Journal Article
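The FS-versus-FE distinction drawn in the abstract above can be made concrete in a few lines. In this illustrative sketch (functions and numbers are invented for the example), selection keeps a subset of the original columns unchanged, while extraction replaces them with derived values, here a toy linear combination standing in for methods like PCA:

```python
def select_features(row, keep):
    """FS: keep original columns by index; the values are unchanged."""
    return [row[j] for j in keep]

def extract_features(row):
    """FE: build new derived features; the original columns disappear."""
    return [row[0] + row[1], row[0] - row[2]]

row = [4.0, 1.0, 3.0]
print(select_features(row, [0, 2]))  # -> [4.0, 3.0]: original values survive
print(extract_features(row))         # -> [5.0, 1.0]: new, derived values
```

Both routes reduce the 3-D row to 2-D, but only selection preserves the interpretability of the original columns.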
Unsupervised feature selection via multiple graph fusion and feature weight learning
2023
Unsupervised feature selection attempts to select a small number of discriminative features from original high-dimensional data and preserve the intrinsic data structure without using data labels. As an unsupervised learning task, most previous methods often use a coefficient matrix for feature reconstruction or feature projection, and a certain similarity graph is widely utilized to regularize the intrinsic structure preservation of original data in a new feature space. However, a similarity graph with poor quality could inevitably affect the final results. In addition, designing a rational and effective feature reconstruction/projection model is not easy. In this paper, we introduce a novel and effective unsupervised feature selection method via multiple graph fusion and feature weight learning (MGF²WL) to address these issues. Instead of learning the feature coefficient matrix, we directly learn the weights of different feature dimensions by introducing a feature weight matrix, and the weighted features are projected into the label space. Aiming to exploit sufficient relations among data samples, we develop a graph fusion term to fuse multiple predefined similarity graphs for learning a unified similarity graph, which is then deployed to regularize the local data structure of original data in a projected label space. Finally, we design a block coordinate descent algorithm with a convergence guarantee to solve the resulting optimization problem. Extensive experiments with sufficient analyses on various datasets are conducted to validate the efficacy of our proposed MGF²WL.
Journal Article
Optimizing epileptic seizure recognition performance with feature scaling and dropout layers
by Omar, Ahmed; Abd El-Hafeez, Tarek
in Accuracy; Artificial Intelligence; Artificial neural networks
2024
Epilepsy is a widespread neurological disorder characterized by recurring seizures that have a significant impact on individuals' lives. Accurately recognizing epileptic seizures is crucial for proper diagnosis and treatment. Deep learning models have shown promise in improving seizure recognition accuracy. However, optimizing their performance for this task remains challenging. This study presents a new approach to optimize epileptic seizure recognition using deep learning models. The study employed a dataset of Electroencephalography (EEG) recordings from multiple subjects and trained nine deep learning architectures with different preprocessing techniques. By combining a 1D convolutional neural network (Conv1D) with a Long Short-Term Memory (LSTM) network, we developed the Conv1D + LSTM architecture. This architecture, augmented with dropout layers, achieved an effective test accuracy of 0.993. The LSTM architecture alone achieved a slightly lower accuracy of 0.986. Additionally, the Bidirectional LSTM (BiLSTM) and Gated Recurrent Unit (GRU) architectures performed exceptionally well, with accuracies of 0.983 and 0.984, respectively. Notably, standard scaling proved to be advantageous, significantly improving the accuracy of both BiLSTM and GRU compared to MinMax scaling. These models consistently achieved high test accuracies across different percentages of Principal Component Analysis (PCA), with the best results obtained when retaining 50% and 90% of the features. Chi-square feature selection also enhanced the classification performance of BiLSTM and GRU models. The study reveals that different deep learning architectures respond differently to feature scaling, PCA, and feature selection methods. Understanding these nuances can lead to optimized models for epileptic seizure recognition, ultimately improving patient outcomes and quality of life.
Journal Article
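The two feature-scaling choices compared in the study above differ in a small but consequential way: standard scaling centers and normalizes spread, while MinMax scaling only maps the observed range onto [0, 1]. A minimal sketch of both (the sample values are made up for illustration):

```python
import math

def standard_scale(xs):
    """Center to zero mean and divide by the (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]

def minmax_scale(xs):
    """Map the observed range of the data onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

signal = [2.0, 4.0, 6.0, 8.0]
print(standard_scale(signal))  # zero-mean, unit-variance version
print(minmax_scale(signal))    # starts at 0.0, ends at 1.0
```

Because MinMax scaling is driven entirely by the extremes, a single outlier compresses the rest of the signal, which is one common reason standard scaling behaves differently downstream.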
Face Alignment by Explicit Shape Regression
2014
We present a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and
explicitly
minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during the test, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape indexed features and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 min for 2,000 training images), and run regression extremely fast in test (15 ms for a 87 landmarks shape). Experiments on challenging data show that our approach significantly outperforms the state-of-the-art in terms of both accuracy and efficiency.
Journal Article
A diabetes prediction model based on Boruta feature selection and ensemble learning
2023
Background and objective
As a common chronic disease, diabetes is called the “second killer” among modern diseases. Currently, there is no medical cure for diabetes. We can only rely on medication for auxiliary treatment. However, many diabetic patients still die each year. In addition, a considerable number of people do not pay attention to their physical health or opt out of treatment due to lack of money, which eventually leads to various complications. Therefore, diagnosing diabetes at an early stage and intervening early is necessary; thus, developing an early detection method for diabetes is essential.
Methods
In this study, a diabetes prediction model based on Boruta feature selection and ensemble learning is proposed. The model uses Boruta feature selection to extract salient features from the dataset, the K-Means++ algorithm for unsupervised clustering of the data, and a stacking ensemble learning method for classification. It has been validated on a diabetes dataset.
Results
The experiments were performed on the PIMA Indian diabetes dataset. The model was evaluated by accuracy, precision and F1 score. The obtained results show that the accuracy of the model reaches 98%, achieving good results.
Conclusion
Compared with other diabetes prediction models, this model achieved better results, indicating superior performance in diabetes prediction.
Journal Article
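Boruta's shadow-feature idea, central to the model above, can be sketched compactly. This is an illustrative simplification, not the paper's implementation: a fixed half-rotation stands in for Boruta's random shuffle, and |Pearson correlation with the target| stands in for the random-forest importance the real algorithm uses.

```python
import math

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def boruta_sketch(columns, target):
    """Keep only features whose importance beats the best shadow feature."""
    # Shadows are scrambled copies of the real columns: any importance they
    # show is noise, so the strongest shadow sets the acceptance threshold.
    half = len(target) // 2
    shadows = [col[half:] + col[:half] for col in columns]
    threshold = max(abs(corr(s, target)) for s in shadows)
    return [j for j, col in enumerate(columns)
            if abs(corr(col, target)) > threshold]

# column 0 tracks the target closely; column 1 is near-noise
cols = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [0.1, -0.2, 0.3, -0.1, 0.2, -0.3]]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
print(boruta_sketch(cols, y))  # -> [0]: only the informative column survives
```

Real Boruta repeats the shuffle-and-compare step many times and keeps features that beat the shadows statistically, not just once.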
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network
2020
Speech emotion recognition (SER) plays a significant role in human–machine interaction. Emotion recognition from speech and its precise classification is a challenging task because a machine is unable to understand its context. For accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotion classification from speech signals; however, they are not efficient enough to accurately depict the emotional states of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotional datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). Our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS in speaker-dependent SER experiments. Moreover, our method yields the best results for speaker-independent SER compared with existing handcrafted-feature-based SER approaches.
Journal Article
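A correlation-based feature selection step like the one applied to the DCNN features above can be sketched as a greedy filter. This is an illustrative stand-in, not the paper's exact procedure: rank features by relevance to the target, then drop any feature too redundant with one already kept.

```python
import math

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def cfs_sketch(columns, target, redundancy_cap=0.9):
    """Rank features by |corr with target|, then greedily skip any feature
    that is too correlated with a feature already kept."""
    order = sorted(range(len(columns)),
                   key=lambda j: abs(corr(columns[j], target)), reverse=True)
    kept = []
    for j in order:
        if all(abs(corr(columns[j], columns[i])) < redundancy_cap for i in kept):
            kept.append(j)
    return kept

# feature 1 is an exact rescaling of feature 0, so only one of the pair survives
cols = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],
        [1.0, -1.0, 1.0, -1.0]]
y = [1.0, 2.1, 2.9, 4.0]
print(cfs_sketch(cols, y))  # -> [0, 2]
```

The two criteria pull in opposite directions by design: high relevance to the target, low redundancy among the selected set.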
Spatial Transferability of Random Forest Models for Crop Type Classification Using Sentinel-1 and Sentinel-2
2022
Large-scale crop type mapping often requires prediction beyond the environmental settings of the training sites. Shifts in crop phenology, field characteristics, or ecological site conditions in a previously unseen area may reduce the classification performance of machine learning classifiers, which often overfit to the training sites. This study aims to assess the spatial transferability of Random Forest models for crop type classification across Germany. The effects of different input datasets, i.e., only optical, only Synthetic Aperture Radar (SAR), and an optical-SAR data combination, and the impact of spatial feature selection were systematically tested to identify the approach with the highest accuracy in the transfer region. Spatial feature selection, a feature selection approach combined with spatial cross-validation, should remove features that carry site-specific information in the training data, which in turn can reduce the accuracy of the classification model in previously unseen areas. Seven study sites distributed over Germany were analyzed using reference data for the 11 major crops grown in the year 2018. Sentinel-1 and Sentinel-2 data from October 2017 to October 2018 were used as input. Accuracy estimation was performed using spatially independent sample sets. The optical-SAR combination outperformed the single sensors in the training sites (maximum F1-score: 0.85) and likewise in the areas not covered by training data (maximum F1-score: 0.79). Random Forest models based on only SAR features showed the lowest accuracy losses when transferred to unseen regions (average F1 loss: 0.04). In contrast to using the entire feature set, spatial feature selection substantially reduces the number of input features while preserving good predictive performance on unseen sites. Altogether, applying spatial feature selection to a combination of optical-SAR features or using SAR-only features is beneficial for large-scale crop type classification where training data is not evenly distributed over the complete study region.
Journal Article
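The spatial cross-validation underlying the spatial feature selection above amounts to grouped splitting: each fold holds out one whole site, so the model is never tested on samples from a site it trained on. A minimal sketch (the site names and samples are invented for illustration):

```python
def leave_one_site_out(samples):
    """Yield (site, train_idx, test_idx) splits where each test fold is one
    whole site -- the grouped splitting behind spatial cross-validation."""
    sites = sorted({site for _, site in samples})
    for held_out in sites:
        train = [i for i, (_, s) in enumerate(samples) if s != held_out]
        test = [i for i, (_, s) in enumerate(samples) if s == held_out]
        yield held_out, train, test

fields = [("wheat", "north"), ("maize", "north"), ("wheat", "south"),
          ("barley", "south"), ("maize", "east")]
for site, train, test in leave_one_site_out(fields):
    # e.g. holding out "east": train on [0, 1, 2, 3], test on [4]
    print(site, train, test)
```

Scoring features under this splitting penalizes those whose apparent usefulness comes from site-specific quirks, which is exactly what hurts transfer to unseen regions.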
An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost
2023
Previous major earthquake events have revealed that soils susceptible to liquefaction are one of the factors causing significant damage to structures. Therefore, accurate prediction of the liquefaction phenomenon is an important task in earthquake engineering. Over the past decade, researchers have extensively applied machine learning (ML) methods to predict soil liquefaction. This paper presents the prediction of soil liquefaction from an SPT dataset using relatively new and robust tree-based ensemble algorithms, namely Adaptive Boosting, Gradient Boosting Machine, and eXtreme Gradient Boosting (XGBoost). The innovations introduced in this paper are briefly as follows. Firstly, Stratified Random Sampling was utilized to ensure equalized sampling between each class selection. Secondly, feature selection methods such as Recursive Feature Elimination, Boruta, and Stepwise Regression were applied to develop models with a high degree of accuracy and minimal complexity by selecting the variables with significant predictive power. Thirdly, the performance of the ML algorithms with feature selection methods was compared in terms of four performance metrics, Overall Accuracy, Precision, Recall, and F-measure, to select the best model. Lastly, the best predictive model was determined using a statistical significance test, Wilcoxon's signed-rank test. Furthermore, computational cost analyses of the tree-based ensemble algorithms were performed based on parallel and non-parallel processing. The results of the study suggest that all developed tree-based ensemble models could reliably estimate soil liquefaction. In conclusion, according to both validation and statistical results, XGBoost with the Boruta model achieved more stable and better prediction performance than the other models in all considered cases.
Journal Article
Thyroid Disease Prediction Using Selective Features and Machine Learning Techniques
by Rodríguez, Carmen Lili; Chaganti, Rajasekhar; Rustam, Furqan
in Accuracy; Algorithms; Classification
2022
Thyroid disease prediction has emerged as an important task recently. Despite existing approaches for its diagnosis, the target is often binary classification, the datasets used are small, and results are not validated. Predominantly, existing approaches focus on model optimization, while the feature engineering part is less investigated. To overcome these limitations, this study presents an approach that investigates feature engineering for machine learning and deep learning models. Forward feature selection, backward feature elimination, bidirectional feature elimination, and machine learning-based feature selection using extra tree classifiers are adopted. The proposed approach can predict Hashimoto’s thyroiditis (primary hypothyroid), binding protein (increased binding protein), autoimmune thyroiditis (compensated hypothyroid), and non-thyroidal syndrome (NTIS) (concurrent non-thyroidal illness). Extensive experiments show that the features selected by the extra tree classifier yield the best results, with 0.99 accuracy and F1 score, when used with the random forest classifier. Results suggest that the machine learning models are a better choice for thyroid disease detection regarding the provided accuracy and computational complexity. K-fold cross-validation and performance comparison with existing studies corroborate the superior performance of the proposed approach.
Journal Article
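Forward feature selection, one of the wrapper methods adopted in the study above, is a greedy loop: start from an empty set and repeatedly add whichever feature most improves a model score. A minimal sketch with an invented scoring function standing in for a trained classifier's validation accuracy:

```python
def forward_selection(n_features, score, max_k):
    """Greedy wrapper selection: repeatedly add the single feature whose
    addition gives the best score; stop when no addition improves it."""
    selected, best = [], float("-inf")
    while len(selected) < max_k:
        candidates = [j for j in range(n_features) if j not in selected]
        # score every one-feature extension of the current subset
        scored = [(score(selected + [j]), j) for j in candidates]
        top_score, top_j = max(scored)
        if top_score <= best:
            break  # no candidate improves the score
        best = top_score
        selected.append(top_j)
    return selected

# toy scorer: pretend features 1 and 3 together explain the target best,
# with a small penalty for every irrelevant feature added
def toy_score(subset):
    useful = {1, 3}
    return len(useful & set(subset)) - 0.1 * len(set(subset) - useful)

print(forward_selection(5, toy_score, 5))  # -> [3, 1]
```

Backward elimination runs the same loop in reverse, starting from the full set and removing the least useful feature each round.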