MBRL Search Results

244 result(s) for "Recursive feature elimination"
Predicting chatter using machine learning and acoustic signals from low-cost microphones
Machining chatter is a phenomenon resulting from self-oscillation between a machining tool and workpiece. This self-oscillation results in variation in the machined product that reduces the ability to meet desired specifications. Chatter is a widely studied topic as it directly relates to the quality of machined products. This study details the application of a Random Forest (RF) classifier with Recursive Feature Elimination (RFE) to machining audio collected by a single microphone during down-milling operations. This approach allows straightforward feature elimination that results in an easily understood set of analyzed dimensions. Stability is predicted solely from the classification output of the RF classifier. Our approach proves highly predictive with a consistent machining setup and a small sample set. We also review transferability between machining setups and present key findings. Our RF approach demonstrates the ability to analyze and classify chatter through a low-cost approach with limited training data required. The motivation for using a single microphone is to enable detection on machines without other sensors, such as accelerometers, present in the machining setup. The value of the in-process sensor and chatter classifier is highlighted because the machining setup included asymmetric dynamics that reduced the accuracy of the traditional analytical stability solution. We see a natural progression to deploying this audio-only methodology with real-time processing and classification using either a laptop or smartphone. This progression will allow visual indicators during the machining process that can alert machinists to progression into unstable machining.
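The general recipe described in this abstract — RFE wrapped around a Random Forest classifier to prune audio-derived features before predicting stability — can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the "audio features" and stable/chatter labels below are synthetic stand-ins.

```python
# Sketch: Recursive Feature Elimination around a Random Forest classifier.
# Synthetic features stand in for audio-derived features; labels are a
# made-up stable/chatter indicator, not the paper's milling data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))
# Make the first three features informative for a binary label.
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
# RFE drops the least important feature each round until 5 remain.
selector = RFE(rf, n_features_to_select=5, step=1).fit(X, y)
selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)
```

The surviving subset is what the paper's stability classifier would consume; `n_features_to_select` is a tunable assumption here.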
SVM-RFE: selection and visualization of the most relevant features through non-linear kernels
Background Support vector machines (SVM) are a powerful tool for analyzing data with a number of predictors approximately equal to or larger than the number of observations. Originally, however, application of SVM to biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables. Creating predictor models based on only the most relevant variables is essential in biomedical research. Substantial work has been done to allow assessment of variable importance in SVM models, but this work has focused on SVM implemented with linear kernels. The power of SVM as a prediction model is associated with the flexibility generated by the use of non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and SVM for survival analysis. Results The proposed algorithms allow visualization of each of the RFE iterations, and hence identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. The three algorithms we proposed generally performed better than the gold-standard RFE for non-linear kernels when comparing the truly most relevant variables with the variable ranks produced by each algorithm in simulation studies. Generally, RFE-pseudo-samples outperformed the other three methods, even when variables were assumed to be correlated in all tested scenarios. Conclusions The proposed approaches can be implemented accurately to select variables and assess the direction and strength of associations in analysis of biomedical data using SVM for categorical or time-to-event responses.
Variable selection, and interpretation of the direction and strength of associations between predictors and outcomes, can be conducted accurately with the proposed approaches, particularly the RFE-pseudo-samples approach, when analyzing biomedical data. These approaches perform better than the classical RFE of Guyon in realistic scenarios for the structure of biomedical data.
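The Guyon-style linear SVM-RFE that this paper generalizes to non-linear kernels works by repeatedly fitting a linear SVM and discarding the feature with the smallest squared weight. A minimal sketch of that classical baseline (not the authors' non-linear or survival extensions), on synthetic data standing in for biomedical predictors:

```python
# Classical linear SVM-RFE (Guyon): at each iteration, fit a linear SVM
# and eliminate the feature with the smallest squared weight |w_j|^2.
# Synthetic data stands in for the biomedical datasets in the paper.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 10))
y = (X[:, 0] - 2 * X[:, 1] > 0).astype(int)  # only features 0 and 1 matter

remaining = list(range(X.shape[1]))
while len(remaining) > 2:                        # keep the 2 top-ranked features
    w = SVC(kernel="linear", C=1.0).fit(X[:, remaining], y).coef_[0]
    weakest = remaining[int(np.argmin(w ** 2))]  # smallest squared weight
    remaining.remove(weakest)
print("surviving features:", remaining)
```

For non-linear kernels there is no weight vector `w`, which is exactly the gap the paper's pseudo-sample and kernel-PCA rankings are designed to fill.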
An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost
Previous major earthquake events have revealed that soils susceptible to liquefaction are one of the factors causing significant damage to structures. Therefore, accurate prediction of the liquefaction phenomenon is an important task in earthquake engineering. Over the past decade, researchers have extensively applied machine learning (ML) methods to predict soil liquefaction. This paper presents the prediction of soil liquefaction from an SPT dataset using relatively new and robust tree-based ensemble algorithms, namely Adaptive Boosting, Gradient Boosting Machine, and eXtreme Gradient Boosting (XGBoost). The innovations introduced in this paper are briefly as follows. Firstly, Stratified Random Sampling was utilized to ensure equalized sampling between the classes. Secondly, feature selection methods such as Recursive Feature Elimination, Boruta, and Stepwise Regression were applied to develop models with a high degree of accuracy and minimal complexity by selecting the variables with significant predictive power. Thirdly, the performance of the ML algorithms with feature selection methods was compared in terms of four performance metrics: Overall Accuracy, Precision, Recall, and F-measure, to select the best model. Lastly, the best predictive model was determined using a statistical significance test, Wilcoxon's signed-rank test. Furthermore, computational cost analyses of the tree-based ensemble algorithms were performed based on parallel and non-parallel processing. The results of the study suggest that all developed tree-based ensemble models could reliably estimate soil liquefaction. In conclusion, according to both validation and statistical results, XGBoost with the Boruta model achieved the most stable and best prediction performance among the models in all considered cases.
Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean
Recent substantial advances in high-throughput field phenotyping have provided plant breeders with affordable and efficient tools for evaluating a large number of genotypes for important agronomic traits at early growth stages. Nevertheless, the use of large datasets generated by high-throughput phenotyping tools such as hyperspectral reflectance in cultivar development programs is still challenging due to the essential need for intensive knowledge of computational and statistical analyses. In this study, the robustness of three common machine learning (ML) algorithms, multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), was evaluated for predicting soybean (Glycine max) seed yield using hyperspectral reflectance. To this end, hyperspectral reflectance data for the whole spectrum, ranging from 395 to 1005 nm, were collected at the R4 and R5 growth stages on 250 soybean genotypes grown in four environments. The recursive feature elimination (RFE) approach was performed to reduce the dimensionality of the hyperspectral reflectance data and select the variables with the largest importance values. The results indicated that R5 is the more informative stage for measuring hyperspectral reflectance to predict seed yield. The 395 nm reflectance band was also identified as the highest-ranked band for predicting soybean seed yield. Considering either the full or the selected variables as inputs, the ML algorithms were evaluated both individually and in combination using the ensemble–stacking (E–S) method to predict soybean yield. The RF algorithm had the highest performance, with 84% yield classification accuracy, among all the individually tested algorithms. By selecting RF as the metaClassifier for the E–S method, the prediction accuracy increased to 0.93 using all variables and 0.87 using selected variables, showing the success of E–S as an ensemble technique.
This study demonstrated that soybean breeders could implement the E–S algorithm, using either the full or the selected reflectance spectra, to select high-yielding soybean genotypes among a large number of genotypes at early growth stages.
Hybrid-Recursive Feature Elimination for Efficient Feature Selection
As datasets continue to increase in size, it is important to select the optimal feature subset from the original dataset to obtain the best performance in machine learning tasks. High-dimensional datasets with an excessive number of features can cause low performance in such tasks; overfitting is a typical problem. In addition, high-dimensional datasets demand more storage and computing power, and models fitted to them can produce low classification accuracies. Thus, it is necessary to select a representative subset of features using an efficient selection method. Many feature selection methods have been proposed, including recursive feature elimination. In this paper, a hybrid-recursive feature elimination method is presented that combines the feature-importance-based recursive feature elimination methods of the support vector machine, random forest, and generalized boosted regression algorithms. From the experiments, we confirm that the performance of the proposed method is superior to that of the three single recursive feature elimination methods.
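One plausible reading of this combination idea — a hedged sketch under our own assumptions, not the authors' exact algorithm — is to run RFE separately with each estimator, average the per-feature rankings, and keep the features with the best mean rank. `GradientBoostingClassifier` stands in here for the generalized boosted regression component, and the data is synthetic.

```python
# Sketch of a hybrid RFE: average the RFE rankings from three estimators
# (linear SVM, random forest, gradient boosting) and keep the features
# with the lowest mean rank. Synthetic data; the aggregation rule is an
# illustrative assumption, not the paper's published procedure.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # features 0 and 1 carry the signal

estimators = [
    SVC(kernel="linear"),                      # weight-based importance
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(random_state=0),
]
# RFE.ranking_ assigns 1 to the last-surviving feature; higher = dropped earlier.
ranks = np.array([
    RFE(est, n_features_to_select=1).fit(X, y).ranking_ for est in estimators
])
mean_rank = ranks.mean(axis=0)
top4 = np.argsort(mean_rank)[:4]               # lowest mean rank = best
print("hybrid top-4 features:", sorted(top4.tolist()))
```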
A Deep Neural Network for Early Detection and Prediction of Chronic Kidney Disease
Diabetes and high blood pressure are the primary causes of Chronic Kidney Disease (CKD). Glomerular Filtration Rate (GFR) and kidney damage markers are used by researchers around the world to identify CKD as a condition that leads to reduced renal function over time. A person with CKD has a higher chance of dying young. Doctors face a difficult task in diagnosing the different diseases linked to CKD at an early stage in order to prevent the disease. This research presents a novel deep learning model for the early detection and prediction of CKD. It aims to create a deep neural network and compare its performance to that of other contemporary machine learning techniques. In tests, all missing values in the database were replaced with the average of the associated features. After that, the neural network's optimum parameters were fixed by running multiple trials over candidate parameter settings. The most important features were selected by Recursive Feature Elimination (RFE); Hemoglobin, Specific Gravity, Serum Creatinine, Red Blood Cell Count, Albumin, Packed Cell Volume, and Hypertension were identified as the key features. The selected features were passed to machine learning models for classification. The proposed deep neural model outperformed the other five classifiers (Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic Regression, Random Forest, and Naive Bayes) by achieving 100% accuracy. The proposed approach could be a useful tool for nephrologists in detecting CKD.
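The feature-selection step described here — RFE keeping a handful of strong predictors that are then handed to downstream classifiers — can be sketched as below. The feature names follow the abstract, but the patient records are synthetic and the logistic-regression estimator is an assumption made purely to keep the sketch small; the paper's deep network is not reproduced.

```python
# Sketch: RFE selects the 7 strongest predictors, which would then be fed
# to downstream classifiers. Feature names follow the abstract; the
# records are synthetic and the estimator choice is illustrative.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

names = ["hemoglobin", "specific_gravity", "serum_creatinine",
         "rbc_count", "albumin", "packed_cell_volume", "hypertension",
         "age", "blood_pressure", "sugar"]
rng = np.random.default_rng(5)
X = rng.normal(size=(250, len(names)))
y = (X[:, 0] - X[:, 2] + X[:, 6] > 0).astype(int)   # synthetic CKD label

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=7).fit(X, y)
kept = [n for n, keep in zip(names, rfe.support_) if keep]
print("RFE-selected features:", kept)
```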
Recursive Feature Elimination with Cross-Validation with Decision Tree: Feature Selection Method for Machine Learning-Based Intrusion Detection Systems
The frequency of cyber-attacks on Internet of Things (IoT) networks has significantly increased in recent years. Anomaly-based network intrusion detection systems (NIDSs) offer an additional layer of network protection by detecting and reporting the infamous zero-day attacks. However, the efficiency of real-time detection systems relies on several factors, including the number of features used to make a prediction; minimizing their number is crucial, as it implies faster prediction and lower storage requirements. This paper uses recursive feature elimination with cross-validation, with a decision tree model as the estimator (DT-RFECV), to select an optimal subset of 15 of UNSW-NB15's 42 features and evaluates them using several ML classifiers, including tree-based ones such as random forest. The proposed NIDS exhibits an accurate prediction model for network flow, with a binary classification accuracy of 95.30% compared to 95.56% when using the entire feature set. The reported scores are comparable to those attained by state-of-the-art systems despite reducing the number of features used by about 65%.
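The DT-RFECV technique named here maps directly onto scikit-learn's `RFECV` with a `DecisionTreeClassifier` as the estimator: cross-validated scores decide how many features survive, rather than a fixed target count. A minimal sketch on synthetic data (UNSW-NB15 itself is not bundled here):

```python
# Sketch of DT-RFECV: RFE with cross-validation and a decision tree
# estimator picks the feature-subset size automatically. Synthetic data
# replaces UNSW-NB15 for this illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFECV

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 15))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)   # two-feature interaction label

selector = RFECV(
    DecisionTreeClassifier(random_state=0),
    step=1,          # drop one feature per iteration
    cv=5,            # 5-fold cross-validated accuracy chooses the subset size
).fit(X, y)
print("optimal number of features:", selector.n_features_)
```

On the real dataset, `selector.support_` would give the mask of surviving UNSW-NB15 features to pass to the downstream ML classifiers.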
Ensemble Boosting and Bagging Based Machine Learning Models for Groundwater Potential Prediction
Due to the rapidly increasing demand for groundwater, one of the principal freshwater resources, there is an urgent need to develop novel prediction systems that more accurately estimate groundwater potential for informed groundwater resource management. Ensemble machine learning methods are generally reported to produce more accurate results; however, proposing novel ensemble models and running comparative studies of their performance are equally essential to precisely identify the suitable methods. Thus, the current study is designed to provide knowledge on the performance of four ensemble models: the boosted generalized additive model (GamBoost), adaptive boosting classification trees (AdaBoost), bagged classification and regression trees (Bagged CART), and random forest (RF). To build the models, 339 groundwater resource locations and the spatial groundwater-potential conditioning factors were used. Thereafter, the recursive feature elimination (RFE) method was applied to identify the key features. The RFE indicated that the best number of features for groundwater potential modeling was 12 variables out of 15 (with a mean accuracy of about 0.84). The modeling results showed that the bagging models (i.e., RF and Bagged CART) performed better than the boosting models (i.e., AdaBoost and GamBoost). Overall, the RF model outperformed the other models (with accuracy = 0.86, Kappa = 0.67, Precision = 0.85, and Recall = 0.91). Also, the predictive variables topographic position index, valley depth, drainage density, elevation, and distance from stream contributed most to the modeling process. The groundwater potential maps predicted in this study can help water resources managers and policymakers in the fields of watershed and aquifer management to ensure optimal exploitation of this important freshwater resource.
Locating a disinfection facility for hazardous healthcare waste in the COVID-19 era: a novel approach based on Fermatean fuzzy ITARA-MARCOS and random forest recursive feature elimination algorithm
The hazardous healthcare waste (HCW) management system is one of the urban systems most critically affected by the COVID-19 pandemic, owing to the increased waste generation rate in hospitals and medical centers dealing with infected patients as well as the greater hazardousness of waste exposed to the virus. In this regard, the waste network would face severe problems unless hazardous waste is treated at disinfection facilities. For this purpose, this study develops an advanced decision support system based on a multi-stage model that combines the random forest recursive feature elimination (RF-RFE) algorithm, the indifference threshold-based attribute ratio analysis (ITARA) method, and measurement of alternatives and ranking according to compromise solution (MARCOS) into a unified framework under the Fermatean fuzzy environment. In the first stage, the innovative Fermatean fuzzy RF-RFE algorithm extracts core criteria from a finite set of initial criteria. In the second stage, the novel Fermatean fuzzy ITARA determines the semi-objective importance of the core criteria. In the third stage, the new Fermatean fuzzy MARCOS method ranks the alternatives. A real-life case study in Istanbul, Turkey, illustrates the applicability of the introduced methodology. Our empirical findings indicate that “Pendik” is the best among the five candidate locations for siting a new disinfection facility for hazardous HCW in Istanbul. The sensitivity and comparative analyses confirmed that our approach is highly robust and reliable. This approach could be used to tackle other critical multi-dimensional problems related to COVID-19 and to support sustainability and the circular economy.
Approach for Detecting Attacks on IoT Networks Based on Ensemble Feature Selection and Deep Learning Models
The Internet of Things (IoT) has transformed our interaction with technology and introduced new security challenges. The growing number of IoT attacks poses a significant threat to organizations and individuals. This paper proposes an approach for detecting attacks on IoT networks using ensemble feature selection and deep learning models. Ensemble feature selection combines filter techniques such as variance threshold, mutual information, Chi-square, ANOVA, and L1-based methods. By leveraging the strengths of each technique, the ensemble is formed as the union of the selected features. However, this union operation may overlook redundancy and irrelevance, potentially leading to a larger feature set. To address this, a wrapper algorithm, Recursive Feature Elimination (RFE), is applied to refine the feature selection. The impact of the selected feature set on the performance of Deep Learning (DL) models (CNN, RNN, GRU, and LSTM) is evaluated using the IoT-Botnet 2020 dataset, considering detection accuracy, precision, recall, F1-measure, and False Positive Rate (FPR). All DL models achieved high detection accuracy, precision, recall, and F1-measure values, ranging from 97.05% to 97.87%, 96.99% to 97.95%, 99.80% to 99.95%, and 98.45% to 98.87%, respectively.
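The two-stage pipeline described here — a union of filter selections followed by RFE to prune redundancy — can be sketched as below. Only two of the paper's five filters are shown, `LogisticRegression` stands in for the deep models purely to keep the sketch small, and the data is synthetic rather than IoT-Botnet 2020.

```python
# Sketch of the two-stage idea: union of filter-method selections, then a
# wrapper (RFE) refines the possibly redundant union. Two filters and a
# logistic-regression estimator are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import (
    VarianceThreshold, SelectKBest, mutual_info_classif, RFE)
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20))
X[:, 19] = 0.0                                  # a constant (zero-variance) feature
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Filter stage: each technique votes; the ensemble is the union of picks.
var_idx = np.flatnonzero(VarianceThreshold(0.1).fit(X).get_support())
mi_idx = np.flatnonzero(SelectKBest(mutual_info_classif, k=8).fit(X, y).get_support())
union = np.array(sorted(set(var_idx) | set(mi_idx)))

# Wrapper stage: RFE prunes the union down to a compact final subset.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X[:, union], y)
final = union[rfe.support_]
print("final features:", final.tolist())
```

The `final` index set is what would be fed to the CNN/RNN/GRU/LSTM detectors in the paper's setting.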