91 results for "Cost-sensitive method"
New Strategies for Intelligent Computing in Improving the Accuracy of Engineering Costs
Accurate construction cost calculation is crucial for assessing project viability and selecting design schemes. This paper improves calculation accuracy by first employing the Boruta algorithm to identify the vital cost-influencing factors, which serve as the basis for an improved construction cost model. We introduce an enhanced Artificial Neural Network (ANN) model that integrates the AdaBoost algorithm and cost-sensitive methods to refine construction cost estimations. The efficacy of this model is demonstrated by an overall engineering cost error rate of 3.92%, with specific errors in single-side cost, labor, materials, and machinery usage of 3.51%, 7.09%, 3.36%, and 7.93%, respectively. These results meet established accuracy standards, showcasing the model's potential to significantly improve construction cost management and control.
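For a concrete sense of the Boruta screening step described above, here is a minimal sketch using the third-party BorutaPy package with a random forest; the factor matrix and cost target are synthetic stand-ins, not the paper's data.

```python
# A minimal sketch of Boruta-style screening of cost-influencing factors,
# assuming the third-party boruta package; data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from boruta import BorutaPy

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # 10 hypothetical candidate cost factors
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=200)  # cost target

forest = RandomForestRegressor(n_estimators=100, random_state=0)
selector = BorutaPy(forest, n_estimators="auto", random_state=0)
selector.fit(X, y)               # BorutaPy expects numpy arrays

print("selected factor indices:", np.flatnonzero(selector.support_))
```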
Breast cancer recurrence prediction with ensemble methods and cost-sensitive learning
Breast cancer is one of the most common cancers in women worldwide. Thanks to improved medical treatments, most breast cancer patients achieve remission. However, patients then face the next challenge: recurrence of breast cancer, which may cause more severe effects and even death. Predicting breast cancer recurrence is therefore crucial for reducing mortality. This paper proposes a prediction model for breast cancer recurrence based on nominal and numeric clinical features. Our data consist of 1,061 patients from the Breast Cancer Registry of Shin Kong Wu Ho-Su Memorial Hospital between 2011 and 2016, of which 37 records are labeled as breast cancer recurrence. Each record has 85 features. Our approach consists of three stages. First, we apply data preprocessing and feature selection techniques to consolidate the dataset; among all features, six are identified for further processing in the following stages. Next, we apply resampling techniques to resolve the class imbalance. Finally, we construct two classifiers, AdaBoost and cost-sensitive learning, to predict the risk of recurrence, and evaluate performance with three-fold cross-validation. Using the AdaBoost method alone, we achieve an accuracy of 0.973 and a sensitivity of 0.675. Combining AdaBoost with the cost-sensitive method, our model achieves a reasonable accuracy of 0.468 and a substantially higher sensitivity of 0.947, which yields almost no false dismissals. Our model can be used as a supporting tool when setting and evaluating follow-up visits, enabling early intervention and more advanced treatments to lower cancer mortality.
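The AdaBoost-plus-cost-sensitive combination can be approximated in scikit-learn by boosting a class-weighted decision stump. This is a minimal sketch, not the authors' exact model: the synthetic data, the 25:1 cost ratio, and the use of recent scikit-learn (>= 1.2, for the `estimator` parameter) are all assumptions.

```python
# A minimal sketch, not the authors' model: AdaBoost over a class-weighted
# decision stump, with synthetic data mimicking the 1,061-record imbalance.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1061, weights=[0.965], random_state=0)

# Hypothetical 25:1 cost ratio penalizing missed recurrences.
stump = DecisionTreeClassifier(max_depth=1, class_weight={0: 1, 1: 25})
model = AdaBoostClassifier(estimator=stump, n_estimators=200, random_state=0)

# Three-fold cross-validated sensitivity (minority-class recall).
print(cross_val_score(model, X, y, cv=3, scoring="recall").mean())
```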
An effective two-stage training scheme for boundary decision of imbalanced samples
Categorizing imbalanced data is an active research direction in data mining and machine learning. To dynamically reduce the negative influence of imbalanced samples on the training loss, existing cost-sensitive re-weighting methods assign different weights to imbalanced samples. However, deep neural networks (DNNs) tend to overfit hard samples, because existing cost-sensitive re-weighting methods cannot effectively guide DNNs to partition the decision boundaries of the samples reasonably. In this study, we propose a new self-balanced loss function, called SBLoss, which adaptively assigns different weights to samples according to their influence on the decision boundary, in order to reduce the overfitting caused by hard samples. Extensive experiments on multiple real imbalanced datasets show that the imbalanced-data classification method based on a two-stage training scheme has high accuracy and robustness, outperforming state-of-the-art methods.
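As a point of reference for the re-weighting baseline the abstract contrasts with (not the proposed SBLoss, whose definition is not given here), a minimal PyTorch sketch of inverse-frequency cost-sensitive cross-entropy:

```python
# A minimal PyTorch sketch of inverse-frequency re-weighting (the baseline,
# not SBLoss); class counts and the toy batch are hypothetical.
import torch
import torch.nn.functional as F

def reweighted_ce(logits, targets, class_counts):
    # Weight each class inversely to its frequency, normalized to sum to C.
    counts = torch.as_tensor(class_counts, dtype=torch.float32)
    weights = counts.sum() / (len(counts) * counts)
    return F.cross_entropy(logits, targets, weight=weights)

logits = torch.randn(8, 2)                       # toy batch, 2 classes
targets = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])
print(reweighted_ce(logits, targets, class_counts=[900, 100]))
```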
Quality of Data
This chapter discusses the important and commonly encountered problems of incomplete and imbalanced data and presents ways of alleviating them through mechanisms of data imputation and data balancing. Along with an elaboration on different imputation algorithms, particular attention is paid to the role of information granularity in quantifying the quality of data, both when coping with incompleteness and when coping with a lack of balance. There are numerous ways of realizing data imputation, and the quality of the results depends on the amount of missing data and the reason the data are missing. Sometimes random imputation methods are sought: one predicts the conditional probability distribution of the missing values given the non-missing ones, and then draws random values accordingly. As data balancing techniques exhibit a great deal of diversity, it is convenient to organize them into several main categories, in particular sampling methods, algorithmic methods, and cost-sensitive methods.
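The random imputation idea described above, predicting the conditional distribution of missing values and drawing from it, is available off the shelf; here is a minimal sketch assuming scikit-learn's IterativeImputer with sample_posterior=True, on a tiny hypothetical matrix.

```python
# A minimal sketch of random imputation via draws from an estimated
# conditional distribution, assuming scikit-learn's IterativeImputer.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])

imputer = IterativeImputer(sample_posterior=True, random_state=0)
print(imputer.fit_transform(X))   # missing entries filled with random draws
```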
A survey on addressing high-class imbalance in big data
In a majority-minority classification problem, class imbalance in the dataset(s) can dramatically skew the performance of classifiers, introducing a prediction bias toward the majority class. Assuming the positive (minority) class is the group of interest and the application domain dictates that a false negative is much costlier than a false positive, a negative (majority) class prediction bias can have adverse consequences. With big data, mitigating class imbalance poses an even greater challenge because of the varied and complex structure of the relatively much larger datasets. This paper surveys studies published within the last 8 years that focus on high class imbalance (i.e., a majority-to-minority class ratio between 100:1 and 10,000:1) in big data, in order to assess the state of the art in addressing the adverse effects of class imbalance. Two families of techniques are covered: data-level methods (e.g., data sampling) and algorithm-level methods (e.g., cost-sensitive and hybrid/ensemble approaches). Data sampling methods are popular for addressing class imbalance, with random over-sampling methods generally showing better overall results. At the algorithm level, there are some outstanding performers. Yet the published studies contain inconsistent and conflicting results, coupled with a limited scope of evaluated techniques, indicating the need for more comprehensive, comparative studies.
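As an illustration of the data-level approach the survey highlights, here is a minimal random over-sampling sketch assuming the imbalanced-learn package; the roughly 100:1 imbalance is a hypothetical example.

```python
# A minimal sketch of random over-sampling, assuming the imbalanced-learn
# package; the ~100:1 imbalance here is a hypothetical example.
from collections import Counter
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1010, weights=[0.99], random_state=0)
print("before:", Counter(y))

X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))  # minority duplicated to match majority
```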
Imbalanced fault diagnosis of rotating machinery via multi-domain feature extraction and cost-sensitive learning
Fault diagnosis plays an essential role in rotating machinery manufacturing systems by reducing maintenance costs, yet how to improve diagnosis accuracy remains an open issue. To this end, we develop a novel framework that combines multi-domain vibration feature extraction, feature selection, and a cost-sensitive learning method. First, we extract time-domain, frequency-domain, and time-frequency-domain features to make full use of the vibration signals. Second, a feature selection technique is employed to obtain a feature subset with good generalization properties by simultaneously measuring the relevance and redundancy of features. Third, a cost-sensitive learning method is designed so that a classifier can effectively learn the discriminating boundaries under an extremely imbalanced distribution of fault instances. For illustration, a real-world rotating machinery dataset collected from an oil refinery in China is utilized. Extensive experiments demonstrate that the multi-domain feature extraction and feature selection significantly improve diagnosis accuracy. Meanwhile, the cost-sensitive learning method consistently outperforms traditional classifiers such as the support vector machine (SVM) and gradient boosting decision tree (GBDT), and even outperforms classifiers calibrated with six popular imbalanced-data resampling algorithms, such as the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic sampling (ADASYN), in terms of decreasing missed alarms and reducing the average cost. Owing to its high evaluation scores and low average misclassification cost, cost-sensitive GBDT (CS-GBDT) is preferred for imbalanced fault diagnosis in practice.
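A cost-sensitive GBDT in the spirit of CS-GBDT (not the authors' exact design) can be sketched with scikit-learn by routing misclassification costs through per-sample weights; the synthetic data and the 20:1 missed-alarm cost ratio are assumptions.

```python
# A minimal sketch in the spirit of CS-GBDT, assuming scikit-learn; the
# data and the 20:1 missed-alarm cost ratio are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=2000, weights=[0.97], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Misclassification costs enter as per-sample weights on the rare faults.
gbdt = GradientBoostingClassifier(random_state=0)
gbdt.fit(X_tr, y_tr, sample_weight=np.where(y_tr == 1, 20.0, 1.0))

print("fault recall:", recall_score(y_te, gbdt.predict(X_te)))
```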
A Machine Learning Method with Filter-Based Feature Selection for Improved Prediction of Chronic Kidney Disease
The high prevalence of chronic kidney disease (CKD) is a significant public health concern globally. The condition has a high mortality rate, especially in developing countries. CKD often goes undetected since there are no obvious early-stage symptoms, yet early detection and timely clinical intervention are necessary to slow the disease's progression. Machine learning (ML) models can provide an efficient and cost-effective computer-aided diagnosis to assist clinicians in achieving early CKD detection. This research proposes an approach to effectively detect CKD by combining an information-gain-based feature selection technique with a cost-sensitive adaptive boosting (AdaBoost) classifier. Such an approach could save CKD screening time and cost, since only a few clinical test attributes would be needed for diagnosis. The proposed approach was benchmarked against recently proposed CKD prediction methods and well-known classifiers. Among these classifiers, the proposed cost-sensitive AdaBoost trained on the reduced feature set achieved the best classification performance, with an accuracy, sensitivity, and specificity of 99.8%, 100%, and 99.8%, respectively. Additionally, the experimental results show that feature selection positively impacted the performance of the various classifiers. The proposed approach produces an effective predictive model for CKD diagnosis and could be applied to other imbalanced medical datasets for effective disease detection.
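The shape of this pipeline, an information-gain-style filter feeding a cost-sensitive AdaBoost, can be sketched with scikit-learn (>= 1.2), using mutual information as the filter criterion; the synthetic data, the k=6 choice, and the 4:1 class weight are illustrative assumptions.

```python
# A minimal sketch of the pipeline's shape, assuming scikit-learn >= 1.2;
# the synthetic data, k=6, and the 4:1 class weight are hypothetical.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=24, n_informative=6,
                           weights=[0.8], random_state=0)

pipeline = make_pipeline(
    SelectKBest(mutual_info_classif, k=6),    # information-gain-style filter
    AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1,
                                         class_weight={0: 1, 1: 4}),
        random_state=0),
)
print(cross_val_score(pipeline, X, y, cv=5).mean())
```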
Diabetes mellitus risk prediction in the presence of class imbalance using flexible machine learning methods
Background: Early detection and prediction of type 2 diabetes mellitus incidence from baseline measurements could reduce associated complications in the future. The low incidence rate of diabetes compared with non-diabetes makes accurate prediction of the minority diabetes class more challenging. Methods: The performance of a deep neural network (DNN), extreme gradient boosting (XGBoost), and random forest (RF) is compared in predicting the minority diabetes class in Tehran Lipid and Glucose Study (TLGS) cohort data. Threshold moving, cost-sensitive learning, and over- and under-sampling strategies are compared as solutions to class imbalance. Results: The DNN, with the highest accuracy in predicting diabetes (54.8%), outperformed XGBoost and RF in terms of AUROC, g-mean, and f1-measure on the original imbalanced data. Changing the threshold to maximize the f1-measure improved g-mean and f1-measure for all three algorithms. Repeated edited nearest neighbors (RENN) under-sampling for the DNN and cost-sensitive learning for the tree-based algorithms were the best solutions to the imbalance issue. RENN increased the ROC and Precision-Recall AUCs, g-mean, and f1-measure from 0.857, 0.603, 0.713, and 0.575 to 0.862, 0.608, 0.773, and 0.583, respectively, for the DNN. Weighting improved g-mean and f1-measure from 0.667 and 0.554 to 0.776 and 0.588 in XGBoost, and from 0.659 and 0.543 to 0.775 and 0.566 in RF, respectively. ROC and Precision-Recall AUCs in RF also increased from 0.840 and 0.578 to 0.846 and 0.591, respectively. Conclusion: G-mean benefited most from all imbalance solutions. Weighting and threshold moving are efficient strategies and, compared with resampling methods, faster solutions for handling class imbalance. Among sampling strategies, under-sampling methods performed better than others.
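Threshold moving at the maximum f1-measure, one of the strategies compared above, can be sketched as follows, assuming scikit-learn; logistic regression and the synthetic data stand in for the paper's models and cohort.

```python
# A minimal sketch of threshold moving at the maximum f1-measure, assuming
# scikit-learn; logistic regression stands in for the paper's models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, weights=[0.93], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_va)[:, 1]
prec, rec, thr = precision_recall_curve(y_va, proba)
f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
print("best threshold:", thr[f1[:-1].argmax()])   # thr is one shorter
```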
Intelligent Spectrum Sensing for NOMA Systems: A Cost-Sensitive LightGBM Approach with Objective-Driven Learning
In non-orthogonal multiple access (NOMA)-enabled cognitive radio (CR) systems, superposed primary user (PU) signals with unequal power levels and independent activity significantly complicate spectrum sensing and channel state discrimination. To address this issue, machine learning (ML)-based sensing exploits spectrum-domain features to perform channel state classification. However, ML-based methods remain limited under independent PU activity and suffer from a performance tradeoff because the spectrum sensing constraints are not explicitly incorporated into the learning process. In this paper, we propose an OCL method that aligns LightGBM multiclass training with spectrum sensing objectives and leverages eigenvalue-based features to capture discriminative signal patterns under dynamic NOMA transmission. A cost-sensitive learning strategy guides the classifier, while objective-driven tuning optimizes hyperparameters toward the spectrum sensing objectives. To evaluate overall performance with respect to the detection probability (Pd) and false alarm probability (Pfa), we propose an overall sensing ability score based on the SPOTIS method. The proposed OCL method achieves the highest overall sensing ability scores, with an average score of 0.638, outperforming EBSS-RF at 0.610 and FBSS-LR at 0.221. Under challenging signal pattern discrimination conditions, the OCL method improves the overall sensing ability score by 6.26% and 0.9 under different power coefficients compared to EBSS-RF, highlighting its effectiveness in addressing the performance tradeoff issue.
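Cost-sensitive multiclass training of the kind described is directly expressible with LightGBM's scikit-learn interface; in this minimal sketch, the three classes, the features, and the cost weights are hypothetical stand-ins for channel states and eigenvalue-based features, not the paper's OCL setup.

```python
# A minimal sketch of cost-sensitive multiclass LightGBM, assuming the
# lightgbm package; classes, features, and weights are hypothetical
# stand-ins for channel states and eigenvalue-based features.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Class weights encode asymmetric misclassification costs.
clf = lgb.LGBMClassifier(objective="multiclass",
                         class_weight={0: 1.0, 1: 3.0, 2: 6.0},
                         random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", (clf.predict(X_te) == y_te).mean())
```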
Learning misclassification costs for imbalanced classification on gene expression data
Background: Cost-sensitive algorithms are an effective strategy for solving imbalanced classification problems. However, misclassification costs are usually determined empirically based on user expertise, which leads to unstable performance of cost-sensitive classification. An efficient and accurate method is therefore needed to calculate the optimal cost weights. Results: In this paper, two approaches are proposed to search for the optimal cost weights, targeting the highest weighted classification accuracy (WCA): grid searching over the cost weights and function fitting. Comparisons are made between the two approaches. In experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches. Conclusions: Comprehensive experimental results show that the function fitting method is generally more efficient and finds the optimal cost weights with acceptable WCA.
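The grid-search side of this idea can be sketched generically; this example assumes scikit-learn, substitutes an SVM for the paper's extreme learning machine, and uses balanced accuracy as a simple stand-in for WCA. The data and the weight grid are also assumptions.

```python
# A minimal sketch of grid-searching the minority-class cost weight; an
# SVM replaces the paper's extreme learning machine, and balanced accuracy
# stands in for WCA. Data and the weight grid are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, weights=[0.9], random_state=0)

best = max(
    (cross_val_score(SVC(class_weight={0: 1, 1: w}), X, y, cv=5,
                     scoring="balanced_accuracy").mean(), w)
    for w in [1, 2, 5, 10, 20, 50]
)
print("best score %.3f at cost weight %d" % best)
```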