292 result(s) for "CatBoost algorithm"
Machine Learning Techniques to Predict the Air Quality Using Meteorological Data in Two Urban Areas in Sri Lanka
The effect of poor air quality on human health is a well-known risk. Annual health costs have increased significantly in many countries due to adverse air quality. Therefore, forecasting air quality parameters in highly impacted areas is essential to enhance the quality of life. Though such forecasting is common in many countries, Sri Lanka is far behind the state of the art. The country has increasingly reported adverse air quality levels with ongoing industrialization in urban areas. Therefore, this research study, for the first time, focuses on forecasting the PM10 values for two urbanized areas of Sri Lanka, Battaramulla (an urban area in Colombo) and Kandy. Twelve air quality parameters were used with five models, including extreme gradient boosting (XGBoost), CatBoost, light gradient-boosting machine (LightGBM), long short-term memory (LSTM), and gated recurrent unit (GRU), to forecast the PM10 levels. Several performance indices, including the coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE), mean squared error (MSE), mean absolute relative error (MARE), and the Nash–Sutcliffe efficiency (NSE), were used to test the forecasting models. It was identified that the LightGBM algorithm performed better in forecasting PM10 in Kandy (R2 = 0.99, MSE = 0.02, MAE = 0.002, RMSE = 0.1225, MARE = 1.0, and NSE = 0.99). In contrast, LightGBM achieved a higher performance (R2 = 0.99, MSE = 0.002, MAE = 0.012, RMSE = 1.051, MARE = 0.00, and NSE = 0.99) in forecasting PM10 for the Battaramulla region. As per the results, it can be concluded that forecasting models need to be developed separately for different land areas. Moreover, it was concluded that the PM10 in Kandy and Battaramulla increased slightly with existing seasonal changes.
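The performance indices named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name forecast_metrics is hypothetical, R2 is assumed here to be the squared Pearson correlation, and NSE is assumed to be one minus the ratio of squared error to the variance of the observations. The paper may define these differently.

```python
import math


def forecast_metrics(observed, predicted):
    """Return common forecast-error metrics for two equal-length sequences."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    # MARE is undefined for zero observations; this sketch assumes there are none.
    mare = sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))

    # Assumption: R2 as squared Pearson correlation (needs non-constant inputs).
    r2 = (cov * cov) / (ss_obs * ss_pred)
    # Assumption: NSE as 1 - SSE / sum of squared deviations of the observations.
    nse = 1.0 - sse / ss_obs

    return {"R2": r2, "MSE": mse, "RMSE": math.sqrt(mse),
            "MAE": mae, "MARE": mare, "NSE": nse}
```

By construction RMSE is the square root of MSE, so the two values reported for a model can be cross-checked against each other.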
Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
Background: Early diagnosis of liver metastasis is of great importance for enhancing the survival of colorectal adenocarcinoma (CAD) patients, and the combined use of a single biomarker in a classifier model has shown great improvement in predicting the metastasis of several types of cancers. However, this has rarely been reported for CAD. This study therefore aimed to screen an optimal classifier model of CAD with liver metastasis and explore the metastatic mechanisms of genes when applying this classifier model. Methods: The differentially expressed genes between primary CAD samples and CAD with metastasis samples were screened from the Moffitt Cancer Center (MCC) dataset GSE131418. The classification performances of six selected algorithms, namely, LR, RF, SVM, GBDT, NN, and CatBoost, for classification of CAD with liver metastasis samples were compared using the MCC dataset GSE131418 by measuring their classification test accuracy. In addition, the consortium datasets of GSE131418 and GSE81558 were used as internal and external validation sets to screen the optimal method. Subsequently, functional analyses and a drug‐targeted network construction of the feature genes when applying the optimal method were conducted. Results: The optimal CatBoost model, with the highest accuracy of 99% and an area under the curve of 1, was screened; it consisted of 33 feature genes. A functional analysis showed that the feature genes were closely associated with a “steroid metabolic process” and “lipoprotein particle receptor binding” (e.g., APOB and APOC3). In addition, the feature genes were significantly enriched in the “complement and coagulation cascade” pathways (e.g., FGA, F2, and F9). In a drug‐target interaction network, F2 and F9 were predicted as targets of menadione. Conclusion: The CatBoost model constructed using 33 feature genes showed the optimal classification performance for identifying CAD with liver metastasis. APOB, APOC3, FGA, F2, F9, and NKX2‐3 were potential biomarkers for classification of CAD with liver metastasis. Menadione might be a promising anti‐metastatic drug for CAD cells by acting on F2 and F9.
The use of genetic algorithm and particle swarm optimization on tiered feature selection method in machine learning-based coronary heart disease diagnosis system
Coronary heart disease (CHD) is a leading global cause of death. Early detection is the right step to reduce mortality rates and treatment costs. Early detection can be developed using machine learning by utilizing patient medical record datasets. Unfortunately, such datasets have excessive features, which can reduce machine learning performance. For this reason, it is necessary to reduce the number of redundant features and irrelevant data to improve machine learning performance. Therefore, this research proposes a tiered feature selection model with a genetic algorithm (GA) and particle swarm optimization (PSO) to improve the performance of the diagnosis model. The feature selection model is evaluated using parameters derived from the confusion matrix and using the CatBoost machine learning algorithm. Model testing uses the z-Alizadeh Sani, Cleveland, Statlog, and Hungarian datasets. The best results for this model were obtained on the z-Alizadeh Sani dataset with 6 features selected from 54, giving an accuracy of 99.32%, specificity of 98.57%, sensitivity of 100.00%, area under the curve (AUC) of 99.28%, and F1-score of 99.37%. The proposed feature selection model is able to provide machine learning performance in the very good category. The diagnostic model proposed is of excellent standard.
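The confusion-matrix parameters listed in this abstract follow from four counts. The sketch below is illustrative only: the function name confusion_metrics and the convention of returning 0.0 for an empty denominator are assumptions, not taken from the paper. AUC is left out because it cannot be computed from a single confusion matrix.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive accuracy, sensitivity, specificity, precision and F1 from counts."""

    def ratio(numerator, denominator):
        # Assumption: report 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

For example, confusion_metrics(tp=8, fp=1, tn=9, fn=2) gives a sensitivity of 0.8 and a specificity of 0.9.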
Research on Wind Turbine Fault Detection Based on the Fusion of ASL-CatBoost and TtRSA
The internal structure of wind turbines is intricate and precise, and the challenging working conditions often give rise to various operational faults. This study aims to address the limitations of traditional machine learning algorithms in wind turbine fault detection and the imbalance of positive and negative samples in the fault detection dataset. To achieve the real-time detection of wind turbine group faults and to capture wind turbine fault state information, an enhanced ASL-CatBoost algorithm is proposed. Additionally, a crawling animal search algorithm that incorporates the Tent chaotic mapping and t-distribution mutation strategy is introduced to address the sensitivity of the ASL-CatBoost algorithm to hyperparameters and the difficulty of manual hyperparameter setting. The effectiveness of the proposed hyperparameter optimization strategy, termed the TtRSA algorithm, is demonstrated through a comparison with traditional intelligent optimization algorithms using 11 benchmark test functions. When applied to the hyperparameter optimization of the ASL-CatBoost algorithm, the TtRSA-ASL-CatBoost algorithm exhibits notable enhancements in accuracy, recall, and other performance measures compared with the ASL-CatBoost algorithm and other ensemble learning algorithms. The experimental results affirm that the proposed model improvement strategy effectively enhances the classification recognition rate of wind turbine fault detection.
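Tent chaotic mapping is commonly used to spread the initial candidates of a search algorithm across the search space. The sketch below shows one such initialization under stated assumptions: the skewed tent map with breakpoint a = 0.7, the seed x0 and the function names are all hypothetical, and the abstract does not say which variant TtRSA uses.

```python
def tent_map_sequence(x0, length, a=0.7):
    """Generate `length` values in [0, 1] by iterating a skewed tent map."""
    if not 0.0 < x0 < 1.0 or not 0.0 < a < 1.0:
        raise ValueError("x0 and a must lie strictly between 0 and 1")
    values = []
    x = x0
    for _ in range(length):
        # Rising branch below the breakpoint, falling branch above it.
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        values.append(x)
    return values


def init_population(bounds, pop_size, x0=0.37):
    """Scale chaotic values into per-dimension (low, high) bounds."""
    dim = len(bounds)
    chaos = tent_map_sequence(x0, pop_size * dim)
    population = []
    for i in range(pop_size):
        row = chaos[i * dim:(i + 1) * dim]
        population.append([low + c * (high - low)
                           for c, (low, high) in zip(row, bounds)])
    return population
```

A call such as init_population([(0.01, 0.3), (4, 10)], pop_size=20) would produce 20 candidate pairs. Reading them as a learning rate and a tree depth is only an example of hyperparameter ranges, not the ranges used in the study.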
A novel scheme for employee churn problem using multi-attribute decision making approach and machine learning
Employee churn (ECn) is a crucial problem for any organization that adversely affects its overall revenue and brand image. Many machine learning (ML) based systems have been developed to solve the ECn problem. However, they miss out on some essential issues such as employee categorization, category-wise churn prediction, and retention policy for effectively addressing the ECn problem. By considering all these issues, we propose, in this paper, a multi-attribute decision making (MADM) based scheme coupled with ML algorithms. The proposed scheme is referred to as employee churn prediction and retention (ECPR). We first design an accomplishment-based employee importance model (AEIM) that utilizes a two-stage MADM approach for grouping the employees into various categories. In the first stage, we formulate an improved version of the entropy weight method (IEWM) for assigning relative weights to the employee accomplishments. Then, we utilize the technique for order preference by similarity to ideal solution (TOPSIS) for quantifying the importance of the employees to perform their class-based categorization. The CatBoost algorithm is then applied for predicting class-wise employee churn. Finally, we propose a retention policy based on the prediction results and ranking of the features. The proposed ECPR scheme is tested on a benchmark dataset of the human resource information system (HRIS), and the results are compared with other ML algorithms using various performance metrics. We show that the system using the CatBoost algorithm outperforms other ML algorithms.
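The two-stage MADM idea, weighting criteria by entropy and then ranking alternatives with TOPSIS, can be sketched as follows. This uses the standard entropy weight method, not the improved IEWM the paper formulates (the abstract does not describe it). It also assumes every criterion is a positive "higher is better" score. All function names are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Standard entropy weights for a rows=alternatives, cols=criteria matrix."""
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread == 0.0:
        return [1.0 / n] * n  # all criteria uninformative: equal weights
    return [d / spread for d in divergences]


def topsis_scores(matrix, weights):
    """Closeness of each alternative to the ideal solution (benefit criteria)."""
    n = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)]
                for row in matrix]
    best = [max(row[j] for row in weighted) for j in range(n)]
    worst = [min(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((row[j] - best[j]) ** 2 for j in range(n)))
        d_worst = math.sqrt(sum((row[j] - worst[j]) ** 2 for j in range(n)))
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.0)
    return scores
```

An alternative that is best on every criterion scores 1 and one that is worst on every criterion scores 0, so the scores can be cut into importance classes before a classifier is trained per class.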
Lipoproteins and metabolites in diagnosing and predicting Alzheimer’s disease using machine learning
Background: Alzheimer’s disease (AD) is a chronic neurodegenerative disorder that poses a substantial economic burden. The random forest algorithm is effective in predicting AD; however, the key factors influencing AD onset remain unclear. This study aimed to analyze the key lipoprotein and metabolite factors influencing AD onset using machine-learning methods. It provides new insights for researchers and medical personnel to understand AD and provides a reference for the early diagnosis, treatment, and early prevention of AD. Methods: A total of 603 participants, including controls and patients with AD with complete lipoprotein and metabolite data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database between 2005 and 2016, were enrolled. Random forest, Lasso regression, and CatBoost algorithms were employed to rank and filter 213 lipoprotein and metabolite variables. Variables with consistently high importance rankings from any two methods were incorporated into the models. Finally, the variables selected from the three methods, with the participants’ age, sex, and marital status, were used to construct a random forest predictive model. Results: Fourteen lipoprotein and metabolite variables were screened using the three methods, and 17 variables were included in the AD prediction model based on age, sex, and marital status of the participants. The optimal random forest model was constructed with “mtry” set to 3 and “ntree” set to 300. The model exhibited an accuracy of 71.01%, a sensitivity of 79.59%, a specificity of 65.28%, and an AUC (95% CI) of 0.724 (0.645–0.804). When Mean Decrease Accuracy and Gini were used to rank the proteins, age, phospholipids to total lipids ratio in intermediate-density lipoproteins (IDL_PL_PCT), and creatinine were among the top five variables. Conclusions: Age, IDL_PL_PCT, and creatinine levels play crucial roles in AD onset. Regular monitoring of lipoproteins and their metabolites in older individuals is significant for early AD diagnosis and prevention.
Analyzing the Determinants of U.S. Residential Energy Usage and Spending: A Machine Learning Approach
This study explores the factors that impact residential energy usage and spending in the United States. Using data from the 2020 Residential Energy Consumption Survey (RECS), we investigate the significance of different energy consumption determinants at various analysis levels. Our analysis covers residential energy usage, electricity, natural gas, propane, and fuel oil consumption. We also examine energy usage for space heating, cooling, and water heating. To leverage the extensive RECS data, which includes over 180 variables, we utilized machine learning (ML) techniques for feature selection and determined their Shapley contribution for different target outcomes. Our results indicate that the CatBoost algorithm outperforms other ML techniques on the 2020 Residential Energy Consumption Survey sample. Our findings demonstrate that it is not appropriate to aggregate consumption and expenditure, as each level has distinct important features. JEL Classification: D12, Q41, R21
Building gender-specific sexually transmitted infection risk prediction models using CatBoost algorithm and NHANES data
Background and aims: Sexually transmitted infections (STIs) are a significant global public health challenge due to their high incidence rate and potential for severe consequences when early intervention is neglected. Research shows an upward trend in absolute cases and DALY numbers of STIs, with syphilis, chlamydia, trichomoniasis, and genital herpes exhibiting an increasing trend in age-standardized rate (ASR) from 2010 to 2019. Machine learning (ML) presents significant advantages in disease prediction, with several studies exploring its potential for STI prediction. The objective of this study is to build males-based and females-based STI risk prediction models based on the CatBoost algorithm using data from the National Health and Nutrition Examination Survey (NHANES) for training and validation, with sub-group analysis performed on each STI. The female sub-group also includes human papilloma virus (HPV) infection. Methods: The study utilized data from the NHANES program to build males-based and females-based STI risk prediction models using the CatBoost algorithm. Data were collected from 12,053 participants aged 18 to 59 years old, with general demographic characteristics and sexual behavior questionnaire responses included as features. The Adaptive Synthetic Sampling Approach (ADASYN) algorithm was used to address data imbalance, and 15 machine learning algorithms were evaluated before ultimately selecting the CatBoost algorithm. The SHAP method was employed to enhance interpretability by identifying feature importance in the model’s STI risk prediction. Results: The CatBoost classifier achieved AUC values of 0.9995, 0.9948, 0.9923, 0.9996, and 0.9769 for predicting chlamydia, genital herpes, genital warts, gonorrhea, and overall STI infections among males. The CatBoost classifier achieved AUC values of 0.9971, 0.972, 0.9765, 1, 0.9485, and 0.8819 for predicting chlamydia, genital herpes, genital warts, gonorrhea, HPV, and overall STI infections among females. Having sex with a new partner/year, times having sex without a condom/year, and the number of female vaginal sex partners/lifetime were identified as the top three significant predictors for the overall risk of male STIs. Similarly, ever having anal sex with a man, age, and the number of male vaginal sex partners/lifetime were identified as the top three significant predictors for the overall risk of female STIs. Conclusions: This study demonstrated the effectiveness of the CatBoost classifier in predicting STI risks among both male and female populations. The SHAP algorithm revealed key predictors for each infection, highlighting consistent demographic characteristics and sexual behaviors across different STIs. These insights can guide targeted prevention strategies and interventions to alleviate the impact of STIs on public health.
Interactive 3D Vase Design Based on Gradient Boosting Decision Trees
Traditionally, ceramic design began with sketches on rough paper and later evolved into using CAD software for more complex designs and simulations. With technological advancements, optimization algorithms have gradually been introduced into ceramic design to enhance design efficiency and creative diversity. The use of Interactive Genetic Algorithms (IGAs) for ceramic design is a new approach, but an IGA requires a significant amount of user evaluation, which can result in user fatigue. To overcome this problem, this paper introduces the LightGBM algorithm and the CatBoost algorithm to improve the IGA because they have excellent predictive capabilities that can assist users in evaluations. The algorithms are also applied to a vase design platform for validation. First, bicubic Bézier surfaces are used for modeling, and the genetic encoding of the vase is designed with appropriate evolutionary operators selected. Second, user data from the online platform are collected to train and optimize the LightGBM and CatBoost algorithms. Finally, LightGBM and CatBoost are combined with an IGA and applied to the vase design platform to verify their effectiveness. Compared with traditional IGAs, KD trees, Random Forest, and XGBoost, the IGAs improved with LightGBM and CatBoost perform better overall, requiring fewer evaluations and less time. Their R2 values are higher than those of the other proxy models, reaching 0.816 and 0.839, respectively. The improved method proposed in this paper can effectively alleviate user fatigue and enhance the user experience in product design participation.
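A bicubic Bézier surface is defined by a 4 x 4 grid of control points blended with cubic Bernstein polynomials. The sketch below evaluates one point on such a patch. The function names and the nested-list layout of the control grid are assumptions for illustration, not the platform's actual data model.

```python
def bernstein3(t):
    """The four cubic Bernstein basis values at parameter t in [0, 1]."""
    s = 1.0 - t
    return [s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t]


def bezier_patch_point(control, u, v):
    """Evaluate a bicubic Bezier patch.

    `control` is a 4x4 nested list of (x, y, z) control points;
    u and v are surface parameters in [0, 1].
    """
    bu, bv = bernstein3(u), bernstein3(v)
    point = [0.0, 0.0, 0.0]
    for i in range(4):
        for j in range(4):
            weight = bu[i] * bv[j]
            for k in range(3):
                point[k] += weight * control[i][j][k]
    return tuple(point)
```

The patch interpolates its four corner control points, so bezier_patch_point(control, 0.0, 0.0) returns control[0][0]. In a genetic encoding, the control-point coordinates would be the natural genes to mutate and recombine, although the abstract does not state the encoding actually used.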
Segmentation and classification of white blood cancer cells from bone marrow microscopic images using duplet-convolutional neural network design
Cancer is a disease linked to the uncontrolled and rapid division of cells in the body. Cancer detection through conventional methods like complete blood count is a tedious and time-consuming task prone to human errors. The introduction of image processing techniques and computer-aided diagnostics is beneficial to this field as the results obtained by utilizing these methods are quick and accurate. The method proposed in this paper uses a design, Convolutional Leaky RELU with CatBoost and XGBoost (CLR-CXG), to segment the images and extract the important features that help in classification. The binary classification algorithm and the gradient boosting algorithms CatBoost (Categorical Boost) and XGBoost (Extreme Gradient Boost) are implemented individually. Moreover, Convolutional Leaky RELU with CatBoost (CLRC) is designed to decrease bias and provide high accuracy, while Convolutional Leaky RELU with XGBoost (CLRXG) is designed for classification or regression prediction problems, which increases the execution speed of the algorithm and improves its performance. Thus the CLR-CXG classifies the test images into Acute Lymphoblastic Leukemia (ALL) or Multiple Myeloma (MM). Finally, the CLRC algorithm achieved 100% accuracy in classifying cancer cells with a recorded run time of 10 s. The CLRXG algorithm achieved an accuracy of 97.12% in classifying cancer cells with a run time of 12 s.