Catalogue Search | MBRL

Using Explainable Machine Learning to Explore the Impact of Synoptic Reporting on Prostate Cancer

by Aben, Katja K. H.; Janssen, Femke M.; Heesterman, Berdine L. in Cox Proportional Hazards (CPH), Datasets, explainable AI

2022

Machine learning (ML) models have proven to be an attractive alternative to traditional statistical methods in oncology. However, they are often regarded as black boxes, hindering their adoption for answering real-life clinical questions. In this paper, we show a practical application of explainable machine learning (XML). Specifically, we explored the effect that synoptic reporting (SR; i.e., reports where data elements are presented as discrete data items) in Pathology has on the survival of a population of 14,878 Dutch prostate cancer patients. We compared the performance of a Cox Proportional Hazards model (CPH) against that of an eXtreme Gradient Boosting model (XGB) in predicting patient ranked survival. We found that the XGB model (c-index = 0.67) performed significantly better than the CPH (c-index = 0.58). Moreover, we used Shapley Additive Explanations (SHAP) values to generate a quantitative mathematical representation of how features—including usage of SR—contributed to the models’ output. The XGB model in combination with SHAP visualizations revealed interesting interaction effects between SR and the rest of the most important features. These results hint that SR has a moderate positive impact on predicted patient survival. Moreover, adding an explainability layer to predictive ML models can open their black box, making them more accessible and easier to understand by the user. This can make XML-based techniques appealing alternatives to the classical methods used in oncological research and in health care in general.
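
The c-index values quoted above measure how well a model ranks patients by survival. As a rough illustration only, the sketch below computes Harrell's concordance index in plain Python for right-censored data. The function name, the argument names, and the simplified handling of tied times (tied pairs are skipped) are assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the record is censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; a pair is only comparable
        # when the shorter time ends in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.67 (XGB) and 0.58 (CPH) should be read.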

Journal Article

Do Large Datasets or Hybrid Integrated Models Outperform Simple Ones in Predicting Commodity Prices and Foreign Exchange Rates?

by Shang, Jin; Hamori, Shigeyuki in Accuracy, Commodities, Commodity futures

2023

With the continuous advancement of machine learning and the increasing availability of internet-based information, there is a belief that these approaches and datasets enhance the accuracy of price prediction. However, this study aims to investigate the validity of this claim. The study examines the effectiveness of a large dataset and sophisticated methodologies in forecasting foreign exchange rates (FX) and commodity prices. Specifically, we employ sentiment analysis to construct a robust sentiment index and explore whether combining sentiment analysis with machine learning surpasses the performance of a large dataset when predicting FX and commodity prices. Additionally, we apply machine learning methodologies such as random forest (RF), eXtreme gradient boosting (XGB), and long short-term memory (LSTM), alongside the classical statistical model autoregressive integrated moving average (ARIMA), to forecast these prices and compare the models’ performance. Based on the results, we propose novel methodologies that integrate wavelet transformation with classical ARIMA and machine learning techniques (seasonal-decomposition-ARIMA-LSTM, wavelet-ARIMA-LSTM, wavelet-ARIMA-RF, wavelet-ARIMA-XGB). We apply this analysis procedure to the commodity gold futures prices and the euro foreign exchange rates against the US dollar.

Journal Article

Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms

by Zhao, Guoyan; Wu, Hao; Liang, Weizhang in Algorithms, Decision trees, Discrete element method

2020

Predicting pillar stability is a vital task in hard rock mines, as pillar instability can cause large-scale collapse hazards. However, it is challenging because pillar stability is affected by many factors. With the accumulation of pillar stability cases, machine learning (ML) has shown great potential to predict pillar stability. This study aims to predict hard rock pillar stability using gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms. First, 236 cases with five indicators were collected from seven hard rock mines. Afterwards, the hyperparameters of each model were tuned using a five-fold cross validation (CV) approach. Based on the optimal hyperparameter configuration, prediction models were constructed using the training set (70% of the data). Finally, the test set (30% of the data) was adopted to evaluate the performance of each model. The precision, recall, and F1 indexes were utilized to analyze the prediction results of each level, and the accuracy and their macro average values were used to assess the overall prediction performance. Based on the sensitivity analysis of indicators, the relative importance of each indicator was obtained. In addition, the safety factor approach and other ML algorithms were adopted as comparisons. The results showed that the GBDT, XGBoost, and LightGBM algorithms achieved a better comprehensive performance, and their prediction accuracies were 0.8310, 0.8310, and 0.8169, respectively. The average pillar stress and the ratio of pillar width to pillar height had the most important influences on prediction results. The proposed methodology can provide a reliable reference for pillar design and stability risk management.
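
The evaluation described above (per-level precision, recall, and F1, plus accuracy and macro averages) can be sketched in plain Python as follows. This is a minimal illustration; the function names and the example labels are hypothetical and are not taken from the study's code or data.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 across labels."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))


def accuracy(y_true, y_pred):
    """Share of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical stability labels, for illustration only.
y_true = ["stable", "stable", "unstable", "failed", "failed", "unstable"]
y_pred = ["stable", "unstable", "unstable", "failed", "stable", "unstable"]
scores = per_class_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
```

Macro averaging weights every stability level equally, so a rare level counts as much as a common one when the overall figure is reported.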

Journal Article

Comparative Analysis of Classifiers for Classification of Emergency Braking of Road Motor Vehicles

by Ivan Tanev, Vsevolod Nikulin, Albert Podusenko in Braking, Classifiers, driver-assisting agent

2017

We investigate the feasibility of classifying (inferring) the emergency braking situations in road vehicles from the motion pattern of the accelerator pedal. We trained and compared several classifiers and employed genetic algorithms to tune their associated hyperparameters. Using offline time series data of the dynamics of the accelerator pedal as the test set, the experimental results suggest that the evolved classifiers detect the emergency braking situation with at least 93% accuracy. The best performing classifier could be integrated into the agent that perceives the dynamics of the accelerator pedal in real time and—if emergency braking is detected—acts by applying full brakes well before the driver would have been able to apply them.

Journal Article

Short-Term Load Forecasting Using EMD-LSTM Neural Networks with a Xgboost Algorithm for Feature Importance Evaluation

by Yuan, Jiabin; Chen, Long; Zheng, Huiting in Algorithms, empirical mode decomposition, extreme gradient boosting

2017

Accurate load forecasting is an important issue for the reliable and efficient operation of a power system. This study presents a hybrid algorithm that combines similar days (SD) selection, empirical mode decomposition (EMD), and long short-term memory (LSTM) neural networks to construct a prediction model (i.e., SD-EMD-LSTM) for short-term load forecasting. The extreme gradient boosting-based weighted k-means algorithm is used to evaluate the similarity between the forecasting and historical days. The EMD method is employed to decompose the SD load into several intrinsic mode functions (IMFs) and a residual. Separate LSTM neural networks are then employed to forecast each IMF and the residual. Lastly, the forecasting values from each LSTM model are reconstructed. Numerical testing demonstrates that the SD-EMD-LSTM method can accurately forecast the electric load.

Journal Article

Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration

by Zhou, Jian; Li, Chuanqi; Yang, Peixi in Algorithms, Blasting (explosive), Controllability

2022

Accurate prediction of ground vibration caused by blasting has always been a significant issue in the mining industry. Ground vibration caused by blasting is harmful to nearby buildings and should be prevented. In this regard, a new intelligent method for predicting peak particle velocity (PPV) induced by blasting has been developed. Accordingly, 150 sets of data composed of thirteen uncontrollable and controllable indicators are selected as input variables, and the measured PPV is used as the output target for characterizing blast-induced ground vibration. Also, in order to enhance its predictive accuracy, the gray wolf optimization (GWO), whale optimization algorithm (WOA) and Bayesian optimization algorithm (BO) are applied to fine-tune the hyper-parameters of the extreme gradient boosting (XGBoost) model. According to the root mean squared error (RMSE), determination coefficient (R²), the variance accounted for (VAF), and mean absolute error (MAE), the hybrid models GWO-XGBoost, WOA-XGBoost, and BO-XGBoost were verified. Additionally, XGBoost, CatBoost (CatB), Random Forest, and gradient boosting regression (GBR) were also considered and used to compare the multiple hybrid-XGBoost models that have been developed. The values of RMSE, R², VAF, and MAE obtained from WOA-XGBoost, GWO-XGBoost, and BO-XGBoost models were equal to (3.0538, 0.9757, 97.68, 2.5032), (3.0954, 0.9751, 97.62, 2.5189), and (3.2409, 0.9727, 97.65, 2.5867), respectively. Findings reveal that compared with other machine learning models, the proposed WOA-XGBoost was the most reliable model. These three optimized hybrid models are superior to the GBR model, CatB model, Random Forest model, and the XGBoost model, confirming the ability of the meta-heuristic algorithm to enhance the performance of the PPV model, which can be helpful for mine planners and engineers using advanced supervised machine learning with metaheuristic algorithms for predicting ground vibration caused by explosions.
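
The four evaluation measures named above can be sketched in plain Python as follows. The VAF formula used here, 100 × (1 − var(y − ŷ) / var(y)), is the commonly used definition and is an assumption, since the abstract does not state it; the function name and the example values are hypothetical.

```python
import math
from statistics import mean, pvariance


def regression_metrics(y_true, y_pred):
    """Return RMSE, R^2, VAF (in percent), and MAE for paired values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(mean([r * r for r in residuals]))
    mae = mean([abs(r) for r in residuals])
    y_bar = mean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    # Assumed definition: share of target variance not left in the residuals.
    vaf = (1 - pvariance(residuals) / pvariance(y_true)) * 100
    return {"rmse": rmse, "r2": r2, "vaf": vaf, "mae": mae}


# Hypothetical values: predictions offset from the target by a constant 1.0.
biased = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

With a constant offset the residuals have zero variance, so VAF stays at 100 while RMSE and MAE are 1.0 and R² drops to 0.2. This is one reason the measures are reported together rather than singly.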

Journal Article

Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

by Wu, Zhenxing; Wang, Zhe; Hsieh, Chang-Yu in ADME/T prediction, Algorithms, Chemistry

2021

Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.

Journal Article

Machine Learning-Based Radiomics Signatures for EGFR and KRAS Mutations Prediction in Non-Small-Cell Lung Cancer

by Chen, Yung-Chieh; Nguyen, Van Hiep; Kha, Quang Hien in Accuracy, Algorithms, Artificial intelligence

2021

Early identification of epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations is crucial for selecting a therapeutic strategy for patients with non-small-cell lung cancer (NSCLC). We proposed a machine learning-based model for feature selection and prediction of EGFR and KRAS mutations in patients with NSCLC by including the least number of the most semantic radiomics features. We included a cohort of 161 patients from 211 patients with NSCLC from The Cancer Imaging Archive (TCIA) and analyzed 161 low-dose computed tomography (LDCT) images for detecting EGFR and KRAS mutations. A total of 851 radiomics features, which were classified into 9 categories, were obtained through manual segmentation and radiomics feature extraction from LDCT. We evaluated our models using a validation set consisting of 18 patients derived from the same TCIA dataset. The results showed that the genetic algorithm plus XGBoost classifier exhibited the most favorable performance, with an accuracy of 0.836 and 0.86 for detecting EGFR and KRAS mutations, respectively. We demonstrated that a noninvasive machine learning-based model including the least number of the most semantic radiomics signatures could robustly predict EGFR and KRAS mutations in patients with NSCLC.

Journal Article

Effective Intrusion Detection System Using XGBoost

by Dhaliwal, Sukhpreet Singh; Nahid, Abdullah-Al; Abbas, Robert in classifiers, eXtreme Gradient Boosting (XGBoost), intrusion detection system (IDS)

2018

As the world is on the verge of venturing into fifth-generation communication technology and embracing concepts such as virtualization and cloudification, the most crucial aspect remains “security”, as more and more data get attached to the internet. This paper reflects a model designed to measure the various parameters of data in a network such as accuracy, precision, confusion matrix, and others. XGBoost is employed on the NSL-KDD (network socket layer-knowledge discovery in databases) dataset to get the desired results. The whole motive is to learn about the integrity of data and have a higher accuracy in the prediction of data. By doing so, the amount of mischievous data floating in a network can be minimized, making the network a secure place to share information. The more secure a network is, the fewer situations where data is hacked or modified. By changing various parameters of the model, future research can be done to get the most out of the data entering and leaving a network. The most important player in the network is data, and getting to know it more closely and precisely is half the work done. Studying data in a network and analyzing the pattern and volume of data leads to the emergence of a solid Intrusion Detection System (IDS), that keeps the network healthy and a safe place to share confidential information.

Journal Article

Do Large Datasets or Hybrid Integrated Models Outperform Simple Ones in Predicting Commodity Prices and Foreign Exchange Rates?

by Jin Shang, Shigeyuki Hamori in hybrid forecasting approaches; two-step forecasting approaches; gold; euro; sentiment analysis; machine learning; ARIMA; wavelet transformation; seasonal decomposition; long short-term memory; random forest; eXtreme gradient boosting

2023

Journal Article
