Catalogue Search | MBRL

Establishing a machine learning dementia progression prediction model with multiple integrated data

by Lu, Chi-Jie , Huang, Yung-Chuan , Liu, Tzu-Chi in Aged , Aged, 80 and over , Biomarkers

2024

Objective: Dementia is a significant medical and social issue in most developed countries. Practical tools for predicting the progression of degenerative dementia are highly valuable. Machine learning (ML) methods facilitate the construction of effective models using real-world data, which may include missing values and various integrated datasets. Method: This retrospective study analyzed data from 679 patients diagnosed with degenerative dementia at Fu Jen Catholic University Hospital, who were evaluated by neurologists and psychologists and followed for over two years. Predictive variables were categorized into demographic (D), clinical dementia rating (CDR), mini-mental state examination (MMSE), and laboratory data value (LV) groups. These categories were further integrated into three subgroups (D-CDR, D-CDR-MMSE, and D-CDR-MMSE-LV). We utilized the extreme gradient boosting (XGB) model to rank the importance of variables and identify the most effective feature combination via a step-wise approach. Result: The D-CDR-MMSE-LV model combination showed robust performance with an excellent area under the receiver operating characteristic curve (AUC) and the highest sensitivity value (84.66). Employing both demographic and neuropsychiatric variables, our prediction model achieved an AUC of 83.74. By incorporating additional clinical information from laboratory data and applying our proposed feature selection strategy, we constructed a model based on eight variables that achieved an AUC of 85.12 using the XGB technique. Conclusion: We established a machine-learning model to monitor the progression of dementia using a limited, real-world clinical dataset. The XGB technique identified eight critical variables across our integrated datasets, potentially providing clinicians with valuable guidance.
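The abstract does not spell out the step-wise procedure, so the following is only a minimal sketch of one plausible reading: features are added one at a time in importance order, and the best-scoring prefix is kept. The feature names, the `toy_score` function, and its numbers are hypothetical stand-ins for a real cross-validated AUC; nothing here reproduces the authors' model.

```python
def stepwise_select(ranked_features, score_fn):
    """Add features in ranked order and keep the prefix with the best score."""
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = score_fn(current)
        if score > best_score:
            best_score = score
            best_subset = list(current)
    return best_subset, best_score


# Hypothetical per-feature gains; a real study would score each subset
# with a cross-validated model instead of this toy function.
TOY_GAINS = {"cdr": 0.20, "mmse": 0.10, "age": 0.05, "noise": -0.03}


def toy_score(features):
    return 0.5 + sum(TOY_GAINS[f] for f in features)


ranked = ["cdr", "mmse", "age", "noise"]  # assumed importance order
subset, score = stepwise_select(ranked, toy_score)
print(subset, round(score, 2))
```

In this toy run the fourth feature lowers the score, so the selection stops improving after three features.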

Journal Article

Risk Prediction for Early Chronic Kidney Disease: Results from an Adult Health Examination Program of 19,270 Individuals

by Chen, Gin-Den , Chang, Chi-Chang , Shih, Chin-Chuan in Algorithms , Blood pressure , Cholesterol

2020

Developing effective risk prediction models is a cost-effective approach to predicting complications of chronic kidney disease (CKD) and mortality rates; however, there is inadequate evidence to support screening for CKD. In this study, four data mining algorithms, including a classification and regression tree, a C4.5 decision tree, a linear discriminant analysis, and an extreme learning machine, are used to predict early CKD. The study includes datasets from 19,270 patients, provided by an adult health examination program from 32 chain clinics and three special physical examination centers, between 2015 and 2019. There were 11 independent variables, and the glomerular filtration rate (GFR) was used as the predictive variable. The C4.5 decision tree algorithm outperformed the three comparison models for predicting early CKD based on accuracy, sensitivity, specificity, and area under the curve metrics. It is, therefore, a promising method for early CKD prediction. The experimental results showed that urine protein and creatinine ratio (UPCR), proteinuria (PRO), red blood cells (RBC), fasting glucose (GLU), triglycerides (TG), total cholesterol (T-CHO), age, and gender are important risk factors. CKD care is closely related to primary care level and is recognized as a healthcare priority in national strategy. The proposed risk prediction models can support the important influence of personality and health examination representations in predicting early CKD.

Journal Article

Hybrid Basketball Game Outcome Prediction Model by Integrating Data Mining Methods for the National Basketball Association

by Lee, Tian-Shyug , Chen, Wei-Jen , Lu, Chi-Jie in Algorithms , Artificial neural networks , Athletic drafts & trades

2021

The sports market has grown rapidly over the last several decades. Sports outcomes prediction is an attractive sports analytic challenge as it provides useful information for operations in the sports market. In this study, a hybrid basketball game outcomes prediction scheme is developed for predicting the final score of the National Basketball Association (NBA) games by integrating five data mining techniques, including extreme learning machine, multivariate adaptive regression splines, k-nearest neighbors, eXtreme gradient boosting (XGBoost), and stochastic gradient boosting. Designed features are generated by merging different game-lags information from fundamental basketball statistics and used in the proposed scheme. This study collected data from all the games of the 2018–2019 NBA season. There are 30 teams in the NBA and each team plays 82 games per season. A total of 2460 NBA game data points were collected. Empirical results illustrated that the proposed hybrid basketball game prediction scheme achieves high prediction performance and identifies suitable game-lag information and relevant game features (statistics). Our findings suggested that a two-stage XGBoost model using four pieces of game-lags information achieves the best prediction performance among all competing models. The six designed features, including averaged defensive rebounds, averaged two-point field goal percentage, averaged free throw percentage, averaged offensive rebounds, averaged assists, and averaged three-point field goal attempts, from four game-lags have a greater effect on the prediction of final scores of NBA games than other game-lags. The findings of this study provide relevant insights and guidance for other team or individual sports outcomes prediction research.
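The abstract describes "averaged" features built from game-lag information without giving the exact construction. As a rough sketch, assuming a lag-k feature is the mean of a statistic over a team's previous k games, it could be computed as below. The function name and the rebound numbers are hypothetical. The count of 2460 data points is consistent with one record per team per game (30 teams × 82 games), though the abstract does not state this explicitly.

```python
def lagged_average(values, lag):
    """Mean of the previous `lag` games for each game; None until enough history."""
    averages = []
    for i in range(len(values)):
        if i < lag:
            averages.append(None)  # not enough earlier games yet
        else:
            window = values[i - lag:i]  # previous games only, current game excluded
            averages.append(sum(window) / lag)
    return averages


# Made-up defensive rebound counts for six consecutive games of one team.
defensive_rebounds = [30, 34, 32, 36, 38, 40]
lag4 = lagged_average(defensive_rebounds, lag=4)
print(lag4)
```

Excluding the current game from the window keeps the feature usable for prediction before tip-off, which is why the first `lag` entries are left empty.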

Journal Article

Assessing the Impact of Aviation Noise on Housing Prices Using New Estimated Noise Value: The Case of Taiwan Taoyuan International Airport

by Lu, Chi-Jie , Tsao, Hsiu-Chang in Air pollution , Aircraft , Airports

2022

Aviation noise at airports has a significant impact on nearby residents’ quality of life and residential property values. This study evaluated the impact of aviation noise on house prices by using three different hedonic price models. Two novel independent noise variables, the estimated aviation noise value and the noise reward fund, are proposed for constructing effective hedonic price models. Real estate transaction data from the region defined by the Taoyuan International Airport’s 60–64 dB day-night average sound level (Ldn) and ≥65 dB Ldn noise contours are adopted as empirical data. Empirical results showed that the double-log hedonic price model with the proposed estimated aviation noise variables is the most suitable model for this study. Based on the double-log model, this study found that aviation noise has a significant negative impact on house prices in both noise contour areas of 60–64 dB Ldn and ≥65 dB Ldn. The rate of decline in house prices is approximately USD 2356.02/dB and USD 3622.78/dB in the 60–64 dB Ldn and ≥65 dB Ldn contour areas, respectively. Our results also showed that the noise reward fund had no significant impact on house prices, which implies that the current subsidy method has been maintained at an appropriate level for Taoyuan International Airport.
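In a double-log (log-log) hedonic model, the logarithm of price is regressed on the logarithms of the explanatory variables, so each coefficient reads as an elasticity. The sketch below fits a single-regressor version by ordinary least squares on synthetic data. The noise levels, the price formula, and the elasticity of -0.5 are invented for illustration and are not the study's estimates; the study's actual covariates and its conversion to USD per dB are not given in the abstract.

```python
import math


def ols_slope_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data: price = 1000 * noise^(-0.5), with no random error.
noise_db = [60.0, 62.0, 64.0, 66.0, 68.0]
price = [1000.0 * n ** -0.5 for n in noise_db]

# Double-log form: ln(price) = b0 + b1 * ln(noise); b1 is the elasticity.
elasticity, intercept = ols_slope_intercept(
    [math.log(n) for n in noise_db],
    [math.log(p) for p in price],
)
print(round(elasticity, 6), round(math.exp(intercept), 3))
```

Because the synthetic prices follow the power law exactly, the fit recovers the assumed elasticity and scale up to floating-point error.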

Journal Article

Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification

by Sun, Cheuk-Kay , Sun, Jeffrey , Tang, Yun-Xuan in Age groups , Artificial intelligence , Breast cancer

2023

Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, there is an ongoing debate regarding the significance of these factors. Machine learning (ML) models and Shapley Additive Explanation (SHAP) methodology can rank risk factors and provide explanatory model results. This study used ML algorithms with SHAP to analyze the risk factors between two different age groups and evaluate the impact of each factor in predicting positive mammography. The ML model was built using data from the risk factor questionnaires of women participating in a breast cancer screening program from 2017 to 2021. Three ML models, least absolute shrinkage and selection operator (lasso) logistic regression, extreme gradient boosting (XGBoost), and random forest (RF), were applied. RF generated the best performance. The SHAP values were then applied to the RF model for further analysis. The model identified age at menarche, education level, parity, breast self-examination, and BMI as the top five significant risk factors affecting mammography outcomes. The differences between age groups ranked by reproductive lifespan and BMI were higher in the younger and older age groups, respectively. The use of SHAP frameworks allows us to understand the relationships between risk factors and generate individualized risk factor rankings. This study provides avenues for further research and individualized medicine.
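SHAP attributions are based on Shapley values from cooperative game theory: a feature's value is its average marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values by brute force for a two-feature toy value function. The feature names and payoffs are hypothetical, and practical SHAP tooling relies on far more efficient algorithms than enumerating permutations.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = {f: 0.0 for f in features}
    n_orderings = 0
    for order in permutations(features):
        n_orderings += 1
        included = set()
        previous = value_fn(frozenset(included))
        for f in order:
            included.add(f)
            current = value_fn(frozenset(included))
            totals[f] += current - previous  # marginal contribution of f
            previous = current
    return {f: total / n_orderings for f, total in totals.items()}


def toy_value(subset):
    """Hypothetical model output given a set of known features."""
    value = 0.0
    if "a" in subset:
        value += 2.0
    if "b" in subset:
        value += 1.0
    if "a" in subset and "b" in subset:
        value += 1.0  # interaction term, split evenly by the Shapley rule
    return value


phi = shapley_values(toy_value, ["a", "b"])
print(phi)
```

The attributions sum to the difference between the full-model value and the empty-set value, which is the additivity property that makes per-individual explanations possible.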

Journal Article

The Potential of SHAP and Machine Learning for Personalized Explanations of Influencing Factors in Myopic Treatment for Children

by Chen, Jun-Wei , Chen, Hsin-An , Wu, Tzu-En in Adolescent , Analysis , Atrophy

2025

Background and Objectives: The rising prevalence of myopia is a significant global health concern. Atropine eye drops are commonly used to slow myopia progression in children, but their long-term use raises concern about intraocular pressure (IOP). This study uses SHapley Additive exPlanations (SHAP) to improve the interpretability of a machine learning (ML) model predicting end IOP, offering clinicians explainable insights for personalized patient management. Materials and Methods: This retrospective study analyzed data from 1191 individual eyes of 639 boys and 552 girls with myopia treated with atropine. The average age of the whole group was 10.6 ± 2.5 years. The refractive error in spherical equivalent (SE) was 2.63 D at baseline (base SE) and 3.12 D at the end (end SE). Data were collected from clinical records, including demographic information, IOP measurements, and atropine treatment details. The patients were divided into two subgroups based on a baseline IOP of 14 mmHg. ML models, including Lasso, CART, XGB, and RF, were developed to predict the end IOP value. Then, the best-performing model was further interpreted using SHAP values. The SHAP module created a personalized and dynamic graphic to illustrate how various factors (e.g., age, sex, cumulative duration, and dosage of atropine treatment) affect the end IOP. Results: RF showed the best performance, with superior error metrics in both subgroups. The interpretation of RF with SHAP revealed that age and the recruitment duration of atropine consistently influenced IOP across subgroups, while other variables had varying effects. SHAP values also offer insights, helping clinicians understand how different factors contribute to the predicted IOP value in individual children. Conclusions: SHAP provides an alternative approach to understanding the factors affecting IOP in children with myopia treated with atropine. Its enhanced interpretability helps clinicians make informed decisions, improving the safety and efficacy of myopia management. This study demonstrates the potential of combining SHAP with ML models for personalized care in ophthalmology.

Journal Article

Health Data-Driven Machine Learning Algorithms Applied to Risk Indicators Assessment for Chronic Kidney Disease

by Tian-Shyug Lee , Yen-Ling Chiu , Chi-Jie Lu in Academic achievement , Aging , Algorithms

2021

As global aging progresses, the health management of chronic diseases has become an important issue of concern to governments. Influenced by the aging of its population and improvements in the medical system and healthcare in general, Taiwan's population of patients with chronic kidney disease (CKD) has tended to grow year by year, including the incidence of high-risk cases that pose major health hazards to the elderly and middle-aged populations. This study analyzed the annual health screening data for 65,394 people from 2010 to 2015 sourced from the MJ Group - a major health screening center in Taiwan - including data for 18 risk indicators. We used five prediction model analysis methods, namely, logistic regression (LR) analysis, C5.0 decision tree (C5.0) analysis, stochastic gradient boosting (SGB) analysis, multivariate adaptive regression splines (MARS), and eXtreme gradient boosting (XGBoost), with estimated glomerular filtration rate (e-GFR) data to determine G3a, G3b & G4 stage CKD risk factors. The LR analysis (AUC=0.848), SGB analysis (AUC=0.855), and XGBoost (AUC=0.858) generated similar classification performance levels and all outperformed the C5.0 and MARS methods. The study results showed that in terms of CKD risk factors, blood urea nitrogen (BUN) and uric acid (UA) were identified as the first and second most important indicators in the models of all five analysis methods, and they were also clinically recognized as the major risk factors. The results for systolic blood pressure (SBP), SGPT, SGOT, and LDL were similar to those of a related study. Interestingly, however, socioeconomic status-related education was found to be the third most important indicator in all three of the better performing analysis methods, indicating that it is more important than the other risk indicators of this study, which had different levels of importance according to the different methods. The five prediction model methods can provide high and similar classification performance in this study. Based on the results of this study, it is recommended that education, as an indicator of socioeconomic status, be considered an important factor for CKD, as high educational level showed a negative and highly significant correlation with CKD. The findings of this study should also be of value for further discussions and follow-up research.
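The AUC values used to compare the classifiers can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal pairwise sketch of that definition follows. The labels and scores are made up for illustration and have no connection to the study's data.

```python
def auc_pairwise(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four cases, two of them positive.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_pairwise(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise loop is quadratic in the number of cases, so it is meant only to make the definition concrete; rank-based formulas give the same value more efficiently.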

Journal Article

Comparison of Different Machine Learning Classifiers for Glaucoma Diagnosis Based on Spectralis OCT

by Wu, Chao-Wei , Chen, Ssu-Han , Chen, Hsin-Yi in Accuracy , Automation , Diabetic retinopathy

2021

Early detection is important in glaucoma management. By using optical coherence tomography (OCT), the subtle structural changes caused by glaucoma can be detected. Although OCT provides abundant parameters for comprehensive information, clinicians may be confused when the results conflict. Machine learning classifiers (MLCs) are good tools for considering numerous parameters and generating reliable diagnoses in glaucoma practice. Here we aim to compare different MLCs based on Spectralis OCT parameters, including circumpapillary retinal nerve fiber layer (cRNFL) thickness, Bruch’s membrane opening-minimum rim width (BMO-MRW), Early Treatment Diabetic Retinopathy Study (ETDRS) macular thickness, and posterior pole asymmetry analysis (PPAA), in discriminating normal from glaucomatous eyes. Five MLCs were proposed, namely conditional inference trees (CIT), logistic model tree (LMT), C5.0 decision tree, random forest (RF), and extreme gradient boosting (XGBoost). Logistic regression (LGR) was used as a benchmark for comparison. RF was shown to be the best model. Ganglion cell layer measurements were the most important predictors in early glaucoma detection and cRNFL measurements were more important as the glaucoma severity increased. The global, temporal, inferior, superotemporal, and inferotemporal sites were relatively influential locations among all parameters. Clinicians should cautiously integrate the Spectralis OCT results into the entire clinical picture when diagnosing glaucoma.

Journal Article

Sales forecasting by combining clustering and machine-learning techniques for computer retailing

by Lu, Chi-Jie , Chen, I-Fei in Artificial Intelligence , Computational Biology/Bioinformatics , Computational Science and Engineering

2017

Sales forecasting is a critical task for computer retailers endeavoring to maintain favorable sales performance and manage inventories. In this study, a clustering-based forecasting model that combines clustering and machine-learning methods is proposed for computer retailing sales forecasting. The proposed method first uses a clustering technique to divide the training data into groups, so that data with similar features or patterns fall into the same group. Subsequently, machine-learning techniques are used to train a forecasting model for each group. After the cluster with data patterns most similar to the test data is determined, the trained forecasting model of that cluster is adopted for sales forecasting. Since the sales data of computer retailers show similar data patterns or features at different time periods, the accuracy of the forecast can be enhanced by using the proposed clustering-based forecasting model. Three clustering techniques, namely self-organizing map (SOM), growing hierarchical self-organizing map (GHSOM), and K-means, and two machine-learning techniques, namely support vector regression (SVR) and extreme learning machine (ELM), are used in this study. A total of six clustering-based forecasting models were proposed. Real-life sales data for personal computers, notebook computers, and liquid crystal displays are used as the empirical examples. The experimental results showed that the model combining the GHSOM and ELM provided superior forecasting performance for all three products compared with the other five forecasting models, as well as the single SVR and single ELM models. It can be effectively used as a clustering-based sales forecasting model for computer retailing.
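The cluster-then-forecast pipeline described above can be sketched as follows. This is only an illustration under loud assumptions: plain k-means with a deterministic start stands in for the clustering step, and a per-cluster mean of the targets stands in for the SVR or ELM forecaster, which the sketch does not implement. All data points and function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points are used as the (deterministic) start."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def fit_cluster_models(points, targets, centroids):
    """Stand-in forecaster: one mean target per cluster (not SVR or ELM)."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p, t in zip(points, targets):
        j = nearest(p, centroids)
        sums[j] += t
        counts[j] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def predict(point, centroids, models):
    """Route the test point to its most similar cluster and use that cluster's model."""
    return models[nearest(point, centroids)]


# Made-up training data: two obvious groups of feature vectors and sales targets.
X = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.2]]
y = [10.0, 100.0, 12.0, 104.0]
centroids = kmeans(X, k=2)
models = fit_cluster_models(X, y, centroids)
print(predict([1.1, 1.0], centroids, models), predict([9.0, 9.0], centroids, models))
```

Swapping the per-cluster mean for a trained regressor changes only `fit_cluster_models` and `predict`; the routing of a test point to its nearest cluster stays the same.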

Journal Article

Demand Forecasting for Multichannel Fashion Retailers by Integrating Clustering and Machine Learning Algorithms

by Lu, Chi-Jie , Chen, I-Fei in Accuracy , Algorithms , Artificial intelligence

2021

In today’s rapidly changing and highly competitive industrial environment, a new and emerging business model—fast fashion—has started a revolution in the apparel industry. Due to the lack of historical data, constantly changing fashion trends, and product demand uncertainty, accurate demand forecasting is an important and challenging task in the fashion industry. This study integrates k-means clustering (KM), extreme learning machines (ELMs), and support vector regression (SVR) to construct cluster-based KM-ELM and KM-SVR models for demand forecasting in the fashion industry using empirical demand data of physical and virtual channels of a case company to examine the applicability of proposed forecasting models. The research results showed that both the KM-ELM and KM-SVR models are superior to the simple ELM and SVR models. They have higher prediction accuracy, indicating that the integration of clustering analysis can help improve predictions. In addition, the KM-ELM model produces satisfactory results when performing demand forecasting on retailers both with and without physical stores. Compared with other prediction models, it can be the most suitable demand forecasting method for the fashion industry.

Journal Article
