MBRL Search Results

218 results for "Classification and Regression Trees (CART)"
Flood Prediction Using Machine Learning Models: Literature Review
Floods are among the most destructive natural disasters and are highly complex to model. Research on advancing flood prediction models has contributed to risk reduction, policy recommendations, minimization of the loss of human life, and reduction of the property damage associated with floods. To mimic the complex mathematical expressions of the physical processes underlying floods, machine learning (ML) methods have contributed greatly over the past two decades to the advancement of prediction systems, providing better performance and cost-effective solutions. Due to the vast benefits and potential of ML, its popularity has increased dramatically among hydrologists. By introducing novel ML methods and hybridizing existing ones, researchers aim to discover more accurate and efficient prediction models. The main contribution of this paper is to demonstrate the state of the art of ML models in flood prediction and to give insight into the most suitable models. The literature in which ML models were benchmarked through a qualitative analysis of robustness, accuracy, effectiveness, and speed is investigated in particular, providing an extensive overview of the various ML algorithms used in the field. The performance comparison of ML models presents an in-depth understanding of the different techniques within the framework of a comprehensive evaluation and discussion. As a result, this paper identifies the most promising prediction methods for both long-term and short-term floods. Furthermore, the major trends in improving the quality of flood prediction models are investigated. Among them, hybridization, data decomposition, algorithm ensembles, and model optimization are reported as the most effective strategies for improving ML methods. This survey can be used as a guideline by hydrologists as well as climate scientists in choosing the proper ML method for a given prediction task.
Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.
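As a rough illustration of the comparison this review describes, the sketch below estimates propensity scores with logistic regression, a CART-style decision tree, and gradient boosting as a stand-in for the boosting meta-classifier. It assumes scikit-learn and a generic pandas DataFrame with a binary treatment column; all names and hyperparameters are hypothetical, not the review's setup.

```python
# Hedged sketch: propensity score estimation with logistic regression,
# a CART-style tree, and boosting (illustrative only; column names are hypothetical).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

def estimate_propensity_scores(df: pd.DataFrame, treatment_col: str, covariates: list):
    """Return propensity scores P(treated | covariates) from three model families."""
    X = df[covariates].to_numpy()
    t = df[treatment_col].to_numpy()
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "cart": DecisionTreeClassifier(max_depth=4, min_samples_leaf=50),  # CART-style tree
        "boosting": GradientBoostingClassifier(),                          # meta-classifier
    }
    scores = {}
    for name, model in models.items():
        model.fit(X, t)
        # Propensity score = predicted probability of treatment given covariates.
        scores[name] = model.predict_proba(X)[:, 1]
    return scores
```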
Censoring Unbiased Regression Trees and Ensembles
This article proposes a novel paradigm for building regression trees and ensemble learning in survival analysis. Generalizations of the classification and regression trees (CART) and random forests (RF) algorithms for general loss functions, and in the latter case more general bootstrap procedures, are both introduced. These results, in combination with an extension of the theory of censoring unbiased transformations (CUTs) applicable to loss functions, underpin the development of two new classes of algorithms for constructing survival trees and survival forests: censoring unbiased regression trees and censoring unbiased regression ensembles. For a certain "doubly robust" CUT of squared error loss, we further show how these new algorithms can be implemented using existing software (e.g., CART, RF). Comparisons of these methods to existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and through applications to four datasets. It is shown that these new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.
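The doubly robust CUT used in this article is not reproduced here, but a simpler censoring-unbiased device, inverse-probability-of-censoring weighting (IPCW), gives a flavor of how off-the-shelf CART software can be reused for right-censored data. The sketch assumes lifelines and scikit-learn; it is not the authors' algorithm, and all settings are placeholders.

```python
# Hedged sketch: IPCW-weighted regression tree for right-censored responses.
# This is the simple IPCW transformation, not the doubly robust CUT from the article.
import numpy as np
from lifelines import KaplanMeierFitter
from sklearn.tree import DecisionTreeRegressor

def ipcw_regression_tree(X, time, event, max_depth=3):
    """X: covariates; time: observed (positive) times; event: 1 = failure, 0 = censored."""
    time, event = np.asarray(time, dtype=float), np.asarray(event, dtype=int)
    # Estimate the censoring survival function G(t) by flipping the event indicator.
    kmf = KaplanMeierFitter()
    kmf.fit(time, event_observed=1 - event)
    G = kmf.survival_function_at_times(time).to_numpy()
    G = np.clip(G, 0.05, None)                        # guard against tiny weights
    weights = np.where(event == 1, 1.0 / G, 0.0)      # censored cases get zero weight
    tree = DecisionTreeRegressor(max_depth=max_depth, min_samples_leaf=20)
    tree.fit(X, np.log(time), sample_weight=weights)  # model log survival time
    return tree
```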
Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction
The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models--Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS)--for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. To test, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service's Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate. We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.
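A minimal scikit-learn sketch of the comparison the authors describe, using a single regression tree (RTA), bagged trees (BT), and a random forest (RF). Cross-validated R² is used here as a simple stand-in for the paper's Kappa-based assessment, and the inputs are placeholders for the climate predictors and importance-value response.

```python
# Hedged sketch: single regression tree vs. bagging vs. random forest
# for predicting a species importance value from climate predictors.
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

def compare_tree_models(X, y, cv=5):
    models = {
        "RTA": DecisionTreeRegressor(max_depth=6),
        "BT": BaggingRegressor(n_estimators=100),       # bagged decision trees (default base learner)
        "RF": RandomForestRegressor(n_estimators=100),
    }
    # Mean cross-validated R^2 per model.
    return {name: cross_val_score(m, X, y, cv=cv, scoring="r2").mean()
            for name, m in models.items()}
```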
Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem
Spatial prediction of soil organic matter is a global challenge and of particular importance for regions with intensive land use and where availability of soil data is limited. This study evaluated a Digital Soil Mapping (DSM) approach to model the spatial distribution of stocks of soil organic carbon (SOC), total carbon (Ctot), total nitrogen (Ntot) and total sulphur (Stot) for a data-sparse, semi-arid catchment in Inner Mongolia, Northern China. Random Forest (RF) was used as a new modeling tool for soil properties and Classification and Regression Trees (CART) as an additional method for the analysis of variable importance. At 120 locations soil profiles to 1 m depth were analyzed for soil texture, SOC, Ctot, Ntot, Stot, bulk density (BD) and pH. On the basis of a digital elevation model, the catchment was divided into pixels of 90 m × 90 m and for each cell, predictor variables were determined: land use unit, Reference Soil Group (RSG), geological unit and 12 topography-related variables. Prediction maps showed that the highest amounts of SOC, Ctot, Ntot and Stot stocks are stored under marshland, steppes and mountain meadows. River-like structures of very high elemental stocks in valleys within the steppes are partly responsible for the high amounts of SOC for grasslands (81-84% of total catchment stocks). Analysis of variable importance showed that land use, RSG and geology are the most important variables influencing SOC storage. Prediction accuracy of the RF modeling and the generated maps was acceptable and explained variances of 42 to 62% and 66 to 75%, respectively. A decline of up to 70% in elemental stocks was calculated after conversion of steppe to arable land confirming the risk of rapid soil degradation if steppes are cultivated. Thus their suitability for agricultural use is limited.
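A hedged sketch of the RF prediction and variable-importance step, assuming scikit-learn and a pandas DataFrame of per-pixel predictors (land use, soil group, geology, terrain attributes); the column names and hyperparameters are assumptions, not the study's configuration.

```python
# Hedged sketch: Random Forest regression of SOC stocks from terrain and
# categorical predictors, with impurity-based variable importance.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def fit_soc_model(df: pd.DataFrame, target: str = "soc_stock"):
    # One-hot encode categorical predictors such as land use, soil group, geology.
    X = pd.get_dummies(df.drop(columns=[target]))
    y = df[target]
    rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(X, y)
    importance = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
    return rf, rf.oob_score_, importance  # out-of-bag R^2 as an internal accuracy check
```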
Regression trees modeling of time series for air pollution analysis and forecasting
Solving the problems related to air pollution is crucial for human health and ecosystems in many urban areas throughout the world. The accumulation of large arrays of data with measurements of various air pollutants makes it possible to analyze these data in order to predict and control pollution. This study presents a common approach for building quality nonlinear models of environmental time series by using the powerful data mining technique of classification and regression trees (CART). Predictors for modeling are time series with meteorological, atmospheric or other data, date-time variables, and lagged variables of the dependent variable and the predictors, involved as groups. The proposed approach is tested in empirical studies of the daily average concentrations of atmospheric PM10 (particulate matter up to 10 μm in diameter) in the cities of Ruse and Pernik, Bulgaria. One-day-ahead forecasts are obtained. All models are cross-validated against overfitting. The best models are selected using goodness-of-fit measures, such as root-mean-square error and coefficient of determination. The relative importance of the predictors and predictor groups is obtained and interpreted. The CART models are compared with the corresponding models built using the ARIMA transfer function methodology, and the superiority of CART over ARIMA is demonstrated. The practical applicability of the models is assessed using 2 × 2 contingency tables. The results show that CART models fit the data well and correctly predict about 90% of measured values of PM10 with respect to the average daily European threshold value of 50 µg/m³.
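The lagged-predictor construction described here can be sketched as follows, using pandas and scikit-learn; the column names, lag depth, and tree settings are assumptions, not the authors' exact setup.

```python
# Hedged sketch: CART model of daily PM10 with lagged pollutant and
# meteorological predictors plus date-time variables, for a 1-day-ahead forecast.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

def fit_pm10_cart(df: pd.DataFrame, target: str = "pm10", lags: int = 3):
    """df: daily DataFrame with a DatetimeIndex, containing PM10 and meteorological columns."""
    data = pd.DataFrame(index=df.index)
    for col in df.columns:                      # lagged variables for the target and all predictors
        for k in range(1, lags + 1):
            data[f"{col}_lag{k}"] = df[col].shift(k)
    data["month"] = df.index.month              # date-time variables
    data["dayofweek"] = df.index.dayofweek
    data[target] = df[target]
    data = data.dropna()
    X, y = data.drop(columns=[target]), data[target]
    tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=30)  # limits act as simple pruning
    tree.fit(X, y)
    return tree, list(X.columns)
```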
Assessing the ability of an instrumental variable causal forest algorithm to personalize treatment evidence using observational data: the case of early surgery for shoulder fracture
Background: Comparative effectiveness research (CER) using observational databases has been suggested as a way to obtain personalized evidence of treatment effectiveness. Inferential difficulties remain with traditional CER approaches, especially in designating patients to reference classes a priori. A novel Instrumental Variable Causal Forest Algorithm (IV-CFA) has the potential to provide personalized evidence using observational data without designating reference classes a priori, but the consistency of the evidence when varying key algorithm parameters remains unclear. We investigated the consistency of IV-CFA estimates through application to a database of Medicare beneficiaries with proximal humerus fractures (PHFs) that previously revealed heterogeneity in the effects of early surgery using instrumental variable estimators. Methods: IV-CFA was used to estimate patient-specific early surgery effects on both beneficial and detrimental outcomes using different combinations of algorithm parameters, and estimate variation was assessed for a population of 72,751 fee-for-service Medicare beneficiaries with PHFs in 2011. Classification and regression trees (CART) were applied to these estimates to create ex-post reference classes, and the consistency of these classes was assessed. Two-stage least squares (2SLS) estimators were applied to representative ex-post reference classes to scrutinize the estimates relative to known 2SLS properties. Results: IV-CFA uncovered substantial early surgery effect heterogeneity across PHF patients, but estimates for individual patients varied with algorithm parameters. CART applied to these estimates revealed ex-post reference classes that were consistent across algorithm parameters. 2SLS estimates showed that ex-post reference classes containing older, frailer patients with more comorbidities and lower healthcare utilization were less likely to benefit and more likely to have detriments from higher rates of early surgery. Conclusions: IV-CFA provides an illuminating method to uncover ex-post reference classes of patients based on treatment effects using observational data with a strong instrumental variable. Interpretation of treatment effect estimates within each ex-post reference class using traditional CER methods remains conditional on the extent of measured information in the data.
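The IV-CFA and 2SLS stages rely on specialized estimators, but the intermediate step, running CART on patient-level effect estimates to form ex-post reference classes, can be sketched with scikit-learn alone. The covariates, effect estimates, and leaf-size settings below are placeholders, not the study's values.

```python
# Hedged sketch: regression tree on patient-level treatment-effect estimates
# to form ex-post reference classes (the IV-CFA and 2SLS stages are not shown).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ex_post_reference_classes(X_covariates, effect_estimates, max_leaf_nodes=8):
    effect_estimates = np.asarray(effect_estimates, dtype=float)
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, min_samples_leaf=500)
    tree.fit(X_covariates, effect_estimates)
    # Each leaf is an ex-post reference class; report the mean estimated effect per class.
    leaf_ids = tree.apply(X_covariates)
    class_effects = {int(leaf): effect_estimates[leaf_ids == leaf].mean()
                     for leaf in np.unique(leaf_ids)}
    return tree, leaf_ids, class_effects
```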
Using Hybrid Artificial Intelligence and Machine Learning Technologies for Sustainability in Going-Concern Prediction
The going-concern opinions of certified public accountants (CPAs) and auditors are critical: misjudgments that fail to detect the possibility of bankruptcy can cause great losses to financial statement users and corporate stakeholders. Traditional statistical models have disadvantages in giving going-concern opinions and are likely to cause misjudgments, which can have significant adverse effects on the sustainable survival and development of enterprises and on investors’ judgments. In order to embrace the era of big data, artificial intelligence (AI) and machine learning technologies have been used in recent studies to judge going-concern doubts and reduce judgment errors. The Big Four accounting firms (Deloitte, KPMG, PwC, and EY) are paying greater attention to auditing via big data and AI. Thus, this study integrates AI and machine learning technologies: in the first stage, important variables are selected by two decision tree algorithms, classification and regression trees (CART) and the chi-squared automatic interaction detector (CHAID); in the second stage, classification models are constructed with extreme gradient boosting (XGB), artificial neural networks (ANN), support vector machines (SVM), and C5.0 for comparison, and financial and non-financial variables are then adopted to construct effective going-concern opinion decision models with higher prediction accuracy. The subjects of this study are listed companies and OTC (over-the-counter) companies in Taiwan with and without going-concern doubts from 2000 to 2019. According to the empirical results, among the eight models constructed in this study, the prediction accuracy of the CHAID–C5.0 model is the highest (95.65%), followed by the CART–C5.0 model (92.77%).
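A hedged sketch of the two-stage design, with CART-based variable selection followed by an XGBoost classifier as a representative second-stage model; CHAID and C5.0 have no standard Python implementation, so only the CART/XGB branch is shown, and the depth and top-k thresholds are assumptions.

```python
# Hedged sketch: stage 1 selects variables via a CART model's feature importances,
# stage 2 trains an XGBoost classifier on the selected variables.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

def two_stage_going_concern(X, y, top_k=10):
    X = np.asarray(X, dtype=float)
    cart = DecisionTreeClassifier(max_depth=5).fit(X, y)              # stage 1: CART
    selected = np.argsort(cart.feature_importances_)[::-1][:top_k]    # keep the most important variables
    xgb = XGBClassifier(n_estimators=200, eval_metric="logloss")      # stage 2: XGB classifier
    xgb.fit(X[:, selected], y)
    return selected, xgb
```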
Assessing temporal snow cover variation in the Sutlej river basin using google earth engine and machine learning models
Snow cover information is essential for pursuing seasonal variation studies in Himalayan river basins. This study aims to investigate the seasonal variation of snow cover in the Sutlej river basin (Tibet to the Bhakra dam in India) over three different seasons, monsoon (June–September), winter (October–January), and summer (February–May), during the period 2010–2021. Landsat 7 and 8 Surface Reflectance (SR) data are used to develop 108 land use land cover (LULC) maps for 12 years, with three seasons per year and three machine-learning models. The study was conducted on the Google Earth Engine (GEE) platform, employing Random Forest (RF), Classification and Regression Trees (CART), and Support Vector Machine (SVM) models to classify the Landsat satellite data and assess the seasonal snow cover variation during the three seasons. The results show that among the three machine learning models, the RF model exhibits the highest average overall accuracy at 98.75%, followed by the CART model at 98.10%, and the SVM model with the lowest accuracy at 97.15%. In terms of snow cover area variability, there is a declining trend in summer snow cover and an increase in monsoon snow cover over the past three years (2019–2021). However, an increasing trend emerges when considering the decadal changes in all three seasons. In addition, the maximum percentages of snow cover area were observed as 67.61%, 46.78%, and 30.58% in the summer period of 2013, the winter period of 2019, and the monsoon period of 2021, respectively. Similarly, the minimum percentages of snow cover are 23.22%, 11.36%, and 13.01%, observed in the summer period of 2014, the winter period of 2011, and the monsoon period of 2012, respectively. A comprehensive assessment procedure for temporal and seasonal snow cover variation in a large river basin has been presented in this work, which will help to plan and manage sustainable water resources in the study region.
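A hedged Earth Engine sketch of the classifier comparison using the Python earthengine-api; the asset paths, band list, label property, and classifier parameters are placeholders, not the authors' settings.

```python
# Hedged sketch: training RF, CART, and SVM classifiers on labeled samples
# in Google Earth Engine and comparing overall validation accuracy.
import ee
ee.Initialize()

bands = ["B2", "B3", "B4", "B5", "B6", "B7"]                            # placeholder Landsat SR bands
training = ee.FeatureCollection("users/example/training_samples")       # hypothetical asset
validation = ee.FeatureCollection("users/example/validation_samples")   # hypothetical asset

classifiers = {
    "RF": ee.Classifier.smileRandomForest(numberOfTrees=100),
    "CART": ee.Classifier.smileCart(),
    "SVM": ee.Classifier.libsvm(kernelType="RBF"),
}
for name, clf in classifiers.items():
    trained = clf.train(features=training, classProperty="lulc_class", inputProperties=bands)
    matrix = validation.classify(trained).errorMatrix("lulc_class", "classification")
    print(name, "overall accuracy:", matrix.accuracy().getInfo())
```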
Sustainability Performance Assessment Using Self-Organizing Maps (SOM) and Classification and Ensembles of Regression Trees (CART)
This study aims to develop a new approach based on machine learning techniques to assess sustainability performance. Two main dimensions of sustainability, ecological and human sustainability, were considered in this study. A set of sustainability indicators was used, and the research method was developed using cluster analysis and prediction learning techniques. A Self-Organizing Map (SOM) was applied for data clustering, while Classification and Regression Trees (CART) were applied to assess sustainability performance. The proposed method was evaluated on the Sustainability Assessment by Fuzzy Evaluation (SAFE) dataset, which comprises various indicators of sustainability performance in 128 countries. Eight clusters were found in the data through the SOM clustering technique. A prediction model was built for each cluster with the CART technique. In addition, an ensemble of CART models was constructed in each SOM cluster to increase the prediction accuracy of CART. All prediction models were assessed using the adjusted coefficient of determination. The results demonstrated that the prediction accuracy values were high in all CART models, and that the method combining ensembles of CART with clustering provides higher prediction accuracy than individual CART models. The main advantage of the proposed method is its ability to automate the extraction of decision rules from big data for prediction models. The method proposed in this study could be implemented as an effective tool for sustainability performance assessment.
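A hedged sketch of the SOM-then-CART pipeline, using MiniSom as a stand-in SOM implementation and scikit-learn for the per-cluster trees and bagged ensembles; the indicator matrix, target, grid size, and tree settings are placeholders rather than the study's configuration.

```python
# Hedged sketch: cluster samples with a SOM, then fit a CART model
# and a bagged CART ensemble inside each cluster.
import numpy as np
from minisom import MiniSom
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

def som_then_cart(X, y, grid=(3, 3), epochs=1000):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    som = MiniSom(grid[0], grid[1], X.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(X, epochs)
    # Assign each sample to its best-matching SOM unit and flatten (i, j) to a cluster id.
    winners = np.array([som.winner(x) for x in X])
    labels = winners[:, 0] * grid[1] + winners[:, 1]
    models = {}
    for c in np.unique(labels):
        mask = labels == c
        models[int(c)] = {
            "cart": DecisionTreeRegressor(max_depth=4).fit(X[mask], y[mask]),
            "ensemble": BaggingRegressor(n_estimators=50).fit(X[mask], y[mask]),
        }
    return som, labels, models
```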