12,504 result(s) for "regression tree"
Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia
The purpose of the current study is to produce landslide susceptibility maps using different data mining models. Four modeling techniques, namely random forest (RF), boosted regression tree (BRT), classification and regression tree (CART), and general linear model (GLM), are used, and their results are compared for landslide susceptibility mapping at the Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslide locations were identified and mapped from the interpretation of different data types, including high-resolution satellite images, topographic maps, historical records, and extensive field surveys. In total, 125 landslide locations were mapped using ArcGIS 10.2, and the locations were divided into two groups: training (75 %) and validation (25 %). Eleven layers of landslide-conditioning factors were prepared, including slope aspect, altitude, distance from faults, lithology, plan curvature, profile curvature, rainfall, distance from streams, distance from roads, slope angle, and land use. The relationships between the landslide-conditioning factors and the landslide inventory map were calculated using the four models mentioned above (RF, BRT, CART, and GLM). The models’ results were compared with landslide locations that were not used during the models’ training. The receiver operating characteristics (ROC), including the area under the curve (AUC), were used to assess the accuracy of the models. The success (training data) and prediction (validation data) rate curves were calculated. The results showed that the AUC values for success rates are 0.783 (78.3 %), 0.958 (95.8 %), 0.816 (81.6 %), and 0.821 (82.1 %) for the RF, BRT, CART, and GLM models, respectively. The prediction rates are 0.812 (81.2 %), 0.856 (85.6 %), 0.862 (86.2 %), and 0.769 (76.9 %) for the RF, BRT, CART, and GLM models, respectively. Subsequently, landslide susceptibility maps were divided into four classes: low, moderate, high, and very high susceptibility.
The results revealed that the RF, BRT, CART, and GLM models produced reasonable accuracy in landslide susceptibility mapping. The outcome maps would be useful for general planned development activities in the future, such as choosing new urban areas and infrastructural activities, as well as for environmental protection.
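The success-rate and prediction-rate figures quoted in this abstract are both AUC values, computed on the training and validation locations respectively. The sketch below is not the authors' code; it is a minimal standard-library illustration of how an AUC can be computed from susceptibility scores using the rank (Mann-Whitney) formulation. The function name `auc` and all scores and labels are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    AUC equals the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs: 1 = landslide location, 0 = non-landslide.
train_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
train_labels = [1, 1, 0, 1, 0, 0]
valid_scores = [0.85, 0.6, 0.55, 0.1]
valid_labels = [1, 0, 1, 0]

success_rate = auc(train_scores, train_labels)     # AUC on training data
prediction_rate = auc(valid_scores, valid_labels)  # AUC on held-out data
print(f"success rate AUC:    {success_rate:.3f}")
print(f"prediction rate AUC: {prediction_rate:.3f}")
```

A success rate well above the prediction rate, as with a model that fits its training locations closely, suggests overfitting; the pairwise formulation is quadratic in the sample count and is meant for small illustrative inputs only.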
Using machine learning to detect misstatements
Machine learning offers empirical methods to sift through accounting datasets with a large number of variables and limited a priori knowledge about functional forms. In this study, we show that these methods help detect and interpret patterns present in ongoing accounting misstatements. We use a wide set of variables from accounting, capital markets, governance, and auditing datasets to detect material misstatements. A primary insight of our analysis is that accounting variables, while they do not detect misstatements well on their own, become important with suitable interactions with audit and market variables. We also analyze differences between misstatements and irregularities, compare algorithms, examine one-year- and two-year-ahead predictions and interpret groups at greater risk of misstatements.
A Comparative Assessment Between Three Machine Learning Models and Their Performance Comparison by Bivariate and Multivariate Statistical Methods in Groundwater Potential Mapping
As worldwide demand for fresh groundwater increases, delineation of groundwater spring potential zones becomes an increasingly important tool for implementing successful groundwater determination, protection, and management programs. Therefore, the objective of the current study is to evaluate the capability of three machine learning models, namely boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF), and to compare their performance with bivariate (evidential belief function (EBF)) and multivariate (general linear model (GLM)) statistical methods in groundwater potential mapping. This study was carried out in the Beheshtabad Watershed, Chaharmahal-e-Bakhtiari Province, Iran. In total, 1425 spring locations were detected in the study area. Seventy percent of the spring locations were used for model training, and 30 % for validation purposes. Fourteen conditioning factors were considered in this investigation, including slope angle, slope aspect, altitude, plan curvature, profile curvature, slope length (LS), stream power index (SPI), topographic wetness index (TWI), distance from rivers, distance from faults, river density, fault density, lithology, and land use. Using the above conditioning factors and different algorithms, groundwater potential maps were generated, and the results were plotted in ArcGIS 9.3. According to the results of success rate curves (SRC), values of area under the curve (AUC) for the five models vary from 0.692 to 0.975. In contrast, the AUC for prediction rate curves (PRC) ranges from 77.26 to 86.39 %. The CART, BRT, and RF machine learning techniques showed very good performance in groundwater potential mapping, with AUC values of 86.39, 86.12, and 86.05 %, respectively. The GLM and EBF models showed weaker performance than the machine learning models in spring groundwater potential mapping, with AUC values of 77.26 and 67.72 %, respectively.
The proposed methods provided rapid, accurate, and cost effective results. Furthermore, the analysis may be transferable to other watersheds with similar topographic and hydro-geological characteristics.
BART WITH TARGETED SMOOTHING
This article introduces BART with Targeted Smoothing, or tsBART, a new Bayesian tree-based model for nonparametric regression. The goal of tsBART is to introduce smoothness over a single target covariate t while not necessarily requiring smoothness over other covariates x. tsBART is based on the Bayesian Additive Regression Trees (BART) model, an ensemble of regression trees. tsBART extends BART by parameterizing each tree’s terminal nodes with smooth functions of t rather than independent scalars. Like BART, tsBART captures complex nonlinear relationships and interactions among the predictors. But unlike BART, tsBART guarantees that the response surface will be smooth in the target covariate. This improves interpretability and helps to regularize the estimate. After introducing and benchmarking the tsBART model, we apply it to our motivating example—pregnancy outcomes data from the National Center for Health Statistics. Our aim is to provide patient-specific estimates of stillbirth risk across gestational age (t) and based on maternal and fetal risk factors (x). Obstetricians expect stillbirth risk to vary smoothly over gestational age but not necessarily over other covariates, and tsBART has been designed precisely to reflect this structural knowledge. The results of our analysis show the clear superiority of the tsBART model for quantifying stillbirth risk, thereby providing patients and doctors with better information for managing the risk of fetal mortality. All methods described here are implemented in the R package tsbart.
A comparative study of land subsidence susceptibility mapping of Tasuj plane, Iran, using boosted regression tree, random forest and classification and regression tree methods
Land subsidence in the Tasuj plane is expected to become more frequent and hazardous in the near future due to the water crisis. To mitigate damage caused by land subsidence events, it is necessary to determine the susceptible or prone areas. This study focuses on producing and comparing land subsidence susceptibility maps (LSSM) using boosted regression tree (BRT), random forest (RF), and classification and regression tree (CART) approaches with twelve influencing variables, namely altitude, slope angle, aspect, groundwater level, groundwater level change, land cover, lithology, distance to fault, distance to stream, stream power index, topographic wetness index, and plan curvature. Moreover, by implementing the Relief-F feature selection method, the most important variables in the LSSM procedure were identified. The performance of the adopted methods was assessed using the area under the receiver operating characteristics curve (AUROC) and statistical evaluation indexes. The results showed that all the employed methods performed well; in particular, the BRT model (AUROC = 0.819) yielded higher prediction accuracy than RF (AUROC = 0.798) and CART (AUROC = 0.764). Findings of this study can assist in characterizing and mitigating the related hazard of land subsidence events.
Surface Motion Prediction and Mapping for Road Infrastructures Management by PS-InSAR Measurements and Machine Learning Algorithms
This paper introduces a methodology for predicting and mapping surface motion beneath road pavement structures caused by environmental factors. Persistent Scatterer Interferometric Synthetic Aperture Radar (PS-InSAR) measurements, geospatial analyses, and Machine Learning Algorithms (MLAs) are employed for this purpose. Two single learners, i.e., Regression Tree (RT) and Support Vector Machine (SVM), and two ensemble learners, i.e., Boosted Regression Trees (BRT) and Random Forest (RF), are utilized for estimating the surface motion rate in mm/year over the Province of Pistoia (Tuscany Region, central Italy, 964 km²), in which strong subsidence phenomena have occurred. The interferometric processing of 210 Sentinel-1 images from 2014 to 2019 allows the average displacements of 52,257 Persistent Scatterers to be used as output targets to predict. A set of 29 environment-related factors is preprocessed by SAGA-GIS, version 2.3.2, and ESRI ArcGIS, version 10.5, and employed as input features. Once the dataset has been prepared, three wrapper feature selection approaches (backward, forward, and bi-directional) are used for recognizing the set of most relevant features to be used in the modeling. A random 70%/30% split of the dataset is used to define the training and test sets. Through a Bayesian Optimization Algorithm (BOA) and a 10-Fold Cross-Validation (CV), the algorithms are trained and validated. The predictive performance of the MLAs is then evaluated and compared by plotting the Taylor Diagram. Outcomes show that SVM and BRT are the most suitable algorithms; in the test phase, BRT has the highest Correlation Coefficient (0.96) and the lowest Root Mean Square Error (0.44 mm/year), while the SVM has the lowest difference between the standard deviation of its predictions (2.05 mm/year) and that of the reference samples (2.09 mm/year). Finally, the algorithms are used for mapping surface motion over the study area.
We propose three case studies on critical stretches of two-lane rural roads for evaluating the reliability of the procedure. Road authorities could consider the proposed methodology for their monitoring, management, and planning activities.
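The three quantities reported in this abstract (correlation coefficient, root mean square error, and the standard deviations of predictions and reference samples) are the statistics a Taylor diagram summarizes. The following is a minimal standard-library sketch of how they can be computed. It is not the authors' pipeline; the helper names and the displacement values (in mm/year) are hypothetical, and population standard deviation is assumed.

```python
import math


def std(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def rmse(pred, ref):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson(pred, ref):
    """Pearson correlation coefficient."""
    mp = sum(pred) / len(pred)
    mr = sum(ref) / len(ref)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref)) / len(ref)
    return cov / (std(pred) * std(ref))


# Hypothetical displacement rates in mm/year (not data from the paper).
reference = [-2.0, -1.0, 0.0, 1.0, 2.0]
predicted = [-1.5, -1.0, 0.5, 1.0, 1.5]

print(f"correlation:   {pearson(predicted, reference):.3f}")
print(f"RMSE:          {rmse(predicted, reference):.3f} mm/year")
print(f"std predicted: {std(predicted):.3f} mm/year")
print(f"std reference: {std(reference):.3f} mm/year")
```

A model can score well on correlation and RMSE while still compressing the spread of its predictions, which is why the abstract reports the standard deviations separately.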
Reconstruction of Historical Land Use and Urban Flood Simulation in Xi’an, Shannxi, China
Reconstruction of historical land uses helps to understand patterns, drivers, and impacts of land-use change, and is essential for finding solutions to land-use sustainability. In order to analyze the relationship between land-use change and urban flooding, this study used the Classification and Regression Tree (CART) method to extract modern (2017) land-use data based on remote sensing images. Then, the Paleo-Land-Use Reconstruction (PLUR) program was used to reconstruct the land-use maps of Xi’an during the Ming (1582) and Qing (1766) dynasties by consulting and collecting records of land-use change in historical documents. Finally, the Flo-2D model was used to simulate urban flooding under different land-use scenarios. Over the past 435 years (1582–2017), the urban construction land area showed an increasing trend, while the unused land area and water bodies continuously decreased. The increase in urban green space and buildings was 20.49% and 19.85%, respectively, and the unused land area changed from 0.32 km² to 0. Urban flooding in the modern land-use scenario is the most serious. In addition to the increase in impervious areas, the increase in building density and the decrease in water areas are also important factors that aggravate urban flooding. This study can provide a reference for future land-use planning and urban flooding control policy formulation and revision in the study area.
Novel Bayesian Additive Regression Tree Methodology for Flood Susceptibility Modeling
Identifying areas prone to flooding is a key step in flood risk management. The purpose of this study is to develop and present a novel flood susceptibility model based on Bayesian Additive Regression Tree (BART) methodology. The predictive performance of the new model is assessed via comparison with the Naïve Bayes (NB) and Random Forest (RF) based methods that were previously published in the literature. All models were tested on a real case study based in the Kan watershed in Iran. The following fifteen climatic and geo-environmental variables were used as inputs into all flood susceptibility models: altitude, aspect, slope, plan curvature, profile curvature, drainage density, distance from river, distance from road, stream power index (SPI), topographic wetness index (TWI), topographic position index (TPI), curve number (CN), land use, lithology, and rainfall. Based on the existing flood field survey and other information available for the analyzed area, a total of 118 flood locations were identified as potentially prone to flooding. The data available were divided into two groups, with 70% used for training and 30% for validation of all models. The receiver operating characteristic (ROC) curve parameters were used to evaluate the predictive accuracy of the new and existing models. Based on the area under the curve (AUC), the new BART model (86%) outperformed the NB (80%) and RF (85%) models. Regarding the importance of input variables, the results showed that the location’s altitude and distance from the river are the most important variables for assessing flooding susceptibility.
Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction
The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models--Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS)--for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. To test, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service's Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate. 
We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.
Analysis of Circular Price Prediction Strategy for Used Electric Vehicles
As the car price war has intensified in China since 2023, the continuous decline in prices of new cars, both conventional fuel vehicles and electric vehicles (EVs), has led to a sharp decline in used car prices. In particular, the EV market appears more vulnerable, as the prime cost of battery raw materials has decreased since January 2023. A second-hand EV price prediction system is therefore urgently needed. This study compares several methods for pricing used EVs in China. We find that the random forest method and the gradient boosting regression tree (GBRT) method perform well in predicting used EV prices within their respective price ranges. Timed EV data capture is applied to keep the prediction system up to date in real time. Then, we propose the concept of circular pricing, which means that a car priced on obsolete data will be repriced according to the latest data. In this way, such a system can guide used car dealers to adjust prices in time.
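The circular pricing idea in this abstract, repricing cars whose price was set on stale data, can be sketched as a simple loop. This is only an illustration under stated assumptions and not the paper's system: the `Listing` type, the `reprice_stale` function, the 7-day staleness threshold, and the toy linear predictor (standing in for a trained random forest or GBRT model) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    car_id: str
    features: dict
    price: float
    priced_on: int  # day index on which the price was last set


def reprice_stale(listings, predict, today, max_age_days=7):
    """Reprice every listing whose price is older than max_age_days.

    `predict` maps a feature dict to a price; in practice it would be a
    model refitted on the latest captured market data (assumption).
    Returns the ids of the listings that were repriced.
    """
    repriced = []
    for item in listings:
        if today - item.priced_on > max_age_days:
            item.price = predict(item.features)
            item.priced_on = today
            repriced.append(item.car_id)
    return repriced


def toy_predict(features):
    """Hypothetical stand-in for a trained price model."""
    return max(0.0, 200_000 - 15_000 * features["age_years"])


listings = [
    Listing("ev-1", {"age_years": 2}, 190_000.0, priced_on=0),
    Listing("ev-2", {"age_years": 4}, 140_000.0, priced_on=28),
]
updated = reprice_stale(listings, toy_predict, today=30)
print(updated, [item.price for item in listings])
```

Only the listing priced 30 days ago is updated here; the one priced 2 days ago keeps its price until it, too, ages past the threshold, which is what makes the process circular.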