Catalogue Search | MBRL
Explore the vast range of titles available.
13,419 result(s) for "random forest"
Land Cover Classification using Google Earth Engine and Random Forest Classifier—The Role of Image Composition
by Lehnert, Lukas W.; Phan, Thanh Noi; Kuch, Verena
in Google Earth Engine (GEE), image composition, land cover classification
2020
Land cover information plays a vital role in many aspects of life, from scientific and economic to political. Accurate information about land cover affects the accuracy of all subsequent applications, so accurate and timely land cover information is in high demand. In land cover classification studies over the past decade, higher accuracies were produced when using time series satellite images than when using single-date images. Recently, the availability of the Google Earth Engine (GEE), a cloud-based computing platform, has gained the attention of remote sensing-based applications in which temporal aggregation methods derived from time series images (i.e., metrics such as the mean or median) are widely applied instead of the time series images themselves. In GEE, many studies simply select as many images as possible to fill gaps, without considering how images from different years or seasons might affect classification accuracy. This study aims to analyze the effect of different composition methods, as well as different input images, on the classification results. We use Landsat 8 surface reflectance (L8sr) data with eight different combination strategies to produce and evaluate land cover maps for a study area in Mongolia. We implemented the experiment on the GEE platform with a widely applied algorithm, the Random Forest (RF) classifier. Our results show that all eight datasets produced moderately to highly accurate land cover maps, with overall accuracies above 84.31%. Among the eight datasets, two time series datasets of summer scenes (images from 1 June to 30 September) produced the highest accuracies (89.80% and 89.70%), followed by the median composite of the same input images (88.74%). The difference between these three classifications was not significant based on the McNemar test (p > 0.05). However, a significant difference (p < 0.05) was observed for all other pairs involving one of these three datasets. The results indicate that temporal aggregation (e.g., the median) is a promising method, which not only significantly reduces data volume (resulting in an easier and faster analysis) but also produces accuracy as high as that of time series data. The spatial consistency among the classification results was relatively low compared to the generally high accuracy, showing that the selection of the dataset used in any classification on GEE is a crucial step, because the input images for the composition play an essential role in land cover classification, particularly in snowy, cloudy, and expansive areas like Mongolia.
Journal Article
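The entry above combines a seasonal median composite with a Random Forest classifier on GEE. A minimal, hypothetical sketch of that kind of workflow in the Earth Engine Python API follows; the dataset ID, band names, asset path, region, and the 'landcover' property are assumptions for illustration, not taken from the paper.

```python
# Hypothetical GEE median-composite + Random Forest sketch
# (dataset ID, bands, asset path, and 'landcover' property are placeholders).
import ee

ee.Initialize()

roi = ee.Geometry.Rectangle([103.0, 46.0, 105.0, 47.5])           # placeholder study area
training_points = ee.FeatureCollection('users/example/training')  # placeholder labelled samples

# Summer-season Landsat 8 surface reflectance scenes, reduced to a median composite.
composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterBounds(roi)
             .filterDate('2019-06-01', '2019-09-30')
             .median())

bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']

# Sample the composite at the labelled points and train a Random Forest classifier.
samples = composite.select(bands).sampleRegions(
    collection=training_points, properties=['landcover'], scale=30)
classifier = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=samples, classProperty='landcover', inputProperties=bands)

classified = composite.select(bands).classify(classifier)
```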
Assessment of Soft Computing Techniques for the Prediction of Compressive Strength of Bacterial Concrete
by Fadi Almohammed; C. Venkata Siva Rama Prasad; Parveen Sihag
in Ammonia, Bacteria, bacterial concrete, compressive strength, soft computing techniques, support vector regression, M5P, random forest, Random Tree, artificial intelligence
2022
In this investigation, the potential of M5P, Random Tree (RT), Reduced Error Pruning Tree (REP Tree), Random Forest (RF), and Support Vector Regression (SVR) techniques has been evaluated and compared with a multiple linear regression-based model (MLR) for predicting the compressive strength of bacterial concrete. For this purpose, 128 experimental observations were collected. The total data set was divided into two segments, training (87 observations) and testing (41 observations), with the split performed at random. Cement, aggregate, sand, water-to-cement ratio, curing time, percentage of bacteria, and type of sand were the input variables, whereas the compressive strength of bacterial concrete was the target. Seven performance evaluation indices, namely Correlation Coefficient (CC), Coefficient of determination (R2), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Bias, Nash-Sutcliffe Efficiency (NSE), and Scatter Index (SI), were used to evaluate the performance of the developed models. The outcomes of these indices suggest that the polynomial-kernel-based SVR model works better than the other developed models, with CC values of 0.9919 and 0.9901, R2 values of 0.9839 and 0.9803, NSE values of 0.9832 and 0.9800, and lower error values (RMSE of 1.5680 and 1.9384, MAE of 0.7854 and 1.5155, Bias of 0.2353 and 0.1350, and SI of 0.0347 and 0.0414) for the training and testing stages, respectively. The sensitivity investigation shows that curing time (T) is the most influential input variable for predicting the compressive strength of bacterial concrete with this data set.
Journal Article
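For the comparison described above, a minimal scikit-learn sketch contrasts a polynomial-kernel SVR with a Random Forest regressor and reports R2, MAE, and RMSE on a held-out test set; the CSV file, column names, and hyperparameters are placeholders, not the study's actual data or settings.

```python
# Sketch: polynomial-kernel SVR vs. Random Forest regression with standard error metrics
# (file name, column names, and hyperparameters are assumptions).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

df = pd.read_csv('bacterial_concrete.csv')            # placeholder feature table
X, y = df.drop(columns=['strength']), df['strength']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=41, random_state=0)

models = {
    'SVR (poly kernel)': make_pipeline(StandardScaler(), SVR(kernel='poly', degree=2, C=10.0)),
    'Random Forest': RandomForestRegressor(n_estimators=200, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f'{name}: R2={r2_score(y_test, pred):.4f} '
          f'MAE={mean_absolute_error(y_test, pred):.4f} RMSE={rmse:.4f}')
```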
Flash Flood Susceptibility Modeling Using New Approaches of Hybrid and Ensemble Tree-Based Machine Learning Algorithms
by Saha, Asish; Melesse, Assefa M.; Chandra Pal, Subodh
in adverse effects, Algorithms, altitude
2020
Flash flooding is considered one of the most dynamic natural disasters, and measures need to be taken to minimize its economic damages and adverse effects by mapping flood susceptibility. Identifying areas prone to flash flooding is a crucial step in flash flood hazard management. In the present study, the Kalvan watershed in Markazi Province, Iran, was chosen to evaluate flash flood susceptibility modeling. To detect flash flood-prone zones in this study area, five machine learning (ML) algorithms were tested: boosted regression tree (BRT), random forest (RF), parallel random forest (PRF), regularized random forest (RRF), and extremely randomized trees (ERT). Fifteen climatic and geo-environmental variables were used as inputs to the flash flood susceptibility models. The results showed that ERT was the most optimal model, with an area under the curve (AUC) value of 0.82. The AUC values of the remaining models, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively. In the ERT model, the areal coverage of very high to moderate flash flood susceptibility was 582.56 km² (28.33%), and the rest of the area was associated with very low to low susceptibility zones. It is concluded that topographical and hydrological parameters, e.g., altitude, slope, rainfall, and distance from rivers, were the most effective parameters. The results of this study will play a vital role in the planning and implementation of flood mitigation strategies in the region.
Journal Article
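The susceptibility comparison above amounts to fitting several tree-based ensembles to a table of conditioning factors and ranking them by AUC. A small illustrative sketch follows, using scikit-learn analogues (gradient boosting for BRT, extra trees for ERT); the parallel and regularized random forest variants have no direct scikit-learn equivalent, and the file name, columns, and label are assumptions.

```python
# Sketch: comparing tree ensembles by AUC on a binary susceptibility label
# (file name, feature columns, and 'flood' label are placeholders).
import pandas as pd
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv('kalvan_conditioning_factors.csv')   # placeholder conditioning-factor table
X, y = df.drop(columns=['flood']), df['flood']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    'BRT': GradientBoostingClassifier(random_state=0),
    'RF': RandomForestClassifier(n_estimators=500, random_state=0),
    'ERT': ExtraTreesClassifier(n_estimators=500, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f'{name}: AUC = {auc:.2f}')
```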
Detection of phishing websites using an efficient feature-based machine learning framework
by Rao, Routhu Srinivasa; Pais, Alwyn Roshan
in Algorithms, Artificial Intelligence, Classification
2019
Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by masking a webpage as a trustworthy or legitimate page to retrieve personal information. Many anti-phishing solutions, such as blacklist- or whitelist-, heuristic-, and visual similarity-based methods, have been proposed to date, but online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model has been evaluated using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed the best, with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed the best out of all oblique Random Forests (oRFs), with an accuracy of 99.55%. We have also tested our model with and without third-party-based features to determine the effectiveness of third-party services in the classification of suspicious websites. We also compared our results with baseline models (CANTINA and CANTINA+). Our proposed technique outperformed these methods and also detected zero-day phishing attacks.
Journal Article
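As a rough approximation of the PCA-RF idea in the entry above, the sketch below projects a phishing feature table onto principal components before fitting a Random Forest. This is only a simple stand-in for the oblique forests evaluated in the paper, and the feature extraction from URLs and source code is out of scope; the CSV, columns, and label encoding are assumptions.

```python
# Sketch: PCA followed by a Random Forest on precomputed phishing features
# (file name, columns, and label are placeholders; not the paper's oblique RF).
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv('phishing_features.csv')             # placeholder feature table
X, y = df.drop(columns=['label']), df['label']         # label: 1 = phishing, 0 = legitimate
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

pca_rf = make_pipeline(PCA(n_components=0.95),          # keep components explaining 95% variance
                       RandomForestClassifier(n_estimators=300, random_state=0))
pca_rf.fit(X_train, y_train)
print('accuracy:', accuracy_score(y_test, pca_rf.predict(X_test)))
```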
Estimating above-ground biomass in sub-tropical buffer zone community Forests, Nepal, using Sentinel 2 data
by Tsuyuki, Satoshi; Pandit, Santa; Dube, Timothy
in above-ground biomass (AGB), Accuracy, Algorithms
2018
Accurate assessment of above-ground biomass (AGB) is important for the sustainable management of forests, especially buffer zones (areas within the protected area where restrictions are placed upon resource use and special measures are undertaken to intensify the conservation value of the protected area) with a high dependence on forest products. This study presents a new AGB estimation method and demonstrates the potential of medium-resolution Sentinel-2 Multi-Spectral Instrument (MSI) data as an alternative to hyperspectral data in inaccessible regions. Sentinel-2 performance was evaluated for a buffer zone community forest in Parsa National Park, Nepal, using field-based AGB as the dependent variable and spectral band values and spectrally derived vegetation indices as independent variables in the Random Forest (RF) algorithm. Ten-fold cross-validation was used to evaluate model effectiveness. The effect of the number of input variables on AGB prediction was also investigated. The model using all extracted spectral information plus all derived spectral vegetation indices provided better AGB estimates (R2 = 0.81 and RMSE = 25.57 t ha⁻¹). Incorporating the optimal subset of key variables did not improve model variance but reduced the error slightly. This result is explained by the technically advanced nature of Sentinel-2, which includes fine spatial resolution (10, 20 m) and strategically positioned bands (red-edge), combined with flat topography and an advanced machine learning algorithm. However, assessing its transferability to other forest types with varying altitude would enable future performance and interpretability assessments of Sentinel-2.
Journal Article
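The evaluation scheme described above is essentially 10-fold cross-validation of a Random Forest regressor on plot-level AGB with spectral bands and vegetation indices as predictors. A minimal sketch follows; the CSV file, its columns, and the number of trees are placeholders rather than the study's data or configuration.

```python
# Sketch: 10-fold cross-validated Random Forest regression of above-ground biomass
# (file name and column names are placeholders).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

df = pd.read_csv('agb_plots_sentinel2.csv')        # placeholder: bands, indices, field AGB
X, y = df.drop(columns=['agb_t_ha']), df['agb_t_ha']

rf = RandomForestRegressor(n_estimators=500, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(rf, X, y, cv=cv, scoring=('r2', 'neg_root_mean_squared_error'))

print('mean R2  :', scores['test_r2'].mean())
print('mean RMSE:', -scores['test_neg_root_mean_squared_error'].mean())
```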
Prediction of Risk Delay in Construction Projects Using a Hybrid Artificial Intelligence Model
by Yaseen, Zaher Mundher; Salih, Sinan Q.; Ali, Zainab Hasan
in Accuracy, Artificial intelligence, Civil engineering
2020
Project delays are a major problem for the construction sector owing to the complexity and uncertainty of construction activities. Artificial Intelligence (AI) models have demonstrated their capacity to solve dynamic, uncertain, and complex tasks. The aim of this study is to develop a hybrid artificial intelligence model, an integrative Random Forest classifier with Genetic Algorithm optimization (RF-GA), for delay prediction. First, related sources and factors of delay problems are identified. A questionnaire is used to quantify the impact of delay sources on project performance. The developed hybrid model is trained using collected data from previous construction projects. The proposed RF-GA is validated against the classical version of the RF model using statistical performance measure indices. The developed hybrid RF-GA model achieved good performance in terms of accuracy, kappa, and classification error, attaining 91.67%, 87%, and 8.33%, respectively. Overall, the proposed methodology is a robust and reliable technique for project delay prediction that contributes to construction project management monitoring and sustainability.
Journal Article
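The RF-GA hybrid above couples a Random Forest with a genetic algorithm that searches its hyperparameters. A toy sketch of that general pattern follows, using a very small GA over two hyperparameters with cross-validated accuracy as the fitness; the dataset, gene ranges, and GA settings are assumptions and do not reproduce the paper's method.

```python
# Toy sketch: genetic-algorithm search over Random Forest hyperparameters,
# scored by cross-validated accuracy (data file and GA settings are placeholders).
import random

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv('delay_survey.csv')                # placeholder questionnaire-derived data
X, y = df.drop(columns=['delay_class']), df['delay_class']

def fitness(genes):
    """Mean CV accuracy of an RF built from a (n_estimators, max_depth) gene pair."""
    rf = RandomForestClassifier(n_estimators=genes[0], max_depth=genes[1], random_state=0)
    return cross_val_score(rf, X, y, cv=5).mean()

def random_genes():
    return [random.randint(50, 500), random.randint(2, 20)]

population = [random_genes() for _ in range(10)]
for generation in range(15):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]                            # selection: keep the fittest individuals
    children = []
    while len(children) < len(population) - len(parents):
        a, b = random.sample(parents, 2)
        child = [a[0], b[1]]                        # crossover: mix genes from two parents
        if random.random() < 0.3:                   # mutation: resample one gene
            idx = random.randrange(2)
            child[idx] = random_genes()[idx]
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print('best genes (n_estimators, max_depth):', best)
```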
Prediction of meteorological drought and standardized precipitation index based on the random forest (RF), random tree (RT), and Gaussian process regression (GPR) models
by Singh, Sudhir Kumar; Vishwakarma, Dinesh Kumar; Elbeltagi, Ahmed
in Algorithms, Aquatic Pollution, Arid regions
2023
Agricultural, meteorological, and hydrological droughts are natural hazards that affect ecosystems in central India's Maharashtra state. Because of the limited historical data available for drought monitoring and forecasting in this region, implementing machine learning (ML) algorithms could allow for the prediction of future drought events. In this paper, we focus on the prediction accuracy of meteorological drought in the semi-arid region based on the standardized precipitation index (SPI), using the random forest (RF), random tree (RT), and Gaussian process regression (GPR-PUK kernel) models. Different combinations of machine learning models and variables were used for the forecasting of meteorological drought based on the SPI at 6- and 12-month scales. Models were developed using monthly rainfall data for the period 2000–2019 at two meteorological stations, Karanjali and Gangawdi, each representing a geographical region of the Upper Godavari river basin in central India's Maharashtra state, which frequently experiences droughts. Historical SPI data from 2000 to 2013 were used to train the machine learning models, and data from 2014 to 2019 were used for testing to forecast the SPI and meteorological drought. The mean square error (MSE), root mean square error (RMSE), adjusted R2, Mallows' Cp, Akaike's AIC, Schwarz's SBC, and Amemiya's PC were used to identify the best combination of inputs and the best subregression analysis for both stations at SPI-6 and SPI-12. The correlation coefficient (r), mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE) were used to evaluate the RF, RT, and GPR-PUK kernel models for SPI-6 and SPI-12 at both stations during the training and testing scenarios. The results during the testing phase revealed that RF was the best model for forecasting droughts, with values of r, MAE, RMSE, RAE (%), and RRSE (%) of 0.856, 0.551, 0.718, 74.778, and 54.019, respectively, for SPI-6, and 0.961, 0.361, 0.538, 34.926, and 28.262, respectively, for SPI-12 at Gangawdi station. The respective values at Karanjali station were 0.913 and 0.966, 0.541 and 0.386, 0.604 and 0.589, 52.592 and 36.959, and 42.315 and 31.394 for the PUK kernel and RT models during SPI-6 and SPI-12. Machine learning models are potential drought warning techniques because they take less time, require fewer inputs, and are less sophisticated than dynamic or scientific models.
Journal Article
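The forecasting setup above can be illustrated by predicting SPI from its own lagged values with a Random Forest and a Gaussian process regressor. In the sketch below, scikit-learn has no PUK kernel, so an RBF kernel is used as a stand-in; the SPI file, lag choice, and split ratio are assumptions, not the study's configuration.

```python
# Sketch: lag-based SPI forecasting with Random Forest and Gaussian process regression
# (file name, columns, lags, and split are placeholders; RBF kernel stands in for PUK).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import mean_absolute_error, mean_squared_error

spi = pd.read_csv('gangawdi_spi6.csv')['spi']        # placeholder monthly SPI series

# Build lagged features: predict SPI at month t from the previous three months.
lags = pd.concat({f'lag{k}': spi.shift(k) for k in (1, 2, 3)}, axis=1).dropna()
X, y = lags, spi.loc[lags.index]

split = int(len(X) * 0.7)                            # chronological train/test split
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

for name, model in {
    'RF': RandomForestRegressor(n_estimators=300, random_state=0),
    'GPR (RBF)': GaussianProcessRegressor(kernel=RBF(), normalize_y=True),
}.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    r = np.corrcoef(y_test, pred)[0, 1]
    print(f'{name}: r={r:.3f} MAE={mean_absolute_error(y_test, pred):.3f} '
          f'RMSE={np.sqrt(mean_squared_error(y_test, pred)):.3f}')
```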
Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia
by Pourghasemi, Hamid Reza; Youssef, Ahmed Mohamed; Pourtaghi, Zohre Sadat
in Agriculture, Carts, Civil Engineering
2016
The purpose of the current study is to produce landslide susceptibility maps using different data mining models. Four modeling techniques, namely random forest (RF), boosted regression tree (BRT), classification and regression tree (CART), and general linear model (GLM), are used, and their results are compared for landslide susceptibility mapping at the Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslide locations were identified and mapped from the interpretation of different data types, including high-resolution satellite images, topographic maps, historical records, and extensive field surveys. In total, 125 landslide locations were mapped using ArcGIS 10.2, and the locations were divided into two groups, training (70%) and validating (25%). Eleven layers of landslide-conditioning factors were prepared, including slope aspect, altitude, distance from faults, lithology, plan curvature, profile curvature, rainfall, distance from streams, distance from roads, slope angle, and land use. The relationships between the landslide-conditioning factors and the landslide inventory map were calculated using the four mentioned models (RF, BRT, CART, and GLM). The models' results were compared with landslide locations that were not used during the models' training. The receiver operating characteristic (ROC) curve, including the area under the curve (AUC), was used to assess the accuracy of the models. The success (training data) and prediction (validation data) rate curves were calculated. The results showed that the AUC values for the success rates are 0.783 (78.3%), 0.958 (95.8%), 0.816 (81.6%), and 0.821 (82.1%) for the RF, BRT, CART, and GLM models, respectively. The prediction rates are 0.812 (81.2%), 0.856 (85.6%), 0.862 (86.2%), and 0.769 (76.9%) for the RF, BRT, CART, and GLM models, respectively. Subsequently, the landslide susceptibility maps were divided into four classes: low, moderate, high, and very high susceptibility. The results revealed that the RF, BRT, CART, and GLM models produced reasonable accuracy in landslide susceptibility mapping. The resulting maps would be useful for planned development activities in the future, such as choosing new urban areas and infrastructural activities, as well as for environmental protection.
Journal Article
From Easy to Hopeless—Predicting the Difficulty of Phylogenetic Analyses
2022
Phylogenetic analyses under the Maximum-Likelihood (ML) model are time- and resource-intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies; on others, to multiple topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty of analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets.
Journal Article
Burned Area Detection Using Multi-Sensor SAR, Optical, and Thermal Data in Mediterranean Pine Forest
2022
Burned area (BA) mapping of a forest after a fire is required for its management and for determining the impacts on ecosystems. Because individual remote sensing sensors have limitations for accurate BA mapping, different sensors and their combinations have been used. This study analyzes the contribution of different features derived from optical, thermal, and Synthetic Aperture Radar (SAR) images to extracting BA information from a Turkish red pine (Pinus brutia Ten.) forest in a Mediterranean ecosystem. In addition to reflectance values of the optical images, Normalized Burn Ratio (NBR) and Land Surface Temperature (LST) data are produced from both Sentinel-2 and Landsat-8 data. The backscatter of C-band Sentinel-1 and L-band ALOS-2 SAR images and the coherence feature derived from the Interferometric SAR technique were also used. A pixel-based random forest classification method is applied to detect BA in 24 scenarios created using these features. The results show that the L-band data contributed more than the C-band data, and the combination of features created from Landsat LST, NBR, and the coherence of L-band ALOS-2 achieved the highest accuracy, with an overall accuracy of 96% and a Kappa coefficient of 92.62%.
Journal Article