Catalogue Search | MBRL
491 result(s) for "random forest (RF)"
Land Cover Classification using Google Earth Engine and Random Forest Classifier—The Role of Image Composition
by Lehnert, Lukas W.; Phan, Thanh Noi; Kuch, Verena
in Google Earth Engine (GEE), image composition, land cover classification
2020
Land cover information plays a vital role in many aspects of life, from scientific and economic to political. Accurate information about land cover affects the accuracy of all subsequent applications, therefore accurate and timely land cover information is in high demand. In land cover classification studies over the past decade, higher accuracies were produced when using time series satellite images than when using single date images. Recently, the availability of the Google Earth Engine (GEE), a cloud-based computing platform, has gained the attention of remote sensing based applications where temporal aggregation methods derived from time series images are widely applied (i.e., the use of metrics such as mean or median), instead of time series images. In GEE, many studies simply select as many images as possible to fill gaps without considering how different year/season images might affect the classification accuracy. This study aims to analyze the effect of different composition methods, as well as different input images, on the classification results. We use Landsat 8 surface reflectance (L8sr) data with eight different combination strategies to produce and evaluate land cover maps for a study area in Mongolia. We implemented the experiment on the GEE platform with a widely applied algorithm, the Random Forest (RF) classifier. Our results show that all eight datasets produced moderately to highly accurate land cover maps, with overall accuracy over 84.31%. Among the eight datasets, two time series datasets of summer scenes (images from 1 June to 30 September) produced the highest accuracy (89.80% and 89.70%), followed by the median composite of the same input images (88.74%). The difference between these three classifications was not significant based on the McNemar test (p > 0.05). However, a significant difference (p < 0.05) was observed for all other pairs involving one of these three datasets.
The results indicate that temporal aggregation (e.g., median) is a promising method, which not only significantly reduces data volume (resulting in an easier and faster analysis) but also produces accuracy as high as that of time series data. The spatial consistency among the classification results was relatively low compared to the general high accuracy, showing that the selection of the dataset used in any classification on GEE is a crucial step, because the input images for the composition play an essential role in land cover classification, particularly with snowy, cloudy and expansive areas like Mongolia.
Journal Article
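The McNemar comparison mentioned in the land cover abstract above can be illustrated with a minimal sketch. It assumes paired predictions from two classifiers on the same validation samples; the function name `mcnemar_test` and the toy labels are hypothetical, and the continuity-corrected chi-square form shown here is one common variant, not necessarily the one the authors used.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's chi-square test (continuity-corrected) for two classifiers.

    Only the discordant samples matter:
      b = samples classifier A got right and classifier B got wrong
      c = samples classifier A got wrong and classifier B got right
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The classifiers never disagree on correctness: no evidence of a difference.
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Toy data (hypothetical): 20 validation samples, all of class 1.
    y_true = [1] * 20
    pred_a = [1] * 18 + [0] * 2
    pred_b = [0] * 10 + [1] * 10
    stat, p = mcnemar_test(y_true, pred_a, pred_b)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A p-value above 0.05 would be read, as in the abstract, as no significant difference between the two classification results.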
Estimating above-ground biomass in sub-tropical buffer zone community Forests, Nepal, using Sentinel 2 data
by Tsuyuki, Satoshi; Pandit, Santa; Dube, Timothy
in above-ground biomass (AGB), Accuracy, Algorithms
2018
Accurate assessment of above-ground biomass (AGB) is important for the sustainable management of forests, especially in buffer zones (areas within a protected area where restrictions are placed on resource use and special measures are undertaken to intensify the conservation value of the protected area) with a high dependence on forest products. This study presents a new AGB estimation method and demonstrates the potential of medium-resolution Sentinel-2 Multi-Spectral Instrument (MSI) data application as an alternative to hyperspectral data in inaccessible regions. Sentinel-2 performance was evaluated for a buffer zone community forest in Parsa National Park, Nepal, using field-based AGB as a dependent variable, as well as spectral band values and spectral-derived vegetation indices as independent variables in the Random Forest (RF) algorithm. Ten-fold cross-validation was used to evaluate model effectiveness. The effect of the input variable number on AGB prediction was also investigated. The model using all extracted spectral information plus all derived spectral vegetation indices provided better AGB estimates (R² = 0.81 and RMSE = 25.57 t ha⁻¹). Incorporating the optimal subset of key variables did not improve model variance but reduced the error slightly. This result is explained by the technically advanced nature of Sentinel-2, which includes fine spatial resolution (10, 20 m) and strategically positioned bands (red-edge), together with the flat topography of the study area and the use of an advanced machine learning algorithm. However, assessing its transferability to other forest types with varying altitude would enable future performance and interpretability assessments of Sentinel-2.
Journal Article
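The evaluation quantities named in the biomass abstract above (10-fold cross-validation, R² and RMSE) can be sketched in plain Python. This is an illustrative sketch only: the helper names `k_fold_indices`, `r_squared` and `rmse` are hypothetical, the fold assignment is a simple interleaved scheme chosen for brevity, and the sample values are made up rather than taken from the study.

```python
import math


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) so each sample is held out exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th sample, starting at position `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up AGB values in t/ha, for illustration only.
    observed = [100.0, 150.0, 200.0, 250.0]
    predicted = [110.0, 140.0, 210.0, 240.0]
    print(r_squared(observed, predicted), rmse(observed, predicted))
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

In a cross-validated workflow, the predictions from all held-out folds would be pooled before computing R² and RMSE, so that every field plot contributes exactly one out-of-sample prediction.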
Prediction of Blood-Brain Barrier Penetration (BBBP) Based on Molecular Descriptors of the Free-Form and In-Blood-Form Datasets
by Fukuda, Motohisa; Sakiyama, Hiroshi; Okuno, Takashi
in Amines - chemistry, Amines - pharmacology, Biological Transport - drug effects
2021
The blood-brain barrier (BBB) controls the entry of chemicals from the blood to the brain. Since brain drugs need to penetrate the BBB, rapid and reliable prediction of BBB penetration (BBBP) is helpful for drug development. In this study, free-form and in-blood-form datasets were prepared by modifying the original BBBP dataset, and the effects of the data modification were investigated. For each dataset, molecular descriptors were generated and used for BBBP prediction by machine learning (ML). For ML, the dataset was split into training, validation, and test data by the scaffold split algorithm that MoleculeNet uses. This creates an unbalanced split and makes the prediction difficult; however, we decided to use that algorithm to evaluate the predictive performance for unknown compounds dissimilar to existing ones. The highest prediction score was obtained by the random forest model using 212 descriptors from the free-form dataset, and this score was higher than the existing best score using the same split algorithm without using any external database. Furthermore, using a deep neural network, a comparable result was obtained with only 11 descriptors from the free-form dataset, and the resulting descriptors suggested the importance of recognizing the glucose-like characteristics in BBBP prediction.
Journal Article
Legendre polynomial transformation and energy-weighted random forests for sequential data classification
by Alzubaidi, Samirah; Alharbi, Nada MohammedSaeed; Alghamdi, Fatimah M.
in 639/705/531, 639/705/794, Atmospheric sciences
2025
The accurate classification of sequential data encompassing time series, sensor streams, and temporal signals is critical for applications ranging from environmental monitoring to industrial fault detection. Traditional machine learning methods often struggle with temporal dependencies, noise, and non-stationary patterns, while deep learning approaches encounter computational bottlenecks and challenges related to interpretability when classifying sequence data. This paper introduces the Legendre Energy-Weighted Random Forest (LEW-RF), a novel framework that integrates Legendre polynomial transformations with Random Forest (RF) to address these limitations. By projecting sequential data onto a Legendre polynomial basis, LEW-RF extracts low-degree coefficients that encode discriminative temporal trends, such as cubic drifts and abrupt anomalies. Specifically, LEW-RF employs feature-wise energies to guide splits in RF. Theoretically, we demonstrate that Legendre energy is correlated with class separability, thereby enabling robustness to noise and irregular sampling. A comprehensive simulation study was performed to evaluate LEW-RF on synthetic sequential datasets with controlled polynomial patterns and noise structures. Results demonstrate that LEW-RF achieves 81.2% accuracy and 86.4% AUC, outperforming conventional RF by 5.3% in accuracy while operating 126 times faster than BiLSTM models. Empirical evaluation on a real-world benchmark eight-hour ozone dataset comprising 2,534 samples across 72 features with severe class imbalance (6.93% harmful ozone days) shows that LEW-RF achieves 97.0% accuracy, 99.6% recall, and 99.8% AUC after class balancing. It outperforms conventional RF by 1.4% in accuracy while operating 228 times faster than BiLSTM. In addition, the interpretable feature importance of LEW-RF aligns with atmospheric science principles, identifying critical temporal sensors (T13–T15) that drive photochemical pollution events.
Journal Article
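The projection step described in the LEW-RF abstract above (fitting low-degree Legendre coefficients to a sequence) can be sketched with NumPy. This is not the authors' implementation: the function names are hypothetical, the sequence is assumed to be evenly sampled and is mapped onto [-1, 1], and the "energy" shown here is simply the sum of squared coefficients, which is an assumption since the abstract does not give the exact definition. How the energies then guide the forest's splits is not reproduced.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_coefficients(series, degree=3):
    """Least-squares Legendre coefficients for one sequence or a batch of sequences.

    `series` has shape (n_steps,) or (n_sequences, n_steps). Time steps are
    assumed evenly spaced and are mapped onto the interval [-1, 1].
    """
    series = np.asarray(series, dtype=float)
    x = np.linspace(-1.0, 1.0, series.shape[-1])
    # legfit fits each column of a 2-D input, so sequences are passed as columns.
    coeffs = legendre.legfit(x, series.T, degree)
    return coeffs.T  # shape (degree + 1,) or (n_sequences, degree + 1)


def coefficient_energy(coeffs):
    """Sum of squared coefficients per sequence (an assumed energy definition)."""
    return np.sum(np.square(coeffs), axis=-1)


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 50)
    quadratic_trend = 0.5 * (3.0 * x**2 - 1.0)  # the degree-2 Legendre polynomial
    coeffs = legendre_coefficients(quadratic_trend, degree=3)
    print(np.round(coeffs, 6), coefficient_energy(coeffs))
```

The resulting coefficient vectors are short, fixed-length features, which is what makes them convenient inputs for a tree-based classifier regardless of the original sequence length.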
A New Application of Random Forest Algorithm to Estimate Coverage of Moss-Dominated Biological Soil Crusts in Semi-Arid Mu Us Sandy Land, China
2019
Biological soil crusts (BSCs) play an essential role in desert ecosystems. Knowledge of the distribution and disappearance of BSCs is vital for the management of ecosystems and for desertification research. However, the major remote sensing approaches used to extract BSCs are multispectral indices, which lack accuracy, and hyperspectral indices, which have lower data availability and require a higher computational effort. This study employs random forest (RF) models to optimize the extraction of BSCs using band combinations similar to the two multispectral BSC indices (Crust Index-CI; Biological Soil Crust Index-BSCI), but covering all possible band combinations. Simulated multispectral datasets resampled from in-situ hyperspectral data were used to extract BSC information. Multispectral datasets (Landsat-8 and Sentinel-2 datasets) were then used to detect BSC coverage in Mu Us Sandy Land, located in northern China, where BSCs dominated by moss are widely distributed. The results show that (i) the spectral curves of moss-dominated BSCs are different from those of other typical land surfaces, (ii) the BSC coverage can be predicted using the simulated multispectral data (mean square error (MSE) < 0.01), (iii) Sentinel-2 satellite datasets with CI-based band combinations provided a reliable RF model for detecting moss-dominated BSCs (10-fold validation, R² = 0.947; ground validation, R² = 0.906). In conclusion, application of the RF algorithm to the Sentinel-2 dataset can precisely and effectively map BSCs dominated by moss. This new application can be used as a theoretical basis for detecting BSCs in other arid and semi-arid lands within desert ecosystems.
Journal Article
The Relative Importance of Socioeconomic and Environmental Variables in Explaining Land Change in Bolivia, 2001-2010
by Redo, Daniel J.; Clark, Matthew L.; Aide, T. Mitchell
in Agricultural geography, Agricultural land, Agriculture
2012
This study assesses the relationship between trends in land change from 2001 to 2010 and socioeconomic and environmental variables in Bolivia at multiple spatial scales using a nonparametric, tree-based modeling approach. It also explores the theoretical dimensions surrounding the debate over the relative importance of socioeconomic and environmental variables in explaining land change. Results from the land change analysis show several hotspots of dynamic change. The majority of woody vegetation loss occurred in the eastern lowlands of Santa Cruz, Beni, and Pando and was attributable to the expansion of industrial agriculture. Gains in woody vegetation took place in the drylands of Santa Cruz and Beni savanna, and these changes were attributed to shifting patterns in precipitation and fire rather than human-induced change. Other hotspots of woody vegetation gain were attributed to abandonment of agriculture and herbaceous lands in the intermontane valleys of the southern Andes. Regression analyses showed that population and other demographic variables were poor predictors of land change. There is a clear relationship, however, between changes in woody and agriculture/herbaceous vegetation and environmental variables such as precipitation, temperature, and elevation. Municipalities with adequate precipitation and moderate temperature tended to show increases in agriculture and herbaceous vegetation and woody vegetation declines. Woody vegetation tended to increase in municipalities at higher elevations. This study also shows that explanations of only wealth or population as the main drivers of land change undervalue the role that natural features, like topography and precipitation, play in limiting or permitting certain land-use decisions.
Journal Article
Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery
by Thanh Noi, Phan; Kappas, Martin
in Classification, classification algorithms, k-Nearest Neighbor (kNN)
2017
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
Journal Article
Bearing Fault Diagnosis Method Based on Deep Convolutional Neural Network and Random Forest Ensemble Learning
by Xu, Gaowei; Shen, Weiming; Liu, Min
in bearing fault diagnosis, continuous wavelet transform (CWT), convolutional neural network (CNN)
2019
Recently, research on data-driven bearing fault diagnosis methods has attracted increasing attention due to the availability of massive condition monitoring data. However, most existing methods still have difficulties in learning representative features from the raw data. In addition, they assume that the feature distribution of training data in the source domain is the same as that of testing data in the target domain, which is invalid in many real-world bearing fault diagnosis problems. Since deep learning has the automatic feature extraction ability and ensemble learning can improve the accuracy and generalization performance of classifiers, this paper proposes a novel bearing fault diagnosis method based on deep convolutional neural network (CNN) and random forest (RF) ensemble learning. Firstly, time domain vibration signals are converted into two-dimensional (2D) gray-scale images containing abundant fault information by continuous wavelet transform (CWT). Secondly, a CNN model based on LeNet-5 is built to automatically extract multi-level features that are sensitive to the detection of faults from the images. Finally, the multi-level features containing both local and global information are utilized to diagnose bearing faults by the ensemble of multiple RF classifiers. In particular, low-level features containing local characteristics and accurate details in the hidden layers are combined to improve the diagnostic performance. The effectiveness of the proposed method is validated by two sets of bearing data collected from a Reliance Electric motor and a rolling mill, respectively. The experimental results indicate that the proposed method achieves high accuracy in bearing fault diagnosis under complex operational conditions and is superior to traditional methods and standard deep learning methods.
Journal Article
Classification of Maxillofacial Morphology by Artificial Intelligence Using Cephalometric Analysis Measurements
2023
The characteristics of maxillofacial morphology play a major role in orthodontic diagnosis and treatment planning. While Sassouni’s classification scheme outlines different categories of maxillofacial morphology, there is no standardized approach to assigning these classifications to patients. This study aimed to create an artificial intelligence (AI) model that uses cephalometric analysis measurements to accurately classify maxillofacial morphology, allowing for the standardization of maxillofacial morphology classification. This study used the initial cephalograms of 220 patients aged 18 years or older. Three orthodontists classified the maxillofacial morphologies of the 220 patients using eight measurements, and these classifications served as the reference standard. Using these eight cephalometric measurement points and the subject’s gender as input features, a random forest classifier from the Python scikit-learn package was trained and tested with a k-fold split of five to determine orthodontic classification; distinct models were created for horizontal-only, vertical-only, and combined maxillofacial morphology classification. The accuracy of the combined facial classification was 0.823 ± 0.060; for anteroposterior-only classification, the accuracy was 0.986 ± 0.011; and for the vertical-only classification, the accuracy was 0.850 ± 0.037. ANB angle had the greatest feature importance at 0.3519. The AI model created in this study accurately classified maxillofacial morphology, but it can be further improved with more learning data input.
Journal Article
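The workflow outlined in the maxillofacial abstract above (a scikit-learn random forest evaluated with a five-fold split, followed by feature importances) might look roughly like the sketch below. The data here are synthetic stand-ins generated with `make_classification`; the 220 samples and nine features only mirror the counts in the abstract, and the hyperparameters, the stratified shuffling and the random seeds are assumptions rather than the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 220 "patients", 9 features (8 measurements plus gender
# in the study), 3 hypothetical morphology classes.
X, y = make_classification(
    n_samples=220,
    n_features=9,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Five-fold cross-validation; stratification keeps class proportions similar per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Feature importances come from a model fitted on all samples.
clf.fit(X, y)
importances = clf.feature_importances_
for index, value in enumerate(importances):
    print(f"feature {index}: {value:.4f}")
```

Reporting the mean and standard deviation over the five folds gives figures in the same "value ± spread" form as the accuracies quoted in the abstract.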
Machine Learning-Based Gully Erosion Susceptibility Mapping: A Case Study of Eastern India
by Roy, Jagabandhu; Saha, Sunil; Blaschke, Thomas
in geographical information system (gis), gradient boosted regression tree (gbrt), naïve bayes tree (nbt)
2020
Gully erosion is a form of natural disaster and one of the land loss mechanisms causing severe problems worldwide. This study aims to delineate the areas with the most severe gully erosion susceptibility (GES) using the machine learning techniques Random Forest (RF), Gradient Boosted Regression Tree (GBRT), Naïve Bayes Tree (NBT), and Tree Ensemble (TE). The gully inventory map (GIM) consists of 120 gullies. Of the 120 gullies, 84 gullies (70%) were used for training and 36 gullies (30%) were used to validate the models. Fourteen gully conditioning factors (GCFs) were used for GES modeling and the relationships between the GCFs and gully erosion were assessed using the weight-of-evidence (WofE) model. The GES maps were prepared using RF, GBRT, NBT, and TE and were validated using the area under the receiver operating characteristic (AUROC) curve, the seed cell area index (SCAI) and five statistical measures including precision (PPV), false discovery rate (FDR), accuracy, mean absolute error (MAE), and root mean squared error (RMSE). Nearly 7% of the basin has high to very high susceptibility for gully erosion. Validation results proved the excellent ability of these models to predict the GES. Of the analyzed models, the RF (AUROC = 0.96, PPV = 1.00, FDR = 0.00, accuracy = 0.87, MAE = 0.11, RMSE = 0.19 for validation dataset) is accurate enough for modeling and better suited for GES modeling than the other models. Therefore, the RF model can be used to model the GES areas not only in this river basin but also in other areas with the same geo-environmental conditions.
Journal Article
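The validation measures listed in the gully erosion abstract above (AUROC, PPV, FDR, accuracy, MAE, RMSE) can be computed from binary labels and predicted susceptibility scores as in the following sketch. The function names, the 0.5 decision threshold and the toy values are hypothetical, and computing MAE and RMSE against the predicted scores (rather than the thresholded classes) is an assumption about how such errors are typically taken.

```python
import math


def validation_measures(y_true, scores, threshold=0.5):
    """PPV, FDR, accuracy, MAE and RMSE for binary labels (0/1) and scores in [0, 1]."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    predicted_positive = tp + fp
    n = len(y_true)
    return {
        "ppv": tp / predicted_positive if predicted_positive else float("nan"),
        "fdr": fp / predicted_positive if predicted_positive else float("nan"),
        "accuracy": (tp + tn) / n,
        "mae": sum(abs(y - s) for y, s in zip(y_true, scores)) / n,
        "rmse": math.sqrt(sum((y - s) ** 2 for y, s in zip(y_true, scores)) / n),
    }


def auroc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy validation set (hypothetical): 1 = gully present, 0 = gully absent.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(validation_measures(y_true, scores))
    print(auroc(y_true, scores))
```

Note that PPV and FDR always sum to one, which is why the abstract's PPV of 1.00 is paired with an FDR of 0.00.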