Catalogue Search | MBRL
26 result(s) for "random forest parameterization"
RaFSIP: Parameterizing Ice Multiplication in Models Using a Machine Learning Approach
2024
Accurately representing mixed‐phase clouds (MPCs) in global climate models (GCMs) is critical for capturing climate sensitivity and Arctic amplification. Secondary ice production (SIP) can significantly increase ice crystal number concentration (ICNC) in MPCs, affecting cloud properties and processes. Here, we introduce a machine‐learning (ML) approach, called Random Forest SIP (RaFSIP), to parameterize SIP in stratiform MPCs. RaFSIP is trained on 16 grid points with 10‐km horizontal spacing derived from a 2‐year simulation with the Weather Research and Forecasting (WRF) model, including explicit SIP microphysics. Designed for a temperature range of 0 to −25°C, RaFSIP simplifies the description of rime splintering, ice‐ice collisional break‐up, and droplet‐shattering using only a limited set of inputs. RaFSIP was evaluated offline before being integrated into WRF, where it showed stable online performance in a 1‐year simulation with the same model setup as during training. Even when coupled with the 50‐km grid spacing domain of WRF, RaFSIP reproduces ICNC predictions within a factor of 3 of simulations with explicit SIP microphysics. The coupled WRF‐RaFSIP scheme replicates regions of enhanced SIP and accurately maps ICNCs and liquid water content, particularly at temperatures above −10°C. Uncertainties in RaFSIP minimally impact surface cloud radiative forcing in the Arctic, resulting in radiative biases under 3 W m−2 compared to simulations with detailed microphysics. Although the performance of RaFSIP in convective clouds remains untested, its adaptable nature allows for data set augmentation to address this aspect. This framework opens possibilities for GCM simplification and process description through physics‐guided ML algorithms.
Plain Language Summary
Being able to correctly simulate the amount of ice and liquid in clouds is essential for accurate predictions of the cloud radiative forcing in the climatologically sensitive polar regions.
A number of collisional processes between ice and liquid particles in clouds, known as secondary ice production, can significantly enhance the ice crystal number concentrations contained in them. This enhancement is often accompanied by a decrease in the cloud liquid water content, making clouds less opaque to incoming solar radiation, which, in turn, can cause a cloud‐induced warming at the surface. Currently, most global climate models lack a description of the most important secondary ice production processes, which can bias the radiative impact of clouds at the surface. To address this, we propose using a machine learning algorithm trained on high‐resolution model outputs to include the effect of ice multiplication in large‐scale climate models. The machine learning framework effectively captures the physical processes underlying secondary ice production in stratiform clouds using only a few inputs readily available in model frameworks. This approach has the potential to improve model predictions, bringing them closer to the observed cloud phase partitioning.
Key Points
A random‐forest parameterization for secondary ice production is developed using outputs from a simulation with 10‐km horizontal grid spacing.
Cloud phase partitioning agrees within a factor of 3, with radiative biases below 3 W m−2, compared to the detailed microphysics simulation.
The scheme can be adjusted to coarser resolutions typical of climate models without losing computational efficiency and numerical stability.
Journal Article
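The RaFSIP abstract above summarizes agreement with the explicit-microphysics runs as "within a factor of 3". One plausible way to score that is sketched below in Python under our own assumptions: a symmetric ratio test applied per sample, with the hypothetical function name `fraction_within_factor`, which is not taken from the paper. It returns the share of predicted ICNC values whose ratio to the reference lies between 1/3 and 3.

```python
def fraction_within_factor(predicted, reference, factor=3.0):
    """Share of pairs whose ratio predicted/reference lies in [1/factor, factor].

    Both inputs must be strictly positive (e.g. ice crystal number
    concentrations); non-positive values are rejected.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for p, r in zip(predicted, reference):
        if p <= 0 or r <= 0:
            raise ValueError("concentrations must be positive")
        ratio = p / r
        # symmetric test: over- and under-prediction by the same factor count alike
        if 1.0 / factor <= ratio <= factor:
            hits += 1
    return hits / len(predicted)


print(fraction_within_factor([1.0, 2.0, 10.0, 0.2], [1.0, 1.0, 1.0, 1.0]))  # → 0.5
```

Using a ratio rather than an absolute difference keeps the score meaningful when concentrations span several orders of magnitude.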
Landslide susceptibility assessment in complex geological settings: sensitivity to geological information and insights on its parameterization
2020
The literature on landslide susceptibility mapping is rich in works focusing on improving or comparing the algorithms used for the modeling, but to our knowledge, a sensitivity analysis on the use of geological information has never been performed, and no standard method for incorporating geological maps into susceptibility assessments has been established. This point is crucial, especially when working on wide and complex areas, in which a detailed geological map needs to be reclassified according to more general criteria. In a study area in Italy, we tested different configurations of a random forest–based landslide susceptibility model, accounting for geological information with the use of lithologic, chronologic, structural, paleogeographic, and genetic units. Different susceptibility maps were obtained, and a validation procedure based on AUC (area under the receiver operating characteristic curve) and OOBE (out-of-bag error) allowed us to draw some conclusions that could help future landslide susceptibility assessments. Different parameters can be derived from a detailed geological map by aggregating the mapped elements into broader units, and the results of the susceptibility assessment are very sensitive to these geology-derived parameters; thus, it is of paramount importance to properly understand the nature and meaning of the information provided by geology-related maps before using them in susceptibility assessment. Among the model configurations using only one parameter, the best results were obtained with the genetic approach, while lithology, which is commonly used in the current literature, ranked only second. However, in our case study, the best prediction was obtained when all the geological parameters were used together.
Geological maps provide very complex and multifaceted information; in wide and complex areas, this information cannot be represented by a single parameter: several geology-based parameters can perform better than one, because each of them can account for specific features connected to landslide predisposition.
Journal Article
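The landslide study above validates its models with AUC. As a minimal sketch (not the authors' code; `roc_auc` is a hypothetical name), AUC can be computed as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen stable cell, with ties counted as one half:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    scores: predicted susceptibility, higher means more landslide-prone
    labels: 1 for landslide cells, 0 for stable cells
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # landslide cell correctly ranked above stable cell
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # → 0.75
```

The pairwise loop is quadratic in the number of cells; for large maps, a rank-based computation after a single sort yields the same value.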
The parameter sensitivity of random forests
by Boutros, Paul C.; Huang, Barbara F.F.
in Algorithms; Bioinformatics; Biomedical and Life Sciences
2016
Background
The Random Forest (RF) algorithm for supervised machine learning is an ensemble learning method widely used in science and many other fields. Its popularity has been increasing, but relatively few studies address the parameter selection process, a critical step in model fitting. Owing to numerous assertions about the reliable performance of the default parameters, many RF models are fit using these values. However, there has not yet been a thorough examination of the parameter sensitivity of RFs in computational genomic studies. We address this gap here.
Results
We examined the effects of parameter selection on classification performance using the RF machine learning algorithm on two biological datasets with distinct p/n ratios: sequencing summary statistics (low p/n) and microarray-derived data (high p/n). Here, p refers to the number of variables and n to the number of samples. Our findings demonstrate that parameterization is highly correlated with prediction accuracy and variable importance measures (VIMs). Further, we demonstrate that different parameters are critical in tuning different datasets, and that parameter optimization significantly improves on the default parameters.
Conclusions
Parameter performance demonstrated wide variability on both low and high p/n data. Therefore, there is significant benefit to be gained by tuning RF models away from their default parameter settings.
Journal Article
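The parameter-sensitivity study above argues for tuning RFs away from their defaults. The sketch below illustrates such a parameter sweep on synthetic data with a deliberately tiny forest of depth-1 trees (decision stumps) rather than full trees. `n_trees` and `mtry` stand in for the usual number-of-trees and features-per-split parameters; all function names and the data are hypothetical. It reports held-out accuracy for each parameter combination.

```python
import random
from itertools import product


def fit_stump(X, y, features):
    """Best single-split classifier (0/1 labels) over the given feature subset."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            # majority label on each side; ties go to class 1
            ll = int(2 * sum(left) >= len(left))
            rl = int(2 * sum(right) >= len(right)) if right else ll
            errors = sum(lab != ll for lab in left) + sum(lab != rl for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, ll, rl)
    _, f, t, ll, rl = best
    return lambda row: ll if row[f] <= t else rl


def fit_forest(X, y, n_trees, mtry, rng):
    """Bagged stumps: bootstrap the rows, then draw mtry candidate features.

    For depth-1 trees, drawing features once per tree is equivalent to the
    per-split feature sampling of a full random forest.
    """
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(p), mtry)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    # majority vote across trees
    return lambda row: int(2 * sum(tree(row) for tree in trees) >= len(trees))


def grid_search(X_tr, y_tr, X_te, y_te, n_trees_grid, mtry_grid, seed=0):
    """Held-out accuracy for every (n_trees, mtry) combination."""
    results = {}
    for n_trees, mtry in product(n_trees_grid, mtry_grid):
        model = fit_forest(X_tr, y_tr, n_trees, mtry, random.Random(seed))
        hits = sum(model(row) == lab for row, lab in zip(X_te, y_te))
        results[(n_trees, mtry)] = hits / len(y_te)
    return results


# Synthetic data: 5 features, only feature 0 carries the class signal.
rng = random.Random(42)
X = [[rng.random() for _ in range(5)] for _ in range(100)]
y = [int(row[0] > 0.5) for row in X]
scores = grid_search(X[:75], y[:75], X[75:], y[75:], [1, 5, 25], [1, 3, 5])
for params, acc in sorted(scores.items()):
    print(params, round(acc, 2))
```

Here only feature 0 is informative, so a small `mtry` often hands a stump nothing but noise features. This is the kind of dataset-dependent sensitivity the abstract describes; a real study would use full trees, cross-validation, and an established library.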
Accuracy assessment of inverse distance weighting interpolation of groundwater nitrate concentrations in Bavaria (Germany)
by Ohlert, Paul L.; Breuer, Lutz; Bach, Martin
in Accuracy; Agricultural land; Aquatic Pollution
2023
For the designation of nitrate vulnerable zones under the EU Nitrate Directive, some German federal states use inverse distance weighting (IDW) as the interpolation method. Our study quantifies the accuracy of IDW with respect to the designation of areas with a groundwater nitrate concentration above the threshold of 50 mg NO₃/l using a dataset of 5790 groundwater monitoring sites in Bavaria. The results show that the absolute differences of nitrate concentrations between the monitoring sites are only weakly correlated within a range of no more than 0.4 km. The IDW cross-validated nitrate concentration of measurement sites shows a mean absolute error of 7.0 mg NO₃/l, and the number of measurement sites above 50 mg NO₃/l is 44% too low when all sites are interpolated as a whole. The corresponding values for interpolation separately for the 18 hydrogeological regions in Bavaria are 7.1 mg NO₃/l and 38%. The sensitivity and accuracy of nitrate concentration maps to the variation of IDW parameters and the position of sampling points are analysed by Monte Carlo IDW interpolations using a Random Forest modelled map as the reference spatial distribution. Compared to this reference map, the area with a concentration above 50 mg NO₃/l in groundwater is estimated by IDW to be 46% too low for the best IDW parametrization. Overall, IDW interpolation systematically underestimates the occurrence of higher-range nitrate concentrations. In view of these underestimations, IDW does not appear to be a suitable regionalization method for the designation of nitrate vulnerable zones, whether applied to a federal state as a whole or separately to hydrogeological regions.
Journal Article
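The Bavarian study above evaluates IDW by cross-validation. Below is a minimal Python sketch of IDW with a power exponent, plus a leave-one-out mean absolute error. The site coordinates and values are hypothetical. The power exponent is only one of the "IDW parameters" such a study might vary; the abstract does not list which ones were used.

```python
import math


def idw(x, y, points, power=2.0):
    """Inverse distance weighting estimate at (x, y).

    points: list of (px, py, value) tuples. A query that coincides with a
    sample location returns that sample's value exactly.
    """
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * value
        den += w
    return num / den


def loo_mae(points, power=2.0):
    """Leave-one-out cross-validated mean absolute error of IDW."""
    errors = []
    for i, (px, py, value) in enumerate(points):
        others = points[:i] + points[i + 1:]  # hold out site i
        errors.append(abs(idw(px, py, others, power) - value))
    return sum(errors) / len(errors)


# Hypothetical monitoring sites: (x km, y km, nitrate mg/l)
sites = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0)]
print(round(idw(0.5, 0.0, sites), 2))  # → 16.36
print(round(loo_mae(sites), 2))
```

Each IDW estimate is a weighted average of observed values, so it can never exceed the largest sample value. This is consistent with the smoothing and underestimation of high concentrations reported above.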
Estimates of grassland biomass and turnover time on the Tibetan Plateau
2018
The grassland of the Tibetan Plateau forms a globally significant biome, which represents 6% of the world's grasslands and 44% of China's grasslands. However, large uncertainties remain concerning the vegetation carbon storage and turnover time in this biome. In this study, we quantified the pool size of both the aboveground and belowground biomass and the turnover time of belowground biomass across the Tibetan Plateau by combining systematic measurements taken from a substantial number of surveys (i.e. 1689 sites for aboveground biomass, 174 sites for belowground biomass) with a machine learning technique (i.e. random forest, RF). Our study demonstrated that the RF model is an effective tool for upscaling local biomass observations to the regional scale, and for producing continuous biomass estimates of the Tibetan Plateau. On average, the models estimated 46.57 Tg C (1 Tg = 10¹² g) of aboveground biomass and 363.71 Tg C of belowground biomass in the Tibetan grasslands covering an area of 1.32 × 10⁶ km². The turnover time of belowground biomass demonstrated large spatial heterogeneity, with a median turnover time of 4.25 years. Our results also demonstrated large differences in the biomass simulations among the major ecosystem models used for the Tibetan Plateau, largely because of inadequate model parameterization and validation. This study provides a spatially continuous measure of vegetation carbon storage and turnover time, and provides useful information for advancing ecosystem models and improving their performance.
Journal Article
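As a worked check on the magnitudes in the Tibetan Plateau abstract, the carbon pools can be converted to mean areal densities using only the numbers it reports, together with 1 Tg = 10¹² g and 1 km² = 10⁶ m². These are simple area averages computed here for illustration, not values stated by the authors; the helper name is hypothetical.

```python
TG_TO_G = 1e12   # 1 Tg = 10^12 g
KM2_TO_M2 = 1e6  # 1 km^2 = 10^6 m^2


def areal_density(pool_tg, area_km2):
    """Convert a carbon pool in Tg C spread over area_km2 into g C per m^2."""
    return pool_tg * TG_TO_G / (area_km2 * KM2_TO_M2)


area_km2 = 1.32e6                      # grassland area from the abstract
agb = areal_density(46.57, area_km2)   # aboveground biomass, g C m^-2
bgb = areal_density(363.71, area_km2)  # belowground biomass, g C m^-2
print(round(agb, 1), round(bgb, 1), round(bgb / agb, 2))
```

This gives roughly 35 g C m⁻² aboveground and 276 g C m⁻² belowground, a belowground-to-aboveground ratio of about 7.8.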
Regression Forest Approaches to Gravity Wave Parameterization for Climate Projection
by Connelly, David S.; Gerber, Edwin P.
in Atmospheric gravity waves; Atmospheric models; boosted forest
2024
We train random and boosted forests, two machine learning architectures based on regression trees, to emulate a physics‐based parameterization of atmospheric gravity wave momentum transport. We compare the forests to a neural network benchmark, evaluating both offline errors and online performance when coupled to an atmospheric model under the present‐day climate and in 800 and 1,200 ppm CO2 global warming scenarios. Offline, the boosted forest exhibits similar skill to the neural network, while the random forest scores significantly lower. Both forest models couple stably to the atmospheric model, and control climate integrations with the boosted forest exhibit lower biases than those with the neural network. Integrations with all three data‐driven emulators successfully capture the Quasi‐Biennial Oscillation (QBO) and sudden stratospheric warmings, key modes of stratospheric variability, with the boosted forest more accurate than the random forest in replicating their statistics across our range of carbon dioxide perturbations. The boosted forest and neural network capture the sign of the QBO period response to increased CO2, though both struggle with the magnitude of this response under the more extreme 1,200 ppm scenario. To investigate the connection between performance in the control climate and the ability to generalize, we use techniques from interpretable machine learning to understand how the data‐driven methods use physical information. We leverage this understanding to develop a retraining procedure that improves the coupled performance of the boosted forest in the control climate and under the 800 ppm CO2 scenario.
Plain Language Summary
Parameterizations are reduced‐complexity models that estimate the effects of physical processes smaller than what can be resolved by the grid of a weather or climate model. While necessary for realistic simulations, they are a source of uncertainty in climate projections.
Recently, machine learning has been used to augment or replace conventional parameterizations of atmospheric gravity waves, a type of motion by which disturbances near the Earth's surface can affect the wind higher up. We compare several machine learning approaches to the gravity wave parameterization problem. In particular, we test neural networks against random and boosted forests, which are built around flowchart‐like models called regression trees. We find that boosted forests, though not widely used for climate model parameterization, are especially successful, scoring as well as or better than neural networks on various performance metrics. We then provide proof‐of‐concept of a novel method to retrain the boosted forest so that it uses its input data more in line with the physics of the system, and show that this technique improves the forest's behavior when used together with an atmospheric model.
Key Points
Two kinds of regression forest emulate a gravity wave parameterization offline and online, with boosted forests outperforming random forests.
Relative to a neural network benchmark, the boosted forest exhibits similar online skill and ability to generalize to new climates.
Feature importance analysis informs a retraining procedure to improve online behavior of data‐driven parameterizations.
Journal Article
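The gravity-wave study above contrasts random and boosted forests. The sketch below shows the structural difference on a 1-D toy target using regression stumps. Bagging averages stumps fit independently to bootstrap resamples, while gradient boosting with squared loss fits each new stump to the residuals left by the previous ones. It is not the emulator from the paper; the sine target, function names, and settings are assumptions, and errors are in-sample only.

```python
import math
import random


def fit_stump(xs, ys):
    """Least-squares regression stump on a 1-D input."""
    best = None  # (sse, threshold, left_mean, right_mean)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right) if right else lm
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_forest(xs, ys, n_trees=50, seed=0):
    """Random-forest-style bagging: average stumps fit to bootstrap resamples.

    With a single input there is no feature subsampling to do.
    """
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


def boosted_forest(xs, ys, n_trees=50, learning_rate=0.3):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    trees = []
    residuals = [y - base for y in ys]
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        residuals = [r - learning_rate * tree(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy 1-D target; errors are in-sample only.
xs = [0.1 * i for i in range(60)]
ys = [math.sin(x) for x in xs]
print("bagged MSE: ", round(mse(bagged_forest(xs, ys), xs, ys), 4))
print("boosted MSE:", round(mse(boosted_forest(xs, ys), xs, ys), 4))
```

Stumps grown on resamples of the same data all split near the same point, so their average stays close to a single smoothed step. Boosting instead adds a different split at each stage and fits this target much more closely. The example illustrates the mechanism only and is not evidence for the paper's results.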
Predictor Importance for Hydrological Fluxes of Global Hydrological and Land Surface Models
by Athanasiadis, Ioannis; Brêda, João Paulo L. F.; Dijk, Albert
in Climate; Climate models; Climate prediction
2024
Global Hydrological and Land Surface Models (GHM/LSMs) embody numerous interacting predictors and equations, complicating the understanding of primary hydrological relationships. We propose a model diagnostic approach based on Random Forest (RF) feature importance to detect the input variables that most influence simulated hydrological fluxes. We analyzed the JULES, ORCHIDEE, HTESSEL, SURFEX, and PCR‐GLOBWB models for the relative importance of precipitation, climate, soil, land cover and topographic slope as predictors of simulated average evaporation, runoff, and surface and subsurface runoff. RF models functioned as metamodels and could reproduce GHM/LSM outputs with a coefficient of determination (R²) over 0.85 in all cases and often considerably better. The GHM/LSMs agreed that precipitation, climate and land cover share equal importance for evaporation prediction, and that mean precipitation is the most important predictor of runoff, while topographic slope and soil texture have no influence on the total variance of the water balance. However, the GHM/LSMs disagreed on which features determine surface and subsurface runoff processes, especially with regard to the relative importance of soil texture and topographic slope. Finally, the choice of soil map mattered only for target variables for which soil is a relevant predictor. We conclude that estimating feature importance is a useful diagnostic approach for model intercomparison projects.
Plain Language Summary
Simulations of hydrological fluxes such as evaporation and runoff at a global scale are uncertain, because the models that produce global simulations differ in structure, parametrization and meteorological data. Several model intercomparison projects (MIPs) have therefore tried to identify where the hydrological flux estimates are most discrepant. To make MIPs even more useful, we propose an additional method focusing on understanding why the models disagree.
This method consists of replacing the original global model with a random forest model and then identifying which input variables are most relevant using the feature importance functionality. More specifically, we detected how important meteorological variables, soil properties, land cover and topography are for each global model. We observed that the models agree that precipitation, climate and land cover are equally important for evaporation and that precipitation is the most important feature for estimating runoff. When partitioning runoff into quick and slow flow, we observed that the models disagree on the importance of features, especially topographic slope and soil.
Key Points
Detecting predictor importance can be an additional approach for Model Intercomparison Projects.
Global models agree about feature importance for water balance components but disagree for surface and subsurface runoff.
Selecting the soil database only matters when soil is a relevant predictor, which is not the case for all models and target variables.
Journal Article
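The hydrology study above ranks predictors with RF feature importance. Permutation importance is one common variant and works for any fitted model; a sketch follows. The abstract does not say which importance measure the authors used, and the metamodel, column meanings, and data here are hypothetical.

```python
import random


def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when one input column is shuffled.

    model: any callable mapping a feature row (list) to a prediction,
    e.g. a fitted random-forest metamodel of a hydrological model.
    """
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical metamodel: runoff responds to precipitation (column 0)
# and ignores slope (column 1).
rng = random.Random(1)
X = [[rng.uniform(0, 5), rng.uniform(0, 30)] for _ in range(200)]
y = [0.6 * precip for precip, _ in X]
metamodel = lambda row: 0.6 * row[0]
print(permutation_importance(metamodel, X, y))
```

A predictor the model ignores receives exactly zero importance, because shuffling it leaves every prediction unchanged.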
Fast Segmentation and Classification of Very High Resolution Remote Sensing Data Using SLIC Superpixels
2017
Speed and accuracy are important factors when dealing with time-constrained events for disaster, risk, and crisis-management support. Object-based image analysis can be a time-consuming task when extracting information from large images because most segmentation algorithms use the pixel grid for the initial object representation. It would be more natural and efficient to work with perceptually meaningful entities that are derived from pixels using a low-level grouping process (superpixels). Firstly, we tested a new workflow for image segmentation of remote sensing data, starting the multiresolution segmentation (MRS, using the ESP2 tool) from the superpixel level and aiming at reducing the amount of time needed to automatically partition relatively large datasets of very high resolution remote sensing data. Secondly, we examined whether a Random Forest classification based on an oversegmentation produced by a Simple Linear Iterative Clustering (SLIC) superpixel algorithm performs similarly to a traditional object-based classification in terms of accuracy. Tests were applied on QuickBird and WorldView-2 data with different extents, scene content complexities, and numbers of bands to assess how the computational time and classification accuracy are affected by these factors. The proposed segmentation approach is compared with the traditional one, which starts the MRS from the pixel level, in terms of the geometric accuracy of the objects and the computational time. The computational time was reduced in all cases, the biggest improvement being from 5 h 35 min to 13 min, for a WorldView-2 scene with eight bands and an extent of 12.2 million pixels, while the geometric accuracy was kept similar or slightly better. SLIC superpixel-based classification had similar or better overall accuracy values when compared to MRS-based classification, but the results were obtained faster and without having to parameterize the MRS.
These two approaches have the potential to enhance the automation of big remote sensing data analysis and processing, especially when time is an important constraint.
Journal Article
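The SLIC algorithm mentioned above assigns each pixel to the nearest cluster centre under a combined distance D = sqrt(d_c² + (d_s/S)² m²). Here d_c is the colour (spectral) distance, d_s the spatial distance, S the initial grid interval, and m a compactness weight. The sketch below implements that distance and the assignment step. Standard SLIC uses CIELAB colour, so generalizing d_c to any number of bands is our assumption, and the function names are hypothetical. Real SLIC also restricts the search to centres within a 2S × 2S window and iterates centre updates; both are omitted here.

```python
import math


def slic_distance(bands_p, xy_p, bands_c, xy_c, S, m):
    """Combined SLIC distance between a pixel and a cluster centre.

    bands_*: spectral values (any number of bands; an assumption, since
    standard SLIC uses CIELAB colour); xy_*: pixel coordinates;
    S: initial grid interval; m: compactness weight.
    """
    d_c = math.dist(bands_p, bands_c)  # spectral distance
    d_s = math.dist(xy_p, xy_c)        # spatial distance
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)


def assign(bands_p, xy_p, centers, S, m):
    """Index of the nearest centre; centers is a list of (bands, xy) pairs."""
    return min(range(len(centers)),
               key=lambda k: slic_distance(bands_p, xy_p, *centers[k], S, m))


centers = [([0.0, 0.0, 0.0], (0.0, 0.0)), ([100.0, 0.0, 0.0], (1.0, 0.0))]
pixel_bands, pixel_xy = [90.0, 0.0, 0.0], (0.0, 0.0)
print(assign(pixel_bands, pixel_xy, centers, S=10, m=1))     # → 1 (spectral term dominates)
print(assign(pixel_bands, pixel_xy, centers, S=10, m=1000))  # → 0 (spatial term dominates)
```

Raising m makes superpixels more compact and grid-like; lowering it lets them follow spectral boundaries more closely.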
From between-stand to within-tree variation: wood and timber quality of Norway spruce (Picea abies H. Karst) analyzed at scale using laser scanning and industrial data
by Holopainen, Markus; Saikkonen, Otto; Pehkonen, Mika
in Adaptiveness; Agricultural and Veterinary Sciences; Agriculture, Forestry and Fisheries
2026
Key message
Using laser scanning and industrial data, we found that over 70% of wood quality variability occurred within Norway spruce (Picea abies H. Karst) trees. The most important wood quality predictors were stem size, crown vigor, and growth rate inferred from laser scans. Random Forest models based on the laser-scanned features captured 25% of the industrially measured wood quality variability, with an average RMSE of 39.9%. The low crown plasticity of Norway spruce introduced biological constraints to laser scanning-based wood quality modeling.
Context
Wood quality models that predict wood and timber properties in addition to size and growth variables are essential for increasing the precision of forest management and forest use, yet they remain notoriously untransferable. Laser scanning offers a powerful tool for their parameterization, but its ability to capture the within-tree variability of wood quality is still poorly understood in many species.
Aims
Our aim was to test whether multi-viewpoint laser scanning can capture within-tree gradients of wood quality in Norway spruce (Picea abies H. Karst.) trees, thereby enabling more robust and transferable models.
Methods
We analyzed 479 mature Norway spruce trees, combining handheld and airborne laser scanning with industrial wood quality data. We modeled 18 industrially relevant variables related to log geometry, heartwood, knottiness, and timber strength IP value against laser-scanned features at stand, tree, and log levels.
Results
Most wood quality variability (73%) occurred within trees. Log-level laser features explained 25% of the variation across stands and log types in the test data, with average RMSEs of 39.9%. The most stable predictions were obtained for heartwood ring width, heartwood density, and knot percentage.
Conclusion
Overall, external crown and stem attributes captured key growth responses but failed to robustly represent most wood quality factors in Norway spruce. These results underscore biological constraints in laser scanning-based wood quality modeling depending on the species-specific adaptiveness of the crown structure to the environment.
Journal Article
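The wood-quality abstract above reports that 73% of variability lies within trees and an average RMSE of 39.9%, but does not say how these were computed. The sketch below uses two common conventions instead: a sum-of-squares split of total variation into within-group and between-group parts, and RMSE normalized by the observed mean. Either may differ from the authors' methods (for example, mixed-model variance components). The function names are hypothetical.

```python
def within_group_share(groups):
    """Fraction of the total sum of squares lying within groups (e.g. logs within trees)."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    within_ss = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        within_ss += sum((v - mean_g) ** 2 for v in g)  # spread around each group's mean
    return within_ss / total_ss


def relative_rmse(observed, predicted):
    """RMSE expressed as a percentage of the mean observed value."""
    n = len(observed)
    rmse = (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5
    return 100.0 * rmse / (sum(observed) / n)


print(within_group_share([[1.0, 3.0], [5.0, 7.0]]))          # → 0.2
print(round(relative_rmse([2.0, 4.0], [3.0, 3.0]), 1))        # → 33.3
```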
Global patterns in vegetation accessible subsurface water storage emerge from spatially varying importance of individual drivers
by Viering, Tom; Alessandri, Andrea; Hrachowitz, Markus
in Accessibility; Atmosphere; Catchments
2024
Vegetation roots play an essential role in regulating the hydrological cycle by removing water from the subsurface and releasing it to the atmosphere. However, the present understanding of the drivers of ecosystem-scale root development and their spatial variability globally is limited. This study investigates the varying roles of climate, landscape, and vegetation in determining the magnitude of root zone storage capacity (Sr) worldwide, which is defined as the maximum volume of subsurface moisture accessible to vegetation roots. To this aim, we quantified Sr and evaluated 21 possible climate, landscape, and vegetation controls for 3612 river catchments worldwide using a random forest machine learning model. Our findings reveal climate as the primary, but spatially varying, driver of ecosystem-scale Sr, with landscape and vegetation characteristics playing a minor role. More specifically, we found the mean inter-storm duration to be the most dominant control of Sr globally, followed by mean temperature, mean precipitation, and mean topographic slope. While the inter-storm duration, temperature, and slope exhibit a consistent relation with Sr globally, the relation between precipitation and Sr varies spatially. Based on this spatial variability, we classified two different regimes: precipitation driven and energy limited. The precipitation-driven regime exhibits a positive relation between precipitation and Sr for precipitation of up to 3 mm d−1, above which the relation flattens and eventually becomes negative. The energy-limited regime exhibits a strictly negative relation between precipitation and Sr. Using the random forest model based on these three dominant climate variables and the landscape variable slope, we generated a global gridded dataset of Sr, which closely resembles other global datasets of root characteristics.
This suggests that our parsimonious approach to estimating Sr on a global scale, based on four globally available variables, could be readily integrated into the parameterization of Sr in global hydrological and land surface models. This may enhance the accuracy of global predictions of land–atmosphere exchange fluxes and hydrological extremes by providing a robust representation of both spatial and temporal variability in vegetation root characteristics.
Journal Article
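The root-zone study above describes how the fitted relation between precipitation and Sr changes with precipitation. Partial dependence is one common way to read such relations off a random forest: it averages the model's predictions while one input is fixed at each grid value. Whether the authors used it is not stated in the abstract. The model below is a hypothetical stand-in with a built-in plateau above 3 mm d−1, not a fitted forest.

```python
def partial_dependence(model, X, feature, grid):
    """Mean prediction with `feature` fixed at each grid value (other inputs as observed)."""
    curve = []
    for value in grid:
        preds = [model(row[:feature] + [value] + row[feature + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical stand-in for a fitted model of Sr (mm); inputs are
# [precipitation (mm/d), mean temperature (degC)]. Purely illustrative.
def toy_sr(row):
    precip, temp = row
    return 40 * min(precip, 3.0) + 2 * temp


background = [[1.0, 5.0], [2.5, 10.0], [4.0, 15.0]]
print(partial_dependence(toy_sr, background, 0, [0.0, 1.5, 3.0, 6.0]))  # → [20.0, 80.0, 140.0, 140.0]
```

The curve rises and then flattens at 3 mm/d because the toy model was built that way; with a real forest, the shape is whatever the fitted trees encode.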