Catalogue Search | MBRL
199 result(s) for "gap-filling"
BHPMF – a hierarchical Bayesian approach to gap‐filling and trait prediction for macroecology and functional biogeography
by Wright, Ian J.; Wirth, Christian B.; Dickie, John
in artificial intelligence, Bayesian analysis, Bayesian hierarchical model
2015
AIM: Functional traits of organisms are key to understanding and predicting biodiversity and ecological change, which motivates continuous collection of traits and their integration into global databases. Such trait matrices are inherently sparse, severely limiting their usefulness for further analyses. On the other hand, traits are characterized by the phylogenetic trait signal, trait–trait correlations and environmental constraints, all of which provide information that could be used to statistically fill gaps. We propose the application of probabilistic models which, for the first time, utilize all three characteristics to fill gaps in trait databases and predict trait values at larger spatial scales. INNOVATION: For this purpose we introduce BHPMF, a hierarchical Bayesian extension of probabilistic matrix factorization (PMF). PMF is a machine learning technique which exploits the correlation structure of sparse matrices to impute missing entries. BHPMF additionally utilizes the taxonomic hierarchy for trait prediction and provides uncertainty estimates for each imputation. In combination with multiple regression against environmental information, BHPMF allows for extrapolation from point measurements to larger spatial scales. We demonstrate the applicability of BHPMF in ecological contexts, using different plant functional trait datasets, also comparing results to taking the species mean and PMF. MAIN CONCLUSIONS: Sensitivity analyses validate the robustness and accuracy of BHPMF: our method captures the correlation structure of the trait matrix as well as the phylogenetic trait signal – also for extremely sparse trait matrices – and provides a robust measure of confidence in prediction accuracy for each missing entry. The combination of BHPMF with environmental constraints provides a promising concept to extrapolate traits beyond sampled regions, accounting for intraspecific trait variability. 
We conclude that BHPMF and its derivatives have a high potential to support future trait‐based research in macroecology and functional biogeography.
Journal Article
Deep Learning‐Based Approach for Enhancing Streamflow Prediction in Watersheds With Aggregated and Intermittent Observations
2025
Accurate daily streamflow estimates are crucial for water resources management. Yet, many regions lack high‐temporal‐resolution data due to limited monitoring infrastructure, often relying on monthly aggregates or intermittent observations. Predicting streamflow in these sparsely sampled watersheds remains challenging. This study proposes a deep learning‐based approach using Long Short‐Term Memory, leveraging its inherent advantages in learning long‐term dependencies within hydrological variables and processes to enhance streamflow predictions in sparsely sampled watersheds. The approach was evaluated for simulating daily flow patterns from monthly aggregated and monthly or weekly intermittent observations in two contrasting hydrological settings: near‐natural and human‐influenced watersheds. Results showed that the proposed approach reliably predicts daily flows from monthly aggregates with a median Nash‐Sutcliffe efficiency (NSE) of 0.61 for near‐natural and 0.48 for human‐influenced watersheds. The proposed approach performed even better for daily flow predictions from monthly or weekly intermittent observation, achieving a median NSE of 0.70 and 0.55 for near‐natural and human‐influenced watersheds, respectively. The proposed approach remained robust across different seasons and hydrological regimes, with a median percentage bias of ±5%, except in arid regions. Moreover, data sensitivity analysis indicated that data from wet seasons were crucial for improving model predictions and that weekly data could yield results comparable to daily observations. Overall, this study demonstrates that the deep learning‐based approach offers a robust and accurate representation of daily streamflow patterns from aggregated or intermittent observations, providing valuable hydrological insights and promising solutions for improving water resource management in regions with limited monitoring infrastructures. 
Plain Language Summary: In regions where streamflow observations are available only at regular or irregular weekly to monthly intervals, converting these sparse observations into daily data is crucial for water resources management applications. This study presents a new deep learning‐based approach for estimating daily streamflow from monthly aggregated or intermittent monthly and weekly observations. We applied this method to two types of watersheds: those minimally affected by human activities and those with human influences. The results showed that this approach reliably predicts daily streamflow patterns from both monthly aggregated and intermittent monthly or weekly observations in both watershed types. Notably, we found that weekly observations provide predictions almost as accurate as daily ones, indicating that more frequent data collection may not always be necessary. Additionally, observations collected during the wet season were essential for improving model accuracy. This deep learning‐based method effectively captures key streamflow patterns across different seasons and conditions, making it a valuable tool for managing water resources in regions with sparse or irregular data.
Key Points: We proposed a deep learning‐based approach for streamflow simulation using aggregated and intermittent observations across varied hydrological settings. The proposed approach predicted well in diverse flow regimes for near‐natural and human‐influenced watersheds, except in arid zones. The proposed approach demonstrated adaptability to watershed heterogeneity and anthropogenic influences without additional data.
Journal Article
Using Window Regression to Gap-Fill Landsat ETM+ Post SLC-Off Data
2018
The continued development of algorithms using multitemporal Landsat data creates opportunities to develop and adapt imputation algorithms to improve the quality of that data as part of preprocessing. One example is de-striping Enhanced Thematic Mapper Plus (ETM+, Landsat 7) images acquired after the Scan Line Corrector failure in 2003. In this study, we apply window regression, an algorithm that was originally designed to impute low-quality Moderate Resolution Imaging Spectroradiometer (MODIS) data, to Landsat Analysis Ready Data from 2014–2016. We mask Operational Land Imager (OLI; Landsat 8) image stacks from five study areas with corresponding ETM+ missing data layers, using these modified OLI stacks as inputs. We explored the algorithm’s parameter space, particularly window size in the spatial and temporal dimensions. Window regression yielded the best accuracy (and moderately long computation time) with a large spatial radius (a 7 × 7 pixel window) and a moderate temporal radius (here, five layers). In this case, root mean square error for deviations from the observed reflectance ranged from 3.7–7.6% over all study areas, depending on the band. Second-order response surface analysis suggested that a 15 × 15 pixel window, in conjunction with a 9-layer temporal window, may produce the best accuracy. Compared to the neighborhood similar pixel interpolator gap-filling algorithm, window regression yielded slightly better accuracy on average. Because it relies on no ancillary data, window regression may be used to conveniently preprocess stacks for other data-intensive algorithms.
Journal Article
Mapping of CO2 at high spatiotemporal resolution using satellite observations: Global distributions from OCO-2
by Hammerling, Dorit M.; Michalak, Anna M.; Kawa, S. Randolph
in Carbon, Carbon cycle, Carbon dioxide
2012
Satellite observations of CO2 offer new opportunities to improve our understanding of the global carbon cycle. Using such observations to infer global maps of atmospheric CO2 and their associated uncertainties can provide key information about the distribution and dynamic behavior of CO2, through comparison to atmospheric CO2 distributions predicted from biospheric, oceanic, or fossil fuel flux emissions estimates coupled with atmospheric transport models. Ideally, these maps should be at temporal resolutions that are short enough to represent and capture the synoptic dynamics of atmospheric CO2. This study presents a geostatistical method that accomplishes this goal. The method can extract information about the spatial covariance structure of the CO2 field from the available CO2 retrievals, yields full coverage (Level 3) maps at high spatial resolutions, and provides estimates of the uncertainties associated with these maps. The method does not require information about CO2 fluxes or atmospheric transport, such that the Level 3 maps are informed entirely by available retrievals. The approach is assessed by investigating its performance using synthetic OCO‐2 data generated from the PCTM/GEOS‐4/CASA‐GFED model, for time periods ranging from 1 to 16 days and a target spatial resolution of 1° latitude × 1.25° longitude. Results show that global CO2 fields from OCO‐2 observations can be predicted well at surprisingly high temporal resolutions. Even one‐day Level 3 maps reproduce the large‐scale features of the atmospheric CO2 distribution, and yield realistic uncertainty bounds. Temporal resolutions of two to four days result in the best performance for a wide range of investigated scenarios, providing maps at an order of magnitude higher temporal resolution relative to the monthly or seasonal Level 3 maps typically reported in the literature. 
Key Points: High spatio‐temporal resolution mapping of remotely‐sensed CO2 possible. A priori CO2 flux or transport information not required to create global maps. New possibilities for probabilistic comparison with carbon cycle models.
Journal Article
Evaluating Combinations of Temporally Aggregated Sentinel-1, Sentinel-2 and Landsat 8 for Land Cover Mapping with Google Earth Engine
2019
Land cover mapping of large areas is challenging due to the significant volume of satellite data to acquire and process, as well as the lack of spatial continuity due to cloud cover. Temporal aggregation—the use of metrics (i.e., mean or median) derived from satellite data over a period of time—is an approach that benefits from recent increases in the frequency of free satellite data acquisition and cloud-computing power. This enables the efficient use of multi-temporal data and the exploitation of cloud-gap filling techniques for land cover mapping. Here, we provide the first formal comparison of the accuracy between land cover maps created with temporal aggregation of Sentinel-1 (S1), Sentinel-2 (S2), and Landsat-8 (L8) data from one-year and test whether this method matches the accuracy of traditional approaches. Thirty-two datasets were created for Wales by applying automated cloud-masking and temporally aggregating data over different time intervals, using Google Earth Engine. Manually processed S2 data was used for comparison using a traditional two-date composite approach. Supervised classifications were created, and their accuracy was assessed using field-based data. Temporal aggregation only matched the accuracy of the traditional two-date composite approach (77.9%) when an optimal combination of optical and radar data was used (76.5%). Combined datasets (S1, S2 or S1, S2, and L8) outperformed single-sensor datasets, while datasets based on spectral indices obtained the lowest levels of accuracy. The analysis of cloud cover showed that to ensure at least one cloud-free pixel per time interval, a maximum of two intervals per year for temporal aggregation were possible with L8, while three or four intervals could be used for S2. This study demonstrates that temporal aggregation is a promising tool for integrating large amounts of data in an efficient way and that it can compensate for the lower quality of automatic image selection and cloud masking. 
It also shows that combining data from different sensors can improve classification accuracy. However, this study highlights the need for identifying optimal combinations of satellite data and aggregation parameters in order to match the accuracy of manually selected and processed image composites.
Journal Article
Evaluating Machine Learning and Geostatistical Methods for Spatial Gap-Filling of Monthly ESA CCI Soil Moisture in China
2021
Obtaining large-scale, long-term, and spatially continuous soil moisture (SM) data is crucial for studies of climate change, hydrology, and water resource management. ESA CCI SM is one such large-scale, long-term SM product (covering more than 40 years to date). However, it contains data gaps, especially over China, due to limitations in the remote sensing of SM such as complex topography, human-induced radio frequency interference (RFI), and vegetation disturbances. These gaps prevent the CCI SM data from achieving spatial continuity, which motivates the study of gap-filling methods. To develop suitable methods for filling the gaps in CCI SM across the whole of China, we compared typical Machine Learning (ML) methods, including Random Forest (RF), Feedforward Neural Network (FNN), and Generalized Linear Model (GLM), with a geostatistical method, Ordinary Kriging (OK). More than 30 years of passive–active combined CCI SM from 1982 to 2018 and other biophysical variables such as Normalized Difference Vegetation Index (NDVI), precipitation, air temperature, Digital Elevation Model (DEM), soil type, and in situ SM from the International Soil Moisture Network (ISMN) were used. Results indicated that: (1) data gaps in CCI SM are frequent in China and occur not only in cold seasons and areas but also in warm seasons and areas. The ratio of gap pixels to total pixels can exceed 80%, and its average is around 40%. (2) ML methods can fill all of the gaps in CCI SM. Among the ML methods, RF performed best in fitting the relationship between CCI SM and biophysical variables. (3) Over simulated gap areas, RF performed comparably to OK, and both greatly outperformed the FNN and GLM methods. (4) Over in situ SM networks, RF performed better than OK. (5) We also explored various strategies for gap-filling CCI SM. Results demonstrated that the strategy of constructing a monthly model, with one RF simulating the monthly average SM and another RF simulating the monthly SM disturbance, achieved the best performance. This strategy, combined with an ML method such as RF, is recommended for filling the gaps in CCI SM in China.
Journal Article
Development of Hourly Resolution Air Temperature Across Titicaca Lake on Auxiliary ERA5 Variables and Machine Learning-Based Gap-Filling
by Cuentas Toledo, Osmar; Satgé, Frederic; Pacheco Mollinedo, Paula
in air temperature, Algorithms, Climate change
2025
This article presents an innovative procedure that combines advanced quality control (QC) methods with machine learning (ML) techniques to produce reliable, continuous, high-resolution meteorological data. The approach was applied to hourly air temperature records from six automatic weather stations located around Lake Titicaca in the Altiplano region of South America. The raw dataset contained time gaps, inconsistencies, and outliers. To address these, the QC stage employed Interquartile Range, Biweight, and Local Outlier Factor (LOF) statistics, resulting in a clean dataset. Two gap-filling methods were implemented: a spatial approach using time series from nearby stations and a temporal approach based on each station’s time series and selected variables from the ERA5-Land reanalysis. Several ML models were also employed in this process: Random Forest (RF), Support Vector Machine (SVM), Stacking (STACK), and AdaBoost (ADA). Model performance was evaluated on a validation subset (30% of station data). The RF model achieved the best results, with R2 values up to 0.9 and Root Mean Square Error (RMSE) below 1.5 °C. The spatial approach performed best when stations were strongly correlated, while the temporal approach was more suitable for locations with low inter-station correlation and high local variability. Overall, the procedure substantially improved data reliability and completeness, and it can be extended to other meteorological variables.
Journal Article
Generation and Validation of the iKp1289 Metabolic Model for Klebsiella pneumoniae KPPR1
by Rotman, Ella; Lathem, Wyndham W.; Hauser, Alan R.
in bacteria, Biolog, Carbohydrate Metabolism
2017
Klebsiella pneumoniae has a reputation for causing a wide range of infectious conditions, with numerous highly virulent and antibiotic-resistant strains. Metabolic models have the potential to provide insights into the growth behavior, nutrient requirements, essential genes, and candidate drug targets in these strains. Here we develop a metabolic model for KPPR1, a highly virulent strain of K. pneumoniae. We apply a combination of Biolog phenotype data and fitness data to validate and refine our KPPR1 model. The final model displays a predictive accuracy of 75% in identifying potential carbon and nitrogen sources for K. pneumoniae and of 99% in predicting nonessential genes in rich media. We demonstrate how this model is useful in studying the differences in the metabolic capabilities of the low-virulence MGH 78578 strain and the highly virulent KPPR1 strain. For example, we demonstrate that these strains differ in carbohydrate metabolism, including the ability to metabolize dulcitol as a primary carbon source. Our model makes numerous other predictions for follow-up verification and analysis.
Journal Article
A Review of Reconstructing Remotely Sensed Land Surface Temperature under Cloudy Conditions
2021
Land surface temperature (LST) is an important environmental parameter in climate change, urban heat islands, drought, public health, and other fields. Thermal infrared (TIR) remote sensing is the main method used to obtain LST information over large spatial scales. However, cloud cover results in many data gaps in remotely sensed LST datasets, greatly limiting their practical applications. Many studies have sought to fill these data gaps and reconstruct cloud-free LST datasets over the last few decades. This paper reviews the progress of LST reconstruction research. A bibliometric analysis is conducted to provide a brief overview of the papers published in this field. The existing reconstruction algorithms can be grouped into five categories: spatial gap-filling methods, temporal gap-filling methods, spatiotemporal gap-filling methods, multi-source fusion-based gap-filling methods, and surface energy balance-based gap-filling methods. The principles, advantages, and limitations of these methods are described and discussed. The applications of these methods are also outlined. In addition, the validation of filled LST values’ cloudy pixels is an important concern in LST reconstruction. The different validation methods applied for reconstructed LST datasets are also reviewed herein. Finally, prospects for future developments in LST reconstruction are provided.
Journal Article
Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain
by García-Marín, Amanda Penélope; Bellido-Jiménez, Juan Antonio; Gualda, Javier Estévez
in Algorithms, Automation, Bayesian optimization
2021
Missing data are a common problem in hydrometeorological datasets, usually due to sensor malfunction, deficiencies in record storage and transmission, or other issues in data recovery procedures. These missing values are a primary source of problems when analyzing and modeling the spatial and temporal variability of such datasets. Thus, accurate gap-filling techniques for rainfall time series are necessary to obtain complete datasets, which is crucial in studying the evolution of climate change. In this work, several machine learning models have been assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). Based on the obtained results, the use of neighbor data, located within a 50 km radius, greatly outperformed the rest of the assessed approaches, with RMSE (root mean squared error) values up to 1.246 mm/day, MBE (mean bias error) values up to −0.001 mm/day, and R2 values up to 0.898. In addition, results for inland areas outperformed those for coastal areas in most locations, revealing an effect of distance to the sea on performance (an improvement of up to 63.89% in terms of RMSE). Finally, machine learning (ML) models (especially MLP (multilayer perceptron)) notably outperformed simple linear regression estimations at the coastal sites, whereas at inland locations the improvements were not as significant.
Journal Article