Catalogue Search | MBRL
96 result(s) for "time series data synthesis"
SPOT: Testing Stream Processing Programs with Symbolic Execution and Stream Synthesizing
2021
Adoption of distributed stream processing (DSP) systems such as Apache Flink in real-time big data processing is increasing. However, DSP programs are prone to be buggy, especially when a programmer neglects some DSP features (e.g., source data reordering), which motivates development of approaches for testing and verification. In this paper, we focus on the test data generation problem for DSP programs. Currently, there is a lack of an approach that generates test data for DSP programs with both high path coverage and coverage of different stream reordering situations. We present a novel solution, SPOT (i.e., Stream Processing Program Test), to achieve these two goals simultaneously. First, SPOT generates a set of individual test data representing each path of one DSP program through symbolic execution. Then, SPOT composes these independent data into various time series data (a.k.a. streams) in diverse reorderings. Finally, we can perform a test by feeding the DSP program with these streams continuously. To automatically support symbolic analysis, we also developed JPF-Flink, a JPF (i.e., Java Pathfinder) extension to coordinate the execution of Flink programs. We present four case studies to illustrate that: (1) SPOT can support symbolic analysis for the commonly used DSP operators; (2) test data generated by SPOT can more efficiently achieve high JDU (i.e., Joint Dataflow and UDF) path coverage than two recent DSP testing approaches; (3) test data generated by SPOT can more easily trigger software failure when compared with those two DSP testing approaches; and (4) the data randomly generated by those two test techniques are highly skewed in terms of stream reordering, which is measured by the entropy metric. In comparison, stream reordering is even for test data from SPOT.
Journal Article
A practical guide to selecting models for exploration, inference, and prediction in ecology
by Adler, Peter B.; Ellner, Stephen P.; Hooker, Giles
in Best practice, butterflies, Concepts & Synthesis
2021
Selecting among competing statistical models is a core challenge in science. However, the many possible approaches and techniques for model selection, and the conflicting recommendations for their use, can be confusing. We contend that much confusion surrounding statistical model selection results from failing to first clearly specify the purpose of the analysis. We argue that there are three distinct goals for statistical modeling in ecology: data exploration, inference, and prediction. Once the modeling goal is clearly articulated, an appropriate model selection procedure is easier to identify. We review model selection approaches and highlight their strengths and weaknesses relative to each of the three modeling goals. We then present examples of modeling for exploration, inference, and prediction using a time series of butterfly population counts. These show how a model selection approach flows naturally from the modeling goal, leading to different models selected for different purposes, even with exactly the same data set. This review illustrates best practices for ecologists and should serve as a reminder that statistical recipes cannot substitute for critical thinking or for the use of independent data to test hypotheses and validate predictions.
Journal Article
The intrinsic predictability of ecological time series and its potential to guide forecasting
by Brose, Ulrich; Williams, Richard; Ward, Colette
in Complexity, Computer simulation, CONCEPTS & SYNTHESIS
2019
Successfully predicting the future states of systems that are complex, stochastic, and potentially chaotic is a major challenge. Model forecasting error (FE) is the usual measure of success; however, model predictions provide no insights into the potential for improvement. In short, the realized predictability of a specific model is uninformative about whether the system is inherently predictable or whether the chosen model is a poor match for the system and our observations thereof. Ideally, model proficiency would be judged with respect to the system's intrinsic predictability, the highest achievable predictability given the degree to which system dynamics are the result of deterministic vs. stochastic processes. Intrinsic predictability may be quantified with permutation entropy (PE), a model-free, information-theoretic measure of the complexity of a time series. By means of simulations, we show that a correlation exists between estimated PE and FE and show how stochasticity, process error, and chaotic dynamics affect the relationship. This relationship is verified for a data set of 461 empirical ecological time series. We show how deviations from the expected PE–FE relationship are related to covariates of data quality and the nonlinearity of ecological dynamics. These results demonstrate a theoretically grounded basis for a model-free evaluation of a system's intrinsic predictability. Identifying the gap between the intrinsic and realized predictability of time series will enable researchers to understand whether forecasting proficiency is limited by the quality and quantity of their data or the ability of the chosen forecasting model to explain the data. Intrinsic predictability also provides a model-free baseline of forecasting proficiency against which modeling efforts can be evaluated.
Journal Article
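The permutation entropy (PE) named in the abstract above can be illustrated with a short sketch. This is a generic textbook-style formulation of ordinal-pattern entropy, not the authors' code; the function name and the `order` and `delay` parameters are assumptions chosen for illustration, and the abstract does not say which settings the study used.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Entropy of the ordinal patterns found in `series` (hypothetical helper).

    order: number of points per window; delay: spacing between those points.
    With normalize=True the result lies in [0, 1]; values near 0 indicate a
    highly regular series and values near 1 a noise-like one.
    """
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for the chosen order and delay")

    counts = Counter()
    for start in range(n_windows):
        window = [series[start + j * delay] for j in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        # sorted() is stable, so ties are ranked by order of appearance.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1

    # Shannon entropy (in bits) of the pattern frequencies.
    entropy = sum(-(c / n_windows) * math.log2(c / n_windows)
                  for c in counts.values())
    if normalize:
        entropy /= math.log2(math.factorial(order))
    return entropy


if __name__ == "__main__":
    print(permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=3))
```

A strictly increasing series produces a single ordinal pattern and therefore zero entropy, which is the sense in which low PE corresponds to high intrinsic predictability.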
The basis function approach for modeling autocorrelation in ecological data
by Kay, Shannon L.; Buderman, Frances E.; Hooten, Mevin B.
in Autocorrelation, Basis functions, Bayesian model
2017
Analyzing ecological data often requires modeling the autocorrelation created by spatial and temporal processes. Many seemingly disparate statistical methods used to account for autocorrelation can be expressed as regression models that include basis functions. Basis functions also enable ecologists to modify a wide range of existing ecological models in order to account for autocorrelation, which can improve inference and predictive accuracy. Furthermore, understanding the properties of basis functions is essential for evaluating the fit of spatial or time-series models, detecting a hidden form of collinearity, and analyzing large data sets. We present important concepts and properties related to basis functions and illustrate several tools and techniques ecologists can use when modeling autocorrelation in ecological data.
Journal Article
PVS-GEN: Systematic Approach for Universal Synthetic Data Generation Involving Parameterization, Verification, and Segmentation
2024
Synthetic data generation addresses the challenges of obtaining extensive empirical datasets, offering benefits such as cost-effectiveness, time efficiency, and robust model development. Nonetheless, synthetic data-generation methodologies still encounter significant difficulties, including a lack of standardized metrics for modeling different data types and comparing generated results. This study introduces PVS-GEN, an automated, general-purpose process for synthetic data generation and verification. The PVS-GEN method parameterizes time-series data with minimal human intervention and verifies model construction using a specific metric derived from extracted parameters. For complex data, the process iteratively segments the empirical dataset until an extracted parameter can reproduce synthetic data that reflects the empirical characteristics, irrespective of the sensor data type. Moreover, we introduce the PoR metric to quantify the quality of the generated data by evaluating its time-series characteristics. Consequently, the proposed method can automatically generate diverse time-series data that covers a wide range of sensor types. We compared PVS-GEN with existing synthetic data-generation methodologies, and PVS-GEN demonstrated a superior performance. It generated data with a similarity of up to 37.1% across multiple data types and by 19.6% on average using the proposed metric, irrespective of the data type.
Journal Article
Assessment of Annual Composite Images Obtained by Google Earth Engine for Urban Areas Mapping Using Random Forest
2021
Urban areas represent the primary source region of greenhouse gas emissions. Mapping urban areas is essential for understanding land cover change, carbon cycles, and climate change (urban areas also refer to impervious surfaces, i.e., artificial cover and structures). Remote sensing has greatly advanced urban areas mapping over the last several decades. At present, we have entered the era of big data. Long time series of satellite data such as Landsat and high-performance computing platforms such as Google Earth Engine (GEE) offer new opportunities to map urban areas. The objective of this research was to determine how annual time series images from Landsat 8 Operational Land Imager (OLI) can effectively be composed to map urban areas in three cities in China in support of GEE. Three reducer functions, ee.Reducer.min(), ee.Reducer.median(), and ee.Reducer.max() provided by GEE, were selected to construct four schemes to synthesize the annual intensive time series Landsat 8 OLI data for three cities in China. Then, urban areas were mapped based on the random forest algorithm and the accuracy was evaluated in detail. The results show that (1) the quality of annual composite images was improved significantly, particularly in reducing the impact of cloud and cloud shadows, and (2) the annual composite images obtained by the combination of multiple reducer functions had better performance than that obtained by a single reducer function. Further, the overall accuracy of urban areas mapping with the combination of multiple reducer functions exceeded 90% in all three cities in China. In summary, a suitable combination of reducer functions for synthesizing annual time series images can enhance data quality and ensure differences between characteristics and higher precision for urban areas mapping.
Journal Article
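The compositing step described in the abstract above, reducing a year of scenes to one image per pixel with a min, median, or max reducer, can be sketched in plain Python. This is only a stand-in for the idea and does not call the Google Earth Engine API; the function name, the nested-list image layout, and the use of `None` for cloud-masked pixels are all assumptions made for illustration.

```python
from statistics import median

# Plain-Python stand-ins for the three reducers named in the abstract.
REDUCERS = {"min": min, "median": median, "max": max}


def composite(scenes, reducer="median"):
    """Reduce a time series of single-band scenes to one composite image.

    scenes: list of 2D lists (rows x cols), one per acquisition date;
            None marks a masked (e.g. cloudy) pixel and is skipped.
    Returns a 2D list; a pixel masked in every scene stays None.
    """
    reduce_fn = REDUCERS[reducer]
    rows, cols = len(scenes[0]), len(scenes[0][0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [s[r][c] for s in scenes if s[r][c] is not None]
            out_row.append(reduce_fn(values) if values else None)
        result.append(out_row)
    return result


if __name__ == "__main__":
    scenes = [[[1, 5]], [[3, None]], [[2, 7]]]
    for name in REDUCERS:
        print(name, composite(scenes, name))
```

Skipping masked values before reducing is what lets a composite fill in pixels that are cloudy on some dates but clear on others.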
Temporal changes in taxon abundances are positively correlated but poorly predicted at the global scale
by Dolezal, Aleksandra J.; Fan, Sophia; Black, Emily N.
in Abundance, Accuracy, Archives & records
2025
Linking changes in taxon abundance to biotic and abiotic drivers over space and time is critical for understanding biodiversity responses to global change. Furthermore, deciphering temporal trends in relationships among taxa, including correlated abundance changes (e.g. synchrony), can facilitate predictions of future shifts. However, what drives these correlated changes over large scales is complex and understudied, impeding our ability to predict shifts in ecological communities. We used two global datasets containing abundance time‐series (BioTIME) and biotic interactions (GloBI) to quantify correlations among yearly changes in the abundance of pairs of geographically proximal taxa (genus pairs). We used a hierarchical linear model and cross‐validation to test the overall magnitude, direction and predictive accuracy of correlated abundance changes among genera at the global scale. We then tested how correlated abundance changes are influenced by latitude, biotic interactions, disturbance and time‐series length while accounting for differences among studies and taxonomic categories. We found that abundance changes between genus pairs are, on average, positively correlated over time, suggesting synchrony at the global scale. Furthermore, we found that abundance changes are more positively correlated with longer time‐series, with known biotic interactions and in disturbed habitats. However, the magnitude of these ecological drivers alone is relatively weak, with model predictive accuracy increasing approximately two‐fold with the inclusion of study identity and taxonomic category. This suggests that while patterns in abundance correlations are shaped by ecological drivers at the global scale, these drivers have limited utility in forecasting changes in abundances among unknown taxa or in the context of future global change.
Our study indicates that including taxonomy and known ecological drivers can improve predictions of biodiversity loss over large spatial and temporal scales, but also that idiosyncrasies of different studies continue to weaken our ability to make global predictions.
Journal Article
Herbaceous perennial plants with short generation time have stronger responses to climate anomalies than those with longer generation time
2021
There is an urgent need to synthesize the state of our knowledge on plant responses to climate. The availability of open-access data provides opportunities to examine quantitative generalizations regarding which biomes and species are most responsive to climate drivers. Here, we synthesize time series of structured population models from 162 populations of 62 plants, mostly herbaceous species from temperate biomes, to link plant population growth rates (λ) to precipitation and temperature drivers. We expect: (1) more pronounced demographic responses to precipitation than temperature, especially in arid biomes; and (2) a higher climate sensitivity in short-lived rather than long-lived species. We find that precipitation anomalies have a nearly three-fold larger effect on λ than temperature. Species with shorter generation time have much stronger absolute responses to climate anomalies. We conclude that key species-level traits can predict plant population responses to climate, and discuss the relevance of this generalization for conservation planning.
Plant population growth rate is sensitive to annual temperature and precipitation anomalies. Here the authors analyse time series of population projection models from multiple biomes, finding a relationship between short generation times and strong demographic responses to climate—particularly precipitation—anomalies.
Journal Article
GAN-Based Generation of Synthetic Data for Vehicle Driving Events
by Hernández-Álvarez, Myriam; Valdivieso Caraguay, Ángel Leonardo; Sanchez-Gordon, Sandra
in Accuracy, Algorithms, Computational linguistics
2024
Developing solutions to reduce traffic accidents requires experimentation and much data. However, due to confidentiality issues, not all datasets used in previous research are publicly available, and those that are available may be insufficient for research. Building datasets with real data is costly. Given this reality, this paper proposes a procedure to generate synthetic data sequences of driving events using the Time series GAN (TimeGAN) and Real-world time series (RTSGAN) frameworks. First, a 15-feature driving event dataset is constructed with real data, which forms the basis for generating datasets using the two mentioned frameworks. The generated datasets are evaluated using the qualitative metrics PCA and T-SNE, as well as the discriminative and predictive score quantitative metrics defined in TimeGAN. The generated synthetic data are then used in an unsupervised algorithm to identify clusters representing vehicle crash risk levels. Next, the generated data are used in a supervised classification algorithm to predict risk level categories. Comparison results between the data generated by TimeGAN and RTSGAN show that the data generated by RTSGAN achieve better scores than the data generated with TimeGAN. On the other hand, we demonstrate that the use of datasets trained with synthetic data to train a supervised classification model for predicting the level of accident risk can obtain accuracy comparable to that of models that use datasets with only real data in their training, proving the usefulness of the generated data.
Journal Article
Hybrid datasets
by Bulleri, Fabio; Ravaglioli, Chiara; Benedetti-Cecchi, Lisandro
in biogeography, Biosphere, case studies
2018
Understanding how increasing human domination of the biosphere affects life on earth is a critical research challenge. This task is facilitated by the increasing availability of open-source data repositories, which allow ecologists to address scientific questions at unprecedented spatial and temporal scales. Large datasets are mostly observational, so they may have limited ability to uncover causal relations among variables. Experiments are better suited at attributing causation, but they are often limited in scope. We propose hybrid datasets, resulting from the integration of observational with experimental data, as an approach to leverage the scope and ability to attribute causality in ecological studies. We show how the analysis of hybrid datasets with emerging techniques in time series analysis (Convergent Cross-mapping) and macroecology (Joint Species Distribution Models) can generate novel insights into causal effects of abiotic and biotic processes that would be difficult to achieve otherwise. We illustrate these principles with two case studies in marine ecosystems and discuss the potential to generalize across environments, species and ecological processes. If used wisely, the analysis of hybrid datasets may become the standard approach for research goals that seek causal explanations for large-scale ecological phenomena.
Journal Article