Catalogue Search | MBRL
60 result(s) for "Guinness, Joseph"
A General Framework for Vecchia Approximations of Gaussian Processes
2021
Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling data as a GP plus an additive noise term, we propose a generalization of the Vecchia (J. Roy. Statist. Soc. Ser. B 50 (1988) 297–312) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large spatial datasets but can lead to considerable improvements in approximation accuracy over Vecchia's original approach. We provide several theoretical results and conduct numerical comparisons. We conclude with guidelines for the use of Vecchia approximations in spatial statistics.
Journal Article
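As an informal illustration of the Vecchia idea summarized in the abstract above, the Python sketch below approximates a Gaussian log-likelihood by conditioning each observation on a small set of nearby, previously ordered points. The exponential covariance, the nearest-neighbour conditioning rule, and all function names are assumptions made for the example, not the authors' implementation.

    import numpy as np
    from scipy.spatial.distance import cdist

    def exp_cov(locs1, locs2, variance=1.0, range_par=0.3):
        # Isotropic exponential covariance (assumed for this illustration).
        return variance * np.exp(-cdist(locs1, locs2) / range_par)

    def vecchia_loglik(y, locs, m=10):
        # Vecchia-style approximation: sum_i log p(y_i | y_{c(i)}), where c(i)
        # is a set of at most m nearest previously ordered neighbours.
        ll = 0.0
        for i in range(len(y)):
            d = np.linalg.norm(locs[:i] - locs[i], axis=1)
            c = np.argsort(d)[:m]
            s11 = exp_cov(locs[[i]], locs[[i]])[0, 0]
            if len(c) == 0:
                mu, var = 0.0, s11
            else:
                s12 = exp_cov(locs[[i]], locs[c])[0]   # covariances with c(i)
                s22 = exp_cov(locs[c], locs[c])
                w = np.linalg.solve(s22, s12)
                mu, var = w @ y[c], s11 - s12 @ w
            ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
        return ll

    rng = np.random.default_rng(0)
    locs = rng.uniform(size=(500, 2))
    S = exp_cov(locs, locs) + 1e-8 * np.eye(500)
    y = np.linalg.cholesky(S) @ rng.standard_normal(500)   # simulated GP data
    print(vecchia_loglik(y, locs, m=10))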
Permutation and Grouping Methods for Sharpening Gaussian Process Approximations
2018
Vecchia's approximate likelihood for Gaussian process parameters depends on how the observations are ordered, which has been cited as a deficiency. This article takes the alternative standpoint that the ordering can be tuned to sharpen the approximations. Indeed, the first part of the article includes a systematic study of how ordering affects the accuracy of Vecchia's approximation. We demonstrate the surprising result that random orderings can give dramatically sharper approximations than default coordinate-based orderings. Additional ordering schemes are described and analyzed numerically, including orderings capable of improving on random orderings. The second contribution of this article is a new automatic method for grouping calculations of components of the approximation. The grouping methods simultaneously improve approximation accuracy and reduce computational burden. In common settings, reordering combined with grouping reduces Kullback-Leibler divergence from the target model by more than a factor of 60 compared to ungrouped approximations with default ordering. The claims are supported by theory and numerical results with comparisons to other approximations, including tapered covariances and stochastic partial differential equations. Computational details are provided, including the use of the approximations for prediction and conditional simulation. An application to space-time satellite data is presented.
Journal Article
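The abstract above argues that the ordering of observations can be tuned to sharpen Vecchia approximations. The sketch below gives a rough, self-contained Monte Carlo estimate of the Kullback-Leibler divergence from an exact Gaussian process to its Vecchia approximation under a coordinate-based ordering and under a random ordering; the exponential covariance, the small conditioning sets, and the simulation sizes are assumptions for illustration, and the grouping methods from the article are not reproduced.

    import numpy as np
    from scipy.spatial.distance import cdist

    def expcov(a, b, range_par=0.3):
        # Unit-variance exponential covariance (assumed for this illustration).
        return np.exp(-cdist(a, b) / range_par)

    def vecchia_ll(y, locs, order, m=5):
        # Vecchia log density under a given ordering, conditioning each point
        # on at most m nearest previously ordered neighbours.
        y, locs = y[order], locs[order]
        ll = 0.0
        for i in range(len(y)):
            d = np.linalg.norm(locs[:i] - locs[i], axis=1)
            c = np.argsort(d)[:m]
            if len(c) == 0:
                mu, var = 0.0, 1.0
            else:
                s12 = expcov(locs[[i]], locs[c])[0]
                s22 = expcov(locs[c], locs[c])
                w = np.linalg.solve(s22, s12)
                mu, var = w @ y[c], 1.0 - s12 @ w
            ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
        return ll

    rng = np.random.default_rng(1)
    n = 300
    locs = rng.uniform(size=(n, 2))
    S = expcov(locs, locs) + 1e-8 * np.eye(n)
    L = np.linalg.cholesky(S)
    Sinv, logdet = np.linalg.inv(S), 2 * np.sum(np.log(np.diag(L)))

    coord_order = np.argsort(locs[:, 0])   # default coordinate-based ordering
    rand_order = rng.permutation(n)        # random ordering
    kl = {"coordinate": 0.0, "random": 0.0}
    nsim = 20                              # small, so the estimates are noisy
    for _ in range(nsim):
        y = L @ rng.standard_normal(n)
        exact = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ Sinv @ y)
        kl["coordinate"] += (exact - vecchia_ll(y, locs, coord_order)) / nsim
        kl["random"] += (exact - vecchia_ll(y, locs, rand_order)) / nsim
    print(kl)   # Monte Carlo estimates of KL(exact || Vecchia) per ordering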
A Case Study Competition Among Methods for Analyzing Large Spatial Data
by Nychka, Douglas W.; Gerber, Florian; Guhaniyogi, Rajarshi
in Agriculture; Big data; Biostatistics
2019
The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics.
Journal Article
Spectral density estimation for random fields via periodic embeddings
2019
We introduce methods for estimating the spectral density of a random field on a d-dimensional lattice from incomplete gridded data. Data are iteratively imputed onto an expanded lattice according to a model with a periodic covariance function. The imputations are convenient computationally, in that circulant embedding and preconditioned conjugate gradient methods can produce imputations in O(n log n) time and O(n) memory. However, these so-called periodic imputations are motivated mainly by their ability to produce accurate spectral density estimates. In addition, we introduce a parametric filtering method that is designed to reduce periodogram smoothing bias. The paper contains theoretical results on properties of the imputed-data periodogram and numerical and simulation studies comparing the performance of the proposed methods to existing approaches in a number of scenarios. We present an application to a gridded satellite surface temperature dataset with missing values.
Journal Article
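For context on the spectral approach above, the sketch below computes the raw periodogram of fully observed gridded data with a two-dimensional FFT, the basic O(n log n) spectral density estimate that the article's periodic imputation and filtering steps build on; those imputation and filtering steps are not reproduced here, and the simulated field is an assumption for the example.

    import numpy as np

    def periodogram_2d(field):
        # Raw periodogram of a real-valued field on an n1 x n2 lattice:
        # |FFT|^2 / (n1 * n2), computable in O(n log n) time.
        n1, n2 = field.shape
        f = np.fft.fft2(field - field.mean())
        return np.abs(f) ** 2 / (n1 * n2)

    # Example: gridded data with an (assumed) smooth signal plus noise.
    rng = np.random.default_rng(0)
    n1 = n2 = 64
    x = np.arange(n1)[:, None]             # row index, broadcasts over columns
    field = np.sin(2 * np.pi * 3 * x / n1) + 0.5 * rng.standard_normal((n1, n2))
    I = periodogram_2d(field)
    print(I.shape, I.max())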
Circulant Embedding of Approximate Covariances for Inference From Gaussian Data on Large Lattices
by Guinness, Joseph; Fuentes, Montserrat
in Algorithms; Approximation; Conditional simulation; Fast Fourier transform; Gaussian process; Kriging
2017
Recently proposed computationally efficient Markov chain Monte Carlo (MCMC) and Monte Carlo expectation-maximization (EM) methods for estimating covariance parameters from lattice data rely on successive imputations of values on an embedding lattice that is at least two times larger in each dimension. These methods can be considered exact in some sense, but we demonstrate that using such a large number of imputed values leads to slowly converging Markov chains and EM algorithms. We propose instead the use of a discrete spectral approximation to allow for the implementation of these methods on smaller embedding lattices. While our methods are approximate, our examples indicate that the error introduced by this approximation is small compared to the Monte Carlo errors present in long Markov chains or many iterations of Monte Carlo EM algorithms. Our results are demonstrated in simulation studies, as well as in numerical studies that explore both increasing domain and fixed domain asymptotics. We compare the exact methods to our approximate methods on a large satellite dataset, and show that the approximate methods are also faster to compute, especially when the aliased spectral density is modeled directly. Supplementary materials for this article are available online.
Journal Article
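As background for the entry above, the sketch below implements the standard circulant-embedding construction on an embedding lattice twice as large in each dimension: wrapped distances give the base row of a circulant covariance, its FFT gives the eigenvalues, and filtered white noise yields a simulation on the original lattice. This is the generic technique, not the article's MCMC or Monte Carlo EM estimation procedure; the exponential covariance and the clipping of small negative eigenvalues are assumptions.

    import numpy as np

    def simulate_circulant_embedding(n1, n2, cov_fun, rng):
        # Embed the n1 x n2 lattice in a 2*n1 x 2*n2 periodic lattice.
        m1, m2 = 2 * n1, 2 * n2
        i = np.minimum(np.arange(m1), m1 - np.arange(m1))
        j = np.minimum(np.arange(m2), m2 - np.arange(m2))
        d = np.sqrt(i[:, None] ** 2 + j[None, :] ** 2)   # wrapped distances
        lam = np.fft.fft2(cov_fun(d)).real               # circulant eigenvalues
        lam = np.maximum(lam, 0.0)                       # clip tiny negatives
        z = rng.standard_normal((m1, m2)) + 1j * rng.standard_normal((m1, m2))
        field = np.fft.fft2(np.sqrt(lam / (m1 * m2)) * z).real
        return field[:n1, :n2]                           # keep original lattice

    rng = np.random.default_rng(2)
    sim = simulate_circulant_embedding(64, 64, lambda d: np.exp(-d / 10.0), rng)
    print(sim.shape, sim.std())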
Compression and Conditional Emulation of Climate Model Output
2018
Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset, 1 year of daily mean temperature data, particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers. Supplementary materials for this article are available online.
Journal Article
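A toy one-dimensional version of the compression-and-emulation idea above: store linear summaries of a Gaussian field, then decompress by computing the conditional expectation (the smooth best estimate) and a conditional simulation (which restores small-scale variability). The block-mean summaries, squared exponential covariance, and sizes are assumptions for the sketch, not the article's scheme for climate model output.

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 200, 20

    # Assumed toy model: zero-mean Gaussian field y with a squared exponential
    # covariance, compressed to k linear summaries s = A @ y (block means).
    x = np.linspace(0, 1, n)
    Sigma = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05 ** 2))
    A = np.zeros((k, n))
    for j in range(k):
        A[j, j * (n // k):(j + 1) * (n // k)] = k / n    # block means

    L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(n))
    y = L @ rng.standard_normal(n)          # "original" field
    s = A @ y                               # stored summary statistics

    # Conditional expectation (best estimate, tends to oversmooth) and a
    # conditional simulation (adds back realistic small-scale noise).
    K = Sigma @ A.T @ np.linalg.inv(A @ Sigma @ A.T)
    y_hat = K @ s
    y_sim = L @ rng.standard_normal(n)      # unconditional draw from the model
    y_cond = y_hat + (y_sim - K @ (A @ y_sim))

    print(np.std(y - y_hat), np.std(y - y_cond))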
Proposed Method for Statistical Analysis of On-Farm Single Strip Treatment Trials
2021
On-farm experimentation (OFE) allows farmers to improve crop management over time. The randomized complete block design (RCBD) with field-length strips as individual plots is commonly used, but it requires advanced planning and has limited statistical power when only three to four replications are implemented. Harvester-mounted yield monitor systems generate high-resolution data (1-s intervals), allowing for development of more meaningful, easily implementable OFE designs. Here we explored statistical frameworks to quantify the effect of a single treatment strip using georeferenced yield monitor data and yield stability-based management zones. Single nitrogen-rich treatment strips (one per field) were implemented in 2018 and 2019 on three fields each on two farms in central New York. Least squares and generalized least squares approaches were evaluated for estimating treatment effects, and independence versus spatial covariance assumptions were compared for estimating standard errors. The analysis showed that estimates of treatment effects using the generalized least squares approach are unstable due to over-emphasis on certain data points, while assuming independence leads to underestimation of standard errors. We concluded that the least squares approach should be used to estimate treatment effects, while spatial covariance should be assumed when estimating standard errors for evaluation of zone-based treatment effects using the single-strip spatial evaluation approach.
Journal Article
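The recommendation in the abstract above, least squares point estimates combined with standard errors computed under a spatial covariance, can be sketched as follows on simulated strip-trial-like data; the exponential covariance, the strip location, and the effect sizes are assumptions, and in practice the covariance would be estimated rather than known.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 400

    # Assumed toy layout: yield observations along a harvester pass, a 0/1
    # indicator for the nitrogen-rich strip, and spatially correlated errors.
    pos = np.linspace(0, 1, n)
    X = np.column_stack([np.ones(n), (pos > 0.45) & (pos < 0.55)]).astype(float)
    Sigma = np.exp(-np.abs(pos[:, None] - pos[None, :]) / 0.1)
    L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(n))
    beta_true = np.array([5.0, 0.8])
    y = X @ beta_true + L @ rng.standard_normal(n)

    # Least squares point estimate of the treatment effect ...
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y

    # ... with standard errors under independence (too small) versus under the
    # spatial covariance (sandwich form for the OLS estimator).
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - X.shape[1])
    se_indep = np.sqrt(np.diag(sigma2 * XtX_inv))
    cov_spatial = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
    se_spatial = np.sqrt(np.diag(cov_spatial))

    print(beta_ols, se_indep, se_spatial)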
An evolutionary spectrum approach to incorporate large-scale geographical descriptors on global processes
by Guinness, Joseph; Castruccio, Stefano
in Analysis of covariance; Axial symmetry; Climate output compression
2017
We introduce a non-stationary spatiotemporal model for gridded data on the sphere. The model specifies a computationally convenient covariance structure that depends on heterogeneous geography. Widely used statistical models on a spherical domain are non-stationary for different latitudes, but stationary at the same latitude (axial symmetry). This assumption has been acknowledged to be too restrictive for quantities such as surface temperature, whose statistical behaviour is influenced by large-scale geographical descriptors such as land and ocean. We propose an evolutionary spectrum approach that can account for different regimes across the Earth's geography and results in a more general and flexible class of models that vastly outperforms axially symmetric models and captures longitudinal patterns that would otherwise be assumed constant. The model can be estimated with a multistep conditional likelihood approximation that preserves the non-stationary features while allowing for easily distributed computations: we show how the model can be fitted to more than 20 million data points in less than 1 day on a state of the art workstation. The resulting estimates from the statistical model can be regarded as a synthetic description (i.e. a compression) of the space-time characteristics of an entire initial condition ensemble.
Journal Article
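A heavily simplified, single-latitude-band illustration of the evolutionary spectrum idea above: Fourier components of the process are given longitude-dependent amplitudes through a land/ocean indicator, so the field is no longer axially symmetric. The mask, spectral decay, and wavenumber cutoff are assumptions for the example, not the article's model for the full sphere.

    import numpy as np

    rng = np.random.default_rng(5)
    n_lon = 256
    lon = np.linspace(0, 2 * np.pi, n_lon, endpoint=False)

    # Crude "land" mask along one latitude band (assumed for illustration).
    land = ((lon > 1.0) & (lon < 2.5)).astype(float)

    def amplitude(k, land):
        # Spectrum decays in wavenumber; land points get rougher small scales.
        return (1.0 + 0.5 * land) / (1.0 + k ** 2) ** 0.9

    # Evolutionary-spectrum-style construction: longitude-varying amplitudes
    # multiply the Fourier basis, breaking stationarity in longitude.
    field = np.zeros(n_lon)
    for k in np.arange(1, 60):
        a, b = rng.standard_normal(2)
        field += amplitude(k, land) * (a * np.cos(k * lon) + b * np.sin(k * lon))

    print(field[:5])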