Catalogue Search | MBRL
14 result(s) for "Marginal data augmentation"
Partially Collapsed Gibbs Samplers
2008
Ever-increasing computational power, along with ever-more sophisticated statistical computing techniques, is making it possible to fit ever-more complex statistical models. Among the more computationally intensive methods, the Gibbs sampler is popular because of its simplicity and power to effectively generate samples from a high-dimensional probability distribution. Despite its simple implementation and description, however, the Gibbs sampler is criticized for its sometimes slow convergence, especially when it is used to fit highly structured complex models. Here we present partially collapsed Gibbs sampling strategies that improve the convergence by capitalizing on a set of functionally incompatible conditional distributions. Such incompatibility generally is avoided in the construction of a Gibbs sampler, because the resulting convergence properties are not well understood. We introduce three basic tools (marginalization, permutation, and trimming) that allow us to transform a Gibbs sampler into a partially collapsed Gibbs sampler with known stationary distribution and faster convergence.
Journal Article
Parameter Expanded Algorithms for Bayesian Latent Variable Modeling of Genetic Pleiotropy Data
2016
Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes. The models studied here can incorporate both continuous and binary responses, and can account for serial and cluster correlations. We consider Bayesian estimation for the model parameters, and we develop a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample from the posterior distribution. We evaluate the proposed method via extensive simulations and demonstrate its utility with an application to an association study of various complication outcomes related to Type 1 diabetes. This article has supplementary material online.
Journal Article
Partially Collapsed Gibbs Samplers: Illustrations and Applications
by Park, Taeyoung; van Dyk, David A.
in AECM algorithm, Astrophysical data analysis, Bayesian Computation
2009
Among the computationally intensive methods for fitting complex multilevel models, the Gibbs sampler is especially popular owing to its simplicity and power to effectively generate samples from a high-dimensional probability distribution. The Gibbs sampler, however, is often justifiably criticized for its sometimes slow convergence, especially when it is used to fit highly structured complex models. The recently proposed Partially Collapsed Gibbs (PCG) sampler offers a new strategy for improving the convergence characteristics of a Gibbs sampler. A PCG sampler achieves faster convergence by reducing the conditioning in some or all of the component draws of its parent Gibbs sampler. Although this strategy can significantly improve convergence, it must be implemented with care to be sure that the desired stationary distribution is preserved. In some cases the set of conditional distributions sampled in a PCG sampler may be functionally incompatible and permuting the order of draws can change the stationary distribution of the chain. In this article, we draw an analogy between the PCG sampler and certain efficient EM-type algorithms that helps to explain the computational advantage of PCG samplers and to suggest when they might be used in practice. We go on to illustrate the PCG samplers in three substantial examples drawn from our applied work: a multilevel spectral model commonly used in high-energy astrophysics, a piecewise-constant multivariate time series model, and a joint imputation model for nonnested data. These are all useful highly structured models that involve computational challenges that can be solved using PCG samplers. The examples illustrate not only the computation advantage of PCG samplers but also how they should be constructed to maintain the desired stationary distribution. Supplemental materials for the examples given in this article are available online.
Journal Article
The Trace Restriction: An Alternative Identification Strategy for the Bayesian Multinomial Probit Model
2012
Previous authors have made Bayesian multinomial probit models identifiable by fixing a parameter on the main diagonal of the covariance matrix. The choice of which element one fixes can influence posterior predictions. Thus, we propose restricting the trace of the covariance matrix, which we achieve without computational penalty. This permits a prior that is symmetric to permutations of the nonbase outcome categories. We find in real and simulated consumer choice datasets that the trace-restricted model is less prone to making extreme predictions. Further, the trace restriction can provide stronger identification, yielding marginal posterior distributions that are more easily interpreted.
Journal Article
Marginal Markov Chain Monte Carlo Methods
2010
Marginal Data Augmentation and Parameter-Expanded Data Augmentation are related methods for improving the convergence properties of the two-step Gibbs sampler known as the Data Augmentation sampler. These methods expand the parameter space with a so-called working parameter that is unidentifiable given the observed data but is identifiable given the so-called augmented data. Although these methods can result in enormous computational gains, their use has been somewhat limited due to the constrained framework they are constructed under and the necessary identification of a working parameter. This article proposes a new prescriptive framework that greatly expands the class of problems that can benefit from the key idea underlying these methods. In particular, we show how working parameters can automatically be introduced into any Gibbs sampler, and explore how they should be updated vis-à-vis the updating of the model parameters in order to either fully or partially marginalize them from the target distribution. A prior distribution is specified on the working parameters and the convergence properties of the Markov chain depend on this choice. Under certain conditions the optimal choice is improper and results in a non-positive recurrent joint Markov chain on the expanded parameter space. This leads to unexplored technical difficulties when one attempts to exploit the computational advantage in multi-step MCMC samplers, the very chains that might benefit most from this technology. In this article we develop strategies and theory that allow optimal marginal methods to be used in multi-step samplers. We illustrate the potential to dramatically improve the convergence properties of MCMC samplers by applying the marginal Gibbs sampler to a logistic mixed model.
Journal Article
Extreme earthquake loss assessment using spliced marginal distributions and SJC Copula based joint modeling
by Zhao, Yu; Li, Yani; Li, Yuhong
in Earthquake catastrophic losses, extreme risk assessment, GAN-based data augmentation
2026
Earthquake hazards, though occurring infrequently, can produce catastrophic losses with pronounced right-skewed and heavy-tailed characteristics in both economic damages and fatalities. These impacts often intensify jointly under extreme conditions, posing challenges for reliable regional catastrophe risk assessment. Using earthquake records from 1980 to 2024 in selected provinces (autonomous regions and municipalities) of China, this study develops variable weights spliced Gumbel–GPD and Weibull–GPD distributions to model the marginal behavior of extreme losses. The dependence between economic losses and casualties—particularly in the upper tail—is captured using the SJC Copula, allowing for asymmetric co-extremal behavior. Due to the limited number of high-loss historical events, we analyze the sensitivity of parameter estimates to sample size through numerical simulations. After confirming that the GAN-generated samples are statistically consistent with the original data, they are employed to strengthen the robustness of parameter estimation. Integrating the spliced marginal models with a copula-based dependence framework, this study evaluates extreme loss levels for different regions under historical seismic conditions. The resulting estimates offer quantitative support for identifying key areas requiring enhanced seismic protection and for informing regional disaster-risk management.
Journal Article
HiDEF: A Hierarchical Disaster Information Extraction Framework Based on Adversarial Augmentation and Dynamic Prompting
by Wang, Xiaodong; Yang, Tengfei; Yang, Xiaohan
in adversarial data augmentation, Architecture, Climate change
2026
In disaster emergency response, spatial location information embedded within social media texts holds substantial value for the rapid localization of affected areas and the implementation of precise rescue operations. Existing research predominantly employs natural language processing and deep learning technologies for geographic information extraction; however, two critical limitations persist: first, insufficient integration of textual semantic features for disaster relevance determination, resulting in inadequate correlation between extracted results and actual disaster locations; second, absence of mechanisms for identifying affected sites in multi-location contexts, thereby compromising decision support efficacy. Addressing these challenges, this study proposes a hierarchical disaster location information extraction framework that integrates semantic understanding. The framework operates through a three-tier hierarchy: data-level adversarial augmentation, semantic-level dynamic parsing, and parameter-level scale optimization. It achieves three core functionalities: (1) precise determination of disaster relevance for geographic location information; (2) identification of affected areas in multi-location contexts; (3) establishment of a logarithmic scaling relationship between LLM parameter scale and optimal prompt sample size.
Journal Article
The Art of Data Augmentation
2001
The term data augmentation refers to methods for constructing iterative optimization or sampling algorithms via the introduction of unobserved data or latent variables. For deterministic algorithms, the method was popularized in the general statistical community by the seminal article by Dempster, Laird, and Rubin on the EM algorithm for maximizing a likelihood function or, more generally, a posterior density. For stochastic algorithms, the method was popularized in the statistical literature by Tanner and Wong's Data Augmentation algorithm for posterior sampling and in the physics literature by Swendsen and Wang's algorithm for sampling from the Ising and Potts models and their generalizations; in the physics literature, the method of data augmentation is referred to as the method of auxiliary variables. Data augmentation schemes were used by Tanner and Wong to make simulation feasible and simple, while auxiliary variables were adopted by Swendsen and Wang to improve the speed of iterative simulation. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art in that successful strategies vary greatly with the (observed-data) models being considered. After an overview of data augmentation/auxiliary variables and some recent developments in methods for constructing such efficient data augmentation schemes, we introduce an effective search strategy that combines the ideas of marginal augmentation and conditional augmentation, together with a deterministic approximation method for selecting good augmentation schemes. We then apply this strategy to three common classes of models (specifically, multivariate t, probit regression, and mixed-effects models) to obtain efficient Markov chain Monte Carlo algorithms for posterior sampling. 
We provide theoretical and empirical evidence that the resulting algorithms, while requiring similar programming effort, can show dramatic improvement over the Gibbs samplers commonly used for these models in practice. A key feature of all these new algorithms is that they are positive recurrent subchains of nonpositive recurrent Markov chains constructed in larger spaces.
Journal Article
Cross-Fertilizing Strategies for Better EM Mountain Climbing and DA Field Exploration: A Graphical Guide Book
2010
In recent years, a variety of extensions and refinements have been developed for data augmentation based model fitting routines. These developments aim to extend the application, improve the speed and/or simplify the implementation of data augmentation methods, such as the deterministic EM algorithm for mode finding and stochastic Gibbs sampler and other auxiliary-variable based methods for posterior sampling. In this overview article we graphically illustrate and compare a number of these extensions, all of which aim to maintain the simplicity and computation stability of their predecessors. We particularly emphasize the usefulness of identifying similarities between the deterministic and stochastic counterparts as we seek more efficient computational strategies. We also demonstrate the applicability of data augmentation methods for handling complex models with highly hierarchical structure, using a high-energy high-resolution spectral imaging model for data from satellite telescopes, such as the Chandra X-ray Observatory.
Journal Article