Catalogue Search | MBRL

Scalable importance tempering and Bayesian variable selection

by Zanella, Giacomo; Roberts, Gareth in Algorithms, Bayesian analysis, Bayesian theory

2019

We propose a Monte Carlo algorithm that combines Markov chain Monte Carlo and importance sampling to sample from high-dimensional probability distributions. We provide a careful theoretical analysis, including guarantees on robustness to high dimensionality, an explicit comparison with standard Markov chain Monte Carlo methods, and illustrations of the potential improvements in efficiency. Simple and concrete intuition is provided for when the novel scheme is expected to outperform standard schemes. When applied to Bayesian variable-selection problems, the novel algorithm is orders of magnitude more efficient than available alternative sampling schemes and enables fast and reliable fully Bayesian inference with tens of thousands of regressors.

Journal Article
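The abstract above describes combining Markov chain Monte Carlo with importance sampling. The Python sketch below is a generic illustration of that combination only, not the authors' scheme. It runs a random-walk Metropolis chain on a tempered target π(x)^β and then reweights the draws by π(x)^(1−β) using self-normalized importance sampling. The standard normal target, β = 0.5, the step size, and the function names are hypothetical choices made for the example.

```python
import numpy as np

def log_target(x):
    # Hypothetical target: standard normal log-density, up to a constant
    return -0.5 * x**2

def tempered_mh_with_weights(n_iter=50_000, beta=0.5, step=2.0, seed=0):
    """Metropolis chain on pi**beta, plus normalized weights pi**(1 - beta)."""
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.normal()
        # Metropolis acceptance test for the tempered target pi(x)**beta
        if np.log(rng.uniform()) < beta * (log_target(prop) - log_target(x)):
            x = prop
        samples[i] = x
    # Importance weights map draws from pi**beta back to pi
    log_w = (1.0 - beta) * log_target(samples)
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return samples, w / w.sum()

samples, weights = tempered_mh_with_weights()
# Self-normalized importance-sampling estimate of E[X^2] under pi
est = np.sum(weights * samples**2)
print(est)
```

Under these assumptions, the weighted estimate of E[X²] should be close to 1, which is the exact value for a standard normal. Tempering flattens the target so the chain moves more freely, and the weights undo that distortion.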

Bayesian Estimation of a New Pareto-Type Distribution Based on Mixed Gibbs Sampling Algorithm

by Li, Fanqun; Zhao, Mingtao; Wei, Shanran in Algorithms, Analysis, Artificial intelligence

2024

In this paper, a Bayesian estimation procedure based on the mixed Gibbs sampling algorithm is proposed for a new Pareto-type distribution, for both complete and type II censored samples. Simulation studies show that the proposed method consistently outperforms maximum likelihood estimation for small samples. An analysis of real data is also provided to test the Bayesian estimation procedure.

Journal Article

A Conversation with Alan Gelfand

by Carlin, Bradley P.; Gelfand, Alan; Herring, Amy H. in Bayes, Bayesian analysis, CCNY

2015

Alan E. Gelfand was born April 17, 1945, in the Bronx, New York. He attended public grade schools and did his undergraduate work at what was then called City College of New York (CCNY, now CUNY), excelling at mathematics. He then surprised and saddened his mother by going all the way across the country to Stanford for graduate school, where he completed his dissertation in 1969 under the direction of Professor Herbert Solomon, making him an academic grandson of Herman Rubin and Harold Hotelling. Alan then accepted a faculty position at the University of Connecticut (UConn), where he was promoted to tenured associate professor in 1975 and to full professor in 1980. A few years later he became interested in decision theory, then empirical Bayes, which eventually led to the publication of Gelfand and Smith [J. Amer. Statist. Assoc. 85 (1990) 398–409], the paper that introduced the Gibbs sampler to most statisticians and revolutionized Bayesian computing. In the mid-1990s, Alan's interests turned strongly to spatial statistics, leading to fundamental contributions in spatially varying coefficient models, coregionalization, and spatial boundary analysis (wombling). He spent 33 years on the faculty at UConn, retiring in 2002 to become the James B. Duke Professor of Statistics and Decision Sciences at Duke University, where he served as chair from 2007 to 2012. At Duke, he has continued his work in spatial methodology while increasing his impact in the environmental sciences. To date, he has published over 260 papers and 6 books; he has also supervised 36 Ph.D. dissertations and 10 postdocs. This interview was conducted just before a conference, held April 19–22, 2015, at Duke University, at which his family, academic descendants, and colleagues celebrated his 70th birthday and his contributions to statistics.

Journal Article

Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey

by Li, Yanchao; Jelodar, Hamed; Yuan, Chi in Data mining, Dirichlet problem, Modelling

2019

Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents. Researchers have published many articles on topic modeling and have applied it in various fields, such as software engineering, political science, and medical and linguistic science. There are various methods for topic modeling; Latent Dirichlet Allocation (LDA) is one of the most popular, and researchers have proposed various models based on it. Building on previous work, this paper aims to serve as a useful introduction to LDA approaches in topic modeling. We investigated scholarly articles published between 2003 and 2016 on LDA-based topic modeling to discover the research development, current trends, and intellectual structure of the field. In addition, we summarize challenges and introduce well-known tools and datasets for LDA-based topic modeling.

Journal Article

A joint model of place of residence (POR) and place of work (POW)

by Habib, Khandker Nurul; Hawkins, Jason; Zhang, Hengyang in Commuting, Conditional probabilities, Household income

2019

Place of residence (POR) and place of work (POW) are two spatial pivots that define patterns of travel behavior. They are considered long-term choices that influence short-term daily travel choices. Hence, POR-POW distributions are inputs to almost all daily travel demand models. However, in many cases POR-POW is modelled in an ad hoc way, using gravity-based or entropy-maximizing aggregate modelling approaches. A lack of data on the sequence of POR and POW choices is often blamed for the avoidance of disaggregate choice models. Recognizing this data limitation, this paper presents an alternative methodology for modelling the joint POR-POW distribution that uses disaggregate choice models without requiring knowledge of the sequence of POR and POW choices. It breaks joint POR-POW choice probabilities down into conditional probabilities, as in the Gibbs sampling approach. This allows the POR-POW models to capture the effects of household socioeconomic characteristics, zonal land-use characteristics, and modal accessibility factors. The model is applied in a case study of the city of Ottawa. Results reveal that the proposed methodology can replicate observed POR-POW patterns with a high degree of accuracy.

Journal Article
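The abstract above describes breaking joint POR-POW choice probabilities down into conditional probabilities, as in Gibbs sampling. The sketch below illustrates only that generic idea. It alternates draws from P(POR | POW) and P(POW | POR) and then recovers the joint distribution from the visit frequencies. The three-zone joint table and the function name are hypothetical. They stand in for the disaggregate choice models used in the paper, which are not reproduced here.

```python
import numpy as np

# Hypothetical joint POR-POW probabilities: rows = residence zones, cols = work zones
joint = np.array([[0.20, 0.05, 0.05],
                  [0.05, 0.25, 0.05],
                  [0.05, 0.05, 0.25]])

def gibbs_por_pow(joint, n_iter=50_000, seed=1):
    """Alternate conditional draws and tally visits to each (POR, POW) cell."""
    rng = np.random.default_rng(seed)
    n_r, n_w = joint.shape
    # Conditional tables: column-normalized for POR | POW, row-normalized for POW | POR
    p_r_given_w = joint / joint.sum(axis=0, keepdims=True)
    p_w_given_r = joint / joint.sum(axis=1, keepdims=True)
    r, w = 0, 0
    counts = np.zeros_like(joint)
    for _ in range(n_iter):
        r = rng.choice(n_r, p=p_r_given_w[:, w])  # draw residence zone given work zone
        w = rng.choice(n_w, p=p_w_given_r[r, :])  # draw work zone given residence zone
        counts[r, w] += 1
    return counts / n_iter

est = gibbs_por_pow(joint)
print(np.round(est, 3))
```

Neither draw needs the full joint distribution. Each step uses only one conditional. This mirrors the abstract's point that the sequence of POR and POW choices does not have to be known.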

Non-parametric Bayesian inference on bivariate extremes

by Perron, François; Segers, Johan; Guillotte, Simon in Algorithms, Approximation, Atoms

2011

The tail of a bivariate distribution function in the domain of attraction of a bivariate extreme value distribution may be approximated by that of its extreme value attractor. The extreme value attractor has margins that belong to a three-parameter family and a dependence structure that is characterized by a probability measure on the unit interval with mean equal to 1/2, called the spectral measure. Inference is done in a Bayesian framework using a censored likelihood approach. A prior distribution is constructed on an infinite-dimensional model for this measure, the model being at the same time dense and computationally manageable. A trans-dimensional Markov chain Monte Carlo algorithm is developed, and convergence to the posterior distribution is established. In simulations, the Bayes estimator for the spectral measure is shown to compare favourably with frequentist non-parametric estimators. An application to a data set of Danish fire insurance claims is provided.

Journal Article

Approximate Dirichlet Process Computing in Finite Normal Mixtures

by James, Lancelot F.; Ishwaran, Hemant in Almost sure truncation, Approximation, Atoms

2002

A rich nonparametric analysis of the finite normal mixture model is obtained by working with a precise truncation approximation of the Dirichlet process. Model fitting is carried out by a simple Gibbs sampling algorithm that directly samples the nonparametric posterior. The proposed sampler mixes well, requires no tuning parameters, and involves only draws from simple distributions, including the draw for the mass parameter that controls clustering and the draw for the variances under a nonconjugate uniform prior. Working directly with the nonparametric prior is conceptually appealing and, among other things, leads to graphical methods for studying the posterior mixing distribution as well as penalized MLE procedures for deriving point estimates. We discuss methods for automating the selection of priors for the mean and variance components to avoid over- or undersmoothing the data. We also look at the effectiveness of incorporating prior information in the form of frequentist point estimates.

Journal Article
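The abstract above works with a truncation approximation of the Dirichlet process. One common way to build such a truncation is truncated stick-breaking, shown here as an illustrative sketch rather than the authors' sampler. It draws V_k ~ Beta(1, α) for k < N, sets V_N = 1 so the weights sum to one, and pairs each weight with an atom drawn from a base measure. The concentration α = 1, the truncation level N = 50, the N(0, 3²) base measure, and the function names are hypothetical choices.

```python
import numpy as np

def truncated_stick_breaking(alpha, n_atoms, rng):
    """Stick-breaking weights with the last stick set to 1 (truncation at n_atoms)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # last piece takes all remaining mass
    # p_k = V_k * prod_{j<k} (1 - V_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def draw_truncated_dp(alpha=1.0, n_atoms=50, seed=2):
    rng = np.random.default_rng(seed)
    weights = truncated_stick_breaking(alpha, n_atoms, rng)
    # Hypothetical base measure for the mixture means: N(0, 3^2)
    atoms = rng.normal(0.0, 3.0, size=n_atoms)
    return weights, atoms

weights, atoms = draw_truncated_dp()
print(weights.sum())
```

Because the last stick absorbs all leftover mass, the truncated weights form a proper probability vector. For fixed α, the expected mass that the untruncated process would place beyond atom N is (α/(1+α))^(N−1), which shrinks geometrically as N grows.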

Bayesian Multistudy Factor Analysis for High-Throughput Biological Data

by Bellio, Ruggero; De Vito, Roberta; Trippa, Lorenzo

2021

This paper analyzes breast cancer gene expression across seven studies to identify genuine and thus replicable gene patterns shared among these studies. Our premise is that genuine biological signal is more likely than spurious signal to be reproducibly present in multiple studies. Our analysis uses a new modeling strategy for the joint analysis of high-throughput biological studies that simultaneously identifies shared and study-specific signal. To this end, we generalize the multi-study factor analysis model to handle high-dimensional data and extend the sparse Bayesian infinite factor model to this context. We provide strategies for identifying the loading matrices, both common and study-specific. Through extensive simulation analysis, we characterize the performance of the proposed approach in various scenarios and show that it outperforms standard factor analysis in identifying replicable signal in all scenarios considered. The analysis of breast cancer gene expression studies identifies clear replicable gene patterns. These patterns are related to well-known biological pathways involved in breast cancer, such as the ER, cell cycle, immune system, collagen, and metabolic pathways. Some of these patterns are also associated with existing breast cancer subtypes, such as the LumA, Her2, and basal subtypes, while other patterns identify novel pathways that are active across subtypes and missed by hierarchical clustering approaches. The R package MSFA implementing the method is available on GitHub.

Journal Article

Temperament and its heritability in Corriedale and Merino lambs

by Zambra, N.; Gimeno, D.; Blache, D. in Aging, agitation, animal performance

2015

Temperament can be defined as the fearfulness and reactivity of an animal in response to humans and to strange, novel, or threatening environments. The productive performance of an animal is affected by its temperament, and selecting calm animals might improve their adaptation to the farming environment and to handling, as well as their productivity. Temperament was measured in lambs of two sheep breeds in Uruguay. The effects of dam’s age, type of birth, age of the lamb, and contemporary group (CG; lambs belonging to the same year, flock, sex, and rearing group) on the temperament of the lambs, and the heritability of temperament, were estimated with a Bayesian analysis using Gibbs sampling. Overall, 4962 Corriedale lambs and 2952 Merino lambs from 13 farms were tested. Temperament was measured using the isolation box test: each lamb was isolated inside the box for 30 s, and the vibrations produced by its movements were recorded. The average temperament score (±s.e.m.) of the Corriedale lambs was 24.7 (±0.23) and that of the Merino lambs was 36.8 (±0.45). Temperament was not associated with dam’s age, type of birth, or lamb’s age. There were no relevant differences in the agitation score between lambs born in 2010 and 2011. The mean of the distribution of possible heritability values (±s.d.) was 0.18 (±0.05) for the Corriedale and 0.31 (±0.06) for the Merino. The probability that heritability exceeded 0.15 was greater than 70% in the Corriedale and greater than 90% in the Merino. The temperament of Merino and Corriedale sheep in Uruguay is moderately heritable. It is not related to dam’s age, type of birth, or age of the lambs; however, it is affected by some aspect of the CG.

Journal Article

Interactive topic modeling

by Satinoff, Brianna; Smith, Alison; Boyd-Graber, Jordan in Algorithms, Artificial Intelligence, Computer Science

2014

Topic models are a useful and ubiquitous tool for understanding large corpora. However, topic models are not perfect, and for many users in computational social science, digital humanities, and information studies (who are not machine learning experts), existing models and frameworks are often a “take it or leave it” proposition. This paper presents a mechanism for giving users a voice by encoding their feedback into a topic model as correlations between words. This framework, interactive topic modeling (ITM), allows untrained users to encode their feedback into topic models easily and iteratively. Because latency is crucial in interactive systems, we develop more efficient inference algorithms for tree-based topic models. We validate the framework with both simulated and real users.

Journal Article
