Catalogue Search | MBRL

Genetic Association in Multivariate Phenotypic Data: Power in Five Models

by Boomsma, Dorret I.; Minica, Camelia C.; Dolan, Conor V. in Association, Case studies, Diseases in twins

2010

This article concerns the power of various data analytic strategies to detect the effect of a single genetic variant (GV) in multivariate data. We simulated exactly fitting monozygotic and dizygotic phenotypic data according to single and two common factor models, and simplex models. We calculated the power to detect the GV in twin 1 data in an ANOVA of phenotypic sum scores, in a MANOVA, and in exploratory factor analysis (EFA), in which the common factors are regressed on the genetic variant. We also report power in the full twin model, and power of the single phenotype ANOVA. The results indicate that (1) if the GV affects all phenotypes, the sum score ANOVA and the EFA are most powerful, while the MANOVA is less powerful. Increasing phenotypic correlations further decreases the power of the MANOVA; and (2) if the GV affects only a subset of the phenotypes, the EFA or the MANOVA are most powerful, while sum score ANOVA is less powerful. In this case, an increase in phenotypic correlations may enhance the power of MANOVA and EFA. If the effect of the GV is modeled directly on the phenotypes in the EFA, the power of the EFA is approximately equal to the power of the MANOVA.
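
To make the sum-score ANOVA concrete, here is a minimal Python sketch, not the authors' simulation code. It assumes a hypothetical one-factor model in which the genetic variant (coded as 0, 1 or 2 minor alleles) shifts the common factor and therefore every phenotype. The sketch sums the phenotypes per individual and computes the one-way ANOVA F statistic across genotype groups. All function names and parameter values are illustrative assumptions, and the article worked with exactly fitting twin data rather than sampled unrelated individuals.

```python
import random

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def simulate_one_factor(n, n_pheno, loading, gv_effect, maf, seed=1):
    """Hypothetical one-factor model: the GV shifts the common factor,
    so it reaches every phenotype through the (equal) loadings."""
    rng = random.Random(seed)
    resid_sd = (1.0 - loading ** 2) ** 0.5
    rows = []
    for _ in range(n):
        # Genotype = number of minor alleles, drawn under Hardy-Weinberg.
        g = (rng.random() < maf) + (rng.random() < maf)
        factor = gv_effect * g + rng.gauss(0.0, 1.0)
        pheno = [loading * factor + rng.gauss(0.0, resid_sd) for _ in range(n_pheno)]
        rows.append((g, pheno))
    return rows

def sum_score_anova(rows):
    """Group phenotypic sum scores by genotype and return the ANOVA F."""
    by_genotype = {}
    for g, pheno in rows:
        by_genotype.setdefault(g, []).append(sum(pheno))
    groups = [by_genotype[g] for g in sorted(by_genotype) if len(by_genotype[g]) > 1]
    return one_way_anova_f(groups)
```

Power could be approximated by repeating the simulation many times and counting how often F exceeds the critical value of an F(k - 1, n - k) distribution (for example from `scipy.stats.f.ppf`). Effects on only a subset of phenotypes would call for a MANOVA or factor model instead, which this sketch does not cover.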

Journal Article

The normal law under linear restrictions: simulation and estimation via minimax tilting

by Botev, Z. I. in Bayesian analysis, Computer simulation, Data simulation

2017

Simulation from the truncated multivariate normal distribution in high dimensions is a recurrent problem in statistical computing and is typically only feasible by using approximate Markov chain Monte Carlo sampling. We propose a minimax tilting method for exact independently and identically distributed data simulation from the truncated multivariate normal distribution. The new methodology provides both a method for simulation and an efficient estimator for hitherto intractable Gaussian integrals. We prove that the estimator has a rare vanishing relative error asymptotic property. Numerical experiments suggest that the scheme proposed is accurate in a wide range of set-ups for which competing estimation schemes fail. We give an application to exact independently and identically distributed data simulation from the Bayesian posterior of the probit regression model.
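
The failure of naive Monte Carlo for rare Gaussian events, and the way tilting repairs it, can already be seen in one dimension. The sketch below is a simplified stand-in, not Botev's algorithm: it estimates P(Z > a) by importance sampling from a mean-shifted normal proposal N(μ, 1). Among shifts μ ≥ 0, μ = a minimises the worst-case likelihood ratio exp(μ²/2 - μz) over z > a, which is the one-dimensional, untruncated analogue of the minimax choice. The multivariate method, with its sequentially truncated proposal and optimised tilting vector, is not reproduced.

```python
import math
import random

def tail_prob_exact(a):
    """P(Z > a) for a standard normal Z."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def tail_prob_naive(a, n, seed=0):
    """Crude Monte Carlo: fraction of N(0, 1) draws that land above a."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) > a for _ in range(n)) / n

def tail_prob_tilted(a, n, seed=0):
    """Importance sampling with the shifted proposal N(mu, 1), mu = a."""
    rng = random.Random(seed)
    mu = a
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu, 1.0)
        if z > a:
            # Likelihood ratio phi(z) / phi(z - mu) = exp(mu^2 / 2 - mu * z).
            total += math.exp(mu * mu / 2.0 - mu * z)
    return total / n
```

For a = 5 the target probability is about 2.9 × 10⁻⁷, so tens of thousands of naive draws almost always return 0, while the tilted estimator lands within a few percent of the exact value.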

Journal Article

The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data

by Fearnhead, Paul; Bierkens, Joris; Roberts, Gareth in Algorithms, Bayesian analysis, Big Data

2019

Standard MCMC methods can scale poorly to big data settings due to the need to evaluate the likelihood at each iteration. There have been a number of approximate MCMC algorithms that use sub-sampling ideas to reduce this computational burden, but with the drawback that these algorithms no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multidimensional version of the Zig-Zag process of [Ann. Appl. Probab. 27 (2017) 846–882], a continuous-time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction (a property which is known to inhibit rapid convergence), the Zig-Zag process offers a flexible nonreversible alternative which we observe to often have favourable convergence properties. We show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a sub-sampling version of the Zig-Zag process that is an example of an exact approximate scheme, that is, the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then the Zig-Zag process can be super-efficient: after an initial preprocessing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
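
Simulation without discretisation error is easiest to see in the simplest case: a one-dimensional standard normal target with potential U(x) = x²/2. Along the ray x + θs the switching rate is max(0, θx + s), whose integral can be inverted in closed form, so each event time is drawn exactly. This Python sketch omits the sub-sampling and control-variate machinery that the abstract identifies as the main contribution; the function name and return values are illustrative.

```python
import math
import random

def zigzag_gaussian_1d(n_events, x0=0.0, theta0=1.0, seed=0):
    """Exact simulation of a 1-D Zig-Zag process targeting N(0, 1).

    Returns time-averaged estimates of E[X] and E[X^2] along the path.
    """
    rng = random.Random(seed)
    x, theta = x0, theta0
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        a = theta * x
        e = rng.expovariate(1.0)
        # Solve integral_0^tau max(0, a + s) ds = e for the event time tau.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + theta * tau
        # Exact integrals of x and x^2 along the linear segment (theta = +/-1).
        int_x += (x_new ** 2 - x ** 2) / (2.0 * theta)
        int_x2 += (x_new ** 3 - x ** 3) / (3.0 * theta)
        total_time += tau
        x, theta = x_new, -theta  # flip the velocity at the event
    return int_x / total_time, int_x2 / total_time
```

Because the path is piecewise linear, expectations are estimated by integrating along the trajectory rather than by averaging discrete samples.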

Journal Article

Robust Bayesian Inference via Coarsening

by Dunson, David B.; Miller, Jeffrey W. in Algorithms, Autoregressive models, Bayesian analysis

2019

The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. We introduce a novel approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the event that the observed data are generated by the model, one conditions on the event that the model generates data close to the observed data, in a distributional sense. When closeness is defined in terms of relative entropy, the resulting "coarsened" posterior can be approximated by simply tempering the likelihood, that is, by raising the likelihood to a fractional power; thus, inference can usually be implemented via standard algorithms, and one can even obtain analytical solutions when using conjugate priors. Some theoretical properties are derived, and we illustrate the approach with real and simulated data using mixture models and autoregressive models of unknown order. Supplementary materials for this article are available online.
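
Because the relative-entropy coarsened posterior reduces to a tempered likelihood, the conjugate case has a closed form. The sketch below assumes a normal mean with known noise variance and a normal prior. The exponent ζ is taken as an input: the rule linking ζ to the coarsening level and the sample size is derived in the article and is not reproduced here. The function name and defaults are illustrative.

```python
def power_posterior_normal_mean(data, zeta, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior of a normal mean when the likelihood is raised to the power zeta.

    Model: mu ~ N(prior_mean, prior_var), x_i | mu ~ N(mu, noise_var), noise_var known.
    Tempering multiplies the data precision by zeta, so zeta = 1 gives the
    standard posterior and zeta = 0 returns the prior. Returns (mean, variance).
    """
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    n = len(data)
    precision = 1.0 / prior_var + zeta * n / noise_var
    mean = (prior_mean / prior_var + zeta * sum(data) / noise_var) / precision
    return mean, 1.0 / precision
```

Smaller ζ down-weights the data and widens the posterior, which is how tempering guards against overconfident conclusions under slight misspecification.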

Journal Article

An Initial Assessment of Antarctic Sea Ice Extent in the CMIP5 Models

by Hosking, J. Scott; Phillips, Tony; Bracegirdle, Thomas J. in Annual variations, Antarctic sea ice, Antarctica

2013

This paper examines the annual cycle and trends in Antarctic sea ice extent (SIE) for 18 models used in phase 5 of the Coupled Model Intercomparison Project (CMIP5) that were run with historical forcing from the 1850s to 2005. Many of the models have an annual SIE cycle that differs markedly from that observed over the last 30 years. The majority of models have too small an SIE at the minimum in February, while several of the models have less than two-thirds of the observed SIE at the September maximum. In contrast to the satellite data, which exhibit a slight increase in SIE, the mean SIE of the models over 1979–2005 shows a decrease in each month, with the greatest multimodel mean percentage monthly decline of 13.6% decade⁻¹ in February and the greatest absolute loss of ice of −0.40 × 10⁶ km² decade⁻¹ in September. The models have very large differences in SIE over 1860–2005. Most of the control runs have statistically significant trends in SIE over their full time span, and all of the models have a negative trend in SIE since the mid-nineteenth century. The negative SIE trends in most of the model runs over 1979–2005 are a continuation of an earlier decline, suggesting that the processes responsible for the observed increase over the last 30 years are not being simulated correctly.
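
The trend figures above are slopes expressed per decade. A minimal sketch of that arithmetic follows. It assumes an ordinary least-squares trend on annual values and expresses the percentage relative to the period mean; the abstract states neither the fitting method nor the baseline for the percentages, so both are assumptions.

```python
def linear_trend_per_decade(years, values):
    """Ordinary least-squares slope of values on years, scaled to units per decade."""
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    sxy = sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    sxx = sum((y - mean_year) ** 2 for y in years)
    return 10.0 * sxy / sxx

def percent_trend_per_decade(years, values):
    """Trend as a percentage of the period-mean value per decade (assumed baseline)."""
    return 100.0 * linear_trend_per_decade(years, values) / (sum(values) / len(values))
```

For example, a series that falls by 0.1 units per year over 1979–2005 from a starting value of 10 has a trend of −1.0 units per decade, or about −11.5% of its period mean (8.7) per decade.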

Journal Article

RE-EM trees: a data mining approach for longitudinal and clustered data

by Simonoff, Jeffrey S.; Sela, Rebecca J. in Applied sciences, Artificial Intelligence, Clustering

2012

Longitudinal data refer to the situation where repeated observations are available for each sampled object. Clustered data, where observations are nested in a hierarchical structure within objects (without time necessarily being involved), represent a similar type of situation. Methodologies that take this structure into account allow for the possibility of systematic differences between objects that are not related to attributes, and of autocorrelation within objects across time periods. A standard methodology in the statistics literature for this type of data is the mixed effects model, where these differences between objects are represented by so-called “random effects” that are estimated from the data (population-level relationships are termed “fixed effects,” together resulting in a mixed effects model). This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also apply it to a smaller data set examining accident fatalities, and show that the RE-EM tree strongly outperforms a tree without random effects while performing comparably to a linear model with random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to using linear models with random effects in more general situations.
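
A much-simplified Python sketch of the alternating idea behind combining trees with random effects follows. To stay short it swaps the regression tree for a hypothetical single-split stump and replaces the mixed-model step with random-intercept estimates under a fixed, user-supplied variance ratio. Both substitutions are assumptions for illustration, not the published RE-EM estimator.

```python
def fit_stump(xs, ys):
    """Hypothetical stand-in for a regression tree: the best single split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2.0, ml, mr)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def re_em_stump(xs, ys, clusters, shrink=0.0, n_iter=20):
    """Alternate between a stump fit to y - b and random-intercept updates.

    shrink is an assumed, fixed variance ratio sigma^2 / tau^2: each intercept
    is the cluster's mean residual scaled by n_i / (n_i + shrink).
    """
    b = {c: 0.0 for c in set(clusters)}
    tree = None
    for _ in range(n_iter):
        # Step 1: fit the "tree" to responses with the current random effects removed.
        tree = fit_stump(xs, [y - b[c] for y, c in zip(ys, clusters)])
        # Step 2: re-estimate each cluster's intercept from the tree's residuals.
        resid = {}
        for x, y, c in zip(xs, ys, clusters):
            resid.setdefault(c, []).append(y - tree(x))
        b = {c: sum(r) / (len(r) + shrink) for c, r in resid.items()}
    return tree, b
```

With shrink = 0 the intercepts are plain cluster means of the residuals; larger values pull them toward zero, mimicking the shrinkage a mixed model would apply.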

Journal Article

On the Concept of Depth for Functional Data

by López-Pintado, Sara; Romo, Juan in Applications, Boys, Child development

2009

The statistical analysis of functional data is a growing need in many research areas. In particular, a robust methodology is important to study curves, which are the output of many experiments in applied statistics. As a starting point for this robust analysis, we propose, analyze, and apply a new definition of depth for functional observations based on the graphic representation of the curves. Given a collection of functions, it establishes the "centrality" of an observation and provides a natural center-outward ordering of the sample curves. Robust statistics, such as the median function or a trimmed mean function, can be defined from this depth definition. Its finite-dimensional version provides a new depth for multivariate data that is computationally feasible and useful for studying high-dimensional observations. Thus, this new depth is also suitable for complex observations such as microarray data, images, and those arising in some recent marketing and financial studies. Natural properties of these new concepts are established and the uniform consistency of the sample depth is proved. Simulation results show that the corresponding depth-based trimmed mean presents better performance than other possible location estimators proposed in the literature for some contaminated models. Data depth can also be used to screen for outliers. The ability of the new notions of depth to detect "shape" outliers is presented. Several real datasets are considered to illustrate this new concept of depth, including applications to microarray observations, weather data, and growth curves. Finally, through this depth, we generalize to functions the Wilcoxon rank sum test. It allows testing whether two groups of curves come from the same population. This functional rank test, when applied to children's growth curves, shows different growth patterns for boys and girls.
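
A depth built on the graphs of the curves can be sketched directly for curves sampled on a common grid. The code below assumes bands formed by pairs of curves (J = 2): a curve's band depth is the fraction of all pairs whose pointwise min/max band contains the whole curve. It also includes a "modified" variant that averages the fraction of the grid on which the curve lies inside each band. Treat it as an illustrative reading of the definitions rather than the authors' implementation.

```python
from itertools import combinations

def band_depth(curves):
    """Band depth with J = 2 for curves given as equal-length lists of values.

    Pairs that include the curve itself are counted, as in the sample version.
    """
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for x in curves:
        inside = sum(
            all(min(a, b) <= v <= max(a, b) for a, b, v in zip(curves[i], curves[j], x))
            for i, j in pairs
        )
        depths.append(inside / len(pairs))
    return depths

def modified_band_depth(curves):
    """Average, over pairs, of the fraction of grid points inside the pair's band."""
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for x in curves:
        total = 0.0
        for i, j in pairs:
            covered = sum(min(a, b) <= v <= max(a, b)
                          for a, b, v in zip(curves[i], curves[j], x))
            total += covered / len(x)
        depths.append(total / len(pairs))
    return depths
```

The deepest curve serves as a functional median, and a curve that crosses the others (a shape outlier) tends to get low band depth even when its values stay in range, because it rarely lies inside a band over the whole grid.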

Journal Article

A semantic matching energy function for learning with multi-relational data

by Bordes, Antoine; Glorot, Xavier; Weston, Jason in Applied sciences, Architecture, Artificial Intelligence

2014

Large-scale relational learning becomes crucial for handling the huge amounts of structured data generated daily in many application domains ranging from computational biology and information retrieval to natural language processing. In this paper, we present a new neural network architecture designed to embed multi-relational graphs into a flexible continuous vector space in which the original data is kept and enhanced. The network is trained to encode the semantics of these graphs in order to assign high probabilities to plausible components. We empirically show that it reaches competitive performance in link prediction on standard datasets from the literature as well as on data from a real-world knowledge base (WordNet). In addition, we present how our method can be applied to perform word-sense disambiguation in a context of open-text semantic parsing, where the goal is to learn to assign a structured meaning representation to almost any sentence of free text, demonstrating that it can scale up to tens of thousands of nodes and thousands of types of relation.
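
The abstract does not spell out the energy function. Assuming the linear form of the semantic matching energy, the relation embedding is combined separately with the left and right entity embeddings, and the two results are compared by a dot product. A single energy evaluation then looks like this in plain Python. The parameter names and shapes are hypothetical, and training (typically a ranking objective over corrupted triples) is not shown.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vec_add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(parts) for parts in zip(*vectors)]

def sme_linear_energy(e_lhs, e_rel, e_rhs, params):
    """Energy of a (lhs, relation, rhs) triple; lower means more plausible.

    Assumed linear form:
      E = -(W_l1 e_lhs + W_l2 e_rel + b_l) . (W_r1 e_rhs + W_r2 e_rel + b_r)
    """
    left = vec_add(matvec(params["W_l1"], e_lhs), matvec(params["W_l2"], e_rel), params["b_l"])
    right = vec_add(matvec(params["W_r1"], e_rhs), matvec(params["W_r2"], e_rel), params["b_r"])
    return -sum(a * b for a, b in zip(left, right))
```

Because the relation enters both sides, the same entity pair can receive different energies under different relations, which is what lets one network score many relation types.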

Journal Article

Evaluation of Temperature and Precipitation Trends and Long-Term Persistence in CMIP5 Twentieth-Century Climate Simulations

by Kinter, James L.; Merwade, Venkatesh; Niyogi, Dev in 20th century, Climate, Climate change

2013

The authors have analyzed twentieth-century temperature and precipitation trends and long-term persistence from 19 climate models participating in phase 5 of the Coupled Model Intercomparison Project (CMIP5). This study is focused on continental areas (60°S–60°N) during 1930–2004 to ensure higher reliability in the observations. A nonparametric trend detection method is employed, and long-term persistence is quantified using the Hurst coefficient, taken from the hydrology literature. The authors found that the multimodel ensemble–mean global land–average temperature trend (0.07°C decade⁻¹) captures the corresponding observed trend well (0.08°C decade⁻¹). Globally, the spatial distribution of precipitation trends is centered at about zero in both the models and the observations. There are large uncertainties in the simulation of regional-/local-scale temperature and precipitation trends. The models’ relative performances are different for temperature and precipitation trends. The models capture the long-term persistence in temperature reasonably well. The areal coverage of observed long-term persistence in precipitation is 60% less (32% of land area) than that of temperature (78%). The models have limited capability to capture the long-term persistence in precipitation. Most climate models underestimate the spatial variability in temperature trends. The multimodel ensemble–average trend generally provides a conservative estimate of local/regional trends. The results of this study are generally not biased by the choice of observation datasets used, including Climatic Research Unit Time Series 3.1; temperature data from Hadley Centre/Climatic Research Unit, version 4; and precipitation data from Global Historical Climatology Network, version 2.
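
The abstract does not name its nonparametric trend test. A common choice in hydroclimatology is the Mann-Kendall test, often paired with the Theil-Sen slope estimator. The sketch below assumes that pairing and ignores ties and serial correlation, which a real analysis, especially one concerned with long-term persistence, would need to handle.

```python
import math
from statistics import median

def mann_kendall(series):
    """Mann-Kendall S statistic and normal-approximation Z (no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

def sen_slope(series):
    """Theil-Sen slope: median of all pairwise slopes, in units per time step."""
    n = len(series)
    return median((series[j] - series[i]) / (j - i)
                  for i in range(n) for j in range(i + 1, n))
```

A |Z| above about 1.96 indicates a trend at the 5% level under the independence assumption; positive serial correlation inflates the variance of S, which is one reason persistence matters for trend detection.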

Journal Article

Classifier chains for multi-label classification

by Holmes, Geoff; Pfahringer, Bernhard; Frank, Eibe in Acceptability, Algorithmics. Computability. Computer arithmetics, Applied sciences

2011

The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has often been overlooked in the literature due to the perceived inadequacy of not directly modelling label correlations. Most current methods invest considerable complexity to model interdependencies between labels. This paper shows that binary relevance-based methods have much to offer, and that high predictive performance can be obtained without impeding scalability to large datasets. We exemplify this with a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity. We extend this approach further in an ensemble framework. An extensive empirical evaluation covers a broad range of multi-label datasets with a variety of evaluation metrics. The results illustrate the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
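
The chaining idea can be sketched in a few lines: each binary classifier receives the original features plus the labels earlier in the chain, using the true labels during training and the chain's own predictions at test time. The 1-nearest-neighbour base learner below is a hypothetical stand-in chosen for brevity, and the ensemble extension is not shown.

```python
def nearest_neighbour_fit(X, y):
    """Hypothetical base learner: 1-nearest-neighbour on squared Euclidean distance."""
    data = list(zip(X, y))

    def predict(x):
        return min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[1]

    return predict

class ClassifierChain:
    """Binary relevance in which each classifier also sees the earlier labels."""

    def __init__(self, base_fit=nearest_neighbour_fit):
        self.base_fit = base_fit
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for j in range(len(Y[0])):
            # Training inputs: original features plus the true labels 0..j-1.
            Xj = [list(x) + list(row[:j]) for x, row in zip(X, Y)]
            self.models.append(self.base_fit(Xj, [row[j] for row in Y]))
        return self

    def predict(self, x):
        preds = []
        for model in self.models:
            # At test time the chain feeds in its own earlier predictions.
            preds.append(model(list(x) + preds))
        return preds
```

Each model is still an ordinary binary classifier, so the chain keeps the scalability of binary relevance while letting later labels condition on earlier ones; the result can depend on the chain order, which the ensemble version addresses.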

Journal Article