Catalogue Search | MBRL

Predictive Analytics in Information Systems Research

by Koppius, Otto R.; Shmueli, Galit in Analytics; Auctions; Empirical modeling

2011

This research essay highlights the need to integrate predictive analytics into information systems research and shows several concrete ways in which this goal can be accomplished. Predictive analytics include empirical methods (statistical and other) that generate data predictions as well as methods for assessing predictive power. Predictive analytics not only assist in creating practically useful models, they also play an important role alongside explanatory modeling in theory building and theory testing. We describe six roles for predictive analytics: new theory generation, measurement development, comparison of competing theories, improvement of existing models, relevance assessment, and assessment of the predictability of empirical phenomena. Despite the importance of predictive analytics, we find that they are rare in the empirical IS literature. Extant IS literature relies nearly exclusively on explanatory statistical modeling, where statistical inference is used to test and evaluate the explanatory power of underlying causal models, and predictive power is assumed to follow automatically from the explanatory model. However, explanatory power does not imply predictive power and thus predictive analytics are necessary for assessing predictive power and for building empirical models that predict well. To show that predictive analytics and explanatory statistical modeling are fundamentally disparate, we show that they are different in each step of the modeling process. These differences translate into different final models, so that a pure explanatory statistical model is best tuned for testing causal hypotheses and a pure predictive model is best in terms of predictive power. We convert a well-known explanatory paper on TAM to a predictive context to illustrate these differences and show how predictive analytics can add theoretical and practical value to IS research.

Journal Article

Equivalence of distance-based and RKHS-based statistics in hypothesis testing

by Sriperumbudur, Bharath; Fukumizu, Kenji; Sejdinovic, Dino in 46E22; 62G10; 62H20

2013

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

Journal Article
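
The equivalence described in the abstract above can be checked numerically on a small one-dimensional sample. The sketch below is an illustration only, not the authors' code. It assumes the absolute difference as the semimetric, an arbitrary centre point `z0`, plug-in (V-statistic) averages, and a normalisation under which the energy distance equals twice the squared MMD. All function and parameter names are hypothetical.

```python
import random


def mean_abs(a, b):
    """Average absolute difference over all pairs (u, v) with u in a, v in b."""
    return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))


def energy_distance(x, y):
    """Plug-in estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return 2 * mean_abs(x, y) - mean_abs(x, x) - mean_abs(y, y)


def distance_kernel(u, v, z0=0.0):
    """Kernel induced by the distance |u - v|, centred at z0 (assumed form)."""
    return 0.5 * (abs(u - z0) + abs(v - z0) - abs(u - v))


def mean_kernel(a, b, z0):
    """Average kernel value over all pairs (u, v) with u in a, v in b."""
    return sum(distance_kernel(u, v, z0) for u in a for v in b) / (len(a) * len(b))


def mmd_squared(x, y, z0=0.0):
    """Plug-in estimate of the squared MMD under the distance kernel."""
    return mean_kernel(x, x, z0) + mean_kernel(y, y, z0) - 2 * mean_kernel(x, y, z0)


# Two small samples whose means differ by 0.5 (made-up data).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(60)]
y = [rng.gauss(0.5, 1.0) for _ in range(50)]

gap = abs(energy_distance(x, y) - 2 * mmd_squared(x, y))
print(f"|energy distance - 2 * MMD^2| = {gap:.2e}")
```

In this sketch the terms involving `z0` cancel when the three kernel averages are combined, so the value of `mmd_squared` does not depend on the chosen centre.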

Toward more accurate contextualization of the CEO effect on firm performance

by Hambrick, Donald C.; Quigley, Timothy J. in Business entities; CEO effect; CEOs

2014

We introduce multiple refinements to the standard method for assessing CEO effects on performance, variance partitioning methodology, more accurately contextualizing CEOs' contributions. Based on a large 20-year sample, our new 'CEO in Context' technique points to a much larger aggregate CEO effect than is obtained from typical approaches. As a validation test, we show that our technique yields estimates of CEO effects more in line with what would be expected from accepted theory about CEO influence on performance. We do this by examining the CEO effects in subsamples of low-, medium-, and high-discretion industries. Finally, we show that our technique generates substantially different—and we argue more logical—estimates of the effects of many individual CEOs than are obtained through customary analyses.

Journal Article

On the meta-analysis of response ratios for studies with correlated and multi-group designs

by Lajeunesse, Marc J. in Animal and plant ecology; Animal, plant and microbial ecology; Biological and medical sciences

2011

A common effect size metric used to quantify the outcome of experiments for ecological meta-analysis is the response ratio (RR): the log proportional change in the means of a treatment and control group. Estimates of the variance of RR are also important for meta-analysis because they serve as weights when effect sizes are averaged and compared. The variance of an effect size is typically a function of sampling error; however, it can also be influenced by study design. Here, I derive new variances and covariances for RR for several often-encountered experimental designs: when the treatment and control means are correlated; when multiple treatments have a common control; when means are based on repeated measures; and when the study has a correlated factorial design, or is multivariate. These developments are useful for improving the quality of data extracted from studies for meta-analysis and help address some of the common challenges meta-analysts face when quantifying a diversity of experimental designs with the response ratio.

Journal Article
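
As a rough illustration of the quantities named in the abstract above, the sketch below computes the log response ratio and a sampling variance for it. The independent-groups variance is the usual large-sample form. The correlated case is a first-order delta-method approximation written for this example and is not claimed to reproduce the exact expressions derived in the article. Function names, parameter names, and the numbers are hypothetical.

```python
import math


def log_response_ratio(mean_t, mean_c):
    """Log proportional change between treatment and control means."""
    return math.log(mean_t / mean_c)


def rr_variance(mean_t, sd_t, n_t, mean_c, sd_c, n_c, r=0.0):
    """Approximate sampling variance of the log response ratio.

    r is the assumed correlation between paired treatment and control
    measurements; r = 0.0 gives the independent-groups variance.
    """
    var = sd_t ** 2 / (n_t * mean_t ** 2) + sd_c ** 2 / (n_c * mean_c ** 2)
    if r != 0.0:
        if n_t != n_c:
            raise ValueError("a paired design needs equal group sizes")
        # Covariance term from the delta method for paired means.
        var -= 2 * r * sd_t * sd_c / (n_t * mean_t * mean_c)
    return var


# Made-up summary statistics for one study.
rr = log_response_ratio(12.0, 10.0)
var_independent = rr_variance(12.0, 3.0, 20, 10.0, 2.0, 20)
var_paired = rr_variance(12.0, 3.0, 20, 10.0, 2.0, 20, r=0.5)
print(rr, var_independent, var_paired)
```

A positive correlation between the paired means lowers the variance, which in turn raises the weight that study would receive when effect sizes are averaged.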

Optimal Detection of Changepoints With a Linear Computational Cost

by Killick, R.; Fearnhead, P.; Eckley, I. A. in Algorithms; Computational methods; Cost efficiency

2012

In this article, we consider the problem of detecting multiple changepoints in large datasets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example, in genetics as we analyze larger regions of the genome, or in finance as we observe time series over longer periods. We consider the common approach of detecting changepoints through minimizing a cost function over possible numbers and locations of changepoints. This includes several established procedures for detecting changepoints, such as penalized likelihood and minimum description length. We introduce a new method for finding the minimum of such cost functions and hence the optimal number and location of changepoints that has a computational cost, which, under mild conditions, is linear in the number of observations. This compares favorably with existing methods for the same problem whose computational cost can be quadratic or even cubic. In simulation studies, we show that our new method can be orders of magnitude faster than these alternative exact methods. We also compare with the binary segmentation algorithm for identifying changepoints, showing that the exactness of our approach can lead to substantial improvements in the accuracy of the inferred segmentation of the data. This article has supplementary materials available online.

Journal Article
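
The idea in the abstract above, minimising a penalised cost exactly while discarding candidate changepoints that can no longer be optimal, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the segment cost is the residual sum of squares around the segment mean (a change-in-mean model), the pruning constant is taken to be zero for that cost, and the name `pelt_sketch`, its parameters, and the example data are hypothetical.

```python
def pelt_sketch(y, penalty, prune=True):
    """Return changepoint positions minimising total segment cost + penalty per segment.

    A returned position t means a new segment starts at index t (0-based).
    With prune=False this reduces to plain optimal partitioning.
    """
    n = len(y)
    # Prefix sums give each segment cost in constant time.
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(s, t):
        # Residual sum of squares of y[s:t] around its own mean.
        seg = s1[t] - s1[s]
        return (s2[t] - s2[s]) - seg * seg / (t - s)

    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of y[:t]
    best[0] = -penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        values = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(values)
        if prune:
            # Keep s only if it could still start the final segment later on.
            candidates = [s for v, s in values if v - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts.
    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)


# Made-up series with two level shifts.
series = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
print(pelt_sketch(series, penalty=8.0))
```

Setting `prune=False` gives a slower reference run against which the pruned result can be compared; the two should agree because pruning only removes candidates that cannot be optimal.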

Quantifying individual variation in behaviour: mixed‐effect modelling approaches

by Dingemanse, Niels J; Dochtermann, Ned A; Pol, Martijn in 'HOW TO...' PAPER; accuracy; Animal and plant ecology

2013

Growing interest in proximate and ultimate causes and consequences of between‐ and within‐individual variation in labile components of the phenotype – such as behaviour or physiology – characterizes current research in evolutionary ecology. The study of individual variation requires tools for quantification and decomposition of phenotypic variation into between‐ and within‐individual components. This is essential as variance components differ in their ecological and evolutionary implications. We provide an overview of how mixed‐effect models can be used to partition variation in, and correlations among, phenotypic attributes into between‐ and within‐individual variance components. Optimal sampling schemes to accurately estimate (with sufficient power) a wide range of repeatabilities and key (co)variance components, such as between‐ and within‐individual correlations, are detailed. Mixed‐effect models enable the usage of unambiguous terminology for patterns of biological variation that currently lack a formal statistical definition (e.g. ‘animal personality’ or ‘behavioural syndromes’), and facilitate cross‐fertilisation between disciplines such as behavioural ecology, ecological physiology and quantitative genetics.

Journal Article
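
The partition into between- and within-individual variance described in the abstract above can be illustrated with a small calculation of repeatability, the share of total variance that lies between individuals. The article works with mixed-effect models; the sketch below swaps in the simpler one-way ANOVA estimator for a balanced design (the same number of measurements per individual), so it is a stand-in rather than the approach the authors describe. The function name and the data are hypothetical.

```python
def repeatability(groups):
    """ANOVA-based repeatability for a balanced design.

    groups: one list of repeated measurements per individual.
    Returns (between-individual variance, within-individual variance, repeatability).
    """
    g = len(groups)
    k = len(groups[0])
    if g < 2 or k < 2 or any(len(obs) != k for obs in groups):
        raise ValueError("need at least 2 individuals with the same number (>= 2) of measurements")

    means = [sum(obs) / k for obs in groups]
    grand = sum(means) / g

    # Mean squares between and within individuals.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    ms_within = sum(
        (v - m) ** 2 for obs, m in zip(groups, means) for v in obs
    ) / (g * (k - 1))

    var_within = ms_within
    var_between = (ms_between - ms_within) / k  # can be negative in small samples
    return var_between, var_within, var_between / (var_between + var_within)


# Three individuals measured twice each (made-up values).
v_b, v_w, r = repeatability([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
print(v_b, v_w, r)
```

A mixed-effect model fitted to the same balanced data would target the same two variance components, and unlike this shortcut it extends to unbalanced designs and to covariates.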

Equitability, mutual information, and the maximal information coefficient

by Atwal, Gurinder S.; Kinney, Justin B. in Bias; Correlations; data collection

2014

How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical “equitability” has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062):1518–1524], which proposed an alternative definition of equitability and introduced a new statistic, the “maximal information coefficient” (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.

Journal Article
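
For readers unfamiliar with the quantity discussed in the abstract above, the sketch below computes a plug-in estimate of mutual information from paired discrete observations. It only illustrates the definition: the article concerns continuous variables and the behaviour of particular estimators, neither of which this example addresses. The function name and the data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs, base=2.0):
    """Plug-in mutual information of paired discrete observations.

    pairs: list of (x, y) tuples with hashable entries.
    Returns the estimate in units set by `base` (bits for base 2).
    """
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    total = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log[ p(x, y) / (p(x) p(y)) ], written with raw counts.
        total += (c / n) * math.log(c * n / (count_x[x] * count_y[y]), base)
    return total


# Made-up data: y copies x in the first sample and ignores it in the second.
dependent = [(0, 0), (1, 1)] * 50
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
print(mutual_information(dependent), mutual_information(independent))

# Relabelling x through a one-to-one map leaves the estimate unchanged.
relabelled = [(10 * x + 3, y) for x, y in dependent]
print(mutual_information(relabelled))
```

The relabelling step hints at why mutual information does not favour relationships of a particular form: it depends only on the joint distribution, not on how the values are coded.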

To Explain or to Predict?

by Shmueli, Galit in Auctions; Causal explanations; causality

2010

Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

Journal Article

Sub-Gaussian mean estimators

by Oliveira, Roberto I.; Lerasle, Matthieu; Devroye, Luc in Confidence interval; Estimating techniques; Estimators

2016

We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a nonasymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.

Journal Article
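
The abstract above does not name a specific estimator. One classical estimator of the mean that is robust to heavy tails, used here purely as an illustration, is median-of-means: split the sample into blocks, average each block, and take the median of the block averages. The sketch below assumes the observations arrive in random order and treats the number of blocks `k` as a user choice. The function name and the data are hypothetical.

```python
import statistics


def median_of_means(xs, k):
    """Median of the means of k equal-sized consecutive blocks of xs.

    Any observations left over after forming k blocks of size len(xs) // k
    are dropped.
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(xs)")
    size = n // k
    block_means = [
        statistics.fmean(xs[i * size:(i + 1) * size]) for i in range(k)
    ]
    return statistics.median(block_means)


# Made-up sample with one extreme value.
sample = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1000.0]
print(statistics.fmean(sample), median_of_means(sample, 3))
```

A single extreme observation can shift at most one block mean, so the median of the block means moves far less than the ordinary sample mean does.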

Variance estimation using refitted cross‐validation in ultrahigh dimensional regression

by Guo, Shaojun; Hao, Ning; Fan, Jianqing in Asymptotic methods; Asymptotic properties; Correlation

2012

Variance estimation is a fundamental problem in statistical modelling. In ultrahigh dimensional linear regression where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regression make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the level of noise. We propose a two‐stage refitted procedure via a data splitting technique, called refitted cross‐validation, to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two‐stage estimator and the plug‐in one‐stage estimators using the lasso and smoothly clipped absolute deviation are also studied and compared. Their performances can be improved by the refitted cross‐validation method proposed.

Journal Article
