Catalogue Search | MBRL
Explore the vast range of titles available.
17 result(s) for "Hadad, Vitor"
Confidence intervals for policy evaluation in adaptive experiments
by Wager, Stefan; Hirshberg, David A.; Athey, Susan
in Algorithms; Confidence intervals; Data collection
2021
Adaptive experimental designs can dramatically improve efficiency in randomized trials. But with adaptively collected data, common estimators based on sample means and inverse propensity-weighted means can be biased or heavy-tailed. This poses statistical challenges, in particular when the experimenter would like to test hypotheses about parameters that were not targeted by the data-collection mechanism. In this paper, we present a class of test statistics that can handle these challenges. Our approach is to adaptively reweight the terms of an augmented inverse propensity-weighting estimator to control the contribution of each term to the estimator’s variance. This scheme reduces overall variance and yields an asymptotically normal test statistic. We validate the accuracy of the resulting estimates and their CIs in numerical experiments and show that our methods compare favorably to existing alternatives in terms of mean squared error, coverage, and CI size.
Journal Article
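The adaptive reweighting the abstract describes can be illustrated in a few lines. The sketch below is a minimal reading of the idea, assuming square-root-of-propensity weights as one variance-stabilizing choice; the paper derives its own weighting schemes, so this is illustrative, not the authors' exact estimator.

```python
import numpy as np

def adaptively_weighted_aipw(y, w, e, mu_hat, arm):
    """Illustrative adaptively weighted AIPW estimate with a
    normal-approximation confidence interval.

    y: observed rewards; w: assigned arms; e: assignment probability
    of `arm` at each step; mu_hat: fitted mean rewards for `arm`.
    """
    # AIPW scores: model prediction plus an inverse-propensity correction
    gamma = mu_hat + (w == arm) / e * (y - mu_hat)
    # Adaptive weights damp terms with tiny propensities (illustrative choice)
    h = np.sqrt(e)
    h = h / h.sum()
    est = float(np.sum(h * gamma))
    # Studentize for a normal-approximation 95% confidence interval
    se = float(np.sqrt(np.sum(h**2 * (gamma - est) ** 2)))
    return est, (est - 1.96 * se, est + 1.96 * se)
```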
Essays in Econometrics and Dynamic Kidney Exchange
2018
This dissertation is divided into two parts. Part I - Dynamic Kidney Exchange. In recent years, kidney paired donation (KPD) has emerged as an attractive alternative for end-stage renal disease patients with incompatible living donors. However, we argue that the matching algorithm currently used by organ clearinghouses is inefficient, in the sense that a larger number of patients may be reached if kidney transplant centers take into consideration how their pool of patients and donors will evolve over time. In our work Two Novel Algorithms for Dynamic Kidney Exchange, we explore this claim and propose new computational algorithms to increase the cardinality of matchings in a discrete-time dynamic kidney exchange model with Poisson entries and Geometric deaths. Our algorithms are classified into direct prediction methods and multi-armed bandit methods. In the direct prediction method, we use a machine learning estimator to produce a probability that each patient-donor pair should be matched today, as opposed to being left for a future matching. The estimators are trained on offline optimal solutions. In contrast, in multi-armed bandit methods, we use simulations to evaluate the desirability of different matchings. Since the number of different matchings is enormous, multi-armed bandits (MAB) are employed to decrease the computational burden. Our methods are evaluated in a variety of simulation configurations. We find that at least one of our methods, based on multi-armed bandit algorithms, uniformly dominates the myopic method used by kidney transplant centers in practice. We restrict our experiments to pairwise kidney exchange, but the methods described here are easily extensible, computational constraints permitting.
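The multi-armed-bandit component described above can be sketched with a simple selection rule. Epsilon-greedy is used here purely as an illustration of choosing among candidate matchings by simulated value; the dissertation's algorithms are more involved, and all names below are hypothetical.

```python
import random

def pick_matching(values, counts, eps=0.1, rng=random):
    """Choose a candidate matching to evaluate next.

    values: running mean simulated rewards per candidate matching.
    counts: how often each candidate has been evaluated.
    With probability eps (or before any evaluations), explore a random
    candidate; otherwise exploit the current best.
    """
    if rng.random() < eps or not any(counts):
        return rng.randrange(len(values))                    # explore
    return max(range(len(values)), key=lambda i: values[i])  # exploit
```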
Part II - Econometrics. In our econometric paper Heterogeneous Production Functions, Panel Data, and Productivity, we present methods for identification of moments and nonparametric marginal distributions of endogenous random coefficient models in fixed-T linear panel data models. Our identification strategy is constructive, immediately leading to relatively simple estimators. Because our strategy makes use of special properties of “small” (measure-zero) subpopulations, our estimators are irregularly identified: they can be shown to be consistent and asymptotically normal, but converge at rates slower than root-n. We provide an illustration of our methods by estimating first and second moments of random Cobb-Douglas coefficients in production functions, using Indian plant-level microdata.
Dissertation
Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles
by Athey, Susan; Krishnamurthy, Sanath Kumar; Hadad, Vitor
in Algorithms; Prediction models; Regression
2021
Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do not rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.
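The "revert to a safe policy" idea can be sketched as a running comparison of mean rewards. This is a toy illustration under assumed names and a simple two-sample test, not the paper's algorithm or its regret analysis.

```python
import numpy as np

def choose_policy(rewards_base, rewards_safe, threshold=0.0, z=2.0):
    """Illustrative reverting rule: compare running mean rewards of the
    model-based ('base') policy and a safe fallback, and revert to the
    safe policy when the base policy is significantly worse.

    threshold and z are illustrative tuning constants.
    """
    n_b, n_s = len(rewards_base), len(rewards_safe)
    if min(n_b, n_s) < 2:
        return "base"  # not enough evidence yet
    diff = np.mean(rewards_safe) - np.mean(rewards_base)
    se = np.sqrt(np.var(rewards_base) / n_b + np.var(rewards_safe) / n_s)
    return "safe" if diff > threshold + z * se else "base"
```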
Tractable contextual bandits beyond realizability
by Athey, Susan; Krishnamurthy, Sanath Kumar; Hadad, Vitor
in Algorithms; Bias; Linear functions
2021
Tractable contextual bandit algorithms often rely on the realizability assumption - i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.
Sufficient Representations for Categorical Variables
by Wager, Stefan; Athey, Susan; Johannemann, Jonathan
in Algorithms; Computer simulation; Machine learning
2021
Many learning algorithms require categorical data to be transformed into real vectors before it can be used as input. Often, categorical variables are encoded as one-hot (or dummy) vectors. However, this mode of representation can be wasteful since it adds many low-signal regressors, especially when the number of unique categories is large. In this paper, we investigate simple alternative solutions for universally consistent estimators that rely on lower-dimensional real-valued representations of categorical variables that are "sufficient" in the sense that no predictive information is lost. We then compare preexisting and proposed methods on simulated and observational datasets.
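One familiar low-dimensional alternative to one-hot encoding is to replace each category with the mean outcome observed for it. The sketch below shows that idea only as context for the abstract; whether such a representation is "sufficient" is exactly what the paper studies, and the function name is illustrative.

```python
import numpy as np

def means_encoding(categories, y):
    """Replace each category with the mean of y observed for that
    category: a 1-dimensional real-valued representation instead of a
    one-hot vector per category."""
    means = {c: y[categories == c].mean() for c in np.unique(categories)}
    return np.array([means[c] for c in categories])
```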
Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits
by Athey, Susan; Zhan, Ruohan; Hadad, Vitor
in Confidence intervals; Data collection; Estimators
2021
It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment assignment policies to guide future innovation or experiments. However, policy evaluation is challenging if the target policy differs from the one used to collect data, and popular estimators, including doubly robust (DR) estimators, can be plagued by bias, excessive variance, or both. In particular, when the pattern of treatment assignment in the collected data looks little like the pattern generated by the policy to be evaluated, the importance weights used in DR estimators explode, leading to excessive variance. In this paper, we improve the DR estimator by adaptively weighting observations to control its variance. We show that a t-statistic based on our improved estimator is asymptotically normal under certain conditions, allowing us to form confidence intervals and test hypotheses. Using synthetic data and public benchmarks, we provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
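The variance problem the abstract points to, and the effect of adaptive weighting, can be seen in a small simulation. Here the weights proportional to the square root of the logging propensity are an illustrative variance-controlling choice, not the paper's exact weighting rule, and the function name is hypothetical.

```python
import numpy as np

def ipw_and_weighted(y, w, e, arm=1):
    """Compare a plain IPW estimate of the value of 'always play arm'
    with an adaptively weighted version.

    y: observed rewards; w: logged arm assignments; e: logging policy's
    probability of `arm` at each step. Tiny propensities make the IPW
    scores (w == arm) / e * y heavy-tailed; convex weights h ~ sqrt(e)
    downweight those terms while preserving unbiasedness.
    """
    g = (w == arm) / e * y        # per-observation IPW scores
    ipw = float(g.mean())         # equal-weight average
    h = np.sqrt(e)
    h = h / h.sum()
    weighted = float(np.sum(h * g))  # adaptively weighted average
    return ipw, weighted
```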
Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning
2022
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted treatment assignment policy, where the goal is to use a participant's survey responses to determine which charity to expose them to in a donation solicitation. The design balances two competing objectives: optimizing the outcomes for the subjects in the experiment ("cumulative regret minimization") and gathering data that will be most useful for policy learning, that is, for learning an assignment rule that will maximize welfare if used after the experiment ("simple regret minimization"). We evaluate alternative experimental designs by collecting pilot data and then conducting a simulation study. Next, we implement our selected algorithm. Finally, we perform a second simulation study anchored to the collected data that evaluates the benefits of the algorithm we chose. Our first result is that the value of a learned policy in this setting is higher when data is collected via uniform randomization rather than adaptively using standard cumulative regret minimization or policy learning algorithms. We propose a simple heuristic for adaptive experimentation that improves upon uniform randomization from the perspective of policy learning at the expense of increasing cumulative regret relative to alternative bandit algorithms. The heuristic modifies an existing contextual bandit algorithm by (i) imposing a lower bound on assignment probabilities that decays slowly, so that no arm is discarded too quickly, and (ii) after adaptively collecting data, restricting policy learning to select from arms where sufficient data has been gathered.
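Modification (i) of the heuristic, a slowly decaying floor on assignment probabilities, can be sketched directly. The floor schedule c * t**(-alpha) and its constants are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def floor_probabilities(p, t, c=0.05, alpha=0.5):
    """Impose a slowly decaying lower bound on arm-assignment
    probabilities so no arm is discarded too quickly.

    p: raw assignment probabilities from a contextual bandit at step t.
    Each probability is raised to at least c * t**(-alpha), then the
    vector is renormalized to sum to one.
    """
    floor = c * t ** (-alpha)
    p = np.maximum(p, floor)
    return p / p.sum()
```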
Confidence Intervals for Policy Evaluation in Adaptive Experiments
by Wager, Stefan; Athey, Susan; Zhan, Ruohan
in Confidence intervals; Cost analysis; Experiments
2021
Adaptive experiment designs can dramatically improve statistical efficiency in randomized trials, but they also complicate statistical inference. For example, it is now well known that the sample mean is biased in adaptive trials. Inferential challenges are exacerbated when our parameter of interest differs from the parameter the trial was designed to target, such as when we are interested in estimating the value of a sub-optimal treatment after running a trial to determine the optimal treatment using a stochastic bandit design. In this context, typical estimators that use inverse propensity weighting to eliminate sampling bias can be problematic: their distributions become skewed and heavy-tailed as the propensity scores decay to zero. In this paper, we present a class of estimators that overcome these issues. Our approach is to adaptively reweight the terms of an augmented inverse propensity weighting estimator to control the contribution of each term to the estimator's variance. This adaptive weighting scheme prevents estimates from becoming heavy-tailed, ensuring asymptotically correct coverage. It also reduces variance, allowing us to test hypotheses with greater power - especially hypotheses that were not targeted by the experimental design. We validate the accuracy of the resulting estimates and their confidence intervals in numerical experiments and show our methods compare favorably to existing alternatives in terms of RMSE and coverage.