3,026 results for "Causal inference"
Causal inference and the data-fusion problem
We review concepts, principles, and tools that unify current approaches to causal analysis and attend to new challenges presented by big data. In particular, we address the problem of data fusion—piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to big data analysts, because the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We here present a general, nonparametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data fusion in causal inference tasks.
Recursive partitioning for heterogeneous causal effects
In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting hypothesis tests about the magnitude of differences in treatment effects across subsets of the population. We provide a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects. The approach enables the construction of valid confidence intervals for treatment effects, even with many covariates relative to the sample size, and without “sparsity” assumptions. We propose an “honest” approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation. Our approach builds on regression tree methods, modified to optimize for goodness of fit in treatment effects and to account for honest estimation. Our model selection criterion anticipates that bias will be eliminated by honest estimation and also accounts for the effect of making additional splits on the variance of treatment effect estimates within each subpopulation. We address the challenge that the “ground truth” for a causal effect is not observed for any individual unit, so that standard approaches to cross-validation must be modified. Through a simulation study, we show that for our preferred method honest estimation results in nominal coverage for 90% confidence intervals, whereas coverage ranges between 74% and 84% for nonhonest approaches. Honest estimation requires estimating the model with a smaller sample size; the cost in terms of mean squared error of treatment effects for our preferred method ranges between 7% and 22%.
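The "honest" sample-splitting idea in the abstract above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' method: the simulated data, the transformed-outcome proxy, and a shallow scikit-learn tree standing in for their modified regression tree are all choices made here for brevity.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))            # covariates
w = rng.integers(0, 2, size=n)         # 50/50 randomized treatment
tau = 1.0 + X[:, 0]                    # true heterogeneous effect
y = X[:, 1] + w * tau + rng.normal(size=n)

# Honest split: one half builds the partition, the other estimates effects.
idx = rng.permutation(n)
build, est = idx[: n // 2], idx[n // 2:]

# Under 50/50 randomization, 2*(2w - 1)*y is an unbiased proxy for the
# unit-level effect, so a regression tree on it can pick up heterogeneity.
proxy = 2 * (2 * w[build] - 1) * y[build]
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=100).fit(X[build], proxy)

# Honest estimation: effects are computed leaf by leaf on the held-out half.
leaves = tree.apply(X[est])
leaf_effects = {}
for leaf in np.unique(leaves):
    m = leaves == leaf
    leaf_effects[int(leaf)] = float(y[est][m & (w[est] == 1)].mean()
                                    - y[est][m & (w[est] == 0)].mean())
```

Because the estimation sample played no role in choosing the splits, the within-leaf difference in means is an unbiased effect estimate for each subpopulation, which is what allows the valid confidence intervals the abstract describes.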
Causal inference in economics and marketing
This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual—a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.
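One simple (and simplistic) instance of the counterfactual-prediction idea: train an outcome model on untreated units only, then use its predictions for the treated units as estimates of what would have happened without treatment. The simulated data and the gradient-boosting model below are illustrative assumptions, not anything from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 3000
X = rng.normal(size=(n, 4))                       # covariates
w = rng.integers(0, 2, size=n)                    # treatment indicator
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + 2.0 * w + rng.normal(size=n)

# Fit an outcome model on untreated units only, then predict what each
# treated unit would have looked like absent treatment: the counterfactual.
model = GradientBoostingRegressor(random_state=0).fit(X[w == 0], y[w == 0])
counterfactual = model.predict(X[w == 1])

# Average treatment effect on the treated: observed minus counterfactual.
att = float((y[w == 1] - counterfactual).mean())
```

Here treatment is randomized, so the comparison is clean; with observational data the same code is only valid under the unconfoundedness assumption discussed throughout this literature.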
Estimating peer effects in networks with peer encouragement designs
Peer effects, in which the behavior of an individual is affected by the behavior of their peers, are central to social science. Because peer effects are often confounded with homophily and common external causes, recent work has used randomized experiments to estimate effects of specific peer behaviors. These experiments have often relied on the experimenter being able to randomly modulate mechanisms by which peer behavior is transmitted to a focal individual. We describe experimental designs that instead randomly assign individuals’ peers to encouragements to behaviors that directly affect those individuals. We illustrate this method with a large peer encouragement design on Facebook for estimating the effects of receiving feedback from peers on posts shared by focal individuals. We find evidence for substantial effects of receiving marginal feedback on multiple behaviors, including giving feedback to others and continued posting. These findings provide experimental evidence for the role of behaviors directed at specific individuals in the adoption and continued use of communication technologies. In comparison, observational estimates differ substantially, both underestimating and overestimating effects, suggesting that researchers and policy makers should be cautious in relying on them.
Modeling confounding by half-sibling regression
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed data and time series data, and illustrate the potential of the method in a challenging astronomy application.
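A minimal sketch of the half-sibling idea, with simulated series standing in for, say, light curves that share instrument noise: regress the target on "half-siblings" that are driven by the same confounder but carry none of the signal, and keep the residual as the reconstruction. All data below are synthetic assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
T = 1000
confounder = rng.normal(size=T)                 # shared systematic noise N
signal = np.sin(np.linspace(0, 20, T))          # latent quantity of interest Q
y = signal + 1.5 * confounder + 0.1 * rng.normal(size=T)

# Five "half-siblings": affected by the same confounder, independent of Q.
siblings = (confounder[:, None] * rng.normal(size=(1, 5))
            + 0.1 * rng.normal(size=(T, 5)))

# Half-sibling regression: Q_hat = Y - E[Y | siblings] (up to a constant).
fit = LinearRegression().fit(siblings, y)
q_hat = y - fit.predict(siblings)
```

Because the siblings contain the confounder but not the signal, subtracting the regression prediction strips out the shared systematic component while leaving the latent signal largely intact.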
Limitations of design-based causal inference and A/B testing under arbitrary and network interference
Randomized experiments on a network often involve interference between connected units, namely, a situation in which an individual's treatment can affect the response of another individual. Current approaches to deal with interference, in theory and in practice, often make restrictive assumptions on its structure—for instance, assuming that interference is local—even when using otherwise nonparametric inference strategies. This reliance on explicit restrictions on the interference mechanism suggests a shared intuition that inference is impossible without any assumptions on the interference structure. In this paper, we begin by formalizing this intuition in the context of a classical nonparametric approach to inference, referred to as design-based inference of causal effects. Next, we show how, still in the context of design-based inference, even parametric structural assumptions that allow the existence of unbiased estimators cannot guarantee a decreasing variance even in the large-sample limit. This lack of concentration in large samples is often observed empirically in randomized experiments in which interference of some form is expected to be present. This result has direct consequences for the design and analysis of large experiments—for instance, in online social platforms—where the belief is that large sample sizes automatically guarantee small variance. More broadly, our results suggest that although strategies for causal inference in the presence of interference borrow their formalism and main concepts from the traditional causal inference literature, much of the intuition from the no-interference case does not easily transfer to the interference setting.
Causal Inference and Observational Research: The Utility of Twins
Valid causal inference is central to progress in theoretical and applied psychology. Although the randomized experiment is widely considered the gold standard for determining whether a given exposure increases the likelihood of some specified outcome, experiments are not always feasible and in some cases can result in biased estimates of causal effects. Alternatively, standard observational approaches are limited by the possibility of confounding, reverse causation, and the nonrandom distribution of exposure (i.e., selection). We describe the counterfactual model of causation and apply it to the challenges of causal inference in observational research, with a particular focus on aging. We argue that the study of twin pairs discordant on exposure, and in particular discordant monozygotic twins, provides a useful analog to the idealized counterfactual design. A review of discordant-twin studies in aging reveals that they are consistent with, but do not unambiguously establish, a causal effect of lifestyle factors on important late-life outcomes. Nonetheless, the existing studies are few in number and have clear limitations that have not always been considered in interpreting their results. It is concluded that twin researchers could make greater use of the discordant-twin design as one approach to strengthen causal inferences in observational research.
Improving massive experiments with threshold blocking
Inferences from randomized experiments can be improved by blocking: assigning treatment in fixed proportions within groups of similar units. However, the use of the method is limited by the difficulty in deriving these groups. Current blocking methods are restricted to special cases or run in exponential time; are not sensitive to clustering of data points; and are often heuristic, providing an unsatisfactory solution in many common instances. We present an algorithm that implements a widely applicable class of blocking—threshold blocking—that solves these problems. Given a minimum required group size and a distance metric, we study the blocking problem of minimizing the maximum distance between any two units within the same group. We prove this is a nondeterministic polynomial-time hard problem and derive an approximation algorithm that yields a blocking where the maximum distance is guaranteed to be, at most, four times the optimal value. This algorithm runs in O(n log n) time with O(n) space complexity. This makes it, to our knowledge, the first blocking method with an ensured level of performance that works in massive experiments. Whereas many commonly used algorithms form pairs of units, our algorithm constructs the groups flexibly for any chosen minimum size. This facilitates complex experiments with several treatment arms and clustered data. A simulation study demonstrates the efficiency and efficacy of the algorithm; tens of millions of units can be blocked using a desktop computer in a few minutes.
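The following is a simplified greedy heuristic in the spirit of threshold blocking — each block is seeded with an unassigned unit and filled with its nearest unassigned neighbors — and is not the paper's 4-approximation algorithm (which is graph-based and runs in O(n log n)). The quadratic distance matrix here is another simplification that would not scale to the massive experiments the abstract targets.

```python
import numpy as np

def greedy_threshold_blocking(X, k):
    """Greedy sketch of threshold blocking: seed a block with an
    unassigned unit plus its k-1 nearest unassigned neighbors; any
    leftover units (fewer than k of them) join the block of their
    nearest assigned unit, so every block ends with >= k units."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    labels = np.full(n, -1)
    unassigned = set(range(n))
    block = 0
    while len(unassigned) >= k:
        seed = min(unassigned)
        # Seed's k nearest unassigned units (including the seed itself).
        nearest = [j for j in np.argsort(dist[seed]) if j in unassigned][:k]
        for j in nearest:
            labels[j] = block
            unassigned.discard(j)
        block += 1
    for j in sorted(unassigned):          # attach leftovers to nearest block
        assigned = np.where(labels >= 0)[0]
        labels[j] = labels[assigned[np.argmin(dist[j, assigned])]]
    return labels

labels = greedy_threshold_blocking(np.random.default_rng(4).normal(size=(22, 2)), k=4)
```

With a minimum block size of 4 and 22 units, the greedy pass forms five blocks of four and the two leftover units are absorbed by their nearest blocks, so every block satisfies the size threshold.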
Interactive molecular causal networks of hypertension using a fast machine learning algorithm MRdualPC
Background: Understanding the complex interactions between genes and their causal effects on diseases is crucial for developing targeted treatments and gaining insight into biological mechanisms. However, the analysis of molecular networks, especially in the context of high-dimensional data, presents significant challenges. Methods: This study introduces MRdualPC, a computationally tractable algorithm based on the MRPC approach, to infer large-scale causal molecular networks. We apply MRdualPC to investigate the upstream causal transcriptomics influencing hypertension using a comprehensive dataset of kidney genome and transcriptome data. Results: Our algorithm proves to be 100 times faster than MRPC on average in identifying transcriptomic drivers of hypertension. Through clustering, we identify 63 modules with causal driver genes, including 17 modules with extensive causal networks. Notably, we find that genes within one of the causal networks are associated with the electron transport chain and oxidative phosphorylation, previously linked to hypertension. Moreover, the identified causal ancestor genes show an over-representation of blood pressure-related genes. Conclusions: MRdualPC has the potential for broader applications beyond gene expression data, including multi-omics integration. While there are limitations, such as the need for clustering in large gene expression datasets, our study represents a significant advancement in building causal molecular networks, offering researchers a valuable tool for analyzing big data and investigating complex diseases.
A doubly robust estimator for continuous treatments in high dimensions
Background: Generalized propensity score (GPS) methods have become popular for estimating causal relationships between a continuous treatment and an outcome in observational studies with rich covariate information. The presence of rich covariates enhances the plausibility of the unconfoundedness assumption. Nonetheless, it is also crucial to ensure the correct specification of both marginal and conditional treatment distributions, beyond the assumption of unconfoundedness. Method: We address limitations in existing GPS methods by extending balance-based approaches to high dimensions and introducing the Generalized Outcome-Adaptive LASSO and Doubly Robust Estimate (GOALDeR). This novel approach integrates a balance-based method that is robust to the misspecification of distributions required for GPS methods, a doubly robust estimator that is robust to the misspecification of models, and a variable selection technique for causal inference that ensures an unbiased and statistically efficient estimation. Results: Simulation studies showed that GOALDeR was able to generate nearly unbiased estimates when either the GPS model or the outcome model was correctly specified. Notably, GOALDeR demonstrated greater precision and accuracy compared to existing methods and was slightly affected by the covariate correlation structure and ratio of sample size to covariate dimension. Real data analysis revealed no statistically significant dose-response relationship between epigenetic age acceleration and Alzheimer’s disease. Conclusion: In this study, we proposed GOALDeR as an advanced GPS method for causal inference in high dimensions, and empirically demonstrated that GOALDeR is doubly robust, with improved accuracy and precision compared to existing methods. The R package is available at https://github.com/QianGao-SXMU/GOALDeR.
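The double-robustness property at the heart of GOALDeR is easiest to see in the classic augmented inverse-probability-weighting (AIPW) estimator. The sketch below uses a binary treatment and simulated data for simplicity; it illustrates the general principle only, not GOALDeR's continuous-treatment, high-dimensional, outcome-adaptive machinery.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=(n, 3))
p_true = 1 / (1 + np.exp(-X[:, 0]))              # confounded assignment
w = rng.binomial(1, p_true)
y = X[:, 0] + 2.0 * w + rng.normal(size=n)       # true effect = 2

# Nuisance models: propensity e(X) and arm-specific outcome models m1, m0.
e = LogisticRegression().fit(X, w).predict_proba(X)[:, 1]
m1 = LinearRegression().fit(X[w == 1], y[w == 1]).predict(X)
m0 = LinearRegression().fit(X[w == 0], y[w == 0]).predict(X)

# AIPW: consistent if *either* the propensity model *or* the outcome
# model is correctly specified -- hence "doubly robust".
ate = float(np.mean(m1 - m0 + w * (y - m1) / e - (1 - w) * (y - m0) / (1 - e)))
```

The outcome-model term supplies efficiency when it is right, while the weighted residual terms correct its bias whenever the propensity model is right, which is the property the abstract's simulations probe by misspecifying one model at a time.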