Catalogue Search | MBRL

One-class edge classification through heterogeneous hypergraph for causal discovery

by Marcacini, Ricardo Marcondes , Gôlo, Marcos Paulo Silva in 639/705/117 , 639/705/258 , Causality

2025

Causal discovery from event pairs is essential for understanding complex real-world phenomena. Large language models (LLMs) have shown strong capabilities in capturing the semantics of events and inferring plausible cause-effect relations from text. However, they typically process each event pair in isolation and struggle to model the global event structure, which limits their ability to capture interdependencies among multiple events. Graph-based methods offer a structural alternative by explicitly modeling connections between events, but they often lack relational expressiveness, as relations are treated as implicit edges rather than as entities. Homogeneous hypergraphs address this by representing relations as nodes, enabling richer modeling of multi-event interactions and more expressive causal reasoning. Nevertheless, this strategy frequently leads to disconnected structures, hindering information aggregation through graph neural networks (GNNs). To address these challenges, we propose eCHOLGA (edge Classification through Heterogeneous One-cLass Graph Autoencoder), a novel method that leverages heterogeneous hypergraphs to model causal relationships more effectively. eCHOLGA integrates semantic features extracted from language models into the graph structure, enhancing the representation of events and their relations. By transforming relations into nodes and introducing additional node and edge types, it improves topological connectivity and enables GNNs to learn more informative edge representations. Furthermore, our method adopts a one-class learning strategy, requiring only positive (causal) examples for training, which reduces labeling effort. In addition to its effectiveness, eCHOLGA enhances interpretability and provides insights into the causal discovery process. Experimental results show that eCHOLGA outperforms state-of-the-art methods, establishing it as a promising approach for causal discovery in event pairs.

Journal Article

Learning debiased graph representations from the OMOP common data model for synthetic data generation

by Johanns, Ole , Rath, Natalie , Carus, Jasmin in Algorithms , Analysis , Artificial intelligence

2024

Background: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models which do not allow for expert verification or intervention. We propose a highly available method which enables synthetic data generation from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention. Methods: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a data synthetization method. For this study, data pipelines were built which extract data from OMOP, convert them into time series format, learn temporal rules using 2 statistical algorithms (Markov chain, TARM) and 3 causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts. Results: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem statement at hand. Conclusion: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable.

Journal Article

CAnDOIT: Causal Discovery with Observational and Interventional Data from Time Series

by Bellotto, Nicola , Mghames, Sariah , Castri, Luca in Accuracy , Algorithms , causal robotics

2024

The study of cause and effect is of the utmost importance in many branches of science, but also for many practical applications of intelligent systems. In particular, identifying causal relationships in situations that include hidden factors is a major challenge for methods that rely solely on observational data for building causal models. This article proposes CAnDOIT, a causal discovery method to reconstruct causal models using both observational and interventional time‐series data. The use of interventional data in the causal analysis is crucial for real‐world applications, such as robotics, where the scenario is highly complex and observational data alone are often insufficient to uncover the correct causal structure. Validation of the method is performed initially on randomly generated synthetic models and subsequently on a well‐known benchmark for causal structure learning in a robotic manipulation environment. The experiments demonstrate that the approach can effectively handle data from interventions and exploit them to enhance the accuracy of the causal analysis. A Python implementation of CAnDOIT is developed and is publicly available on GitHub: https://github.com/lcastri/causalflow. CAusal Discovery with Observational and Interventional data from Time‐series (CAnDOIT) is a new algorithm that, for the first time in the time‐series domain, combines interventional and observational data for causal analysis. It excels in complex scenarios where observations alone are insufficient to retrieve the correct causal model. Validation on synthetic models and a robotic manipulation benchmark demonstrates CAnDOIT's strong performance.

Journal Article

Causal inference for time series analysis: problems, methods and evaluation

by Tahir Anique , Bhattacharya Anchit , Karami Mansooreh in Causality , Classification , Clustering

2021

Time series data are a collection of chronological observations generated in several domains, such as the medical and financial fields. Over the years, different tasks such as classification, forecasting and clustering have been proposed to analyze this type of data. Time series data have also been used to study the effect of interventions over time. Moreover, in many fields of science, learning the causal structure of dynamic systems and time series data is considered an interesting task which plays an important role in scientific discoveries. Estimating the effect of an intervention and identifying the causal relations from the data can be performed via causal inference. Existing surveys on time series discuss traditional tasks such as classification and forecasting or explain the details of the approaches proposed to solve a specific task. In this paper, we focus on two causal inference tasks, i.e., treatment effect estimation and causal discovery for time series data, and provide a comprehensive review of the approaches in each task. Furthermore, we curate a list of commonly used evaluation metrics and datasets for each task and provide in-depth insight. These metrics and datasets can serve as benchmarks for research in the field.

Journal Article

Causal inference by using invariant prediction: identification and confidence intervals

by Meinshausen, Nicolai , Peters, Jonas , Bühlmann, Peter in Causal discovery , Causal inference , Causal models

2016

What is the difference between a prediction that is made with a causal model and that with a non-causal model? Suppose that we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference: given different experimental settings (e.g. various interventions) we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will be a member of this set of models with high probability. This approach yields valid confidence intervals for the causal relationships in quite general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate robustness properties of our approach under model misspecification and discuss possible extensions. The empirical properties are studied for various data sets, including large-scale gene perturbation experiments.

Journal Article

A survey of Bayesian Network structure learning

in Algorithms , Bayesian analysis , Biology

2023

Bayesian Networks (BNs) have become increasingly popular over the last few decades as a tool for reasoning under uncertainty in fields as diverse as medicine, biology, epidemiology, economics and the social sciences. This is especially true in real-world areas where we seek to answer complex questions based on hypothetical evidence to determine actions for intervention. However, determining the graphical structure of a BN remains a major challenge, especially when modelling a problem under causal assumptions. Solutions to this problem include the automated discovery of BN graphs from data, constructing them based on expert knowledge, or a combination of the two. This paper provides a comprehensive review of combinatoric algorithms proposed for learning BN structure from data, describing 74 algorithms including prototypical, well-established and state-of-the-art approaches. The basic approach of each algorithm is described in consistent terms, and the similarities and differences between them highlighted. Methods of evaluating algorithms and their comparative performance are discussed including the consistency of claims made in the literature. Approaches for dealing with data noise in real-world datasets and incorporating expert knowledge into the learning process are also covered.

Journal Article

Disentangling causality: assumptions in causal discovery and inference

in Assumptions , Causality , Discovery

2023

Causality has been a burgeoning field of research leading to the point where the literature abounds with different components addressing distinct parts of causality. For researchers, it has been increasingly difficult to discern the assumptions they have to abide by in order to glean sound conclusions from causal concepts or methods. This paper aims to disambiguate the different causal concepts that have emerged in causal inference and causal discovery from observational data by attributing them to different levels of Pearl’s Causal Hierarchy. We will provide the reader with a comprehensive arrangement of assumptions necessary to engage in causal reasoning at the desired level of the hierarchy. Therefore, the assumptions underlying each of these causal concepts will be emphasized and their concomitant graphical components will be examined. We show which assumptions are necessary to bridge the gaps between causal discovery, causal identification and causal inference from a parametric and a non-parametric perspective. Finally, this paper points to further research areas related to the strong assumptions that researchers have glibly adopted to take part in causal discovery, causal identification and causal inference.

Journal Article

On causal discovery with an equal-variance assumption

by WANG, Y. SAMUEL , CHEN, WENYU , DRTON, MATHIAS in Miscellanea , Multivariate statistical analysis , Structural equation modeling

2019

Prior work has shown that causal structure can be uniquely identified from observational data when these follow a structural equation model whose error terms have equal variance. We show that this fact is implied by an ordering among conditional variances. We demonstrate that ordering estimates of these variances yields a simple yet state-of-the-art method for causal structure learning that is readily extendable to high-dimensional problems.
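
The ordering idea in this abstract can be illustrated with a minimal sketch. It assumes a linear structural equation model with equal error variances and works on a known population covariance matrix rather than on estimates from data. The function name equal_variance_order and the three-variable example are hypothetical and are not taken from the article; the sketch only recovers a causal ordering by repeatedly choosing the variable with the smallest conditional variance given the variables already ordered.

```python
import numpy as np


def equal_variance_order(cov):
    """Greedy top-down ordering: pick the variable whose variance,
    conditional on the variables already ordered, is smallest."""
    order = []
    remaining = list(range(cov.shape[0]))
    while remaining:
        def cond_var(j):
            if not order:
                return cov[j, j]  # marginal variance for the first pick
            s_aa = cov[np.ix_(order, order)]
            s_ja = cov[j, order]
            # Schur complement: Var(X_j | X_order)
            return cov[j, j] - s_ja @ np.linalg.solve(s_aa, s_ja)

        nxt = min(remaining, key=cond_var)
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical example: X2 -> X0 -> X1 and X2 -> X1, unit error variances.
# B[i, j] is the coefficient of X_j in the equation for X_i.
B = np.zeros((3, 3))
B[0, 2] = 0.8
B[1, 2] = 0.5
B[1, 0] = -0.7
inv = np.linalg.inv(np.eye(3) - B)
sigma = inv @ inv.T  # population covariance when all errors have variance 1

order = equal_variance_order(sigma)
print(order)
```

In this example the source variable X2 has the smallest marginal variance, and each later variable has conditional variance equal to the common error variance once its parents are in the conditioning set. With sample covariances the same procedure would operate on estimates, which is where the article's statistical analysis applies.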

Journal Article

Unsuitability of NOTEARS for Causal Graph Discovery when Dealing with Dimensional Quantities

by Kaiser, Marcus , Sipos, Maksim in Algorithms , Artificial Intelligence , Complex Systems

2022

Causal discovery methods aim to identify a DAG structure that represents causal relationships from observational data. In this article, we stress that it is important to test such methods for robustness in practical settings. As our main example, we analyze the NOTEARS method, for which we demonstrate a lack of scale-invariance. We show that NOTEARS is a method that aims to identify a parsimonious DAG from the data that explains the residual variance. We conclude that NOTEARS is not suitable for identifying truly causal relationships from the data for dimensional quantities.
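
The scale-dependence described in this abstract can be illustrated with a simplified two-variable sketch. This is not the NOTEARS implementation: it assumes only the population least-squares loss (total residual variance) for the two candidate graphs X -> Y and Y -> X, with no sparsity penalty and no optimization. The helper names ls_losses and rescale and the covariance values are hypothetical.

```python
import numpy as np


def ls_losses(cov):
    """Total residual variance for X -> Y and for Y -> X,
    given a 2x2 population covariance of (X, Y)."""
    vx, vy, cxy = cov[0, 0], cov[1, 1], cov[0, 1]
    forward = vx + (vy - cxy ** 2 / vx)  # X is a root, Y regressed on X
    reverse = vy + (vx - cxy ** 2 / vy)  # Y is a root, X regressed on Y
    return forward, reverse


def rescale(cov, scale_x):
    """Covariance after multiplying X by scale_x (a change of units)."""
    d = np.diag([scale_x, 1.0])
    return d @ cov @ d


# Hypothetical data-generating model: X ~ variance 1, Y = X + noise (variance 1).
cov = np.array([[1.0, 1.0],
                [1.0, 2.0]])

fwd, rev = ls_losses(cov)                       # original units
fwd_s, rev_s = ls_losses(rescale(cov, 10.0))    # X expressed in 10x smaller units

print("original units:", fwd, rev)
print("rescaled X:    ", fwd_s, rev_s)
```

In this sketch the direction with the lower loss is X -> Y in the original units and Y -> X after X is rescaled, although the underlying mechanism is unchanged. Under these assumptions the loss comparison reduces to which variable has the smaller marginal variance, which depends on the chosen units.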

Journal Article

Causal inference in genetic trio studies

by Bates, Stephen , Sesia, Matteo , Candès, Emmanuel in Biological Sciences , Genetic Association Studies , Genetic Techniques

2020

We introduce a method to draw causal inferences—inferences immune to all possible confounding—from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.

Journal Article
