Catalogue Search | MBRL
115 result(s) for "Yang, Tianpei"
Spatial analysis with SPIAT and spaSim to characterize and simulate tissue microenvironments
2023
Spatial proteomics technologies have revealed an underappreciated link between the location of cells in tissue microenvironments and the underlying biology and clinical features, but there is significant lag in the development of downstream analysis methods and benchmarking tools. Here we present SPIAT (spatial image analysis of tissues), a spatial-platform agnostic toolkit with a suite of spatial analysis algorithms, and spaSim (spatial simulator), a simulator of tissue spatial data. SPIAT includes multiple colocalization, neighborhood and spatial heterogeneity metrics to characterize the spatial patterns of cells. Ten spatial metrics of SPIAT are benchmarked using simulated data generated with spaSim. We show how SPIAT can uncover cancer immune subtypes correlated with prognosis in cancer and characterize cell dysfunction in diabetes. Our results suggest SPIAT and spaSim as useful tools for quantifying spatial patterns, identifying and validating correlates of clinical outcomes and supporting method development.
Spatial proteomic data serve to provide cell-level location information for the extraction of biological features from tissues, but analyzing such data can be difficult. Here the authors report the development of SPIAT for data analyses and spaSim for simulation and validation of methods to help bridge the gap between the technology and its translation.
Journal Article
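spaSim itself is an R package; as a language-agnostic illustration of the kind of tissue simulation the entry above describes, the Python sketch below scatters background cells uniformly and adds one spatially clustered population (e.g., an immune infiltrate). The function name `simulate_tissue`, the phenotype labels, and all parameter values are invented for this sketch, not spaSim's API.

```python
import numpy as np

def simulate_tissue(n_background=500, n_cluster=100,
                    cluster_center=(500.0, 500.0), cluster_sd=50.0,
                    width=1000.0, height=1000.0, seed=0):
    """Toy tissue simulation: background cells scattered uniformly over the
    image, plus one Gaussian-clustered cell population."""
    rng = np.random.default_rng(seed)
    bg = rng.uniform([0.0, 0.0], [width, height], size=(n_background, 2))
    cl = rng.normal(cluster_center, cluster_sd, size=(n_cluster, 2))
    coords = np.vstack([bg, cl])
    phenotypes = np.array(["Tumour"] * n_background + ["Immune"] * n_cluster)
    return coords, phenotypes

coords, phenotypes = simulate_tissue()
```

Simulated data of this shape (cell coordinates plus phenotype labels) is what spatial metrics are benchmarked against in the paper.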
A survey on interpretable reinforcement learning
by
Glanois, Claire
,
Zimmer, Matthieu
,
Yang, Tianpei
in
Accountability
,
Algorithms
,
Artificial Intelligence
2024
Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stakes domains such as autonomous driving or medical applications. In such contexts, a learned policy needs, for instance, to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL). To that end, we distinguish interpretability (as an intrinsic property of a model) from explainability (as a post-hoc operation) and discuss them in the context of RL, with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work on interpretable RL, with an emphasis on papers published in the past 10 years. We also briefly discuss some related research areas and point to potentially promising research directions, notably related to the recent development of foundation models (e.g., large language models, RL from human feedback).
Journal Article
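The survey's distinction between interpretability (intrinsic) and explainability (post-hoc) can be made concrete with a minimal sketch: a policy written as an ordered list of human-readable rules is inspectable by construction. The toy pole-balancing-style task, rule texts, and function name below are all invented for illustration.

```python
def interpretable_policy(state):
    """A policy that is interpretable by construction: an ordered list of
    human-readable rules, each of which can be inspected before deployment
    (in contrast to a post-hoc explanation of a black-box policy)."""
    angle, angle_vel = state
    rules = [
        ("pole tilting right and still falling right",
         angle > 0 and angle_vel >= 0, "push_right"),
        ("pole tilting left and still falling left",
         angle < 0 and angle_vel <= 0, "push_left"),
        ("pole already recovering on its own",
         True, "no_push"),
    ]
    for reason, fires, action in rules:
        if fires:  # first matching rule decides, and names its reason
            return action, reason

action, reason = interpretable_policy((0.1, 0.2))
```

Every decision comes back with the rule that produced it, which is the kind of verifiability-before-deployment property the abstract motivates.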
Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities
by
Retzlaff, Carl Orge
,
Saranti, Anna
,
Holzinger, Andreas
in
Agents (artificial intelligence)
,
Artificial intelligence
,
Explainable artificial intelligence
2024
Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL to be fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its task autonomously. In cases where the reward function is challenging or impossible to define, HITL approaches are particularly advantageous. The application of Reinforcement Learning from Human Feedback (RLHF) in systems such as ChatGPT demonstrates the effectiveness of optimizing for user experience and integrating user feedback into the training loop. In HITL RL, human input is integrated during the agent's learning process, allowing iterative updates and fine-tuning based on human feedback and thus enhancing the agent's performance. Since the human is an essential part of this process, we argue that human-centric approaches are the key to successful RL, a fact that has not been adequately considered in the existing literature. This paper aims to inform readers about current explainability methods in HITL RL. It also shows how the application of explainable AI (xAI) and specific improvements to existing explainability approaches can enable better human-agent interaction in HITL RL for all types of users, whether lay people, domain experts, or machine learning specialists. Accounting for the workflow in HITL RL and drawing on software and machine learning methodologies, this article identifies four phases of human involvement in creating HITL RL systems: (1) Agent Development, (2) Agent Learning, (3) Agent Evaluation, and (4) Agent Deployment. We highlight human involvement, explanation requirements, new challenges, and goals for each phase. We furthermore identify low-risk, high-return opportunities for explainability research in HITL RL and present long-term research goals to advance the field.
Finally, we propose a vision of human-robot collaboration that allows both parties to reach their full potential and cooperate effectively.
Journal Article
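The feedback loop the entry above describes, where human input is integrated during the agent's learning process, can be sketched as a minimal bandit-style loop in which a human rates each proposed action and the agent keeps a running preference score. The stand-in "human" function, the hypothetical `hitl_training_loop` name, and all parameters are illustrative, not any method from the paper.

```python
import random

def hitl_training_loop(human_feedback, n_actions=3, steps=300, eps=0.2, seed=0):
    """Minimal human-in-the-loop sketch: the agent proposes an action, a human
    rates it +1/-1, and the running mean of that feedback (cf. RLHF-style
    preference signals) steers future proposals."""
    rng = random.Random(seed)
    score = [0.0] * n_actions
    count = [0] * n_actions
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n_actions)                      # explore
        else:
            a = max(range(n_actions), key=lambda i: score[i])  # exploit
        fb = human_feedback(a)            # human rates the proposal +1 / -1
        count[a] += 1
        score[a] += (fb - score[a]) / count[a]  # incremental mean of feedback
    return score

# Stand-in "human" that prefers action 2:
scores = hitl_training_loop(lambda a: 1 if a == 2 else -1)
```

Real HITL RL replaces the stand-in function with an actual person (or a learned reward model), but the loop structure, propose, rate, update, is the same.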
668 A toolkit for the quantitative analysis of the spatial distribution of cells of the tumor immune microenvironment
2020
Background: Spatial technologies that query the location of cells in tissues such as multiplex immunohistochemistry and spatial transcriptomics are gaining popularity and are likely to become commonplace. The resulting data often includes the X, Y coordinates of millions of cells, cell phenotypes and marker or gene expression levels. In cancer, the spatial location of lymphocytes has been linked to prognosis and response to immunotherapy. While these advances have been exciting for the field, the methods currently being used are still coarse, making us severely underpowered in our ability to extract quantifiable information. Appropriate quantitative tools are desperately needed to refine and uncover novel biologically and clinically meaningful insights from the spatial distribution of cells of the tumor immune microenvironment.
Methods: We compiled over 60 prostate cancer and melanoma FFPE tumor sections and performed Opal multiplex immunohistochemistry for a diversity of T-cell and other immune markers, including CD3, CD4, CD8, FOXP3 and PDL1, as well as a prostate cancer (AMACR) or melanoma (SOX10) marker and DAPI. Following spectral imaging on the Vectra Polaris, we performed cell and tissue segmentation and phenotyping with the inForm or HALO image analysis software. The detected X, Y coordinates of cells and marker intensities were used for subsequent method development.
Results: We developed SPIAT (Spatial Image Analysis of Tissues)1, an R package with a suite of data processing, quality control, visualization, data handling and data analysis tools for spatial data. SPIAT includes our novel algorithms for the identification of cell clusters, tumor margins and cell gradients, the calculation of neighborhood proportions and algorithms for the prediction of cell phenotypes. By interfacing with packages used in ecology, geographic data analysis and spatial statistics, we have begun to robustly address fundamental questions in the analysis of cell spatial data, such as metrics to measure mixing between cell types, the identification of tumor borders and statistical approaches to compare samples.
Conclusions: SPIAT is compatible with multiplex immunohistochemistry, spatial transcriptomics and data generated from other spatial platforms, and continues to be actively developed. We expect SPIAT to become a user-friendly and speedy go-to package for the spatial analysis of cells in tissues, as well as promote the use of quantitative metrics in the spatial analysis of the tumor immune microenvironment.
Reference: 1. Tianpei Yang, Volkan Ozcoban, Anu Pasam, Nikolce Kocovski, Angela Pizzolla, Yu-Kuan Huang, Greg Bass, Simon P. Keam, Paul J. Neeson, Shahneen K. Sandhu, David L. Goode, Anna S. Trigos. SPIAT: an R package for the Spatial Image Analysis of Cells in Tissues. bioRxiv. doi: https://doi.org/10.1101/2020.05.28.122614
Journal Article
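SPIAT is an R package; as an illustrative Python analogue of one of the neighborhood metrics the abstract above mentions, the sketch below computes, for each cell of a reference phenotype, the fraction of its neighbors within a radius that carry a target phenotype. The function name `neighbourhood_proportion` and the toy data are assumptions of this sketch, not SPIAT's API.

```python
import numpy as np

def neighbourhood_proportion(coords, phenotypes, reference, target, radius):
    """For each `reference` cell, the fraction of cells within `radius`
    that are of the `target` phenotype (a toy neighborhood-proportion
    metric on X, Y cell coordinates)."""
    ref = coords[phenotypes == reference]
    # pairwise distances from every reference cell to every cell
    d = np.linalg.norm(ref[:, None, :] - coords[None, :, :], axis=-1)
    near = (d > 0) & (d <= radius)               # exclude the cell itself
    n_target = (near & (phenotypes == target)[None, :]).sum(axis=1)
    n_any = near.sum(axis=1)
    return np.where(n_any > 0, n_target / np.maximum(n_any, 1), 0.0)

# Tiny example: one immune cell sitting inside a tumour nest
coords = np.array([[0, 0], [1, 0], [0, 1], [10, 10]], float)
phenotypes = np.array(["Immune", "Tumour", "Tumour", "Tumour"])
props = neighbourhood_proportion(coords, phenotypes, "Immune", "Tumour", 2.0)
```

The brute-force distance matrix is fine for a sketch; real tools use spatial indexes to scale to the millions of cells the abstract describes.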
ASN: action semantics network for multiagent reinforcement learning
by
Wang, Weixun
,
Yang, Tianpei
,
Taylor, Matthew E.
in
Algorithms
,
Artificial Intelligence
,
Computer & video games
2023
In multiagent systems (MASs), each agent makes individual decisions but all contribute globally to the system’s evolution. Learning in MASs is difficult since each agent’s selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the number of agents. Previous works borrow various multiagent coordination mechanisms for use in deep learning architectures to facilitate multiagent coordination. However, none of them explicitly consider that different actions can have different influence on other agents, which we call the action semantics. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions’ influence on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show that ASN significantly improves the performance of state-of-the-art DRL approaches, compared with several other network architectures. We also successfully deploy ASN to a popular online MMORPG game called Justice Online, which indicates a promising future for ASN to be applied in even more complex scenarios.
Journal Article
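The core ASN idea above, that actions directed at a specific agent should be scored from the pair of that agent's features and one's own, can be sketched schematically. The sketch below uses random, untrained weights and invented names (`asn_scores`, `n_env_actions`); it shows only the shape of the idea, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def asn_scores(o_self, others, n_env_actions, hidden=16):
    """Schematic of the action-semantics idea: environment actions are scored
    from the agent's own observation alone, while each action targeting
    another agent is scored from the (own obs, target obs) pair."""
    d = o_self.size
    W_env = rng.normal(size=(n_env_actions, d))   # head for env actions
    W1 = rng.normal(size=(hidden, 2 * d))         # shared pairwise head
    w2 = rng.normal(size=hidden)
    env_q = W_env @ o_self                        # one score per env action
    pair_q = [w2 @ np.maximum(W1 @ np.concatenate([o_self, o_j]), 0.0)
              for o_j in others]                  # one score per targeted agent
    return np.concatenate([env_q, pair_q])

q = asn_scores(np.ones(4), [np.zeros(4), np.ones(4)], n_env_actions=3)
```

The resulting vector has one entry per action (environment actions first, then one per other agent), which is the interface a DRL algorithm would consume.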
Faster Game Solving via Asymmetry of Step Sizes
2025
Counterfactual Regret Minimization (CFR) algorithms are widely used to compute a Nash equilibrium (NE) in two-player zero-sum imperfect-information extensive-form games (IIGs). Among them, Predictive CFR+ (PCFR+) is particularly powerful, achieving an exceptionally fast empirical convergence rate in many games thanks to its prediction step. However, the empirical convergence rate of PCFR+ degrades significantly when the prediction is inaccurate, leading to unstable performance on certain IIGs. To enhance the robustness of PCFR+, we propose Asymmetric PCFR+ (APCFR+), which employs an adaptive asymmetry of step sizes between the updates of implicit and explicit accumulated counterfactual regrets to mitigate the impact of prediction inaccuracy on convergence. We present a theoretical analysis demonstrating why APCFR+ can enhance robustness. To the best of our knowledge, we are the first to propose this asymmetry of step sizes, a simple yet novel technique that effectively improves the robustness of PCFR+. Then, to reduce the difficulty of implementing APCFR+ caused by the adaptive asymmetry, we propose a simplified version of APCFR+ called Simple APCFR+ (SAPCFR+), which uses a fixed asymmetry of step sizes to enable only a single-line modification of the original PCFR+. Experimental results on five standard IIG benchmarks and two heads-up no-limit Texas Hold'em (HUNL) subgames show that (i) both APCFR+ and SAPCFR+ outperform PCFR+ in most of the tested games, (ii) SAPCFR+ achieves an empirical convergence rate comparable to APCFR+, and (iii) our approach can be generalized to improve other CFR algorithms, e.g., Discounted CFR (DCFR).
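As background for the entry above: the sketch below implements only the predictive regret matching+ update that PCFR+ builds on, in self-play on rock-paper-scissors; the adaptive asymmetric step sizes of APCFR+ are not reproduced here. Function names and iteration counts are choices of this sketch.

```python
import numpy as np

def normalise(v):
    s = v.sum()
    return v / s if s > 0 else np.full(v.size, 1.0 / v.size)

def predictive_rm_plus(A, iters=50000):
    """Predictive regret matching+ self-play on a zero-sum matrix game
    (row player's payoff matrix A); returns the linearly weighted
    average strategies, which approach a Nash equilibrium."""
    n, m = A.shape
    Q1, Q2 = np.zeros(n), np.zeros(m)    # accumulated positive regrets
    p1, p2 = np.zeros(n), np.zeros(m)    # predictions (last regret vector)
    avg1, avg2 = np.zeros(n), np.zeros(m)
    for t in range(1, iters + 1):
        s1 = normalise(np.maximum(Q1 + p1, 0.0))  # play from predicted regrets
        s2 = normalise(np.maximum(Q2 + p2, 0.0))
        u1 = A @ s2                       # row player's action values
        u2 = -(A.T @ s1)                  # column player's action values
        r1 = u1 - s1 @ u1                 # instantaneous regrets
        r2 = u2 - s2 @ u2
        Q1 = np.maximum(Q1 + r1, 0.0)     # RM+ truncation at zero
        Q2 = np.maximum(Q2 + r2, 0.0)
        p1, p2 = r1, r2                   # predict next regret = current one
        avg1 += t * s1                    # linear (CFR+-style) averaging
        avg2 += t * s2
    return normalise(avg1), normalise(avg2)

# Rock-paper-scissors: the unique equilibrium is uniform play
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
s1, s2 = predictive_rm_plus(A)
```

In an extensive-form game this same update runs at every information set on counterfactual regrets, which is where the "CFR" in PCFR+ comes from.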
Causal Information Prioritization for Efficient Reinforcement Learning
2025
Current Reinforcement Learning (RL) methods often suffer from sample-inefficiency, resulting from blind exploration strategies that neglect causal relationships among states, actions, and rewards. Although recent causal approaches aim to address this problem, they lack grounded modeling of reward-guided causal understanding of states and actions for goal-orientation, thus impairing learning efficiency. To tackle this issue, we propose a novel method named Causal Information Prioritization (CIP) that improves sample efficiency by leveraging factored MDPs to infer causal relationships between different dimensions of states and actions with respect to rewards, enabling the prioritization of causal information. Specifically, CIP identifies and leverages causal relationships between states and rewards to execute counterfactual data augmentation to prioritize high-impact state features under the causal understanding of the environments. Moreover, CIP integrates a causality-aware empowerment learning objective, which significantly enhances the agent's execution of reward-guided actions for more efficient exploration in complex environments. To fully assess the effectiveness of CIP, we conduct extensive experiments across 39 tasks in 5 diverse continuous control environments, encompassing both locomotion and manipulation skills learning with pixel-based and sparse reward settings. Experimental results demonstrate that CIP consistently outperforms existing RL methods across a wide range of scenarios.
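The counterfactual data augmentation step described above can be sketched under a strong simplifying assumption: here the causal mask over state dimensions is given, whereas CIP infers such relationships. The function name `counterfactual_augment` and the toy data are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def counterfactual_augment(states, rewards, causal_mask):
    """Reward-preserving counterfactual augmentation: state dimensions marked
    causally irrelevant to the reward (mask False) are permuted across
    transitions, multiplying the data without invalidating any reward label."""
    aug = states.copy()
    irrelevant = ~causal_mask
    perm = rng.permutation(len(states))
    aug[:, irrelevant] = states[perm][:, irrelevant]  # swap irrelevant dims
    return aug, rewards.copy()                        # rewards stay valid

# Dim 0 drives the reward; dim 1 is causally irrelevant background
states = rng.normal(size=(8, 2))
rewards = (states[:, 0] > 0).astype(float)
aug_states, aug_rewards = counterfactual_augment(
    states, rewards, causal_mask=np.array([True, False]))
```

Because only reward-irrelevant dimensions are shuffled, every augmented pair is still consistent with the (toy) reward function, which is what makes the extra data safe to train on.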
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain
2023
Deep Reinforcement Learning (DRL) and deep Multi-Agent Reinforcement Learning (MARL) have achieved significant successes across a wide range of domains, including game AI, autonomous vehicles, and robotics. However, DRL and deep MARL agents are widely known to be sample inefficient: millions of interactions are usually needed even for relatively simple problem settings, preventing wide application and deployment in real-world industrial scenarios. One bottleneck challenge is the well-known exploration problem, i.e., how to efficiently explore the environment and collect informative experiences that benefit policy learning toward the optimal policy. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey of existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond these two main branches, we also include other notable exploration methods with different ideas and techniques. In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
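One representative single-agent exploration method within the survey's scope is a count-based bonus. The sketch below applies it to a toy sparse-reward chain MDP: action selection adds a bonus that shrinks with the visit count, so rarely tried actions are preferred until the end-of-chain reward is found. The environment, function name, and hyperparameters are all assumptions of this sketch.

```python
import numpy as np

def count_based_q_learning(n_states=5, episodes=200, horizon=10,
                           gamma=0.9, lr=0.5, beta=1.0):
    """Tabular Q-learning on a deterministic chain (actions: 0=left, 1=right,
    reward 1 only at the last state) with a count-based exploration bonus
    beta / sqrt(1 + N(s, a)) added during action selection."""
    Q = np.zeros((n_states, 2))
    N = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            bonus = beta / np.sqrt(1.0 + N[s])
            a = int(np.argmax(Q[s] + bonus))      # optimistic selection
            N[s, a] += 1
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += lr * (target - Q[s, a])
            if done:
                break                              # terminal: restart episode
            s = s_next
    return Q

Q = count_based_q_learning()
```

Without the bonus, a greedy agent never leaves the start state here; with it, the sparse reward is found and propagated back along the chain.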
Multi-Agent Reinforcement Learning with Communication-Constrained Priors
2026
Communication is one of the most effective means of improving the learning of cooperative policies in multi-agent systems. However, in most real-world scenarios, lossy communication is a prevalent issue. Existing communication-based multi-agent reinforcement learning methods, due to their limited scalability and robustness, struggle to apply to complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model that uniformly characterizes communication conditions across different scenarios. Based on this model, we use it as a learning prior to distinguish between lossy and lossless messages in specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making by drawing on a dual mutual information estimator, and introduce a communication-constrained multi-agent reinforcement learning framework that quantifies the impact of communication messages into the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.
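The generalized communication-constrained model described above can be sketched at its simplest as a channel with a per-link loss probability, where a probability of zero marks a lossless link; the receiver also gets a delivery mask it can condition on. The function name `lossy_channel` and all values below are invented for this sketch, not the paper's framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def lossy_channel(messages, loss_prob):
    """Toy communication-constrained channel: each sender's link has its own
    loss probability (0.0 = lossless). Lost messages are zeroed out, and a
    boolean delivery mask is returned alongside the received messages."""
    delivered = rng.random(len(messages)) >= loss_prob
    received = np.where(delivered[:, None], messages, 0.0)
    return received, delivered

messages = np.ones((4, 3))                  # 4 agents, 3-dim messages each
loss_prob = np.array([0.0, 0.0, 0.5, 0.9])  # first two links lossless
received, delivered = lossy_channel(messages, loss_prob)
```

Treating the loss profile as a known prior, as this sketch does, mirrors the paper's idea of distinguishing lossy from lossless messages before learning how much each should influence decisions.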