Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
4,880 result(s) for "Markov decision process"
Hierarchical reinforcement learning via dynamic subspace search for multi-agent planning
by Ouimet, Michael; Cortés, Jorge; Ma, Aaron
in Algorithms, Computer simulation, Markov processes
2020
We consider scenarios where a swarm of unmanned vehicles (UxVs) seeks to satisfy a number of diverse, spatially distributed objectives. The UxVs strive to determine an efficient plan to service the objectives while operating in a coordinated fashion. We focus on developing autonomous high-level planning, where low-level controls are leveraged from previous work in distributed motion, target tracking, localization, and communication. We rely on the use of state and action abstractions in a Markov decision process framework to introduce a hierarchical algorithm, Dynamic Domain Reduction for Multi-Agent Planning, that enables multi-agent planning for large multi-objective environments. Our analysis establishes the correctness of our search procedure within specific subsets of the environments, termed ‘sub-environments’, and characterizes the algorithm's performance with respect to the optimal trajectories in single-agent and sequential multi-agent deployment scenarios using tools from submodularity. Simulated results show significant improvement over using a standard Monte Carlo tree search in an environment with large state and action spaces.
Journal Article
Analysis of a time–cost trade-off in a resource-constrained GERT project scheduling problem using the Markov decision process
by Sadri, Shadi; Ghomi, S. M. T. Fatemi; Dehghanian, Amin
in Cost analysis, Distribution functions, Genetic algorithms
2024
The advent of new types of projects, such as startups, maintenance, and education, has transformed project management, and classical project scheduling methods are incapable of analyzing these stochastic projects. This study considers a time–cost trade-off project scheduling problem where the structure of the project is uncertain. To deal with the uncertainties, we implemented the Graphical Evaluation and Review Technique (GERT). The main aim of the study is to balance time against the amount of a non-renewable resource allocated to each activity, considering the finite time horizon and resource limitations. To preserve the generality of the model, we considered both discrete and continuous distribution functions for activity durations. From a methodological standpoint, we proposed an analytical approach based on the Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) to find the probability distribution of the project makespan. These models are solved using value iteration and a finite-horizon Linear Programming (LP) model. Two randomly generated examples illustrate value iteration for the models in detail. Furthermore, seven example groups, each with five instances, are adopted from a well-known data set, PSPLIB, to validate the efficiency of the proposed models against two extensively studied methods, the Genetic Algorithm (GA) and Monte-Carlo simulation. The convergence of the GA and simulation results to those of the MDP and SMDP demonstrates the efficiency of the proposed models. Finally, a sensitivity analysis of the project completion probability with respect to the available resources gives managers good insight for planning their resources.
Journal Article
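The finite-horizon value iteration this abstract mentions can be sketched for a generic discrete MDP by backward induction. Everything below (the toy transition matrices, rewards, and horizon) is an illustrative assumption, not taken from the paper:

```python
import numpy as np

def finite_horizon_value_iteration(P, R, T):
    """Backward induction for a finite-horizon MDP.

    P: array of shape (A, S, S), P[a, s, s'] = transition probability.
    R: array of shape (A, S), immediate reward for action a in state s.
    T: horizon length.
    Returns the value function V[t, s] and a greedy policy pi[t, s].
    """
    A, S, _ = P.shape
    V = np.zeros((T + 1, S))          # terminal values V[T] = 0
    pi = np.zeros((T, S), dtype=int)
    for t in range(T - 1, -1, -1):    # backward in time
        # Q[a, s] = R[a, s] + sum_{s'} P[a, s, s'] * V[t+1, s']
        Q = R + P @ V[t + 1]
        V[t] = Q.max(axis=0)
        pi[t] = Q.argmax(axis=0)
    return V, pi

# Toy 2-state, 2-action instance
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, pi = finite_horizon_value_iteration(P, R, T=5)
```

The SMDP and LP variants the authors study would replace the per-step backup, but the backward sweep is the common core.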
Decision-making under uncertainty: beyond probabilities
by Suilen, Marnix; Simão, Thiago D.; Jansen, Nils
in Computer Science, Explanation Paradigms Leveraging Analytic Intuition, Software Engineering
2023
This position paper reflects on the state of the art in decision-making under uncertainty. A classical assumption is that probabilities can sufficiently capture all uncertainty in a system. In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by drawing a clear distinction between aleatoric and epistemic uncertainty. The paper features an overview of Markov decision processes (MDPs) and extensions that account for partial observability and adversarial behavior. These models sufficiently capture aleatoric uncertainty but fail to account for epistemic uncertainty robustly. Consequently, we present a thorough overview of so-called uncertainty models that treat uncertainty in a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification, through control-based abstractions, to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.
Journal Article
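One elementary way to treat epistemic uncertainty robustly, in the spirit of the uncertainty models this survey covers (which are far more general), is max-min value iteration over a finite set of transition models consistent with the data. A minimal sketch; the model set, rewards, and discount below are illustrative assumptions, not from the paper:

```python
import numpy as np

def robust_value_iteration(models, R, gamma=0.95, tol=1e-8):
    """Robust (max-min) value iteration over a finite set of MDP models.

    models: list of (A, S, S) transition kernels, each a plausible model
            of the system (epistemic uncertainty over dynamics).
    R: (A, S) reward array (assumed known here).
    Returns the robust value function: best policy value under the
    worst-case model, with (s, a)-rectangular uncertainty.
    """
    V = np.zeros(R.shape[1])
    while True:
        # Worst-case backup: per (a, s), take the minimum over models
        Q = np.min([R + gamma * (P @ V) for P in models], axis=0)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Two hypothetical candidate models of the same 2-state, 2-action system
models = [np.array([[[0.9, 0.1], [0.1, 0.9]],
                    [[0.7, 0.3], [0.3, 0.7]]]),
          np.array([[[0.8, 0.2], [0.2, 0.8]],
                    [[0.6, 0.4], [0.4, 0.6]]])]
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
V_robust = robust_value_iteration(models, R)
```

Enlarging the model set can only lower the robust value, which is the price paid for guarantees against epistemic uncertainty.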
Recent Advances in Deep Reinforcement Learning Applications for Solving Partially Observable Markov Decision Processes (POMDP) Problems: Part 1—Fundamentals and Applications in Games, Robotics and Natural Language Processing
2021
The first part of a two-part series of papers provides a survey of recent advances in Deep Reinforcement Learning (DRL) applications for solving partially observable Markov decision process (POMDP) problems. Reinforcement Learning (RL) is an approach to simulating the natural human learning process, whose key is to let the agent learn by interacting with the stochastic environment. The fact that the agent has limited access to information about the environment enables AI to be applied efficiently in most fields that require self-learning. Although efficient algorithms are widely used, an organized investigation is essential so that we can make good comparisons and choose the best structures or algorithms when applying DRL in various applications. In this overview, we introduce Markov Decision Process (MDP) problems and Reinforcement Learning, and survey applications of DRL for solving POMDP problems in games, robotics, and natural language processing. A follow-up paper will cover applications in transportation, communications and networking, and industries.
Journal Article
Explicit Explore, Exploit, or Escape (E4): near-optimal safety-constrained reinforcement learning in polynomial time
2023
In reinforcement learning (RL), an agent must explore an initially unknown environment in order to learn a desired behaviour. When RL agents are deployed in real-world environments, safety is of primary concern. Constrained Markov decision processes (CMDPs) can provide long-term safety constraints; however, the agent may violate the constraints in an effort to explore its environment. This paper proposes a model-based RL algorithm called Explicit Explore, Exploit, or Escape (E4), which extends the Explicit Explore or Exploit (E3) algorithm to a robust CMDP setting. E4 explicitly separates exploitation, exploration, and escape CMDPs, allowing targeted policies for policy improvement across known states, discovery of unknown states, as well as safe return to known states. E4 robustly optimises these policies on the worst-case CMDP from a set of CMDP models consistent with the empirical observations of the deployment environment. Theoretical results show that E4 finds a near-optimal constraint-satisfying policy in polynomial time whilst satisfying safety constraints throughout the learning process. We then discuss E4 as a practical algorithmic framework, including robust-constrained offline optimisation algorithms, the design of uncertainty sets for the transition dynamics of unknown states, and how to further leverage empirical observations and prior knowledge to relax some of the worst-case assumptions underlying the theory.
Journal Article
Risk-Sensitivity Vanishing Limit for Controlled Markov Processes
by Chen, Jinwen; Dai, Yanan
in Banach spaces, Calculus of Variations and Optimal Control; Optimization, Control
2023
In this paper, we prove that the optimal risk-sensitive reward for Markov decision processes with compact state space and action space converges to the optimal average reward as the risk-sensitive factor tends to 0. In doing so, a variational formula for the optimal risk-sensitive reward is derived. An extension of the Kreĭn-Rutman Theorem to certain nonlinear operators is involved. Based on these, partially observable Markov decision processes are also investigated. A portfolio optimization problem is presented as an example of an application of the approach, in which a duality-relation between the maximization of risk-sensitive reward and the maximization of upside chance for out-performance over the optimal average reward is established.
Journal Article
Optimization methods to solve adaptive management problems
by Chadès, Iadine; Pichancourt, Jean-Baptiste; Péron, Martin
in Adaptive management, Biodiversity and Ecology, Biomedical and Life Sciences
2017
Determining the best management actions is challenging when critical information is missing. However, urgency and limited resources require that decisions be made despite this uncertainty. The best-practice method for managing uncertain systems is adaptive management, or learning by doing. Adaptive management problems can be solved optimally using decision-theoretic methods; the challenge for these methods is to represent current and future knowledge using easy-to-optimize representations. Significant methodological advances have been made since the seminal adaptive management work of the 1980s, but guidance for implementing these approaches has remained piecemeal and study-specific, so there is a need to collate and summarize new work. Here, we classify methods and update the literature with the latest optimal or near-optimal approaches for solving adaptive management problems. We review three mathematical concepts required to solve adaptive management problems: Markov decision processes, sufficient statistics, and Bayes’ theorem. We provide a decision tree to determine whether adaptive management is appropriate and then group adaptive management approaches based on whether they learn only from the past (passive) or anticipate future learning (active). We discuss the assumptions made when using existing models and provide solution algorithms for each approach. Finally, we propose new areas of development that could inspire future research. Progress was long limited by the efficiency of the solution methods, but recent techniques for efficiently solving partially observable decision problems now allow us to address more realistic adaptive management problems, such as imperfect detection and non-stationarity in systems.
Journal Article
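Of the three concepts this review lists, Bayes' theorem drives the learning step: the manager's belief over competing system models is re-weighted after each observed outcome. A minimal sketch, with hypothetical model names and likelihoods chosen purely for illustration:

```python
def bayes_update(prior, likelihoods):
    """Update a belief over competing system models after one observation.

    prior: dict model -> prior probability (sums to 1).
    likelihoods: dict model -> P(observation | model).
    Returns the normalized posterior belief.
    """
    unnorm = {m: prior[m] * likelihoods[m] for m in prior}
    total = sum(unnorm.values())
    return {m: p / total for m, p in unnorm.items()}

# Two rival models of a managed population, equally plausible a priori
belief = {"growth_high": 0.5, "growth_low": 0.5}
# The observed outcome was twice as likely under the high-growth model
belief = bayes_update(belief, {"growth_high": 0.8, "growth_low": 0.4})
# belief["growth_high"] → 2/3
```

In the passive approaches the review describes, this updated belief simply re-weights the models used for planning; active approaches additionally anticipate how future actions will sharpen it.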
Autonomous Vehicle Decision-Making with Policy Prediction for Handling a Round Intersection
2023
Autonomous shuttles have been used as last-mile solutions for smart mobility in smart cities. The urban driving conditions of smart cities, with many other actors sharing the road and the presence of intersections, pose challenges to the use of autonomous shuttles. Round intersections are more challenging because it is more difficult to perceive the other vehicles in and near the intersection. This paper therefore focuses on the decision-making of autonomous vehicles for handling round intersections. The round intersection is introduced first, followed by introductions of the Markov Decision Process (MDP), the Partially Observable Markov Decision Process (POMDP) and the Object-Oriented Partially Observable Markov Decision Process (OOPOMDP), which are used for decision-making with uncertain knowledge of the motion of the other vehicles. The Partially Observable Monte-Carlo Planning (POMCP) algorithm is used as the solution method, and OOPOMDP is applied to the decision-making of autonomous vehicles in round intersections. Decision-making is formulated first as a POMDP problem, and the penalty function is formulated and set accordingly. This is followed by an improvement in decision-making with policy prediction. Augmented objective states and policy-based state transitions are introduced, and simulations are used to demonstrate the effectiveness of the proposed method for collision-free handling of round intersections by the ego vehicle.
Journal Article
On Risk-Sensitive Piecewise Deterministic Markov Decision Processes
2020
We consider a piecewise deterministic Markov decision process, where the expected exponential utility of total (nonnegative) cost is to be minimized. The cost rate, transition rate and post-jump distributions are under control. The state space is Borel, and the transition and cost rates are locally integrable along the drift. Under natural conditions, we establish the optimality equation, justify the value iteration algorithm, and show the existence of a deterministic stationary optimal policy. Applied to special cases, the obtained results already significantly improve some existing results in the literature on finite horizon and infinite horizon discounted risk-sensitive continuous-time Markov decision processes.
Journal Article
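The exponential-utility criterion this abstract refers to admits a simple multiplicative value iteration in a discrete-time, finite-state toy setting, which is far simpler than the paper's piecewise deterministic, Borel-space model but shows the shape of the backup. All numbers below are illustrative assumptions:

```python
import numpy as np

def risk_sensitive_vi(P, C, lam, T):
    """Finite-horizon value iteration for the exponential-utility criterion.

    Minimises E[exp(lam * total cost)] for risk factor lam > 0 via the
    multiplicative Bellman backup
        W_t(s) = min_a exp(lam * C[a, s]) * sum_{s'} P[a, s, s'] W_{t+1}(s').
    P: (A, S, S) transition kernel; C: (A, S) nonnegative cost rates.
    """
    W = np.ones(P.shape[1])           # exp(lam * 0) at the horizon
    for _ in range(T):
        W = np.min(np.exp(lam * C) * (P @ W), axis=0)
    return W

# Toy 2-state, 2-action instance with a moderate risk factor
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
C = np.array([[1.0, 0.0],
              [0.5, 2.0]])
W = risk_sensitive_vi(P, C, lam=0.5, T=10)
```

As the abstract's vanishing-limit result suggests in the reward setting, taking lam toward 0 recovers the risk-neutral criterion, since log(W)/lam approaches the expected total cost.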
Recent Advances in Deep Reinforcement Learning Applications for Solving Partially Observable Markov Decision Processes (POMDP) Problems Part 2—Applications in Transportation, Industries, Communications and Networking and More Topics
2021
This two-part series of papers provides a survey of recent advances in Deep Reinforcement Learning (DRL) for solving partially observable Markov decision process (POMDP) problems. Reinforcement Learning (RL) is an approach to simulating the natural human learning process, whose key is to let the agent learn by interacting with the stochastic environment. The fact that the agent has limited access to information about the environment enables AI to be applied efficiently in most fields that require self-learning. An organized investigation is essential so that we can make good comparisons and choose the best structures or algorithms when applying DRL in various applications. The first part of the overview introduces Markov Decision Process (MDP) problems and Reinforcement Learning, and covers applications of DRL for solving POMDP problems in games, robotics, and natural language processing. In part two, we continue with applications in transportation, industries, communications and networking, and other topics, and discuss the limitations of DRL.
Journal Article