Catalogue Search | MBRL
Explore the vast range of titles available.
26,178 result(s) for "Dynamic programming."
Stochastic dual dynamic integer programming
2019
Multistage stochastic integer programming (MSIP) combines the difficulty of uncertainty, dynamics, and non-convexity, and constitutes a class of extremely challenging problems. A common formulation for these problems is a dynamic programming formulation involving nested cost-to-go functions. In the linear setting, the cost-to-go functions are convex polyhedral, and decomposition algorithms, such as nested Benders’ decomposition and its stochastic variant, stochastic dual dynamic programming (SDDP), which proceed by iteratively approximating these functions by cuts or linear inequalities, have been established as effective approaches. However, it is difficult to directly adapt these algorithms to MSIP due to the nonconvexity of integer programming value functions. In this paper we propose an extension to SDDP—called stochastic dual dynamic integer programming (SDDiP)—for solving MSIP problems with binary state variables. The crucial component of the algorithm is a new reformulation of the subproblems in each stage and a new class of cuts, termed Lagrangian cuts, derived from a Lagrangian relaxation of a specific reformulation of the subproblems in each stage, where local copies of state variables are introduced. We show that the Lagrangian cuts satisfy a tightness condition and provide a rigorous proof of the finite convergence of SDDiP with probability one. We show that, under fairly reasonable assumptions, an MSIP problem with general state variables can be approximated by one with binary state variables to desired precision with only a modest increase in problem size. Thus our proposed SDDiP approach is applicable to very general classes of MSIP problems. Extensive computational experiments on three classes of real-world problems, namely electric generation expansion, financial portfolio management, and network revenue management, show that the proposed methodology is very effective in solving large-scale multistage stochastic integer optimization problems.
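The cut-generation idea at the heart of SDDP-style methods can be sketched on a toy problem: a convex cost-to-go function is approximated from below by a growing collection of linear cuts. Everything here (the function `Q`, the trial points) is invented for illustration; in SDDP proper, cut slopes and intercepts come from the duals of the next stage's subproblem, and SDDiP's Lagrangian cuts extend this to the nonconvex integer case.

```python
# Toy illustration of approximating a convex cost-to-go function by cuts,
# as in Benders/SDDP-style decomposition. Q stands in for the value of a
# stage subproblem; here the cut coefficients come from a closed-form
# subgradient rather than from LP duals.

def Q(x):
    """A convex 'cost-to-go' function on [0, 4]."""
    return (x - 2.0) ** 2 + 1.0

def subgradient(x):
    return 2.0 * (x - 2.0)

cuts = []  # each cut is (slope, intercept): Q(x) >= slope * x + intercept

def lower_approx(x):
    """Piecewise-linear lower bound built from the cuts collected so far."""
    if not cuts:
        return float("-inf")
    return max(g * x + b for g, b in cuts)

# Iteratively add cuts at trial points; the approximation tightens from below.
for trial in [0.0, 4.0, 1.0, 3.0, 2.0]:
    g = subgradient(trial)
    b = Q(trial) - g * trial  # the cut is tight at the trial point
    cuts.append((g, b))

x = 2.5
assert lower_approx(x) <= Q(x) + 1e-9  # cuts always under-estimate a convex Q
print(round(Q(x), 3), round(lower_approx(x), 3))
```

Convexity is what makes the cuts valid lower bounds; the abstract's point is precisely that integer value functions lose this property, which is why SDDiP needs a new cut family.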
Journal Article
Robust Dual Dynamic Programming
by Wiesemann, Wolfram; Georghiou, Angelos; Tsoukalas, Angelos
in Algorithms; Cattle; Chemical process industries
2019
In the paper “Robust Dual Dynamic Programming,” Angelos Georghiou, Angelos Tsoukalas, and Wolfram Wiesemann propose a novel solution scheme for addressing planning problems with long horizons. Such problems can be formulated as multistage robust optimization problems. The proposed method takes advantage of the decomposable nature of these problems by bounding the costs arising in the future stages through lower and upper cost-to-go functions. The proposed scheme does not require relatively complete recourse, and it offers deterministic upper and lower bounds throughout the execution of the algorithm. The promising performance of the algorithm is shown in a stylized inventory-management problem in which the proposed algorithm achieved optimal solutions for problem instances with 100 time stages within a few minutes.
Multistage robust optimization problems, where the decision maker can dynamically react to consecutively observed realizations of the uncertain problem parameters, pose formidable theoretical and computational challenges. As a result, the existing solution approaches for this problem class typically determine suboptimal solutions under restrictive assumptions. In this paper, we propose a robust dual dynamic programming (RDDP) scheme for multistage robust optimization problems. The RDDP scheme takes advantage of the decomposable nature of these problems by bounding the costs arising in the future stages through lower and upper cost-to-go functions. For problems with uncertain technology matrices and/or constraint right-hand sides, our RDDP scheme determines an optimal solution in finite time. Also, if the objective function and/or the recourse matrices are uncertain, our method converges asymptotically (but deterministically) to an optimal solution. Our RDDP scheme does not require relatively complete recourse, and it offers deterministic upper and lower bounds throughout the execution of the algorithm. We show the promising performance of our algorithm in a stylized inventory management problem.
Supplemental material is available at https://doi.org/10.1287/opre.2018.1835.
Journal Article
Relaxations of Weakly Coupled Stochastic Dynamic Programs
2008
We consider a broad class of stochastic dynamic programming problems that are amenable to relaxation via decomposition. These problems comprise multiple subproblems that are independent of each other except for a collection of coupling constraints on the action space. We fit an additively separable value function approximation using two techniques, namely, Lagrangian relaxation and the linear programming (LP) approach to approximate dynamic programming. We prove various results comparing the relaxations to each other and to the optimal problem value. We also provide a column generation algorithm for solving the LP-based relaxation to any desired optimality tolerance, and we report on numerical experiments on bandit-like problems. Our results provide insight into the complexity versus quality trade-off when choosing which of these relaxations to implement.
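The Lagrangian relaxation the abstract compares can be sketched on a deliberately tiny deterministic instance: two subproblems that are independent except for a shared budget on how many of them may act. Dualizing the budget constraint with a multiplier lets each subproblem be optimized separately. All numbers below are invented; the paper's setting is the full stochastic dynamic program, not this one-shot toy.

```python
# Toy Lagrangian relaxation of a "weakly coupled" problem: two independent
# subproblems linked only by a budget on the total number of actions.
# Dualizing the coupling constraint decomposes the maximization.

rewards = [5.0, 3.0]   # reward if subproblem i acts (0 if it stays idle)
budget = 1             # coupling constraint: at most one subproblem acts

def lagrangian(lam):
    """Dual function: decomposed subproblem maxima plus the budget term."""
    # With the constraint priced at lam, each subproblem independently
    # earns max(0, reward - lam).
    return sum(max(0.0, r - lam) for r in rewards) + lam * budget

# Crude dual search over a grid of multipliers; for every lam >= 0 the
# dual value is an upper bound on the true coupled optimum.
best_lam = min((lagrangian(l / 10.0), l / 10.0) for l in range(101))[1]
dual_bound = lagrangian(best_lam)

true_optimum = max(rewards)  # with budget 1, act on the best subproblem
assert dual_bound >= true_optimum - 1e-9
print(true_optimum, round(dual_bound, 3))
```

Here the dual bound happens to be tight; the paper's results concern how such Lagrangian bounds compare with LP-based approximate-DP bounds in the genuinely dynamic case, where the gap is generally nonzero.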
Journal Article
Modified model free dynamic programming: an augmented approach for unmanned aerial vehicle
by Din, Adnan Fayyaz Ud; Habib, Muzaffar; Mir, Imran
in Algorithms; Changing environments; Dynamic programming
2023
The design complexities of trending UAVs necessitate the formulation of control laws that are both robust and model-free, besides being capable of handling evolving dynamic environments. In this research, a unique intelligent control architecture is presented which aims at maximizing the glide range of an experimental UAV having unconventional controls. To handle control complexities while keeping them computationally acceptable, a distinct RL technique, namely Modified Model Free Dynamic Programming (MMDP), is proposed. The methodology is novel in that the RL-based dynamic programming algorithm has been specifically modified to configure the problem in continuous state and control space domains without knowledge of the underlying UAV model dynamics. A major challenge during the research was the development of a suitable reward function that helps achieve the desired objective of maximizing glide performance. The efficacy of the results and performance characteristics demonstrated the ability of the presented algorithm to dynamically adapt to the changing environment, thereby making it suitable for UAV applications. Non-linear simulations performed under different environmental and varying initial conditions demonstrated the effectiveness of the proposed methodology over conventional classical approaches.
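The "model-free" idea the abstract invokes can be illustrated in its simplest tabular form: the agent never sees the transition model, only sampled transitions. This generic Q-learning sketch on a toy chain is an assumption-laden stand-in; it is not the paper's MMDP algorithm, which works in continuous state and control spaces.

```python
import random

# Minimal tabular Q-learning sketch: learning a policy purely from sampled
# (state, action, reward, next_state) tuples, with no transition model.
# Toy chain MDP: states 0..4, reaching state 4 pays reward 1.

random.seed(0)

N = 5
ACTIONS = (-1, +1)    # step left or right

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(2000):
    s = random.randrange(N - 1)
    for _ in range(20):
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # temporal-difference update toward the sampled Bellman target
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if s == N - 1:
            break

# The learned greedy policy should step right in every non-goal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)}
assert all(a == 1 for a in policy.values())
```

The reward-function difficulty the abstract highlights is visible even here: the learning signal is entirely whatever `step` returns, so a poorly shaped reward silently changes the policy the agent converges to.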
Journal Article
Reductions of Approximate Linear Programs for Network Revenue Management
2015
The linear programming approach to approximate dynamic programming has received considerable attention in the recent network revenue management literature. A major challenge of the approach lies in solving the resulting approximate linear programs (ALPs), which often have a huge number of constraints and/or variables. We show that the ALPs can be dramatically reduced in size for both affine and separable piecewise linear approximations to network revenue management problems, under both independent and discrete choice models of demand. Our key result is the equivalence between each ALP and a corresponding reduced program, which is more compact in size and admits an intuitive probabilistic interpretation. For the affine approximation to network revenue management under an independent demand model, we recover an equivalence result known in the literature, but provide an alternative proof. Our other equivalence results are new. We test the numerical performance of solving the reduced programs directly using off-the-shelf commercial solvers on a set of test instances taken from the literature.
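For context, the exact dynamic program that such approximations target can be written down for a toy single-leg problem; it is the network version of this recursion whose ALP grows huge. All numbers below are invented for illustration, and this sketch computes the exact value function rather than any of the paper's reduced programs.

```python
# Toy exact dynamic program for single-leg revenue management under an
# independent-demand model -- the kind of value function that approximate
# linear programs (ALPs) replace with affine or separable approximations
# once the network version becomes intractable.

T = 10                      # decision periods remaining
C = 4                       # seats remaining
fares = [100.0, 60.0]       # two fare classes
prob = [0.15, 0.35]         # per-period arrival probability of each class

# V[t][x]: optimal expected revenue with t periods left and x seats left.
V = [[0.0] * (C + 1) for _ in range(T + 1)]
for t in range(1, T + 1):
    for x in range(C + 1):
        no_arrival = 1.0 - sum(prob)
        total = no_arrival * V[t - 1][x]
        for f, p in zip(fares, prob):
            if x > 0:
                # Accept the request only if the fare beats the
                # opportunity cost of the seat.
                total += p * max(f + V[t - 1][x - 1], V[t - 1][x])
            else:
                total += p * V[t - 1][x]
        V[t][x] = total

print(round(V[T][C], 2))  # optimal expected revenue over the full horizon
```

With a network of legs the state becomes a capacity vector and this table blows up exponentially, which is exactly why the ALP formulations (and the reductions the paper proves equivalent) matter.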
Journal Article
Fairness, Efficiency, and Flexibility in Organ Allocation for Kidney Transplantation
by Bertsimas, Dimitris; Trichakis, Nikolaos; Farias, Vivek F.
in Allocations; applications; Blood & organ donations
2013
We propose a scalable, data-driven method for designing national policies for the allocation of deceased donor kidneys to patients on a waiting list in a fair and efficient way. We focus on policies that have the same form as the one currently used in the United States. In particular, we consider policies that are based on a point system that ranks patients according to some priority criteria, e.g., waiting time, medical urgency, etc., or a combination thereof. Rather than making specific assumptions about fairness principles or priority criteria, our method offers the designer the flexibility to select his desired criteria and fairness constraints from a broad class of allowable constraints. The method then designs a point system that is based on the selected priority criteria and approximately maximizes medical efficiency (i.e., life-year gains from transplant) while simultaneously enforcing selected fairness constraints.
Among the several case studies we present employing our method, one case study designs a point system that has the same form, uses the same criteria, and satisfies the same fairness constraints as the point system that was recently proposed by U.S. policy makers. In addition, the point system we design delivers an 8% increase in extra life-year gains. We evaluate the performance of all policies under consideration using the same statistical and simulation tools and data as the U.S. policy makers use. Other case studies perform a sensitivity analysis (for instance, demonstrating that the increase in extra life-year gains by relaxing certain fairness constraints can be as high as 30%) and also pursue the design of policies targeted specifically at remedying criticisms leveled at the recent point system proposed by U.S. policy makers.
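A point system of the kind described above is mechanically simple: score each patient by a weighted sum of criteria, then rank. The criteria, weights, and data below are invented for illustration; the paper's contribution is *designing* such weights to trade off efficiency against fairness constraints, which this sketch does not attempt.

```python
# Minimal sketch of a priority point system: patients are ranked by a
# weighted sum of criteria. Criteria names, weights, and data are
# hypothetical.

weights = {"waiting_years": 1.0, "medical_urgency": 2.5}

patients = [
    {"id": "A", "waiting_years": 4.0, "medical_urgency": 0.0},
    {"id": "B", "waiting_years": 1.0, "medical_urgency": 2.0},
    {"id": "C", "waiting_years": 2.0, "medical_urgency": 1.0},
]

def points(p):
    """Total priority points for one patient."""
    return sum(weights[k] * p[k] for k in weights)

ranked = sorted(patients, key=points, reverse=True)
print([p["id"] for p in ranked])  # -> ['B', 'C', 'A']
```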
Journal Article
A feasibility-driven approach to control-limited DDP
by Mastalli, Carlos; Merkt, Wolfgang; Vijayakumar, Sethu
in Algorithms; Convergence; Dynamic programming
2022
Differential dynamic programming (DDP) is a direct single shooting method for trajectory optimization. Its efficiency derives from the exploitation of temporal structure (inherent to optimal control problems) and explicit roll-out/integration of the system dynamics. However, it suffers from numerical instability and, when compared to direct multiple shooting methods, it has limited initialization options (allows initialization of controls, but not of states) and lacks proper handling of control constraints. In this work, we tackle these issues with a feasibility-driven approach that regulates the dynamic feasibility during the numerical optimization and ensures control limits. Our feasibility search emulates the numerical resolution of a direct multiple shooting problem with only dynamics constraints. We show that our approach (named Box-FDDP) has better numerical convergence than Box-DDP+ (a single shooting method), and that its convergence rate and runtime performance are competitive with state-of-the-art direct transcription formulations solved using the interior point and active set algorithms available in Knitro. We further show that Box-FDDP decreases the dynamic feasibility error monotonically—as in state-of-the-art nonlinear programming algorithms. We demonstrate the benefits of our approach by generating complex and athletic motions for quadruped and humanoid robots. Finally, we highlight that Box-FDDP is suitable for model predictive control in legged robots.
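The backward pass that DDP builds on can be sketched in its simplest linear-quadratic (LQR) special case, where it reduces to a Riccati-style recursion; the feasibility regulation and control limits that Box-FDDP adds are exactly what this sketch omits. All constants are invented.

```python
# Scalar LQR backward pass: the linear-quadratic special case of the DDP
# backward recursion. Dynamics x' = a*x + b*u, stage cost q*x^2 + r*u^2.

a, b, q, r, T = 1.0, 0.5, 1.0, 0.1, 50

P = q          # quadratic coefficient of the cost-to-go at the final stage
gains = []
for _ in range(T):
    # Minimize q*x^2 + r*u^2 + P*(a*x + b*u)^2 over u: linear feedback u = K*x.
    K = -(b * P * a) / (r + b * b * P)
    gains.append(K)
    P = q + r * K * K + P * (a + b * K) ** 2  # value recursion
gains.reverse()  # gains[t] is now the feedback gain at forward stage t

# Rolling out the feedback policy (DDP's forward pass, here without line
# search) drives the state to the origin.
x = 1.0
for K in gains:
    x = a * x + b * (K * x)
print(round(x, 6))
```

In full DDP the dynamics are nonlinear, so the quadratic model is rebuilt around each roll-out and the backward/forward passes iterate; clamping `K * x` to a box is where naive control-limit handling breaks down and schemes like Box-FDDP come in.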
Journal Article
Robust adaptive dynamic programming for linear and nonlinear systems: An overview
2013
The field of adaptive dynamic programming with diverse applications in control engineering has undergone rapid progress over the past few years. A new theory called “Robust Adaptive Dynamic Programming” (for short, RADP) is developed for the design of robust optimal controllers for linear and nonlinear systems subject to both parametric and dynamic uncertainties. A central objective of this paper is to give a brief overview of our recent contributions to the development of the theory of RADP and to outline its potential applications in engineering and biology.
Journal Article
An Approximate Dynamic Programming Algorithm for Monotone Value Functions
2015
Many sequential decision problems can be formulated as Markov decision processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state space becomes large, traditional techniques, such as the backward dynamic programming algorithm (i.e., backward induction or value iteration), may no longer be effective in finding a solution within a reasonable time frame, and thus we are forced to consider other approaches, such as approximate dynamic programming (ADP). We propose a provably convergent ADP algorithm called Monotone-ADP that exploits the monotonicity of the value functions to increase the rate of convergence. In this paper, we describe a general finite-horizon problem setting where the optimal value function is monotone, present a convergence proof for Monotone-ADP under various technical assumptions, and show numerical results for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients. The empirical results indicate that by taking advantage of monotonicity, we can attain high quality solutions within a relatively small number of iterations, using up to two orders of magnitude less computation than is needed to compute the optimal solution exactly.
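The monotonicity-projection idea behind such algorithms can be sketched in isolation: after a noisy stochastic update at one state, neighbouring estimates are pushed up or down so the value array stays monotone. This is a generic illustration of the projection step under invented data, not the full Monotone-ADP algorithm or its convergence machinery.

```python
import random

# Sketch of a monotonicity projection in ADP: noisy samples of a monotone
# value function are averaged in, and after each update the estimate is
# projected back onto the set of monotone arrays.

random.seed(1)

true_v = [0.0, 1.0, 2.0, 3.0, 4.0]       # monotone true values (hypothetical)
v = [0.0] * 5                             # running estimates

def monotone_project(v, s):
    """Restore v[0] <= v[1] <= ... after v[s] changed."""
    for i in range(s, len(v) - 1):        # push larger states up
        v[i + 1] = max(v[i + 1], v[i])
    for i in range(s, 0, -1):             # push smaller states down
        v[i - 1] = min(v[i - 1], v[i])

for n in range(5000):
    s = random.randrange(5)
    obs = true_v[s] + random.gauss(0.0, 0.5)   # noisy sample of V(s)
    step = 1.0 / (1.0 + n / 10.0)              # decreasing stepsize
    v[s] += step * (obs - v[s])
    monotone_project(v, s)

assert all(v[i] <= v[i + 1] for i in range(4))  # invariant maintained
print([round(x, 2) for x in v])
```

The payoff the abstract describes comes from each observation informing many states at once through the projection, instead of updating only the sampled state.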
Journal Article