130 result(s) for "dynamic programming/optimal control"
Relaxations of Weakly Coupled Stochastic Dynamic Programs
We consider a broad class of stochastic dynamic programming problems that are amenable to relaxation via decomposition. These problems comprise multiple subproblems that are independent of each other except for a collection of coupling constraints on the action space. We fit an additively separable value function approximation using two techniques, namely, Lagrangian relaxation and the linear programming (LP) approach to approximate dynamic programming. We prove various results comparing the relaxations to each other and to the optimal problem value. We also provide a column generation algorithm for solving the LP-based relaxation to any desired optimality tolerance, and we report on numerical experiments on bandit-like problems. Our results provide insight into the complexity versus quality trade-off when choosing which of these relaxations to implement.
Fairness, Efficiency, and Flexibility in Organ Allocation for Kidney Transplantation
We propose a scalable, data-driven method for designing national policies for the allocation of deceased donor kidneys to patients on a waiting list in a fair and efficient way. We focus on policies that have the same form as the one currently used in the United States. In particular, we consider policies that are based on a point system that ranks patients according to some priority criteria, e.g., waiting time, medical urgency, etc., or a combination thereof. Rather than making specific assumptions about fairness principles or priority criteria, our method offers the designer the flexibility to select his desired criteria and fairness constraints from a broad class of allowable constraints. The method then designs a point system that is based on the selected priority criteria and approximately maximizes medical efficiency (i.e., life-year gains from transplant) while simultaneously enforcing selected fairness constraints. Among the several case studies we present employing our method, one case study designs a point system that has the same form, uses the same criteria, and satisfies the same fairness constraints as the point system that was recently proposed by U.S. policy makers. In addition, the point system we design delivers an 8% increase in extra life-year gains. We evaluate the performance of all policies under consideration using the same statistical and simulation tools and data as the U.S. policy makers use. Other case studies perform a sensitivity analysis (for instance, demonstrating that the increase in extra life-year gains by relaxing certain fairness constraints can be as high as 30%) and also pursue the design of policies targeted specifically at remedying criticisms leveled at the recent point system proposed by U.S. policy makers.
Reductions of Approximate Linear Programs for Network Revenue Management
The linear programming approach to approximate dynamic programming has received considerable attention in the recent network revenue management literature. A major challenge of the approach lies in solving the resulting approximate linear programs (ALPs), which often have a huge number of constraints and/or variables. We show that the ALPs can be dramatically reduced in size for both affine and separable piecewise linear approximations to network revenue management problems, under both independent and discrete choice models of demand. Our key result is the equivalence between each ALP and a corresponding reduced program, which is more compact in size and admits an intuitive probabilistic interpretation. For the affine approximation to network revenue management under an independent demand model, we recover an equivalence result known in the literature, but provide an alternative proof. Our other equivalence results are new. We test the numerical performance of solving the reduced programs directly using off-the-shelf commercial solvers on a set of test instances taken from the literature.
"We Will Be Right with You": Managing Customer Expectations with Vague Promises and Cheap Talk
Delay announcements informing customers about anticipated service delays are prevalent in service-oriented systems. How delay announcements can influence customers in service systems is a complex problem that depends on both the dynamics of the underlying queueing system and on the customers' strategic behavior. We examine this problem of information communication by considering a model in which both the firm and the customers act strategically: the firm in choosing its delay announcement while anticipating customer response, and the customers in interpreting these announcements and in making the decision about when to join the system and when to balk. We characterize the equilibrium language that emerges between the service provider and her customers. The analysis of the emerging equilibria provides new and interesting insights into customer-firm information sharing. We show that even though the information provided to customers is nonverifiable, it improves the profits of the firm and the expected utility of the customers. The robustness of the results is illustrated via various extensions of the model. In particular, studying models with incomplete information on the system parameters allows us also to highlight the role of information provision in managing customer expectations regarding the congestion in the system. Further, the information could be as simple as "high congestion"/"low congestion" announcements, or it could be as detailed as the true state of the system. We also show that firms may choose to shade some of the truth by using intentional vagueness to lure customers.
Joint Optimization of Sampling and Control of Partially Observable Failing Systems
Stochastic control problems that arise in reliability and maintenance optimization typically assume that information used for decision-making is obtained according to a predetermined sampling schedule. In many real applications, however, there is a high sampling cost associated with collecting such data. It is therefore of equal importance to determine when information should be collected and to decide how this information should be utilized for maintenance decision-making. This type of joint optimization has been a long-standing problem in the operations research and maintenance optimization literature, and very few results regarding the structure of the optimal sampling and maintenance policy have been published. In this paper, we formulate and analyze the joint optimization of sampling and maintenance decision-making in the partially observable Markov decision process framework. We prove the optimality of a policy that is characterized by three critical thresholds, which have practical interpretation and give new insight into the value of condition-based maintenance programs in life-cycle asset management. Illustrative numerical comparisons are provided that show substantial cost savings over existing suboptimal policies.
Dynamic Bid Prices in Revenue Management
We formally derive the standard deterministic linear program (LP) for bid-price control by making an affine functional approximation to the optimal dynamic programming value function. This affine functional approximation gives rise to a new LP that yields tighter bounds than the standard LP. Whereas the standard LP computes static bid prices, our LP computes a time trajectory of bid prices. We show that there exist dynamic bid prices, optimal for the LP, that are individually monotone with respect to time. We provide a column generation procedure for solving the LP within a desired optimality tolerance, and present numerical results on computational and economic performance.
Managing Patient Service in a Diagnostic Medical Facility
Hospital diagnostic facilities, such as magnetic resonance imaging centers, typically provide service to several diverse patient groups: outpatients, who are scheduled in advance; inpatients, whose demands are generated randomly during the day; and emergency patients, who must be served as soon as possible. Our analysis focuses on two interrelated tasks: designing the outpatient appointment schedule, and establishing dynamic priority rules for admitting patients into service. We formulate the problem of managing patient demand for diagnostic service as a finite-horizon dynamic program and identify properties of the optimal policies. Using empirical data from a major urban hospital, we conduct numerical studies to develop insights into the sensitivity of the optimal policies to the various cost and probability parameters and to evaluate the performance of several heuristic rules for appointment acceptance and patient scheduling.
Using Lagrangian Relaxation to Compute Capacity-Dependent Bid Prices in Network Revenue Management
We propose a new method to compute bid prices in network revenue management problems. The novel aspect of our method is that it explicitly considers the temporal dynamics of the arrivals of the itinerary requests and generates bid prices that depend on the remaining leg capacities. Our method is based on relaxing certain constraints that link the decisions for different flight legs by associating Lagrange multipliers with them. In this case, the network revenue management problem decomposes by the flight legs, and we can concentrate on one flight leg at a time. When compared with the so-called deterministic linear program, we show that our method provides a tighter upper bound on the optimal objective value of the network revenue management problem. Computational experiments indicate that the bid prices obtained by our method perform significantly better than the ones obtained by standard benchmark methods.
A Learning Approach for Interactive Marketing to a Customer Segment
When a marketer in an interactive environment decides which messages to send to her customers, she may send messages currently thought to be most promising (exploitation) or use poorly understood messages for the purpose of information gathering (exploration). We assume that customers are already clustered into homogeneous segments, and we consider the adaptive learning of message effectiveness within a customer segment. We present a Bayesian formulation of the problem in which decisions are made for batches of customers simultaneously, although decisions may vary within a batch. This extends the classical multiarmed bandit problem for sampling one-by-one from a set of reward populations. Our solution methods include a Lagrangian decomposition-based approximate dynamic programming approach and a heuristic based on a known asymptotic approximation to the multiarmed bandit solution. Computational results show that our methods clearly outperform approaches that ignore the effects of information gain.
Statistical Learning of Service-Dependent Demand in a Multiperiod Newsvendor Setting
We study an inventory system wherein a customer may leave the seller's market after experiencing an inventory stockout. Traditionally, researchers and practitioners assume a single penalty cost to model this customer behavior of stockout aversion. Recently, a stream of researchers explicitly model this customer behavior and support the traditional penalty cost approach. We enrich this literature by studying the statistical learning of service-dependent demand. We build and solve four models: a baseline model, where the seller can observe the demand distribution; a second model, where the seller cannot observe the demand distribution but statistically learns the demand distribution; a third model, where the seller can learn or pay to obtain the exact information of the demand distribution; and a fourth model, where demand in excess of available inventory is lost and unobserved. Interestingly, we find that all four models support the traditional penalty cost approach. This result confirms the use of a state-independent stockout penalty cost in the presence of demand learning. More strikingly, the first three models imply the same stockout penalty cost, which is larger than the stockout penalty cost implied by the last model.