26 result(s) for "Balseiro, Santiago R."
Learning in Repeated Auctions with Budgets: Regret Minimization and Equilibrium
In online advertising markets, advertisers often purchase ad placements through bidding in repeated auctions based on realized viewer information. We study how budget-constrained advertisers may compete in such sequential auctions in the presence of uncertainty about future bidding opportunities and competition. We formulate this problem as a sequential game of incomplete information, in which bidders know neither their own valuation distribution nor the budgets and valuation distributions of their competitors. We introduce a family of practical bidding strategies we refer to as adaptive pacing strategies, in which advertisers adjust their bids according to the sample path of expenditures they exhibit, and analyze the performance of these strategies in different competitive settings. We establish the asymptotic optimality of these strategies when competitors’ bids are independent and identically distributed over auctions, but also when competing bids are arbitrary. When all the bidders adopt these strategies, we establish the convergence of the induced dynamics and characterize a regime (well motivated in the context of online advertising markets) under which these strategies constitute an approximate Nash equilibrium in dynamic strategies: the benefit from unilaterally deviating to other strategies, including ones with access to complete information, becomes negligible as the number of auctions and competitors grows large. This establishes a connection between regret minimization and market stability, by which advertisers can essentially follow approximate equilibrium bidding strategies that also ensure the best performance that can be guaranteed off equilibrium. This paper was accepted by Noah Gans, stochastic models and simulation.
Static Routing in Stochastic Scheduling: Performance Guarantees and Asymptotic Optimality
Scheduling problems have a deep and well-developed literature in both operations research and computer science. Stochastic scheduling problems with many jobs are notoriously difficult, and optimal policies may require constant tracking of the elapsed time of all jobs in process. When the processors (or “machines”) are identical, a policy that simply schedules jobs in fixed order of the ratio of job weight to expected processing time (the WSEPT rule) is asymptotically optimal when the number of jobs is large. Much less understood is the case with specialized (or “unrelated”) machines (i.e., each job’s processing distribution may vary across machines). In “Static Routing in Stochastic Scheduling: Performance Guarantees and Asymptotic Optimality,” Balseiro, Brown, and Chen study stochastic scheduling with unrelated machines. The authors study a simple static routing policy that (i) assigns jobs to machines up front and (ii) schedules the jobs on each machine in the WSEPT order. This static routing policy depends only on job processing times through their expected values and is easy to compute with a single convex optimization problem. The authors explicitly characterize the performance loss of this static routing policy relative to an optimal scheduling policy; this result implies that this static routing policy is asymptotically optimal in the regime of many jobs.
We study the problem of scheduling a set of J jobs on M machines with stochastic job processing times when no preemptions are allowed and with a weighted sum of expected completion times objective. Our model allows for “unrelated” machines: the distributions of processing times may vary across both jobs and machines. We study static routing policies, which assign (or “route”) each job to a particular machine at the start of the problem and then sequence jobs on each machine according to the weighted shortest expected processing time rule. We discuss how to obtain a good routing of jobs to machines by solving a convex quadratic optimization problem that has J × M variables and depends only on the job processing distributions through their expected values. Our main result is an additive performance bound on the suboptimality of this static routing policy relative to an optimal adaptive, nonanticipative scheduling policy. This result implies that such static routing policies are asymptotically optimal as the number of jobs grows large. In the special case of “uniformly related” machines (that is, machines differing only in their speeds), we obtain a similar but slightly sharper result for a static routing policy that routes jobs to machines proportionally to machine speeds. We also study the impact that dependence in processing times across jobs can have on the suboptimality of the static routing policy. The main novelty in our work is deriving lower bounds on the performance of an optimal adaptive, nonanticipative scheduling policy; we do this through the use of an information relaxation in which all processing times are revealed before scheduling jobs and a penalty that appropriately compensates for this additional information. The online appendices are available at https://doi.org/10.1287/opre.2018.1749.
Repeated Auctions with Budgets in Ad Exchanges: Approximations and Design
Ad exchanges are emerging Internet markets where advertisers may purchase display ad placements, in real time and based on specific viewer information, directly from publishers via a simple auction mechanism. Advertisers join these markets with a prespecified budget and participate in multiple second-price auctions over the length of a campaign. This paper studies the competitive landscape that arises in ad exchanges and the implications for publishers’ decisions. The presence of budgets introduces dynamic interactions among advertisers that need to be taken into account when attempting to characterize the bidding landscape or the impact of changes in the auction design. To this end, we introduce the notion of a fluid mean-field equilibrium (FMFE) that is behaviorally appealing and computationally tractable, and in some important cases, it yields a closed-form characterization. We establish that an FMFE approximates well the rational behavior of advertisers in these markets. We then show how this framework may be used to provide sharp prescriptions for key auction design decisions that publishers face in these markets. In particular, we show that ignoring budgets, a common practice in this literature, can result in significant profit losses for the publisher when setting the reserve price. This paper was accepted by Dimitris Bertsimas, optimization.
Yield Optimization of Display Advertising with Ad Exchange
It is clear from the growing role of ad exchanges in the real-time sale of advertising slots that Web publishers are considering a new alternative to their more traditional reservation-based ad contracts. To make this choice, the publisher must trade off, in real time, the short-term revenue from ad exchange with the long-term benefits of delivering good spots to the reservation ads. In this paper we formalize this combined optimization problem as a multiobjective stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange prices. We prove the asymptotic optimality of this policy in terms of any arbitrary trade-off between the quality of delivered reservation ads and revenue from the exchange, and we show that our policy approximates any Pareto-optimal point on the quality-versus-revenue curve. Experimental results on data derived from real publisher inventory confirm that there are significant benefits for publishers if they jointly optimize over both channels. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2014.2017. This paper was accepted by Dimitris Bertsimas, optimization.
Multiagent Mechanism Design Without Money
Efficient Allocation of Resources Without Money: How should a planner allocate a single resource to multiple requesters efficiently when monetary transfers are not feasible? This question naturally arises in many relevant settings ranging from health care and antipoverty programs to cloud computing systems. In these settings, resource requests occur repeatedly, and requesters’ private values might change over time. In “Multiagent Mechanism Design Without Money,” S. R. Balseiro, H. Gurkan, and P. Sun propose a mechanism that asymptotically achieves the first-best efficient allocation (the welfare-maximizing allocation as if values are publicly observable) as requesters become more patient. Furthermore, the authors provide sharp characterizations of convergence rates to first best as a function of the discount factor. In the case of two agents, the authors prove that the convergence rate of their mechanism is optimal; i.e., no other mechanism can converge faster to first best.
We consider a principal repeatedly allocating a single resource in each period to one of multiple agents, whose values are private, without relying on monetary payments over an infinite horizon with discounting. We design a dynamic mechanism that induces agents to report their values truthfully in each period via promises/threats of future favorable/unfavorable allocations. We show that our mechanism asymptotically achieves the first-best efficient allocation (the welfare-maximizing allocation as if values are public) as agents become more patient and provide sharp characterizations of convergence rates to first best as a function of the discount factor. In particular, in the case of two agents we prove that the convergence rate of our mechanism is optimal; that is, no other mechanism can converge faster to first best.
Optimal Contracts for Intermediaries in Online Advertising
In online advertising, the prevalent method advertisers employ to acquire impressions is to contract with an intermediary. These contracts involve upfront payments made by the advertisers to the intermediary, in exchange for running campaigns on their behalf. This paper studies the optimal contract offered by the intermediary in a setting where advertisers’ budgets and targeting criteria are private. This problem can naturally be formulated as a multidimensional mechanism design problem, which in general is hard to solve. We tackle this problem by combining a performance space characterization technique, which relies on delineating the expected cost and value achievable by any feasible (dynamic) bidding policy, and a duality-based approach, which reduces the optimal contract design problem to a tractable convex optimization problem. This approach yields a crisp characterization of the intermediary’s optimal bidding policy: the policy is stationary and bids a weighted average of the values associated with different types (to guarantee that the advertiser reports her type truthfully) that is appropriately shaded (to account for budget constraints). Additionally, when advertisers have identical value distributions, our formulation yields a closed-form characterization of the optimal contract. Our results indicate that an intermediary can profitably provide bidding service to a budget-constrained advertiser and at the same time increase the overall market efficiency. The online appendix is available at https://doi.org/10.1287/opre.2017.1618.
Approximations to Stochastic Dynamic Programs via Information Relaxation Duality
In the analysis of complex stochastic dynamic programs, we often seek strong theoretical guarantees on the suboptimality of heuristic policies. A common technique for obtaining performance bounds in the approximation algorithms literature is “hindsight” (or “offline”) analysis, which considers a decision maker who has perfect information about the outcomes of all uncertainties in advance. In many problems, however, this information is quite valuable and leads to weak bounds. In “Approximations to Stochastic Dynamic Programs via Information Relaxation Duality,” Balseiro and Brown study information relaxation duality, which involves incorporating a penalty for perfect information. The primary application of information relaxation duality to this point has been as a computational method for evaluating heuristic policies. The authors show how to use this approach to derive theoretical guarantees on the performance of heuristic policies in complex dynamic programs. The paper introduces a general recipe involving an approximate value function that generates both a heuristic policy and a penalty for the perfect information analysis. The authors apply the approach to three challenging problems: (1) stochastic knapsack problems, (2) stochastic scheduling on parallel machines, and (3) sequential search problems with recall. In each problem, the method leads to analytical bounds on the suboptimality of the corresponding heuristic policy, which, in turn, implies asymptotic optimality of the policy in specific regimes of interest.
In the analysis of complex stochastic dynamic programs, we often seek strong theoretical guarantees on the suboptimality of heuristic policies. One technique for obtaining performance bounds is perfect information analysis: this approach provides bounds on the performance of an optimal policy by considering a decision maker who has access to the outcomes of all future uncertainties before making decisions, that is, fully relaxed nonanticipativity constraints. A limitation of this approach is that in many problems perfect information about uncertainties is quite valuable, and thus, the resulting bound is weak. In this paper, we use an information relaxation duality approach, which includes a penalty that punishes violations of the nonanticipativity constraints, to derive stronger analytical bounds on the suboptimality of heuristic policies in stochastic dynamic programs that are too difficult to solve. The general framework we develop ties the heuristic policy and the performance bound together explicitly through the use of an approximate value function: heuristic policies are greedy with respect to this approximation, and penalties are also generated in a specific way using this approximation. We apply this approach to three challenging problems: stochastic knapsack problems, stochastic scheduling on parallel machines, and sequential search problems. In each of these problems, we consider a greedy heuristic policy generated by an approximate value function and a corresponding penalized perfect information bound. We then characterize the gap between the performance of the policy and the information relaxation bound in each problem; the results imply asymptotic optimality of the heuristic policy for specific “large” regimes of interest. The online appendices are available at https://doi.org/10.1287/opre.2018.1782.
Dynamic Mechanisms with Martingale Utilities
We study the dynamic mechanism design problem of a seller who repeatedly sells independent items to a buyer with private values. In this setting, the seller could potentially extract the entire buyer surplus by running efficient auctions and charging an upfront participation fee at the beginning of the horizon. In some markets, such as Internet advertising, participation fees are not practical since buyers expect to inspect items before purchasing them. This motivates us to study the design of dynamic mechanisms under successively more stringent requirements that capture the implicit business constraints of these markets. We first consider a periodic individual rationality constraint, which limits the mechanism to charge at most the buyer’s value in each period. While this prevents large upfront participation fees, the seller can still design mechanisms that spread a participation fee across multiple initial auctions. These mechanisms have the unappealing feature that they provide close-to-zero buyer utility in earlier auctions in exchange for higher utility in later auctions. To address this problem, we introduce a martingale utility constraint, which imposes the requirement that from the perspective of the buyer, the next item’s expected utility is equal to the present one’s. Our main result is providing a dynamic auction satisfying martingale utility and periodic individual rationality whose loss in profit with respect to first-best (full extraction of buyer surplus) is optimal up to polylogarithmic factors. The proposed mechanism is a dynamic two-tier auction with a hard floor and a soft floor that allocates the item whenever the buyer’s bid is above the hard floor and charges the minimum of the bid and the soft floor. The electronic companion is available at https://doi.org/10.1287/mnsc.2017.2872. This paper was accepted by Noah Gans, stochastic models and simulation.
Dynamic Mechanism Design with Budget-Constrained Buyers Under Limited Commitment
We study the dynamic mechanism design problem of a seller who repeatedly auctions independent items over a discrete time horizon to buyers who face a cumulative budget constraint. A driving motivation behind our model is the emergence of real-time bidding markets for online display advertising in which such budgets are prevalent. We assume the seller has a strong form of limited commitment: she commits to the rules of the current auction but cannot commit to those of future auctions. We show that the celebrated Myersonian approach that leverages the envelope theorem fails in this setting, and therefore, characterizing the dynamic optimal mechanism seems intractable. Despite these challenges, we derive and characterize a near-optimal dynamic mechanism. To do so, we show that the Myersonian approach is recovered in a corresponding fluid continuous time model in which the time interval between consecutive items becomes negligible. Then we leverage this approach to characterize the optimal dynamic direct-revelation mechanism, highlighting novel incentives at play in settings with buyers’ budget constraints and seller’s limited commitment. We show through a combination of theoretical and numerical results that the optimal mechanism arising from the fluid continuous time model approximately satisfies incentive compatibility for the buyers and is approximately sequentially rational for the seller in the original discrete time model. Supplemental material is available at https://doi.org/10.1287/opre.2018.1830.
Dynamic Pricing for Reusable Resources: The Power of Two Prices
Motivated by real-world applications such as rental and cloud computing services, we investigate pricing for reusable resources. We consider a system where a single resource with a fixed number of identical copies serves customers with heterogeneous willingness-to-pay (WTP), and the usage duration distribution is general. Optimal dynamic policies are computationally intractable when usage durations are not memoryless, so existing literature has focused on static pricing, which incurs a steady-state performance loss of \(O(\sqrt{c})\) compared to optimality when supply and demand scale with \(c\). We propose a class of dynamic "stock-dependent" policies that (1) are computationally tractable and (2) can attain a steady-state performance loss of \(o(\sqrt{c})\). We give parametric bounds based on the local shape of the reward function at the optimal fluid admission probability and show that the performance loss of stock-dependent policies can be as low as \(O((\log c)^2)\). We characterize the tight performance loss for stock-dependent policies and show that it can in fact be achieved by a simple two-price policy that sets a higher price when the stock is below some threshold and a lower price otherwise. We extend our results to settings with multiple resources and multiple customer classes. Finally, we demonstrate that this "minimally dynamic" class of two-price policies performs well numerically, even in non-asymptotic settings, suggesting that a little dynamicity can go a long way.