Catalogue Search | MBRL

On Synchronous, Asynchronous, and Randomized Best-Response Schemes for Stochastic Nash Games

by Pang, Jong-Shi , Sen, Suvrajeet , Shanbhag, Uday V. in 91A15 , 91A20 , 91B28

2020

In this paper, we consider a stochastic Nash game in which each player minimizes a parameterized expectation-valued convex objective function. In deterministic regimes, proximal best-response (BR) schemes have been shown to be convergent under a suitable spectral property associated with the proximal BR map. However, a direct application of this scheme to stochastic settings requires obtaining exact solutions to stochastic optimization problems at each iteration. Instead, we propose an inexact generalization of this scheme in which an inexact solution to the BR problem is computed in an expected-value sense via a stochastic approximation (SA) scheme. On the basis of this framework, we present three inexact BR schemes: (i) First, we propose a synchronous inexact BR scheme where all players simultaneously update their strategies. (ii) Second, we extend this to a randomized setting where a subset of players is randomly chosen to update their strategies while the other players keep their strategies invariant. (iii) Third, we propose an asynchronous scheme, where each player chooses its update frequency while using outdated rival-specific data in updating its strategy. Under a suitable contractive property on the proximal BR map, we proceed to derive almost sure convergence of the iterates to the Nash equilibrium (NE) for (i) and (ii) and mean convergence for (i)–(iii). In addition, we show that for (i)–(iii), the generated iterates converge to the unique equilibrium in mean at a linear rate with a prescribed constant rather than a sublinear rate. Finally, we establish the overall iteration complexity of the scheme in terms of projected stochastic gradient (SG) steps for computing an ɛ -NE 2 (or ɛ -NE ∞ ) and note that in all settings, the iteration complexity is O ( 1 / ɛ 2 ( 1 + c ) + δ ) , where c = 0 in the context of (i), and c > 0 represents the positive cost of randomization in (ii) and asynchronicity and delay in (iii). Notably, in the synchronous regime, we achieve a near-optimal rate from the standpoint of solving stochastic convex optimization problems by SA schemes. The schemes are further extended to settings where players solve two-stage stochastic Nash games with linear and quadratic recourse. Finally, preliminary numerics developed on a multiportfolio investment problem and a two-stage capacity expansion game support the rate and complexity statements.

Journal Article

Share this book

Add to My Shelf

Equilibrium, uncertainty and risk in hydro-thermal electricity systems

by Philpott, Andy , Ferris, Michael , Wets, Roger in Auctions , Calculus of Variations and Optimal Control; Optimization , Combinatorics

2016

The correspondence of competitive partial equilibrium with a social optimum is well documented in the welfare theorems of economics. These theorems can be applied to single-period electricity pool auctions in which price-taking agents maximize profits at competitive prices, and extend naturally to standard models with locational marginal prices. In hydro-thermal markets where the auctions are repeated over many periods, agents seek to optimize their current and future profit accounting for future prices that depend on uncertain inflows. This makes the agent problems multistage stochastic optimization models, but perfectly competitive partial equilibrium still corresponds to a social optimum when all agents are risk neutral and share common knowledge of the probability distribution governing future inflows. The situation is complicated when agents are risk averse. In this setting we show under mild conditions that a social optimum corresponds to a competitive market equilibrium if agents have time-consistent dynamic coherent risk measures and there are enough traded market instruments to hedge inflow uncertainty. We illustrate some of the consequences of risk aversion on market outcomes using a simple two-stage competitive equilibrium model with three agents.

Journal Article

Share this book

Add to My Shelf

Two-stage non-cooperative games with risk-averse players

by Pang, Jong-Shi , Sen, Suvrajeet , Shanbhag, Uday V. in Approximation , Calculus of Variations and Optimal Control; Optimization , Combinatorics

2017

This paper formally introduces and studies a non-cooperative multi-agent game under uncertainty. The well-known Nash equilibrium is employed as the solution concept of the game. While there are several formulations of a stochastic Nash equilibrium problem, we focus mainly on a two-stage setting of the game wherein each agent is risk-averse and solves a rival-parameterized stochastic program with quadratic recourse. In such a game, each agent takes deterministic actions in the first stage and recourse decisions in the second stage after the uncertainty is realized. Each agent’s overall objective consists of a deterministic first-stage component plus a second-stage mean-risk component defined by a coherent risk measure describing the agent’s risk aversion. We direct our analysis towards a broad class of quantile-based risk measures and linear-quadratic recourse functions. For this class of non-cooperative games under uncertainty, the agents’ objective functions can be shown to be convex in their own decision variables, provided that the deterministic component of these functions have the same convexity property. Nevertheless, due to the non-differentiability of the recourse functions, the agents’ objective functions are at best directionally differentiable. Such non-differentiability creates multiple challenges for the analysis and solution of the game, two principal ones being: (1) a stochastic multi-valued variational inequality is needed to characterize a Nash equilibrium, provided that the players’ optimization problems are convex; (2) one needs to be careful in the design of algorithms that require differentiability of the objectives. Moreover, the resulting (multi-valued) variational formulation cannot be expected to be of the monotone type in general. The main contributions of this paper are as follows: (a) Prior to addressing the main problem of the paper, we summarize several approaches that have existed in the literature to deal with uncertainty in a non-cooperative game. (b) We introduce a unified formulation of the two-stage SNEP with risk-averse players and convex quadratic recourse functions and highlight the technical challenges in dealing with this game. (c) To handle the lack of smoothness, we propose smoothing schemes and regularization that lead to differentiable approximations. (d) To deal with non-monotonicity, we impose a generalized diagonal dominance condition on the players’ smoothed objective functions that facilitates the application and ensures the convergence of an iterative best-response scheme. (e) To handle the expectation operator, we rely on known methods in stochastic programming that include sampling and approximation. (f) We provide convergence results for various versions of the best-response scheme, particularly for the case of private recourse functions. Overall, this paper lays the foundation for future research into the class of SNEPs that provides a constructive paradigm for modeling and solving competitive decision making problems with risk-averse players facing uncertainty; this paradigm is very much at an infancy stage of research and requires extensive treatment in order to meet its broad applications in many engineering and economics domains.

Journal Article

Share this book

Add to My Shelf

Absorption paths and equilibria in quitting games

by Rainer, Catherine , Krasikov, Ilia , Ashkenazi-Golan, Galit in Absorption , Calculus of Variations and Optimal Control; Optimization , Combinatorics

2024

We study quitting games and introduce an alternative notion of strategy profiles— absorption paths . An absorption path is parametrized by the total probability of absorption in past play rather than by time, and it accommodates both discrete-time aspects and continuous-time aspects. We then define the concept of sequentially 0-perfect absorption paths, which are shown to be limits of ε -equilibrium strategy profiles as ε goes to 0. We establish that all quitting games that do not have simple equilibria (that is, an equilibrium where the game terminates in the first period or one where the game never terminates) have a sequentially 0-perfect absorption path. Finally, we prove the existence of sequentially 0-perfect absorption paths in a new class of quitting games.

Journal Article

Share this book

Add to My Shelf

Catch-Up: A Rule That Makes Service Sports More Competitive

by Stromquist, Walter , Ismail, Mehmet S. , Kilgour, D. Marc in MSC: Primary 60J20 , Numerical analysis , Probability

2018

Service sports include two-player contests such as volleyball, badminton, and squash. We analyze four rules, including the Standard Rule (SR), in which a player continues to serve until he or she loses. The Catch-Up Rule (CR) gives the serve to the player who has lost the previous point-as opposed to the player who won the previous point, as under SR. We also consider two Trailing Rules that make the server the player who trails in total score. Surprisingly, compared with SR, only CR gives the players the same probability of winning a game while increasing its expected length, thereby making it more competitive and exciting to watch. Unlike one of the Trailing Rules, CR is strategy-proof. By contrast, the rules of tennis fix who serves and when; its tiebreaker, however, keeps play competitive by being fair-not favoring either the player who serves first or who serves second.

Journal Article

Share this book

Add to My Shelf

Equilibrium dividend strategies in the dual model with a random time horizon

by Cheng, Gong-pin , Ye, Chuan-xiu , Zhao, Yong-xia in Digital Object Identifier , Discount rates , Discounts

2023

This paper investigates the dividend problem with non-exponential discounting in a dual model. We assume that the dividends can only be paid at a bounded rate and that the surplus process is killed by an exponential random variable. Since the non-exponential discount function leads to a time inconsistent control problem, we study the equilibrium HJB-equation and give the associated verification theorem. For the case of a mixture of exponential discount functions and exponential gains, we obtain the explicit equilibrium dividend strategy and the corresponding equilibrium value function. Besides, numerical examples are shown to illustrate our results.

Journal Article

Share this book

Add to My Shelf

$Zero-sum repeated games: Counterexamples to the existence of the asymptotic value and the conjecture$\\operatorname{maxmin}=\\operatorname{lim}v_{n}$

Zero-sum repeated games: Counterexamples to the existence of the asymptotic value and the conjecture$\\operatorname{maxmin}=\\operatorname{lim}v_{n}

by Ziliotto, Bruno in 91A05 , 91A15 , 91A20

2016

Mertens [In Proceedings of the International Congress of Mathematicians (Berkeley, Calif., 1986) (1987) 1528–1577 Amer. Math. Soc.] proposed two general conjectures about repeated games: the first one is that, in any two-person zero-sum repeated game, the asymptotic value exists, and the second one is that, when Player 1 is more informed than Player 2, in the long run Player 1 is able to guarantee the asymptotic value. We disprove these two long-standing conjectures by providing an example of a zero-sum repeated game with public signals and perfect observation of the actions, where the value of the \\lambda-discounted game does not converge when \\lambda goes to 0. The aforementioned example involves seven states, two actions and two signals for each player. Remarkably, players observe the payoffs, and play in turn.

Journal Article

Share this book

Add to My Shelf

ON VALUES OF REPEATED GAMES WITH SIGNALS

by Zielonka, Wiesław , Venel, Xavier , Renault, Jérôme in 60G35 , 91A05 , 91A15

2016

We study the existence of different notions of value in two-person zerosum repeated games where the state evolves and players receive signals. We provide some examples showing that the limsup value (and the uniform value) may not exist in general. Then we show the existence of the value for any Borei payoff function if the players observe a public signal including the actions played. We also prove two other positive results without assumptions on the signaling structure: the existence of the sup value in any game and the existence of the uniform value in recursive games with nonnegative payoffs.

Journal Article

Share this book

Add to My Shelf

Convergent Multiple-Timescales Reinforcement Learning Algorithms in Normal Form Games

by Leslie, David S. , Collins, E. J. in 68T05 , 91A20 , Approximation

2003

We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to Nash distribution in two-player zero-sum games and N-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge-in particular, we consider the N-player matching pennies game and Shapley's variant of the rock-scissors-paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.

Journal Article

Share this book

Add to My Shelf

Asymptotically Optimal Multistage Tests of Simple Hypotheses

by Bartroff, Jay in 62L10 , 91A20 , asymptotic optimality

2007

A family of variable stage size multistage tests of simple hypotheses is described, based on efficient multistage sampling procedures. Using a loss function that is a linear combination of sampling costs and error probabilities, these tests are shown to minimize the integrated risk to second order as the costs per stage and per observation approach zero. A numerical study shows significant improvement over group sequential tests in a binomial testing problem.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter