Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
11
result(s) for
"91A20"
Sort by:
On Synchronous, Asynchronous, and Randomized Best-Response Schemes for Stochastic Nash Games
2020
In this paper, we consider a stochastic Nash game in which each player minimizes a parameterized expectation-valued convex objective function. In deterministic regimes, proximal best-response (BR) schemes have been shown to be convergent under a suitable spectral property associated with the proximal BR map. However, a direct application of this scheme to stochastic settings requires obtaining exact solutions to stochastic optimization problems at each iteration. Instead, we propose an
inexact
generalization of this scheme in which an inexact solution to the BR problem is computed in an expected-value sense via a stochastic approximation (SA) scheme. On the basis of this framework, we present three inexact BR schemes: (i) First, we propose a synchronous inexact BR scheme where all players simultaneously update their strategies. (ii) Second, we extend this to a randomized setting where a subset of players is randomly chosen to update their strategies while the other players keep their strategies invariant. (iii) Third, we propose an asynchronous scheme, where each player chooses its update frequency while using outdated rival-specific data in updating its strategy. Under a suitable contractive property on the proximal BR map, we proceed to derive almost sure convergence of the iterates to the Nash equilibrium (NE) for (i) and (ii) and mean convergence for (i)–(iii). In addition, we show that for (i)–(iii), the generated iterates converge to the unique equilibrium in mean at a linear rate with a prescribed constant rather than a sublinear rate. Finally, we establish the overall iteration complexity of the scheme in terms of projected stochastic gradient (SG) steps for computing an
ɛ
-NE
2
(or
ɛ
-NE
∞
) and note that in all settings, the iteration complexity is
O
(
1
/
ɛ
2
(
1
+
c
)
+
δ
)
, where
c
=
0
in the context of (i), and
c
> 0 represents the positive cost of randomization in (ii) and asynchronicity and delay in (iii). Notably, in the synchronous regime, we achieve a near-optimal rate from the standpoint of solving stochastic convex optimization problems by SA schemes. The schemes are further extended to settings where players solve two-stage stochastic Nash games with linear and quadratic recourse. Finally, preliminary numerics developed on a multiportfolio investment problem and a two-stage capacity expansion game support the rate and complexity statements.
Journal Article
Equilibrium, uncertainty and risk in hydro-thermal electricity systems
by
Philpott, Andy
,
Ferris, Michael
,
Wets, Roger
in
Auctions
,
Calculus of Variations and Optimal Control; Optimization
,
Combinatorics
2016
The correspondence of competitive partial equilibrium with a social optimum is well documented in the welfare theorems of economics. These theorems can be applied to single-period electricity pool auctions in which price-taking agents maximize profits at competitive prices, and extend naturally to standard models with locational marginal prices. In hydro-thermal markets where the auctions are repeated over many periods, agents seek to optimize their current and future profit accounting for future prices that depend on uncertain inflows. This makes the agent problems multistage stochastic optimization models, but perfectly competitive partial equilibrium still corresponds to a social optimum when all agents are risk neutral and share common knowledge of the probability distribution governing future inflows. The situation is complicated when agents are risk averse. In this setting we show under mild conditions that a social optimum corresponds to a competitive market equilibrium if agents have time-consistent dynamic coherent risk measures and there are enough traded market instruments to hedge inflow uncertainty. We illustrate some of the consequences of risk aversion on market outcomes using a simple two-stage competitive equilibrium model with three agents.
Journal Article
Two-stage non-cooperative games with risk-averse players
by
Pang, Jong-Shi
,
Sen, Suvrajeet
,
Shanbhag, Uday V.
in
Approximation
,
Calculus of Variations and Optimal Control; Optimization
,
Combinatorics
2017
This paper formally introduces and studies a non-cooperative multi-agent game under uncertainty. The well-known Nash equilibrium is employed as the solution concept of the game. While there are several formulations of a stochastic Nash equilibrium problem, we focus mainly on a two-stage setting of the game wherein each agent is risk-averse and solves a rival-parameterized stochastic program with quadratic recourse. In such a game, each agent takes deterministic actions in the first stage and recourse decisions in the second stage after the uncertainty is realized. Each agent’s overall objective consists of a deterministic first-stage component plus a second-stage mean-risk component defined by a coherent risk measure describing the agent’s risk aversion. We direct our analysis towards a broad class of quantile-based risk measures and linear-quadratic recourse functions. For this class of non-cooperative games under uncertainty, the agents’ objective functions can be shown to be convex in their own decision variables, provided that the deterministic component of these functions have the same convexity property. Nevertheless, due to the non-differentiability of the recourse functions, the agents’ objective functions are at best directionally differentiable. Such non-differentiability creates multiple challenges for the analysis and solution of the game, two principal ones being: (1) a stochastic multi-valued variational inequality is needed to characterize a Nash equilibrium, provided that the players’ optimization problems are convex; (2) one needs to be careful in the design of algorithms that require differentiability of the objectives. Moreover, the resulting (multi-valued) variational formulation cannot be expected to be of the monotone type in general. The main contributions of this paper are as follows: (a) Prior to addressing the main problem of the paper, we summarize several approaches that have existed in the literature to deal with uncertainty in a non-cooperative game. (b) We introduce a unified formulation of the two-stage SNEP with risk-averse players and convex quadratic recourse functions and highlight the technical challenges in dealing with this game. (c) To handle the lack of smoothness, we propose smoothing schemes and regularization that lead to differentiable approximations. (d) To deal with non-monotonicity, we impose a generalized diagonal dominance condition on the players’ smoothed objective functions that facilitates the application and ensures the convergence of an iterative best-response scheme. (e) To handle the expectation operator, we rely on known methods in stochastic programming that include sampling and approximation. (f) We provide convergence results for various versions of the best-response scheme, particularly for the case of private recourse functions. Overall, this paper lays the foundation for future research into the class of SNEPs that provides a constructive paradigm for modeling and solving competitive decision making problems with risk-averse players facing uncertainty; this paradigm is very much at an infancy stage of research and requires extensive treatment in order to meet its broad applications in many engineering and economics domains.
Journal Article
Absorption paths and equilibria in quitting games
by
Rainer, Catherine
,
Krasikov, Ilia
,
Ashkenazi-Golan, Galit
in
Absorption
,
Calculus of Variations and Optimal Control; Optimization
,
Combinatorics
2024
We study quitting games and introduce an alternative notion of strategy profiles—
absorption paths
. An absorption path is parametrized by the total probability of absorption in past play rather than by time, and it accommodates both discrete-time aspects and continuous-time aspects. We then define the concept of
sequentially 0-perfect
absorption paths, which are shown to be limits of
ε
-equilibrium strategy profiles as
ε
goes to 0. We establish that all quitting games that do not have simple equilibria (that is, an equilibrium where the game terminates in the first period or one where the game never terminates) have a sequentially 0-perfect absorption path. Finally, we prove the existence of sequentially 0-perfect absorption paths in a new class of quitting games.
Journal Article
Catch-Up: A Rule That Makes Service Sports More Competitive
by
Stromquist, Walter
,
Ismail, Mehmet S.
,
Kilgour, D. Marc
in
MSC: Primary 60J20
,
Numerical analysis
,
Probability
2018
Service sports include two-player contests such as volleyball, badminton, and squash. We analyze four rules, including the Standard Rule (SR), in which a player continues to serve until he or she loses. The Catch-Up Rule (CR) gives the serve to the player who has lost the previous point-as opposed to the player who won the previous point, as under SR. We also consider two Trailing Rules that make the server the player who trails in total score. Surprisingly, compared with SR, only CR gives the players the same probability of winning a game while increasing its expected length, thereby making it more competitive and exciting to watch. Unlike one of the Trailing Rules, CR is strategy-proof. By contrast, the rules of tennis fix who serves and when; its tiebreaker, however, keeps play competitive by being fair-not favoring either the player who serves first or who serves second.
Journal Article
Equilibrium dividend strategies in the dual model with a random time horizon
by
Cheng, Gong-pin
,
Ye, Chuan-xiu
,
Zhao, Yong-xia
in
Digital Object Identifier
,
Discount rates
,
Discounts
2023
This paper investigates the dividend problem with non-exponential discounting in a dual model. We assume that the dividends can only be paid at a bounded rate and that the surplus process is killed by an exponential random variable. Since the non-exponential discount function leads to a time inconsistent control problem, we study the equilibrium HJB-equation and give the associated verification theorem. For the case of a mixture of exponential discount functions and exponential gains, we obtain the explicit equilibrium dividend strategy and the corresponding equilibrium value function. Besides, numerical examples are shown to illustrate our results.
Journal Article
Zero-sum repeated games: Counterexamples to the existence of the asymptotic value and the conjecture$\\operatorname{maxmin}=\\operatorname{lim}v_{n}
2016
Mertens [In Proceedings of the International Congress of Mathematicians (Berkeley, Calif., 1986) (1987) 1528–1577 Amer. Math. Soc.] proposed two general conjectures about repeated games: the first one is that, in any two-person zero-sum repeated game, the asymptotic value exists, and the second one is that, when Player 1 is more informed than Player 2, in the long run Player 1 is able to guarantee the asymptotic value. We disprove these two long-standing conjectures by providing an example of a zero-sum repeated game with public signals and perfect observation of the actions, where the value of the \\lambda-discounted game does not converge when \\lambda goes to 0. The aforementioned example involves seven states, two actions and two signals for each player. Remarkably, players observe the payoffs, and play in turn.
Journal Article
ON VALUES OF REPEATED GAMES WITH SIGNALS
2016
We study the existence of different notions of value in two-person zerosum repeated games where the state evolves and players receive signals. We provide some examples showing that the limsup value (and the uniform value) may not exist in general. Then we show the existence of the value for any Borei payoff function if the players observe a public signal including the actions played. We also prove two other positive results without assumptions on the signaling structure: the existence of the sup value in any game and the existence of the uniform value in recursive games with nonnegative payoffs.
Journal Article
Convergent Multiple-Timescales Reinforcement Learning Algorithms in Normal Form Games
2003
We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to Nash distribution in two-player zero-sum games and N-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge-in particular, we consider the N-player matching pennies game and Shapley's variant of the rock-scissors-paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.
Journal Article
Asymptotically Optimal Multistage Tests of Simple Hypotheses
2007
A family of variable stage size multistage tests of simple hypotheses is described, based on efficient multistage sampling procedures. Using a loss function that is a linear combination of sampling costs and error probabilities, these tests are shown to minimize the integrated risk to second order as the costs per stage and per observation approach zero. A numerical study shows significant improvement over group sequential tests in a binomial testing problem.
Journal Article