Catalogue Search | MBRL
284 result(s) for "Lei, Jinlong"
On Synchronous, Asynchronous, and Randomized Best-Response Schemes for Stochastic Nash Games
2020
In this paper, we consider a stochastic Nash game in which each player minimizes a parameterized expectation-valued convex objective function. In deterministic regimes, proximal best-response (BR) schemes have been shown to be convergent under a suitable spectral property associated with the proximal BR map. However, a direct application of this scheme to stochastic settings requires obtaining exact solutions to stochastic optimization problems at each iteration. Instead, we propose an inexact generalization of this scheme in which an inexact solution to the BR problem is computed in an expected-value sense via a stochastic approximation (SA) scheme. On the basis of this framework, we present three inexact BR schemes: (i) First, we propose a synchronous inexact BR scheme where all players simultaneously update their strategies. (ii) Second, we extend this to a randomized setting where a subset of players is randomly chosen to update their strategies while the other players keep their strategies invariant. (iii) Third, we propose an asynchronous scheme, where each player chooses its update frequency while using outdated rival-specific data in updating its strategy. Under a suitable contractive property on the proximal BR map, we proceed to derive almost sure convergence of the iterates to the Nash equilibrium (NE) for (i) and (ii) and mean convergence for (i)–(iii). In addition, we show that for (i)–(iii), the generated iterates converge to the unique equilibrium in mean at a linear rate with a prescribed constant rather than a sublinear rate. Finally, we establish the overall iteration complexity of the scheme in terms of projected stochastic gradient (SG) steps for computing an ɛ-NE₂ (or ɛ-NE∞) and note that in all settings, the iteration complexity is O(1/ɛ^{2(1+c)+δ}), where c = 0 in the context of (i), and c > 0 represents the positive cost of randomization in (ii) and of asynchronicity and delay in (iii). Notably, in the synchronous regime, we achieve a near-optimal rate from the standpoint of solving stochastic convex optimization problems by SA schemes. The schemes are further extended to settings where players solve two-stage stochastic Nash games with linear and quadratic recourse. Finally, preliminary numerics developed on a multiportfolio investment problem and a two-stage capacity expansion game support the rate and complexity statements.
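As a hedged illustration of the synchronous scheme (i), the sketch below runs an inexact best-response loop on a toy two-player quadratic stochastic Nash game: each outer iteration solves every player's BR problem inexactly by stochastic approximation, with a geometrically growing number of inner SA steps so the inexactness shrinks. The cost functions, constants, and stepsizes are invented for the example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
b, c = 0.25, np.array([1.0, 1.0])  # assumed cross-coupling and linear cost terms

def sa_best_response(i, x, n_steps):
    """Inexact best response for player i via stochastic approximation.

    Player i's cost is E[0.5*x_i^2 + b*x_i*x_j - c_i*x_i + noise*x_i],
    so the exact BR is x_i = c_i - b*x_j; SA only approximates it.
    """
    xi = x[i]
    for t in range(1, n_steps + 1):
        noisy_grad = xi + b * x[1 - i] - c[i] + rng.normal(scale=0.1)
        xi -= noisy_grad / t  # diminishing SA stepsize 1/t
    return xi

x = np.zeros(2)
for k in range(30):  # synchronous outer loop: both players update at once
    n_inner = int(2 * 1.3 ** k)  # geometrically more SA steps per iteration
    x = np.array([sa_best_response(i, x, n_inner) for i in range(2)])

# The BR map here is a 0.25-contraction, so iterates approach the unique NE
# x* = c/(1 + b) = (0.8, 0.8).
print(x)
```

Since both comprehension calls read the old `x`, the update is genuinely synchronous; a randomized variant as in (ii) would instead update one randomly drawn player per outer iteration.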
Journal Article
An interaction-fair semi-decentralized trajectory planner for connected and autonomous vehicles
by Hong, Yiguang; Liu, Zhengqin; Lei, Jinlong
in Algorithms; Artificial Intelligence; Autonomous vehicles
2025
Recently, game-theoretic approaches to the trajectory planning of autonomous vehicles (AVs) have attracted considerable interest. However, most methods solve the game independently for each AV without coordination mechanisms, resulting in redundant computation and a failure to converge to the same equilibrium, which presents challenges in computational efficiency and safety. Moreover, most studies rely on the strong assumption that the intentions of all other AVs are known. This paper designs a novel autonomous vehicle trajectory planning approach that resolves the computational efficiency and safety problems of uncoordinated trajectory planning by exploiting vehicle-to-everything (V2X) technology. Firstly, the trajectory planning for connected and autonomous vehicles (CAVs) is formulated as a game with coupled safety constraints. We then define the interaction fairness of the planned trajectories and prove that interaction-fair trajectories correspond to the variational equilibrium (VE) of this game. Subsequently, we propose a semi-decentralized VE-based planner (SVEP) for the vehicles to seek fair trajectories, in which each CAV optimizes its individual trajectory based on neighboring CAVs' information shared through V2X, and the roadside unit takes the role of updating the multipliers for the collision-avoidance constraints. The approach can significantly improve computational efficiency through parallel computing among CAVs, and enhance the safety of planned trajectories by ensuring equilibrium concordance among CAVs. Finally, we conduct Monte Carlo experiments in multiple situations at an intersection, where the empirical results show the advantages of SVEP, including fast computation speed, a small communication payload, high scalability, equilibrium concordance, and safety, making it a promising solution for trajectory planning in connected traffic scenarios. To the best of our knowledge, this is the first study to achieve semi-distributed solving of a game with coupled constraints in a CAV trajectory planning problem.
Journal Article
STV-SC: Segmentation and Temporal Verification Enhanced Scan Context for Place Recognition in Unstructured Environment
2022
Place recognition is an essential part of simultaneous localization and mapping (SLAM). LiDAR-based place recognition relies almost exclusively on geometric information. However, geometric information may become unreliable when faced with environments dominated by unstructured objects. In this paper, we explore the role of segmentation for extracting key structured information. We propose STV-SC, a novel segmentation and temporal verification enhanced place recognition method for unstructured environments. It contains a range image-based 3D point segmentation algorithm and a three-stage process to detect a loop. The three-stage method consists of a two-stage candidate loop search process and a one-stage segmentation and temporal verification (STV) process. Our STV process utilizes the time-continuous feature of SLAM to determine whether there is an occasional mismatch. We quantitatively demonstrate that the STV process can reject false detections caused by unstructured objects and effectively extract structured objects to avoid outliers. Comparison with state-of-the-art algorithms on public datasets shows that STV-SC can run online and achieve improved performance in unstructured environments (under the same precision, the recall rate is 1.4∼16% higher than Scan Context). Therefore, our algorithm can effectively avoid the mismatching caused by the original algorithm in unstructured environments and improve the environmental adaptability of mobile agents.
Journal Article
Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization
2024
We consider a distributed optimization problem over n agents with imperfect information characterized in a parametric sense, where the unknown parameter can be learned through a distinct distributed parameter-learning problem. Though each agent only has access to its local parameter-learning and computational problem, the agents aim to collaboratively minimize the average of their local cost functions. To address this special optimization problem, we propose a coupled distributed stochastic approximation algorithm, in which every agent updates its current beliefs of the unknown parameter and decision variable by a stochastic approximation method, and then averages the beliefs and decision variables of its neighbors over the network in a consensus protocol. Our interest lies in the convergence analysis of this algorithm. We quantitatively characterize the factors that affect the algorithm's performance, and prove that the mean-squared error of the decision variable is bounded by O(1/(nk)) + O(1/(√n(1−ρ_w)))·(1/k^1.5) + O(1/(1−ρ_w)^2)·(1/k^2), where k is the iteration count and 1−ρ_w is the spectral gap of the network's weighted adjacency matrix. This reveals that the network connectivity, characterized by 1−ρ_w, only influences the higher-order terms of the convergence rate, while the dominant rate acts the same as in the centralized algorithm. In addition, we show that the number of transient iterations needed to reach the dominant rate O(1/(nk)) is O(n/(1−ρ_w)^2). Numerical experiments, carried out with different CPUs acting as agents to better reflect real-world distributed scenarios, demonstrate the theoretical results.
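The coupled update described above (a stochastic-approximation step on the parameter belief and the decision variable, followed by consensus averaging with neighbors) can be sketched on a toy problem. The ring network, weight matrix, noise levels, and local costs below are assumptions chosen so that the optimum is known, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta_true = 4, 2.0
# Assumed doubly stochastic weight matrix of a 4-agent ring (consensus step)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

theta = np.zeros(n)  # each agent's belief of the unknown parameter
x = np.zeros(n)      # each agent's decision variable

for k in range(1, 3001):
    alpha = 1.0 / k  # diminishing SA stepsize
    # local noisy observations of the unknown parameter
    samples = theta_true + rng.normal(scale=0.5, size=n)
    # SA update of the belief, then consensus averaging over the network
    theta = W @ (theta + alpha * (samples - theta))
    # noisy gradient of the local cost f_i(x) = 0.5*(x - theta_i)^2,
    # evaluated at the *current* belief: this is the coupling
    grad = x - theta + rng.normal(scale=0.5, size=n)
    x = W @ (x - alpha * grad)

print(theta, x)  # both belief and decision approach theta_true = 2.0
```

With quadratic local costs the decision optimum coincides with the learned parameter, which keeps the example checkable; the point of the sketch is only the interleaving of learning, optimization, and consensus.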
Variance-Reduced Accelerated First-order Methods: Central Limit Theorems and Confidence Statements
2024
In this paper, we study a stochastic strongly convex optimization problem and propose three classes of variable sample-size stochastic first-order methods: the standard stochastic gradient descent method, its accelerated variant, and the stochastic heavy-ball method. In the iterates of each scheme, the unavailable exact gradients are approximated by averaging across an increasing batch size of sampled gradients. We prove that when the sample size increases geometrically, the generated estimates converge in mean to the optimal solution at a geometric rate. Based on this result, we provide a unified framework to show that the rescaled estimation errors converge in distribution to a normal distribution, whose covariance matrix depends on the Hessian matrix, the covariance of the gradient noise, and the steplength. If the sample size increases at a polynomial rate, we show that the estimation errors decay at the corresponding polynomial rate and establish the corresponding central limit theorems (CLTs). Finally, we provide an avenue to construct confidence regions for the optimal solution based on the established CLTs, and test the theoretical findings on a stochastic parameter estimation problem.
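A minimal sketch of the variable sample-size idea, assuming a 1-D strongly convex problem with known optimum: each iteration averages a geometrically growing batch of sampled gradients before taking a gradient step. The growth factor, stepsize, and problem are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 3.0  # optimum of f(x) = E[0.5*(x - theta - noise)^2]

def vss_sgd(n_outer=25, rho=1.5, step=0.5):
    """Variable sample-size SGD on a toy 1-D strongly convex problem.

    At iteration k the exact gradient is approximated by averaging
    ceil(rho**k) sampled gradients, so the gradient noise variance
    shrinks geometrically while the stepsize stays constant.
    """
    x = 0.0
    for k in range(n_outer):
        batch = int(np.ceil(rho ** k))          # geometric batch growth
        noise = rng.normal(size=batch)
        grad = np.mean(x - theta - noise)        # averaged sampled gradients
        x -= step * grad
    return x

est = vss_sgd()
print(est)  # close to the optimum 3.0
```

In this 1-D setting the deterministic part of the error contracts by (1 − step) per iteration while the injected noise has variance ~1/batch, which is the mechanism behind the geometric mean-convergence rate stated in the abstract.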
HGFormer: A Hierarchical Graph Transformer Framework for Two-Stage Colonel Blotto Games via Reinforcement Learning
2025
The two-stage Colonel Blotto game represents a typical adversarial resource-allocation problem, in which two opposing agents sequentially allocate resources over a network topology across two phases: an initial resource deployment followed by multiple rounds of dynamic reallocation adjustments. The sequential dependency between the game stages and the complex constraints imposed by the graph topology make it difficult for traditional approaches to attain a globally optimal strategy. To address these challenges, we propose a hierarchical graph Transformer framework called HGFormer. By incorporating an enhanced graph Transformer encoder with structural biases and a two-agent hierarchical decision model, our approach enables efficient policy generation in large-scale adversarial environments. Moreover, we design a layer-by-layer feedback reinforcement learning algorithm that feeds the long-term returns of lower-level decisions back into the optimization of the higher-level strategy, thus bridging the coordination gap between the two decision-making stages. Experimental results demonstrate that, compared to existing hierarchical decision-making and graph neural network methods, HGFormer significantly improves resource-allocation efficiency and adversarial payoff, achieving superior overall performance in complex dynamic game scenarios.
Distributed Riemannian Stochastic Gradient Tracking Algorithm on the Stiefel Manifold
2025
This paper focuses on the distributed Riemannian stochastic optimization problem on the Stiefel manifold for multi-agent systems, where all the agents work collaboratively to optimize a function modeled by the average of their expectation-valued local costs. Each agent only processes its own local cost function and communicates with neighboring agents to achieve the optimal result while ensuring consensus. Since the local Riemannian gradient cannot be directly calculated in stochastic regimes, we estimate the gradient by the average of a variable number of sampled gradients, which, however, introduces noise into the system. We then propose a distributed Riemannian stochastic optimization algorithm on the Stiefel manifold by combining the variable sample-size gradient approximation method with the gradient tracking dynamic. It is worth noticing that a suitably chosen increasing sample size plays an important role in improving the algorithm's efficiency, as it reduces the noise variance. In an expectation-valued sense, the iterates of all agents are proved to converge to a stationary point (or a neighborhood thereof) with fixed step sizes. We further establish the convergence rate of the iterates for the cases when the sample size is exponentially increasing, polynomially increasing, or constant, respectively. Finally, numerical experiments are implemented to demonstrate the theoretical results.
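A heavily simplified sketch of the ingredients (Riemannian gradient via tangent-space projection, retraction by normalization, a growing sample size to damp gradient noise, and agreement across agents) on the unit sphere, the simplest Stiefel manifold St(n, 1). It replaces the paper's gradient-tracking dynamic with plain consensus averaging and maximizes a quadratic (equivalently, minimizes its negative), so it is an analogy under stated assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.diag([3.0, 1.0, 0.5])   # average matrix; top eigenvector is e1
M = rng.normal(size=(3, 3))
P = 0.1 * (M + M.T)            # symmetric perturbation
A_local = [A + P, A - P]       # two agents; local matrices average to A

X = [np.ones(3) / np.sqrt(3) for _ in range(2)]  # start on the sphere
for k in range(1, 301):
    batch = k  # polynomially increasing sample size damps gradient noise
    new = []
    for i, x in enumerate(X):
        # noisy Euclidean gradient of x^T A_i x, noise shrinks with batch
        g = A_local[i] @ x + rng.normal(scale=0.3, size=3) / np.sqrt(batch)
        rg = g - (x @ g) * x               # project onto the tangent space
        y = x + 0.1 * rg                   # ascent step along the manifold
        new.append(y / np.linalg.norm(y))  # retraction back to the sphere
    # consensus: average the agents' iterates, then retract
    avg = sum(new) / 2
    X = [avg / np.linalg.norm(avg)] * 2

print(X[0])  # aligns with the leading eigenvector e1 = (1, 0, 0) up to sign
```

Because the two local matrices average to A exactly, the consensus iterate behaves like a noisy projected power iteration on A, which is why the sketch converges to the top eigenvector direction.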
A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation
2024
In this paper, we explore how to optimize task allocation for robot swarms in dynamic environments, emphasizing the necessity of formulating robust, flexible, and scalable strategies for robot cooperation. We introduce a novel framework using a decentralized partially observable Markov decision process (Dec_POMDP), specifically designed for distributed robot swarm networks. At the core of our methodology is the Local Information Aggregation Multi-Agent Deep Deterministic Policy Gradient (LIA_MADDPG) algorithm, which merges centralized training with distributed execution (CTDE). During the centralized training phase, a local information aggregation (LIA) module is meticulously designed to gather critical data from neighboring robots, enhancing decision-making efficiency. In the distributed execution phase, a strategy improvement method is proposed to dynamically adjust task allocation based on changing and partially observable environmental conditions. Our empirical evaluations show that the LIA module can be seamlessly integrated into various CTDE-based MARL methods, significantly enhancing their performance. Additionally, by comparing LIA_MADDPG with six conventional reinforcement learning algorithms and a heuristic algorithm, we demonstrate its superior scalability, rapid adaptation to environmental changes, and ability to maintain both stability and convergence speed. These results underscore LIA_MADDPG's outstanding performance and its potential to significantly improve dynamic task allocation in robot swarms through enhanced local collaboration and adaptive strategy execution.
A Semi-decentralized and Variational-Equilibrium-Based Trajectory Planner for Connected and Autonomous Vehicles
by Peng, Yi; Liu, Zhengqin; Lei, Jinlong
in Computational efficiency; Trajectory planning; Vehicles
2024
This paper designs a novel trajectory planning approach to resolve the computational efficiency and safety problems in uncoordinated methods by exploiting vehicle-to-everything (V2X) technology. The trajectory planning for connected and autonomous vehicles (CAVs) is formulated as a game with coupled safety constraints. We then define interaction-fair trajectories and prove that they correspond to the variational equilibrium (VE) of this game. We propose a semi-decentralized planner for the vehicles to seek VE-based fair trajectories, which can significantly improve computational efficiency through parallel computing among CAVs and enhance the safety of planned trajectories by ensuring equilibrium concordance among CAVs. Finally, experimental results show the advantages of the approach, including fast computation speed, high scalability, equilibrium concordance, and safety.
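The semi-decentralized split described above (each CAV optimizes its own trajectory in parallel, while a roadside unit updates the multipliers of the coupled safety constraint) can be illustrated with a dual-ascent sketch on a scalar toy game. The quadratic costs, capacity constraint, and stepsize are invented for the example; the real planner operates on trajectories, not scalars.

```python
import numpy as np

targets = np.array([3.0, 2.0])  # each vehicle's preferred (unconstrained) value
cap = 4.0                       # coupled constraint: x1 + x2 <= cap
lam = 0.0                       # multiplier held by the roadside unit

for _ in range(2000):
    # each CAV minimizes its local Lagrangian (x_i - t_i)^2 + lam * x_i
    # in parallel, which has the closed form x_i = t_i - lam/2
    x = targets - lam / 2.0
    # roadside unit updates the multiplier by projected dual ascent
    lam = max(0.0, lam + 0.05 * (x.sum() - cap))

print(x, lam)  # KKT point of this toy game: x = (2.5, 1.5), lam = 1.0
```

The unconstrained optimum (3, 2) violates the capacity 4, so the constraint is active and the multiplier settles at the value that makes the vehicles' parallel best responses jointly feasible, mirroring the equilibrium-concordance role of the roadside unit.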