Source author record

Jianghai Hu

Jianghai Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC eess.SY Systems and Control

Catalog footprint

What is connected

5 works

3 topics

4 close collaborators


preprint, 2023, arXiv

Zeroth-Order Learning in Continuous Games via Residual Pseudogradient Estimates

A variety of practical problems can be modeled by the decision-making process in multi-player games where a group of self-interested players aim at optimizing their own local objectives, while the objectives depend on the actions taken by others. The local gradient information of each player, essential in implementing algorithms for finding game solutions, is all too often unavailable. In this paper, we focus on designing solution algorithms for multi-player games using bandit feedback, i.e., the only available feedback at each player's disposal is the realized objective values. To tackle the issue of large variances in the existing bandit learning algorithms with a single oracle call, we propose two algorithms by integrating the residual feedback scheme into single-call extra-gradient methods. Subsequently, we show that the actual sequences of play can converge almost surely to a critical point if the game is pseudo-monotone plus and characterize the convergence rate to the critical point when the game is strongly pseudo-monotone. The ergodic convergence rates of the generated sequences in monotone games are also investigated as a supplement. Finally, the validity of the proposed algorithms is further verified via numerical examples.
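The residual feedback idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single player, replaces the single-call extra-gradient update with a plain gradient-style step, and uses the hypothetical names `unit_direction`, `residual_estimate` and `run`. The estimate uses the common form (d / delta) * (f_t - f_{t-1}) * u_t, so each step needs only one new objective value.

```python
import math
import random


def unit_direction(dim, rng):
    """Sample a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def residual_estimate(f_curr, f_prev, u, delta):
    """Residual-feedback estimate: (d / delta) * (f_curr - f_prev) * u."""
    scale = len(u) / delta * (f_curr - f_prev)
    return [scale * c for c in u]


def run(objective, x0, steps, delta=0.1, lr=0.01, seed=0):
    """Illustrative loop: one objective query per step (bandit feedback)."""
    rng = random.Random(seed)
    x = list(x0)
    f_prev = None
    queries = 0
    for _ in range(steps):
        u = unit_direction(len(x), rng)
        # Query the objective once at the perturbed point x + delta * u.
        f_curr = objective([xi + delta * ui for xi, ui in zip(x, u)])
        queries += 1
        if f_prev is not None:
            # Reuse the previous query instead of making a second call.
            g = residual_estimate(f_curr, f_prev, u, delta)
            x = [xi - lr * gi for xi, gi in zip(x, g)]
        f_prev = f_curr
    return x, queries
```

Reusing the previous realized value as a baseline is what keeps the scheme at one oracle call per iteration; the step size, smoothing radius and test objective here are illustrative assumptions.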

preprint, 2022, arXiv

Distributed Computation of Stochastic GNE with Partial Information: An Augmented Best-Response Approach

In this paper, we focus on the stochastic generalized Nash equilibrium problem (SGNEP), which is an important and widely used model in many fields. In this model, subject to certain global resource constraints, a set of self-interested players aim to optimize their local objectives that depend on their own decisions and the decisions of others and are influenced by some random factors. We propose a distributed stochastic generalized Nash equilibrium seeking algorithm in a partial-decision information setting based on the Douglas-Rachford operator splitting scheme, which relaxes assumptions in the existing literature. The proposed algorithm updates players' local decisions through augmented best-response schemes and subsequent projections onto the local feasible sets, which occupy most of the computational workload. The projected stochastic subgradient method is applied to provide approximate solutions to the augmented best-response subproblems for each player. The Robbins-Siegmund theorem is leveraged to establish the main convergence results to a true Nash equilibrium using the proposed inexact solver. Finally, we illustrate the validity of the proposed algorithm via two numerical examples, i.e., a stochastic Nash-Cournot distribution game and a multi-product assembly problem with the two-stage model.
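As a rough illustration of the projected stochastic subgradient method named in this abstract, the sketch below solves a toy one-dimensional problem, not the paper's augmented best-response subproblem. The objective E|x - xi|, the box constraint, the 1/k step sizes and the function names are all assumptions made for the example.

```python
import random


def project_box(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return min(hi, max(lo, x))


def projected_stochastic_subgradient(sample, lo, hi, x0, steps, seed=0):
    """Minimize E|x - xi| over [lo, hi] using one random sample per step."""
    rng = random.Random(seed)
    x = project_box(x0, lo, hi)
    for k in range(1, steps + 1):
        xi = sample(rng)
        # A subgradient of |x - xi| with respect to x.
        g = 1.0 if x > xi else (-1.0 if x < xi else 0.0)
        # Subgradient step with diminishing step size 1/k, then projection.
        x = project_box(x - g / k, lo, hi)
    return x


# Samples lie in [2, 3], so the constrained minimizer is the upper bound 1.0.
x_hat = projected_stochastic_subgradient(
    lambda rng: rng.uniform(2.0, 3.0), lo=0.0, hi=1.0, x0=0.0, steps=50
)
```

The projection keeps every iterate feasible, which is the property that matters when such a routine is used as an inexact inner solver.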

preprint, 2022, arXiv

Distributed Stochastic Nash Equilibrium Learning in Locally Coupled Network Games with Unknown Parameters

In stochastic Nash equilibrium problems (SNEPs), it is natural for players to be uncertain about their complex environments and have multi-dimensional unknown parameters in their models. Among various SNEPs, this paper focuses on locally coupled network games where the objective of each rational player is subject to the aggregate influence of its neighbors. We propose a distributed learning algorithm based on the proximal-point iteration and ordinary least-squares estimator, where each player repeatedly updates the local estimates of neighboring decisions, makes its augmented best-response decisions given the current estimated parameters, receives the realized objective values, and learns the unknown parameters. Leveraging the Robbins-Siegmund theorem and the law of large deviations for M-estimators, we establish the almost sure convergence of the proposed algorithm to solutions of SNEPs when the updating step sizes decay at a proper rate.

preprint, 2020, arXiv

Column Partition based Distributed Algorithms for Coupled Convex Sparse Optimization: Dual and Exact Regularization Approaches

This paper develops column partition based distributed schemes for a class of large-scale convex sparse optimization problems, e.g., basis pursuit (BP), LASSO, basis pursuit denoising (BPDN), and their extensions, e.g., fused LASSO. We are particularly interested in the cases where the number of (scalar) decision variables is much larger than the number of (scalar) measurements, and each agent has limited memory or computing capacity such that it only knows a small number of columns of a measurement matrix. These problems in consideration are densely coupled and cannot be formulated as separable convex programs using column partition. To overcome this difficulty, we consider their dual problems which are separable or locally coupled. Once a dual solution is attained, it is shown that a primal solution can be found from the dual of corresponding regularized BP-like problems under suitable exact regularization conditions. A wide range of existing distributed schemes can be exploited to solve the obtained dual problems. This yields two-stage column partition based distributed schemes for LASSO-like and BPDN-like problems; the overall convergence of these schemes is established using sensitivity analysis techniques. Numerical results illustrate the effectiveness of the proposed schemes.
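To make the column partition setting concrete, the following sketch shows how a product A x splits into local pieces A_i x_i when each agent holds only a few columns of A. It illustrates the data layout only, not the paper's dual or regularization schemes; the matrix, the vector and the helper names are made up for the example.

```python
def matvec(cols, x):
    """Multiply a matrix stored as a list of columns by the vector x."""
    rows = len(cols[0])
    out = [0.0] * rows
    for col, xj in zip(cols, x):
        for i in range(rows):
            out[i] += col[i] * xj
    return out


def partitioned_matvec(blocks, x_blocks):
    """Sum the local products A_i x_i, one per agent."""
    rows = len(blocks[0][0])
    total = [0.0] * rows
    for cols, x_i in zip(blocks, x_blocks):
        local = matvec(cols, x_i)  # computable from this agent's columns alone
        total = [t + v for t, v in zip(total, local)]
    return total


# 2 measurements, 4 decision variables, stored column by column.
A_cols = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
x = [1.0, 2.0, 3.0, 4.0]

# Two hypothetical agents, each holding two columns and the matching entries of x.
blocks = [A_cols[:2], A_cols[2:]]
x_blocks = [x[:2], x[2:]]
```

Because any term involving A x depends on the sum of every agent's local product, a primal objective containing such a term does not separate across agents, which is consistent with the abstract's remark that these problems are densely coupled under column partition.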

preprint, 2020, arXiv

Primal-Dual Distributed Temporal Difference Learning

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite horizon discounted cost function (or value function) for a given fixed policy without knowledge of the model. In the multi-agent MDP, each agent receives a local reward through local processing. The agents communicate over sparse and random networks to learn the global value function corresponding to the aggregate of local rewards. In this paper, the problem of estimating the global value function is converted into a constrained convex optimization problem. Then, we propose a stochastic primal-dual distributed algorithm to solve it and prove that the algorithm converges to a set of solutions of the optimization problem.
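For readers unfamiliar with TD learning, here is a minimal single-agent sketch of tabular TD(0) policy evaluation. It is a simpler relative of the GTD method studied in the paper, not the distributed primal-dual algorithm itself; the two-state deterministic chain, the reward convention and the function name `td0` are assumptions chosen so the result can be checked by hand.

```python
def td0(transitions, gamma, alpha, steps, start=0):
    """Tabular TD(0) for a fixed policy on a deterministic chain.

    transitions[s] = (next_state, reward) under the fixed policy.
    """
    values = [0.0] * len(transitions)
    s = start
    for _ in range(steps):
        s_next, reward = transitions[s]
        # TD error: one-step bootstrapped target minus the current estimate.
        td_error = reward + gamma * values[s_next] - values[s]
        values[s] += alpha * td_error
        s = s_next
    return values


# State 0 moves to state 1 with reward 1; state 1 moves back with reward 0.
values = td0([(1, 1.0), (0, 0.0)], gamma=0.5, alpha=0.5, steps=400)
```

For this chain the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0) give V(0) = 4/3 and V(1) = 2/3, which the estimates approach using only observed transitions and no transition model.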
