Source author record

Zongying Shi

Zongying Shi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Science and Game Theory; Machine Learning; math.OC

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint, 2022, arXiv

Evaluation and Learning in Two-Player Symmetric Games via Best and Better Responses

Artificial intelligence and robotic competitions are accompanied by a class of game paradigms in which each player privately commits a strategy to a game system, which simulates the game using the collected joint strategy and then returns payoffs to the players. This paper considers strategy commitment for two-player symmetric games in which the players' strategy spaces are identical and their payoffs are symmetric. First, we introduce two digraph-based metrics at a meta-level for strategy evaluation in two-agent reinforcement learning, grounded in sink equilibrium. The metrics rank the strategies of a single player and determine the set of strategies that are preferred for the private commitment. Then, to find the preferred strategies under the metrics, we propose two variants of the classical self-play learning algorithm, called strictly best-response and weakly better-response self-play. By modeling learning processes as walks over joint-strategy response digraphs, we prove that the strategies learnt by the two variants are preferred under the two metrics, respectively. The strategies preferred under both metrics are identified, and the adjacency matrices induced by one metric and one variant are connected. Finally, simulations are provided to illustrate the results.
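The idea of self-play as a walk over a response digraph can be illustrated with a minimal sketch. It is not the paper's algorithm or its metrics: the function name `best_response_walk`, the rock-paper-scissors payoff table, and the tie-breaking rule (lowest index wins) are all assumptions made for illustration. Each step replaces the current strategy with a best response against it, and the walk stops once a strategy repeats.

```python
def best_response_walk(payoff, start):
    """Follow s -> best response to s until a strategy repeats.

    payoff[a][b] is the payoff for playing a against b in a symmetric game.
    Returns the cycle of strategies the walk ends up in.
    Ties are broken by lowest index (an assumption of this sketch).
    """
    n = len(payoff)
    first_seen = {}   # strategy -> position in the path
    path = []
    s = start
    while s not in first_seen:
        first_seen[s] = len(path)
        path.append(s)
        # Best response against the current strategy s.
        s = max(range(n), key=lambda a: payoff[a][s])
    return path[first_seen[s]:]


# Rock (0), paper (1), scissors (2): hypothetical example game.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

cycle = best_response_walk(RPS, start=0)
print(cycle)
```

In this example the walk never settles on a single strategy and instead cycles through all three, which is the kind of cyclic behavior that a sink-equilibrium view is meant to capture.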

Preprint, 2020, arXiv

Matching-Based Capture Strategies for 3D Heterogeneous Multiplayer Reach-Avoid Differential Games

This paper studies a 3D multiplayer reach-avoid differential game with a goal region and a play region. Multiple pursuers defend the goal region by consecutively capturing multiple evaders in the play region. The players have heterogeneous moving speeds and the pursuers have heterogeneous capture radii. Since this game is hard to analyze directly, we decompose the whole game into many subgames, each involving multiple pursuers and only one evader. These subgames are then used as building blocks for the pursuer-evader matching. First, for multiple pursuers and one evader, we introduce an evasion space (ES) method characterized by a potential function to construct a guaranteed pursuer winning strategy. Then, based on this strategy, we develop conditions to determine whether a pursuit team can guard the goal region against one evader. It is shown that in 3D, if a pursuit team is able to defend the goal region against an evader, then at most three pursuers in the team are needed. We also compute the value function of the Hamilton-Jacobi-Isaacs (HJI) equation for a special subgame of degree. To capture the maximum number of evaders in the open-loop sense, we formulate a maximum bipartite matching problem with conflict graph (MBMC). We show that the MBMC is NP-hard and design a polynomial-time constant-factor approximation algorithm to solve it. Finally, we propose a receding horizon strategy for the pursuit team in which an MBMC is solved in each horizon and the strategies of the pursuers are given. We also extend our results to the case of a bounded convex play region where the evaders escape through an exit. Two numerical examples are provided to demonstrate the obtained results.
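As a simplified illustration of the matching step, the sketch below solves a plain maximum bipartite matching with augmenting paths. It deliberately ignores the conflict-graph constraint, so it is not the MBMC and not the paper's approximation algorithm. The input `can_capture`, in which each pursuer acting alone can capture a given list of evaders, is a hypothetical simplification.

```python
def max_bipartite_matching(can_capture):
    """Maximum matching between pursuers and evaders via augmenting paths.

    can_capture[p] lists the evaders that pursuer p could capture alone
    (hypothetical input; conflicts between assignments are ignored).
    Returns a dict mapping each matched evader to its pursuer.
    """
    pursuer_of = {}  # evader -> pursuer currently assigned to it

    def try_assign(p, visited):
        # Try to give pursuer p an evader, reassigning others if needed.
        for e in can_capture[p]:
            if e in visited:
                continue
            visited.add(e)
            if e not in pursuer_of or try_assign(pursuer_of[e], visited):
                pursuer_of[e] = p
                return True
        return False

    for p in range(len(can_capture)):
        try_assign(p, set())
    return pursuer_of


# Three pursuers, three evaders (illustrative data only).
matching = max_bipartite_matching([[0, 1], [0], [1, 2]])
print(matching)
```

Without conflicts this problem is solvable in polynomial time. According to the abstract, adding the conflict graph makes the problem NP-hard, which is why an approximation algorithm is used there.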

Preprint, 2020, arXiv

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

This paper introduces two metrics (cycle-based and memory-based), grounded in a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank, which relies on weakly better responses. We first consider settings where the difference between the largest and second-largest values of the underlying metric has a known lower bound. With this knowledge, we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for the difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability differ from the optimal by any given tolerance. The proposed perturbed SBRD addresses the opponent-induced non-stationarity by fixing the strategies of others for the learning agent, and uses empirical game-theoretic analysis to estimate payoffs for each strategy profile obtained due to the perturbation.
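To make the sink-equilibrium idea concrete, here is a minimal sketch for a two-player matrix game. It assumes the usual reading of a sink equilibrium as a strongly connected component with no outgoing edges in the best-response graph over joint strategy profiles. The function names, the edge rule (one player switches to a best response that strictly improves its payoff), and the matching-pennies payoffs are illustrative assumptions, and the perturbed dynamics and the two metrics from the abstract are not implemented.

```python
from itertools import product


def sbr_graph(payoffs, n_actions):
    """Strict best-response graph over joint profiles of a 2-player game.

    payoffs[(a0, a1)] = (u0, u1). There is an edge s -> t when exactly one
    player switches to a best response that strictly improves its payoff.
    """
    profiles = list(product(range(n_actions[0]), range(n_actions[1])))
    edges = {s: set() for s in profiles}
    for s in profiles:
        for i in (0, 1):
            # All profiles reachable when only player i changes its action.
            options = [s[:i] + (a,) + s[i + 1:] for a in range(n_actions[i])]
            best = max(payoffs[t][i] for t in options)
            if best > payoffs[s][i]:
                edges[s].update(t for t in options if payoffs[t][i] == best)
    return edges


def reachable(edges, start):
    """Set of profiles reachable from start (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sink_components(edges):
    """Strongly connected components with no edges leaving them."""
    reach = {v: reachable(edges, v) for v in edges}
    return {
        frozenset(reach[v])
        for v in edges
        if all(v in reach[w] for w in reach[v])
    }


# Matching pennies: player 0 wants to match, player 1 wants to mismatch.
PENNIES = {
    (0, 0): (1, -1), (0, 1): (-1, 1),
    (1, 0): (-1, 1), (1, 1): (1, -1),
}
sinks = sink_components(sbr_graph(PENNIES, (2, 2)))
print(sinks)
```

Matching pennies has no pure Nash equilibrium, yet the graph still has a sink component: the four-profile cycle. A game with a strict pure Nash equilibrium would instead yield a singleton sink.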
