Researcher profile

Zongying Shi

Zongying Shi contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
3 topics
4 close collaborators

Published work

3 published items

Preprint, 2022, arXiv

Evaluation and Learning in Two-Player Symmetric Games via Best and Better Responses

Artificial intelligence and robotic competitions are accompanied by a class of game paradigms in which each player privately commits a strategy to a game system, which simulates the game using the collected joint strategy and then returns payoffs to the players. This paper considers strategy commitment for two-player symmetric games in which the players' strategy spaces are identical and their payoffs are symmetric. First, we introduce two digraph-based metrics at a meta-level for strategy evaluation in two-agent reinforcement learning, grounded on sink equilibrium. The metrics rank the strategies of a single player and determine the set of strategies that are preferred for the private commitment. Then, in order to find the preferred strategies under the metrics, we propose two variants of the classical learning algorithm self-play, called strictly best-response and weakly better-response self-plays. By modeling learning processes as walks over joint-strategy response digraphs, we prove that the strategies learnt by the two variants are preferred under the two metrics, respectively. The strategies preferred under both metrics are identified, and the adjacency matrices induced by one metric and one variant are connected. Finally, simulations are provided to illustrate the results.
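The idea of treating self-play as a walk over a response digraph can be illustrated with a minimal sketch. It assumes a symmetric game given as a payoff matrix where `payoff[i][j]` is the payoff for playing strategy `i` against `j`, and it uses plain best-response self-play with lowest-index tie-breaking. It is not the paper's strictly best-response or weakly better-response variant, and the function names are hypothetical.

```python
def best_response(payoff, opponent):
    """Return the lowest-index strategy that maximizes payoff[i][opponent]."""
    column = [row[opponent] for row in payoff]
    return column.index(max(column))


def self_play_walk(payoff, start):
    """Follow best responses from `start` until a strategy repeats.

    Returns (path, cycle): the strategies visited in order, and the
    recurrent tail of the walk (a single strategy or a longer cycle).
    """
    path = []
    seen = {}  # strategy -> position in path
    current = start
    while current not in seen:
        seen[current] = len(path)
        path.append(current)
        current = best_response(payoff, current)
    return path, path[seen[current]:]


# Rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors): the walk never
# settles on one strategy and instead cycles through all three.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
path, cycle = self_play_walk(RPS, 0)

# A game where strategy 1 strictly dominates: the walk ends in a fixed point.
DOMINANT = [[1, 0],
            [2, 1]]
path_dom, cycle_dom = self_play_walk(DOMINANT, 0)
```

In the rock-paper-scissors case the recurrent part is the full cycle `[0, 1, 2]`, while in the dominated game it is the single strategy `[1]`. Cyclic outcomes of this kind are what a sink-based evaluation is meant to handle.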

Preprint, 2020, arXiv

Matching-Based Capture Strategies for 3D Heterogeneous Multiplayer Reach-Avoid Differential Games

This paper studies a 3D multiplayer reach-avoid differential game with a goal region and a play region. Multiple pursuers defend the goal region by consecutively capturing multiple evaders in the play region. The players have heterogeneous moving speeds and the pursuers have heterogeneous capture radii. Since this game is hard to analyze directly, we decompose the whole game into many subgames, each involving multiple pursuers and only one evader. Then, these subgames are used as building blocks for the pursuer-evader matching. First, for multiple pursuers and one evader, we introduce an evasion space (ES) method characterized by a potential function to construct a guaranteed pursuer winning strategy. Then, based on this strategy, we develop conditions to determine whether a pursuit team can guard the goal region against one evader. It is shown that in 3D, if a pursuit team is able to defend the goal region against an evader, then at most three pursuers in the team are needed. We also compute the value function of the Hamilton-Jacobi-Isaacs (HJI) equation for a special subgame of degree. To capture the maximum number of evaders in the open-loop sense, we formulate a maximum bipartite matching problem with conflict graph (MBMC). We show that the MBMC is NP-hard and design a polynomial-time constant-factor approximation algorithm to solve it. Finally, we propose a receding horizon strategy for the pursuit team where in each horizon an MBMC is solved and the strategies of the pursuers are given. We also extend our results to the case of a bounded convex play region where the evaders escape through an exit. Two numerical examples are provided to demonstrate the obtained results.
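To make the matching step concrete, here is a minimal sketch of ordinary maximum bipartite matching using augmenting paths. It deliberately ignores the conflict graph, so it is not the MBMC problem or the paper's approximation algorithm. It also assumes, for illustration only, that each pursuer acts alone and that a hypothetical list `can_capture[p]` names the evaders pursuer `p` is guaranteed to capture.

```python
def max_matching(can_capture, num_evaders):
    """Maximum bipartite matching between pursuers and evaders.

    can_capture[p] lists the evaders pursuer p could capture (assumed input).
    Returns (size, match_of_evader), where match_of_evader[e] is the pursuer
    assigned to evader e, or None if e is left unmatched.
    """
    match_of_evader = [None] * num_evaders

    def try_assign(pursuer, visited):
        # Search for an augmenting path starting at `pursuer`.
        for evader in can_capture[pursuer]:
            if evader in visited:
                continue
            visited.add(evader)
            holder = match_of_evader[evader]
            # Take a free evader, or re-route its current pursuer elsewhere.
            if holder is None or try_assign(holder, visited):
                match_of_evader[evader] = pursuer
                return True
        return False

    size = 0
    for pursuer in range(len(can_capture)):
        if try_assign(pursuer, set()):
            size += 1
    return size, match_of_evader


# Three pursuers, three evaders: pursuer 1 can only capture evader 0,
# so pursuer 0 is re-routed to evader 1 and pursuer 2 takes evader 2.
size, assignment = max_matching([[0, 1], [0], [1, 2]], 3)
```

Here all three evaders end up matched, with `assignment == [1, 0, 2]`. Once conflicts between candidate assignments are added, this simple augmenting-path search no longer suffices, which is consistent with the NP-hardness result stated in the abstract.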

Preprint, 2020, arXiv

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

This paper introduces two metrics (cycle-based and memory-based metrics), grounded on a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank which relies on weakly better responses. We first consider settings where the difference between largest and second largest underlying metric has a known lower bound. With this knowledge we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for the difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability differ from the optimal by any given tolerance. The proposed perturbed SBRD addresses the opponent-induced non-stationarity by fixing the strategies of others for the learning agent, and uses empirical game-theoretic analysis to estimate payoffs for each strategy profile obtained due to the perturbation.
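As a rough illustration of the underlying solution concept, the sketch below builds a strict best-response digraph over the joint strategy profiles of a two-player matrix game and extracts its sink strongly connected components, which is the usual reading of a sink equilibrium. It is a sketch under assumptions: the payoff tables `u1` and `u2`, the tie handling, and all function names are hypothetical, and the paper's cycle-based and memory-based metrics and its perturbed SBRD are not reproduced.

```python
from itertools import product


def strict_best_response_edges(u1, u2):
    """Edges (i, j) -> profile reached when one player switches to a best
    response that strictly improves that player's own payoff."""
    rows, cols = len(u1), len(u1[0])
    edges = {profile: set() for profile in product(range(rows), range(cols))}
    for i, j in edges:
        best1 = max(u1[r][j] for r in range(rows))
        if best1 > u1[i][j]:  # row player can strictly improve
            edges[(i, j)].update((r, j) for r in range(rows) if u1[r][j] == best1)
        best2 = max(u2[i][c] for c in range(cols))
        if best2 > u2[i][j]:  # column player can strictly improve
            edges[(i, j)].update((i, c) for c in range(cols) if u2[i][c] == best2)
    return edges


def reachable(edges, start):
    """All profiles reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def sink_equilibria(edges):
    """Sink strongly connected components: sets the dynamics never leave."""
    reach = {v: reachable(edges, v) for v in edges}
    return {reach[v] for v in edges if all(v in reach[u] for u in reach[v])}


# Matching pennies: no pure Nash equilibrium, one sink containing all profiles.
MP1 = [[1, -1], [-1, 1]]
MP2 = [[-1, 1], [1, -1]]
pennies_sinks = sink_equilibria(strict_best_response_edges(MP1, MP2))

# Prisoner's dilemma (0 = cooperate, 1 = defect): the sink is mutual defection.
PD1 = [[3, 0], [5, 1]]
PD2 = [[3, 5], [0, 1]]
dilemma_sinks = sink_equilibria(strict_best_response_edges(PD1, PD2))
```

The matching-pennies example shows the kind of cyclical behavior the abstract refers to: the dynamics have a well-defined sink even though no single profile is stable.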