Researcher profile

Muhammed O. Sayin

Muhammed O. Sayin contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
7 topics
4 close collaborators

Published work

5 published items

preprint | 2022 | arXiv

Fictitious Play in Markov Games with Single Controller

Certain but important classes of strategic-form games, including zero-sum and identical-interest games, have the fictitious-play-property (FPP), i.e., beliefs formed in fictitious play dynamics always converge to a Nash equilibrium (NE) in the repeated play of these games. Such convergence results are seen as a (behavioral) justification for the game-theoretical equilibrium analysis. Markov games (MGs), also known as stochastic games, generalize the repeated play of strategic-form games to dynamic multi-state settings with Markovian state transitions. In particular, MGs are standard models for multi-agent reinforcement learning -- a reviving research area in learning and games, and their game-theoretical equilibrium analyses have also been conducted extensively. However, whether certain classes of MGs have the FPP or not (i.e., whether there is a behavioral justification for equilibrium analysis or not) remains largely elusive. In this paper, we study a new variant of fictitious play dynamics for MGs and show its convergence to an NE in n-player identical-interest MGs in which a single player controls the state transitions. Such games are of interest in communications, control, and economics applications. Our result together with the recent results in [Sayin et al. 2020] establishes the FPP of two-player zero-sum MGs and n-player identical-interest MGs with a single controller (standing at two different ends of the MG spectrum from fully competitive to fully cooperative).

preprint | 2022 | arXiv

Fictitious play in zero-sum stochastic games

We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involve players forming beliefs on the opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response by using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the update of the beliefs on Q-functions occurs at a slower timescale than the update of the beliefs on strategies. We show that, in both the model-based and model-free cases (the latter without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
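
For intuition, the belief-and-best-response loop behind these dynamics can be illustrated with classical fictitious play in a single zero-sum matrix game. The sketch below is not the paper's two-timescale algorithm: it has no states, no Q-function estimates and no model-free updates. The function name `fictitious_play`, the uniform initial counts and the matching-pennies payoff matrix are all assumptions chosen for illustration.

```python
import numpy as np


def fictitious_play(payoff, num_rounds):
    """Classical fictitious play in a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player receives
    -payoff[i, j]. Returns the empirical action frequencies of both players,
    which serve as each opponent's belief.
    """
    num_rows, num_cols = payoff.shape
    # Assumption: one fictitious observation per action (uniform initial belief).
    row_counts = np.ones(num_rows)
    col_counts = np.ones(num_cols)

    for _ in range(num_rounds):
        # Beliefs are the normalized counts of actions observed so far.
        row_freq = row_counts / row_counts.sum()
        col_freq = col_counts / col_counts.sum()
        # Each player best-responds greedily to its belief about the opponent.
        # The row player maximizes payoff; the column player minimizes it.
        row_action = int(np.argmax(payoff @ col_freq))
        col_action = int(np.argmin(row_freq @ payoff))
        # Both players observe the actions played and update their counts.
        row_counts[row_action] += 1
        col_counts[col_action] += 1

    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


if __name__ == "__main__":
    # Hypothetical example game: matching pennies.
    matching_pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
    row_freq, col_freq = fictitious_play(matching_pennies, 20000)
    print(row_freq, col_freq)
```

In matching pennies the unique Nash equilibrium has both players mixing uniformly, so the empirical frequencies should drift toward one half for each action as the number of rounds grows. The stochastic-game dynamics described in the abstract add a slower-timescale Q-function update on top of this loop, which the sketch omits.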

preprint | 2022 | arXiv

On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games

We analyze the convergence properties of two-timescale fictitious play, which combines classical fictitious play with Q-learning, for two-player zero-sum stochastic games with player-dependent learning rates. We show its almost sure convergence under the standard assumptions in two-timescale stochastic approximation methods when the discount factor is less than the product of the ratios of player-dependent step sizes. To this end, we propose a novel Lyapunov function formulation and present a one-sided asynchronous convergence result.

preprint | 2020 | arXiv

Bayesian Persuasion with State-Dependent Quadratic Cost Measures

We address Bayesian persuasion between a sender and a receiver with state-dependent quadratic cost measures for general classes of distributions. The receiver seeks to form a mean-square-error estimate of a state based on a signal sent by the sender, while the sender signals strategically in order to control the receiver's estimate in a certain way. Such a scheme could model, e.g., deception and privacy problems in multi-agent systems. Existing solution concepts are not viable since here the receiver has a continuous action space. We show that for finite state spaces, optimal signaling strategies can be computed through an equivalent linear optimization problem over the cone of completely positive matrices. We then establish its strong duality to a copositive program. To exemplify the effectiveness of this equivalence result, we adopt sequential polyhedral approximation of completely-positive cones and analyze its performance numerically. We also quantify the approximation error for a quantized version of a continuous distribution and show that a semi-definite program relaxation of the equivalent problem could be a benchmark lower bound for the sender's cost for large state spaces.

preprint | 2020 | arXiv

Persuasion-based Robust Sensor Design Against Attackers with Unknown Control Objectives

In this paper, we introduce a robust sensor design framework to provide "persuasion-based" defense in stochastic control systems against an attacker of unknown type whose control objective is exclusive to its type. For effective control, such an attacker's actions depend on its belief about the underlying state of the system. We design a robust "linear-plus-noise" signaling strategy to encode sensor outputs in order to shape the attacker's belief strategically and thereby persuade the attacker to take actions that lead to minimum damage with respect to the system's objective. The specific model we adopt is a Gauss-Markov process driven by a controller with a (partially) "unknown" malicious/benign control objective. We seek to defend against the worst possible distribution over control objectives in a robust way under the solution concept of Stackelberg equilibrium, where the sensor is the leader. We show that a necessary and sufficient condition on the covariance matrix of the posterior belief is a certain linear matrix inequality, and we provide a closed-form solution for the associated signaling strategy. This enables us to formulate an equivalent tractable problem, indeed a semi-definite program, to compute the robust sensor design strategies "globally" even though the original optimization problem is non-convex and highly nonlinear. We also extend this result to scenarios where the sensor makes noisy or partial measurements. Finally, we analyze the ensuing performance numerically for various scenarios.