Source author record

Muhammed O. Sayin

Muhammed O. Sayin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Science and Game Theory Systems and Control Machine Learning math.OC math.DS Cryptography and Security eess.SY Information Theory math.IT Numerical Analysis

Catalog footprint

What is connected

10works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Fictitious Play in Markov Games with Single Controller

Certain but important classes of strategic-form games, including zero-sum and identical-interest games, have the fictitious-play-property (FPP), i.e., beliefs formed in fictitious play dynamics always converge to a Nash equilibrium (NE) in the repeated play of these games. Such convergence results are seen as a (behavioral) justification for the game-theoretical equilibrium analysis. Markov games (MGs), also known as stochastic games, generalize the repeated play of strategic-form games to dynamic multi-state settings with Markovian state transitions. In particular, MGs are standard models for multi-agent reinforcement learning -- a reviving research area in learning and games, and their game-theoretical equilibrium analyses have also been conducted extensively. However, whether certain classes of MGs have the FPP or not (i.e., whether there is a behavioral justification for equilibrium analysis or not) remains largely elusive. In this paper, we study a new variant of fictitious play dynamics for MGs and show its convergence to an NE in n-player identical-interest MGs in which a single player controls the state transitions. Such games are of interest in communications, control, and economics applications. Our result together with the recent results in [Sayin et al. 2020] establishes the FPP of two-player zero-sum MGs and n-player identical-interest MGs with a single controller (standing at two different ends of the MG spectrum from fully competitive to fully cooperative).

preprint2022arXiv

Fictitious play in zero-sum stochastic games

We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response by using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.

preprint2022arXiv

On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games

We analyze the convergence properties of the two-timescale fictitious play combining the classical fictitious play with the Q-learning for two-player zero-sum stochastic games with player-dependent learning rates. We show its almost sure convergence under the standard assumptions in two-timescale stochastic approximation methods when the discount factor is less than the product of the ratios of player-dependent step sizes. To this end, we formulate a novel Lyapunov function formulation and present a one-sided asynchronous convergence result.

preprint2020arXiv

Bayesian Persuasion with State-Dependent Quadratic Cost Measures

We address Bayesian persuasion between a sender and a receiver with state-dependent quadratic cost measures for general classes of distributions. The receiver seeks to make mean-square-error estimate of a state based on a signal sent by the sender while the sender signals strategically in order to control the receiver's estimate in a certain way. Such a scheme could model, e.g., deception and privacy, problems in multi-agent systems. Existing solution concepts are not viable since here the receiver has continuous action space. We show that for finite state spaces, optimal signaling strategies can be computed through an equivalent linear optimization problem over the cone of completely positive matrices. We then establish its strong duality to a copositive program. To exemplify the effectiveness of this equivalence result, we adopt sequential polyhedral approximation of completely-positive cones and analyze its performance numerically. We also quantify the approximation error for a quantized version of a continuous distribution and show that a semi-definite program relaxation of the equivalent problem could be a benchmark lower bound for the sender's cost for large state spaces.

preprint2020arXiv

Persuasion-based Robust Sensor Design Against Attackers with Unknown Control Objectives

In this paper, we introduce a robust sensor design framework to provide "persuasion-based" defense in stochastic control systems against an unknown type attacker with a control objective exclusive to its type. For effective control, such an attacker's actions depend on its belief on the underlying state of the system. We design a robust "linear-plus-noise" signaling strategy to encode sensor outputs in order to shape the attacker's belief in a strategic way and correspondingly to persuade the attacker to take actions that lead to minimum damage with respect to the system's objective. The specific model we adopt is a Gauss-Markov process driven by a controller with a (partially) "unknown" malicious/benign control objective. We seek to defend against the worst possible distribution over control objectives in a robust way under the solution concept of Stackelberg equilibrium, where the sensor is the leader. We show that a necessary and sufficient condition on the covariance matrix of the posterior belief is a certain linear matrix inequality and we provide a closed-form solution for the associated signaling strategy. This enables us to formulate an equivalent tractable problem, indeed a semi-definite program, to compute the robust sensor design strategies "globally" even though the original optimization problem is non-convex and highly nonlinear. We also extend this result to scenarios where the sensor makes noisy or partial measurements. Finally, we analyze the ensuing performance numerically for various scenarios.

preprint2015arXiv

Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks

We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a different node, and a limited number of gradient oracle calls is allowed at each node. In this framework, we introduce a convex optimization algorithm based on the stochastic gradient descent (SGD) updates. Particularly, we use a carefully designed time-dependent weighted averaging of the SGD iterates, which yields a convergence rate of $O\left(\frac{N\sqrt{N}}{T}\right)$ after $T$ gradient updates for each node on a network of $N$ nodes. We then show that after $T$ gradient oracle calls, the average SGD iterate achieves a mean square deviation (MSD) of $O\left(\frac{\sqrt{N}}{T}\right)$. This rate of convergence is optimal as it matches the performance lower bound up to constant terms. Similar to the SGD algorithm, the computational complexity of the proposed algorithm also scales linearly with the dimensionality of the data. Furthermore, the communication load of the proposed method is the same as the communication load of the SGD algorithm. Thus, the proposed algorithm is highly efficient in terms of complexity and communication load. We illustrate the merits of the algorithm with respect to the state-of-art methods over benchmark real life data sets and widely studied network topologies.

preprint2014arXiv

Compressive Diffusion Strategies Over Distributed Networks for Reduced Communication Load

We study the compressive diffusion strategies over distributed networks based on the diffusion implementation and adaptive extraction of the information from the compressed diffusion data. We demonstrate that one can achieve a comparable performance with the full information exchange configurations, even if the diffused information is compressed into a scalar or a single bit. To this end, we provide a complete performance analysis for the compressive diffusion strategies. We analyze the transient, steady-state and tracking performance of the configurations in which the diffused data is compressed into a scalar or a single-bit. We propose a new adaptive combination method improving the convergence performance of the compressive diffusion strategies further. In the new method, we introduce one more freedom-of-dimension in the combination matrix and adapt it by using the conventional mixture approach in order to enhance the convergence performance for any possible combination rule used for the full diffusion configuration. We demonstrate that our theoretical analysis closely follow the ensemble averaged results in our simulations. We provide numerical examples showing the improved convergence performance with the new adaptive combination method.

preprint2014arXiv

Predicting Nearly As Well As the Optimal Twice Differentiable Regressor

We study nonlinear regression of real valued data in an individual sequence manner, where we provide results that are guaranteed to hold without any statistical assumptions. We address the convergence and undertraining issues of conventional nonlinear regression methods and introduce an algorithm that elegantly mitigates these issues via an incremental hierarchical structure, (i.e., via an incremental decision tree). Particularly, we present a piecewise linear (or nonlinear) regression algorithm that partitions the regressor space in a data driven manner and learns a linear model at each region. Unlike the conventional approaches, our algorithm gradually increases the number of disjoint partitions on the regressor space in a sequential manner according to the observed data. Through this data driven approach, our algorithm sequentially and asymptotically achieves the performance of the optimal twice differentiable regression function for any data sequence with an unknown and arbitrary length. The computational complexity of the introduced algorithm is only logarithmic in the data length under certain regularity conditions. We provide the explicit description of the algorithm and demonstrate the significant gains for the well-known benchmark real data sets and chaotic signals.

preprint2013arXiv

A Novel Family of Adaptive Filtering Algorithms Based on The Logarithmic Cost

We introduce a novel family of adaptive filtering algorithms based on a relative logarithmic cost. The new family intrinsically combines the higher and lower order measures of the error into a single continuous update based on the error amount. We introduce important members of this family of algorithms such as the least mean logarithmic square (LMLS) and least logarithmic absolute difference (LLAD) algorithms that improve the convergence performance of the conventional algorithms. However, our approach and analysis are generic such that they cover other well-known cost functions as described in the paper. The LMLS algorithm achieves comparable convergence performance with the least mean fourth (LMF) algorithm and extends the stability bound on the step size. The LLAD and least mean square (LMS) algorithms demonstrate similar convergence performance in impulse-free noise environments while the LLAD algorithm is robust against impulsive interferences and outperforms the sign algorithm (SA). We analyze the transient, steady state and tracking performance of the introduced algorithms and demonstrate the match of the theoretical analyzes and simulation results. We show the extended stability bound of the LMLS algorithm and analyze the robustness of the LLAD algorithm against impulsive interferences. Finally, we demonstrate the performance of our algorithms in different scenarios through numerical examples.

preprint2013arXiv

Single Bit and Reduced Dimension Diffusion Strategies Over Distributed Networks

We introduce novel diffusion based adaptive estimation strategies for distributed networks that have significantly less communication load and achieve comparable performance to the full information exchange configurations. After local estimates of the desired data is produced in each node, a single bit of information (or a reduced dimensional data vector) is generated using certain random projections of the local estimates. This newly generated data is diffused and then used in neighboring nodes to recover the original full information. We provide the complete state-space description and the mean stability analysis of our algorithms.

Muhammed O. Sayin

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Fictitious Play in Markov Games with Single Controller

Fictitious play in zero-sum stochastic games

On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games

Bayesian Persuasion with State-Dependent Quadratic Cost Measures

Persuasion-based Robust Sensor Design Against Attackers with Unknown Control Objectives

Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks

Compressive Diffusion Strategies Over Distributed Networks for Reduced Communication Load

Predicting Nearly As Well As the Optimal Twice Differentiable Regressor

A Novel Family of Adaptive Filtering Algorithms Based on The Logarithmic Cost

Single Bit and Reduced Dimension Diffusion Strategies Over Distributed Networks