Source author record

Yagiz Savas

Yagiz Savas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Science and Game Theory, eess.SP, Machine Learning, math.OC

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

No-Regret Learning in Dynamic Stackelberg Games

In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, whose underlying state space affects the leader's rewards and available strategies and evolves in a Markovian manner depending on the strategies selected by both the leader and the follower. Although standard Stackelberg games have been utilized to improve scheduling in security domains, their deployment is often limited by the need for complete information about the follower's utility function. In contrast, we consider scenarios where the follower's utility function is unknown to the leader but can be linearly parameterized. Our objective then is to provide an algorithm that prescribes a randomized strategy to the leader at each step of the game based on observations of how the follower responded in previous steps. We design a no-regret learning algorithm that, with high probability, achieves a regret bound (when compared to the best policy in hindsight) that is sublinear in the number of time steps; the degree of sublinearity depends on the number of features representing the follower's utility function. The regret of the proposed learning algorithm is independent of the size of the state space and polynomial in the rest of the parameters of the game. We show that the proposed learning algorithm outperforms existing model-free reinforcement learning approaches.
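To make the follower model in this abstract concrete, the following is a minimal sketch of a best response under a linearly parameterized utility. It is not the paper's learning algorithm. The names `THETA`, `FEATURES`, `expected_utility`, and `follower_best_response`, the toy numbers, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical toy data (not from the paper): 2 leader actions,
# 2 follower actions, 2 features per (leader action, follower action) pair.
THETA = [1.0, -1.0]  # assumed parameter vector of the follower's utility
FEATURES = [
    [[1.0, 0.0], [0.0, 1.0]],  # leader action 0 vs follower actions 0, 1
    [[0.0, 1.0], [1.0, 0.0]],  # leader action 1 vs follower actions 0, 1
]


def expected_utility(
    x: Sequence[float],
    features: Sequence[Sequence[Sequence[float]]],
    theta: Sequence[float],
    j: int,
) -> float:
    """Follower's expected utility for action j when the leader plays the
    mixed strategy x and the utility is linear: u(i, j) = theta . phi(i, j)."""
    return sum(
        x[i] * sum(t * f for t, f in zip(theta, features[i][j]))
        for i in range(len(x))
    )


def follower_best_response(
    x: Sequence[float],
    features: Sequence[Sequence[Sequence[float]]],
    theta: Sequence[float],
) -> int:
    """Index of a follower action maximizing expected utility.
    Ties go to the lowest index (an assumption of this sketch)."""
    n_follower = len(features[0])
    return max(
        range(n_follower),
        key=lambda j: expected_utility(x, features, theta, j),
    )


if __name__ == "__main__":
    # The leader only observes these responses, never THETA itself.
    for x in ([0.8, 0.2], [0.2, 0.8]):
        print(x, follower_best_response(x, FEATURES, THETA))
```

In the setting the abstract describes, the leader does not know the parameter vector and can only observe responses like these, which is the information a learning algorithm would have to work from.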

Preprint, 2021, arXiv

Physical-Layer Security via Distributed Beamforming in the Presence of Adversaries with Unknown Locations

We study the problem of securely communicating a sequence of information bits with a client in the presence of multiple adversaries at unknown locations in the environment. We assume that the client and the adversaries are located in the far-field region, and all possible directions for each adversary can be expressed as a continuous interval of directions. In such a setting, we develop a periodic transmission strategy, i.e., a sequence of joint beamforming gain and artificial noise pairs, that prevents the adversaries from decreasing their uncertainty about the information sequence by eavesdropping on the transmission. We formulate a series of nonconvex semi-infinite optimization problems to synthesize the transmission strategy. We show that the semi-definite program (SDP) relaxations of these nonconvex problems are exact under an efficiently verifiable sufficient condition. We approximate the SDP relaxations, which are subject to infinitely many constraints, by randomly sampling a finite subset of the constraints and establish the probability with which optimal solutions to the obtained finite SDPs and the semi-infinite SDPs coincide. We demonstrate with numerical simulations that the proposed periodic strategy can ensure the security of communication in scenarios in which all stationary strategies fail to guarantee security.
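The idea of replacing infinitely many directional constraints with a random finite sample can be sketched as follows. This is only an illustration under stated assumptions: it evaluates a standard far-field gain expression at sampled directions rather than solving any SDP, and the function names, the 2-D geometry, the phase sign convention, and the unit wavelength are hypothetical choices, not details taken from the paper.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def far_field_gain(
    weights: Sequence[complex],
    positions: Sequence[Tuple[float, float]],
    angle: float,
    wavelength: float = 1.0,
) -> float:
    """Squared magnitude of the combined far-field response of transmitters
    at the given 2-D positions toward direction `angle` (radians).
    The phase sign convention is an assumption of this sketch."""
    k = 2.0 * math.pi / wavelength  # wavenumber
    ux, uy = math.cos(angle), math.sin(angle)  # unit vector of the direction
    response = sum(
        w * cmath.exp(1j * k * (px * ux + py * uy))
        for w, (px, py) in zip(weights, positions)
    )
    return abs(response) ** 2


def sample_directions(lo: float, hi: float, n: int, seed: int = 0) -> List[float]:
    """Draw n directions uniformly from the interval [lo, hi]: a finite
    stand-in for the continuum of possible adversary directions."""
    rng = random.Random(seed)
    return [rng.uniform(lo, hi) for _ in range(n)]


def max_sampled_gain(
    weights: Sequence[complex],
    positions: Sequence[Tuple[float, float]],
    lo: float,
    hi: float,
    n: int,
    seed: int = 0,
) -> float:
    """Largest gain over the sampled directions. A bound checked only on the
    samples need not hold for every direction in the interval."""
    return max(
        far_field_gain(weights, positions, a)
        for a in sample_directions(lo, hi, n, seed)
    )


if __name__ == "__main__":
    positions = [(0.0, 0.0), (0.5, 0.0)]  # two transmitters, half a wavelength apart
    weights = [1.0, 1.0]
    print(max_sampled_gain(weights, positions, 0.0, 0.3, n=200))
```

A check that passes on the sampled directions says nothing by itself about the directions that were not sampled; the abstract states that the paper establishes the probability with which the sampled and semi-infinite problems share optimal solutions.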

Preprint, 2020, arXiv

On the Complexity of Sequential Incentive Design

In many scenarios, a principal dynamically interacts with an agent and offers a sequence of incentives to align the agent's behavior with a desired objective. This paper focuses on the problem of synthesizing an incentive sequence that, once offered, induces the desired agent behavior even when the agent's intrinsic motivation is unknown to the principal. We model the agent's behavior as a Markov decision process, express its intrinsic motivation as a reward function, which belongs to a finite set of possible reward functions, and consider the incentives as additional rewards offered to the agent. We first show that the behavior modification problem (BMP), i.e., the problem of synthesizing an incentive sequence that induces a desired agent behavior at minimum total cost to the principal, is PSPACE-hard. Moreover, we show that by imposing certain restrictions on the incentive sequences available to the principal, one can obtain two NP-complete variants of the BMP. We also provide a sufficient condition on the set of possible reward functions under which the BMP can be solved via linear programming. Finally, we propose two algorithms to compute globally and locally optimal solutions to the NP-complete variants of the BMP.
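As a rough illustration of the model in this abstract, the sketch below treats incentives as extra rewards in a small deterministic, finite-horizon MDP and checks whether a candidate incentive table makes a desired first action optimal for every reward function in a finite set. It is not one of the paper's algorithms and does not minimize cost. The names `first_action` and `induces_desired_action`, the deterministic transitions, and the tie-breaking rule are assumptions of this sketch.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]  # indexed as table[state][action]


def first_action(
    rewards: Table,
    incentives: Table,
    transitions: Sequence[Sequence[int]],
    horizon: int,
    start: int,
) -> int:
    """Agent's optimal first action at `start` by backward induction, where
    the agent maximizes intrinsic reward plus offered incentive.
    Ties go to the lowest action index (an assumption of this sketch)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    n_states = len(rewards)
    values = [0.0] * n_states
    policy: List[int] = [0] * n_states
    for _ in range(horizon):
        new_values, policy = [], []
        for s in range(n_states):
            q = [
                rewards[s][a] + incentives[s][a] + values[transitions[s][a]]
                for a in range(len(rewards[s]))
            ]
            best = max(range(len(q)), key=q.__getitem__)
            policy.append(best)
            new_values.append(q[best])
        values = new_values
    return policy[start]


def induces_desired_action(
    reward_set: Sequence[Table],
    incentives: Table,
    transitions: Sequence[Sequence[int]],
    horizon: int,
    start: int,
    desired: int,
) -> bool:
    """True if the desired first action is optimal for every candidate
    reward function, i.e. whichever one the agent actually has."""
    return all(
        first_action(r, incentives, transitions, horizon, start) == desired
        for r in reward_set
    )


if __name__ == "__main__":
    transitions = [[0, 0]]  # one state, two actions, both self-loops
    reward_set = [[[1.0, 0.0]], [[0.5, 0.0]]]  # two possible agent types
    for bonus in (0.6, 1.1):
        incentives = [[0.0, bonus]]  # pay `bonus` for taking action 1
        print(bonus, induces_desired_action(reward_set, incentives, transitions, 1, 0, 1))
```

In this toy instance the smaller incentive sways only the agent type with the weaker preference for action 0, which is why an incentive that works without knowing the agent's type has to cover the most demanding candidate reward function.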