Source author record

Svetlana Obraztsova

Svetlana Obraztsova appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Science and Game Theory, Machine Learning, Multiagent Systems, Artificial Intelligence, cs.CY, math.CO

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

preprint, 2022, arXiv

Mis-spoke or mis-lead: Achieving Robustness in Multi-Agent Communicative Reinforcement Learning

Recent studies in multi-agent communicative reinforcement learning (MACRL) have demonstrated that multi-agent coordination can be greatly improved by allowing communication between agents. Meanwhile, adversarial machine learning (ML) has shown that ML models are vulnerable to attacks. Despite the increasing concern about the robustness of ML algorithms, how to achieve robust communication in multi-agent reinforcement learning has been largely neglected. In this paper, we systematically explore the problem of adversarial communication in MACRL. Our main contributions are threefold. First, we propose an effective method to perform attacks in MACRL, by learning a model to generate optimal malicious messages. Second, we develop a defence method based on message reconstruction, to maintain multi-agent coordination under message attacks. Third, we formulate the adversarial communication problem as a two-player zero-sum game and propose a game-theoretical method R-MACRL to improve the worst-case defending performance. Empirical results demonstrate that many state-of-the-art MACRL methods are vulnerable to message attacks, and our method can significantly improve their robustness.

preprint, 2022, arXiv

Off-Beat Multi-Agent Reinforcement Learning

We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent, i.e., all actions have pre-set execution durations. During execution durations, the environment changes are influenced by, but not synchronised with, action execution. Such a setting is ubiquitous in many real-world problems. However, most MARL methods assume actions are executed immediately after inference, which is often unrealistic and can lead to catastrophic failure for multi-agent coordination with off-beat actions. In order to fill this gap, we develop an algorithmic framework for MARL with off-beat actions. We then propose a novel episodic memory, LeGEM, for model-free MARL algorithms. LeGEM builds agents' episodic memories by utilizing agents' individual experiences. It boosts multi-agent learning by addressing the challenging temporal credit assignment problem raised by the off-beat actions via our novel reward redistribution scheme, alleviating the issue of non-Markovian reward. We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks. Empirical results show that LeGEM significantly boosts multi-agent coordination and achieves leading performance and improved sample efficiency.

preprint, 2021, arXiv

Reaching Consensus Under a Deadline

Committee decisions are complicated by a deadline, e.g., the next start of a budget, or the beginning of a semester. In committee hiring decisions, it may be that if no candidate is supported by a strong majority, the default is to hire no one, an option that may cost dearly. As a result, committee members might prefer to agree on a reasonable, if not necessarily the best, candidate, to avoid unfilled positions. In this paper, we propose a model for the above scenario, Consensus Under a Deadline (CUD), based on a time-bounded iterative voting process. We provide convergence guarantees and an analysis of the quality of the final decision. An extensive experimental study demonstrates more subtle features of CUDs, e.g., the difference between two simple types of committee member behavior, lazy vs. proactive voters. Finally, a user study examines the differences between the behavior of rational voting bots and real voters, concluding that it may often be best to have bots play on the voters' behalf.

preprint, 2016, arXiv

Teams in Online Scheduling Polls: Game-Theoretic Aspects

Consider an important meeting to be held in a team-based organization. Taking availability constraints into account, an online scheduling poll is used to decide upon the exact time of the meeting. Decisions are to be taken during the meeting, so each team would like to maximize its relative attendance in the meeting (i.e., the proportional number of its participating team members). We introduce a corresponding game, where each team can declare (in the scheduling poll) a lower total availability in order to improve its relative attendance, which is its pay-off. We are especially interested in situations where teams can form coalitions. We provide an efficient algorithm that, given a coalition, finds an optimal way for each team in a coalition to improve its pay-off. In contrast, we show that deciding whether such a coalition exists is NP-hard. We also study the existence of Nash equilibria: finding Nash equilibria for various small sizes of teams and coalitions can be done in polynomial time, while it is coNP-hard if the coalition size is unbounded.
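The pay-off described in this abstract can be illustrated with a small sketch. The details below are assumptions made for illustration and are not taken from the paper: the poll picks the slot with the highest total declared availability, ties go to the earliest slot, and attendance at the chosen slot equals the declared availability. The names `winning_slot` and `relative_attendance` are hypothetical.

```python
from fractions import Fraction

def winning_slot(declared):
    """Return the index of the slot with the highest total declared availability.

    `declared` maps each team name to a list of declared head counts, one per slot.
    Assumption: ties are broken in favour of the earliest slot.
    """
    n_slots = len(next(iter(declared.values())))
    totals = [sum(counts[s] for counts in declared.values()) for s in range(n_slots)]
    return max(range(n_slots), key=lambda s: (totals[s], -s))

def relative_attendance(declared, team):
    """Share of attendees at the winning slot that belong to `team`."""
    slot = winning_slot(declared)
    total = sum(counts[slot] for counts in declared.values())
    if total == 0:
        return Fraction(0)
    return Fraction(declared[team][slot], total)

# Truthful declarations: slot 0 wins with 5 attendees, 2 of them from team A.
truthful = {"A": [2, 3], "B": [3, 1]}
# Team A under-declares in slot 0, so slot 1 wins with 4 attendees, 3 from team A.
manipulated = {"A": [0, 3], "B": [3, 1]}

print(relative_attendance(truthful, "A"))
print(relative_attendance(manipulated, "A"))
```

In this toy instance, team A raises its share from 2/5 to 3/4 by declaring lower availability in slot 0, which is the kind of strategic move the game models.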

preprint, 2014, arXiv

Equilibria of Plurality Voting: Lazy and Truth-biased Voters

We present a systematic study of Plurality elections with strategic voters who, in addition to having preferences over election winners, have secondary preferences, which govern their behavior when their vote cannot affect the election outcome. Specifically, we study two models that have been recently considered in the literature: lazy voters, who prefer to abstain when they are not pivotal, and truth-biased voters, who prefer to vote truthfully when they are not pivotal. We extend prior work by investigating the behavior of both lazy and truth-biased voters under different tie-breaking rules (lexicographic rule, random voter rule, random candidate rule). Two of these six combinations of secondary preferences and a tie-breaking rule have been studied in prior work. In order to understand the impact of different secondary preferences and tie-breaking rules on the election outcomes, we study the remaining four combinations. We characterize pure Nash equilibria (PNE) of the resulting strategic games and study the complexity of related computational problems. Our results extend to settings where some of the voters may be non-strategic.
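As a rough illustration of the notions in this abstract, the sketch below computes a Plurality winner under lexicographic tie-breaking and checks whether a voter is pivotal. It is a simplification under stated assumptions, not the paper's formal model: a voter counts as pivotal here if some alternative ballot (including abstaining) changes the winner, and an election with no votes is won by the first candidate in the order. The function names are hypothetical.

```python
from collections import Counter

def plurality_winner(ballots, candidates):
    """Plurality winner with lexicographic tie-breaking.

    `ballots` is a list with one entry per voter: a candidate, or None for abstention.
    Among the top-scoring candidates, the one earliest in `candidates` wins.
    """
    scores = Counter(b for b in ballots if b is not None)
    best = max(scores[c] for c in candidates)
    return next(c for c in candidates if scores[c] == best)

def is_pivotal(ballots, voter, candidates):
    """True if `voter` can change the winner by casting a different ballot or abstaining."""
    current = plurality_winner(ballots, candidates)
    for alternative in list(candidates) + [None]:
        if alternative == ballots[voter]:
            continue
        trial = ballots[:voter] + [alternative] + ballots[voter + 1:]
        if plurality_winner(trial, candidates) != current:
            return True
    return False

candidates = ["a", "b", "c"]
ballots = ["a", "b", "b"]  # b wins with two votes

print(plurality_winner(ballots, candidates))
print(is_pivotal(ballots, 0, candidates))  # voter 0 cannot change the outcome
print(is_pivotal(ballots, 1, candidates))  # voter 1 can hand the win to a
```

Under this reading, voter 0 is not pivotal, so a lazy voter in that position would prefer to abstain, while a truth-biased voter would prefer to cast a truthful ballot.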

preprint, 2010, arXiv

On the chromatic uniqueness of $K_4$-homeomorphs with girth 7

This paper settles a question left open in an article published in Discrete Mathematics in 2008. We consider graphs homeomorphic to $K_4$, i.e., cliques on 4 vertices with edges replaced by paths. This work completes the study of the chromaticity of $K_4$-homeomorphs of girth 7.
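For readers unfamiliar with the terminology, the standard background definitions can be written out as follows. They are stated here as general graph-theory background rather than taken from the paper: $P(G, \lambda)$ denotes the chromatic polynomial, which counts the proper colorings of $G$ with $\lambda$ colors, and a graph is chromatically unique if its chromatic polynomial determines it up to isomorphism.

```latex
P(K_4, \lambda) = \lambda(\lambda - 1)(\lambda - 2)(\lambda - 3),
\qquad
G \text{ is chromatically unique}
\iff
\bigl( P(H, \lambda) = P(G, \lambda) \implies H \cong G \bigr).
```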