Source author record

Ilai Bistritz

Ilai Bistritz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Science and Game Theory; econ.GN; Machine Learning; Multiagent Systems; q-fin.EC

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators


Preprint · 2022 · arXiv

Do Informational Cascades Happen with Non-myopic Agents?

We consider an environment where players need to decide whether to buy a certain product (or adopt a technology) or not. The product is either good or bad, but its true value is unknown to the players. Instead, each player has her own private information on its quality. Each player can observe the previous actions of other players and estimate the quality of the product. A classic result in the literature shows that, in similar settings, informational cascades occur, in which learning stops for the whole network and players repeat the actions of their predecessors. In contrast to this literature, in this work players get more than one opportunity to act. In each turn, a player is chosen uniformly at random from all players and can decide to buy the product and leave the market or wait. Her utility is the total expected discounted reward, and thus myopic strategies may not constitute equilibria. We provide a characterization of perfect Bayesian equilibria (PBE) with forward-looking strategies through a fixed-point equation whose dimensionality grows only quadratically with the number of players. Using this tractable fixed-point equation, we show the existence of a PBE and characterize PBE with threshold strategies. Based on this characterization, we study informational cascades in two regimes. First, we show that for a discount factor $\delta$ strictly smaller than one, informational cascades happen with high probability as the number of players $N$ increases. Furthermore, only a small portion of the total information in the system is revealed before a cascade occurs ...
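The "classic result" this abstract contrasts with can be illustrated by simulating the textbook myopic model, in which each player acts exactly once, in order. This is a minimal sketch under stated assumptions, not the paper's non-myopic PBE model: signals are binary and match the true state with probability `p > 1/2`, indifferent players follow their own signal, and the function name `simulate_cascade` is hypothetical. Under these assumptions, actions reveal signals until the publicly inferred signal balance `d` reaches ±2, after which every later player ignores her own signal.

```python
import random


def simulate_cascade(n_players, p, good=True, rng=None):
    """Simulate the classic myopic sequential-adoption model (assumed setup).

    Returns (actions, cascade_start): actions[i] is 1 (buy) or 0 (don't buy),
    and cascade_start is the index of the first player whose action no longer
    depends on her private signal, or None if no cascade occurs.
    """
    rng = rng or random.Random()
    d = 0  # (#good signals - #bad signals) inferred from past actions
    actions = []
    cascade_start = None
    for i in range(n_players):
        # Private signal: agrees with the true state with probability p.
        correct = rng.random() < p
        signal = 1 if correct == good else -1
        if abs(d) >= 2:
            # Cascade: public evidence outweighs any single private signal,
            # so the action reveals nothing and d stops changing.
            if cascade_start is None:
                cascade_start = i
            action = 1 if d > 0 else 0
        else:
            # Posterior log-odds are proportional to d + signal;
            # ties are broken in favor of the player's own signal.
            total = d + signal
            action = 1 if total > 0 or (total == 0 and signal == 1) else 0
            d += signal  # observers can infer the signal from the action
        actions.append(action)
    return actions, cascade_start
```

For example, `simulate_cascade(20, 0.7, rng=random.Random(3))` typically shows a cascade starting within the first few players. In the paper's setting, players may instead wait and act later, which is exactly what this myopic sketch leaves out.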

Preprint · 2022 · arXiv

No Weighted-Regret Learning in Adversarial Bandits with Delays

Consider a scenario where a player chooses an action in each round $t$ out of $T$ rounds and observes the incurred cost after a delay of $d_{t}$ rounds. The cost functions and the delay sequence are chosen by an adversary. We show that in a non-cooperative game, the expected weighted ergodic distribution of play converges to the set of coarse correlated equilibria (CCE) if players use algorithms that have "no weighted-regret" in the above scenario, even if they have linear regret due to excessively large delays. For a two-player zero-sum game, we show that no weighted-regret is sufficient for the weighted ergodic average of play to converge to the set of Nash equilibria. We prove that the FKM algorithm with $n$ dimensions achieves an expected regret of $O\left(nT^{\frac{3}{4}}+\sqrt{n}T^{\frac{1}{3}}D^{\frac{1}{3}}\right)$ and the EXP3 algorithm with $K$ arms achieves an expected regret of $O\left(\sqrt{\log K\left(KT+D\right)}\right)$ even when $D=\sum_{t=1}^{T}d_{t}$ and $T$ are unknown. These bounds use a novel doubling trick that, under mild assumptions, provably retains the regret bound of the setting where $D$ and $T$ are known. Using these bounds, we show that FKM and EXP3 have no weighted-regret even for $d_{t}=O\left(t\log t\right)$. Therefore, algorithms with no weighted-regret can be used to approximate a CCE of a finite or convex unknown game that can only be simulated with bandit feedback, even if the simulation involves significant delays.
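To make the delayed-feedback setting concrete, here is a minimal sketch of plain EXP3 in which the importance-weighted cost estimate for round $t$ is applied only when it arrives, at the end of round $t+d_t$. It is an illustration under assumptions, not the paper's algorithm: it uses a fixed, caller-supplied learning rate `eta` (so $T$ and $D$ are implicitly treated as known), it omits the doubling trick and the weighted-regret weighting, costs are assumed to lie in $[0,1]$, and the function name `delayed_exp3` is hypothetical. A key detail is that each estimate divides by the probability the arm had *when it was played*, not by the probability at arrival time.

```python
import math
import random
from collections import defaultdict


def delayed_exp3(costs, delays, eta, rng=None):
    """EXP3 with delayed bandit feedback (illustrative sketch, costs in [0, 1]).

    costs[t][k]: cost of arm k at round t (only the played arm's cost is used).
    delays[t]:   the cost of round t is observed at the end of round t + delays[t].
    eta:         fixed learning rate (assumed known in advance).
    Returns the list of arms played.
    """
    rng = rng or random.Random()
    T, K = len(costs), len(costs[0])
    log_w = [0.0] * K            # log-weights, kept in log space for stability
    pending = defaultdict(list)  # arrival round -> [(arm, prob_when_played, cost)]
    played = []
    for t in range(T):
        # Exponential-weights distribution over arms.
        m = max(log_w)
        w = [math.exp(x - m) for x in log_w]
        s = sum(w)
        probs = [x / s for x in w]
        arm = rng.choices(range(K), weights=probs)[0]
        played.append(arm)
        # Bandit feedback: only the played arm's cost, revealed after the delay.
        pending[t + delays[t]].append((arm, probs[arm], costs[t][arm]))
        # Apply every importance-weighted estimate that arrives this round.
        for a, p, c in pending.pop(t, []):
            log_w[a] -= eta * c / p
    return played
```

Feedback scheduled to arrive after round $T$ is simply never used, which matches the intuition that very large delays waste information; handling unknown $T$ and $D$ is what the abstract's doubling trick addresses and is not attempted here.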

Preprint · 2020 · arXiv

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

Consider $N$ cooperative but non-communicating players, each of whom plays one of $M$ arms in each of $T$ turns. Players have different utilities for each arm, representable as an $N\times M$ matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other player selected the same arm in that turn, all colliding players receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a $\log\log T$ factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.
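The collision rule and the max-min fairness objective described above can be made concrete for a known utility matrix. The sketch below is only the offline target that a learning algorithm would try to reach, not the distributed algorithm itself: it assumes noiseless utilities and $N \le M$, brute-forces all assignments of players to distinct arms, and uses the hypothetical function names `collision_rewards` and `max_min_matching`. Its cost grows factorially, so it is meant only for tiny illustrative instances.

```python
from collections import Counter
from itertools import permutations


def collision_rewards(choices, utilities):
    """One turn's rewards: a player earns zero if any other player chose her arm."""
    counts = Counter(choices)
    return [utilities[n][arm] if counts[arm] == 1 else 0.0
            for n, arm in enumerate(choices)]


def max_min_matching(utilities):
    """Brute-force the assignment of players to distinct arms that maximizes
    the minimum utility across players (assumes N <= M; exponential time)."""
    N, M = len(utilities), len(utilities[0])
    best, best_val = None, float("-inf")
    for arms in permutations(range(M), N):
        val = min(utilities[n][a] for n, a in enumerate(arms))
        if val > best_val:
            best, best_val = arms, val
    return best, best_val
```

For the utility matrix `[[0.9, 0.5, 0.1], [0.8, 0.2, 0.6]]`, the max-min fair matching assigns player 0 to arm 0 and player 1 to arm 2, guaranteeing every player at least 0.6, while two players choosing the same arm both receive zero.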