Source author record

Gal Bahar

Gal Bahar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

Computer Science and Game Theory; Machine Learning; Artificial Intelligence; cs.CY; Information Retrieval

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

Preprint, 2020, arXiv

Fiduciary Bandits

Recommendation systems often face exploration-exploitation tradeoffs: the system can only learn about the desirability of new options by recommending them to some user. Such systems can thus be modeled as multi-armed bandit settings; however, users are self-interested and cannot be made to follow recommendations. We ask whether exploration can nevertheless be performed in a way that scrupulously respects agents' interests, i.e., by a system that acts as a fiduciary. More formally, we introduce a model in which a recommendation system faces an exploration-exploitation tradeoff under the constraint that it can never recommend any action that it knows yields lower reward in expectation than an agent would achieve if it acted alone. Our main contribution is a positive result: an asymptotically optimal, incentive-compatible, and ex-ante individually rational recommendation algorithm.

Preprint, 2020, arXiv

Learning under Invariable Bayesian Safety

A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise where, for example, exploration is carried out by individuals whose welfare should be balanced with overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this line of literature by introducing a safety constraint that must be respected in every round and requires that the expected value in each round be above a given threshold. Under our modeling, the safe explore-and-exploit policy requires careful planning; otherwise, it leads to sub-optimal welfare. We devise an asymptotically optimal algorithm for the setting and analyze its instance-dependent convergence rate.
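The per-round constraint described in this abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function names `round_is_safe` and `max_exploration_prob`, the two-arm mixing setup, and the reading of "above a given threshold" as "greater than or equal to" are all assumptions made for illustration.

```python
def round_is_safe(probs, expected_rewards, threshold):
    """Return True if a randomized recommendation meets the threshold in expectation.

    Hypothetical helper: `probs` is the probability of recommending each arm in
    this round, `expected_rewards` the current expected reward of each arm.
    """
    if len(probs) != len(expected_rewards):
        raise ValueError("probs and expected_rewards must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    value = sum(p * m for p, m in zip(probs, expected_rewards))
    # Assumption: the constraint is read as value >= threshold.
    return value >= threshold


def max_exploration_prob(mu_safe, mu_explore, threshold):
    """Largest p such that p * mu_explore + (1 - p) * mu_safe >= threshold.

    Hypothetical helper: mixes one arm known to be safe with one arm
    the planner would like to explore.
    """
    if mu_explore >= threshold:
        return 1.0  # exploring alone already satisfies the constraint
    if mu_safe < threshold:
        raise ValueError("no mixture of these two arms meets the threshold")
    # Here mu_safe >= threshold > mu_explore, so the denominator is positive.
    return (mu_safe - threshold) / (mu_safe - mu_explore)
```

For example, with a safe arm worth 0.6 in expectation, an exploratory arm worth 0.2, and a threshold of 0.5, the exploratory arm can be recommended with probability at most about 0.25 in that round.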

Preprint, 2020, arXiv

Multi-Issue Social Learning

We consider social learning where agents can only observe part of the population (modeled as neighbors on an undirected graph), face many decision problems, and the arrival order of the agents is unknown. The central question we pose is whether there is a natural observability graph that prevents the information cascade phenomenon. We introduce the 'celebrities graph' and prove that it indeed allows for proper information aggregation in large populations, even when the order in which agents decide is random and even when different issues are decided in different orders.

Preprint, 2015, arXiv

Economic Recommendation Systems

In the online Explore and Exploit literature, central to Machine Learning, a central planner is faced with a set of alternatives, each yielding some unknown reward. The planner's goal is to learn the optimal alternative as soon as possible, via experimentation. A typical assumption in this model is that the planner has full control over the experiment design and implementation. When experiments are implemented by a society of self-motivated agents, the planner can only recommend experimentation but has no power to enforce it. Kremer et al. (JPE, 2014) introduce the first study of explore and exploit schemes that account for agents' incentives. In their model it is implicitly assumed that agents neither see nor communicate with each other. Their main result is a characterization of an optimal explore and exploit scheme. In this work we extend Kremer et al. (JPE, 2014) by adding a layer of a social network according to which agents can observe each other. It turns out that when observability is factored in, the scheme proposed by Kremer et al. (JPE, 2014) is no longer incentive compatible. In our main result we provide a tight bound on how many other agents each agent can observe while still admitting an incentive-compatible algorithm and an asymptotically optimal outcome. More technically, for a setting with N agents where the number of nodes with degree greater than N^alpha is bounded by N^beta and 2*alpha+beta < 1, we construct an incentive-compatible, asymptotically optimal mechanism. The bound 2*alpha+beta < 1 is shown to be tight.
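The degree condition stated in this abstract can be checked mechanically for a given observability graph. The following is a minimal sketch, not the authors' construction: the function name `satisfies_degree_condition`, the degree-list representation of the graph, and the reading of "bounded by N^beta" as "at most N^beta" are assumptions made for illustration.

```python
def satisfies_degree_condition(degrees, alpha, beta):
    """Check the abstract's degree condition on a degree sequence.

    Hypothetical helper: `degrees[i]` is the number of other agents that
    agent i can observe; N is taken to be len(degrees).
    """
    n = len(degrees)
    if n == 0:
        raise ValueError("need at least one agent")
    # Count agents whose degree exceeds N^alpha.
    high_degree = sum(1 for d in degrees if d > n ** alpha)
    # Assumption: "bounded by N^beta" is read as high_degree <= N^beta.
    return high_degree <= n ** beta and 2 * alpha + beta < 1
```

For instance, a population of 100 agents in which a single agent observes 50 others and the rest observe one each passes the check for alpha = 0.25 and beta = 0.4, while the same graph fails for alpha = 0.4 and beta = 0.3 because 2*alpha+beta exceeds 1.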