Source author record

Kannan Ramachandran

Kannan Ramachandran appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computer Science and Game Theory Machine Learning math.OC Multiagent Systems

Catalog footprint

What is connected

2works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

The Square Root Agreement Rule for Incentivizing Truthful Feedback on Online Platforms

A major challenge in obtaining evaluations of products or services on e-commerce platforms is eliciting informative responses in the absence of verifiability. This paper proposes the Square Root Agreement Rule (SRA): a simple reward mechanism that incentivizes truthful responses to objective evaluations on such platforms. In this mechanism, an agent gets a reward for an evaluation only if her answer matches that of her peer, where this reward is inversely proportional to a popularity index of the answer. This index is defined to be the square root of the empirical frequency at which any two agents performing the same evaluation agree on the particular answer across evaluations of similar entities operating on the platform. Rarely agreed-upon answers thus earn a higher reward than answers for which agreements are relatively more common. We show that in the many tasks regime, the truthful equilibrium under SRA is strictly payoff-dominant across large classes of natural equilibria that could arise in these settings, thus increasing the likelihood of its adoption. While there exist other mechanisms achieving such guarantees, they either impose additional assumptions on the response distribution that are not generally satisfied for objective evaluations or they incentivize truthful behavior only if each agent performs a prohibitively large number of evaluations and commits to using the same strategy for each evaluation. SRA is the first known incentive mechanism satisfying such guarantees without imposing any such requirements. Moreover, our empirical findings demonstrate the robustness of the incentive properties of SRA in the presence of mild subjectivity or observational biases in the responses. These properties make SRA uniquely attractive for administering reward-based incentive schemes (e.g., rebates, discounts, reputation scores, etc.) on online platforms.

preprint2020arXiv

Toward the Fundamental Limits of Imitation Learning

Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of $N$ expert trajectories ahead of time, and cannot interact with the MDP. Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy. Here $\mathcal{S}$ is the state space, and $H$ is the length of the episode. Furthermore, we establish a suboptimality lower bound of $\gtrsim |\mathcal{S}| H^2 / N$ which applies even if the expert is constrained to be deterministic, or if the learner is allowed to actively query the expert at visited states while interacting with the MDP for $N$ episodes. To our knowledge, this is the first algorithm with suboptimality having no dependence on the number of actions, under no additional assumptions. We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by $\lesssim \min \{ H \sqrt{|\mathcal{S}| / N} ,\ |\mathcal{S}| H^{3/2} / N \}$, showing that knowledge of transition improves the minimax rate by at least a $\sqrt{H}$ factor.