Researcher profile

Kannan Ramachandran

Kannan Ramachandran contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified | Verification L1 | Unclaimed author
2 works
0 followers
5 topics
4 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

The Square Root Agreement Rule for Incentivizing Truthful Feedback on Online Platforms

A major challenge in obtaining evaluations of products or services on e-commerce platforms is eliciting informative responses in the absence of verifiability. This paper proposes the Square Root Agreement Rule (SRA): a simple reward mechanism that incentivizes truthful responses to objective evaluations on such platforms. In this mechanism, an agent gets a reward for an evaluation only if her answer matches that of her peer, where this reward is inversely proportional to a popularity index of the answer. This index is defined to be the square root of the empirical frequency at which any two agents performing the same evaluation agree on the particular answer across evaluations of similar entities operating on the platform. Rarely agreed-upon answers thus earn a higher reward than answers for which agreements are relatively more common. We show that in the many tasks regime, the truthful equilibrium under SRA is strictly payoff-dominant across large classes of natural equilibria that could arise in these settings, thus increasing the likelihood of its adoption. While there exist other mechanisms achieving such guarantees, they either impose additional assumptions on the response distribution that are not generally satisfied for objective evaluations or they incentivize truthful behavior only if each agent performs a prohibitively large number of evaluations and commits to using the same strategy for each evaluation. SRA is the first known incentive mechanism satisfying such guarantees without imposing any such requirements. Moreover, our empirical findings demonstrate the robustness of the incentive properties of SRA in the presence of mild subjectivity or observational biases in the responses. These properties make SRA uniquely attractive for administering reward-based incentive schemes (e.g., rebates, discounts, reputation scores, etc.) on online platforms.
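The reward rule described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `scale` parameter, the way agreement frequencies are estimated from a flat list of answer pairs, and the handling of a zero frequency are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def agreement_frequencies(paired_answers):
    """Empirical frequency with which two agents agree on each answer.

    `paired_answers` is a hypothetical list of (answer, peer_answer) tuples,
    one per evaluation of a similar entity on the platform.
    """
    n = len(paired_answers)
    agreed = Counter(a for a, b in paired_answers if a == b)
    return {answer: count / n for answer, count in agreed.items()}


def sra_reward(answer, peer_answer, freq, scale=1.0):
    """Reward under a square-root-agreement style rule (illustrative only).

    The agent is paid only if her answer matches her peer's, and the payment
    is inversely proportional to the popularity index sqrt(freq[answer]).
    """
    if answer != peer_answer:
        return 0.0
    f = freq.get(answer, 0.0)
    if f == 0.0:
        # Assumption: no recorded agreements on this answer, so pay nothing.
        return 0.0
    return scale / sqrt(f)


# Toy data: 16 evaluations, 9 agree on "good", 1 agrees on "bad", 6 disagree.
pairs = [("good", "good")] * 9 + [("bad", "bad")] + [("good", "bad")] * 6
freq = agreement_frequencies(pairs)

print(sra_reward("good", "good", freq))  # popularity index 0.75, reward about 1.33
print(sra_reward("bad", "bad", freq))    # popularity index 0.25, reward 4.0
print(sra_reward("good", "bad", freq))   # no match, reward 0.0
```

In this toy data, agreeing on the rarely agreed-upon answer "bad" pays three times as much as agreeing on "good", which matches the abstract's statement that rarely agreed-upon answers earn a higher reward.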

Preprint, 2020, arXiv

Toward the Fundamental Limits of Imitation Learning

Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of $N$ expert trajectories ahead of time, and cannot interact with the MDP. Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy. Here $\mathcal{S}$ is the state space, and $H$ is the length of the episode. Furthermore, we establish a suboptimality lower bound of $\gtrsim |\mathcal{S}| H^2 / N$ which applies even if the expert is constrained to be deterministic, or if the learner is allowed to actively query the expert at visited states while interacting with the MDP for $N$ episodes. To our knowledge, this is the first algorithm with suboptimality having no dependence on the number of actions, under no additional assumptions. We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by $\lesssim \min \{ H \sqrt{|\mathcal{S}| / N} ,\ |\mathcal{S}| H^{3/2} / N \}$, showing that knowledge of transition improves the minimax rate by at least a $\sqrt{H}$ factor.
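The policy that "mimics the expert whenever possible" in the no-interaction setting can be sketched in tabular form as follows. This is a hedged illustration, not the paper's algorithm as stated: the trajectory format (a list of `(state, action)` pairs indexed by time step), the function names, and the uniform-random fallback at unseen states are assumptions.

```python
import random
from collections import Counter, defaultdict


def fit_mimic_policy(trajectories):
    """Count expert actions at each (time step, state) pair in the dataset.

    `trajectories` is a hypothetical list of episodes, each a list of
    (state, action) pairs of length H.
    """
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for h, (state, action) in enumerate(trajectory):
            counts[(h, state)][action] += 1
    return counts


def act(counts, h, state, actions, rng=random):
    """Sample from the expert's empirical action distribution where available.

    At (time step, state) pairs never visited in the dataset there is nothing
    to mimic, so this sketch falls back to a uniformly random action
    (an assumption; any fixed choice would do for illustration).
    """
    seen = counts.get((h, state))
    if not seen:
        return rng.choice(actions)
    observed_actions, weights = zip(*seen.items())
    return rng.choices(observed_actions, weights=weights, k=1)[0]


# Toy dataset: N = 2 expert trajectories with horizon H = 2.
expert_data = [
    [("s0", "left"), ("s1", "right")],
    [("s0", "left"), ("s2", "right")],
]
policy_counts = fit_mimic_policy(expert_data)

print(act(policy_counts, 0, "s0", ["left", "right"]))  # always "left": the only action seen
print(act(policy_counts, 1, "s3", ["left", "right"]))  # unseen state: random fallback
```

Sampling from the empirical distribution, rather than taking the most frequent action, keeps the sketch meaningful when the expert follows a stochastic policy, the case the abstract's upper bound covers.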