Researcher profile

Priyank Agrawal

Priyank Agrawal contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Learning-Augmented Mechanism Design: Leveraging Predictions for Facility Location

In this work we introduce an alternative model for the design and analysis of strategyproof mechanisms that is motivated by the recent surge of work in "learning-augmented algorithms". Aiming to complement the traditional approach in computer science, which analyzes the performance of algorithms based on worst-case instances, this line of work has focused on the design and analysis of algorithms that are enhanced with machine-learned predictions regarding the optimal solution. The algorithms can use the predictions as a guide to inform their decisions, and the goal is to achieve much stronger performance guarantees when these predictions are accurate (consistency), while also maintaining near-optimal worst-case guarantees, even if these predictions are very inaccurate (robustness). So far, these results have been limited to algorithms, but in this work we argue that another fertile ground for this framework is in mechanism design. We initiate the design and analysis of strategyproof mechanisms that are augmented with predictions regarding the private information of the participating agents. To exhibit the important benefits of this approach, we revisit the canonical problem of facility location with strategic agents in the two-dimensional Euclidean space. We study both the egalitarian and utilitarian social cost functions, and we propose new strategyproof mechanisms that leverage predictions to guarantee an optimal trade-off between consistency and robustness guarantees. This provides the designer with a menu of mechanism options to choose from, depending on her confidence regarding the prediction accuracy. Furthermore, we also prove parameterized approximation results as a function of the prediction error, showing that our mechanisms perform well even when the predictions are not fully accurate.

preprint2020arXiv

Bandits with Temporal Stochastic Constraints

We study the effect of impairment on stochastic multi-armed bandits and develop new ways to mitigate it. Impairment effect is the phenomena where an agent only accrues reward for an action if they have played it at least a few times in the recent past. It is practically motivated by repetition and recency effects in domains such as advertising (here consumer behavior may require repeat actions by advertisers) and vocational training (here actions are complex skills that can only be mastered with repetition to get a payoff). Impairment can be naturally modelled as a temporal constraint on the strategy space, and we provide two novel algorithms that achieve sublinear regret, each working with different assumptions on the impairment effect. We introduce a new notion called bucketing in our algorithm design, and show how it can effectively address impairment as well as a broader class of temporal constraints. Our regret bounds explicitly capture the cost of impairment and show that it scales (sub-)linearly with the degree of impairment. Our work complements recent work on modeling delays and corruptions, and we provide experimental evidence supporting our claims.

preprint2020arXiv

Incentivising Exploration and Recommendations for Contextual Bandits with Payments

We propose a contextual bandit based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare. We also calculate theoretical bounds on the cumulative costs of incentivization to the platform. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.

preprint2020arXiv

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. In advertising and recommendation systems, repetition effect includes a wear-in period, where the user's propensity to reward the platform via a click or purchase depends on how frequently they see the recommendation in the recent past. It also includes a counteracting wear-out period, where the user's propensity to respond positively is dampened if the recommendation was shown too many times recently. Priming effect can be naturally modelled as a temporal constraint on the strategy space, since the reward for the current action depends on historical actions taken by the platform. We provide novel algorithms that achieves sublinear regret in time and the relevant wear-in/wear-out parameters. The effect of priming on the regret upper bound is also additive, and we get back a guarantee that matches popular algorithms such as the UCB1 and Thompson sampling when there is no priming effect. Our work complements recent work on modeling time varying rewards, delays and corruptions in bandits, and extends the usage of rich behavior models in sequential decision making settings.