Source author record

Aleksandrs Slivkins

Aleksandrs Slivkins appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning Computer Science and Game Theory Social and Information Networks physics.soc-ph Artificial Intelligence cs.CY econ.TH Human-Computer Interaction

Catalog footprint

What is connected

25works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Budget Pacing in Repeated Auctions: Regret and Efficiency without Convergence

We study the aggregate welfare and individual regret guarantees of dynamic \emph{pacing algorithms} in the context of repeated auctions with budgets. Such algorithms are commonly used as bidding agents in Internet advertising platforms, adaptively learning to shade bids by a tunable linear multiplier in order to match a specified budget. We show that when agents simultaneously apply a natural form of gradient-based pacing, the liquid welfare obtained over the course of the learning dynamics is at least half the optimal expected liquid welfare obtainable by any allocation rule. Crucially, this result holds \emph{without requiring convergence of the dynamics}, allowing us to circumvent known complexity-theoretic obstacles of finding equilibria. This result is also robust to the correlation structure between agent valuations and holds for any \emph{core auction}, a broad class of auctions that includes first-price, second-price, and generalized second-price auctions as special cases. For individual guarantees, we further show such pacing algorithms enjoy \emph{dynamic regret} bounds for individual utility- and value-maximization, with respect to the sequence of budget-pacing bids, for any auction satisfying a monotone bang-for-buck property. To complement our theoretical findings, we provide semi-synthetic numerical simulations based on auction data from the Bing Advertising platform.

preprint2026arXiv

Exploration and Incentivizing Participation in Randomized Trials

Participation incentives is a well-known issue inhibiting randomized controlled trials (RCTs) in medicine, as well as a potential cause of user dissatisfaction for RCTs in online platforms. We frame this issue as a non-standard exploration-exploitation tradeoff: an RCT would like to explore as uniformly as possible, whereas each "agent" (a patient or a user) prefers "exploitation", i.e., treatments that seem best. We incentivize participation by leveraging information asymmetry between the trial and the agents. We measure statistical performance via worst-case estimation error under adversarially generated outcomes, a standard objective for RCTs. We obtain a near-optimal solution in terms of this objective: an incentive-compatible mechanism with a particular guarantee, and a nearly matching impossibility result for any incentive-compatible mechanism. We consider three model variants: homogeneous agents (of the same "type" comprising beliefs and preferences), heterogeneous agents, and an extension that leverages estimated type frequencies to mitigate the influence of rare-but-difficult agent types.

preprint2022arXiv

Incentivizing Combinatorial Bandit Exploration

Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system. The users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations. While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users. All published work on this problem, known as incentivized exploration, focuses on small, unstructured action sets and mainly targets the case when the users' beliefs are independent across actions. However, realistic exploration problems often feature large, structured action sets and highly correlated beliefs. We focus on a paradigmatic exploration problem with structure: combinatorial semi-bandits. We prove that Thompson Sampling, when applied to combinatorial semi-bandits, is incentive-compatible when initialized with a sufficient number of samples of each arm (where this number is determined in advance by the Bayesian prior). Moreover, we design incentive-compatible algorithms for collecting the initial samples.

preprint2022arXiv

The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work achieves optimal regret rates up to multiplicative factors that become arbitrarily large depending on the Bayesian priors, and scale exponentially in the number of arms. A more basic problem of sampling each arm once runs into similar factors. We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility. We prove that Thompson Sampling, a standard bandit algorithm, is incentive-compatible if initialized with sufficiently many data points. The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected. The problem is largely reduced to that of sample complexity: how many rounds are needed? We address this question, providing matching upper and lower bounds and instantiating them in various corollaries. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the "strength of beliefs".

preprint2022arXiv

Truthful Online Scheduling of Cloud Workloads under Uncertainty

Cloud computing customers often submit repeating jobs and computation pipelines on \emph{approximately} regular schedules, with arrival and running times that exhibit variance. This pattern, typical of training tasks in machine learning, allows customers to partially predict future job requirements. We develop a model of cloud computing platforms that receive statements of work (SoWs) in an online fashion. The SoWs describe future jobs whose arrival times and durations are probabilistic, and whose utility to the submitting agents declines with completion time. The arrival and duration distributions, as well as the utility functions, are considered private customer information and are reported by strategic agents to a scheduler that is optimizing for social welfare. We design pricing, scheduling, and eviction mechanisms that incentivize truthful reporting of SoWs. An important challenge is maintaining incentives despite the possibility of the platform becoming saturated. We introduce a framework to reduce scheduling under uncertainty to a relaxed scheduling problem without uncertainty. Using this framework, we tackle both adversarial and stochastic submissions of statements of work, and obtain logarithmic and constant competitive mechanisms, respectively.

preprint2020arXiv

Beating Greedy For Approximating Reserve Prices in Multi-Unit VCG Auctions

We study the problem of finding personalized reserve prices for unit-demand buyers in multi-unit eager VCG auctions with correlated buyers. The input to this problem is a dataset of submitted bids of $n$ buyers in a set of auctions. The goal is to find a vector of reserve prices, one for each buyer, that maximizes the total revenue across all auctions. Roughgarden and Wang (2016) showed that this problem is APX-hard but admits a greedy $\frac{1}{2}$-approximation algorithm. Later, Derakhshan, Golrezai, and Paes Leme (2019) gave an LP-based algorithm achieving a $0.68$-approximation for the (important) special case of the problem with a single-item, thereby beating greedy. We show in this paper that the algorithm of Derakhshan et al. in fact does not beat greedy for the general multi-item problem. This raises the question of whether or not the general problem admits a better-than-$\frac{1}{2}$ approximation. In this paper, we answer this question in the affirmative and provide a polynomial-time algorithm with a significantly better approximation-factor of $0.63$. Our solution is based on a novel linear programming formulation, for which we propose two different rounding schemes. We prove that the best of these two and the no-reserve case (all-zero vector) is a $0.63$-approximation.

preprint2020arXiv

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent "zooming" behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

preprint2016arXiv

How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels

Crowdsourcing has been part of the IR toolbox as a cheap and fast mechanism to obtain labels for system development and evaluation. Successful deployment of crowdsourcing at scale involves adjusting many variables, a very important one being the number of workers needed per human intelligence task (HIT). We consider the crowdsourcing task of learning the answer to simple multiple-choice HITs, which are representative of many relevance experiments. In order to provide statistically significant results, one often needs to ask multiple workers to answer the same HIT. A stopping rule is an algorithm that, given a HIT, decides for any given set of worker answers if the system should stop and output an answer or iterate and ask one more worker. Knowing the historic performance of a worker in the form of a quality score can be beneficial in such a scenario. In this paper we investigate how to devise better stopping rules given such quality scores. We also suggest adaptive exploration as a promising approach for scalable and automatic creation of ground truth. We conduct a data analysis on an industrial crowdsourcing platform, and use the observations from this analysis to design new stopping rules that use the workers' quality scores in a non-trivial manner. We then perform a simulation based on a real-world workload, showing that our algorithm performs better than the more naive approaches.

preprint2015arXiv

Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems

Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of the effort level which is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each "arm" representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature). Our results advance the state of art on several different topics: the theory of crowdsourcing markets, principal-agent problems, multi-armed bandits, and dynamic pricing.

preprint2015arXiv

Contextual Dueling Bandits

We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner's goal is to find the best policy, or way of behaving, in some space of policies, although "best" is not always so clearly defined. Here, we propose a new and natural solution concept, rooted in game theory, called a von Neumann winner, a randomized policy that beats or ties every other policy. We show that this notion overcomes important limitations of existing solutions, particularly the Condorcet winner which has typically been used in the past, but which requires strong and often unrealistic assumptions. We then present three efficient algorithms for online learning in our setting, and for approximating a von Neumann winner from batch-like data. The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space. The other two algorithms require time and space only logarithmic in the size of the policy space when provided access to an oracle for solving classification problems on the space.

preprint2015arXiv

Incentivizing High Quality Crowdwork

We study the causal effects of financial incentives on the quality of crowdwork. We focus on performance-based payments (PBPs), bonus payments awarded to workers for producing high quality work. We design and run randomized behavioral experiments on the popular crowdsourcing platform Amazon Mechanical Turk with the goal of understanding when, where, and why PBPs help, identifying properties of the payment, payment structure, and the task itself that make them most effective. We provide examples of tasks for which PBPs do improve quality. For such tasks, the effectiveness of PBPs is not too sensitive to the threshold for quality required to receive the bonus, while the magnitude of the bonus must be large enough to make the reward salient. We also present examples of tasks for which PBPs do not improve quality. Our results suggest that for PBPs to improve quality, the task must be effort-responsive: the task must allow workers to produce higher quality work by exerting more effort. We also give a simple method to determine if a task is effort-responsive a priori. Furthermore, our experiments suggest that all payments on Mechanical Turk are, to some degree, implicitly performance-based in that workers believe their work may be rejected if their performance is sufficiently poor. Finally, we propose a new model of worker behavior that extends the standard principal-agent model from economics to include a worker's subjective beliefs about his likelihood of being paid, and show that the predictions of this model are in line with our experimental findings. This model may be useful as a foundation for theoretical studies of incentives in crowdsourcing markets.

preprint2015arXiv

Resourceful Contextual Bandits

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that handles constrained resources other than time, and improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, e.g. Dudik et al. (UAI'11)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (FOCS'13)), and prove a regret guarantee with near-optimal statistical properties.

preprint2015arXiv

Selection and Influence in Cultural Dynamics

One of the fundamental principles driving diversity or homogeneity in domains such as cultural differentiation, political affiliation, and product adoption is the tension between two forces: influence (the tendency of people to become similar to others they interact with) and selection (the tendency to be affected most by the behavior of others who are already similar). Influence tends to promote homogeneity within a society, while selection frequently causes fragmentation. When both forces act simultaneously, it becomes an interesting question to analyze which societal outcomes should be expected. To study this issue more formally, we analyze a natural stylized model built upon active lines of work in political opinion formation, cultural diversity, and language evolution. We assume that the population is partitioned into "types" according to some traits (such as language spoken or political affiliation). While all types of people interact with one another, only people with sufficiently similar types can possibly influence one another. The "similarity" is captured by a graph on types in which individuals of the same or adjacent types can influence one another. We achieve an essentially complete characterization of (stable) equilibrium outcomes and prove convergence from all starting states. We also consider generalizations of this model.

preprint2015arXiv

Truthful Mechanisms with Implicit Payment Computation

It is widely believed that computing payments needed to induce truthful bidding is somehow harder than simply computing the allocation. We show that the opposite is true: creating a randomized truthful mechanism is essentially as easy as a single call to a monotone allocation rule. Our main result is a general procedure to take a monotone allocation rule for a single-parameter domain and transform it (via a black-box reduction) into a randomized mechanism that is truthful in expectation and individually rational for every realization. The mechanism implements the same outcome as the original allocation rule with probability arbitrarily close to 1, and requires evaluating that allocation rule only once. We also provide an extension of this result to multi-parameter domains and cycle-monotone allocation rules, under mild star-convexity and non-negativity hypotheses on the type space and allocation rule, respectively. Because our reduction is simple, versatile, and general, it has many applications to mechanism design problems in which re-evaluating the allocation rule is either burdensome or informationally impossible. Applying our result to the multi-armed bandit problem, we obtain truthful randomized mechanisms whose regret matches the information-theoretic lower bound up to logarithmic factors, even though prior work showed this is impossible for truthful deterministic mechanisms. We also present applications to offline mechanism design, showing that randomization can circumvent a communication complexity lower bound for deterministic payments computation, and that it can also be used to create truthful shortest path auctions that approximate the welfare of the VCG allocation arbitrarily well, while having the same running time complexity as Dijkstra's algorithm.

preprint2014arXiv

Contextual Bandits with Similarity Information

In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of "contextual bandits", a natural extension of the basic MAB problem where before each round an algorithm is given the "context" -- a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a "similarity distance" between the context-arm pairs which gives an upper bound on the difference between the respective expected payoffs. Prior work on contextual bandits with similarity uses "uniform" partitions of the similarity space, which is potentially wasteful. We design more efficient algorithms that are based on adaptive partitions adjusted to "popular" context and "high-payoff" arms.

preprint2014arXiv

Low-distortion Inference of Latent Similarities from a Multiplex Social Network

Much of social network analysis is - implicitly or explicitly - predicated on the assumption that individuals tend to be more similar to their friends than to strangers. Thus, an observed social network provides a noisy signal about the latent underlying "social space:" the way in which individuals are similar or dissimilar. Many research questions frequently addressed via social network analysis are in reality questions about this social space, raising the question of inverting the process: Given a social network, how accurately can we reconstruct the social structure of similarities and dissimilarities? We begin to address this problem formally. Observed social networks are usually multiplex, in the sense that they reflect (dis)similarities in several different "categories," such as geographical proximity, kinship, or similarity of professions/hobbies. We assume that each such category is characterized by a latent metric capturing (dis)similarities in this category. Each category gives rise to a separate social network: a random graph parameterized by this metric. For a concrete model, we consider Kleinberg's small world model and some variations thereof. The observed social network is the unlabeled union of these graphs, i.e., the presence or absence of edges can be observed, but not their origins. Our main result is an algorithm which reconstructs each metric with provably low distortion.

preprint2013arXiv

Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem

Very recently crowdsourcing has become the de facto platform for distributing and collecting human computation for a wide range of tasks and applications such as information retrieval, natural language processing and machine learning. Current crowdsourcing platforms have some limitations in the area of quality control. Most of the effort to ensure good quality has to be done by the experimenter who has to manage the number of workers needed to reach good results. We propose a simple model for adaptive quality control in crowdsourced multiple-choice tasks which we call the \emph{bandit survey problem}. This model is related to, but technically different from the well-known multi-armed bandit problem. We present several algorithms for this problem, and support them with analysis and simulations. Our approach is based in our experience conducting relevance evaluation for a large commercial search engine.

preprint2013arXiv

Characterizing Truthful Multi-Armed Bandit Mechanisms

We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer's goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the "multi-armed bandit problem", and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by "regret", the difference in social welfare between the algorithm and the benchmark which always selects the same "best" advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties -- essentially, they must separate exploration from exploitation -- and they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.

preprint2013arXiv

Dynamic Ad Allocation: Bandits with Budgets

We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities). We focus on an important practical issue that advertisers are constrained in how much money they can spend on their ad campaigns. This issue has not been considered in the prior work on bandit-based approaches for ad allocation, to the best of our knowledge. We define a simple, stylized model where an algorithm picks one ad to display in each round, and each ad has a \emph{budget}: the maximal amount of money that can be spent on this ad. This model admits a natural variant of UCB1, a well-known algorithm for multi-armed bandits with stochastic rewards. We derive strong provable guarantees for this algorithm.

preprint2013arXiv

Dynamic Pricing with Limited Supply

We consider the problem of dynamic pricing with limited supply. A seller has $k$ identical items for sale and is facing $n$ potential buyers ("agents") that are arriving sequentially. Each agent is interested in buying one item. Each agent's value for an item is an IID sample from some fixed distribution with support $[0,1]$. The seller offers a take-it-or-leave-it price to each arriving agent (possibly different for different agents), and aims to maximize his expected revenue. We focus on "prior-independent" mechanisms -- ones that do not use any information about the distribution. They are desirable because knowing the distribution is unrealistic in many practical scenarios. We study how the revenue of such mechanisms compares to the revenue of the optimal offline mechanism that knows the distribution ("offline benchmark"). We present a prior-independent dynamic pricing mechanism whose revenue is at most $O((k \log n)^{2/3})$ less than the offline benchmark, for every distribution that is regular. In fact, this guarantee holds without *any* assumptions if the benchmark is relaxed to fixed-price mechanisms. Further, we prove a matching lower bound. The performance guarantee for the same mechanism can be improved to $O(\sqrt{k} \log n)$, with a distribution-dependent constant, if $k/n$ is sufficiently small. We show that, in the worst case over all demand distributions, this is essentially the best rate that can be obtained with a distribution-specific constant. On a technical level, we exploit the connection to multi-armed bandits (MAB). While dynamic pricing with unlimited supply can easily be seen as an MAB problem, the intuition behind MAB approaches breaks when applied to the setting with limited supply. Our high-level conceptual contribution is that even the limited supply setting can be fruitfully treated as a bandit problem.

preprint2013arXiv

Multi-parameter Mechanisms with Implicit Payment Computation

In this paper we show that payment computation essentially does not present any obstacle in designing truthful mechanisms, even for multi-parameter domains, and even when we can only call the allocation rule once. We present a general reduction that takes any allocation rule which satisfies "cyclic monotonicity" (a known necessary and sufficient condition for truthfulness) and converts it to a truthful mechanism using a single call to the allocation rule, with arbitrarily small loss to the expected social welfare. A prominent example for a multi-parameter setting in which an allocation rule can only be called once arises in sponsored search auctions. These are multi-parameter domains when each advertiser has multiple possible ads he may display, each with a different value per click. Moreover, the mechanism typically does not have complete knowledge of the click-realization or the click-through rates (CTRs); it can only call the allocation rule a single time and observe the click information for ads that were presented. % are not known. On the negative side, we show that an allocation that is truthful for any realization essentially cannot depend on the bids, and hence cannot do better than random selection for one agent. We then consider a relaxed requirement of truthfulness, only in expectation over the CTRs. Even for that relaxed version, making any progress is challenging as standard techniques for construction of truthful mechanisms (as using VCG or an MIDR allocation rule) cannot be used in this setting. We design an allocation rule with non-trivial performance and directly prove it is cyclic-monotone, and thus it can be used to create a truthful mechanism using our general reduction.

preprint2013arXiv

Online Decision Making in Crowdsourcing Markets: Theoretical Challenges (Position Paper)

Over the past decade, crowdsourcing has emerged as a cheap and efficient method of obtaining solutions to simple tasks that are difficult for computers to solve but possible for humans. The popularity and promise of crowdsourcing markets has led to both empirical and theoretical research on the design of algorithms to optimize various aspects of these markets, such as the pricing and assignment of tasks. Much of the existing theoretical work on crowdsourcing markets has focused on problems that fall into the broad category of online decision making; task requesters or the crowdsourcing platform itself make repeated decisions about prices to set, workers to filter out, problems to assign to specific workers, or other things. Often these decisions are complex, requiring algorithms that learn about the distribution of available tasks or workers over time and take into account the strategic (or sometimes irrational) behavior of workers. As human computation grows into its own field, the time is ripe to address these challenges in a principled way. However, it appears very difficult to capture all pertinent aspects of crowdsourcing markets in a single coherent model. In this paper, we reflect on the modeling issues that inhibit theoretical research on online decision making for crowdsourcing, and identify some steps forward. This paper grew out of the authors' own frustration with these issues, and we hope it will encourage the community to attempt to understand, debate, and ultimately address them. The authors welcome feedback for future revisions of this paper.

preprint2012arXiv

Ranked bandits in metric spaces: learning optimally diverse rankings over large document collections

Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with several scalable algorithms that explicitly takes document similarity and ranking context into account. Our formulation is a non-trivial common generalization of two multi-armed bandit models from the literature: "ranked bandits" (Radlinski et al., ICML 2008) and "Lipschitz bandits" (Kleinberg et al., STOC 2008). We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than previous approaches.

preprint2012arXiv

The best of both worlds: stochastic and adversarial bandits

We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the square-root worst-case regret of Exp3 (Auer et al., SIAM J. on Computing 2002) and the (poly)logarithmic regret of UCB1 (Auer et al., Machine Learning 2002) for stochastic rewards. Adversarial rewards and stochastic rewards are the two main settings in the literature on (non-Bayesian) multi-armed bandits. Prior work on multi-armed bandits treats them separately, and does not attempt to jointly optimize for both. Our result falls into a general theme of achieving good worst-case performance while also taking advantage of "nice" problem instances, an important issue in the design of algorithms with partially known inputs.

preprint2010arXiv

Adapting to the Shifting Intent of Search Queries

Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query `independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc account for 50% of all search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent has happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases.

Aleksandrs Slivkins

What is connected

Connect this record

See the researcher in context

Building this map preview

25 published item(s)

Budget Pacing in Repeated Auctions: Regret and Efficiency without Convergence

Exploration and Incentivizing Participation in Randomized Trials

Incentivizing Combinatorial Bandit Exploration

The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

Truthful Online Scheduling of Cloud Workloads under Uncertainty

Beating Greedy For Approximating Reserve Prices in Multi-Unit VCG Auctions

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels

Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems

Contextual Dueling Bandits

Incentivizing High Quality Crowdwork

Resourceful Contextual Bandits

Selection and Influence in Cultural Dynamics

Truthful Mechanisms with Implicit Payment Computation

Contextual Bandits with Similarity Information

Low-distortion Inference of Latent Similarities from a Multiplex Social Network

Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem

Characterizing Truthful Multi-Armed Bandit Mechanisms

Dynamic Ad Allocation: Bandits with Budgets

Dynamic Pricing with Limited Supply

Multi-parameter Mechanisms with Implicit Payment Computation

Online Decision Making in Crowdsourcing Markets: Theoretical Challenges (Position Paper)

Ranked bandits in metric spaces: learning optimally diverse rankings over large document collections

The best of both worlds: stochastic and adversarial bandits

Adapting to the Shifting Intent of Search Queries