Researcher profile

Yingkai Li

Yingkai Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

Budget Pacing in Repeated Auctions: Regret and Efficiency without Convergence

We study the aggregate welfare and individual regret guarantees of dynamic \emph{pacing algorithms} in the context of repeated auctions with budgets. Such algorithms are commonly used as bidding agents in Internet advertising platforms, adaptively learning to shade bids by a tunable linear multiplier in order to match a specified budget. We show that when agents simultaneously apply a natural form of gradient-based pacing, the liquid welfare obtained over the course of the learning dynamics is at least half the optimal expected liquid welfare obtainable by any allocation rule. Crucially, this result holds \emph{without requiring convergence of the dynamics}, allowing us to circumvent known complexity-theoretic obstacles of finding equilibria. This result is also robust to the correlation structure between agent valuations and holds for any \emph{core auction}, a broad class of auctions that includes first-price, second-price, and generalized second-price auctions as special cases. For individual guarantees, we further show such pacing algorithms enjoy \emph{dynamic regret} bounds for individual utility- and value-maximization, with respect to the sequence of budget-pacing bids, for any auction satisfying a monotone bang-for-buck property. To complement our theoretical findings, we provide semi-synthetic numerical simulations based on auction data from the Bing Advertising platform.

preprint2026arXiv

Exploration and Incentivizing Participation in Randomized Trials

Participation incentives is a well-known issue inhibiting randomized controlled trials (RCTs) in medicine, as well as a potential cause of user dissatisfaction for RCTs in online platforms. We frame this issue as a non-standard exploration-exploitation tradeoff: an RCT would like to explore as uniformly as possible, whereas each "agent" (a patient or a user) prefers "exploitation", i.e., treatments that seem best. We incentivize participation by leveraging information asymmetry between the trial and the agents. We measure statistical performance via worst-case estimation error under adversarially generated outcomes, a standard objective for RCTs. We obtain a near-optimal solution in terms of this objective: an incentive-compatible mechanism with a particular guarantee, and a nearly matching impossibility result for any incentive-compatible mechanism. We consider three model variants: homogeneous agents (of the same "type" comprising beliefs and preferences), heterogeneous agents, and an extension that leverages estimated type frequencies to mitigate the influence of rare-but-difficult agent types.

preprint2021arXiv

Equilibrium Behaviors in Repeated Games

We examine a patient player's behavior when he can build reputations in front of a sequence of myopic opponents. With positive probability, the patient player is a commitment type who plays his Stackelberg action in every period. We characterize the patient player's action frequencies in equilibrium. Our results clarify the extent to which reputations can refine the patient player's behavior and provide new insights to entry deterrence, business transactions, and capital taxation. Our proof makes a methodological contribution by establishing a new concentration inequality.

preprint2021arXiv

Revelation Gap for Pricing from Samples

This paper considers prior-independent mechanism design, in which a single mechanism is designed to achieve approximately optimal performance on every prior distribution from a given class. Most results in this literature focus on mechanisms with truthtelling equilibria, a.k.a., truthful mechanisms. Feng and Hartline (2018) introduce the revelation gap to quantify the loss of the restriction to truthful mechanisms. We solve a main open question left in Feng and Hartline (2018); namely, we identify a non-trivial revelation gap for revenue maximization. Our analysis focuses on the canonical problem of selling a single item to a single agent with only access to a single sample from the agent's valuation distribution. We identify the sample-bid mechanism (a simple non-truthful mechanism) and upper-bound its prior-independent approximation ratio by 1.835 (resp. 1.296) for regular (resp. MHR) distributions. We further prove that no truthful mechanism can achieve prior-independent approximation ratio better than 1.957 (resp. 1.543) for regular (resp. MHR) distributions. Thus, a non-trivial revelation gap is shown as the sample-bid mechanism outperforms the optimal prior-independent truthful mechanism. On the hardness side, we prove that no (possibly non-truthful) mechanism can achieve prior-independent approximation ratio better than 1.073 even for uniform distributions.

preprint2021arXiv

Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

Linear contextual bandit is an important class of sequential decision making problems with a wide range of applications to recommender systems, online advertising, healthcare, and many other machine learning related tasks. While there is a lot of prior research, tight regret bounds of linear contextual bandit with infinite action sets remain open. In this paper, we address this open problem by considering the linear contextual bandit with (changing) infinite action sets. We prove a regret upper bound on the order of $O(\sqrt{d^2T\log T})\times \text{poly}(\log\log T)$ where $d$ is the domain dimension and $T$ is the time horizon. Our upper bound matches the previous lower bound of $Ω(\sqrt{d^2 T\log T})$ in [Li et al., 2019] up to iterated logarithmic terms.

preprint2020arXiv

Bayesian Auctions with Efficient Queries

Generating good revenue is one of the most important problems in Bayesian auction design, and many (approximately) optimal dominant-strategy incentive compatible (DSIC) Bayesian mechanisms have been constructed for various auction settings. However, most existing studies do not consider the complexity for the seller to carry out the mechanism. It is assumed that the seller knows "each single bit" of the distributions and is able to optimize perfectly based on the entire distributions. Unfortunately, this is a strong assumption and may not hold in reality: for example, when the value distributions have exponentially large supports or do not have succinct representations. In this work we consider, for the first time, the query complexity of Bayesian mechanisms. We only allow the seller to have limited oracle accesses to the players' value distributions, via quantile queries and value queries. For a large class of auction settings, we prove logarithmic lower-bounds for the query complexity for any DSIC Bayesian mechanism to be of any constant approximation to the optimal revenue. For single-item auctions and multi-item auctions with unit-demand or additive valuation functions, we prove tight upper-bounds via efficient query schemes, without requiring the distributions to be regular or have monotone hazard rate. Thus, in those auction settings the seller needs to access much less than the full distributions in order to achieve approximately optimal revenue.

preprint2020arXiv

Benchmark Design and Prior-independent Optimization

This paper compares two leading approaches for robust optimization in the models of online algorithms and mechanism design. Competitive analysis compares the performance of an online algorithm to an offline benchmark in worst-case over inputs, and prior-independent mechanism design compares the expected performance of a mechanism on an unknown distribution (of inputs, i.e., agent values) to the optimal mechanism for the distribution in worst case over distributions. For competitive analysis, a critical concern is the choice of benchmark. This paper gives a method for selecting a good benchmark. We show that optimal algorithm/mechanism for the optimal benchmark are equal to the prior-independent optimal algorithm/mechanism. We solve a central open question in prior-independent mechanism design, namely we identify the prior-independent revenue-optimal mechanism for selling a single item to two agents with i.i.d. and regularly distributed values. Via this solution and the above equivalence of prior-independent mechanism design and competitive analysis (a.k.a. prior-free mechanism design) we show that the standard method for lower bounds of prior-free mechanisms is not generally tight for the benchmark design program.

preprint2020arXiv

Multinomial Logit Bandit with Low Switching Cost

We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $Ω(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.

preprint2020arXiv

Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

We study the linear contextual bandit problem with finite action sets. When the problem dimension is $d$, the time horizon is $T$, and there are $n \leq 2^{d/2}$ candidate actions per time period, we (1) show that the minimax expected regret is $Ω(\sqrt{dT (\log T) (\log n)})$ for every algorithm, and (2) introduce a Variable-Confidence-Level (VCL) SupLinUCB algorithm whose regret matches the lower bound up to iterated logarithmic factors. Our algorithmic result saves two $\sqrt{\log T}$ factors from previous analysis, and our information-theoretical lower bound also improves previous results by one $\sqrt{\log T}$ factor, revealing a regret scaling quite different from classical multi-armed bandits in which no logarithmic $T$ term is present in minimax regret. Our proof techniques include variable confidence levels and a careful analysis of layer sizes of SupLinUCB on the upper bound side, and delicately constructed adversarial sequences showing the tightness of elliptical potential lemmas on the lower bound side.