Researcher profile

Haifeng Xu

Haifeng Xu contributes to research discovery and scholarly infrastructure.

Researcher, Affiliation not imported, Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
13 works
0 followers
9 topics
4 close collaborators

Published work

13 published items

Preprint, 2026, arXiv

An AI-guided mechanotyping instrument for fully automated oocyte quality assessment

The mechanical properties of oocytes are regarded as important indicators of their developmental potential. During fertilization, deviations from the normal mechanical range can hinder sperm penetration, ultimately reducing fertilization efficiency and compromising embryo quality. However, current methods for measuring oocyte mechanics often cause serious cellular damage and suffer from low automation levels and large measurement errors. To address these limitations, we developed an AI-guided micronewton-scale mechanical measurement system for safe and automated oocyte quality assessment. The system integrates voice interaction with automated experimental workflows to control a magnetically actuated microgripper, which applies defined loading forces to induce micron-scale compressive deformation of the oocyte. Combined with AI-assisted object detection and image segmentation algorithms, the system captures cellular deformation in real time, enabling precise calculation of the oocyte's compressive modulus. This measurement system enables automated, quantitative, and non-destructive evaluation of oocyte mechanical properties, providing an effective approach for oocyte quality screening in in vitro fertilization (IVF) and other assisted reproductive technologies (ART).

Preprint, 2023, arXiv

Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

Incrementality, which is used to measure the causal effect of showing an ad to a potential customer (e.g. a user in an internet platform) versus not, is a central object for advertisers in online advertising platforms. This paper investigates the problem of how an advertiser can learn to optimize the bidding sequence in an online manner \emph{without} knowing the incrementality parameters in advance. We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most $\widetilde{O}(H^2\sqrt{T})$, which depends on the number of rounds $H$ and number of episodes $T$, but does not depend on the number of actions (i.e., possible bids). A fundamental difference between our learning problem and standard RL problems is that the realized reward feedback from conversion incrementality is \emph{mixed} and \emph{delayed}. To handle this difficulty we propose and analyze a novel pairwise moment-matching algorithm to learn the conversion incrementality, which we believe is of independent interest.

Preprint, 2023, arXiv

Learning in Online Principal-Agent Interactions: The Power of Menus

We study a ubiquitous learning challenge in online principal-agent problems during which the principal learns the agent's private information from the agent's revealed preferences in historical interactions. This paradigm includes important special cases such as pricing and contract design, which have been widely studied in recent literature. However, existing work considers the case where the principal can only choose a single strategy at every round to interact with the agent and then observe the agent's revealed preference through their actions. In this paper, we extend this line of study to allow the principal to offer a menu of strategies to the agent and learn additionally from observing the agent's selection from the menu. We provide a thorough investigation of several online principal-agent problem settings and characterize their sample complexities, accompanied by the corresponding algorithms we have developed. We instantiate this paradigm to several important design problems, including Stackelberg (security) games, contract design, and information design. Finally, we also explore the connection between our findings and existing results about online learning in Stackelberg games, and we offer a solution that can overcome a key hard instance of Peng et al. (2019).

Preprint, 2022, arXiv

Algorithmic Information Design in Multi-Player Games: Possibility and Limits in Singleton Congestion

Most algorithmic studies on multi-agent information design so far have focused on the restricted situation with no inter-agent externalities; a few exceptions investigated truly strategic games such as zero-sum games and second-price auctions but have all focused only on optimal public signaling. This paper initiates the algorithmic information design of both \emph{public} and \emph{private} signaling in a fundamental class of games with negative externalities, i.e., singleton congestion games, with wide application in today's digital economy, machine scheduling, routing, etc. For both public and private signaling, we show that the optimal information design can be efficiently computed when the number of resources is a constant. To our knowledge, this is the first set of efficient \emph{exact} algorithms for information design in succinctly representable many-player games. Our results hinge on novel techniques such as developing certain "reduced forms" to compactly characterize equilibria in public signaling or to represent players' marginal beliefs in private signaling. When there are many resources, we show computational intractability results. To overcome the issue of multiple equilibria, here we introduce a new notion of equilibrium-\emph{oblivious} hardness, which rules out any possibility of computing a good signaling scheme, irrespective of the equilibrium selection rule.

Preprint, 2022, arXiv

Learning from a Learning User for Optimal Recommendations

In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience about previously consumed items. This in turn affects their interaction dynamics with the system and can invalidate previous algorithms built on the omniscient user assumption. In this paper, we formalize a model to capture such "learning users" and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges brought by the non-stationary feedback from such a learning user. Interestingly, we prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse, until reaching linear regret when the user's learning fails to converge. Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. Our study provides a novel perspective on modeling the feedback loop in recommendation problems.

Preprint, 2022, arXiv

Multi-Channel Bayesian Persuasion

The celebrated Bayesian persuasion model considers strategic communication between an informed agent (the sender) and uninformed decision makers (the receivers). The current rapidly-growing literature mostly assumes a dichotomy: either the sender is powerful enough to communicate separately with each receiver (a.k.a. private persuasion), or she cannot communicate separately at all (a.k.a. public persuasion). We study a model that smoothly interpolates between the two, by considering a natural multi-channel communication structure in which each receiver observes a subset of the sender's communication channels. This captures, e.g., receivers on a network, where information spillover is almost inevitable. We completely characterize when one communication structure is better for the sender than another, in the sense of yielding higher optimal expected utility universally over all prior distributions and utility functions. The characterization is based on a simple pairwise relation among receivers - one receiver information-dominates another if he observes at least the same channels. We prove that a communication structure $M_1$ is (weakly) better than $M_2$ if and only if every information-dominating pair of receivers in $M_1$ is also such in $M_2$. We also provide an additive FPTAS for the optimal sender's signaling scheme when the number of states is constant and the graph of information-dominating pairs is a directed forest. Finally, we prove that finding an optimal signaling scheme under multi-channel persuasion is, generally, computationally harder than under both public and private persuasion.
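The pairwise characterization in this abstract can be illustrated with a small sketch. It is only an illustration of the stated criterion, not the authors' code: the function names, the dictionary representation (receiver name mapped to the set of channels it observes), and the assumption that both structures share the same receiver names are all hypothetical.

```python
from itertools import permutations


def dominating_pairs(structure):
    """Return ordered pairs (i, j) where receiver i information-dominates j.

    `structure` maps each receiver name to the frozenset of channels it
    observes (an assumed representation). Receiver i dominates j when i
    observes at least the channels that j observes.
    """
    return {
        (i, j)
        for i, j in permutations(structure, 2)
        if structure[i] >= structure[j]  # superset test on channel sets
    }


def weakly_better(m1, m2):
    """Criterion from the abstract: M1 is (weakly) better for the sender than
    M2 iff every information-dominating pair in M1 is also one in M2."""
    return dominating_pairs(m1) <= dominating_pairs(m2)


# Private persuasion: each receiver has its own channel, so nobody dominates.
private = {"r1": frozenset({"a"}), "r2": frozenset({"b"})}
# Public persuasion: one shared channel, so each receiver dominates the other.
public = {"r1": frozenset({"c"}), "r2": frozenset({"c"})}

print(weakly_better(private, public))  # True
print(weakly_better(public, private))  # False
```

In this toy case the private structure has no dominating pairs, so it is weakly better than the public one, which matches the intuition that a sender who can address receivers separately is never worse off.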

Preprint, 2022, arXiv

Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

We study bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$. To combat such attacks, our second main contribution is to propose verification-based mechanisms, which use limited verification to access a limited number of uncontaminated rewards. In particular, for the case of unlimited verifications, we show that with $O(\log T)$ expected number of verifications, a simple modified version of the ETC type bandit algorithm can restore the order optimal $O(\log T)$ regret irrespective of the amount of contamination used by the attacker. We also provide a UCB-like verification scheme, called Secure-UCB, that also enjoys full recovery from any attacks, also with $O(\log T)$ expected number of verifications. To derive a matching lower bound on the number of verifications, we prove that for any order-optimal bandit algorithm, this number of verifications $\Omega(\log T)$ is necessary to recover the order-optimal regret. On the other hand, when the number of verifications is bounded above by a budget $B$, we propose a novel algorithm, Secure-BARBAR, which provably achieves $O(\min\{C,T/\sqrt{B} \})$ regret with high probability against weak attackers where $C$ is the total amount of contamination by the attacker, which breaks the known $\Omega(C)$ lower bound of the non-verified setting if $C$ is large.

Preprint, 2022, arXiv

Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning

To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks to manipulate \emph{any} order-optimal learning algorithm towards a targeted policy in episodic RL and examines the potential damage of two natural types of poisoning attacks, i.e., the manipulation of \emph{reward} and \emph{action}. We discover that the effect of attacks crucially depends on whether the rewards are bounded or unbounded. In bounded reward settings, we show that only reward manipulation or only action manipulation cannot guarantee a successful attack. However, by combining reward and action manipulation, the adversary can manipulate any order-optimal learning algorithm to follow any targeted policy with $\tilde{\Theta}(\sqrt{T})$ total attack cost, which is order-optimal, without any knowledge of the underlying MDP. In contrast, in unbounded reward settings, we show that reward manipulation attacks are sufficient for an adversary to successfully manipulate any order-optimal learning algorithm to follow any targeted policy using $\tilde{O}(\sqrt{T})$ amount of contamination. Our results reveal useful insights about what can or cannot be achieved by poisoning attacks, and are set to spur more works on the design of robust RL algorithms.

Preprint, 2022, arXiv

When Are Linear Stochastic Bandits Attackable?

We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this finding, this paper studies the attackability of a $k$-armed linear bandit environment. We first provide a complete necessity and sufficiency characterization of attackability based on the geometry of the arms' context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first determines whether the given environment is attackable; if so, it poisons the rewards to force the algorithm to pull a target arm a linear number of times using only a sublinear cost. Numerical experiments further validate the effectiveness and cost-efficiency of the proposed attack method.

Preprint, 2020, arXiv

Collapsing Bandits and Their Application to Public Health Interventions

We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either "forward" or "reverse" threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients' adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques while achieving similar performance.
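The "collapsing" structure described in this abstract can be sketched for a single arm. This is a minimal illustration rather than the paper's algorithm: it does not compute a Whittle index, and the function names, the parameter names `p_stay_good` and `p_recover`, and the use of a single transition matrix for all rounds are assumptions made here for brevity.

```python
def collapse(observed_good):
    """Playing an arm reveals its state, so the belief becomes 0 or 1."""
    return 1.0 if observed_good else 0.0


def passive_update(belief, p_stay_good, p_recover):
    """One round without observation for a two-state Markov chain.

    belief      -- current probability that the arm is in the good state
    p_stay_good -- assumed P(good -> good)
    p_recover   -- assumed P(bad -> good)
    """
    return belief * p_stay_good + (1.0 - belief) * p_recover


def evolve(belief, rounds, p_stay_good, p_recover):
    """Apply the passive update for a number of consecutive rounds."""
    for _ in range(rounds):
        belief = passive_update(belief, p_stay_good, p_recover)
    return belief


# The arm is played and observed in the good state, then left passive.
b0 = collapse(observed_good=True)
print(evolve(b0, 2, p_stay_good=0.9, p_recover=0.2))    # approximately 0.83
print(evolve(b0, 200, p_stay_good=0.9, p_recover=0.2))  # approaches 2/3
```

While the arm stays passive, the belief drifts from the collapsed value toward the chain's stationary probability p_recover / (1 - p_stay_good + p_recover), which is the uncertainty that playing the arm removes.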

Preprint, 2020, arXiv

Computing Equilibria of Prediction Markets via Persuasion

We study the computation of equilibria in prediction markets in perhaps the most fundamental special case with two players and three trading opportunities. To do so, we show equivalence of prediction market equilibria with those of a simpler signaling game with commitment introduced by Kong and Schoenebeck (2018). We then extend their results by giving computationally efficient algorithms for additional parameter regimes. Our approach leverages a new connection between prediction markets and Bayesian persuasion, which also reveals interesting conceptual insights.

Preprint, 2020, arXiv

Selling Information Through Consulting

We consider a monopoly information holder selling information to a budget-constrained decision maker, who may benefit from the seller's information. The decision maker has a utility function that depends on his action and an uncertain state of the world. The seller and the buyer each observe a private signal regarding the state of the world, which may be correlated with each other. The seller's goal is to sell her private information to the buyer and extract maximum possible revenue, subject to the buyer's budget constraints. We consider three different settings with increasing generality, i.e., the seller's signal and the buyer's signal can be independent, correlated, or follow a general distribution accessed through a black-box sampling oracle. For each setting, we design information selling mechanisms which are both optimal and simple in the sense that they can be naturally interpreted, have succinct representations, and can be efficiently computed. Notably, though the optimal mechanism exhibits slightly increasing complexity as the setting becomes more general, all our mechanisms share the same format of acting as a consultant who recommends the best action to the buyer but uses different and carefully designed payment rules for different settings. Each of our optimal mechanisms can be easily computed by solving a single polynomial-size linear program. This significantly simplifies exponential-size LPs solved by the Ellipsoid method in the previous work, which computes the optimal mechanisms in the same setting but without budget limit. Such simplification is enabled by our new characterizations of the optimal mechanism in the (more realistic) budget-constrained setting.

Preprint, 2019, arXiv

Quantum dynamics of atomic Rydberg excitation in strong laser fields

Neutral atoms have been observed to survive intense laser pulses in high Rydberg states with surprisingly large probability. Only with this Rydberg-state excitation (RSE) included is the picture of intense-laser-atom interaction complete. Various mechanisms have been proposed to explain the underlying physics. However, none of them can explain all the features observed in experiments and in time-dependent Schrödinger equation (TDSE) simulations. Here we propose a fully quantum-mechanical model based on the strong-field approximation (SFA). It reproduces well the intensity dependence of RSE obtained by the TDSE, which exhibits a series of modulated peaks. They are due to recapture of the liberated electron and the fact that the pertinent probability strongly depends on the position and the parity of the Rydberg state. We also present measurements of RSE in xenon at 800 nm, which display the peak structure consistent with the calculations.