Researcher profile

Yanjun Han

Yanjun Han contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2026arXiv

The (Marginal) Value of a Search Ad: An Online Causal Framework for Repeated Second-price Auctions

Existing auto-bidding algorithms in digital advertising often treat the value of an ad opportunity as the revenue obtained when an ad is shown and/or clicked, and bid accordingly. This can lead to wasteful spending because the true value is the marginal gain from paid exposure: even without winning a sponsored slot, an advertiser may still earn revenue via an organic search result (e.g., on Google or Amazon). Motivated by recent work, we model ad value as a treatment effect--the outcome difference between winning and losing the auction--and study online learning for bidding in second-price (Vickrey) auctions under this causal perspective. We develop algorithms that attain rate-optimal regret under several feedback models. A key ingredient exploits the information revealed by the second-price payment rule, which strictly improves regret relative to analogous learning problems in first-price auctions.

preprint2023arXiv

Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

A foundational problem in reinforcement learning and interactive decision making is to understand what modeling assumptions lead to sample-efficient learning guarantees, and what algorithm design principles achieve optimal sample complexity. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation. In this paper, we introduce a new variant of the DEC, the Constrained Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts: - They hold in expectation, with no restrictions on the class of algorithms under consideration. - They hold globally, and do not rely on the notion of localization used by Foster et al. (2021). - Most interestingly, they allow the reference model with respect to which the DEC is defined to be improper, establishing that improper reference models play a fundamental role. We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al. (2021). Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.

preprint2022arXiv

Optimal prediction of Markov chains with and without spectral gap

We study the following learning problem with dependent data: Observing a trajectory of length $n$ from a stationary Markov chain with $k$ states, the goal is to predict the next state. For $3 \leq k \leq O(\sqrt{n})$, using techniques from universal compression, the optimal prediction risk in Kullback-Leibler divergence is shown to be $Θ(\frac{k^2}{n}\log \frac{n}{k^2})$, in contrast to the optimal rate of $Θ(\frac{\log \log n}{n})$ for $k=2$ previously shown in Falahatgar et al. (2016). These rates, slower than the parametric rate of $O(\frac{k^2}{n})$, can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap is not excessively small, the prediction risk in the Markov model is $O(\frac{k^2}{n})$, which coincides with that of an iid model with the same number of parameters. Extensions to higher-order Markov chains are also obtained.

preprint2021arXiv

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on adversarial linear combinatorial bandit. {The adversarial combinatorial bandit with general non-linear reward is an important open problem in bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback.} We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetildeΘ_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d< K$, and $Θ_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. {Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures.} Our result also finds applications to adversarial assortment optimization problem in online recommendation. We show that in the worst-case of adversarial assortment problem, the optimal algorithm must treat each individual $\binom{N}{K}$ assortment as independent.

preprint2021arXiv

Low half-wave-voltage, ultra-high bandwidth thin-film LiNbO3 modulator based on hybrid waveguide and periodic capacitively loaded electrodes

A novel thin-film LiNbO3 (TFLN) electro-optic modulator is proposed and demonstrated. LiNbO3-silica hybrid waveguide is adopted to maintain low optical loss for an electrode spacing as narrow as 3 μm, resulting in a record low half-wave-voltage length product of only 1.7 V*cm. Capacitively loaded traveling-wave electrodes (CL-TWEs) are employed to reduce the microwave loss, while quartz substrate is used in place of silicon substrate to achieve velocity matching. The fabricated TFLN modulator with a 5-mm-long modulation region exhibits a half-wave-voltage of 3.4 V and merely 1.3 dB roll-off in electro-optic response up to 67 GHz, and a 3-dB modulation bandwidth over 110 GHz is predicted.

preprint2021arXiv

Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

We study the minimax estimation of $α$-divergences between discrete distributions for integer $α\ge 1$, which include the Kullback--Leibler divergence and the $χ^2$-divergences as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials. The estimator uses a hybrid approach which solves a problem-independent linear program based on moment matching in the non-smooth regime, and applies a problem-dependent bias-corrected plug-in estimator in the smooth regime, with a soft decision boundary between these regimes.

preprint2021arXiv

On Estimation of $L_{r}$-Norms in Gaussian White Noise Models

We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r\ge 1$) of the mean in Gaussian white noise model over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski, Nemirovski and Spokoiny (1999), who considered the cases of $r=1$ (with poly-logarithmic gap between upper and lower bounds) and $r$ even (with asymptotically sharp upper and lower bounds) over Hölder spaces. We additionally consider the case of asymptotically adaptive minimax estimation and demonstrate a difference between even and non-even $r$ in terms of an investigator&#39;s ability to produce asymptotically adaptive minimax estimators without paying a penalty.

preprint2021arXiv

On the High Accuracy Limitation of Adaptive Property Estimation

Recent years have witnessed the success of adaptive (or unified) approaches in estimating symmetric properties of discrete distributions, where one first obtains a distribution estimator independent of the target property, and then plugs the estimator into the target property as the final estimator. Several such approaches have been proposed and proved to be adaptively optimal, i.e. they achieve the optimal sample complexity for a large class of properties within a low accuracy, especially for a large estimation error $\varepsilon\gg n^{-1/3}$ where $n$ is the sample size. In this paper, we characterize the high accuracy limitation, or the penalty for adaptation, for all such approaches. Specifically, we show that under a mild assumption that the distribution estimator is close to the true sorted distribution in expectation, any adaptive approach cannot achieve the optimal sample complexity for every $1$-Lipschitz property within accuracy $\varepsilon \ll n^{-1/3}$. In particular, this result disproves a conjecture in [Acharya et al. 2017] that the profile maximum likelihood (PML) plug-in approach is optimal in property estimation for all ranges of $\varepsilon$, and confirms a conjecture in [Han and Shiragur, 2021] that their competitive analysis of the PML is tight.

preprint2021arXiv

Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$. We focus on the known-transition setting where the learner is provided a dataset of $N$ length-$H$ trajectories from a deterministic expert policy and knows the MDP transition. We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al (2020) which we prove to be computationally efficient. In contrast, we show the minimax suboptimality grows as $Ω( H^{3/2}/N)$ when $|\mathcal{S}|\geq 3$ while the unknown-transition setting suffers from a larger sharp rate $Θ(|\mathcal{S}|H^2/N)$ (Rajaraman et al (2020)). The lower bound is established by proving a two-way reduction between IL and the value estimation problem of the unknown expert policy under any given reward function, as well as building connections with linear functional estimation with subsampled observations. We further show that under the additional assumption that the expert is optimal for the true reward function, there exists an efficient algorithm, which we term as Mimic-Mixture, that provably achieves suboptimality $O(1/N)$ for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast, no algorithm can achieve suboptimality $O(\sqrt{H}/N)$ with high probability if the expert is not constrained to be optimal. Our work formally establishes the benefit of the expert optimal assumption in the known transition setting, while Rajaraman et al (2020) showed it does not help when transitions are unknown.

preprint2021arXiv

Ultrafast Parallel LiDAR with Time-encoding and Spectral Scanning: Breaking the Time-of-flight Limit

Light detection and ranging (LiDAR) has been widely used in autonomous driving and large-scale manufacturing. Although state-of-the-art scanning LiDAR can perform long-range three-dimensional imaging, the frame rate is limited by both round-trip delay and the beam steering speed, hindering the development of high-speed autonomous vehicles. For hundred-meter level ranging applications, a several-time speedup is highly desirable. Here, we uniquely combine fiber-based encoders with wavelength-division multiplexing devices to implement all-optical time-encoding on the illumination light. Using this method, parallel detection and fast inertia-free spectral scanning can be achieved simultaneously with single-pixel detection. As a result, the frame rate of a scanning LiDAR can be multiplied with scalability. We demonstrate a 4.4-fold speedup for a maximum 75-m detection range, compared with a time-of-flight-limited laser ranging system. This approach has the potential to improve the velocity of LiDAR-based autonomous vehicles to the regime of hundred kilometers per hour and open up a new paradigm for ultrafast-frame-rate LiDAR imaging.

preprint2020arXiv

Bias Correction with Jackknife, Bootstrap, and Taylor Series

We analyze bias correction methods using jackknife, bootstrap, and Taylor series. We focus on the binomial model, and consider the problem of bias correction for estimating $f(p)$, where $f \in C[0,1]$ is arbitrary. We characterize the supremum norm of the bias of general jackknife and bootstrap estimators for any continuous functions, and demonstrate the in delete-$d$ jackknife, different values of $d$ may lead to drastically different behaviors in jackknife. We show that in the binomial model, iterating the bootstrap bias correction infinitely many times may lead to divergence of bias and variance, and demonstrate that the bias properties of the bootstrap bias corrected estimator after $r-1$ rounds are of the same order as that of the $r$-jackknife estimator if a bounded coefficients condition is satisfied.

preprint2020arXiv

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe outcomes for the individuals within a batch at the batch&#39;s end. Compared to both standard online contextual bandits learning or offline policy learning in contexutal bandits, this sequential batch learning problem provides a finer-grained formulation of many personalized sequential decision making problems in practical applications, including medical treatment in clinical trials, product recommendation in e-commerce and adaptive experiment design in crowdsourcing. We study two settings of the problem: one where the contexts are arbitrarily generated and the other where the contexts are \textit{iid} drawn from some distribution. In each setting, we establish a regret lower bound and provide an algorithm, whose regret upper bound nearly matches the lower bound. As an important insight revealed therefrom, in the former setting, we show that the number of batches required to achieve the fully online performance is polynomial in the time horizon, while for the latter setting, a pure-exploitation algorithm with a judicious batch partition scheme achieves the fully online performance even when the number of batches is less than logarithmic in the time horizon. Together, our results provide a near-complete characterization of sequential decision making in linear contextual bandits when batch constraints are present.