Source author record

Yanjun Han

Yanjun Han appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory · math.IT · Machine Learning · math.ST · Statistics Theory · physics.optics · cond-mat.mtrl-sci · Data Structures and Algorithms · Computer Science and Game Theory · cond-mat.mes-hall · eess.IV · math.OC · Methodology · physics.app-ph

Catalog footprint

What is connected

23 works

14 topics

4 close collaborators


preprint · 2026 · arXiv

The (Marginal) Value of a Search Ad: An Online Causal Framework for Repeated Second-price Auctions

Existing auto-bidding algorithms in digital advertising often treat the value of an ad opportunity as the revenue obtained when an ad is shown and/or clicked, and bid accordingly. This can lead to wasteful spending because the true value is the marginal gain from paid exposure: even without winning a sponsored slot, an advertiser may still earn revenue via an organic search result (e.g., on Google or Amazon). Motivated by recent work, we model ad value as a treatment effect--the outcome difference between winning and losing the auction--and study online learning for bidding in second-price (Vickrey) auctions under this causal perspective. We develop algorithms that attain rate-optimal regret under several feedback models. A key ingredient exploits the information revealed by the second-price payment rule, which strictly improves regret relative to analogous learning problems in first-price auctions.
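The abstract's two ingredients, ad value as a treatment effect and the second-price payment rule, can be illustrated with a single auction round. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the revenues and competing bid are taken as known numbers, and ties are treated as losses. It does not implement the paper's online learning algorithms.

```python
from dataclasses import dataclass


@dataclass
class AuctionRound:
    revenue_if_win: float         # hypothetical: revenue when the sponsored ad is shown
    revenue_if_lose: float        # hypothetical: revenue from the organic result alone
    highest_competing_bid: float  # price the winner pays in a second-price auction


def marginal_value(r: AuctionRound) -> float:
    """Ad value as a treatment effect: outcome when winning minus outcome when losing."""
    return r.revenue_if_win - r.revenue_if_lose


def net_gain(bid: float, r: AuctionRound) -> float:
    """Gain relative to losing the auction; ties are treated as losses for simplicity."""
    if bid > r.highest_competing_bid:
        # Second-price rule: the winner pays the highest competing bid, not its own bid.
        return marginal_value(r) - r.highest_competing_bid
    return 0.0


r = AuctionRound(revenue_if_win=10.0, revenue_if_lose=7.0, highest_competing_bid=5.0)
print(net_gain(10.0, r))               # bids full revenue: wins, pays 5 for a gain of 3 -> -2.0
print(net_gain(marginal_value(r), r))  # bids the marginal value 3.0: loses -> 0.0
```

Bidding the full revenue overpays here, which is the "wasteful spending" the abstract describes. In a single round with a known value, bidding the marginal value is a dominant strategy under the second-price rule; the harder problem studied in the abstract is that this value must be learned online.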

preprint · 2023 · arXiv

Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

A foundational problem in reinforcement learning and interactive decision making is to understand what modeling assumptions lead to sample-efficient learning guarantees, and what algorithm design principles achieve optimal sample complexity. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation. In this paper, we introduce a new variant of the DEC, the Constrained Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts:

- They hold in expectation, with no restrictions on the class of algorithms under consideration.
- They hold globally, and do not rely on the notion of localization used by Foster et al. (2021).
- Most interestingly, they allow the reference model with respect to which the DEC is defined to be improper, establishing that improper reference models play a fundamental role.

We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al. (2021). Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.

preprint · 2022 · arXiv

Optimal prediction of Markov chains with and without spectral gap

We study the following learning problem with dependent data: observing a trajectory of length $n$ from a stationary Markov chain with $k$ states, the goal is to predict the next state. For $3 \leq k \leq O(\sqrt{n})$, using techniques from universal compression, the optimal prediction risk in Kullback-Leibler divergence is shown to be $\Theta(\frac{k^2}{n}\log \frac{n}{k^2})$, in contrast to the optimal rate of $\Theta(\frac{\log \log n}{n})$ for $k=2$ previously shown in Falahatgar et al. (2016). These rates, slower than the parametric rate of $O(\frac{k^2}{n})$, can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap is not excessively small, the prediction risk in the Markov model is $O(\frac{k^2}{n})$, which coincides with that of an i.i.d. model with the same number of parameters. Extensions to higher-order Markov chains are also obtained.
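To make the prediction task concrete, the sketch below predicts the next state from the transition counts out of the current state, using an add-constant rule. The choice of the add-1/2 (Krichevsky-Trofimov) rule is an assumption made for illustration: it is a classical predictor from universal compression, but the abstract does not state that it is the estimator achieving the rates above. The function name is hypothetical, and states are assumed to be labeled $0, \dots, k-1$.

```python
from collections import Counter


def predict_next_state(trajectory: list[int], k: int, alpha: float = 0.5) -> list[float]:
    """Add-alpha estimate of P(next state | current state) from one trajectory.

    States are assumed to be labeled 0, ..., k-1. alpha = 0.5 gives the
    Krichevsky-Trofimov (add-1/2) rule from universal compression.
    """
    current = trajectory[-1]
    # Count observed transitions that leave the current state.
    counts = Counter(b for a, b in zip(trajectory, trajectory[1:]) if a == current)
    total = sum(counts.values())
    return [(counts[j] + alpha) / (total + k * alpha) for j in range(k)]


print(predict_next_state([0, 1, 0, 1, 0], k=2))  # roughly [0.1667, 0.8333]
```

The prediction risk in the abstract is the expected Kullback-Leibler divergence between the true transition row of the current state and a vector like the one returned here.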

preprint · 2021 · arXiv

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on adversarial linear combinatorial bandits. The adversarial combinatorial bandit with general non-linear reward is an important open problem in the bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback. We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde{\Theta}_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d< K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures. Our result also finds applications to the adversarial assortment optimization problem in online recommendation. We show that in the worst case of the adversarial assortment problem, the optimal algorithm must treat each of the $\binom{N}{K}$ assortments as independent.

preprint · 2021 · arXiv

Low half-wave-voltage, ultra-high bandwidth thin-film LiNbO3 modulator based on hybrid waveguide and periodic capacitively loaded electrodes

A novel thin-film LiNbO3 (TFLN) electro-optic modulator is proposed and demonstrated. A LiNbO3-silica hybrid waveguide is adopted to maintain low optical loss for an electrode spacing as narrow as 3 μm, resulting in a record-low half-wave-voltage length product of only 1.7 V·cm. Capacitively loaded traveling-wave electrodes (CL-TWEs) are employed to reduce the microwave loss, while a quartz substrate is used in place of a silicon substrate to achieve velocity matching. The fabricated TFLN modulator with a 5-mm-long modulation region exhibits a half-wave voltage of 3.4 V and only 1.3 dB roll-off in electro-optic response up to 67 GHz, and a 3-dB modulation bandwidth exceeding 110 GHz is predicted.
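Assuming the 1.7 V·cm half-wave-voltage length product applies to the 5-mm (0.5-cm) device, dividing by the modulation length recovers the reported half-wave voltage, so the two figures are consistent:

```latex
V_\pi = \frac{V_\pi L}{L} = \frac{1.7\ \mathrm{V\cdot cm}}{0.5\ \mathrm{cm}} = 3.4\ \mathrm{V}
```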

preprint · 2021 · arXiv

Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

We study the minimax estimation of $\alpha$-divergences between discrete distributions for integer $\alpha\ge 1$, which include the Kullback--Leibler divergence and the $\chi^2$-divergence as special cases. Dropping the usual theoretical tricks for obtaining independence, we construct the first minimax rate-optimal estimator that does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials. The estimator uses a hybrid approach: it solves a problem-independent linear program based on moment matching in the non-smooth regime, applies a problem-dependent bias-corrected plug-in estimator in the smooth regime, and uses a soft decision boundary between these regimes.

preprint · 2021 · arXiv

On Estimation of $L_{r}$-Norms in Gaussian White Noise Models

We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r\ge 1$) of the mean in the Gaussian white noise model over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski, Nemirovski and Spokoiny (1999), who considered the cases of $r=1$ (with a poly-logarithmic gap between upper and lower bounds) and even $r$ (with asymptotically sharp upper and lower bounds) over Hölder spaces. We additionally consider asymptotically adaptive minimax estimation and demonstrate a difference between even and non-even $r$ in terms of an investigator's ability to produce asymptotically adaptive minimax estimators without paying a penalty.

preprint · 2021 · arXiv

On the High Accuracy Limitation of Adaptive Property Estimation

Recent years have witnessed the success of adaptive (or unified) approaches in estimating symmetric properties of discrete distributions, where one first obtains a distribution estimator independent of the target property, and then plugs the estimator into the target property as the final estimator. Several such approaches have been proposed and proved to be adaptively optimal, i.e., they achieve the optimal sample complexity for a large class of properties in the low-accuracy regime, that is, for a large estimation error $\varepsilon\gg n^{-1/3}$, where $n$ is the sample size. In this paper, we characterize the high accuracy limitation, or the penalty for adaptation, for all such approaches. Specifically, we show that under a mild assumption that the distribution estimator is close to the true sorted distribution in expectation, no adaptive approach can achieve the optimal sample complexity for every $1$-Lipschitz property within accuracy $\varepsilon \ll n^{-1/3}$. In particular, this result disproves a conjecture in [Acharya et al. 2017] that the profile maximum likelihood (PML) plug-in approach is optimal in property estimation for all ranges of $\varepsilon$, and confirms a conjecture in [Han and Shiragur, 2021] that their competitive analysis of the PML is tight.

preprint · 2021 · arXiv

Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$. We focus on the known-transition setting, where the learner is provided a dataset of $N$ length-$H$ trajectories from a deterministic expert policy and knows the MDP transition. We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ on the suboptimality using the Mimic-MD algorithm of Rajaraman et al. (2020), which we prove to be computationally efficient. In contrast, we show the minimax suboptimality grows as $\Omega(H^{3/2}/N)$ when $|\mathcal{S}|\geq 3$, while the unknown-transition setting suffers from a larger sharp rate $\Theta(|\mathcal{S}|H^2/N)$ (Rajaraman et al. (2020)). The lower bound is established by proving a two-way reduction between IL and the value estimation problem of the unknown expert policy under any given reward function, as well as by building connections with linear functional estimation with subsampled observations. We further show that under the additional assumption that the expert is optimal for the true reward function, there exists an efficient algorithm, which we term Mimic-Mixture, that provably achieves suboptimality $O(1/N)$ for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast, no algorithm can achieve suboptimality $O(\sqrt{H}/N)$ with high probability if the expert is not constrained to be optimal. Our work formally establishes the benefit of the expert-optimality assumption in the known-transition setting, whereas Rajaraman et al. (2020) showed it does not help when transitions are unknown.

preprint · 2021 · arXiv

Ultrafast Parallel LiDAR with Time-encoding and Spectral Scanning: Breaking the Time-of-flight Limit

Light detection and ranging (LiDAR) has been widely used in autonomous driving and large-scale manufacturing. Although state-of-the-art scanning LiDAR can perform long-range three-dimensional imaging, its frame rate is limited by both the round-trip delay and the beam-steering speed, hindering the development of high-speed autonomous vehicles. For hundred-meter-level ranging applications, a speedup of several times is highly desirable. Here, we uniquely combine fiber-based encoders with wavelength-division multiplexing devices to implement all-optical time encoding on the illumination light. Using this method, parallel detection and fast inertia-free spectral scanning can be achieved simultaneously with single-pixel detection. As a result, the frame rate of a scanning LiDAR can be multiplied in a scalable way. We demonstrate a 4.4-fold speedup for a maximum 75-m detection range, compared with a time-of-flight-limited laser ranging system. This approach has the potential to raise the speed of LiDAR-based autonomous vehicles to the hundred-kilometers-per-hour regime and to open up a new paradigm for ultrafast-frame-rate LiDAR imaging.
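As an idealized illustration of the time-of-flight limit, assume a single beam that must wait for each pulse's round trip before firing the next, and ignore beam-steering overhead. At the stated 75-m range this gives:

```latex
t_{\mathrm{rt}} = \frac{2R}{c} = \frac{2 \times 75\ \mathrm{m}}{3 \times 10^{8}\ \mathrm{m/s}} = 0.5\ \mu\mathrm{s},
\qquad
f_{\max} = \frac{1}{t_{\mathrm{rt}}} = 2 \times 10^{6}\ \text{points/s}.
```

Under the same idealization, a 4.4-fold speedup corresponds to roughly $8.8 \times 10^{6}$ points/s.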

preprint · 2020 · arXiv

Bias Correction with Jackknife, Bootstrap, and Taylor Series

We analyze bias correction methods using jackknife, bootstrap, and Taylor series. We focus on the binomial model and consider the problem of bias correction for estimating $f(p)$, where $f \in C[0,1]$ is arbitrary. We characterize the supremum norm of the bias of general jackknife and bootstrap estimators for any continuous function, and demonstrate that in the delete-$d$ jackknife, different values of $d$ may lead to drastically different behaviors. We show that in the binomial model, iterating the bootstrap bias correction infinitely many times may lead to divergence of bias and variance, and demonstrate that the bias properties of the bootstrap bias-corrected estimator after $r-1$ rounds are of the same order as those of the $r$-jackknife estimator if a bounded coefficients condition is satisfied.
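In the binomial model the delete-1 jackknife has a closed form, because deleting one observation leaves either $x-1$ successes (in $x$ ways) or $x$ successes (in $n-x$ ways) among $n-1$ trials. The sketch below assumes $n \ge 2$ and a user-supplied $f$ on $[0,1]$; the function name is hypothetical, and it covers only the basic delete-1 case, not the delete-$d$ or iterated variants analyzed in the abstract.

```python
from typing import Callable


def jackknife_binomial(x: int, n: int, f: Callable[[float], float]) -> float:
    """Delete-1 jackknife correction of the plug-in estimate f(x/n), X ~ Bin(n, p).

    Requires n >= 2. Deleting one observation leaves either x - 1 successes
    (x ways) or x successes (n - x ways) among n - 1 trials.
    """
    plug_in = f(x / n)
    loo_success = f((x - 1) / (n - 1)) if x > 0 else 0.0
    loo_failure = f(x / (n - 1)) if x < n else 0.0
    loo_mean = (x * loo_success + (n - x) * loo_failure) / n
    return n * plug_in - (n - 1) * loo_mean


print(jackknife_binomial(4, 10, lambda p: p * p))  # ≈ 0.1333, i.e. 4*3/(10*9)
```

For $f(p) = p^2$ this reproduces $X(X-1)/(n(n-1))$, which is exactly unbiased; for general continuous $f$ the jackknife only reduces the bias, which is where the behaviors characterized in the abstract come in.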

preprint · 2020 · arXiv

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe outcomes for the individuals within a batch at the batch's end. Compared to both standard online contextual bandit learning and offline policy learning in contextual bandits, this sequential batch learning problem provides a finer-grained formulation of many personalized sequential decision-making problems in practical applications, including medical treatment in clinical trials, product recommendation in e-commerce and adaptive experiment design in crowdsourcing. We study two settings of the problem: one where the contexts are arbitrarily generated and the other where the contexts are drawn i.i.d. from some distribution. In each setting, we establish a regret lower bound and provide an algorithm whose regret upper bound nearly matches the lower bound. As an important insight revealed therefrom, in the former setting, we show that the number of batches required to achieve the fully online performance is polynomial in the time horizon, while in the latter setting, a pure-exploitation algorithm with a judicious batch partition scheme achieves the fully online performance even when the number of batches is less than logarithmic in the time horizon. Together, our results provide a near-complete characterization of sequential decision making in linear contextual bandits when batch constraints are present.

preprint · 2016 · arXiv

"WM"-Shaped Growth of GaN on Patterned Sapphire Substrates

In metal organic vapor phase epitaxy of GaN, the growth mode is sensitive to reactor temperature. In this study, V-pit-shaped GaN has been grown on a conventional c-plane cone-patterned sapphire substrate by decreasing the growth temperature of the high-temperature GaN to around 950 °C, which leads to 3-dimensional growth of GaN. The name "WM" describes the shape well: the bottom of each GaN V-pit lies directly over the top of a sapphire cone, and the regular arrangement of V-pits strictly follows the pattern of the sapphire substrate. Two types of semipolar facets, $(1\bar{1}01)$ and $(11\bar{2}2)$, are exposed on the sidewalls of the V-pits. Furthermore, by raising the growth temperature to 1000 °C, the growth mode of GaN can be switched to 2-dimensional growth. Accordingly, the V-pits become smaller and the area of c-plane GaN becomes larger, while the total thickness of GaN remains almost unchanged during this process. If the 2-dimensional growth lasts long enough, the V-pits disappear and only flat c-plane GaN remains. This means the area ratio of c-plane to semipolar GaN can be controlled by the duration of the 2-dimensional growth.

preprint · 2016 · arXiv

Broadband frequency comb generation in aluminum nitride-on-sapphire microresonators

Developing chip-scale optical frequency combs covering the ultraviolet (UV) to mid-infrared (MIR) range is of great significance. To extend the comb spectrum into the challenging UV region, a material platform with high UV transparency is crucial. In this paper, crystalline aluminum nitride (AlN)-on-sapphire film is demonstrated for efficient Kerr frequency comb generation. A near-infrared (NIR) comb with nearly octave-spanning coverage and low parametric threshold is achieved in continuous-wave-pumped high-quality-factor AlN microring resonators. The competition between stimulated Raman scattering (SRS) and hyperparametric oscillation is investigated, along with broadband comb generation via Raman-assisted four-wave mixing (FWM). Thanks to its wide bandgap, excellent crystalline quality, and intrinsic quadratic and cubic susceptibilities, the AlN-on-sapphire platform should be appealing for integrated nonlinear optics from the MIR to the UV region.

preprint · 2016 · arXiv

Continuous-wave Raman Lasing in Aluminum Nitride Microresonators

We report the first investigation of continuous-wave Raman lasing in high-quality-factor aluminum nitride (AlN) microring resonators. Although wurtzite AlN is known to exhibit six Raman-active phonons, single-mode Raman lasing with low threshold and high slope efficiency is demonstrated. Selective excitation of A$_1^\mathrm{TO}$ and E$_2^\mathrm{high}$ phonons with Raman shifts of $\sim$612 and 660 cm$^{-1}$ is observed by adjusting the polarization of the pump light. A theoretical analysis of the Raman scattering efficiency within the $c$-plane (0001) of AlN is carried out to help account for the observed lasing behavior. Bidirectional lasing is experimentally confirmed as a result of symmetric Raman gain in micro-scale waveguides. Furthermore, second-order Raman lasing with an unparalleled output power of $\sim$11.3 mW is obtained, which offers the capability to yield higher-order Raman lasers for mid-infrared applications.

preprint · 2016 · arXiv

InGaN/GaN Multi-Quantum-Well and Light-Emitting Diode Based on V-pit-Shaped GaN Grown on Patterned Sapphire Substrate

V-pit defects in GaN-based light-emitting diodes induced by dislocations are considered beneficial to electroluminescence because they relax the strain in InGaN quantum wells and also enhance lateral hole injection through the sidewalls of V-pits. In this paper, regularly arranged V-pits are formed on c-plane GaN grown by metal organic vapor phase epitaxy on conventional c-plane cone-patterned sapphire substrates. The size of the V-pits and the area of flat GaN can be adjusted by changing the growth temperature. A five-pair InGaN/GaN multi-quantum-well and a light-emitting diode structure are grown on this V-pit-shaped GaN. Two peaks around 410 nm and 450 nm, appearing in both photoluminescence and cathodoluminescence spectra, come from the semipolar InGaN/GaN multi-quantum-well on the sidewalls of V-pits and the c-plane InGaN/GaN multi-quantum-well, respectively. In addition, dense bright spots can be observed on the surface of the light-emitting diode when it operates under small injection current, which are believed to be due to the enhanced hole injection around V-pits.

preprint · 2016 · arXiv

Mutual Information Bounds via Adjacency Events

The mutual information between two jointly distributed random variables $X$ and $Y$ is a functional of the joint distribution $P_{XY},$ which is sometimes difficult to handle or estimate. A coarser description of the statistical behavior of $(X,Y)$ is given by the marginal distributions $P_X, P_Y$ and the adjacency relation induced by the joint distribution, where $x$ and $y$ are adjacent if $P(x,y)>0$. We derive a lower bound on the mutual information in terms of these entities. The bound is obtained by viewing the channel from $X$ to $Y$ as a probability distribution on a set of possible actions, where an action determines the output for any possible input, and is independently drawn. We also provide an alternative proof based on convex optimization, which yields a generally tighter bound. Finally, we derive an upper bound on the mutual information in terms of adjacency events between the action and the pair $(X,Y)$, where in this case an action $a$ and a pair $(x,y)$ are adjacent if $y=a(x)$. As an example, we apply our bounds to the binary deletion channel and show that for the special case of an i.i.d. input distribution and a range of deletion probabilities, our lower and upper bounds both outperform the best known bounds for the mutual information.
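The coarse description (marginals plus adjacency) does not pin down the mutual information, which is why bounds are the natural target. The sketch below, with hypothetical function names and small hand-picked joint distributions, computes both objects and shows two joint distributions that share marginals and adjacency yet differ in mutual information. It does not implement the paper's bounds.

```python
import math


def mutual_information(joint: dict[tuple[int, int], float]) -> float:
    """I(X;Y) in nats from a joint pmf given as {(x, y): P(x, y)}."""
    px: dict[int, float] = {}
    py: dict[int, float] = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)


def adjacency(joint: dict[tuple[int, int], float]) -> set[tuple[int, int]]:
    """Pairs (x, y) with P(x, y) > 0: the coarse description used in the abstract."""
    return {xy for xy, p in joint.items() if p > 0}


independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Same uniform marginals and same adjacency relation, yet different mutual information.
print(adjacency(independent) == adjacency(correlated))  # True
print(mutual_information(independent), mutual_information(correlated))
```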

preprint · 2016 · arXiv

Understanding different efficiency droop behaviors in InGaN-based near-UV, blue and green light-emitting diodes through differential carrier lifetime measurements

The efficiency droop effect under high injection in GaN-based light-emitting diodes (LEDs) strongly depends on wavelength, which is still not well understood. In this paper, through differential carrier lifetime measurements on commercial near-UV, blue, and green LEDs, their different efficiency droop behaviors are attributed to different carrier lifetimes, which are prolonged as wavelength increases. This relationship between carrier lifetime and the indium composition of the InGaN quantum well is believed to be due to the polarization-induced quantum-confined Stark effect. A long carrier lifetime not only increases the probability of carrier leakage, but also results in a high carrier concentration in the quantum well. In other words, under the same current density, the carrier concentration in the active region is lowest in the near-UV LED and highest in the green one. If the efficiency droop is considered as a function of carrier concentration, the LEDs with different wavelengths do not show any abnormal behavior. This model can also explain why the efficiency droop becomes more serious at lower temperature. Based on this result, possible solutions to overcome efficiency droop are discussed. Decreasing the carrier lifetime appears to be a fundamental approach to solving the problem.

preprint · 2015 · arXiv

Minimax Estimation of Discrete Distributions under $\ell_1$ Loss

We analyze the problem of discrete distribution estimation under $\ell_1$ loss. We provide non-asymptotic upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and on the minimax risk, in regimes where the alphabet size $S$ may grow with the number of observations $n$. We show that among distributions with bounded entropy $H$, the asymptotic maximum risk for the empirical distribution is $2H/\ln n$, while the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a hard-thresholding estimator oblivious to the unknown upper bound $H$ is asymptotically minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again $2H/\ln n$. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance ($\ell_1$ divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
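The two objects in the abstract, the empirical distribution and its $\ell_1$ loss, are short to write down. This is a minimal sketch assuming the alphabet is labeled $\{0, \dots, S-1\}$; the function names are hypothetical, and the hard-thresholding estimator from the abstract is not reproduced because its threshold is not given here.

```python
from collections import Counter


def empirical_distribution(samples: list[int], S: int) -> list[float]:
    """Empirical (maximum likelihood) estimate over the alphabet {0, ..., S-1}."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[s] / n for s in range(S)]


def l1_loss(p_hat: list[float], p: list[float]) -> float:
    """ell_1 distance between an estimate and the true distribution."""
    return sum(abs(a - b) for a, b in zip(p_hat, p))


p_hat = empirical_distribution([0, 0, 1, 2], S=4)  # [0.5, 0.25, 0.25, 0.0]
print(l1_loss(p_hat, [0.25] * 4))  # 0.5
```

Averaging `l1_loss` over many independent samples from a fixed $p$ gives a Monte Carlo estimate of the risk whose worst case over bounded-entropy distributions the abstract characterizes.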

preprint · 2015 · arXiv

Minimax Estimation of Functionals of Discrete Distributions

We propose a general methodology for the construction and analysis of minimax estimators for a wide class of functionals of finite-dimensional parameters, and elaborate on the case of discrete distributions, where the alphabet size $S$ is unknown and may be comparable with the number of observations $n$. We treat the respective regions where the functional is "nonsmooth" and "smooth" separately. In the "nonsmooth" regime, we apply an unbiased estimator for the best polynomial approximation of the functional, whereas in the "smooth" regime, we apply a bias-corrected Maximum Likelihood Estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing two important cases: the entropy $H(P) = \sum_{i = 1}^S -p_i \ln p_i$ and $F_\alpha(P) = \sum_{i = 1}^S p_i^\alpha$, $\alpha>0$. We obtain the minimax $L_2$ rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity $n \asymp S/\ln S$ for entropy estimation. We also show that the sample complexity for estimating $F_\alpha(P)$, $0<\alpha<1$, is $n\asymp S^{1/\alpha}/ \ln S$, which can be achieved by our estimator but not by the MLE. For $1<\alpha<3/2$, we show the minimax $L_2$ rate for estimating $F_\alpha(P)$ is $(n\ln n)^{-2(\alpha-1)}$ regardless of the alphabet size, while the $L_2$ rate for the MLE is $n^{-2(\alpha-1)}$. For all the above cases, the behavior of the minimax rate-optimal estimators with $n$ samples is essentially that of the MLE with $n\ln n$ samples. We highlight the practical advantages of our schemes for entropy and mutual information estimation. We demonstrate that our approach reduces running time and boosts accuracy compared to various existing approaches. Moreover, we show that the mutual information estimator induced by our methodology leads to significant performance boosts over the Chow--Liu algorithm in learning graphical models.
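As a point of reference for the "smooth regime" idea of correcting the MLE's bias, the sketch below computes the plug-in entropy and adds the classical Miller-Madow first-order correction. This is an assumed stand-in chosen for illustration, not the authors' estimator: it has no polynomial-approximation component for the nonsmooth regime, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_mle(samples: list[int]) -> float:
    """Plug-in (MLE) entropy estimate in nats."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def entropy_miller_madow(samples: list[int]) -> float:
    """MLE plus the classical Miller-Madow first-order bias correction."""
    n = len(samples)
    observed_symbols = len(set(samples))
    return entropy_mle(samples) + (observed_symbols - 1) / (2 * n)


data = [0, 1, 0, 1]
print(entropy_mle(data), entropy_miller_madow(data))  # ln 2 ≈ 0.693, then ln 2 + 1/8 ≈ 0.818
```

The plug-in estimate is biased downward, and when $S$ is comparable to $n$ a first-order correction of this kind is not enough, which is the regime the abstract's methodology targets.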

preprint · 2015 · arXiv

On the Ergodic Capacity of MIMO Free-Space Optical Systems over Turbulence Channels

Free-space optical (FSO) communication can achieve high capacity thanks to the huge unlicensed optical spectrum and low operational costs. Performance analysis of FSO systems over turbulence channels remains very limited, especially when multiple apertures are used at both the transmitter and receiver sides. This paper aims to characterize the ergodic capacity of multiple-input multiple-output (MIMO) FSO systems over atmospheric turbulence-induced fading channels. The irradiance fluctuations of optical channels distorted by atmospheric conditions are usually described by a gamma-gamma ($\Gamma\Gamma$) distribution, and the distribution of the sum of $\Gamma\Gamma$ random variables (RVs) is required to model the MIMO optical links. We use an $\alpha$-$\mu$ distribution to efficiently approximate the probability density function (PDF) of the sum of independent and identically distributed $\Gamma\Gamma$ RVs through moment-based estimators. Furthermore, the PDF of the sum of independent, but not necessarily identically distributed, $\Gamma\Gamma$ RVs can be efficiently approximated by a finite weighted sum of PDFs of $\Gamma\Gamma$ distributions. Based on these reliable approximations, novel and precise analytical expressions for the ergodic capacity of MIMO FSO systems are derived. Additionally, we derive simple asymptotic expressions in the high signal-to-noise ratio regime, which provide useful insights into the impact of the system parameters on the ergodic capacity. Finally, our proposed results are validated via Monte-Carlo simulations.

preprint · 2015 · arXiv

Performance Limits and Geometric Properties of Array Localization

Location-aware networks are of great importance and interest in both civil and military applications. This paper determines the localization accuracy of an agent that is equipped with an antenna array and localizes itself using wireless measurements with anchor nodes in a far-field environment. In view of the Cramér-Rao bound, we first derive the localization information for static scenarios and demonstrate that such information is a weighted sum of Fisher information matrices from each anchor-antenna measurement pair. Each matrix can be further decomposed into two parts: a distance part with intensity proportional to the squared baseband effective bandwidth of the transmitted signal, and a direction part with intensity associated with the normalized anchor-antenna visual angle. Moreover, in dynamic scenarios, we show that the Doppler shift contributes additional direction information, with intensity determined by the agent velocity and the root-mean-squared time duration of the transmitted signal. In addition, two measures are proposed to evaluate the localization performance of wireless networks with different anchor-agent and array-antenna geometries, and both formulae and simulations are provided for typical anchor deployments and antenna arrays.
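The decomposition described above can be pictured in a simplified 2-D setting: each measurement pair adds a distance part along its unit direction $u$ and a direction part along the orthogonal $v$, and the Cramér-Rao-style bound on mean squared position error is $\operatorname{tr}(J^{-1})$. This is a sketch under stated assumptions: the intensities are supplied as plain numbers rather than derived from bandwidth or visual angle, Doppler terms are ignored, and all names are hypothetical.

```python
import math


def equivalent_fim(pairs: list[tuple[float, float, float]]) -> tuple[float, float, float]:
    """Sum per-pair 2x2 Fisher information matrices for a 2-D agent position.

    Each hypothetical pair is (range_intensity, direction_intensity, angle_rad):
    the distance part adds range_intensity * u u^T along u = (cos, sin), and the
    direction part adds direction_intensity * v v^T along the orthogonal v = (-sin, cos).
    Returns the entries (J11, J12, J22) of the symmetric matrix J.
    """
    j11 = j12 = j22 = 0.0
    for lam_r, lam_d, phi in pairs:
        c, s = math.cos(phi), math.sin(phi)
        j11 += lam_r * c * c + lam_d * s * s
        j12 += (lam_r - lam_d) * c * s
        j22 += lam_r * s * s + lam_d * c * c
    return j11, j12, j22


def position_error_bound(pairs: list[tuple[float, float, float]]) -> float:
    """Cramér-Rao-style bound on mean squared position error: trace(J^{-1})."""
    j11, j12, j22 = equivalent_fim(pairs)
    det = j11 * j22 - j12 * j12
    if det <= 1e-12 * (j11 + j22) ** 2:
        raise ValueError("information matrix is (numerically) singular for this geometry")
    return (j11 + j22) / det  # trace of the inverse of a 2x2 matrix


print(position_error_bound([(4.0, 1.0, 0.0)]))  # J = diag(4, 1) -> 1/4 + 1 = 1.25
```

With distance information only, two anchors on the same line through the agent give a singular matrix and no finite bound, a simple instance of why the anchor geometry enters the performance measures.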

preprint · 2014 · arXiv

Beyond Maximum Likelihood: from Theory to Practice

Maximum likelihood is the most widely used statistical estimation technique. Recent work by the authors introduced a general methodology for the construction of estimators for functionals in parametric models, and demonstrated improvements, both in theory and in practice, over the maximum likelihood estimator (MLE), particularly in high-dimensional scenarios where the parameter dimension is comparable to or larger than the number of samples. This approach to estimation, building on results from approximation theory, is shown to yield minimax rate-optimal estimators for a wide class of functionals, implementable with modest computational requirements. In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with $n$ samples is comparable to that of the MLE with $n \ln n$ samples. In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach significantly reduces the data size required to achieve the same accuracy, compared with the classical Chow--Liu algorithm, which is an implementation of the MLE. The key step in improving the Chow--Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets.
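The replacement step can be pictured with a compact Chow--Liu sketch: compute pairwise mutual information, then take a maximum-weight spanning tree (here via Kruskal's algorithm with union-find). The `mi` argument marks where a different mutual information estimator would be swapped in; the default is the empirical (plug-in) estimate, i.e. the classical MLE version. The authors' estimator is not reproduced, and all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Sequence

MIEstimator = Callable[[Sequence[int], Sequence[int]], float]


def empirical_mi(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Plug-in (empirical) mutual information in nats: the classical MLE choice."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def chow_liu(data: Sequence[Sequence[int]], mi: MIEstimator = empirical_mi) -> list[tuple[int, int]]:
    """Maximum-weight spanning tree over pairwise MI (Kruskal with union-find).

    `mi` is the pluggable step: swapping in a better MI estimator changes the
    edge weights but not the tree-building procedure.
    """
    d = len(data[0])
    cols = [[row[j] for row in data] for j in range(d)]
    weighted = sorted(
        ((mi(cols[i], cols[j]), i, j) for i, j in combinations(range(d), 2)), reverse=True
    )
    parent = list(range(d))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]  # X1 copies X0; X2 independent of both
print(chow_liu(samples))  # [(0, 1), (1, 2)]
```

The strongly dependent pair (0, 1) is always selected first; the edge attaching the independent variable 2 is a tie broken by sort order, which is exactly where noisy MI estimates can mislead the classical algorithm on small samples.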
