Researcher profile

Jianqing Fan

Jianqing Fan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
21works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

21 published item(s)

preprint2023arXiv

Ranking Inferences Based on the Top Choice of Multiway Comparisons

This paper considers ranking inference of $n$ items based on the observed data on the top choice among $M$ randomly selected items at each trial. This is a useful modification of the Plackett-Luce model for $M$-way ranking with only the top choice observed and is an extension of the celebrated Bradley-Terry-Luce model that corresponds to $M=2$. Under a uniform sampling scheme in which any $M$ distinguished items are selected for comparisons with probability $p$ and the selected $M$ items are compared $L$ times with multinomial outcomes, we establish the statistical rates of convergence for underlying $n$ preference scores using both $\ell_2$-norm and $\ell_\infty$-norm, with the minimum sampling complexity. In addition, we establish the asymptotic normality of the maximum likelihood estimator that allows us to construct confidence intervals for the underlying scores. Furthermore, we propose a novel inference framework for ranking items through a sophisticated maximum pairwise difference statistic whose distribution is estimated via a valid Gaussian multiplier bootstrap. The estimated distribution is then used to construct simultaneous confidence intervals for the differences in the preference scores and the ranks of individual items. They also enable us to address various inference questions on the ranks of these items. Extensive simulation studies lend further support to our theoretical results. A real data application illustrates the usefulness of the proposed methods convincingly.

preprint2022arXiv

An $\ell_p$ theory of PCA and spectral clustering

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing study of PCA focuses on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of individual principal component scores that yield low-dimensional embedding of samples. That hinders the analysis of various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon the vanilla PCA in the presence of heteroscedastic noises. Through a novel $\ell_p$ analysis of eigenvectors, we investigate entrywise behaviors of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$ and $\ell_\infty$ as special examples. For sub-Gaussian mixture models, the choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These also provide optimal recovery results for Gaussian mixture and stochastic block models as special cases.

preprint2022arXiv

Are Latent Factor Regression and Sparse Regression Adequate?

We propose the Factor Augmented sparse linear Regression Model (FARM) that not only encompasses both the latent factor regression and sparse linear regression as special cases but also bridges dimension reduction and sparse regression together. We provide theoretical guarantees for the estimation of our model under the existence of sub-Gaussian and heavy-tailed noises (with bounded (1+x)-th moment, for all x>0), respectively. In addition, the existing works on supervised learning often assume the latent factor regression or the sparse linear regression is the true underlying model without justifying its adequacy. To fill in such an important gap, we also leverage our model as the alternative model to test the sufficiency of the latent factor regression and the sparse linear regression models. To accomplish these goals, we propose the Factor-Adjusted de-Biased Test (FabTest) and a two-stage ANOVA type test respectively. We also conduct large-scale numerical experiments including both synthetic and FRED macroeconomics data to corroborate the theoretical properties of our methods. Numerical results illustrate the robustness and effectiveness of our model against latent factor regression and sparse linear regression models.

preprint2022arXiv

Bridging factor and sparse models

Factor and sparse models are two widely used methods to impose a low-dimensional structure in high-dimensions. However, they are seemingly mutually exclusive. We propose a lifting method that combines the merits of these two models in a supervised learning methodology that allows for efficiently exploring all the information in high-dimensional datasets. The method is based on a flexible model for high-dimensional panel data, called factor-augmented regression model with observable and/or latent common factors, as well as idiosyncratic components. This model not only includes both principal component regression and sparse regression as specific models but also significantly weakens the cross-sectional dependence and facilitates model selection and interpretability. The method consists of several steps and a novel test for (partial) covariance structure in high dimensions to infer the remaining cross-section dependence at each step. We develop the theory for the model and demonstrate the validity of the multiplier bootstrap for testing a high-dimensional (partial) covariance structure. The theory is supported by a simulation study and applications.

preprint2022arXiv

Do We Exploit all Information for Counterfactual Analysis? Benefits of Factor Models and Idiosyncratic Correction

Optimal pricing, i.e., determining the price level that maximizes profit or revenue of a given product, is a vital task for the retail industry. To select such a quantity, one needs first to estimate the price elasticity from the product demand. Regression methods usually fail to recover such elasticities due to confounding effects and price endogeneity. Therefore, randomized experiments are typically required. However, elasticities can be highly heterogeneous depending on the location of stores, for example. As the randomization frequently occurs at the municipal level, standard difference-in-differences methods may also fail. Possible solutions are based on methodologies to measure the effects of treatments on a single (or just a few) treated unit(s) based on counterfactuals constructed from artificial controls. For example, for each city in the treatment group, a counterfactual may be constructed from the untreated locations. In this paper, we apply a novel high-dimensional statistical method to measure the effects of price changes on daily sales from a major retailer in Brazil. The proposed methodology combines principal components (factors) and sparse regressions, resulting in a method called Factor-Adjusted Regularized Method for Treatment evaluation (\texttt{FarmTreat}). The data consist of daily sales and prices of five different products over more than 400 municipalities. The products considered belong to the \emph{sweet and candies} category and experiments have been conducted over the years of 2016 and 2017. Our results confirm the hypothesis of a high degree of heterogeneity yielding very different pricing strategies over distinct municipalities.

preprint2022arXiv

How do noise tails impact on deep ReLU networks?

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a low bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.

preprint2022arXiv

Policy Optimization Using Semi-parametric Models for Dynamic Pricing

In this paper, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to Javanmard and Nazerzadeh [2019] except that we expand the demand curve to a semiparametric model and need to learn dynamically both parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision-making to minimize regret (maximize revenue). Under mild conditions, we show that for a market noise c.d.f. $F(\cdot)$ with $m$-th order derivative ($m\geq 2$), our policy achieves a regret upper bound of $\tilde{O}_{d}(T^{\frac{2m+1}{4m-1}})$, where $T$ is time horizon and $\tilde{O}_{d}$ is the order that hides logarithmic terms and the dimensionality of feature $d$. The upper bound is further reduced to $\tilde{O}_{d}(\sqrt{T})$ if $F$ is super smooth whose Fourier transform decays exponentially. In terms of dependence on the horizon $T$, these upper bounds are close to $Ω(\sqrt{T})$, the lower bound where $F$ belongs to a parametric class. We further generalize these results to the case with dynamically dependent product features under the strong mixing condition.

preprint2022arXiv

Robust Matrix Completion with Heavy-tailed Noise

This paper studies low-rank matrix completion in the presence of heavy-tailed and possibly asymmetric noise, where we aim to estimate an underlying low-rank matrix given a set of highly incomplete noisy entries. Though the matrix completion problem has attracted much attention in the past decade, there is still lack of theoretical understanding when the observations are contaminated by heavy-tailed noises. Prior theory falls short of explaining the empirical results and is unable to capture the optimal dependence of the estimation error on the noise level. In this paper, we adopt an adaptive Huber loss to accommodate heavy-tailed noise, which is robust against large and possibly asymmetric errors when the parameter in the loss function is carefully designed to balance the Huberization biases and robustness to outliers. Then, we propose an efficient nonconvex algorithm via a balanced low-rank Burer-Monteiro matrix factorization and gradient decent with robust spectral initialization. We prove that under merely bounded second moment condition on the error distributions, rather than the sub-Gaussian assumption, the Euclidean error of the iterates generated by the proposed algorithm decrease geometrically fast until achieving a minimax-optimal statistical estimation error, which has the same order as that in the sub-Gaussian case. The key technique behind this significant advancement is a powerful leave-one-out analysis framework. The theoretical results are corroborated by our simulation studies.

preprint2022arXiv

Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments

We study offline reinforcement learning under a novel model called strategic MDP, which characterizes the strategic interactions between a principal and a sequence of myopic agents with private types. Due to the bilevel structure and private types, strategic MDP involves information asymmetry between the principal and the agents. We focus on the offline RL problem, where the goal is to learn the optimal policy of the principal concerning a target population of agents based on a pre-collected dataset that consists of historical interactions. The unobserved private types confound such a dataset as they affect both the rewards and observations received by the principal. We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN), which leverages the ideas of instrumental variable regression and the pessimism principle to learn a near-optimal principal's policy in the context of general function approximation. Our algorithm is based on the critical observation that the principal's actions serve as valid instrumental variables. In particular, under a partial coverage assumption on the offline dataset, we prove that PLAN outputs a $1 / \sqrt{K}$-optimal policy with $K$ being the number of collected trajectories. We further apply our framework to some special cases of strategic MDP, including strategic regression, strategic bandit, and noncompliance in recommendation systems.

preprint2022arXiv

The Efficacy of Pessimism in Asynchronous Q-Learning

This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by the recent advances in offline reinforcement learning, we develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, which penalizes infrequently-visited state-action pairs based on suitable lower confidence bounds (LCBs). This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. Our approach permits the observed data in some important scenarios to cover only partial state-action space, which is in stark contrast to prior theory that requires uniform coverage of all state-action pairs. When coupled with the idea of variance reduction, asynchronous Q-learning with LCB penalization achieves near-optimal sample complexity, provided that the target accuracy level is small enough. In comparison, prior works were suboptimal in terms of the dependency on the effective horizon even when i.i.d. sampling is permitted. Our results deliver the first theoretical support for the use of pessimism principle in the presence of Markovian non-i.i.d. data.

preprint2021arXiv

Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data

This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed as robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal, which we strengthen in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_{\infty}$ loss. All of this happens even when nearly a constant fraction of observations are corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm, and hence the title of this paper.

preprint2021arXiv

Optimal estimation of functionals of high-dimensional mean and covariance matrix

Motivated by portfolio allocation and linear discriminant analysis, we consider estimating a functional $\mathbfμ^T \mathbfΣ^{-1} \mathbfμ$ involving both the mean vector $\mathbfμ$ and covariance matrix $\mathbfΣ$. We study the minimax estimation of the functional in the high-dimensional setting where $\mathbfΣ^{-1} \mathbfμ$ is sparse. Akin to past works on functional estimation, we show that the optimal rate for estimating the functional undergoes a phase transition between regular parametric rate and some form of high-dimensional estimation rate. We further show that the optimal rate is attained by a carefully designed plug-in estimator based on de-biasing, while a family of naive plug-in estimators are proved to fall short. We further generalize the estimation problem and techniques that allow robust inputs of mean and covariance matrix estimators. Extensive numerical experiments lend further supports to our theoretical results.

preprint2021arXiv

The Interplay of Demographic Variables and Social Distancing Scores in Deep Prediction of U.S. COVID-19 Cases

With the severity of the COVID-19 outbreak, we characterize the nature of the growth trajectories of counties in the United States using a novel combination of spectral clustering and the correlation matrix. As the U.S. and the rest of the world are experiencing a severe second wave of infections, the importance of assigning growth membership to counties and understanding the determinants of the growth are increasingly evident. Subsequently, we select the demographic features that are most statistically significant in distinguishing the communities. Lastly, we effectively predict the future growth of a given county with an LSTM using three social distancing scores. This comprehensive study captures the nature of counties' growth in cases at a very micro-level using growth communities, demographic factors, and social distancing performance to help government agencies utilize known information to make appropriate decisions regarding which potential counties to target resources and funding to.

preprint2020arXiv

A Theoretical Analysis of Deep Q-Learning

Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood. In this work, we make the first attempt to theoretically understand the deep Q-network (DQN) algorithm (Mnih et al., 2015) from both algorithmic and statistical perspectives. In specific, we focus on a slight simplification of DQN that fully captures its key features. Under mild assumptions, we establish the algorithmic and statistical rates of convergence for the action-value functions of the iterative policy sequence obtained by DQN. In particular, the statistical error characterizes the bias and variance that arise from approximating the action-value function using deep neural network, while the algorithmic error converges to zero at a geometric rate. As a byproduct, our analysis provides justifications for the techniques of experience replay and target network, which are crucial to the empirical success of DQN. Furthermore, as a simple extension of DQN, we propose the Minimax-DQN algorithm for zero-sum Markov game with two players. Borrowing the analysis of DQN, we also quantify the difference between the policies obtained by Minimax-DQN and the Nash equilibrium of the Markov game in terms of both the algorithmic and statistical rates of convergence.

preprint2020arXiv

Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices

This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 and symmetric matrix $\mathbf{M}^{\star}\in \mathbb{R}^{n\times n}$, yet only a randomly perturbed version $\mathbf{M}$ is observed. The noise matrix $\mathbf{M}-\mathbf{M}^{\star}$ is composed of zero-mean independent (but not necessarily homoscedastic) entries and is, therefore, not symmetric in general. This might arise, for example, when we have two independent samples for each entry of $\mathbf{M}^{\star}$ and arrange them into an {\em asymmetric} data matrix $\mathbf{M}$. The aim is to estimate the leading eigenvalue and eigenvector of $\mathbf{M}^{\star}$. We demonstrate that the leading eigenvalue of the data matrix $\mathbf{M}$ can be $O(\sqrt{n})$ times more accurate --- up to some log factor --- than its (unadjusted) leading singular value in eigenvalue estimation. Further, the perturbation of any linear form of the leading eigenvector of $\mathbf{M}$ --- say, entrywise eigenvector perturbation --- is provably well-controlled. This eigen-decomposition approach is fully adaptive to heteroscedasticity of noise without the need of careful bias correction or any prior knowledge about the noise variance. We also provide partial theory for the more general rank-$r$ case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition could sometimes be beneficial.

preprint2020arXiv

Bootstrapping $\ell_p$-Statistics in High Dimensions

This paper considers a new bootstrap procedure to estimate the distribution of high-dimensional $\ell_p$-statistics, i.e. the $\ell_p$-norms of the sum of $n$ independent $d$-dimensional random vectors with $d \gg n$ and $p \in [1, \infty]$. We provide a non-asymptotic characterization of the sampling distribution of $\ell_p$-statistics based on Gaussian approximation and show that the bootstrap procedure is consistent in the Kolmogorov-Smirnov distance under mild conditions on the covariance structure of the data. As an application of the general theory we propose a bootstrap hypothesis test for simultaneous inference on high-dimensional mean vectors. We establish its asymptotic correctness and consistency under high-dimensional alternatives, and discuss the power of the test as well as the size of associated confidence sets. We illustrate the bootstrap and testing procedure numerically on simulated data.

preprint2020arXiv

Community Network Auto-Regression for High-Dimensional Time Series

Modeling responses on the nodes of a large-scale network is an important task that arises commonly in practice. This paper proposes a community network vector autoregressive (CNAR) model, which utilizes the network structure to characterize the dependence and intra-community homogeneity of the high dimensional time series. The CNAR model greatly increases the flexibility and generality of the network vector autoregressive (Zhu et al, 2017, NAR) model by allowing heterogeneous network effects across different network communities. In addition, the non-community-related latent factors are included to account for unknown cross-sectional dependence. The number of network communities can diverge as the network expands, which leads to estimating a diverging number of model parameters. We obtain a set of stationary conditions and develop an efficient two-step weighted least-squares estimator. The consistency and asymptotic normality properties of the estimators are established. The theoretical results show that the two-step estimator improves the one-step estimator by an order of magnitude when the error admits a factor structure. The advantages of the CNAR model are further illustrated on a variety of synthetic and real datasets.

preprint2020arXiv

Hypothesis testing for eigenspaces of covariance matrix

Eigenspaces of covariance matrices play an important role in statistical machine learning, arising in variety of modern algorithms. Quantitatively, it is convenient to describe the eigenspaces in terms of spectral projectors. This work focuses on hypothesis testing for the spectral projectors, both in one- and two-sample scenario. We present new tests, based on a specific matrix norm developed in order to utilize the structure of the spectral projectors. A new resampling technique of independent interest is introduced and analyzed: it serves as an alternative to the well-known multiplier bootstrap, significantly reducing computational complexity of bootstrap-based methods. We provide theoretical guarantees for the type-I error of our procedures, which remarkably improve the previously obtained results in the field. Moreover, we analyze power of our tests. Numerical experiments illustrate good performance of the proposed methods compared to previously developed ones.

preprint2020arXiv

Learning Latent Factors from Diversified Projections and its Applications to Over-Estimated and Weak Factors

Estimations and applications of factor models often rely on the crucial condition that the number of latent factors is consistently estimated, which in turn also requires that factors be relatively strong, data are stationary and weak serial dependence, and the sample size be fairly large, although in practical applications, one or several of these conditions may fail. In these cases it is difficult to analyze the eigenvectors of the data matrix. To address this issue, we propose simple estimators of the latent factors using cross-sectional projections of the panel data, by weighted averages with pre-determined weights. These weights are chosen to diversify away the idiosyncratic components, resulting in "diversified factors". Because the projections are conducted cross-sectionally, they are robust to serial conditions, easy to analyze and work even for finite length of time series. We formally prove that this procedure is robust to over-estimating the number of factors, and illustrate it in several applications, including post-selection inference, big data forecasts, large covariance estimation and factor specification tests. We also recommend several choices for the diversified weights.

preprint2020arXiv

Recent Developments on Factor Models and its Applications in Econometric Learning

This paper makes a selective survey on the recent development of the factor model and its application on statistical learnings. We focus on the perspective of the low-rank structure of factor models, and particularly draws attentions to estimating the model from the low-rank recovery point of view. The survey mainly consists of three parts: the first part is a review on new factor estimations based on modern techniques on recovering low-rank structures of high-dimensional models. The second part discusses statistical inferences of several factor-augmented models and applications in econometric learning models. The final part summarizes new developments dealing with unbalanced panels from the matrix completion perspective.

preprint2019arXiv

Testability of high-dimensional linear models with non-sparse structures

Understanding statistical inference under possibly non-sparse high-dimensional models has gained much interest recently. For a given component of the regression coefficient, we show that the difficulty of the problem depends on the sparsity of the corresponding row of the precision matrix of the covariates, not the sparsity of the regression coefficients. We develop new concepts of uniform and essentially uniform non-testability that allow the study of limitations of tests across a broad set of alternatives. Uniform non-testability identifies a collection of alternatives such that the power of any test, against any alternative in the group, is asymptotically at most equal to the nominal size. Implications of the new constructions include new minimax testability results that, in sharp contrast to the current results, do not depend on the sparsity of the regression parameters. We identify new tradeoffs between testability and feature correlation. In particular, we show that, in models with weak feature correlations, minimax lower bound can be attained by a test whose power has the $\sqrt{n}$ rate, regardless of the size of the model sparsity.