Source author record

Weidong Liu

Weidong Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, Machine Learning, math.ST, Statistics Theory, Applications, Computation, Information Retrieval, math.PR

Catalog footprint

What is connected

18 works

8 topics

4 close collaborators

preprint 2024 arXiv

Efficient Sparse Least Absolute Deviation Regression with Differential Privacy

In recent years, privacy-preserving machine learning algorithms have attracted increasing attention because of their important applications in many scientific fields. However, most privacy-preserving algorithms in the literature require learning objectives to be strongly convex and Lipschitz smooth, and thus cannot cover a wide class of robust loss functions (e.g., quantile/least absolute loss). In this work, we aim to develop a fast privacy-preserving learning solution for a sparse robust regression problem. Our learning loss consists of a robust least absolute loss and an $\ell_1$ sparse penalty term. To solve the non-smooth loss quickly under a given privacy budget, we develop a Fast Robust And Privacy-Preserving Estimation (FRAPPE) algorithm for least absolute deviation regression. Our algorithm achieves fast estimation by reformulating the sparse LAD problem as a penalized least squares estimation problem and adopts a three-stage noise injection to guarantee $(ε,δ)$-differential privacy. We show that our algorithm achieves a better trade-off between privacy and statistical accuracy than state-of-the-art privacy-preserving regression algorithms. Finally, we conduct experiments to verify the efficiency of our proposed FRAPPE algorithm.

preprint 2022 arXiv

Fast and Robust Sparsity Learning over Networks: A Decentralized Surrogate Median Regression Approach

Decentralized sparsity learning has attracted a significant amount of attention recently due to its rapidly growing applications. To obtain robust and sparse estimators, a natural idea is to adopt the non-smooth median loss combined with an $\ell_1$ sparsity regularizer. However, most of the existing methods suffer from slow convergence caused by the doubly non-smooth objective. To accelerate the computation, in this paper we propose a decentralized surrogate median regression (deSMR) method for efficiently solving the decentralized sparsity learning problem. We show that our proposed algorithm enjoys a linear convergence rate with a simple implementation. We also investigate the statistical guarantee and show that our proposed estimator achieves a near-oracle convergence rate without any restriction on the number of network nodes. Moreover, we establish theoretical results for sparse support recovery. Thorough numerical experiments and a real data study are provided to demonstrate the effectiveness of our method.

preprint 2021 arXiv

First-order Newton-type Estimator for Distributed Estimation and Inference

This paper studies distributed estimation and inference for a general statistical problem with a convex loss that could be non-differentiable. For the purpose of efficient computation, we restrict ourselves to stochastic first-order optimization, which enjoys low per-iteration complexity. To motivate the proposed method, we first investigate the theoretical properties of a straightforward Divide-and-Conquer Stochastic Gradient Descent (DC-SGD) approach. Our theory shows that there is a restriction on the number of machines and this restriction becomes more stringent when the dimension $p$ is large. To overcome this limitation, this paper proposes a new multi-round distributed estimation procedure that approximates the Newton step only using stochastic subgradient. The key component in our method is the proposal of a computationally efficient estimator of $Σ^{-1} w$, where $Σ$ is the population Hessian matrix and $w$ is any given vector. Instead of estimating $Σ$ (or $Σ^{-1}$) that usually requires the second-order differentiability of the loss, the proposed First-Order Newton-type Estimator (FONE) directly estimates the vector of interest $Σ^{-1} w$ as a whole and is applicable to non-differentiable losses. Our estimator also facilitates the inference for the empirical risk minimizer. It turns out that the key term in the limiting covariance has the form of $Σ^{-1} w$, which can be estimated by FONE.

preprint 2021 arXiv

Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference

This paper develops an efficient distributed inference algorithm, which is robust against a moderate fraction of Byzantine nodes, namely arbitrary and possibly adversarial machines in a distributed learning system. In robust statistics, the median-of-means (MOM) has been a popular approach to hedge against Byzantine failures due to its ease of implementation and computational efficiency. However, the MOM estimator falls short in statistical efficiency. The first main contribution of the paper is to propose a variance reduced median-of-means (VRMOM) estimator, which improves the statistical efficiency over the vanilla MOM estimator and is computationally as efficient as the MOM. Based on the proposed VRMOM estimator, we develop a general distributed inference algorithm that is robust against Byzantine failures. Theoretically, our distributed algorithm achieves a fast convergence rate with only a constant number of rounds of communication. We also provide an asymptotic normality result for the purpose of statistical inference. To the best of our knowledge, this is the first normality result in the setting of Byzantine-robust distributed learning. Simulation results are also presented to illustrate the effectiveness of our method.
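For readers unfamiliar with the vanilla median-of-means idea mentioned in this abstract, the following is a minimal Python sketch of the textbook estimator for a scalar mean. It is not the paper's VRMOM estimator or its distributed algorithm; the function name `median_of_means`, the random block assignment, and the `seed` parameter are illustrative assumptions.

```python
import random
import statistics


def median_of_means(samples, num_blocks, seed=0):
    """Vanilla median-of-means for a scalar mean (illustrative sketch only).

    The samples are shuffled, split into `num_blocks` groups, each group is
    averaged, and the median of the group averages is returned.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # hypothetical choice: random blocks
    # Block i takes every num_blocks-th element starting at position i.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    block_means = [sum(block) / len(block) for block in blocks]
    return statistics.median(block_means)


# One wild value ruins the plain average but can corrupt at most one block mean.
data = [1.0] * 99 + [1e6]
plain_mean = sum(data) / len(data)
robust_mean = median_of_means(data, num_blocks=11)
```

In a distributed setting each machine would play the role of one block, so a minority of corrupted machines can shift only a minority of the block means, and the median ignores them.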

preprint 2020 arXiv

Distributed High-dimensional Regression Under a Quantile Loss Function

This paper studies distributed estimation and support recovery for the high-dimensional linear regression model with heavy-tailed noise. To deal with heavy-tailed noise whose variance can be infinite, we adopt the quantile regression loss function instead of the commonly used squared loss. However, the non-smooth quantile loss poses new challenges to high-dimensional distributed estimation in both computation and theoretical development. To address these challenges, we transform the response variable and establish a new connection between quantile regression and ordinary linear regression. Then, we provide a distributed estimator that is efficient in both computation and communication, where only the gradient information is communicated at each iteration. Theoretically, we show that, after a constant number of iterations, the proposed estimator achieves a near-oracle convergence rate without any restriction on the number of machines. Moreover, we establish a theoretical guarantee for support recovery. A simulation analysis is provided to demonstrate the effectiveness of our method.
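As background for the quantile loss named in this abstract, here is a small Python sketch of the standard check loss and the empirical risk it induces for a linear model. It illustrates only the loss itself, not the paper's transformed-response estimator; the function names and the plain-list data layout are assumptions made for the example.

```python
def quantile_loss(residual, tau=0.5):
    """Standard check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def empirical_quantile_risk(X, y, beta, tau=0.5):
    """Average check loss of the linear fit x . beta over the sample.

    X is a list of feature rows, y a list of responses, beta a list of
    coefficients (all hypothetical containers chosen for illustration).
    """
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(x_j * b_j for x_j, b_j in zip(row, beta))
        total += quantile_loss(target - prediction, tau)
    return total / len(y)
```

With `tau=0.5` the loss is half the absolute residual, which is why median regression and least absolute deviation regression coincide. The loss has a kink at zero, which is the non-smoothness the abstract refers to.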

preprint 2020 arXiv

Median Matrix Completion: from Embarrassment to Optimality

In this paper, we consider matrix completion with absolute deviation loss and obtain an estimator of the median matrix. Despite several appealing properties of the median, the non-smooth absolute deviation loss poses computational challenges for large-scale data sets, which are increasingly common among matrix completion problems. A simple solution to large-scale problems is parallel computing. However, an embarrassingly parallel approach often leads to inefficient estimators. Based on the idea of pseudo data, we propose a novel refinement step, which turns such inefficient estimators into a rate (near-)optimal matrix completion procedure. The refined estimator is an approximation of a regularized least median estimator, and therefore not an ordinary regularized empirical risk estimator. This leads to a non-standard analysis of asymptotic behaviors. Empirical results are also provided to confirm the effectiveness of the proposed method.

preprint 2020 arXiv

Neural Interactive Collaborative Filtering

In this paper, we study collaborative filtering in an interactive setting, in which the recommender agents iterate between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., recommend for cold-start users or warm-start users with taste drifting. Existing approaches either rely on overly pessimistic linear exploration strategy or adopt meta-learning based algorithms in a full exploitation way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and directly learn it from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction in the recommender systems. The key insight is that the satisfied recommendations triggered by the exploration recommendation can be viewed as the exploration bonus (delayed reward) for its contribution on improving the quality of the user profile. Therefore, the proposed exploration policy, to balance between learning the user profile and making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets have demonstrated the advantage of our method over state-of-the-art methods.

preprint 2016 arXiv

Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions

This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset.

preprint 2014 arXiv

Incorporation of Sparsity Information in Large-scale Multiple Two-sample $t$ Tests

Large-scale multiple two-sample Student's $t$ testing problems often arise from the statistical analysis of scientific data. To detect components with different values between two mean vectors, a well-known procedure is to apply the Benjamini and Hochberg (B-H) method and two-sample Student's $t$ statistics to control the false discovery rate (FDR). In many applications, mean vectors are expected to be sparse or asymptotically sparse. When dealing with this type of data, can we gain more power than a standard procedure such as the B-H method with Student's $t$ statistics while keeping the FDR under control? The answer is positive. By exploiting the possible sparsity information in mean vectors, we present an uncorrelated screening-based (US) FDR control procedure, which is shown to be more powerful than the B-H method. The US testing procedure depends on a novel construction of screening statistics, which are asymptotically uncorrelated with two-sample Student's $t$ statistics. The US testing procedure is different from some existing "testing following screening" methods (Reiner, et al., 2007; Yekutieli, 2008), in which independence between screening and testing is crucial to control the FDR, while the independence often requires additional data or splitting of samples. An inappropriate splitting of samples may result in a loss rather than an improvement of statistical power. Instead, the uncorrelated screening US is based on the original data and does not need to split the samples. Theoretical results show that the US testing procedure controls the desired FDR asymptotically. Numerical studies are conducted and indicate that the proposed procedure works quite well.

preprint 2014 arXiv

Komlós-Major-Tusnády approximation under dependence

The celebrated results of Komlós, Major and Tusnády [Z. Wahrsch. Verw. Gebiete 32 (1975) 111-131; Z. Wahrsch. Verw. Gebiete 34 (1976) 33-58] give optimal Wiener approximation for the partial sums of i.i.d. random variables and provide a powerful tool in probability and statistics. In this paper we extend the KMT approximation to a large class of dependent stationary processes, solving a long-standing open problem in probability theory. Under the framework of stationary causal processes and functional dependence measures of Wu [Proc. Natl. Acad. Sci. USA 102 (2005) 14150-14154], we show that, under natural moment conditions, the partial sum processes can be approximated by a Wiener process with an optimal rate. Our dependence conditions are mild and easily verifiable. The results are applied to ergodic sums, as well as to nonlinear time series and Volterra processes, an important class of nonlinear processes.

preprint 2013 arXiv

A Cramér moderate deviation theorem for Hotelling's $T^2$-statistic with applications to global tests

A Cramér moderate deviation theorem for Hotelling's $T^2$-statistic is proved under a finite $(3+δ)$th moment. The result is applied to large scale tests on the equality of mean vectors, and it is shown that the number of tests can be as large as $e^{o(n^{1/3})}$ before the chi-squared distribution calibration becomes inaccurate. As an application of the moderate deviation results, a global test on the equality of $m$ mean vectors based on the maximum of Hotelling's $T^2$-statistics is developed and its asymptotic null distribution is shown to be an extreme value type I distribution. A novel intermediate approximation to the null distribution is proposed to improve the slow convergence rate of the extreme distribution approximation. Numerical studies show that the new test procedure works well even for a small sample size and performs favorably in analyzing a breast cancer dataset.

preprint 2013 arXiv

Gaussian Graphical Model Estimation with False Discovery Rate Control

This paper studies the estimation of the high-dimensional Gaussian graphical model (GGM). Typically, the existing methods depend on regularization techniques. As a result, it is necessary to choose the regularization parameter. However, the precise relationship between the regularization parameter and the number of false edges in GGM estimation is unclear. Hence, it is impossible to evaluate their performance rigorously. In this paper, we propose an alternative method based on a multiple testing procedure. Based on our new test statistics for conditional dependence, we propose a simultaneous testing procedure for conditional dependence in GGM. Our method can control the false discovery rate (FDR) asymptotically. Numerical results show that our method works quite well.

preprint 2013 arXiv

Phase Transition and Regularized Bootstrap in Large Scale $t$-tests with False Discovery Rate Control

Applying the Benjamini and Hochberg (B-H) method to multiple Student's $t$ tests is a popular technique for gene selection in microarray data analysis. Because of the non-normality of the population, the true p-values of the hypothesis tests are typically unknown. Hence, it is common to use the standard normal distribution N(0,1), Student's $t$ distribution $t_{n-1}$ or the bootstrap method to estimate the p-values. In this paper, we first study N(0,1) and $t_{n-1}$ calibrations. We prove that, when the population has a finite fourth moment and the dimension $m$ and the sample size $n$ satisfy $\log m=o(n^{1/3})$, the B-H method controls the false discovery rate (FDR) at a given level $α$ asymptotically with p-values estimated from the N(0,1) or $t_{n-1}$ distribution. However, a phase transition phenomenon occurs when $\log m\geq c_{0}n^{1/3}$. In this case, the FDR of the B-H method may be larger than $α$ or even tend to one. In contrast, the bootstrap calibration is accurate for $\log m=o(n^{1/2})$ as long as the underlying distribution has sub-Gaussian tails. However, such a light-tailed condition cannot be weakened in general. The simulation study shows that for heavy-tailed distributions, the bootstrap calibration is very conservative. In order to solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than the usual bootstrap method.
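The B-H method referred to in this abstract is the standard step-up procedure applied to a list of p-values. The sketch below shows that textbook procedure in Python, assuming the p-values have already been computed by some calibration (normal, $t_{n-1}$, or bootstrap). The function name and the return convention are illustrative assumptions, and the paper's regularized bootstrap correction is not implemented here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Textbook Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns the sorted indices of the rejected hypotheses: find the largest
    rank k with p_(k) <= k * alpha / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    num_rejected = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            num_rejected = rank  # keep the largest rank that passes
    return sorted(order[:num_rejected])


# Hypothetical p-values, e.g. from a N(0,1) or t calibration of t statistics.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_p, alpha=0.05)
```

Because the procedure only sees p-values, any inaccuracy in how they are calibrated feeds directly into the rejection set, which is the sensitivity the abstract analyzes.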

preprint 2013 arXiv

Self-normalized Cramér type moderate deviations for the maximum of sums

Let $X_1,X_2,...$ be independent random variables with zero means and finite variances, and let $S_n=\sum_{i=1}^nX_i$ and $V^2_n=\sum_{i=1}^nX^2_i$. A Cramér type moderate deviation for the maximum of the self-normalized sums $\max_{1\leq k\leq n}S_k/V_n$ is obtained. In particular, for identically distributed $X_1,X_2,...,$ it is proved that $P(\max_{1\leq k\leq n}S_k\geq xV_n)/(1-Φ(x))\rightarrow2$ uniformly for $0<x\leq\mathrm{o}(n^{1/6})$ under the optimal finite third moment of $X_1$.

preprint 2012 arXiv

Estimating Sparse Precision Matrix: Optimal Rates of Convergence and Adaptive Estimation

The precision matrix is of significant importance in a wide range of applications in multivariate analysis. This paper considers adaptive minimax estimation of sparse precision matrices in the high dimensional setting. Optimal rates of convergence are established for a range of matrix norm losses. A fully data driven estimator based on adaptive constrained $\ell_1$ minimization is proposed and its rate of convergence is obtained over a collection of parameter spaces. The estimator, called ACLIME, is easy to implement and performs well numerically. A major step in establishing the minimax rate of convergence is the derivation of a rate-sharp lower bound. A "two-directional" lower bound technique is applied to obtain the minimax lower bound. The upper and lower bounds together yield the optimal rates of convergence for sparse precision matrix estimation and show that the ACLIME estimator is adaptively minimax rate optimal for a collection of parameter spaces and a range of matrix norm losses simultaneously.

preprint 2011 arXiv

A Direct Estimation Approach to Sparse Linear Discriminant Analysis

This paper considers sparse linear discriminant analysis of high-dimensional data. In contrast to the existing methods which are based on separate estimation of the precision matrix $\Omega$ and the difference $\delta$ of the mean vectors, we introduce a simple and effective classifier by estimating the product $\Omega\delta$ directly through constrained $\ell_1$ minimization. The estimator can be implemented efficiently using linear programming and the resulting classifier is called the linear programming discriminant (LPD) rule. The LPD rule is shown to have desirable theoretical and numerical properties. It exploits the approximate sparsity of $\Omega\delta$ and as a consequence allows cases where it can still perform well even when $\Omega$ and/or $\delta$ cannot be estimated consistently. Asymptotic properties of the LPD rule are investigated and consistency and rate of convergence results are given. The LPD classifier has superior finite sample performance and significant computational advantages over the existing methods that require separate estimation of $\Omega$ and $\delta$. The LPD rule is also applied to analyze real datasets from lung cancer and leukemia studies. The classifier performs favorably in comparison to existing methods.

preprint 2011 arXiv

On non-stationary threshold autoregressive models

In this paper we study the limiting distributions of the least-squares estimators for the non-stationary first-order threshold autoregressive (TAR(1)) model. It is proved that the limiting behaviors of the TAR(1) process are very different from those of the classical unit root model and the explosive AR(1).

preprint 2010 arXiv

Simultaneous nonparametric inference of time series

We consider kernel estimation of marginal densities and regression functions of stationary processes. It is shown that for a wide class of time series, with proper centering and scaling, the maximum deviations of kernel density and regression estimates are asymptotically Gumbel. Our results substantially generalize earlier ones which were obtained under independence or beta mixing assumptions. The asymptotic results can be applied to assess patterns of marginal densities or regression functions via the construction of simultaneous confidence bands for which one can perform goodness-of-fit tests. As an application, we construct simultaneous confidence bands for drift and volatility functions in a dynamic short-term rate model for the U.S. Treasury yield curve rates data.
