Source author record

Lie Wang

Lie Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Machine Learning math.ST Statistics Theory

Catalog footprint

What is connected

7works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2015arXiv

Neyman-Pearson Classification under High-Dimensional Settings

Most existing binary classification methods target on the optimization of the overall classification risk and may fail to serve some real-world applications such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities. It seeks classifiers with a minimal type II error and a constrained type I error under a user specified level. This article is the first attempt to construct classifiers with guaranteed theoretical performance under the NP paradigm in high-dimensional settings. Based on the fundamental Neyman-Pearson Lemma, we used a plug-in approach to construct NP-type classifiers for Naive Bayes models. The proposed classifiers satisfy the NP oracle inequalities, which are natural NP paradigm counterparts of the oracle inequalities in classical binary classification. Besides their desirable theoretical properties, we also demonstrated their numerical advantages in prioritized error control via both simulation and real data studies.

preprint2014arXiv

Pivotal estimation via square-root Lasso in nonparametric regression

We propose a self-tuning $\sqrt{\mathrm {Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis, namely it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for $\sqrt{\mathrm {Lasso}}$ including prediction norm rate and sparsity. Our analysis is based on new impact factors that are tailored for bounding prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums to achieve Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least square (ols) applied to the model selected by $\sqrt{\mathrm {Lasso}}$ accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of ols post $\sqrt{\mathrm {Lasso}}$ is as good as $\sqrt{\mathrm {Lasso}}$'s rate. As an application, we consider the use of $\sqrt{\mathrm {Lasso}}$ and ols post $\sqrt{\mathrm {Lasso}}$ as estimators of nuisance parameters in a generic semiparametric problem (nonlinear moment condition or $Z$-problem), resulting in a construction of $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters.

preprint2013arXiv

Estimating the Ratio of Two Functions in a Nonparametric Regression Model

Due to measurement noise, a common problem in in various fields is how to estimate the ratio of two functions. We consider this problem of estimating the ratio of two functions in a nonparametric regression model. Assuming the noise is normally distributed, this is equivalent to estimating the ratio of the means of two normally distributed random variables. We identified a consistent estimator that gives the mean squared loss of order $O(1/n)$ ($n$ is the sample size) when conditioned on a highly probable event. We also present our result applied to both the real data from EAPS and on simulated data, confirming our theoretical results.

preprint2013arXiv

Optimization for Compressed Sensing: the Simplex Method and Kronecker Sparsification

In this paper we present two new approaches to efficiently solve large-scale compressed sensing problems. These two ideas are independent of each other and can therefore be used either separately or together. We consider all possibilities. For the first approach, we note that the zero vector can be taken as the initial basic (infeasible) solution for the linear programming problem and therefore, if the true signal is very sparse, some variants of the simplex method can be expected to take only a small number of pivots to arrive at a solution. We implemented one such variant and demonstrate a dramatic improvement in computation time on very sparse signals. The second approach requires a redesigned sensing mechanism in which the vector signal is stacked into a matrix. This allows us to exploit the Kronecker compressed sensing (KCS) mechanism. We show that the Kronecker sensing requires stronger conditions for perfect recovery compared to the original vector problem. However, the Kronecker sensing, modeled correctly, is a much sparser linear optimization problem. Hence, algorithms that benefit from sparse problem representation, such as interior-point methods, can solve the Kronecker sensing problems much faster than the corresponding vector problem. In our numerical studies, we demonstrate a ten-fold improvement in the computation time.

preprint2012arXiv

L1 penalized LAD estimator for high dimensional linear

In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables is larger than the number of observations. We investigate the L1 penalized least absolute deviation method. Different from most of other methods, the L1 penalized LAD method does not need any knowledge of standard deviation of the noises or any moment assumptions of the noises. Our analysis shows that the method achieves near oracle performance, i.e. with large probability, the L2 norm of the estimation error is of order $O(\sqrt{k \log p/n})$. The result is true for a wide range of noise distributions, even for the Cauchy distribution. Numerical results are also presented.

preprint2012arXiv

TIGER: A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models

We propose a new procedure for estimating high dimensional Gaussian graphical models. Our approach is asymptotically tuning-free and non-asymptotically tuning-insensitive: it requires very few efforts to choose the tuning parameter in finite sample settings. Computationally, our procedure is significantly faster than existing methods due to its tuning-insensitive property. Theoretically, the obtained estimator is simultaneously minimax optimal for precision matrix estimation under different norms. Empirically, we illustrate the advantages of our method using thorough simulated and real examples. The R package bigmatrix implementing the proposed methods is available on the Comprehensive R Archive Network: http://cran.r-project.org/.

preprint2011arXiv

Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming

We propose a pivotal method for estimating high-dimensional sparse linear regression models, where the overall number of regressors $p$ is large, possibly much larger than $n$, but only $s$ regressors are significant. The method is a modification of the lasso, called the square-root lasso. The method is pivotal in that it neither relies on the knowledge of the standard deviation $σ$ or nor does it need to pre-estimate $σ$. Moreover, the method does not rely on normality or sub-Gaussianity of noise. It achieves near-oracle performance, attaining the convergence rate $σ\{(s/n)\log p\}^{1/2}$ in the prediction norm, and thus matching the performance of the lasso with known $σ$. These performance results are valid for both Gaussian and non-Gaussian errors, under some mild moment restrictions. We formulate the square-root lasso as a solution to a convex conic programming problem, which allows us to implement the estimator using efficient algorithmic methods, such as interior-point and first-order methods.

Lie Wang

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Neyman-Pearson Classification under High-Dimensional Settings

Pivotal estimation via square-root Lasso in nonparametric regression

Estimating the Ratio of Two Functions in a Nonparametric Regression Model

Optimization for Compressed Sensing: the Simplex Method and Kronecker Sparsification

L1 penalized LAD estimator for high dimensional linear

TIGER: A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models

Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming