Researcher profile

Yijun Zuo

Yijun Zuo contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
5 topics
4 close collaborators

Published work

10 published item(s)

preprint, 2022, arXiv

Asymptotic normality of the least sum of squares of trimmed residuals estimator

To enhance the robustness of the classic least sum of squares (LS) of the residuals estimator, Zuo (2022) introduced the least sum of squares of trimmed (LST) residuals estimator. The LST enjoys many desirable properties and serves well as a robust alternative to the LS. Its asymptotic properties, including strong and root-n consistency, have been established, whereas its asymptotic normality has been left unaddressed. This article solves this remaining problem.

preprint, 2022, arXiv

Non-asymptotic analysis and inference for an outlyingness induced winsorized mean

Robust estimation of a mean vector, a topic regarded as obsolete in the traditional robust statistics community, has surged in the machine learning literature in the last decade. The latest focus is on the sub-Gaussian performance and computability of the estimators in a non-asymptotic setting. Numerous traditional robust estimators are computationally intractable, which partly contributes to the renewed interest in robust mean estimation. Robust centrality estimators, however, include the trimmed mean and the sample median. The latter has the best robustness but suffers from low efficiency. The trimmed mean and the median of means, as robust alternatives to the sample mean that achieve sub-Gaussian performance, have been proposed and studied in the literature. This article investigates the robustness of leading sub-Gaussian estimators of the mean and reveals that none of them can resist more than $25\%$ contamination in the data. It consequently introduces an outlyingness induced winsorized mean, which has the best possible robustness (it can resist up to $50\%$ contamination without breakdown) while achieving high efficiency. Furthermore, it has sub-Gaussian performance for uncontaminated samples and a bounded estimation error for contaminated samples at a given confidence level in a finite sample setting. It can be computed in linear time.
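As a rough illustration of the idea in the abstract above, the following univariate sketch pulls points with a large outlyingness score back to a boundary before averaging. The outlyingness |x - Med| / MAD, the cutoff `beta`, and the function name are assumptions made for illustration only; the abstract does not state the paper's exact definition, so this should not be read as the author's estimator.

```python
import numpy as np

def outlyingness_winsorized_mean(x, beta=2.5):
    """Winsorize points whose outlyingness |x - Med| / MAD exceeds beta, then average.

    Hypothetical sketch: the score and the default beta are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        # At least half of the points coincide with the median; fall back to it.
        return float(med)
    lo, hi = med - beta * mad, med + beta * mad
    # Points outside [lo, hi] are moved to the nearest boundary, not discarded.
    return float(np.clip(x, lo, hi).mean())

clean = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3])
dirty = clean.copy()
dirty[:3] = 1e6  # contaminate 3 of 8 points (37.5%)

print("sample mean, contaminated:    ", dirty.mean())
print("winsorized mean, clean:       ", outlyingness_winsorized_mean(clean))
print("winsorized mean, contaminated:", outlyingness_winsorized_mean(dirty))
```

Because the median and MAD stay bounded while fewer than half of the points are contaminated, the clipped average stays near the bulk of the data, whereas the plain sample mean does not.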

preprint, 2021, arXiv

Computation of projection regression depth and its induced median

Notions of depth in regression have been introduced and studied in the literature. The most famous example is regression depth (RD), which is a direct extension of location depth to regression. The projection regression depth (PRD) is the extension of another prevailing location depth, the projection depth, to regression. The computation issues of the RD have been discussed in the literature, whereas those of the PRD have never been dealt with before. This article addresses the computation of the PRD and its induced median (maximum depth estimator) in a regression setting. For a given $\boldsymbol{\beta}\in\mathbb{R}^p$, exact algorithms for the PRD with cost $O(n^2\log n)$ ($p=2$) and $O(N(n, p)(p^{3}+n\log n+np^{1.5}+npN_{Iter}))$ ($p>2$), and approximate algorithms for the PRD and its induced median with cost $O(N_{\mathbf{v}}np)$ and $O(Rp N_{\boldsymbol{\beta}}(p^2+nN_{\mathbf{v}}N_{Iter}))$, respectively, are proposed. Here $N(n, p)$ is a number defined based on the total number of $(p-1)$ dimensional hyperplanes formed by points induced from sample points and the $\boldsymbol{\beta}$; $N_{\mathbf{v}}$ is the total number of unit directions $\mathbf{v}$ utilized; $N_{\boldsymbol{\beta}}$ is the total number of candidate regression parameters $\boldsymbol{\beta}$ employed; $N_{Iter}$ is the total number of iterations carried out in an optimization algorithm; $R$ is the total number of replications. Furthermore, as the second major contribution, three PRD induced estimators are introduced, which can be computed up to 30 times faster than the PRD induced median while maintaining a similar level of accuracy. Examples and simulation studies reveal that the depth median induced from the PRD is favorable in terms of robustness and efficiency, compared to the maximum depth estimator induced from the RD, which is the current leading regression median.

preprint, 2020, arXiv

Depth induced regression medians and uniqueness

The notion of the median in one dimension is a foundational element in nonparametric statistics. It has been extended to multi-dimensional cases both in location and in regression via notions of data depth. Regression depth (RD) and projection regression depth (PRD) represent the two most promising notions in regression. Carrizosa depth $D_C$ is another depth notion in regression. Depth induced regression medians (maximum depth estimators) serve as robust alternatives to the classical least squares estimator. The uniqueness of regression medians is indispensable in the discussion of their properties and the asymptotics (consistency and limiting distribution) of sample regression medians. Are the regression medians induced from RD, PRD, and $D_C$ unique? Answering this question is the main goal of this article. It is found that only the regression median induced from PRD possesses the desired uniqueness property. The conventional remedy for non-uniqueness, taking the average of all medians, might yield an estimator that no longer possesses the maximum depth in both the RD and $D_C$ cases. These and other findings indicate that the PRD and its induced median are highly favorable among their leading competitors.

preprint, 2020, arXiv

Exact computation of projection regression depth and fast computation of its induced median and other estimators

Zuo (2019) (Z19) addressed the computation of the projection regression depth (PRD) and its induced median (the maximum depth estimator). Z19 achieved the exact computation of the PRD via a modified version of the regular univariate sample median, which resulted in the loss of the invariance of the PRD and the equivariance of the depth induced median. This article achieves the exact computation without sacrificing the invariance of the PRD and the equivariance of the regression median. Z19 also addressed the approximate computation of the PRD induced median, but the naive algorithm in Z19 is very slow. This article modifies the approximation in Z19 and adopts the Rcpp package, and consequently obtains a much faster (possibly $100$ times faster) algorithm with an even better level of accuracy. Furthermore, as the third major contribution, this article introduces three new depth induced estimators which can run $300$ times faster than that of Z19 while maintaining the same (or slightly better) level of accuracy. Real as well as simulated data examples are presented to illustrate the difference between the algorithms of Z19 and the ones proposed in this article. The findings support the statements above and demonstrate the major contributions of the article.

preprint, 2020, arXiv

Large sample properties of the regression depth induced median

Notions of depth in regression have been introduced and studied in the literature. Regression depth (RD) of Rousseeuw and Hubert (1999), the most famous one, is a direct extension of Tukey location depth (Tukey (1975)) to regression. Like its location counterpart, the most remarkable advantage of the notion of depth in regression is that it directly introduces the maximum (or deepest) regression depth estimator (aka depth induced median) for regression parameters in a multi-dimensional setting. Classical questions for the regression depth induced median include (i) is it a consistent estimator (or rather, under what sufficient conditions is it consistent)? and (ii) does it have a limiting distribution? Bai and He (1999) (BH99) pioneered an attempt to answer these questions. Under some stringent conditions on (i) the design points, (ii) the conditional distributions of $y$ given $\boldsymbol{x}_i$, and (iii) the error distributions, BH99 proved the strong consistency of the depth induced median. Under another set of conditions, BH99 showed the existence of the limiting distribution of the estimator. This article establishes the strong consistency of the depth induced median without any of the stringent conditions in BH99, and proves the existence of the limiting distribution of the estimator under sufficient conditions and by an approach different from those of BH99.

preprint, 2020, arXiv

On general notions of depth for regression

Depth notions in location have attracted tremendous attention in the literature. In fact, data depth and its applications have remained one of the most active research topics in statistics in the last two decades. The most favored notions of depth in location include Tukey (1975) halfspace depth (HD), Liu (1990) simplicial depth, and projection depth (Stahel (1981) and Donoho (1982), Liu (1992), Zuo and Serfling (2000) (ZS00) and Zuo (2003)), among others. Depth notions in regression have also been proposed, though only sporadically. Regression depth (RD) of Rousseeuw and Hubert (1999) (RH99) is the most famous one, which is a direct extension of Tukey HD to regression. Others include Carrizosa (1996) and the ones induced from Maronna and Yohai (1993) (MY93) proposed in this article. Is there any relationship between Carrizosa depth and RD of RH99? Do these depth notions possess desirable properties? What are the desirable properties? Can existing notions really serve as depth functions in regression? These questions remain open. Revealing the equivalence between Carrizosa depth and RD of RH99; expanding the location depth evaluating criteria in ZS00 for regression depth notions; examining the existing regression notions with respect to these criteria; and proposing the regression counterpart of the eminent projection depth in location are the four major objectives of the article.

preprint, 2011, arXiv

Exactly computing bivariate projection depth contours and median

Among their competitors, projection depth and its induced estimators are very favorable because they can enjoy very high breakdown point robustness without having to pay the price of low efficiency, meanwhile providing a promising center-outward ordering of multi-dimensional data. However, their further applications have been severely hindered due to their computational challenge in practice. In this paper, we derive a simple form of the projection depth function, when (μ, σ) = (Med, MAD). This simple form enables us to extend the existing result of point-wise exact computation of projection depth (PD) of Zuo and Lai (2011) to depth contours and median for bivariate data.
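For orientation, the projection depth with (μ, σ) = (Med, MAD) is commonly written as PD(x) = 1 / (1 + O(x)), where O(x) is the supremum over unit directions u of |u'x - Med(u'X)| / MAD(u'X). The sketch below only approximates that supremum with random directions, so it is not the exact algorithm the abstract describes; the function name, `n_dirs`, and `seed` are hypothetical choices for illustration.

```python
import numpy as np

def approx_projection_depth(x, data, n_dirs=2000, seed=0):
    """Approximate PD(x) = 1 / (1 + O(x)) using random unit directions.

    Taking a maximum over finitely many directions can only underestimate
    the outlyingness O(x), so the returned depth is an upper bound.
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, data.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # unit vectors u
    proj = data @ dirs.T                                  # shape (n, n_dirs)
    med = np.median(proj, axis=0)                         # Med(u'X) per direction
    mad = np.median(np.abs(proj - med), axis=0)           # MAD(u'X) per direction
    ok = mad > 0                                          # skip degenerate directions
    if not ok.any():
        raise ValueError("MAD is zero in every sampled direction")
    outlyingness = np.abs(x @ dirs.T - med)[ok] / mad[ok]
    return float(1.0 / (1.0 + outlyingness.max()))

rng = np.random.default_rng(1)
sample = rng.normal(size=(200, 2))
print("depth near the center:", approx_projection_depth([0.0, 0.0], sample))
print("depth far from center:", approx_projection_depth([5.0, 5.0], sample))
```

Points near the middle of the data cloud receive depth close to 1 and remote points receive depth close to 0, which is the center-outward ordering the abstract refers to.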

preprint, 2007, arXiv

On the limiting distributions of multivariate depth-based rank sum statistics and related tests

A depth-based rank sum statistic for multivariate data introduced by Liu and Singh [J. Amer. Statist. Assoc. 88 (1993) 252–260] as an extension of the Wilcoxon rank sum statistic for univariate data has been used in multivariate rank tests in quality control and in experimental studies. Those applications, however, are based on a conjectured limiting distribution, provided by Liu and Singh [J. Amer. Statist. Assoc. 88 (1993) 252–260]. The present paper proves the conjecture under general regularity conditions and, therefore, validates various applications of the rank sum statistic in the literature. The paper also shows that the corresponding rank sum tests can be more powerful than Hotelling's $T^2$ test and some commonly used multivariate rank tests in detecting location-scale changes in multivariate distributions.
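A minimal sketch of a depth-based rank sum statistic of this kind follows. It assumes the usual form Q = average over the second sample of R(y), where R(y) is the fraction of reference points whose depth is no larger than that of y. The abstract does not fix a depth function, so the Mahalanobis depth used here, and both function names, are illustrative assumptions.

```python
import numpy as np

def mahalanobis_depth(points, ref):
    """Depth 1 / (1 + squared Mahalanobis distance) relative to the sample `ref`."""
    mu = ref.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(ref, rowvar=False))
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)

def depth_rank_sum_q(x, y):
    """Average, over rows of y, of the fraction of x points that are no deeper."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    depth_x = mahalanobis_depth(x, x)   # depths of the reference sample
    depth_y = mahalanobis_depth(y, x)   # depths of y measured against x
    ranks = (depth_x[None, :] <= depth_y[:, None]).mean(axis=1)
    return float(ranks.mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y_same = rng.normal(size=(80, 2))        # same distribution as x
y_shifted = y_same + 3.0                 # location shift
print("Q, same distribution:", depth_rank_sum_q(x, y_same))
print("Q, shifted sample:   ", depth_rank_sum_q(x, y_shifted))
```

When both samples come from the same distribution, Q is expected to be near 1/2, and it drops toward 0 when the second sample is more outlying relative to the first. Turning Q into a test requires the limiting distribution that the paper establishes, which this sketch does not implement.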