Source author record

Pengsheng Ji

Pengsheng Ji appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.ST, Statistics Theory, Digital Libraries, Methodology

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint, 2023, arXiv

Rate optimal multiple testing procedure in high-dimensional regression

In high-dimensional regression analysis, when the number of predictors is much larger than the sample size, an important question is how to select the variables that are relevant to the response variable of interest. Variable selection and multiple testing are both tools for addressing this issue. However, there is little discussion of the connection between these two areas. When the signal strength is strong enough that selection consistency is achievable, it seems unnecessary to control the false discovery rate. In this paper, we consider the regime where the signals are both rare and weak, so that selection consistency is not achievable, and propose a method that controls the false discovery rate asymptotically. It is shown theoretically that the false non-discovery rate of the proposed method converges to zero at the optimal rate. Numerical results are provided to demonstrate the advantage of the proposed method.

Preprint, 2022, arXiv

Co-citation and Co-authorship Networks of Statisticians

We collected and cleaned a large data set on publications in statistics. The data set consists of the coauthor relationships and citation relationships of 83,331 papers published in 36 representative journals in statistics, probability, and machine learning, spanning 41 years. The data set allows us to construct many different networks, and motivates a number of research problems about the research patterns and trends, research impacts, and network topology of the statistics community. In this paper we focus on (i) using the citation relationships to estimate the research interests of authors, and (ii) using the coauthor relationships to study the network topology. Using the co-citation networks we constructed, we discover a "statistics triangle", reminiscent of the statistical philosophy triangle (Efron, 1998). We propose new approaches to constructing the "research map" of statisticians, as well as the "research trajectory" of a given author, which visualizes how his/her research interests evolve. Using the co-authorship networks we constructed, we discover a multi-layer community tree and produce a Sankey diagram to visualize author migrations between sub-areas. We also propose several new metrics for the research diversity of individual authors. We find that "Bayes", "Biostatistics", and "Nonparametric" are three primary areas in statistics. We also identify 15 sub-areas, each of which can be viewed as a weighted average of the primary areas, and identify several underlying reasons for the formation of co-authorship communities. We also find that the research interests of statisticians have evolved significantly over the 41-year time window we studied: some areas (e.g., biostatistics and high-dimensional data analysis) have become increasingly popular.

Preprint, 2012, arXiv

Sharp adaptive nonparametric testing for Sobolev ellipsoids

We consider testing for presence of a signal in Gaussian white noise with intensity $1/\sqrt{n}$, when the alternatives are given by smoothness ellipsoids with an $L_2$-ball of (squared) radius $\rho$ removed. It is known that, for a fixed Sobolev type ellipsoid of smoothness $\beta$ and size $M$, a $\rho$ which is of order $n^{-4\beta/(4\beta+1)}$ is the critical separation rate, in the sense that the minimax error of second kind over $\alpha$-tests stays asymptotically between 0 and 1 strictly (Ingster, 1982). In addition, Ermakov (1990) found the sharp asymptotics of the minimax error of second kind at the separation rate. For adaptation over both $\beta$ and $M$ in that context, it is known that a loglog-penalty over the separation rate for $\rho$ is necessary for a nonzero asymptotic power. Here, following an example in nonparametric estimation related to the Pinsker constant, we investigate the adaptation problem over the ellipsoid size $M$ only, for fixed smoothness degree $\beta$. It is established that the sharp risk asymptotics can be replicated in that adaptive setting, if $\rho$ tends to zero slower than the separation rate. The penalty for adaptation here turns out to be a sequence tending to infinity arbitrarily slowly.

Preprint, 2012, arXiv

UPS delivers optimal phase diagram in high-dimensional variable selection

Consider a linear model $Y=Xβ+z$, $z\sim N(0,I_n)$. Here, $X=X_{n,p}$, where both $p$ and $n$ are large, but $p>n$. We model the rows of $X$ as i.i.d. samples from $N(0,\frac{1}{n}Ω)$, where $Ω$ is a $p\times p$ correlation matrix, which is unknown to us but is presumably sparse. The vector $β$ is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose Univariate Penalization Screening (UPS) for variable selection. This is a screen-and-clean method in which we screen with univariate thresholding and clean with penalized MLE. It has two important properties: sure screening and separability after screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation.
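The screen-and-clean idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`screen`, `components`, `clean`, `ups`), the thresholds `t` and `delta`, the penalty `lam`, and the use of an exhaustive L0-penalized least-squares search as the "penalized MLE" step are all assumptions made for illustration. The sketch screens by thresholding the univariate statistics $X'Y$, groups the survivors into connected components of the thresholded Gram matrix, and fits each small component separately.

```python
import itertools
import numpy as np

def screen(X, y, t):
    """Univariate thresholding: keep coordinates j with |x_j' y| > t."""
    return np.flatnonzero(np.abs(X.T @ y) > t)

def components(G, idx, delta):
    """Group survivors into connected components, linking i and j
    whenever |G[i, j]| > delta (G is the Gram matrix X'X)."""
    remaining, comps = set(idx.tolist()), []
    while remaining:
        stack, comp = [remaining.pop()], []
        while stack:
            i = stack.pop()
            comp.append(i)
            nbrs = [j for j in remaining if abs(G[i, j]) > delta]
            for j in nbrs:
                remaining.remove(j)
            stack.extend(nbrs)
        comps.append(sorted(comp))
    return comps

def clean(X, y, comp, lam):
    """Assumed cleaning step: exhaustive L0-penalized least squares on one
    component. Exponential in len(comp), so only viable for small components."""
    best, best_cost = (), float(y @ y)  # cost of selecting nothing
    for k in range(1, len(comp) + 1):
        for S in itertools.combinations(comp, k):
            cols = list(S)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ coef
            cost = float(resid @ resid) + lam * k
            if cost < best_cost:
                best, best_cost = S, cost
    return list(best)

def ups(X, y, t, delta, lam):
    """Screen, split survivors into components, then clean each one separately."""
    survivors = screen(X, y, t)
    G = X.T @ X
    selected = []
    for comp in components(G, survivors, delta):
        selected.extend(clean(X, y, comp, lam))
    return sorted(selected)
```

Fitting each component against the full response is only reasonable when different components are nearly uncorrelated, which is the role played by the sparsity of $Ω$ in the abstract; the tuning values would have to be chosen for the problem at hand.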
