Researcher profile

Pengsheng Ji

Pengsheng Ji contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

preprint, 2023, arXiv

Rate optimal multiple testing procedure in high-dimensional regression

In high-dimensional regression analysis, when the number of predictors is much larger than the sample size, an important question is how to select the important variables that are relevant to the response variable of interest. Variable selection and multiple testing are both tools to address this issue. However, there is little discussion of the connection between these two areas. When the signal strength is strong enough that selection consistency is achievable, it seems unnecessary to control the false discovery rate. In this paper, we consider the regime where the signals are both rare and weak, such that selection consistency is not achievable, and propose a method which controls the false discovery rate asymptotically. It is theoretically shown that the false non-discovery rate of the proposed method converges to zero at the optimal rate. Numerical results are provided to demonstrate the advantage of the proposed method.
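
For readers unfamiliar with false discovery rate control, the following is a minimal sketch of the classical Benjamini-Hochberg step-up rule. It is not the procedure proposed in the paper above, which the abstract does not specify; the function name `benjamini_hochberg` and the example p-values are hypothetical and chosen only for illustration.

```python
import numpy as np


def benjamini_hochberg(pvalues, q=0.05):
    """Boolean mask of hypotheses rejected by the classical BH step-up rule.

    Illustration only: this is the textbook procedure, not the method
    proposed in the paper.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices sorting p ascending
    thresholds = q * np.arange(1, m + 1) / m   # k * q / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()        # largest rank passing its threshold
        reject[order[: k + 1]] = True          # reject that one and all smaller p-values
    return reject


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
mask = benjamini_hochberg(pvals, q=0.05)
```

The rule rejects every hypothesis whose p-value is at or below the largest sorted p-value that passes its rank-dependent threshold, which controls the false discovery rate at level `q` under independence.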

preprint, 2022, arXiv

Co-citation and Co-authorship Networks of Statisticians

We collected and cleaned a large data set on publications in statistics. The data set consists of the coauthor relationships and citation relationships of 83,331 papers published in 36 representative journals in statistics, probability, and machine learning, spanning 41 years. The data set allows us to construct many different networks, and motivates a number of research problems about the research patterns and trends, research impacts, and network topology of the statistics community. In this paper we focus on (i) using the citation relationships to estimate the research interests of authors, and (ii) using the coauthor relationships to study the network topology. Using the co-citation networks we constructed, we discover a "statistics triangle", reminiscent of the statistical philosophy triangle (Efron, 1998). We propose new approaches to constructing the "research map" of statisticians, as well as the "research trajectory" of a given author to visualize the evolution of his/her research interests. Using the co-authorship networks we constructed, we discover a multi-layer community tree and produce a Sankey diagram to visualize author migrations between sub-areas. We also propose several new metrics for the research diversity of individual authors. We find that "Bayes", "Biostatistics", and "Nonparametric" are three primary areas in statistics. We also identify 15 sub-areas, each of which can be viewed as a weighted average of the primary areas, and identify several underlying reasons for the formation of co-authorship communities. We also find that the research interests of statisticians have evolved significantly in the 41-year time window we studied: some areas (e.g., biostatistics and high-dimensional data analysis) have become increasingly popular.
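
As a rough illustration of what a co-citation network is, the sketch below counts, for each pair of papers, how many other papers cite both. It works at the paper level on a toy citation matrix; the abstract above does not describe how the authors build their networks, so the function name `cocitation_counts` and the data are assumptions made only for this example.

```python
import numpy as np


def cocitation_counts(cites):
    """Co-citation counts from a binary citation matrix.

    cites[i][j] == 1 means paper i cites paper j. Entry (j, k) of the
    result is the number of papers that cite both j and k.
    """
    A = np.asarray(cites, dtype=int)
    C = A.T @ A                # column j dotted with column k
    np.fill_diagonal(C, 0)     # a paper is not co-cited with itself
    return C


# Toy data: papers 0 and 1 each cite papers 2 and 3; paper 2 cites paper 3.
cites = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
C = cocitation_counts(cites)
```

The resulting symmetric matrix can be read as a weighted adjacency matrix, where a larger weight suggests that two papers are more often cited together.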

preprint, 2012, arXiv

Sharp adaptive nonparametric testing for Sobolev ellipsoids

We consider testing for the presence of a signal in Gaussian white noise with intensity $1/\sqrt{n}$, when the alternatives are given by smoothness ellipsoids with an $L_2$-ball of (squared) radius $\rho$ removed. It is known that, for a fixed Sobolev-type ellipsoid of smoothness $\beta$ and size $M$, a $\rho$ of order $n^{-4\beta/(4\beta+1)}$ is the critical separation rate, in the sense that the minimax error of the second kind over $\alpha$-tests stays asymptotically strictly between 0 and 1 (Ingster, 1982). In addition, Ermakov (1990) found the sharp asymptotics of the minimax error of the second kind at the separation rate. For adaptation over both $\beta$ and $M$ in that context, it is known that a loglog-penalty over the separation rate for $\rho$ is necessary for a nonzero asymptotic power. Here, following an example in nonparametric estimation related to the Pinsker constant, we investigate the adaptation problem over the ellipsoid size $M$ only, for a fixed smoothness degree $\beta$. It is established that the sharp risk asymptotics can be replicated in that adaptive setting if $\rho$ tends to zero more slowly than the separation rate. The penalty for adaptation here turns out to be a sequence tending to infinity arbitrarily slowly.
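
The testing problem described in the abstract above can be written out roughly as follows. This is a sketch under assumed notation: the symbols $Y$, $W$, $f$ and $\Sigma(\beta, M)$ do not appear in the abstract and are introduced here only for illustration.

```latex
% Gaussian white noise model with intensity n^{-1/2} (W is a standard Wiener process)
dY(t) = f(t)\,dt + n^{-1/2}\,dW(t), \qquad t \in [0,1]

% Null hypothesis versus an ellipsoid with an L_2-ball of squared radius \rho removed
H_0 : f = 0
\qquad \text{vs.} \qquad
H_1 : f \in \Sigma(\beta, M), \quad \|f\|_2^2 \ge \rho

% Critical separation rate for fixed \beta and M
\rho_n \asymp n^{-4\beta/(4\beta+1)}
```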

preprint, 2012, arXiv

UPS delivers optimal phase diagram in high-dimensional variable selection

Consider a linear model $Y=Xβ+z$, $z\sim N(0,I_n)$. Here, $X=X_{n,p}$, where both $p$ and $n$ are large, but $p>n$. We model the rows of $X$ as i.i.d. samples from $N(0,\frac{1}{n}Ω)$, where $Ω$ is a $p\times p$ correlation matrix, which is unknown to us but is presumably sparse. The vector $β$ is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose the Univariate Penalization Screening (UPS) for variable selection. This is a screen-and-clean method where we screen with univariate thresholding and clean with penalized MLE. It has two important properties: sure screening and separable after screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation.
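
The screen-and-clean idea in the abstract above can be sketched on synthetic data as follows. This is a simplified illustration, not the UPS procedure itself: the clean step here is a single least-squares refit on the survivors followed by hard thresholding, swapped in for the penalized MLE on small sub-problems that the abstract describes. The function names, the thresholds, and the choice $Ω = I$ are all assumptions made for this example.

```python
import numpy as np


def screen_and_clean(X, y, screen_t, clean_t):
    """Simplified screen-and-clean variable selection (illustration only)."""
    # Screen: univariate thresholding of the marginal statistics x_j' y.
    marginal = X.T @ y
    survivors = np.flatnonzero(np.abs(marginal) > screen_t)
    if survivors.size == 0:
        return survivors
    # Clean (simplified stand-in for penalized MLE): refit least squares
    # on the survivors and keep coefficients that are large in magnitude.
    coef, *_ = np.linalg.lstsq(X[:, survivors], y, rcond=None)
    return survivors[np.abs(coef) > clean_t]


def simulate(n=500, p=1000, k=5, signal=10.0, seed=0):
    """Draw Y = X beta + z with rows of X from N(0, I/n), so p > n."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))
    support = np.sort(rng.choice(p, size=k, replace=False))
    beta = np.zeros(p)
    beta[support] = signal
    y = X @ beta + rng.normal(size=n)
    return X, y, support


X, y, support = simulate()
selected = screen_and_clean(
    X, y,
    screen_t=np.sqrt(2 * np.log(X.shape[1])),  # hypothetical screening threshold
    clean_t=5.0,                               # hypothetical cleaning threshold
)
```

Screening first keeps the refit small: the least-squares problem involves only the surviving columns rather than all $p$ predictors, which is the computational point the abstract makes about reducing the problem to small regressions.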