Source author record

Xiaofeng Shao

Xiaofeng Shao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Methodology math.ST Statistics Theory Applications Computation econ.EM physics.soc-ph

Catalog footprint

What is connected

14 works

7 topics

4 close collaborators


preprint 2022 arXiv

Robust Inference for Change Points in High Dimension

This paper proposes a new test for a change point in the mean of high-dimensional data based on the spatial sign and self-normalization. The test is easy to implement with no tuning parameters, robust to heavy-tailedness and theoretically justified with both fixed-$n$ and sequential asymptotics under both the null and the alternative hypotheses, where $n$ is the sample size. We demonstrate that the fixed-$n$ asymptotics provide a better approximation to the finite sample distribution and thus should be preferred in both testing and testing-based estimation. To estimate the number and locations when multiple change-points are present, we propose to combine the p-value under the fixed-$n$ asymptotics with the seeded binary segmentation (SBS) algorithm. Through numerical experiments, we show that the spatial sign based procedures are robust to heavy-tailedness and strong coordinate-wise dependence, whereas their non-robust counterparts proposed in Wang et al. (2022) appear to under-perform. A real data example is also provided to illustrate the robustness and broad applicability of the proposed test and its corresponding estimation algorithm.

preprint 2022 arXiv

Segmenting Time Series via Self-Normalization

We propose a novel and unified framework for change-point estimation in multivariate time series. The proposed method is fully nonparametric, enjoys effortless tuning and is robust to temporal dependence. One salient and distinct feature of the proposed method is its versatility, where it allows change-point detection for a broad class of parameters (such as mean, variance, correlation and quantile) in a unified fashion. At the core of our method, we couple the self-normalization (SN) based tests with a novel nested local-window segmentation algorithm, which seems new in the growing literature of change-point analysis. Due to the presence of an inconsistent long-run variance estimator in the SN test, non-standard theoretical arguments are further developed to derive the consistency and convergence rate of the proposed SN-based change-point detection method. Extensive numerical experiments and relevant real data analysis are conducted to illustrate the effectiveness and broad applicability of our proposed method in comparison with state-of-the-art approaches in the literature.

preprint 2021 arXiv

Adaptive Change Point Monitoring for High-Dimensional Data

In this paper, we propose a class of monitoring statistics for a mean shift in a sequence of high-dimensional observations. Inspired by the recent U-statistic based retrospective tests developed by Wang et al. (2019) and Zhang et al. (2020), we advance the U-statistic based approach to the sequential monitoring problem by developing a new adaptive monitoring procedure that can detect both dense and sparse changes in real time. Unlike Wang et al. (2019) and Zhang et al. (2020), who used self-normalization in their tests, we instead introduce a class of estimators for the $q$-norm of the covariance matrix and prove their ratio consistency. To facilitate fast computation, we further develop recursive algorithms to improve the computational efficiency of the monitoring procedure. The advantage of the proposed methodology is demonstrated via simulation studies and real data illustrations.

preprint 2021 arXiv

Adaptive Inference for Change Points in High-Dimensional Data

In this article, we propose a class of test statistics for a change point in the mean of high-dimensional independent data. Our test integrates the U-statistic based approach in a recent work by \cite{hdcp} and the $L_q$-norm based high-dimensional test in \cite{he2018}, and inherits several appealing features, such as freedom from tuning parameters and asymptotic independence among the test statistics corresponding to even values of $q$. A simple combination of test statistics corresponding to several different values of $q$ leads to a test with an adaptive power property, that is, it can be powerful against both sparse and dense alternatives. On the estimation front, we obtain the convergence rate of the maximizer of our test statistic standardized by sample size when there is one change-point in mean and $q=2$, and propose to combine our tests with a wild binary segmentation (WBS) algorithm to estimate the change-point number and locations when there are multiple change-points. Numerical comparisons using both simulated and real data demonstrate the advantage of our adaptive test and its corresponding estimation method.

preprint 2020 arXiv

Dating the Break in High-dimensional Data

This paper is concerned with estimation and inference for the location of a change point in the mean of independent high-dimensional data. Our change point location estimator maximizes a new U-statistic based objective function, and its convergence rate and asymptotic distribution after suitable centering and normalization are obtained under mild assumptions. Our estimator turns out to be more efficient than its least squares based counterpart in the literature. Based on the asymptotic theory, we construct a confidence interval by plugging in consistent estimates of several quantities in the normalization. We also provide a bootstrap-based confidence interval and state its asymptotic validity under suitable conditions. Through simulation studies, we demonstrate favorable finite sample performance of the new change point location estimator as compared to its least squares based counterpart, and of our bootstrap-based confidence intervals as compared to several existing competitors. The asymptotic theory based on high-dimensional U-statistics is substantially different from that developed in the literature and is of independent interest.

preprint 2020 arXiv

Interpoint Distance Based Two Sample Tests in High Dimension

In this paper, we study a class of two sample test statistics based on inter-point distances in the high dimensional and low sample size setting. Our test statistics include the well-known energy distance and maximum mean discrepancy with Gaussian and Laplacian kernels, and the critical values are obtained via permutations. We show that all these tests are inconsistent when the two high dimensional distributions correspond to the same marginal distributions but differ in other aspects of the distributions. The tests based on energy distance and maximum mean discrepancy are mainly targeting the differences between marginal means and variances, whereas the test based on $L^1$-distance can capture the difference in marginal distributions. Our theory sheds new light on the limitation of inter-point distance based tests, the impact of different distance metrics, and the behavior of permutation tests in high dimension. Some simulation results and a real data illustration are also presented to corroborate our theoretical findings.
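
The abstract above says the critical values for the inter-point distance statistics are obtained via permutations. A minimal sketch of that recipe for the energy distance is given below, assuming Euclidean distance and the plain V-statistic form of the statistic. The function names, the number of permutations and the seed are illustrative choices, not taken from the paper.

```python
import math
import random


def energy_statistic(dist, idx_x, idx_y):
    """Energy distance 2*E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic form).

    `dist` is a precomputed pairwise distance matrix over the pooled sample;
    `idx_x` and `idx_y` are the row indices assigned to each group.
    """
    def mean_dist(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    return (2.0 * mean_dist(idx_x, idx_y)
            - mean_dist(idx_x, idx_x)
            - mean_dist(idx_y, idx_y))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for the two-sample energy distance test.

    `x` and `y` are lists of equal-length numeric tuples (one tuple per
    observation). Group labels are shuffled while distances stay fixed.
    """
    pooled = list(x) + list(y)
    dist = [[math.dist(a, b) for b in pooled] for a in pooled]
    n = len(x)
    idx = list(range(len(pooled)))
    observed = energy_statistic(dist, idx[:n], idx[n:])

    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # random relabelling of the pooled sample
        if energy_statistic(dist, idx[:n], idx[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Computing the distance matrix once and permuting only the index labels keeps each permutation cheap, since no distances are recomputed.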

preprint 2020 arXiv

Time Series Analysis of COVID-19 Infection Curve: A Change-Point Perspective

In this paper, we model the trajectory of the cumulative confirmed cases and deaths of COVID-19 (in log scale) via a piecewise linear trend model. The model naturally captures the phase transitions of the epidemic growth rate via change-points and further enjoys great interpretability due to its semiparametric nature. On the methodological front, we advance the nascent self-normalization (SN) technique (Shao, 2010) to testing and estimation of a single change-point in the linear trend of a nonstationary time series. We further combine the SN-based change-point test with the NOT algorithm (Baranowski et al., 2019) to achieve multiple change-point estimation. Using the proposed method, we analyze the trajectory of the cumulative COVID-19 cases and deaths for 30 major countries and discover interesting patterns with potentially relevant implications for effectiveness of the pandemic responses by different countries. Furthermore, based on the change-point detection algorithm and a flexible extrapolation function, we design a simple two-stage forecasting scheme for COVID-19 and demonstrate its promising performance in predicting cumulative deaths in the U.S.

preprint 2015 arXiv

A subsampled double bootstrap for massive data

The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets, which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its feasibility is questionable even with modern parallel computing platforms. Recently Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called BLB (Bag of Little Bootstraps) for massive data which is more computationally scalable with little sacrifice of statistical accuracy. Building on BLB and the idea of fast double bootstrap, we propose a new resampling method, the subsampled double bootstrap, for both independent data and time series data. We establish consistency of the subsampled double bootstrap under mild conditions for both independent and dependent cases. Methodologically, the subsampled double bootstrap is superior to BLB in terms of running time, sample coverage and automatic implementation with fewer tuning parameters for a given time budget. Its advantage relative to BLB and bootstrap is also demonstrated in numerical simulations and a data illustration.
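
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it points to: draw many small random subsets and, from each one, a single resample of nominal full size. It handles the sample mean of independent data only. The centering at the subset estimate, the percentile-style interval, and all names and defaults are assumptions made for illustration, not the paper's exact procedure.

```python
import math
import random
import statistics


def sdb_roots(data, subset_size, n_subsets=200, seed=0):
    """Collect one root sqrt(n) * (resample mean - subset mean) per subset."""
    rng = random.Random(seed)
    n = len(data)
    roots = []
    for _ in range(n_subsets):
        # Small random subset drawn without replacement from the full data.
        subset = rng.sample(data, subset_size)
        center = statistics.fmean(subset)
        # One resample of nominal size n, drawn with replacement from the
        # subset. (A production version would use multinomial counts so that
        # only `subset_size` distinct points are ever held in memory.)
        resample = rng.choices(subset, k=n)
        roots.append(math.sqrt(n) * (statistics.fmean(resample) - center))
    return roots


def sdb_interval(data, subset_size, level=0.95, n_subsets=200, seed=0):
    """Interval for the mean from the empirical quantiles of the roots."""
    roots = sorted(sdb_roots(data, subset_size, n_subsets, seed))
    m = len(roots)
    q_lo = roots[int((1.0 - level) / 2.0 * m)]
    q_hi = roots[min(m - 1, int((1.0 + level) / 2.0 * m))]
    n = len(data)
    estimate = statistics.fmean(data)
    return estimate - q_hi / math.sqrt(n), estimate - q_lo / math.sqrt(n)
```

Only one resample is drawn per subset, which is the "double bootstrap with a single inner draw" flavor the abstract alludes to; the spread across subsets supplies the Monte Carlo variation.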

preprint 2015 arXiv

Two sample inference for the second-order property of temporally dependent functional data

Motivated by the need to statistically quantify the difference between two spatio-temporal datasets that arise in climate downscaling studies, we propose new tests to detect the differences of the covariance operators and their associated characteristics of two functional time series. Our two sample tests are constructed on the basis of functional principal component analysis and self-normalization, the latter of which is a new studentization technique recently developed for the inference of a univariate time series. Compared to the existing tests, our SN-based tests allow for weak dependence within each sample and are robust to the dependence between the two samples in the case of equal sample sizes. Asymptotic properties of the SN-based test statistics are derived under both the null and local alternatives. Through extensive simulations, our SN-based tests are shown to outperform existing alternatives in size and their powers are found to be respectable. The tests are then applied to the gridded climate model outputs and interpolated observations to detect the difference in their spatial dynamics.

preprint 2014 arXiv

On the Coverage Bound Problem of Empirical Likelihood Methods For Time Series

The upper bounds on the coverage probabilities of the confidence regions based on blockwise empirical likelihood [Kitamura (1997)] and nonstandard expansive empirical likelihood [Nordman et al. (2013)] methods for time series data are investigated via studying the probability for the violation of the convex hull constraint. The large sample bounds are derived on the basis of the pivotal limit of the blockwise empirical log-likelihood ratio obtained under the fixed-b asymptotics, which has been recently shown to provide a more accurate approximation to the finite sample distribution than the conventional chi-square approximation. Our theoretical and numerical findings suggest that both the finite sample and large sample upper bounds for coverage probabilities are strictly less than one and the blockwise empirical likelihood confidence region can exhibit serious undercoverage when (i) the dimension of moment conditions is moderate or large; (ii) the time series dependence is positively strong; or (iii) the block size is large relative to sample size. A similar finite sample coverage problem occurs for the nonstandard expansive empirical likelihood. To alleviate the coverage bound problem, we propose to penalize both empirical likelihood methods by relaxing the convex hull constraint. Numerical simulations and data illustration demonstrate the effectiveness of our proposed remedies in terms of delivering confidence sets with more accurate coverage.

preprint 2013 arXiv

A general approach to the joint asymptotic analysis of statistics from sub-samples

In time series analysis, statistics based on collections of estimators computed from sub-samples play a crucial role in an increasing variety of important applications. Proving results about the joint asymptotic distribution of such statistics is challenging since it typically involves a nontrivial verification of technical conditions and tedious case-by-case asymptotic analysis. In this paper, we provide a novel technique that allows us to circumvent those problems in a general setting. Our approach consists of two major steps: a probabilistic part which is mainly concerned with weak convergence of sequential empirical processes, and an analytic part providing general ways to extend this weak convergence to functionals of the sequential empirical process. Our theory provides a unified treatment of asymptotic distributions for a large class of statistics, including recently proposed self-normalized statistics and sub-sampling based p-values. In addition, we comment on the consistency of bootstrap procedures and obtain general results on compact differentiability of certain mappings that seem to be of independent interest.

preprint 2013 arXiv

Fixed-smoothing asymptotics for time series

In this paper, we derive higher order Edgeworth expansions for the finite sample distributions of the subsampling-based t-statistic and the Wald statistic in the Gaussian location model under the so-called fixed-smoothing paradigm. In particular, we show that the error of asymptotic approximation is of the order of the reciprocal of the sample size and obtain explicit forms for the leading error terms in the expansions. The results are used to justify the second-order correctness of a new bootstrap method, the Gaussian dependent bootstrap, in the context of the Gaussian location model.

preprint 2012 arXiv

Fixed-b Subsampling and Block Bootstrap: Improved Confidence Sets Based on P-value Calibration

Subsampling and block-based bootstrap methods have been used in a wide range of inference problems for time series. To accommodate the dependence, these resampling methods involve a bandwidth parameter, such as subsampling window width and block size in the block-based bootstrap. In empirical work, using different bandwidth parameters could lead to different inference results, but the traditional first order asymptotic theory does not capture the choice of the bandwidth. In this article, we propose to adopt the fixed-b approach, as advocated by Kiefer and Vogelsang (2005) in the heteroscedasticity-autocorrelation robust testing context, to account for the influence of the bandwidth on the inference. Under the fixed-b asymptotic framework, we derive the asymptotic null distribution of the p-values for subsampling and the moving block bootstrap, and further propose a calibration of the traditional small-b based confidence intervals (regions, bands) and tests. Our treatment is fairly general as it includes both finite dimensional parameters and infinite dimensional parameters, such as marginal distribution function and normalized spectral distribution function. Simulation results show that the fixed-b approach is more accurate than the traditional small-b approach in terms of approximating the finite sample distribution, and that the calibrated confidence sets tend to have smaller coverage errors than the uncalibrated counterparts.

preprint 2010 arXiv

A self-normalized approach to confidence interval construction in time series

We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of requiring no user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by the practitioner. Monte-Carlo simulations are conducted to compare the finite sample performance of the new method with those delivered by the normal approximation and the block bootstrap approach.
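
To make the self-normalization idea concrete, here is a minimal sketch for the simplest case, the mean of a univariate stationary series. It assumes the normalizer is $W_n = n^{-2} \sum_{t=1}^{n} (S_t - t\bar{X}_n)^2$ with partial sums $S_t$, so that $n(\bar{X}_n - \mu)^2 / W_n$ is asymptotically pivotal. The critical value is approximated by simulating the same statistic on i.i.d. Gaussian noise instead of being read from a table. Function names and defaults are illustrative assumptions.

```python
import math
import random


def self_normalizer(x):
    """W_n = n^{-2} * sum_t (S_t - t * mean)^2, using running partial sums."""
    n = len(x)
    mean = sum(x) / n
    partial = 0.0
    total = 0.0
    for t, value in enumerate(x, start=1):
        partial += value  # S_t, the recursive partial sum
        total += (partial - t * mean) ** 2
    return total / n ** 2


def simulate_critical_value(level=0.95, reps=2000, grid=200, seed=0):
    """Monte Carlo quantile of the pivotal limit.

    The statistic n * mean^2 / W_n computed on i.i.d. N(0, 1) noise of
    length `grid` serves as a discretized draw from the limit distribution.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        noise = [rng.gauss(0.0, 1.0) for _ in range(grid)]
        mean = sum(noise) / grid
        draws.append(grid * mean ** 2 / self_normalizer(noise))
    draws.sort()
    return draws[min(reps - 1, int(level * reps))]


def sn_confidence_interval(x, crit):
    """Interval {mu : n * (mean - mu)^2 / W_n <= crit} for the series mean."""
    n = len(x)
    mean = sum(x) / n
    half_width = math.sqrt(crit * self_normalizer(x) / n)
    return mean - half_width, mean + half_width
```

No bandwidth, block size or long-run variance estimate appears anywhere: the unknown scale cancels between the numerator and the normalizer, which is the convenience the abstract emphasizes.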
