Source author record

Hanzhong Liu

Hanzhong Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, math.ST, Statistics Theory

Catalog footprint

What is connected

8 works

3 topics

4 close collaborators

Preprint, 2022, arXiv

Design-based theory for cluster rerandomization

Complete randomization balances covariates on average, but covariate imbalance often exists in finite samples. Rerandomization can ensure covariate balance in the realized experiment by discarding the undesired treatment assignments. Many field experiments in public health and social sciences assign the treatment at the cluster level due to logistical constraints or policy considerations. Moreover, they are frequently combined with rerandomization in the design stage. We refer to cluster rerandomization as a cluster-randomized experiment compounded with rerandomization to balance covariates at the individual or cluster level. Existing asymptotic theory can only deal with rerandomization with treatments assigned at the individual level, leaving that for cluster rerandomization an open problem. To fill the gap, we provide a design-based theory for cluster rerandomization. Moreover, we compare two cluster rerandomization schemes that use prior information on the importance of the covariates: one based on the weighted Euclidean distance and the other based on the Mahalanobis distance with tiers of covariates. We demonstrate that the former dominates the latter with optimal weights and orthogonalized covariates. Last but not least, we discuss the role of covariate adjustment in the analysis stage and recommend covariate-adjusted procedures that can be conveniently implemented by least squares with the associated robust standard errors.
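The acceptance-rejection idea behind rerandomization at the cluster level can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes cluster-level covariates, a weighted squared Euclidean distance between arm means as the balance criterion, and a fixed acceptance threshold. The function names, weights, threshold, and simulated data are all hypothetical.

```python
import random

def weighted_imbalance(covariates, treated, weights):
    """Weighted squared Euclidean distance between arm means of cluster covariates."""
    control = [i for i in range(len(covariates)) if i not in treated]
    total = 0.0
    for k, w in enumerate(weights):
        mean_t = sum(covariates[i][k] for i in treated) / len(treated)
        mean_c = sum(covariates[i][k] for i in control) / len(control)
        total += w * (mean_t - mean_c) ** 2
    return total

def cluster_rerandomize(covariates, n_treated, weights, threshold, rng,
                        max_draws=100_000):
    """Redraw cluster-level assignments until the imbalance is acceptable.

    Returns the set of treated cluster indices and the number of draws used.
    """
    clusters = list(range(len(covariates)))
    for draw in range(1, max_draws + 1):
        treated = set(rng.sample(clusters, n_treated))
        if weighted_imbalance(covariates, treated, weights) <= threshold:
            return treated, draw
    raise RuntimeError("no acceptable assignment found; loosen the threshold")

rng = random.Random(0)
# Hypothetical data: 20 clusters with two cluster-level covariates each.
covariates = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
# Assumed prior information: the first covariate matters twice as much.
treated, draws = cluster_rerandomize(
    covariates, n_treated=10, weights=[2.0, 1.0], threshold=0.05, rng=rng
)
print(sorted(treated), draws)
```

Every unit in a treated cluster would receive the treatment. A tighter threshold gives better balance but more discarded draws.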

Preprint, 2022, arXiv

Pair-switching rerandomization

Rerandomization discards assignments with covariates unbalanced in the treatment and control groups to improve estimation and inference efficiency. However, the acceptance-rejection sampling method used in rerandomization is computationally inefficient. As a result, it is time-consuming for rerandomization to draw numerous independent assignments, which are necessary for performing Fisher randomization tests and constructing randomization-based confidence intervals. To address this problem, we propose a pair-switching rerandomization method to draw balanced assignments efficiently. We obtain the unbiasedness and variance reduction of the difference-in-means estimator and show that the Fisher randomization tests are valid under pair-switching rerandomization. Moreover, we propose an exact approach to invert Fisher randomization tests to confidence intervals, which is faster than the existing methods. In addition, our method is applicable to both non-sequentially and sequentially randomized experiments. We conduct comprehensive simulation studies to compare the finite-sample performance of the proposed method with that of classical rerandomization. Simulation results indicate that pair-switching rerandomization leads to comparable power of Fisher randomization tests and is 3--23 times faster than classical rerandomization. Finally, we apply the pair-switching rerandomization method to analyze two clinical trial datasets, both of which demonstrate the advantages of our method.
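To see why drawing many balanced assignments is costly, the following sketch runs a Monte Carlo Fisher randomization test under classical acceptance-rejection rerandomization. It does not implement the pair-switching method, whose details the abstract does not give. The single covariate, the threshold, the simulated data, and all function names are assumptions made for illustration.

```python
import random

def diff_in_means(values, assignment):
    """Mean of values in the treated group minus mean in the control group."""
    treated = [v for v, z in zip(values, assignment) if z == 1]
    control = [v for v, z in zip(values, assignment) if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def draw_balanced(x, n_treated, threshold, rng, max_draws=100_000):
    """Classical rerandomization: reshuffle until the covariate gap is small."""
    base = [1] * n_treated + [0] * (len(x) - n_treated)
    for draws in range(1, max_draws + 1):
        z = base[:]
        rng.shuffle(z)
        if abs(diff_in_means(x, z)) <= threshold:
            return z, draws
    raise RuntimeError("no acceptable assignment found; loosen the threshold")

def fisher_p_value(y, x, z_obs, threshold, n_null, rng):
    """Monte Carlo test of the sharp null of no effect for any unit.

    The reference assignments come from the same rerandomization scheme
    that produced z_obs, so each one costs several discarded draws.
    """
    observed = abs(diff_in_means(y, z_obs))
    extreme, total_draws = 0, 0
    for _ in range(n_null):
        z, draws = draw_balanced(x, sum(z_obs), threshold, rng)
        total_draws += draws
        if abs(diff_in_means(y, z)) >= observed:
            extreme += 1
    # The +1 terms count the observed assignment itself.
    return (1 + extreme) / (1 + n_null), total_draws

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]  # hypothetical covariate
z_obs, _ = draw_balanced(x, 15, 0.1, rng)
# Hypothetical outcomes with a unit treatment effect.
y = [xi + 1.0 * zi + rng.gauss(0, 0.5) for xi, zi in zip(x, z_obs)]
p_value, total_draws = fisher_p_value(y, x, z_obs, 0.1, 500, rng)
print(p_value, total_draws)
```

The value of `total_draws` relative to the 500 accepted assignments shows the overhead that a faster sampling scheme aims to reduce.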

Preprint, 2022, arXiv

Randomization-based joint central limit theorem and efficient covariate adjustment in stratified $2^K$ factorial experiments

Randomized block factorial experiments are widely used in industrial engineering, clinical trials, and social science. Researchers often use a linear model and analysis of covariance to analyze experimental results; however, limited studies have addressed the validity and robustness of the resulting inferences because assumptions for a linear model might not be justified by randomization in randomized block factorial experiments. In this paper, we establish a new finite population joint central limit theorem for usual (unadjusted) factorial effect estimators in randomized block $2^K$ factorial experiments. Our theorem is obtained under a randomization-based inference framework, making use of an extension of the vector form of the Wald--Wolfowitz--Hoeffding theorem for a linear rank statistic. It is robust to model misspecification, numbers of blocks, block sizes, and propensity scores across blocks. To improve the estimation and inference efficiency, we propose four covariate adjustment methods. We show that under mild conditions, the resulting covariate-adjusted factorial effect estimators are consistent, jointly asymptotically normal, and generally more efficient than the unadjusted estimator. In addition, we propose Neyman-type conservative estimators for the asymptotic covariances to facilitate valid inferences. Simulation studies and a clinical trial data analysis demonstrate the benefits of the covariate adjustment methods.

Preprint, 2020, arXiv

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

Constructing confidence intervals for the coefficients of high-dimensional sparse linear models remains a challenge, mainly because of the complicated limiting distributions of the widely used estimators, such as the lasso. Several methods have been developed for constructing such intervals. Bootstrap lasso+ols is notable for its technical simplicity, good interpretability, and performance that is comparable with that of other more complicated methods. However, bootstrap lasso+ols depends on the beta-min assumption, a theoretic criterion that is often violated in practice. Thus, we introduce a new method, called bootstrap lasso+partial ridge, to relax this assumption. Lasso+partial ridge is a two-stage estimator. First, the lasso is used to select features. Then, the partial ridge is used to refit the coefficients. Simulation results show that bootstrap lasso+partial ridge outperforms bootstrap lasso+ols when there exist small but nonzero coefficients, a common situation that violates the beta-min assumption. For such coefficients, the confidence intervals constructed using bootstrap lasso+partial ridge have, on average, $50\%$ larger coverage probabilities than those of bootstrap lasso+ols. Bootstrap lasso+partial ridge also has, on average, $35\%$ shorter confidence interval lengths than those of the de-sparsified lasso methods, regardless of whether the linear models are misspecified. Additionally, we provide theoretical guarantees for bootstrap lasso+partial ridge under appropriate conditions, and implement it in the R package "HDCI."
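One way to write the two-stage estimator is sketched below. The abstract does not define the partial ridge penalty, so this assumes that the ridge penalty applies only to coefficients the lasso did not select. The symbols $Y$, $X$, $n$, $\lambda_1$, $\lambda_2$ and $\hat{S}$ are introduced here for illustration.

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2n} \lVert Y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 \right\},
\qquad
\hat{S} = \{ j : \hat{\beta}^{\text{lasso}}_j \neq 0 \}
$$

$$
\hat{\beta}^{\text{PR}} = \arg\min_{\beta} \left\{ \frac{1}{2n} \lVert Y - X\beta \rVert_2^2 + \frac{\lambda_2}{2} \sum_{j \notin \hat{S}} \beta_j^2 \right\}
$$

Under this form, unselected coefficients are shrunk toward zero rather than set exactly to zero, which is consistent with the stated aim of handling small but nonzero coefficients.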

Preprint, 2020, arXiv

Confidence intervals for parameters in high-dimensional sparse vector autoregression

Vector autoregression (VAR) models are widely used to analyze the interrelationship between multiple variables over time. Estimation and inference for the transition matrices of VAR models are crucial for practitioners to make decisions in fields such as economics and finance. However, when the number of variables is larger than the sample size, it remains a challenge to perform statistical inference of the model parameters. In this article, we propose the de-biased Lasso and two bootstrap de-biased Lasso methods to construct confidence intervals for the elements of the transition matrices of high-dimensional VAR models. We show that the proposed methods are asymptotically valid under appropriate sparsity and other regularity conditions. To implement our methods, we develop feasible and parallelizable algorithms, which save a large amount of computation required by the nodewise Lasso and bootstrap. A simulation study illustrates that our methods perform well in finite samples. Finally, we apply our methods to analyze the price data of stocks in the S&P 500 index in 2019. We find that some stocks, such as the largest producer of gold in the world, Newmont Corporation, have significant predictive power over most stocks.

Preprint, 2020, arXiv

Regression-adjusted average treatment effect estimates in stratified randomized experiments

Researchers often use linear regression to analyse randomized experiments to improve treatment effect estimation by adjusting for imbalances of covariates in the treatment and control groups. Our work offers a randomization-based inference framework for regression adjustment in stratified randomized experiments. Under mild conditions, we re-establish the finite population central limit theorem for a stratified experiment. We prove that both the stratified difference-in-means and the regression-adjusted average treatment effect estimators are consistent and asymptotically normal. The asymptotic variance of the latter is no greater than, and is typically smaller than, that of the former. We also provide conservative variance estimators to construct large-sample confidence intervals for the average treatment effect.
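The unadjusted baseline named in the abstract, the stratified difference-in-means estimator, can be sketched as follows. The variance formula is the standard Neyman-type conservative form from textbooks and is not necessarily the exact estimator in the paper. It requires at least two units per arm in every stratum. The data and the function name are hypothetical.

```python
from statistics import mean, variance

def stratified_diff_in_means(strata):
    """Estimate the average treatment effect in a stratified experiment.

    strata: list of (treated_outcomes, control_outcomes), one pair per stratum.
    Returns the point estimate and a conservative variance estimate.
    """
    n = sum(len(t) + len(c) for t, c in strata)
    estimate, var_hat = 0.0, 0.0
    for treated, control in strata:
        share = (len(treated) + len(control)) / n  # stratum weight n_h / n
        estimate += share * (mean(treated) - mean(control))
        # Sample variances (n - 1 denominator) within each arm.
        var_hat += share ** 2 * (
            variance(treated) / len(treated) + variance(control) / len(control)
        )
    return estimate, var_hat

# Hypothetical outcomes for two strata of six units each.
strata = [
    ([5.0, 7.0, 6.0], [4.0, 5.0, 3.0]),
    ([10.0, 12.0], [9.0, 8.0, 10.0, 9.0]),
]
estimate, var_hat = stratified_diff_in_means(strata)
half_width = 1.96 * var_hat ** 0.5  # large-sample 95% interval half-width
print(estimate, var_hat, half_width)
```

For this toy input, each stratum has a within-stratum difference of 2, so the weighted estimate is 2.0 and the variance estimate is 11/24.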

Preprint, 2015, arXiv

Lasso adjustments of treatment effect estimates in randomized experiments

We provide a principled way for investigators to analyze randomized experiments when the number of covariates is large. Investigators often use linear multivariate regression to analyze randomized experiments instead of simply reporting the difference of means between treatment and control groups. Their aim is to reduce the variance of the estimated treatment effect by adjusting for covariates. If there are a large number of covariates relative to the number of observations, regression may perform poorly because of overfitting. In such cases, the Lasso may be helpful. We study the resulting Lasso-based treatment effect estimator under the Neyman-Rubin model of randomized experiments. We present theoretical conditions that guarantee that the estimator is more efficient than the simple difference-of-means estimator, and we provide a conservative estimator of the asymptotic variance, which can yield tighter confidence intervals than the difference-of-means estimator. Simulation and data examples show that Lasso-based adjustment can be advantageous even when the number of covariates is less than the number of observations. Specifically, a variant using Lasso for selection and OLS for estimation performs particularly well, and it chooses a smoothing parameter based on combined performance of Lasso and OLS.

Preprint, 2014, arXiv

Asymptotic Properties of Lasso+mLS and Lasso+Ridge in Sparse High-dimensional Linear Regression

We study the asymptotic properties of Lasso+mLS and Lasso+Ridge under the sparse high-dimensional linear regression model: Lasso selecting predictors and then modified Least Squares (mLS) or Ridge estimating their coefficients. First, we propose a valid inference procedure for parameter estimation based on parametric residual bootstrap after Lasso+mLS and Lasso+Ridge. Second, we derive the asymptotic unbiasedness of Lasso+mLS and Lasso+Ridge. More specifically, we show that their biases decay at an exponential rate and they can achieve the oracle convergence rate of $s/n$ (where $s$ is the number of nonzero regression coefficients and $n$ is the sample size) for mean squared error (MSE). Third, we show that Lasso+mLS and Lasso+Ridge are asymptotically normal. They have an oracle property in the sense that they can select the true predictors with probability converging to 1 and the estimates of nonzero parameters have the same asymptotic normal distribution that they would have if the zero parameters were known in advance. In fact, our analysis is not limited to adopting Lasso in the selection stage, but is applicable to any other model selection criteria with exponentially decaying rates of the probability of selecting wrong models.
