Source author record

Jushan Bai

Jushan Bai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: econ.EM, math.ST, Statistics Theory, Methodology

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Factor-Based Imputation of Missing Values and Covariances in Panel Data of Large Dimensions

Economists are blessed with a wealth of data for analysis, but more often than not, values in some entries of the data matrix are missing. Various methods have been proposed to handle missing observations in a few variables. We exploit the factor structure in panel data of large dimensions. Our TALL-PROJECT algorithm first estimates the factors from a TALL block in which data for all rows are observed, and projections of variable specific length are then used to estimate the factor loadings. A missing value is imputed as the estimated common component, which we show is consistent and asymptotically normal without further iteration. Implications for using imputed data in factor augmented regressions are then discussed. To compensate for the downward bias in covariance matrices created by an omitted noise when the data point is not observed, we overlay the imputed data with re-sampled idiosyncratic residuals many times and use the average of the covariances to estimate the parameters of interest. Simulations show that the procedures have desirable finite sample properties.
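The two steps named in the abstract above (factors from the fully observed block, then per-variable projections on the observed rows) can be illustrated with a minimal sketch. It is written from the abstract alone and is not the authors' implementation: the function name `tall_project_sketch`, the normalization of the factors, and the assumption that the panel is already demeaned are all illustrative choices, and the residual re-sampling step for covariances is not shown.

```python
import numpy as np

def tall_project_sketch(X, r):
    """Impute NaNs in a T x N panel X with r factors (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    observed = ~np.isnan(X)

    # "Tall" block: the variables (columns) observed in every time period.
    tall_cols = observed.all(axis=0)
    if tall_cols.sum() < r:
        raise ValueError("tall block has fewer columns than factors")

    # Step 1: principal-components factors from the tall block via SVD.
    U, _, _ = np.linalg.svd(X[:, tall_cols], full_matrices=False)
    F = np.sqrt(T) * U[:, :r]          # assumed normalization: F'F / T = I_r

    # Step 2: for each variable, regress its observed values on the factors,
    # using only the rows where that variable is observed.
    common = np.empty_like(X)
    for i in range(N):
        rows = observed[:, i]
        loading, *_ = np.linalg.lstsq(F[rows], X[rows, i], rcond=None)
        common[:, i] = F @ loading     # estimated common component

    # Keep observed entries; fill missing ones with the common component.
    return np.where(observed, X, common)
```

On noise-free data with an exact factor structure, this sketch recovers the masked entries up to floating-point error, which makes it easy to sanity-check before adding noise.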

Preprint, 2020, arXiv

Feasible Generalized Least Squares for Panel Data with Cross-sectional and Serial Correlations

This paper considers generalized least squares (GLS) estimation for linear panel data models. By estimating the large error covariance matrix consistently, the proposed feasible GLS (FGLS) estimator is more efficient than ordinary least squares (OLS) in the presence of heteroskedasticity, serial correlation, and cross-sectional correlation. To take into account the serial correlations, we employ the banding method. To take into account the cross-sectional correlations, we suggest using the thresholding method. We establish the limiting distribution of the proposed estimator. A Monte Carlo study is conducted, and the proposed method is illustrated in an empirical application.
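As a rough illustration of the two regularization ideas named in this abstract, the sketch below applies generic banding and hard thresholding to a covariance matrix. The function names, the hard-threshold rule, and the fixed cutoffs `k` and `tau` are assumptions made for illustration; the paper's actual estimator and its choice of tuning parameters may differ.

```python
import numpy as np

def band(S, k):
    """Banding: keep entries within k of the diagonal, zero out the rest."""
    S = np.asarray(S, dtype=float)
    idx = np.arange(S.shape[0])
    keep = np.abs(idx[:, None] - idx[None, :]) <= k
    return np.where(keep, S, 0.0)

def hard_threshold(S, tau):
    """Hard thresholding: zero out off-diagonal entries smaller than tau in magnitude."""
    S = np.asarray(S, dtype=float)
    keep = np.abs(S) >= tau
    np.fill_diagonal(keep, True)       # variances on the diagonal are always kept
    return np.where(keep, S, 0.0)

# Small example matrix with decaying off-diagonal entries.
S = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.5],
              [0.1, 0.5, 1.0]])
banded = band(S, k=1)
thresholded = hard_threshold(S, tau=0.3)
```

Banding suits serial correlation because time has a natural ordering, so distant lags can be dropped. Thresholding suits cross-sectional correlation because units have no natural ordering, so small entries are dropped wherever they occur.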

Preprint, 2020, arXiv

Simpler Proofs for Approximate Factor Models of Large Dimensions

Estimates of the approximate factor model are increasingly used in empirical work. Their theoretical properties, studied some twenty years ago, also laid the groundwork for the analysis of large dimensional panel data models with cross-section dependence. This paper presents simplified proofs for the estimates by using alternative rotation matrices, exploiting properties of low rank matrices, as well as the singular value decomposition of the data in addition to its covariance structure. These simplifications facilitate interpretation of results and provide a friendlier introduction to researchers new to the field. New results are provided to allow linear restrictions to be imposed on factor models.
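The link between the singular value decomposition of the data and its covariance structure can be seen in a small sketch: principal-components factor estimates computed from the SVD of the T x N data matrix give the same estimated common component as those computed from the eigenvectors of XX'/(NT). The function names and the normalization F'F/T = I are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def pc_via_svd(X, r):
    """Factors and loadings from the SVD of the T x N data matrix."""
    T, _ = X.shape
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    F = np.sqrt(T) * U[:, :r]              # assumed normalization: F'F / T = I_r
    Lam = X.T @ F / T                      # loadings by least squares given F
    return F, Lam

def pc_via_eig(X, r):
    """Factors and loadings from the eigenvectors of X X' / (N T)."""
    T, N = X.shape
    _, vecs = np.linalg.eigh(X @ X.T / (N * T))
    F = np.sqrt(T) * vecs[:, -r:]          # eigh sorts eigenvalues in ascending order
    Lam = X.T @ F / T
    return F, Lam

# Simulated panel with two factors plus noise (illustrative data only).
rng = np.random.default_rng(1)
T, N, r = 40, 15, 2
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) \
    + 0.1 * rng.standard_normal((T, N))

F_svd, Lam_svd = pc_via_svd(X, r)
F_eig, Lam_eig = pc_via_eig(X, r)
# Individual columns may differ in sign or rotation, but the estimated
# common components F @ Lam.T agree.
same_common = np.allclose(F_svd @ Lam_svd.T, F_eig @ Lam_eig.T)
```

Comparing common components rather than the factors themselves sidesteps the sign and rotation indeterminacy of the individual factor estimates.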

Preprint, 2020, arXiv

Standard Errors for Panel Data Models with Unknown Clusters

This paper develops a new standard-error estimator for linear panel data models. The proposed estimator is robust to heteroskedasticity, serial correlation, and cross-sectional correlation of unknown forms. The serial correlation is controlled by the Newey-West method. To control for cross-sectional correlations, we propose using the thresholding method, without assuming the clusters to be known. We establish the consistency of the proposed estimator. Monte Carlo simulations show the method works well. An empirical application is considered.
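For readers unfamiliar with the Newey-West method mentioned in this abstract, the sketch below computes a Bartlett-kernel long-run variance for a single time series. It only illustrates the lag-weighting idea; the function name `newey_west_lrv` is hypothetical, and the paper's estimator (for panel regressions, combined with cross-sectional thresholding) is not reproduced here.

```python
import numpy as np

def newey_west_lrv(u, lags):
    """Bartlett-kernel long-run variance of a scalar series (illustrative sketch)."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()                       # work with deviations from the mean
    T = u.shape[0]
    lrv = u @ u / T                        # lag-0 autocovariance
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)    # Bartlett weights decline linearly in j
        gamma_j = u[j:] @ u[:-j] / T       # lag-j autocovariance
        lrv += 2.0 * weight * gamma_j
    return lrv

# Alternating series: strongly negative first-order autocorrelation.
u = np.array([1.0, -1.0, 1.0, -1.0])
lrv_0 = newey_west_lrv(u, lags=0)          # plain variance, no lag correction
lrv_1 = newey_west_lrv(u, lags=1)          # variance adjusted for the first lag
```

The linearly declining weights are what keep the estimate non-negative, which is the usual reason for preferring them over unweighted sums of autocovariances.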

Preprint, 2014, arXiv

Theory and methods of panel data models with interactive effects

This paper considers the maximum likelihood estimation of panel data models with interactive effects. Motivated by applications in economics and other social sciences, a notable feature of the model is that the explanatory variables are correlated with the unobserved effects. The usual within-group estimator is inconsistent. Existing methods for consistent estimation are either designed for panel data with short time periods or are less efficient. The maximum likelihood estimator has desirable properties and is easy to implement, as illustrated by the Monte Carlo simulations. This paper develops the inferential theory for the maximum likelihood estimator, including consistency, rate of convergence and the limiting distributions. We further extend the model to include time-invariant regressors and common regressors (cross-section invariant). The regression coefficients for the time-invariant regressors are time-varying, and the coefficients for the common regressors are cross-sectionally varying.

Preprint, 2013, arXiv

Statistical Inferences Using Large Estimated Covariances for Panel Data and Factor Models

While most of the convergence results in the literature on high dimensional covariance matrices are concerned with the accuracy of estimating the covariance matrix (and precision matrix), relatively little is known about the effect of estimating large covariances on statistical inferences. We study two important models: factor analysis and panel data model with interactive effects, and focus on the statistical inference and estimation efficiency of structural parameters based on large covariance estimators. For efficient estimation, both models call for a weighted principal components (WPC) method, which relies on a high dimensional weight matrix. This paper derives an efficient and feasible WPC using the covariance matrix estimator of Fan et al. (2013). However, we demonstrate that existing results on large covariance estimation based on absolute convergence are not suitable for statistical inferences of the structural parameters. What is needed is some weighted consistency and the associated rate of convergence, which are obtained in this paper. Finally, the proposed method is applied to the US divorce rate data. We find that the efficient WPC identifies the significant effects of divorce-law reforms on the divorce rate, and it provides more accurate estimation and tighter confidence intervals than existing methods.

Preprint, 2012, arXiv

Efficient Estimation of Approximate Factor Models via Regularized Maximum Likelihood

We study the estimation of a high dimensional approximate factor model in the presence of both cross sectional dependence and heteroskedasticity. The classical method of principal components analysis (PCA) does not efficiently estimate the factor loadings or common factors because it essentially treats the idiosyncratic error as homoskedastic and cross sectionally uncorrelated. For efficient estimation it is essential to estimate a large error covariance matrix. We assume the model to be conditionally sparse, and propose two approaches to estimating the common factors and factor loadings; both are based on maximizing a Gaussian quasi-likelihood and involve regularizing a large sparse covariance matrix. In the first approach the factor loadings and the error covariance are estimated separately, while in the second approach they are estimated jointly. Extensive asymptotic analysis has been carried out. In particular, we develop the inferential theory for the two-step estimation. Because the proposed approaches take into account the large error covariance matrix, they produce more efficient estimators than the classical PCA methods or methods based on a strict factor model.

Preprint, 2012, arXiv

Statistical analysis of factor models of high dimension

This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the maximum likelihood estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows heteroskedasticities, which are jointly estimated with other parameters. Efficiency of MLE relative to the principal components method is also considered.
