Source author record

Serena Ng

Serena Ng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

econ.EM, Methodology, Computation, Machine Learning, math.ST, Statistics Theory

Catalog footprint

What is connected

8 works

6 topics

4 close collaborators

preprint, 2022, arXiv

Factor-Based Imputation of Missing Values and Covariances in Panel Data of Large Dimensions

Economists are blessed with a wealth of data for analysis, but more often than not, values in some entries of the data matrix are missing. Various methods have been proposed to handle missing observations in a few variables. We exploit the factor structure in panel data of large dimensions. Our TALL-PROJECT algorithm first estimates the factors from a TALL block in which data for all rows are observed, and projections of variable specific length are then used to estimate the factor loadings. A missing value is imputed as the estimated common component which we show is consistent and asymptotically normal without further iteration. Implications for using imputed data in factor augmented regressions are then discussed. To compensate for the downward bias in covariance matrices created by an omitted noise when the data point is not observed, we overlay the imputed data with re-sampled idiosyncratic residuals many times and use the average of the covariances to estimate the parameters of interest. Simulations show that the procedures have desirable finite sample properties.
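The two steps named in this abstract (factors from the fully observed tall block, then loadings from projections over each series' own observed span) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a single factor, uses power iteration in place of a full principal components routine, and the names `leading_factor`, `tall_project`, `panel` and `tall_cols` are hypothetical. The covariance correction by re-sampled residuals is not covered.

```python
import random

def leading_factor(tall, n_iter=100):
    """Power iteration for the top left singular vector of the T x N_o tall block."""
    T, N = len(tall), len(tall[0])
    v = [row[0] for row in tall]  # start from the first observed series
    for _ in range(n_iter):
        w = [sum(tall[t][i] * v[t] for t in range(T)) for i in range(N)]  # X'v
        v = [sum(tall[t][i] * w[i] for i in range(N)) for t in range(T)]  # X(X'v)
        norm = sum(a * a for a in v) ** 0.5
        v = [a / norm for a in v]
    return [a * T ** 0.5 for a in v]  # normalise so that F'F / T = 1

def tall_project(panel, tall_cols):
    """panel[t][i] is a float or None (missing); tall_cols are fully observed series."""
    T, N = len(panel), len(panel[0])
    factor = leading_factor([[panel[t][i] for i in tall_cols] for t in range(T)])
    imputed = [row[:] for row in panel]
    for i in range(N):
        # Projection over the periods in which series i is observed.
        obs = [t for t in range(T) if panel[t][i] is not None]
        loading = (sum(factor[t] * panel[t][i] for t in obs)
                   / sum(factor[t] ** 2 for t in obs))
        for t in range(T):
            if panel[t][i] is None:
                imputed[t][i] = loading * factor[t]  # estimated common component
    return imputed

# Simulated one-factor panel (all values below are illustrative assumptions).
rng = random.Random(2)
T, N, N_o = 80, 20, 10
f = [rng.gauss(0.0, 1.0) for _ in range(T)]
lam = [rng.uniform(0.5, 1.5) for _ in range(N)]
common = [[lam[i] * f[t] for i in range(N)] for t in range(T)]
panel = [[common[t][i] + 0.5 * rng.gauss(0.0, 1.0) for i in range(N)]
         for t in range(T)]

# Series outside the tall block lose a series-specific number of final periods.
missing = []
for i in range(N_o, N):
    for t in range(T - rng.randint(5, 30), T):
        panel[t][i] = None
        missing.append((t, i))

imputed = tall_project(panel, list(range(N_o)))
mae = sum(abs(imputed[t][i] - common[t][i]) for t, i in missing) / len(missing)
print("mean absolute imputation error:", mae)
```

The product of loading and factor is unaffected by the sign indeterminacy of the estimated factor, which is why the imputed values can be compared directly with the simulated common component.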

preprint, 2022, arXiv

Least Squares Estimation Using Sketched Data with Heteroskedastic Errors

Researchers may perform regressions using a sketch of data of size $m$ instead of the full sample of size $n$ for a variety of reasons. This paper considers the case when the regression errors do not have constant variance and heteroskedasticity robust standard errors would normally be needed for test statistics to provide accurate inference. We show that estimates using data sketched by random projections will behave 'as if' the errors were homoskedastic. Estimation by random sampling would not have this property. The result arises because the sketched estimates in the case of random projections can be expressed as degenerate $U$-statistics, and under certain conditions, these statistics are asymptotically normal with homoskedastic variance. We verify that the conditions hold not only in the case of least squares regression when the covariates are exogenous, but also in instrumental variables estimation when the covariates are endogenous. The result implies that inference, including first-stage F tests for instrument relevance, can be simpler than the full sample case if the sketching scheme is appropriately chosen.
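To make the two sketching schemes contrasted in this abstract concrete, the following minimal sketch builds both from the same simulated heteroskedastic sample. It assumes a single regressor with no intercept and a Gaussian projection matrix; the function names and the simulated design are illustrative choices, not taken from the paper. It only shows how each sketch is formed and does not verify the asymptotic variance result.

```python
import random

def ols_slope(x, y):
    """No-intercept least squares slope: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def sketch_by_sampling(x, y, m, rng):
    """Random sampling sketch: keep m rows drawn without replacement."""
    rows = rng.sample(range(len(x)), m)
    return [x[i] for i in rows], [y[i] for i in rows]

def sketch_by_projection(x, y, m, rng):
    """Gaussian random projection: each sketched row mixes all n original rows."""
    n = len(x)
    scale = m ** -0.5
    xs, ys = [], []
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) * scale for _ in range(n)]
        xs.append(sum(p * a for p, a in zip(row, x)))
        ys.append(sum(p * b for p, b in zip(row, y)))
    return xs, ys

rng = random.Random(0)
n, m, beta = 1000, 100, 2.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Heteroskedastic errors: the error scale grows with |x|.
y = [beta * xi + abs(xi) * rng.gauss(0.0, 1.0) for xi in x]

full = ols_slope(x, y)
sampled = ols_slope(*sketch_by_sampling(x, y, m, rng))
projected = ols_slope(*sketch_by_projection(x, y, m, rng))
print("full:", full, "sampled:", sampled, "projected:", projected)
```

The sampled sketch keeps original rows, so each retained error still carries its own variance. The projected sketch replaces every row with a weighted sum of all rows, which is the mixing the abstract's result relies on.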

preprint, 2022, arXiv

Time Series Estimation of the Dynamic Effects of Disaster-Type Shocks

This paper provides three results for SVARs under the assumption that the primitive shocks are mutually independent. First, a framework is proposed to accommodate a disaster-type variable with infinite variance into a SVAR. We show that the least squares estimates of the SVAR are consistent but have non-standard asymptotics. Second, the disaster shock is identified as the component with the largest kurtosis and whose impact effect is negative. An estimator that is robust to infinite variance is used to recover the mutually independent components. Third, an independence test on the residuals pre-whitened by the Choleski decomposition is proposed to test the restrictions imposed on a SVAR. The test can be applied whether the data have fat or thin tails, and to over as well as exactly identified models. Three applications are considered. In the first, the independence test is used to shed light on the conflicting evidence regarding the role of uncertainty in economic fluctuations. In the second, disaster shocks are shown to have short term economic impact arising mostly from feedback dynamics. The third uses the framework to study the dynamic effects of economic shocks post-covid.

preprint, 2021, arXiv

Estimation and Inference by Stochastic Optimization: Three Examples

This paper illustrates two algorithms designed in Forneron & Ng (2020): the resampled Newton-Raphson (rNR) and resampled quasi-Newton (rqN) algorithms, which speed up estimation and bootstrap inference for structural models. An empirical application to BLP shows that computation time decreases from nearly 5 hours with the standard bootstrap to just over 1 hour with rNR, and only 15 minutes using rqN. A first Monte-Carlo exercise illustrates the accuracy of the method for estimation and inference in a probit IV regression. A second exercise additionally illustrates statistical efficiency gains relative to standard estimation for simulation-based estimation using a dynamic panel regression example.

preprint, 2020, arXiv

An Econometric Perspective on Algorithmic Subsampling

Datasets that are terabytes in size are increasingly common, but computer bottlenecks often frustrate a complete analysis of the data. While more data are better than less, diminishing returns suggest that we may not need terabytes of data to estimate a parameter or test a hypothesis. But which rows of data should we analyze, and might an arbitrary subset of rows preserve the features of the original data? This paper reviews a line of work that is grounded in theoretical computer science and numerical linear algebra, and which finds that an algorithmically desirable sketch, which is a randomly chosen subset of the data, must preserve the eigenstructure of the data, a property known as a subspace embedding. Building on this work, we study how prediction and inference can be affected by data sketching within a linear regression setup. We show that the sketching error is small compared to the sample size effect which a researcher can control. As a sketch size that is algorithmically optimal may not be suitable for prediction and inference, we use statistical arguments to provide 'inference conscious' guides to the sketch size. When appropriately implemented, an estimator that pools over different sketches can be nearly as efficient as the infeasible one using the full sample.

preprint, 2020, arXiv

Inference by Stochastic Optimization: A Free-Lunch Bootstrap

Assessing sampling uncertainty in extremum estimation can be challenging when the asymptotic variance is not analytically tractable. Bootstrap inference offers a feasible solution but can be computationally costly especially when the model is complex. This paper uses iterates of a specially designed stochastic optimization algorithm as draws from which both point estimates and bootstrap standard errors can be computed in a single run. The draws are generated by the gradient and Hessian computed from batches of data that are resampled at each iteration. We show that these draws yield consistent estimates and asymptotically valid frequentist inference for a large class of regular problems. The algorithm provides accurate standard errors in simulation examples and empirical applications at low computational costs. The draws from the algorithm also provide a convenient way to detect data irregularities.
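The idea of reading both a point estimate and a standard error off the iterates of one optimization run can be illustrated on a toy problem. The sketch below assumes the simplest possible objective (estimating a mean by least squares, so the gradient is linear and the Hessian equals 1), a fixed learning rate `gamma`, and batches of the same size as the sample. These choices and all names are assumptions for illustration, not the algorithm as specified in the paper. The variance rescaling used here follows from the AR(1) algebra of this toy case.

```python
import random
import statistics

def resampled_newton_draws(data, gamma, n_draws, burn_in, rng):
    """Iterate theta <- theta - gamma * G / H on a fresh resample at each step.

    Toy objective: 0.5 * mean((y - theta)^2), so G = theta - mean(batch), H = 1.
    """
    n = len(data)
    theta = 0.0
    draws = []
    for b in range(burn_in + n_draws):
        batch = rng.choices(data, k=n)  # resample with replacement
        gradient = theta - statistics.fmean(batch)
        hessian = 1.0
        theta -= gamma * gradient / hessian
        if b >= burn_in:
            draws.append(theta)
    return draws

rng = random.Random(1)
data = [rng.gauss(3.0, 2.0) for _ in range(400)]
gamma = 0.5
draws = resampled_newton_draws(data, gamma, n_draws=4000, burn_in=100, rng=rng)

# Point estimate: average of the iterates.
point_estimate = statistics.fmean(draws)

# In this toy case the iterates follow an AR(1) with coefficient (1 - gamma),
# so their variance is gamma / (2 - gamma) times the bootstrap variance.
std_error = (statistics.pvariance(draws) * (2.0 - gamma) / gamma) ** 0.5

# Textbook standard error of the mean, for comparison.
analytic_se = statistics.stdev(data) / len(data) ** 0.5
print(point_estimate, std_error, analytic_se)
```

A single pass produces both numbers, whereas a conventional bootstrap would re-solve the optimization problem once per resample.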

preprint, 2020, arXiv

Simpler Proofs for Approximate Factor Models of Large Dimensions

Estimates of the approximate factor model are increasingly used in empirical work. Their theoretical properties, studied some twenty years ago, also laid the groundwork for analysis of large-dimensional panel data models with cross-section dependence. This paper presents simplified proofs for the estimates by using alternative rotation matrices, exploiting properties of low rank matrices, as well as the singular value decomposition of the data in addition to its covariance structure. These simplifications facilitate interpretation of results and provide a more friendly introduction to researchers new to the field. New results are provided to allow linear restrictions to be imposed on factor models.

preprint, 2015, arXiv

A Likelihood-Free Reverse Sampler of the Posterior Distribution

This paper considers properties of an optimization-based sampler for targeting the posterior distribution when the likelihood is intractable and auxiliary statistics are used to summarize information in the data. Our reverse sampler approximates the likelihood-based posterior distribution by solving a sequence of simulated minimum distance problems. By a change of variable argument, these estimates are reweighted with a prior and the volume of the Jacobian matrix to serve as draws from the desired posterior distribution. The sampler provides a conceptual framework to understand the difference between two types of likelihood-free estimation. Because simulated minimum distance estimation always results in acceptable draws, the reverse sampler is potentially an alternative to existing approximate Bayesian methods that are computationally demanding because of a low acceptance rate.
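The solve-then-reweight logic described in this abstract can be sketched on a toy model where everything has a closed form. The example assumes data y_t = theta + e_t with standard normal shocks, the sample mean as the only auxiliary statistic, and a standard normal prior. In that setting each minimum distance problem is solved exactly and the Jacobian equals 1. Dividing the prior by the absolute Jacobian is this sketch's reading of the change of variable argument. All names are hypothetical and this is not the paper's implementation.

```python
import math
import random
import statistics

def reverse_sampler(psi_hat, n, n_draws, log_prior, rng):
    """Toy model y_t = theta + e_t, e_t ~ N(0, 1); auxiliary statistic = sample mean."""
    thetas, weights = [], []
    for _ in range(n_draws):
        shocks = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The simulated statistic is psi(theta) = theta + mean(shocks), so
        # min over theta of (psi_hat - psi(theta))^2 is solved in closed form.
        theta_b = psi_hat - statistics.fmean(shocks)
        jacobian = 1.0  # d psi / d theta in this toy model
        thetas.append(theta_b)
        weights.append(math.exp(log_prior(theta_b)) / abs(jacobian))
    total = sum(weights)
    return thetas, [w / total for w in weights]

rng = random.Random(4)
n = 20
data = [rng.gauss(1.0, 1.0) for _ in range(n)]
psi_hat = statistics.fmean(data)

# Standard normal prior on theta (log density up to a constant).
thetas, weights = reverse_sampler(psi_hat, n, 5000, lambda t: -0.5 * t * t, rng)
posterior_mean = sum(w * t for w, t in zip(weights, thetas))

# Conjugate normal benchmark for this toy model: n * psi_hat / (n + 1).
exact_mean = n * psi_hat / (n + 1)
print(posterior_mean, exact_mean)
```

Every simulated shock vector yields a usable draw, so there is no accept/reject step. The prior enters only through the weights.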
