Source author record

Sokbae Lee

Sokbae Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

econ.EM · math.ST · Statistics Theory · Methodology · Machine Learning · Applications · Computation · econ.GN · q-fin.EC

Catalog footprint

What is connected

12 works

9 topics

4 close collaborators

Preprint · 2026 · arXiv

Policy Learning with Confidence

This paper introduces a rule for policy selection in the presence of estimation uncertainty, explicitly accounting for estimation risk. The rule belongs to the class of risk-aware rules on the efficient decision frontier, characterized as policies offering maximal estimated welfare for a given level of estimation risk. Among this class, the proposed rule is chosen to provide a reporting guarantee, ensuring that the welfare delivered exceeds a threshold with a pre-specified confidence level. We apply this approach to the allocation of a limited budget among social programs using estimates of their marginal value of public funds and associated standard errors.
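A toy illustration of the idea, not the paper's rule: suppose each program's welfare contribution is its cost times its estimated MVPF and the estimates are independent across programs. One simple risk-aware criterion then ranks budget-feasible sets of programs by a one-sided lower confidence bound, estimated welfare minus z times its standard error. The program names, numbers, the independence assumption and the exhaustive search are all hypothetical.

```python
import itertools
import math
from statistics import NormalDist

# Hypothetical programs: (name, estimated MVPF, standard error, cost).
PROGRAMS = [
    ("A", 1.8, 1.5, 4.0),
    ("B", 1.2, 0.1, 3.0),
    ("C", 1.5, 0.3, 3.0),
    ("D", 0.9, 0.05, 2.0),
]

def lower_bound(subset, z):
    """Estimated welfare minus z standard errors, assuming welfare is the sum
    of cost * MVPF and the MVPF estimates are independent."""
    est = sum(cost * mvpf for _, mvpf, _, cost in subset)
    se = math.sqrt(sum((cost * s) ** 2 for _, _, s, cost in subset))
    return est - z * se

def confident_allocation(programs, budget, confidence=0.9):
    """Return the budget-feasible subset with the largest one-sided lower
    confidence bound on welfare (an illustrative criterion only)."""
    z = NormalDist().inv_cdf(confidence)
    best, best_value = (), 0.0
    for k in range(1, len(programs) + 1):
        for subset in itertools.combinations(programs, k):
            if sum(p[3] for p in subset) > budget:
                continue  # violates the budget constraint
            value = lower_bound(subset, z)
            if value > best_value:
                best, best_value = subset, value
    return [p[0] for p in best], best_value
```

With a budget of 7.0, `confidence=0.5` (z = 0, pure point estimates) selects the high-estimate but noisy pair A and C, while `confidence=0.9` switches to the more precisely estimated pair B and C.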

Preprint · 2022 · arXiv

Least Squares Estimation Using Sketched Data with Heteroskedastic Errors

Researchers may perform regressions using a sketch of data of size $m$ instead of the full sample of size $n$ for a variety of reasons. This paper considers the case when the regression errors do not have constant variance and heteroskedasticity-robust standard errors would normally be needed for test statistics to provide accurate inference. We show that estimates using data sketched by random projections will behave "as if" the errors were homoskedastic. Estimation by random sampling would not have this property. The result arises because the sketched estimates in the case of random projections can be expressed as degenerate $U$-statistics, and under certain conditions, these statistics are asymptotically normal with homoskedastic variance. We verify that the conditions hold not only in the case of least squares regression when the covariates are exogenous, but also in instrumental variables estimation when the covariates are endogenous. The result implies that inference, including first-stage F tests for instrument relevance, can be simpler than in the full-sample case if the sketching scheme is appropriately chosen.
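A minimal numpy sketch of the two schemes the abstract contrasts. It assumes a Gaussian random projection with iid N(0, 1/m) entries as the projection scheme (other projections are also used in practice) and uniform sampling without replacement as the sampling scheme. The simulated design and the sizes are arbitrary; the code computes point estimates only and does not reproduce the paper's inference results.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients via numpy's lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def gaussian_projection_sketch(X, y, m, rng):
    """Premultiply (X, y) by an m x n matrix of iid N(0, 1/m) entries."""
    S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, X.shape[0]))
    return S @ X, S @ y

def uniform_sampling_sketch(X, y, m, rng):
    """Keep m rows drawn uniformly at random without replacement."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
n, m = 5_000, 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
# Heteroskedastic errors: the error scale grows with |x|.
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=n)

beta_full = ols(X, y)
beta_rp = ols(*gaussian_projection_sketch(X, y, m, rng))  # random projection
beta_rs = ols(*uniform_sampling_sketch(X, y, m, rng))     # random sampling
```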

Preprint · 2021 · arXiv

Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms. We leverage insights from time series regression in econometrics and construct asymptotically pivotal statistics via random scaling. Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem. Our proposed inference method has two key advantages over existing methods. First, the test statistic is computed in an online fashion with only SGD iterates and the critical values can be obtained without any resampling methods, thereby allowing for efficient implementation suitable for massive online data. Second, there is no need to estimate the asymptotic variance, and our inference method is shown to be robust to changes in the tuning parameters for SGD algorithms in simulation experiments with synthetic data.
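A scalar sketch of the random-scaling idea as we read it: run SGD, keep the Polyak-Ruppert average $\bar\theta_s$, and accumulate $V_n = n^{-2}\sum_{s\le n} s^2(\bar\theta_s - \bar\theta_n)^2$ through three running sums so that memory stays constant. The target (a mean under squared loss), the step-size schedule $\gamma_s = \gamma_0 s^{-\alpha}$ and its constants are assumptions for illustration.

```python
def sgd_random_scaling(stream, theta0=0.0, gamma0=0.5, alpha=0.505):
    """Online SGD for the mean of a scalar stream (loss (theta - y)^2 / 2),
    with Polyak-Ruppert averaging and the random-scaling quantity
    V_n = n^-2 * sum_{s<=n} s^2 (bar_s - bar_n)^2, kept in O(1) memory."""
    theta = theta0
    bar = 0.0
    a = b = c = 0.0  # running sums of s^2 * bar_s^2, s^2 * bar_s and s^2
    n = 0
    for y in stream:
        n += 1
        gamma = gamma0 * n ** (-alpha)  # decaying step size
        theta -= gamma * (theta - y)    # SGD step on the squared loss
        bar += (theta - bar) / n        # Polyak-Ruppert average of iterates
        w = float(n * n)
        a += w * bar * bar
        b += w * bar
        c += w
    # Expand sum_s s^2 (bar_s - bar_n)^2 = a - 2 * bar_n * b + bar_n^2 * c.
    v = (a - 2.0 * bar * b + bar * bar * c) / (n * n)
    return bar, v, n
```

The studentized statistic is then $\sqrt{n}(\bar\theta_n - \theta_0)/\sqrt{V_n}$. Its limit is not standard normal, so it has to be compared with the nonstandard critical values tabulated for this kind of random-scaling limit rather than with ±1.96; those values are not reproduced here.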

Preprint · 2020 · arXiv

An Econometric Perspective on Algorithmic Subsampling

Datasets that are terabytes in size are increasingly common, but computer bottlenecks often frustrate a complete analysis of the data. While more data are better than less, diminishing returns suggest that we may not need terabytes of data to estimate a parameter or test a hypothesis. But which rows of data should we analyze, and might an arbitrary subset of rows preserve the features of the original data? This paper reviews a line of work that is grounded in theoretical computer science and numerical linear algebra, and which finds that an algorithmically desirable sketch, which is a randomly chosen subset of the data, must preserve the eigenstructure of the data, a property known as a subspace embedding. Building on this work, we study how prediction and inference can be affected by data sketching within a linear regression setup. We show that the sketching error is small compared to the sample size effect which a researcher can control. As a sketch size that is algorithmically optimal may not be suitable for prediction and inference, we use statistical arguments to provide 'inference conscious' guides to the sketch size. When appropriately implemented, an estimator that pools over different sketches can be nearly as efficient as the infeasible one using the full sample.
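A sketch of pooling over sketches, assuming uniform row sampling without replacement and simple averaging of the per-sketch OLS estimates; the paper's pooled estimator and its inference-conscious choice of sketch size may differ in detail. The function name is hypothetical.

```python
import numpy as np

def pooled_sketch_ols(X, y, m, n_sketches, rng):
    """Average OLS estimates over independent uniform-sampling sketches of size m.

    Returns the pooled estimate and the standard deviation of the individual
    sketch estimates, a rough gauge of single-sketch error (sketches drawn
    from the same data overlap, so they are not fully independent).
    """
    n = X.shape[0]
    estimates = []
    for _ in range(n_sketches):
        idx = rng.choice(n, size=m, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        estimates.append(beta)
    estimates = np.array(estimates)
    return estimates.mean(axis=0), estimates.std(axis=0, ddof=1)
```

Averaging K sketches of size m uses about K·m rows in total, so when K·m approaches n the pooled estimate can approach the full-sample one while each individual regression stays small.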

Preprint · 2020 · arXiv

Desperate times call for desperate measures: government spending multipliers in hard times

We investigate state-dependent effects of fiscal multipliers and allow for endogenous sample splitting to determine whether the US economy is in a slack state. When the endogenized slack state is estimated as the period in which the unemployment rate is higher than about 12 percent, the estimated cumulative multipliers are significantly larger during slack periods than non-slack periods and are above unity. We also examine the possibility of time-varying regimes of slackness and find that our empirical results are robust under a more flexible framework. Our estimation results point out the importance of the heterogeneous effects of fiscal policy and shed light on the prospect of fiscal policy in response to economic shocks from the current COVID-19 pandemic.
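To make endogenous sample splitting concrete, here is a least-squares threshold regression sketch: the slack threshold on a switching variable (for example, an unemployment rate) is chosen by grid search to minimize the pooled sum of squared residuals. This is a simplified stand-in with hypothetical names, a single regressor and an arbitrary 15 percent trimming, not the paper's state-dependent multiplier specification.

```python
import numpy as np

def two_regime_fit(y, x, q, trim=0.15, n_grid=50):
    """Grid-search a threshold gamma in the switching variable q.

    For each gamma, fit y = a1 + b1*x on {q <= gamma} and y = a2 + b2*x on
    {q > gamma} by OLS, and keep the gamma with the smallest total sum of
    squared residuals. Returns (gamma_hat, [coef_low, coef_high]).
    """
    lo, hi = np.quantile(q, [trim, 1.0 - trim])  # keep both regimes populated
    best = None
    for gamma in np.linspace(lo, hi, n_grid):
        ssr = 0.0
        coefs = []
        for mask in (q <= gamma, q > gamma):
            X = np.column_stack([np.ones(mask.sum()), x[mask]])
            beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
            ssr += float(np.sum((y[mask] - X @ beta) ** 2))
            coefs.append(beta)
        if best is None or ssr < best[0]:
            best = (ssr, gamma, coefs)
    return best[1], best[2]
```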

Preprint · 2020 · arXiv

Factor-Driven Two-Regime Regression

We propose a novel two-regime regression model where regime switching is driven by a vector of possibly unobservable factors. When the factors are latent, we estimate them by the principal component analysis of a panel data set. We show that the optimization problem can be reformulated as mixed integer optimization, and we present two alternative computational algorithms. We derive the asymptotic distribution of the resulting estimator under an asymptotic scheme in which the threshold effect shrinks to zero. In particular, we establish a phase transition that describes the effect of first-stage factor estimation as the cross-sectional dimension of panel data increases relative to the time-series dimension. Moreover, we develop bootstrap inference and illustrate our methods via numerical studies.
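The first-stage step of estimating latent factors by principal components can be sketched as follows, using the common normalization $\hat F'\hat F/T = I$; a regime indicator $1\{\hat f_t'\gamma > 0\}$ is then evaluated at a candidate $\gamma$. The function names and the normalization are assumptions, the factors are identified only up to rotation and sign, and the mixed integer optimization over $\gamma$ is not shown.

```python
import numpy as np

def pca_factors(panel, k):
    """Estimate k latent factors from a T x N panel by principal components:
    sqrt(T) times the top-k eigenvectors of the T x T matrix X @ X.T,
    after demeaning each column, so that F.T @ F / T is the identity."""
    T = panel.shape[0]
    centered = panel - panel.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(centered @ centered.T)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return np.sqrt(T) * top

def regime(F, gamma):
    """Hypothetical helper: True in periods where f_t' gamma > 0."""
    return F @ gamma > 0
```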

Preprint · 2020 · arXiv

Sparse HP Filter: Finding Kinks in the COVID-19 Contact Rate

In this paper, we estimate the time-varying COVID-19 contact rate of a Susceptible-Infected-Recovered (SIR) model. Our measurement of the contact rate is constructed using data on actively infected, recovered and deceased cases. We propose a new trend filtering method that is a variant of the Hodrick-Prescott (HP) filter, constrained by the number of possible kinks. We term it the $\textit{sparse HP filter}$ and apply it to daily data from five countries: Canada, China, South Korea, the UK and the US. Our new method yields kinks that are well aligned with actual events in each country. We find that the sparse HP filter produces fewer kinks than the $\ell_1$ trend filter, while both methods fit the data equally well. Theoretically, we establish risk consistency of both the sparse HP and $\ell_1$ trend filters. Ultimately, we propose to use time-varying $\textit{contact growth rates}$ to document and monitor outbreaks of COVID-19.
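For a short series, a kink-constrained HP problem of the form $\min_x \sum_t (y_t - x_t)^2 + \lambda \sum_t (\Delta^2 x_t)^2$ with at most $\kappa$ nonzero second differences can be solved exactly by brute force. This is an exhaustive-search illustration under that reading of the constraint, not the paper's computational method: any trend with at most $\kappa$ kinks is piecewise linear, so each candidate set of kink dates reduces to a ridge-type least squares problem. The hinge basis and the function name are our own; the search grows combinatorially and only suits small T.

```python
import itertools
import numpy as np

def sparse_hp_filter(y, lam, max_kinks):
    """Minimise sum (y_t - x_t)^2 + lam * sum (second difference of x)^2
    subject to at most max_kinks nonzero second differences.

    Such an x equals a + b*t + sum_s c_s * max(t - s, 0): the slope changes at
    t = s, and the second difference of x is c_s at t = s + 1, zero elsewhere.
    Returns the kink dates s and the fitted trend.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)
    best_cost, best_kinks, best_fit = np.inf, (), None
    for k in range(max_kinks + 1):
        for kinks in itertools.combinations(range(2, T), k):
            B = np.column_stack([np.ones(T), t] + [np.maximum(t - s, 0.0) for s in kinks])
            P = np.diag([0.0, 0.0] + [1.0] * k)  # penalise only the kink sizes c_s
            theta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
            fit = B @ theta
            cost = np.sum((y - fit) ** 2) + lam * np.sum(theta[2:] ** 2)
            if cost < best_cost:
                best_cost, best_kinks, best_fit = cost, kinks, fit
    return best_kinks, best_fit
```

Replacing the count constraint with a penalty on $\sum_t |\Delta^2 x_t|$ gives the $\ell_1$ trend filter that the abstract compares against.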

Preprint · 2015 · arXiv

Testing for a General Class of Functional Inequalities

In this paper, we propose a general method for testing inequality restrictions on nonparametric functions. Our framework unifies many nonparametric testing problems, with a number of possible applications in auction models, game theoretic models, wage inequality, and revealed preferences. Our test involves a one-sided version of $L_{p}$ functionals of kernel-type estimators $(1\leq p <\infty )$ and is easy to implement in general, mainly due to its recourse to the bootstrap method. The bootstrap procedure is based on nonparametric bootstrap applied to kernel-based test statistics, with estimated "contact sets." We provide regularity conditions under which the bootstrap test is asymptotically valid uniformly over a large class of distributions, including cases in which the limiting distribution of the test statistic is degenerate. Our bootstrap test is shown to exhibit good power properties in Monte Carlo experiments, and we provide a general form of the local power function. As an illustration, we consider testing implications from auction theory, provide primitive conditions for our test, and demonstrate its usefulness by applying our test to real data. We supplement this example with a second empirical illustration in the context of wage inequality.
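The one-sided $L_p$ idea amounts to integrating the positive part of an estimated function over its support: violations of $v(x) \le 0$ add to the statistic, while slack regions contribute nothing. The sketch below uses a Nadaraya-Watson estimate with a Gaussian kernel and a trapezoid-rule integral on a grid, all assumptions; it omits the standardization and the contact-set bootstrap that supply the paper's critical values.

```python
import numpy as np

def nw_regression(x_data, y_data, grid, h):
    """Nadaraya-Watson estimate of E[Y | X = x] at each grid point (Gaussian kernel)."""
    u = (grid[:, None] - x_data[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y_data) / w.sum(axis=1)

def one_sided_lp_stat(estimates, grid, p=1):
    """Trapezoid-rule approximation of the integral of max(v(x), 0)^p over grid.

    The statistic is exactly zero when every estimate satisfies v(x) <= 0.
    """
    positive = np.maximum(estimates, 0.0) ** p
    return float(np.sum((positive[1:] + positive[:-1]) * np.diff(grid)) / 2.0)
```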

Preprint · 2015 · arXiv

Uniform Asymptotics for Nonparametric Quantile Regression with an Application to Testing Monotonicity

In this paper, we establish a uniform error rate of a Bahadur representation for local polynomial estimators of quantile regression functions. The error rate is uniform over a range of quantiles, a range of evaluation points in the regressors, and a wide class of probabilities for observed random variables. Most existing results on Bahadur representations for local polynomial quantile regression estimators apply to a fixed data-generating process. In the context of testing monotonicity, where the null hypothesis is a complex composite hypothesis, it is particularly relevant to establish Bahadur expansions that hold uniformly over a large class of data-generating processes. In addition, we establish the same error rate for bootstrap local polynomial estimators, which can be useful for various forms of bootstrap inference. As an illustration, we apply these results to testing monotonicity of quantile regression and present Monte Carlo experiments based on this example.
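A simplified sketch of kernel-weighted quantile regression at a point: the degree-zero (local constant) member of the local polynomial family, whose solution is a kernel-weighted sample quantile. The Epanechnikov kernel and the function names are assumptions; local linear or higher-order fits require solving a weighted check-loss linear program, and nothing here addresses the uniform Bahadur representation itself.

```python
def weighted_quantile(values, weights, tau):
    """Return the smallest value q whose cumulative weight reaches tau * total.

    Such a q minimises sum_i w_i * rho_tau(y_i - q), where
    rho_tau(u) = u * (tau - 1{u < 0}) is the check loss.
    """
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("all weights are zero")
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= tau * total:
            return v
    return pairs[-1][0]  # guards against floating-point shortfall near tau = 1

def local_constant_quantile(x_data, y_data, x0, tau, h):
    """Local constant (degree-0 local polynomial) quantile regression at x0,
    with Epanechnikov weights 0.75 * (1 - u^2) for |u| < 1, u = (x - x0) / h."""
    weights = [max(0.0, 0.75 * (1.0 - ((xi - x0) / h) ** 2)) for xi in x_data]
    return weighted_quantile(y_data, weights, tau)
```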

Preprint · 2014 · arXiv

Structural Change in Sparsity

In the high-dimensional sparse modeling literature, it has been crucially assumed that the sparsity structure of the model is homogeneous over the entire population. That is, the identities of important regressors are invariant across the population and across the individuals in the collected sample. In practice, however, the sparsity structure may not always be invariant in the population, due to heterogeneity across different sub-populations. We consider a general, possibly non-smooth M-estimation framework, allowing a possible structural change regarding the identities of important regressors in the population. Our penalized M-estimator not only selects covariates but also discriminates between a model with homogeneous sparsity and a model with a structural change in sparsity. As a result, it is not necessary to know or pretest whether the structural change is present, or where it occurs. We derive asymptotic bounds on the estimation loss of the penalized M-estimators and show that they achieve the oracle properties. We also show that when there is a structural change, the estimator of the threshold parameter is super-consistent. If the signal is relatively strong, the rates of convergence can be further improved, and asymptotic distributional properties of the estimators, including the threshold estimator, can be established using an adaptive penalization. The proposed methods are then applied to quantile regression and logistic regression models and are illustrated via Monte Carlo experiments.
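A least-squares sketch of the idea, assuming a single threshold variable q and the model $y = x'\beta + x'\delta\,1\{q > \tau\} + e$: for each candidate $\tau$, fit a plain Lasso on $(\beta, \delta)$ by coordinate descent and keep the $\tau$ with the smallest penalized objective. If $\delta$ is shrunk to zero, the fit collapses to the homogeneous-sparsity model. The unweighted Lasso, squared loss, fixed $\lambda$ and function names are assumptions; the paper covers general M-estimation and adaptive penalization.

```python
import numpy as np

def lasso_cd(Z, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * ||y - Z theta||^2 + lam * ||theta||_1."""
    n, p = Z.shape
    theta = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Z[:, j] * theta[j]          # partial residual without column j
            rho = Z[:, j] @ resid / n
            theta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= Z[:, j] * theta[j]
    return theta

def threshold_lasso(y, X, q, lam, taus):
    """Profile the Lasso objective over candidate thresholds tau.

    Returns (tau_hat, beta_hat, delta_hat) for y = X beta + X delta 1{q > tau} + e.
    """
    n, p = X.shape
    best = None
    for tau in taus:
        Z = np.hstack([X, X * (q > tau)[:, None]])
        theta = lasso_cd(Z, y, lam)
        obj = np.sum((y - Z @ theta) ** 2) / (2 * n) + lam * np.abs(theta).sum()
        if best is None or obj < best[0]:
            best = (obj, tau, theta[:p], theta[p:])
    return best[1:]
```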

Preprint · 2013 · arXiv

Intersection Bounds: Estimation and Inference

We develop a practical and novel method for inference on intersection bounds, namely bounds defined by either the infimum or supremum of a parametric or nonparametric function, or equivalently, the value of a linear programming problem with a potentially infinite constraint set. We show that many bounds characterizations in econometrics, for instance bounds on parameters under conditional moment inequalities, can be formulated as intersection bounds. Our approach is especially convenient for models composed of a continuum of inequalities that are separable in parameters, and also applies to models with inequalities that are non-separable in parameters. Since analog estimators for intersection bounds can be severely biased in finite samples, routinely underestimating the size of the identified set, we also offer a median-bias-corrected estimator of such bounds as a by-product of our inferential procedures. We develop theory for large sample inference based on the strong approximation of a sequence of series or kernel-based empirical processes by a sequence of "penultimate" Gaussian processes. These penultimate processes are generally not weakly convergent, and thus non-Donsker. Our theoretical results establish that we can nonetheless perform asymptotically valid inference based on these processes. Our construction also provides new adaptive inequality/moment selection methods. We provide conditions for the use of nonparametric kernel and series estimators, including a novel result that establishes strong approximation for any general series estimator admitting linearization, which may be of independent interest.
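For a finite set of points v with independent, approximately normal estimates, a precision-corrected estimate of an upper bound $\theta_u = \min_v \theta(v)$ can be sketched as $\min_v [\hat\theta(v) + k_p s(v)]$, where $k_p$ is the p-quantile of the maximum of independent standard normals: p = 1/2 gives a half-median-unbiased estimate, and p = 1 - α a one-sided confidence bound. Independence, the finite index set and the simulation size are assumptions of this sketch, which omits the strong approximations and adaptive inequality selection described above.

```python
import numpy as np

def corrected_upper_bound(theta_hat, se, p=0.5, n_sims=20_000, seed=0):
    """Return (min_v [theta_hat(v) + k_p * se(v)], k_p), where k_p is the
    p-quantile of max_v Z_v over iid standard normal Z_v (simulated)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    rng = np.random.default_rng(seed)
    max_z = rng.standard_normal((n_sims, theta_hat.size)).max(axis=1)
    k_p = float(np.quantile(max_z, p))
    return float(np.min(theta_hat + k_p * se)), k_p
```

With more points in the index set, the critical value $k_p$ grows, so the correction pushes the naive minimum upward by more; this offsets the downward bias of the analog estimator that the abstract describes.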

Preprint · 2013 · arXiv

Maximum Score Estimation of Preference Parameters for a Binary Choice Model under Uncertainty

This paper develops maximum score estimation of preference parameters in the binary choice model under uncertainty in which the decision rule is affected by conditional expectations. The preference parameters are estimated in two stages: we estimate conditional expectations nonparametrically in the first stage and then the preference parameters in the second stage based on Manski's (1975, 1985) maximum score estimator using the choice data and first-stage estimates. The paper establishes consistency and derives the rate of convergence of the two-stage maximum score estimator. Moreover, the paper provides sufficient conditions under which the two-stage estimator is asymptotically equivalent in distribution to the corresponding single-stage estimator that assumes the first-stage input is known. These results are of independent interest for maximum score estimation with nonparametrically generated regressors. The paper also presents some Monte Carlo simulation results on the finite-sample behavior of the two-stage estimator.
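A two-stage sketch under simplifying assumptions: the first stage estimates a conditional expectation E[w | z] by Nadaraya-Watson regression with a Gaussian kernel, and the second stage maximizes a Manski-type score over a grid for the remaining coefficient, with the coefficient on x1 normalized to one. The kernel, bandwidth, grid and variable names are hypothetical choices. Because the score is a step function of the parameter, a grid search is used here rather than gradient methods.

```python
import numpy as np

def nadaraya_watson(z, w, h):
    """First stage: Gaussian-kernel estimate of E[w | z] at each sample point."""
    k = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return (k @ w) / k.sum(axis=1)

def max_score(y, x1, x2, grid):
    """Second stage: maximise mean((2y - 1) * 1{x1 + b * x2 >= 0}) over b in grid,
    with the coefficient on x1 normalised to one."""
    sign = 2.0 * y - 1.0
    scores = [np.mean(sign * (x1 + b * x2 >= 0)) for b in grid]
    return float(grid[int(np.argmax(scores))])
```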
