Source author record

Xiaohong Chen

Xiaohong Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

econ.EM, Machine Learning, math.ST, Statistics Theory, Methodology, Software Engineering, Artificial Intelligence, cs.CY, Logic in Computer Science, physics.app-ph

Catalog footprint

What is connected

14 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

Explicating Tacit Regulatory Knowledge from LLMs to Auto-Formalize Requirements for Compliance Test Case Generation

Compliance testing in highly regulated domains is crucial but largely manual, requiring domain experts to translate complex regulations into executable test cases. While large language models (LLMs) show promise for automation, their susceptibility to hallucinations limits reliable application. Existing hybrid approaches mitigate this issue by constraining LLMs with formal models, but still rely on costly manual modeling. To solve this problem, this paper proposes RAFT, a framework for requirements auto-formalization and compliance test generation via explicating tacit regulatory knowledge from multiple LLMs. RAFT employs an Adaptive Purification-Aggregation strategy to explicate tacit regulatory knowledge from multiple LLMs and integrate it into three artifacts: a domain meta-model, a formal requirements representation, and testability constraints. These artifacts are then dynamically injected into prompts to guide high-precision requirement formalization and automated test generation. Experiments across financial, automotive, and power domains show that RAFT achieves expert-level performance and substantially outperforms state-of-the-art (SOTA) methods while reducing overall generation and review time.

preprint · 2024 · arXiv

Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities

We introduce two data-driven procedures for optimal estimation and inference in nonparametric models using instrumental variables. The first is a data-driven choice of sieve dimension for a popular class of sieve two-stage least squares estimators. When implemented with this choice, estimators of both the structural function $h_0$ and its derivatives (such as elasticities) converge at the fastest possible (i.e., minimax) rates in sup-norm. The second is for constructing uniform confidence bands (UCBs) for $h_0$ and its derivatives. Our UCBs guarantee coverage over a generic class of data-generating processes and contract at the minimax rate, possibly up to a logarithmic factor. As such, our UCBs are asymptotically more efficient than UCBs based on the usual approach of undersmoothing. As an application, we estimate the elasticity of the intensive margin of firm exports in a monopolistic competition model of international trade. Simulations illustrate the good performance of our procedures in empirically calibrated designs. Our results provide evidence against common parameterizations of the distribution of unobserved firm heterogeneity.

preprint · 2023 · arXiv

Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves

General nonlinear sieve learnings are classes of nonlinear sieves that can approximate nonlinear functions of high dimensional variables much more flexibly than various linear sieves (or series). This paper considers general nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation functionals of time series data, where the functionals of interest are based on some nonparametric function that satisfies conditional moment restrictions and is learned using multilayer neural networks. While the asymptotic normality of the estimated functionals depends on some unknown Riesz representer of the functional space, we show that the optimally weighted GN-QLR statistic is asymptotically Chi-square distributed, regardless of whether the expectation functional is regular (root-$n$ estimable) or not. This holds when the data are weakly dependent and satisfy a beta-mixing condition. We apply our method to off-policy evaluation in reinforcement learning by formulating the Bellman equation in the conditional moment restriction framework, so that we can make inference about the state-specific value functional using the proposed GN-QLR method with time series data. In addition, the averaged partial means and averaged partial derivatives of nonparametric instrumental variables and quantile IV models are presented as leading examples. Finally, a Monte Carlo study shows the finite sample performance of the procedure.

preprint · 2022 · arXiv

Adaptive incomplete multi-view learning via tensor graph completion

With the advancement of data acquisition techniques, multi-view learning has become a hot topic. Some multi-view learning methods assume that the multi-view data are complete, meaning that all instances are present, but this assumption is too idealized. Certain tensor-based methods for handling incomplete multi-view data have emerged and achieved better results. However, some problems remain: the traditional tensor norm makes the computation expensive, and these methods cannot handle out-of-sample data. To solve these two problems, we propose a new incomplete multi-view learning method. A new tensor norm is defined to recover the graph tensor data. The recovered graphs are then regularized to a consistent low-dimensional representation of the samples. In addition, adaptive weights are assigned to each view to adjust the importance of different views. Compared with existing methods, our method not only explores the consistency among views, but also obtains the low-dimensional representation of new samples by using the learned projection matrix. An efficient algorithm based on the inexact augmented Lagrange multiplier (ALM) method is designed to solve the model, and its convergence is proved. Experimental results on four datasets show the effectiveness of our method.

preprint · 2022 · arXiv

On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of ill-posedness with respect to the data generating distribution, bypassing a strong assumption on the discount factor $γ$ imposed in the recent literature for obtaining the $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posed property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for the classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on the well-posedness and the minimax lower bounds are of independent interest to study not only other nonparametric estimators for $Q$-function but also efficient estimation on the value of any target policy in off-policy settings.

preprint · 2022 · arXiv

Simple Adaptive Estimation of Quadratic Functionals in Nonparametric IV Models

This paper considers adaptive, minimax estimation of a quadratic functional in a nonparametric instrumental variables (NPIV) model, which is an important problem in optimal estimation of a nonlinear functional of an ill-posed inverse regression with an unknown operator. We first show that a leave-one-out, sieve NPIV estimator of the quadratic functional can attain a convergence rate that coincides with the lower bound previously derived in Chen and Christensen [2018]. The minimax rate is achieved by the optimal choice of the sieve dimension (a key tuning parameter) that depends on the smoothness of the NPIV function and the degree of ill-posedness, both of which are unknown in practice. We next propose a Lepski-type data-driven choice of the key sieve dimension adaptive to the unknown NPIV model features. The adaptive estimator of the quadratic functional is shown to attain the minimax optimal rate in the severely ill-posed case and in the regular mildly ill-posed case, but up to a multiplicative $\sqrt{\log n}$ factor in the irregular mildly ill-posed case.

preprint · 2022 · arXiv

Stacked conductive metal organic framework nanorods for high performance vacuum electronic devices

Metal-organic frameworks (MOFs) possessing many unique features have been utilized in several fields in recent years. However, their application in field emission (FE) vacuum electronic devices is hindered by their poor electrical conductivity. Herein, a novel conductive MOF of Cu-catecholate (Cu-CAT) with a nanorod length of 200 nm and a conductivity of 0.01 S/cm is grown on graphite paper (GP). Under an applied electric field, a large number of electrons can be emitted from the nanoscale emitter tips of the MOF surface to the anode. This work achieves strong field emission performance for the Cu-CAT@GP cold cathode film, including a low turn-on field of 0.59e6 V/m and an ultra-high field enhancement factor of 29622, comparable even to most carbon-based materials that have been widely investigated in FE studies. Meanwhile, the Cu-CAT@GP film has good electrical stability, with a current attenuation of 9.4% in two hours. The findings reveal that a cathode film fabricated from conductive MOF can be a promising candidate cold electron source for vacuum electronic applications.

preprint · 2020 · arXiv

Using LDA and LSTM Models to Study Public Opinions and Critical Groups Towards Congestion Pricing in New York City through 2007 to 2019

This study explores how people's views of and responses to the NYC congestion pricing proposals evolve over time. To understand these responses, Twitter data are collected and analyzed. Critical groups in the recurrent process are detected by statistically analyzing the active users and the most mentioned accounts, and the trends in people's attitudes and concerns over the years are identified with text mining and hybrid Natural Language Processing techniques, including LDA topic modeling and LSTM sentiment classification. The results show that multiple interest groups were involved and played crucial roles during the proposal, especially the Mayor and Governor, the MTA, and outer-borough representatives. The public shifted its focus of concern from the plan details to the city's wider sustainability and fairness. Furthermore, the plan's approval relies on several elements: the joint agreement reached in the political process, strong real-world motivation, a scheme based on balancing multiple interests, and groups' awareness of tolling's benefits and necessity.

preprint · 2017 · arXiv

Monte Carlo Confidence Sets for Identified Sets

In complicated/nonlinear parametric models, it is generally hard to know whether the model parameters are point identified. We provide computationally attractive procedures to construct confidence sets (CSs) for identified sets of full parameters and of subvectors in models defined through a likelihood or a vector of moment equalities or inequalities. These CSs are based on level sets of optimal sample criterion functions (such as likelihood or optimally-weighted or continuously-updated GMM criterions). The level sets are constructed using cutoffs that are computed via Monte Carlo (MC) simulations directly from the quasi-posterior distributions of the criterions. We establish new Bernstein-von Mises (or Bayesian Wilks) type theorems for the quasi-posterior distributions of the quasi-likelihood ratio (QLR) and profile QLR in partially-identified regular models and some non-regular models. These results imply that our MC CSs have exact asymptotic frequentist coverage for identified sets of full parameters and of subvectors in partially-identified regular models, and have valid but potentially conservative coverage in models with reduced-form parameters on the boundary. Our MC CSs for identified sets of subvectors are shown to have exact asymptotic coverage in models with singularities. We also provide results on uniform validity of our CSs over classes of DGPs that include point and partially identified models. We demonstrate good finite-sample coverage properties of our procedures in two simulation experiments. Finally, our procedures are applied to two non-trivial empirical examples: an airline entry game and a model of trade flows.

preprint · 2017 · arXiv

Optimal Sup-norm Rates and Uniform Inference on Nonlinear Functionals of Nonparametric IV Regression

This paper makes several important contributions to the literature about nonparametric instrumental variables (NPIV) estimation and inference on a structural function $h_0$ and its functionals. First, we derive sup-norm convergence rates for computationally simple sieve NPIV (series 2SLS) estimators of $h_0$ and its derivatives. Second, we derive a lower bound that describes the best possible (minimax) sup-norm rates of estimating $h_0$ and its derivatives, and show that the sieve NPIV estimator can attain the minimax rates when $h_0$ is approximated via a spline or wavelet sieve. Our optimal sup-norm rates surprisingly coincide with the optimal root-mean-squared rates for severely ill-posed problems, and are only a logarithmic factor slower than the optimal root-mean-squared rates for mildly ill-posed problems. Third, we use our sup-norm rates to establish the uniform Gaussian process strong approximations and the score bootstrap uniform confidence bands (UCBs) for collections of nonlinear functionals of $h_0$ under primitive conditions, allowing for mildly and severely ill-posed problems. Fourth, as applications, we obtain the first asymptotic pointwise and uniform inference results for plug-in sieve t-statistics of exact consumer surplus (CS) and deadweight loss (DL) welfare functionals under low-level conditions when demand is estimated via sieve NPIV. Empiricists could read our real data application of UCBs for exact CS and DL functionals of gasoline demand that reveals interesting patterns and is applicable to other markets.

preprint · 2016 · arXiv

Towards Concolic Testing for Hybrid Systems

Hybrid systems exhibit both continuous and discrete behavior. Analyzing hybrid systems is known to be hard. Inspired by the idea of concolic testing (of programs), we investigate whether we can combine random sampling and symbolic execution in order to effectively verify hybrid systems. We identify a sufficient condition under which such a combination is more effective than random sampling. Furthermore, we analyze different strategies of combining random sampling and symbolic execution and propose an algorithm which allows us to dynamically switch between them so as to reduce the overall cost. Our method has been implemented as a web-based checker named HyChecker. HyChecker has been evaluated with benchmark hybrid systems and a water treatment system in order to test its effectiveness.

preprint · 2015 · arXiv

High dimensional generalized empirical likelihood for moment restrictions with dependent data

This paper considers the maximum generalized empirical likelihood (GEL) estimation and inference on parameters identified by high dimensional moment restrictions with weakly dependent data when the dimensions of the moment restrictions and the parameters diverge along with the sample size. The consistency with rates and the asymptotic normality of the GEL estimator are obtained by properly restricting the growth rates of the dimensions of the parameters and the moment restrictions, as well as the degree of data dependence. It is shown that even in the high dimensional time series setting, the GEL ratio can still behave like a chi-square random variable asymptotically. A consistent test for the over-identification is proposed. A penalized GEL method is also provided for estimation under sparsity setting.

preprint · 2015 · arXiv

Optimal Uniform Convergence Rates and Asymptotic Normality for Series Estimators Under Weak Dependence and Weak Conditions

We show that spline and wavelet series regression estimators for weakly dependent regressors attain the optimal uniform (i.e. sup-norm) convergence rate $(n/\log n)^{-p/(2p+d)}$ of Stone (1982), where $d$ is the number of regressors and $p$ is the smoothness of the regression function. The optimal rate is achieved even for heavy-tailed martingale difference errors with finite $(2+(d/p))$th absolute moment for $d/p<2$. We also establish the asymptotic normality of t statistics for possibly nonlinear, irregular functionals of the conditional mean function under weak conditions. The results are proved by deriving a new exponential inequality for sums of weakly dependent random matrices, which is of independent interest.
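As a rough numerical illustration of the rate quoted in this abstract, the sketch below evaluates $(n/\log n)^{-p/(2p+d)}$ and the moment order $2+(d/p)$ for given values of $n$, $p$ and $d$. The function names are hypothetical and are not taken from the paper; the code only plugs numbers into the two expressions stated above and ignores any constants.

```python
import math


def stone_sup_norm_rate(n: int, p: float, d: int) -> float:
    """Evaluate (n / log n) ** (-p / (2p + d)), the sup-norm rate quoted above.

    n: sample size, p: smoothness of the regression function,
    d: number of regressors. Hypothetical helper for illustration only.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log(n) > 0")
    if p <= 0 or d <= 0:
        raise ValueError("p and d must be positive")
    return (n / math.log(n)) ** (-p / (2 * p + d))


def required_moment_order(p: float, d: int) -> float:
    """Return 2 + d/p, the order of the finite absolute moment in the abstract.

    The abstract states this condition for d/p < 2, so other inputs are rejected.
    """
    if p <= 0 or d <= 0:
        raise ValueError("p and d must be positive")
    if d / p >= 2:
        raise ValueError("the abstract states the condition only for d/p < 2")
    return 2 + d / p


if __name__ == "__main__":
    # One regressor, twice-differentiable regression function.
    for n in (100, 1_000, 10_000):
        print(n, stone_sup_norm_rate(n, p=2, d=1))
    print("moment order:", required_moment_order(p=2, d=1))
```

With $d=1$ and $p=2$ the exponent is $-2/5$ and the required moment order is $2.5$; larger $p$ (a smoother function) gives a smaller value at the same $n$, that is, a faster rate.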

preprint · 2014 · arXiv

Self-normalized Cramér Type Moderate Deviations under Dependence

We establish a Cramér-type moderate deviation result for self-normalized sums of weakly dependent random variables, where the moment requirement is much weaker than the non-self-normalized counterpart. The range of the moderate deviation is shown to depend on the moment condition and the degree of dependence of the underlying processes. We consider two types of self-normalization: the big-block-small-block scheme and the interlacing or equal-block scheme. Simulation study shows that the latter can have a better finite-sample performance. Our result is applied to multiple testing and construction of simultaneous confidence intervals for high-dimensional time series mean vectors.
