Source author record

Heng Lian

Heng Lian appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Methodology, Computation, Artificial Intelligence, Machine Learning, Computer Vision, Cryptography and Security, math.ST, Numerical Analysis, Software Engineering, Statistics Theory

Catalog footprint

What is connected

21 works

10 topics

4 close collaborators

preprint, 2026, arXiv

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk assessment, and automated decision-making, where their abilities to plan, invoke tools, and manipulate mutable state introduce new security risks in high-stakes and highly regulated financial environments. However, existing safety evaluations largely focus on language-model-level content compliance or abstract agent settings, failing to capture execution-grounded risks arising from real operational workflows and state-changing actions. To bridge this gap, we propose FinVault, the first execution-grounded security benchmark for financial agents, comprising 31 regulatory case-driven sandbox scenarios with state-writable databases and explicit compliance constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental results reveal that existing defense mechanisms remain ineffective in realistic financial agent settings, with average attack success rates (ASR) still reaching up to 50.0% on state-of-the-art models and remaining non-negligible even for the most robust systems (ASR 6.7%), highlighting the limited transferability of current safety designs and the need for stronger financial-specific defenses. Our code can be found at https://github.com/aifinlab/FinVault.

preprint, 2026, arXiv

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

While autonomous software engineering (SWE) agents are reshaping programming paradigms, they currently suffer from a "closed-world" limitation: they attempt to fix bugs from scratch or solely using local context, ignoring the immense historical human experience available on platforms like GitHub. Accessing this open-world experience is hindered by the unstructured and fragmented nature of real-world issue-tracking data. In this paper, we introduce MemGovern, a framework designed to govern and transform raw GitHub data into actionable experiential memory for agents. MemGovern employs experience governance to convert human experience into agent-friendly experience cards and introduces an agentic experience search strategy that enables logic-driven retrieval of human expertise. By producing 135K governed experience cards, MemGovern achieves a significant performance boost, improving resolution rates on SWE-bench Verified by 4.65%. As a plug-in approach, MemGovern provides a solution for agent-friendly memory infrastructure.

preprint, 2026, arXiv

Robust Tensor Regression with Nonconvexity: Algorithmic and Statistical Theory

Tensor regression is an important tool for tensor data analysis, but existing works have not considered the impact of outliers, making them potentially sensitive to such data points. This paper proposes a low tubal rank robust regression method for analyzing high-dimensional tensor data with heavy-tailed random noise. The proposed method is based on a nonconvex relaxation of the tensor tubal rank within a general optimization framework, which allows for nonconvexity in both the loss and penalty functions. We develop an implementable estimation algorithm and establish its global convergence under some mild assumptions. Furthermore, we provide general statistical theories regarding stationary points, including the rates of convergence and bounds on the prediction error. These theoretical results cover many important models, such as linear models, generalized linear models, and Huber regression, and even encompass some nonconvex losses like correntropy and minimum distance criterion-induced losses. Supportive numerical evidence is provided through simulations and application studies.

preprint, 2022, arXiv

Nonparametric Quantile Regression for Homogeneity Pursuit in Panel Data Models

Many panel data have the latent subgroup effect on individuals, and it is important to correctly identify these groups since the efficiency of resulting estimators can be improved significantly by pooling the information of individuals within each group. However, the currently assumed parametric and semiparametric relationship between the response and predictors may be misspecified, which leads to a wrong grouping result, and the nonparametric approach hence can be considered to avoid such mistakes. Moreover, the response may depend on predictors in different ways at various quantile levels, and the corresponding grouping structure may also vary. To tackle these problems, this article proposes a nonparametric quantile regression method for homogeneity pursuit in panel data models with individual effects, and a pairwise fused penalty is used to automatically select the number of groups. The asymptotic properties are established, and an ADMM algorithm is also developed. The finite sample performance is evaluated by simulation experiments, and the usefulness of the proposed methodology is further illustrated by an empirical example.

preprint, 2022, arXiv

Online Deep Learning from Doubly-Streaming Data

This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are twofold: 1) Data samples ceaselessly flowing in may carry shifted patterns over time, requiring learners to update and hence adapt on the fly. 2) Newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A plausible idea to overcome the challenges is to establish a relationship between the pre- and post-evolving feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve the learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffer from a tradeoff between onlineness (biasing shallow learners) and expressiveness (requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD^3S is to treat the model capacity as a learnable semantics, yielding optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams in an online fashion. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.

preprint, 2015, arXiv

Greedy Forward Regression for Variable Screening

Two popular variable screening methods under the ultra-high dimensional setting with the desirable sure screening property are the sure independence screening (SIS) and the forward regression (FR). Both are classical variable screening methods and recently have attracted greater attention under the new light of high-dimensional data analysis. We consider a new and simple screening method that incorporates multiple predictors in each step of forward regression, with decision on which variables to incorporate based on the same criterion. If only one step is carried out, it actually reduces to the SIS. Thus it can be regarded as a generalization and unification of the FR and the SIS. More importantly, it preserves the sure screening property and has similar computational complexity as FR in each step, yet it can discover the relevant covariates in fewer steps. Thus, it reduces the computational burden of FR drastically while retaining advantages of the latter over SIS. Furthermore, we show that it can find all the true variables if the number of steps taken is the same as the correct model size, even when using the original FR. An extensive simulation study and application to two real data examples demonstrate excellent performance of the proposed method.
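The screening idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `greedy_forward_screen`, the parameters `per_step` and `steps`, and the use of absolute correlation with the current residual as the selection criterion are all assumptions made for illustration. With `steps=1` the procedure ranks variables by marginal correlation, which is the SIS ranking.

```python
import numpy as np

def greedy_forward_screen(X, y, per_step, steps):
    """Hypothetical sketch: add `per_step` predictors in each forward step.

    Assumption: candidates are ranked by absolute correlation between the
    standardized column and the current least-squares residual.
    Assumes steps * per_step does not exceed the number of columns.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    yc = np.asarray(y, dtype=float) - np.mean(y)  # center the response
    resid = yc.copy()
    selected = []
    available = np.ones(X.shape[1], dtype=bool)
    for _ in range(steps):
        score = np.abs(Xs.T @ resid)
        score[~available] = -np.inf             # never re-select a variable
        new = np.argsort(score)[::-1][:per_step]
        selected.extend(int(j) for j in new)
        available[new] = False
        # Refit on everything selected so far and update the residual.
        coef, *_ = np.linalg.lstsq(Xs[:, selected], yc, rcond=None)
        resid = yc - Xs[:, selected] @ coef
    return selected
```

Calling `greedy_forward_screen(X, y, per_step=1, steps=k)` corresponds to a one-variable-at-a-time forward pass under the same stand-in criterion, while larger `per_step` values reach a given model size in fewer refits.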

preprint, 2014, arXiv

Variable Selection and Estimation for Partially Linear Single-index Models with Longitudinal Data

In this paper, we consider the partially linear single-index models with longitudinal data. To deal with the variable selection problem in this context, we propose a penalized procedure combined with two bias correction methods, resulting in the bias-corrected generalized estimating equation (GEE) and the bias-corrected quadratic inference function (QIF), which can take into account the correlations. Asymptotic properties of these methods are demonstrated. We also evaluate the finite sample performance of the proposed methods via Monte Carlo simulation studies and a real data analysis.

preprint, 2013, arXiv

Bayesian Quantile Regression for Partially Linear Additive Models

In this article, we develop a semiparametric Bayesian estimation and model selection approach for partially linear additive models in conditional quantile regression. The asymmetric Laplace distribution provides a mechanism for Bayesian inferences of quantile regression models based on the check loss. The advantage of this new method is that nonlinear, linear and zero function components can be separated automatically and simultaneously during model fitting without the need of pre-specification or parameter tuning. This is achieved by spike-and-slab priors using two sets of indicator variables. For posterior inferences, we design an effective partially collapsed Gibbs sampler. Simulation studies are used to illustrate our algorithm. The proposed approach is further illustrated by applications to two real data sets.
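The link between the check loss and the asymmetric Laplace distribution mentioned above can be made concrete with a small sketch. The function names are hypothetical, and the code only shows the standard definitions: the check loss rho_tau(u) = u * (tau - 1{u < 0}) and the asymmetric Laplace log-density, whose maximization in the location parameter is equivalent to minimizing the summed check loss. It does not implement the spike-and-slab priors or the Gibbs sampler from the paper.

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss: u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution with location mu,
    scale sigma and skewness tau."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    return np.log(tau * (1 - tau) / sigma) - check_loss(z, tau)

def quantile_by_check_loss(y, tau):
    """Return the sample point minimizing the total check loss
    (illustrative brute-force search over the observed values)."""
    y = np.asarray(y, dtype=float)
    totals = [check_loss(y - c, tau).sum() for c in y]
    return float(y[int(np.argmin(totals))])
```

Because the log-density is a constant minus the check loss, the location that maximizes the asymmetric Laplace likelihood at a fixed scale is a sample tau-quantile, which is what `quantile_by_check_loss` searches for.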

preprint, 2013, arXiv

Reduced-rank Regression in Sparse Multivariate Varying-Coefficient Models with High-dimensional Covariates

In genetic studies, not only can the number of predictors obtained from microarray measurements be extremely large, there can also be multiple response variables. Motivated by such a situation, we consider semiparametric dimension reduction methods in sparse multivariate regression models. Previous studies on joint variable and rank selection have focused on parametric models while here we consider the more challenging varying-coefficient models which make the investigation on nonlinear interactions of variables possible. Spline approximation, rank constraints and concave group penalties are utilized for model estimation. Asymptotic oracle properties of the estimators are presented. We also propose reduced-rank independent screening to deal with the situation when the dimension is so high that penalized estimation cannot be efficiently applied. In simulations, we show the advantages of simultaneously performing variable and rank selection. A real data set is analyzed to illustrate the good prediction performance when incorporating interactions between genetic variables and an index variable.

preprint, 2012, arXiv

Functional Partial Linear Model

When predicting scalar responses in the situation where the explanatory variables are functions, it is sometimes the case that some functional variables are related to responses linearly while other variables have more complicated relationships with the responses. In this paper, we propose a new semi-parametric model to take advantage of both parametric and nonparametric functional modeling. Asymptotic properties of the proposed estimators are established and finite sample behavior is investigated through a small simulation experiment.

preprint, 2012, arXiv

Minimax Prediction for Functional Linear Regression with Functional Responses in Reproducing Kernel Hilbert Spaces

In this article, we consider convergence rates in functional linear regression with functional responses, where the linear coefficient lies in a reproducing kernel Hilbert space (RKHS). Without assuming that the reproducing kernel and the covariate covariance kernel are aligned, or assuming polynomial rate of decay of the eigenvalues of the covariance kernel, convergence rates in prediction risk are established. The corresponding lower bound in rates is derived by reducing to the scalar response case. Simulation studies and two benchmark datasets are used to illustrate that the proposed approach can significantly outperform the functional PCA approach in prediction.

preprint, 2011, arXiv

Bayesian Quantile Regression for Single-Index Models

Using an asymmetric Laplace distribution, which provides a mechanism for Bayesian inference of quantile regression models, we develop a fully Bayesian approach to fitting single-index models in conditional quantile regression. In this work, we use a Gaussian process prior for the unknown nonparametric link function and a Laplace distribution on the index vector, with the latter motivated by the recent popularity of the Bayesian lasso idea. We design a Markov chain Monte Carlo algorithm for posterior inference. Careful consideration of the singularity of the kernel matrix, and tractability of some of the full conditional distributions leads to a partially collapsed approach where the nonparametric link function is integrated out in some of the sampling steps. Our simulations demonstrate the superior performance of the Bayesian method versus the frequentist approach. The method is further illustrated by an application to the hurricane data.

preprint, 2011, arXiv

Bias-corrected GEE estimation and smooth-threshold GEE variable selection for single-index models with clustered data

In this paper, we present a generalized estimating equations based estimation approach and a variable selection procedure for single-index models when the observed data are clustered. Unlike the case of independent observations, bias-correction is necessary when general working correlation matrices are used in the estimating equations. Our variable selection procedure based on smooth-threshold estimating equations (Ueki, 2009) can automatically eliminate irrelevant parameters by setting them to zero and is computationally simpler than alternative approaches based on shrinkage penalty. The resulting estimator consistently identifies the significant variables in the index, even when the working correlation matrix is misspecified. The asymptotic property of the estimator is the same whether or not the nonzero parameters are known (in both cases we use the same estimating equations), thus achieving the oracle property in the sense of Fan and Li (2001). The finite sample properties of the estimator are illustrated by some simulation examples, as well as a real data application.

preprint, 2011, arXiv

Convergence of Nonparametric Functional Regression Estimates with Functional Responses

We consider nonparametric functional regression when both predictors and responses are functions. More specifically, we let $(X_1,Y_1),...,(X_n,Y_n)$ be random elements in $\mathcal{F}\times\mathcal{H}$ where $\mathcal{F}$ is a semi-metric space and $\mathcal{H}$ is a separable Hilbert space. Based on a recently introduced notion of weak dependence for functional data, we showed the almost sure convergence rates of both the Nadaraya-Watson estimator and the nearest neighbor estimator, in a unified manner. Several factors, including functional nature of the responses, the assumptions on the functional variables using the Orlicz norm and the desired generality on weakly dependent data, make the theoretical investigations more challenging and interesting.

preprint, 2011, arXiv

Gaussian process single-index models as emulators for computer experiments

A single-index model (SIM) provides for parsimonious multi-dimensional nonlinear regression by combining parametric (linear) projection with univariate nonparametric (non-linear) regression models. We show that a particular Gaussian process (GP) formulation is simple to work with and ideal as an emulator for some types of computer experiment as it can outperform the canonical separable GP regression model commonly used in this setting. Our contribution focuses on drastically simplifying, re-interpreting, and then generalizing a recently proposed fully Bayesian GP-SIM combination, and then illustrating its favorable performance on synthetic data and a real-data computer experiment. Two R packages, both released on CRAN, have been augmented to facilitate inference under our proposed model(s).

preprint, 2011, arXiv

Semiparametric Bayesian Information Criterion for Model Selection in Ultra-high Dimensional Additive Models

For linear models with a diverging number of parameters, it has recently been shown that modified versions of Bayesian information criterion (BIC) can identify the true model consistently. However, in many cases there is little justification that the effects of the covariates are actually linear. Thus a semiparametric model such as the additive model studied here, is a viable alternative. We demonstrate that theoretical results on the consistency of BIC-type criterion can be extended to this more challenging situation, with dimension diverging exponentially fast with sample size. Besides, the noise assumptions are relaxed in our theoretical studies. These efforts significantly enlarge the applicability of the criterion to a more general class of models.

preprint, 2011, arXiv

Shrinkage Estimation and Selection for Multiple Functional Regression

Functional linear regression is a useful extension of simple linear regression and has been investigated by many researchers. However, functional variable selection when multiple functional observations exist, which is the functional counterpart of multiple linear regression, is seldom studied. Here we propose a method using the group smoothly clipped absolute deviation penalty (gSCAD), which can perform regression estimation and variable selection simultaneously. We show that the method can identify the true model consistently and discuss the construction of pointwise confidence intervals for the estimated functional coefficients. Our methodology and theory are verified by simulation studies as well as an application to spectrometrics data.
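For readers unfamiliar with the penalty named here, the following sketch evaluates the standard SCAD penalty of Fan and Li (2001) and a grouped version that applies it to the norm of each coefficient block. The function names are hypothetical, and using the Euclidean norm of a finite coefficient vector per group is an assumption for illustration; the paper works with functional coefficients.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic in between, constant beyond a*lam."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    const = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def group_scad_penalty(coef_groups, lam, a=3.7):
    """Assumed grouped form: SCAD applied to the Euclidean norm of each group."""
    return sum(float(scad_penalty(np.linalg.norm(g), lam, a)) for g in coef_groups)
```

The penalty matches the lasso for small coefficients and flattens out for large ones, so large coefficient groups are not shrunk further, while small groups can be set exactly to zero.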

preprint, 2010, arXiv

A simple and efficient algorithm for fused lasso signal approximator with convex loss function

We consider the augmented Lagrangian method (ALM) as a solver for the fused lasso signal approximator (FLSA) problem. The ALM is a dual method in which squares of the constraint functions are added as penalties to the Lagrangian. In order to apply this method to FLSA, two types of auxiliary variables are introduced to transform the original unconstrained minimization problem into a linearly constrained minimization problem. Each update in this iterative algorithm consists of just a simple one-dimensional convex programming problem, with a closed-form solution in many cases. While the existing literature has mostly focused on the quadratic loss function, our algorithm can be easily implemented for general convex loss. The most attractive feature of this algorithm is its simplicity in implementation compared to other existing fast solvers. We also provide some convergence analysis of the algorithm. Finally, the method is illustrated with some simulation datasets.
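As a rough illustration of the augmented Lagrangian idea, the sketch below solves the one-dimensional FLSA with quadratic loss using a standard ADMM-style scheme. It is not the algorithm from the paper: it introduces a single auxiliary variable for the successive differences, uses a dense linear solve instead of one-dimensional updates, and handles only quadratic loss. The function names and the parameters `rho` and `n_iter` are assumptions. The final soft-thresholding step relies on the known property that, for quadratic-loss FLSA, the solution with a lasso penalty `lam1` is the soft-thresholded solution for `lam1 = 0`.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def flsa_admm(y, lam1, lam2, rho=1.0, n_iter=500):
    """Minimize 0.5*||y - b||^2 + lam1*||b||_1 + lam2*sum|b[i+1] - b[i]|.

    ADMM on the constraint D b = z (scaled dual variable u), followed by
    soft-thresholding for the lam1 term.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)        # (n-1) x n first-difference matrix
    A = np.eye(n) + rho * D.T @ D
    z = np.zeros(n - 1)                   # auxiliary variable for D b
    u = np.zeros(n - 1)                   # scaled dual variable
    for _ in range(n_iter):
        b = np.linalg.solve(A, y + rho * D.T @ (z - u))
        z = soft_threshold(D @ b + u, lam2 / rho)
        u = u + D @ b - z
    return soft_threshold(b, lam1)
```

For example, `flsa_admm([0.0, 4.0], 0.0, 1.0)` pulls the two values toward each other, and a very large `lam2` fuses all entries to the mean of `y`.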

preprint, 2010, arXiv

Flexible Shrinkage Estimation in High-Dimensional Varying Coefficient Models

We consider the problem of simultaneous variable selection and constant coefficient identification in high-dimensional varying coefficient models based on B-spline basis expansion. Both objectives can be considered as some type of model selection problem, and we show that they can be achieved by a double shrinkage strategy. We apply the adaptive group Lasso penalty in models involving a diverging number of covariates, which can be much larger than the sample size, but we assume the number of relevant variables is smaller than the sample size via model sparsity. Such so-called ultra-high dimensional settings are especially challenging in semiparametric models such as those we consider here and have not been dealt with before. Under suitable conditions, we show that consistency in terms of both variable selection and constant coefficient identification can be achieved, as well as the oracle property of the constant coefficients. Even in the case that the zero and constant coefficients are known a priori, our results appear to be new in that the model reduces to semivarying coefficient models (a.k.a. partially linear varying coefficient models) with a diverging number of covariates. We also theoretically demonstrate the consistency of a semiparametric BIC-type criterion in this high-dimensional context, extending several previous results. The finite sample behavior of the estimator is evaluated by some Monte Carlo studies.

preprint, 2010, arXiv

Gaussian Process Models for Nonparametric Functional Regression with Functional Responses

Recently, a nonparametric functional model with functional responses has been proposed within the functional reproducing kernel Hilbert space (fRKHS) framework. Motivated by its superior performance and also its limitations, we propose a Gaussian process model whose posterior mode coincides with the fRKHS estimator. The Bayesian approach has several advantages compared to its predecessor. Firstly, the multiple unknown parameters can be inferred together with the regression function in a unified framework. Secondly, as a Bayesian method, the statistical inferences are straightforward through the posterior distributions. We also use the predictive process models adapted from the spatial statistics literature to overcome the computational limitations, thus extending the applicability of this popular technique to a new problem. Modifications of predictive process models are nevertheless critical in our context to obtain valid inferences. The numerical results presented demonstrate the effectiveness of the modifications.

preprint, 2009, arXiv

Total Variation, Adaptive Total Variation and Nonconvex Smoothly Clipped Absolute Deviation Penalty for Denoising Blocky Images

The total variation-based image denoising model has been generalized and extended in numerous ways, improving its performance in different contexts. We propose a new penalty function motivated by the recent progress in the statistical literature on high-dimensional variable selection. Using a particular instantiation of the majorization-minimization algorithm, the optimization problem can be efficiently solved and the computational procedure realized is similar to the spatially adaptive total variation model. Our two-pixel image model shows theoretically that the new penalty function solves the bias problem inherent in the total variation model. The superior performance of the new penalty is demonstrated through several experiments. Our investigation is limited to "blocky" images which have small total variation.
