Source author record

Wenyang Zhang

Wenyang Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Methodology math.ST Statistics Theory

Catalog footprint

What is connected

7 works

3 topics

4 close collaborators

preprint, 2026, arXiv

Minimax Optimal Robust Sparse Regression with Heavy-Tailed Designs: A Gradient-Based Approach

We investigate high-dimensional sparse regression when both the noise and the design matrix exhibit heavy-tailed behavior. Standard algorithms typically fail in this regime, as heavy-tailed covariates distort the empirical risk geometry. We propose a unified framework, Robust Iterative Gradient descent with Hard Thresholding (RIGHT), which employs a robust gradient estimator to bypass the need for higher-order moment conditions. Our analysis reveals a fundamental decoupling phenomenon: in linear regression, the estimation error rate is governed by the noise tail index, while the sample complexity required for stability is governed by the design tail index. This implies that while heavy-tailed noise limits precision, heavy-tailed designs primarily raise the sample size barrier for convergence. In contrast, for logistic regression, we show that the bounded gradient naturally robustifies the estimator against heavy-tailed designs, restoring standard parametric rates. We derive matching minimax lower bounds to prove that RIGHT achieves optimal estimation accuracy and sample complexity across these regimes, without requiring sample splitting or the existence of the population risk.
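The general idea of pairing a robust gradient estimate with hard thresholding can be sketched as follows. This is a minimal illustration, not the authors' RIGHT algorithm: the abstract does not say which robust gradient estimator is used, so a coordinate-wise trimmed mean of per-sample least-squares gradients is assumed here, and the function names, step size, trimming fraction and iteration count are all hypothetical choices.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    if s > 0:
        keep = np.argsort(np.abs(v))[-s:]
        out[keep] = v[keep]
    return out

def trimmed_mean_gradient(X, y, beta, trim=0.1):
    """Coordinate-wise trimmed mean of per-sample squared-loss gradients.

    Assumption: this estimator stands in for the unspecified robust
    gradient estimator mentioned in the abstract.
    """
    residuals = X @ beta - y              # shape (n,)
    grads = X * residuals[:, None]        # shape (n, d), one gradient per sample
    n = grads.shape[0]
    k = int(np.floor(trim * n))           # samples dropped from each tail
    sorted_grads = np.sort(grads, axis=0)
    return sorted_grads[k:n - k].mean(axis=0)

def robust_iht(X, y, s, step=0.5, n_iter=200, trim=0.1):
    """Gradient descent with a robust gradient, projected onto s-sparse vectors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = trimmed_mean_gradient(X, y, beta, trim)
        beta = hard_threshold(beta - step * g, s)
    return beta
```

A call such as `robust_iht(X, y, s=3)` returns a coefficient vector with at most three nonzero entries. Trimming discards the most extreme per-sample gradients in each coordinate, which is one simple way to limit the influence of heavy-tailed observations on each update.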

preprint, 2022, arXiv

Model Averaging based Semiparametric Modelling for Conditional Quantile Prediction

In real data analysis, the underlying model is usually unknown, so the modelling strategy plays a key role in the success of the analysis. Stimulated by the idea of model averaging, we propose a novel semiparametric modelling strategy for conditional quantile prediction, without assuming that the underlying model is any specific parametric or semiparametric model. Thanks to the optimality of the weights selected by cross-validation, the proposed modelling strategy yields more accurate predictions than those based on some commonly used semiparametric models, such as varying coefficient models and additive models. Asymptotic properties of the proposed modelling strategy and its estimation procedure are established. Intensive simulation studies are conducted to demonstrate how well the proposed method works compared with its alternatives under various circumstances. The results show that the proposed method indeed leads to more accurate predictions than its alternatives. Finally, the proposed modelling strategy and its prediction procedure are applied to the Boston housing data, resulting in more accurate predictions of the quantiles of house prices than those based on some commonly used alternative methods, and therefore presenting a more accurate picture of the housing market in Boston.
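To make the weight-selection idea concrete, the sketch below averages two candidate quantile predictions and picks the weight that minimises the pinball (check) loss on held-out predictions. It is a simplified illustration rather than the paper's procedure: the two-candidate setting, the grid search, and the names `pinball_loss` and `best_two_model_weight` are assumptions, and the held-out predictions are assumed to come from cross-validation folds.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average check loss of quantile predictions q_pred at level tau."""
    u = y - q_pred
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))

def best_two_model_weight(y, pred_a, pred_b, tau, grid_size=101):
    """Grid-search the weight w in [0, 1] on candidate A.

    The averaged prediction is w * pred_a + (1 - w) * pred_b, where
    pred_a and pred_b are assumed to be out-of-fold predictions.
    """
    weights = np.linspace(0.0, 1.0, grid_size)
    losses = [pinball_loss(y, w * pred_a + (1.0 - w) * pred_b, tau) for w in weights]
    return float(weights[int(np.argmin(losses))])
```

With more than two candidate models the same criterion applies, but the weights would be chosen by a constrained optimiser over the simplex instead of a one-dimensional grid.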

preprint, 2020, arXiv

Estimation and Inference for Multi-Kink Quantile Regression

The Multi-Kink Quantile Regression (MKQR) model is an important tool for analyzing data with heterogeneous conditional distributions, especially when quantiles of the response variable are of interest, due to its robustness to outliers and heavy-tailed errors in the response. It assumes different linear quantile regression forms in different regions of the domain of the threshold covariate, while remaining continuous at the kink points. In this paper, we investigate parameter estimation, kink point detection and statistical inference in MKQR models. We propose an iterative segmented quantile regression algorithm for estimating both the regression coefficients and the locations of kink points. The proposed algorithm is much more computationally efficient than the grid search algorithm and is not sensitive to the selection of initial values. Asymptotic properties, such as selection consistency of the number of kink points and asymptotic normality of the estimators of both regression coefficients and kink effects, are established to justify the proposed method theoretically. A score test, based on partial subgradients, is developed to verify whether the kink effects exist. Test-inversion confidence intervals for kink location parameters are also constructed. Intensive simulation studies show that the proposed methods work very well when the sample size is finite. Finally, we apply the MKQR models together with the proposed methods to a dataset on the secondary industrial structure of China and a dataset on triceps skinfold thickness of Gambian females, which leads to some very interesting findings. A new R package, MultiKink, is developed to implement the proposed methods.
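One common way to write a piecewise-linear quantile model that is continuous at its kink points is shown below. The notation is an assumption made for illustration (threshold covariate $X$, additional covariates $Z$, $K$ kink locations $\delta_1 < \dots < \delta_K$, and $(u)_+ = \max(u, 0)$); the abstract does not give the exact parameterisation used in the paper.

```latex
\[
  Q_Y(\tau \mid X, Z)
    = \alpha_0 + \alpha_1 X
    + \sum_{k=1}^{K} \beta_k \,(X - \delta_k)_+
    + \gamma^{\top} Z .
\]
```

In this form the slope in $X$ is $\alpha_1$ to the left of $\delta_1$ and changes by $\beta_k$ when $X$ crosses $\delta_k$, so each $\beta_k$ plays the role of a kink effect. Continuity holds automatically because every term $(X - \delta_k)_+$ equals zero at $X = \delta_k$.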

preprint, 2016, arXiv

Factor Models for Asset Returns Based on Transformed Factors

The Fama-French three-factor models are commonly used to describe asset returns in finance. Statistically speaking, they imply that the return of an asset can be accounted for directly by the three Fama-French factors, i.e. the market, size and value factors, through a linear function. A natural question is: would some transformed version of the three factors work better than the factors themselves? If so, what kind of transformation should be imposed on each factor so that the transformed factors better account for asset returns? In this paper, we address these questions through nonparametric modelling. We propose a data-driven approach to construct the transformation for each factor concerned. A generalised maximum likelihood ratio based hypothesis test is also proposed to test whether transformations of the three factors are needed for a given data set. Asymptotic properties are established to justify the proposed methods. Intensive simulation studies are conducted to show how the proposed methods work when the sample size is finite. Finally, we apply the proposed methods to a real data set, which leads to some interesting findings.
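The contrast between the linear factor model and a transformed-factor model can be written as below. This is a hedged sketch: the symbols $r_t$ (asset excess return), $\mathrm{MKT}_t$, $\mathrm{SMB}_t$, $\mathrm{HML}_t$ (market, size and value factors) and the additive form with unknown functions $g_1, g_2, g_3$ are assumptions for illustration, since the abstract does not state the exact model.

```latex
\[
  \text{Linear:}\quad
  r_t = \alpha + \beta_1\,\mathrm{MKT}_t + \beta_2\,\mathrm{SMB}_t
        + \beta_3\,\mathrm{HML}_t + \varepsilon_t ,
\]
\[
  \text{Transformed:}\quad
  r_t = \alpha + g_1(\mathrm{MKT}_t) + g_2(\mathrm{SMB}_t)
        + g_3(\mathrm{HML}_t) + \varepsilon_t .
\]
```

Under this reading, testing whether transformations are needed amounts to testing whether each $g_j$ is linear, in which case the second model reduces to the first.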

preprint, 2015, arXiv

A Dynamic Structure for High Dimensional Covariance Matrices and its Application in Portfolio Allocation

Estimation of high dimensional covariance matrices is an interesting and important research topic. In this paper, we propose a dynamic structure and develop an estimation procedure for high dimensional covariance matrices. Asymptotic properties are derived to justify the estimation procedure and simulation studies are conducted to demonstrate its performance when the sample size is finite. By exploring a financial application, an empirical study shows that portfolio allocation based on dynamic high dimensional covariance matrices can significantly outperform the market from 1995 to 2014. Our proposed method also outperforms portfolio allocation based on the sample covariance matrix and the portfolio allocation proposed in Fan, Fan and Lv (2008).

preprint, 2015, arXiv

Model selection and structure specification in ultra-high dimensional generalised semi-varying coefficient models

In this paper, we study model selection and structure specification for generalised semi-varying coefficient models (GSVCMs), where the number of potential covariates is allowed to be larger than the sample size. We first propose a penalised likelihood method with the LASSO penalty function to obtain preliminary estimates of the functional coefficients. Then, using the quadratic approximation for the local log-likelihood function and the adaptive group LASSO penalty (or the local linear approximation of the group SCAD penalty) with the help of the preliminary estimates of the functional coefficients, we introduce a novel penalised weighted least squares procedure to select the significant covariates and identify the constant coefficients among the coefficients of the selected covariates, thereby specifying the semiparametric modelling structure. The developed model selection and structure specification approach not only inherits many nice statistical properties from local maximum likelihood estimation and the nonconcave penalised likelihood method, but is also computationally attractive thanks to the computational algorithm proposed to implement it. Under some mild conditions, we establish the asymptotic properties of the proposed model selection and estimation procedure, such as sparsity and the oracle property. We also conduct simulation studies to examine the finite sample performance of the proposed method, and finally apply the method to analyse a real data set, which leads to some interesting findings.

preprint, 2014, arXiv

A semiparametric spatial dynamic model

Stimulated by the Boston house price data, in this paper we propose a semiparametric spatial dynamic model, which extends the ordinary spatial autoregressive models to accommodate the effects of some covariates associated with the house price. A profile likelihood based estimation procedure is proposed. The asymptotic normality of the proposed estimators is derived. We also investigate how to identify the parametric and nonparametric components in the proposed semiparametric model. We show how many unknown parameters an unknown bivariate function amounts to, and propose a nonparametric version of AIC/BIC for model selection. Simulation studies are conducted to examine the performance of the proposed methods, and the results show that our methods work very well. We finally apply the proposed methods to analyze the Boston house price data, which leads to some interesting findings.
