Researcher profile

Han Lin Shang

Han Lin Shang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
20 works
0 followers
9 topics
4 close collaborators

Published work

20 published items

preprint, 2026, arXiv

On the Distributed Estimation for Scalar-on-Function Regression Models

This paper proposes distributed estimation procedures for three scalar-on-function regression models: the functional linear model (FLM), the functional non-parametric model (FNPM), and the functional partial linear model (FPLM). The framework addresses two key challenges in functional data analysis, namely the high computational cost of large samples and limitations on sharing raw data across institutions. Monte Carlo simulations show that the distributed estimators substantially reduce computation time while preserving high estimation and prediction accuracy for all three models. When block sizes become too small, the FPLM exhibits overfitting, leading to narrower prediction intervals and reduced empirical coverage probability. An empirical study using the tecator dataset further supports these findings.
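
The general idea of distributed estimation can be illustrated with one-shot averaging: fit the model separately on each data block, then combine the block estimates. The sketch below is only an assumption-laden illustration, not the paper's procedure. It replaces the functional predictor with a single scalar score (for instance a first principal component score), and the function names `ols_fit` and `distributed_fit`, the simulated data, and the block count are all hypothetical.

```python
import random


def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def distributed_fit(xs, ys, n_blocks):
    """Fit each block on its own, then average estimates weighted by block size."""
    n = len(xs)
    size = n // n_blocks
    intercept = slope = 0.0
    for k in range(n_blocks):
        start = k * size
        # The last block absorbs any remainder so no observation is dropped.
        stop = n if k == n_blocks - 1 else start + size
        a, b = ols_fit(xs[start:stop], ys[start:stop])
        weight = (stop - start) / n
        intercept += weight * a
        slope += weight * b
    return intercept, slope


# Simulated data (hypothetical): y = 1 + 2 * score + noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(10_000)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

full = ols_fit(xs, ys)                       # all data in one place
dist = distributed_fit(xs, ys, n_blocks=10)  # only block estimates are shared
print("full sample:", full)
print("distributed:", dist)
```

Only the per-block estimates leave each block, which is what makes this kind of scheme attractive when raw data cannot be pooled. With very small blocks each local fit becomes noisy, which is consistent with the block-size caveat in the abstract.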

preprint, 2026, arXiv

Testing for integer integration in functional time series

We develop a statistical testing procedure to examine whether the curve-valued time series of interest is integrated of order d for an integer d. The proposed procedure can distinguish between integer-integrated time series and fractionally-integrated ones, and it has broad applicability in practice. Monte Carlo simulation experiments show that the proposed testing procedure performs reasonably well. We apply our methodology to Canadian yield curve data and French sub-national age-specific mortality data. We find evidence that these time series are mostly integrated of order one, while some have fractional orders exceeding or falling below one.

preprint, 2022, arXiv

A Robust Functional Partial Least Squares for Scalar-on-Multiple-Function Regression

The scalar-on-function regression model has become a popular analysis tool to explore the relationship between a scalar response and multiple functional predictors. Most of the existing approaches to estimate this model are based on the least-squares estimator, which can be seriously affected by outliers in empirical datasets. When outliers are present in the data, it is known that the least-squares-based estimates may not be reliable. This paper proposes a robust functional partial least squares method, allowing a robust estimate of the regression coefficients in a scalar-on-multiple-function regression model. In our method, the functional partial least squares components are computed via the partial robust M-regression. The predictive performance of the proposed method is evaluated using several Monte Carlo experiments and two chemometric datasets: glucose concentration spectrometric data and sugar process data. The results produced by the proposed method compare favorably with some of the classical functional or multivariate partial least squares and functional principal component analysis methods.

preprint, 2022, arXiv

A robust scalar-on-function logistic regression for classification

Scalar-on-function logistic regression, where the response is a binary outcome and the predictor consists of random curves, has become a general framework to explore a linear relationship between the binary outcome and functional predictor. Most of the methods used to estimate this model are based on the least-squares type estimators. However, the least-squares estimator is seriously hindered by outliers, leading to biased parameter estimates and an increased probability of misclassification. This paper proposes a robust partial least squares method to estimate the regression coefficient function in the scalar-on-function logistic regression. The regression coefficient function represented by functional partial least squares decomposition is estimated by a weighted likelihood method, which downweights the effect of outliers in the response and predictor. The estimation and classification performance of the proposed method is evaluated via a series of Monte Carlo experiments and a strawberry puree data set. The results obtained from the proposed method compare favorably with existing methods.

preprint, 2022, arXiv

Clustering and Forecasting Multiple Functional Time Series

Modelling and forecasting homogeneous age-specific mortality rates of multiple countries could lead to improvements in long-term forecasting. Data fed into joint models are often grouped according to nominal attributes, such as geographic regions, ethnic groups, and socioeconomic status, which may still contain heterogeneity and deteriorate the forecast results. Our paper proposes a novel clustering technique to pursue homogeneity among multiple functional time series based on functional panel data modelling to address this issue. Using a functional panel data model with fixed effects, we can extract common functional time series features. These common features could be decomposed into two components: the functional time trend and the mode of variations of functions (functional pattern). The functional time trend reflects the dynamics across time, while the functional pattern captures the fluctuations within curves. The proposed clustering method searches for homogeneous age-specific mortality rates of multiple countries by accounting for both the modes of variations and the temporal dynamics among curves. We demonstrate that the proposed clustering technique outperforms other existing methods through a Monte Carlo simulation and could handle complicated cases with slowly decaying eigenvalues. In empirical data analysis, we find that the clustering results of age-specific mortality rates can be explained by the combination of geographic region, ethnic groups, and socioeconomic status. We further show that our model produces more accurate forecasts than several benchmark methods in forecasting age-specific mortality rates.

preprint, 2022, arXiv

Factor-augmented model for functional data

We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, the smoothing model is inadequate to recover the smooth curve or capture the data variation in some situations. These include cases where there is a large amount of measurement error, the smoothing basis functions are incorrectly identified, or the step jumps in the functional mean levels are neglected. A factor-augmented smoothing model is proposed to address these challenges, and an iterative numerical estimation approach is implemented in practice. Including the factor model component in the proposed method solves the aforementioned problems since a few common factors often drive the variation that cannot be captured by the smoothing model. Asymptotic theorems are also established to demonstrate the effects of including factor structures on the smoothing results. Specifically, we show that the smoothing coefficients projected on the complement space of the factor loading matrix are asymptotically normal. As a byproduct of independent interest, an estimator for the population covariance matrix of the raw data is presented based on the proposed model. Extensive simulation studies illustrate that these factor adjustments are essential in improving estimation accuracy and avoiding the curse of dimensionality. The superiority of our model is also shown in modeling Australian temperature data.

preprint, 2022, arXiv

Forecasting: theory and practice

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

preprint, 2021, arXiv

Double bootstrapping for visualising the distribution of descriptive statistics of functional data

We propose a double bootstrap procedure for reducing coverage error in the confidence intervals of descriptive statistics for independent and identically distributed functional data. Through a series of Monte Carlo simulations, we compare the finite sample performance of single and double bootstrap procedures for estimating the distribution of descriptive statistics for independent and identically distributed functional data. At the cost of longer computational time, the double bootstrap with the same bootstrap method reduces confidence level error and provides better coverage accuracy than the single bootstrap. Illustrated by a Canadian weather station data set, the double bootstrap procedure presents a tool for visualising the distribution of the descriptive statistics for the functional data.

preprint, 2021, arXiv

Factor-augmented Smoothing Model for Functional Data

We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, the smoothing model is not adequate to recover the smooth curve or capture the data variation in some situations. These include cases where there is a large amount of measurement error, the smoothing basis functions are incorrectly identified, or the step jumps in the functional mean levels are neglected. To address these challenges, a factor-augmented smoothing model is proposed, and an iterative numerical estimation approach is implemented in practice. Including the factor model component in the proposed method solves the aforementioned problems since a few common factors often drive the variation that cannot be captured by the smoothing model. Asymptotic theorems are also established to demonstrate the effects of including factor structures on the smoothing results. Specifically, we show that the smoothing coefficients projected on the complement space of the factor loading matrix are asymptotically normal. As a byproduct of independent interest, an estimator for the population covariance matrix of the raw data is presented based on the proposed model. Extensive simulation studies illustrate that these factor adjustments are essential in improving estimation accuracy and avoiding the curse of dimensionality. The superiority of our model is also shown in modeling Canadian weather data and Australian temperature data.

preprint, 2020, arXiv

A comparison of Hurst exponent estimators in long-range dependent curve time series

The Hurst exponent is the simplest numerical summary of self-similar long-range dependent stochastic processes. We consider the estimation of the Hurst exponent in long-range dependent curve time series. Our estimation method begins by constructing an estimate of the long-run covariance function, which we use, via dynamic functional principal component analysis, in estimating the orthonormal functions spanning the dominant sub-space of functional time series. Within the context of functional autoregressive fractionally integrated moving average models, we compare finite-sample bias, variance and mean square error among some time- and frequency-domain Hurst exponent estimators and make our recommendations.

preprint, 2020, arXiv

A comparison of parameter estimation in function-on-function regression

Recent technological developments have enabled us to collect complex and high-dimensional data in many scientific fields, such as population health, meteorology, econometrics, geology, and psychology. It is common to encounter such datasets collected repeatedly over a continuum. Functional data, whose sample elements are functions in the graphical forms of curves, images, and shapes, characterize these data types. Functional data analysis techniques reduce the complex structure of these data and focus on the dependences within and (possibly) between the curves. A common research question is to investigate the relationships in regression models that involve at least one functional variable. However, the performance of functional regression models depends on several factors, such as the smoothing technique, the number of basis functions, and the estimation method. This paper provides a selective comparison for function-on-function regression models where both the response and predictor(s) are functions, to determine the optimal choice of basis function from a set of model evaluation criteria. We also propose a bootstrap method to construct a confidence interval for the response function. The numerical comparisons are implemented through Monte Carlo simulations and two real data examples.

preprint, 2020, arXiv

Forecasting multiple functional time series in a group structure: an application to mortality

When modeling sub-national mortality rates, we should consider three features: (1) how to incorporate any possible correlation among sub-populations to potentially improve forecast accuracy through multi-population joint modeling; (2) how to reconcile sub-national mortality forecasts so that they aggregate adequately across various levels of a group structure; (3) among the forecast reconciliation methods, how to combine their forecasts to achieve improved forecast accuracy. To address these issues, we introduce an extension of the grouped univariate functional time series method. We first consider a multivariate functional time series method to jointly forecast multiple related series. We then evaluate the impact and benefit of using forecast combinations among the forecast reconciliation methods. Using the Japanese regional age-specific mortality rates, we investigate one-step-ahead to 15-step-ahead point and interval forecast accuracies of our proposed extension and make recommendations.

preprint, 2020, arXiv

Functional linear models for interval-valued data

Aggregation of large databases in a specific format is a frequently used process to make the data easily manageable. Interval-valued data is one of the data types that is generated by such an aggregation process. Using traditional methods to analyze interval-valued data results in loss of information, and thus, several interval-valued data models have been proposed to gather reliable information from such data types. On the other hand, recent technological developments have led to high dimensional and complex data in many application areas, which may not be analyzed by traditional techniques. Functional data analysis is one of the most commonly used techniques to analyze such complex datasets. While the functional extensions of many traditional statistical techniques are available, the functional form of the interval-valued data has not been studied well. This paper introduces the functional forms of some well-known regression models that take interval-valued data. The proposed methods are based on the function-on-function regression model, where both the response and predictor(s) are functional. Through several Monte Carlo simulations and empirical data analysis, the finite sample performance of the proposed methods is evaluated and compared with the state-of-the-art.

preprint, 2020, arXiv

Retiree mortality forecasting: A partial age-range or a full age-range model?

An essential input of annuity pricing is the future retiree mortality. From observed age-specific mortality data, modeling and forecasting can take place via two routes. On the one hand, we can first truncate the available data to retiree ages and then produce mortality forecasts based on a partial age-range model. On the other hand, with all available data, we can first apply a full age-range model to produce forecasts and then truncate the mortality forecasts to retiree ages. We investigate the difference in modeling the logarithmic transformation of the central mortality rates between a partial age-range and a full age-range model, using data from mainly developed countries in the Human Mortality Database (2020). By evaluating and comparing the short-term point and interval forecast accuracies, we recommend the first strategy: truncating all available data to retiree ages and then producing mortality forecasts. However, when considering the long-term forecasts, it is unclear which strategy is better since it is more difficult to find a model and parameters that are optimal. This is a disadvantage of using methods based on time series extrapolation for long-term forecasting. Instead, an expectation approach, in which experts set a future target, could be considered, noting that this method has also had limited success in the past.

preprint, 2020, arXiv

Synergy in fertility forecasting: Improving forecast accuracy through model averaging

Accuracy in fertility forecasting has proved challenging and warrants renewed attention. One way to improve accuracy is to combine the strengths of a set of existing models through model averaging. The model-averaged forecast is derived using empirical model weights that optimise forecast accuracy at each forecast horizon based on historical data. We apply model averaging to fertility forecasting for the first time, using data for 17 countries and six models. Four model-averaging methods are compared: frequentist, Bayesian, model confidence set, and equal weights. We compute individual-model and model-averaged point and interval forecasts at horizons of one to 20 years. We demonstrate gains in average accuracy of 4-23% for point forecasts and 3-24% for interval forecasts, with greater gains from the frequentist and equal-weights approaches at longer horizons. Data for England & Wales are used to illustrate model averaging in forecasting age-specific fertility to 2036. The advantages and further potential of model averaging for fertility forecasting are discussed. As the accuracy of model-averaged forecasts depends on the accuracy of the individual models, there is an ongoing need to develop better models of fertility for use in forecasting and model averaging. We conclude that model averaging holds considerable promise for the improvement of fertility forecasting in a systematic way using existing models and warrants further investigation.
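
As a rough illustration of how a model-averaged forecast is formed, the sketch below combines three forecasts using equal weights and using weights inversely proportional to past error. The inverse-error rule is a simple stand-in chosen for illustration, not the frequentist, Bayesian, or model confidence set weighting compared in the paper, and the forecast and error values are made up.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalised to sum to one."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def average_forecast(forecasts, weights):
    """Weighted combination of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical inputs: three models' forecasts of a fertility rate at one
# horizon, and each model's historical mean absolute error at that horizon.
forecasts = [1.60, 1.70, 1.90]
past_mae = [0.05, 0.10, 0.20]

weights = inverse_error_weights(past_mae)
equal_weights = [1.0 / len(forecasts)] * len(forecasts)

combined = average_forecast(forecasts, weights)
combined_equal = average_forecast(forecasts, equal_weights)
print("inverse-error weights:", weights)
print("inverse-error average:", combined)
print("equal-weights average:", combined_equal)
```

In a horizon-specific scheme such as the one the abstract describes, the weights would be recomputed for each forecast horizon from that horizon's historical errors.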

preprint, 2019, arXiv

Dynamic principal component regression for forecasting functional time series in a group structure

When generating social policies and pricing annuities at national and subnational levels, it is essential both to forecast mortality accurately and to ensure that forecasts at the subnational level add up to the forecasts at the national level. This has motivated recent developments in forecasting functional time series in a group structure, where static principal component analysis is used. In the presence of moderate to strong temporal dependence, static principal component analysis designed for independent and identically distributed functional data may be inadequate. Thus, using dynamic functional principal component analysis, we consider a functional time series forecasting method with static and dynamic principal component regression to forecast each series in a group structure. Using the regional age-specific mortality rates in Japan obtained from the Japanese Mortality Database (2019), we investigate the point and interval forecast accuracies of our proposed extension, and subsequently make recommendations.

preprint, 2019, arXiv

Forecasting age distribution of death counts: An application to annuity pricing

We consider a compositional data analysis approach to forecasting the age distribution of death counts. Using the age-specific period life-table death counts in Australia obtained from the Human Mortality Database, the compositional data analysis approach produces more accurate one- to 20-step-ahead point and interval forecasts than the Lee-Carter method, the Hyndman-Ullah method, and two naïve random walk methods. The improved forecast accuracy of period life-table death counts is of great interest to demographers for estimating survival probabilities and life expectancy, and to actuaries for determining temporary annuity prices for various ages and maturities. Although we focus on temporary annuity prices, we consider long-term contracts which make the annuity almost lifetime, in particular when the age at entry is sufficiently high.
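
Compositional data such as an age distribution of deaths are non-negative and sum to a constant, so they are usually transformed before standard forecasting tools are applied. A minimal sketch of one standard transformation, the centred log-ratio, and its inverse is given below. Whether the paper uses exactly this transformation is an assumption here, and the four-part composition is made up.

```python
import math


def clr(composition):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def clr_inverse(values):
    """Map unconstrained values back to positive parts that sum to one."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical shares of deaths in four age groups (must be strictly positive).
deaths = [0.02, 0.08, 0.30, 0.60]

z = clr(deaths)        # unconstrained coordinates, suitable for modelling
back = clr_inverse(z)  # recovers the original shares
print("clr coordinates:", z)
print("recovered shares:", back)
```

Forecasts made on the transformed scale can be passed through `clr_inverse`, which guarantees that the forecast shares are positive and sum to one.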

preprint, 2019, arXiv

Forecasting functional time series using weighted likelihood methodology

Functional time series, whose sample elements are recorded sequentially over time, are encountered increasingly often as technology advances. Recent studies have shown that the analysis and forecasting of functional time series can be performed easily using functional principal component analysis and existing univariate/multivariate time series models. However, the forecasting performance of such functional time series models may be affected by the presence of outlying observations, which are very common in many scientific fields. Outliers may distort the functional time series model structure, and thus, the underlying model may produce high forecast errors. We introduce a robust forecasting technique based on weighted likelihood methodology to obtain point and interval forecasts in functional time series in the presence of outliers. The finite sample performance of the proposed method is illustrated by Monte Carlo simulations and four real-data examples. Numerical results reveal that the proposed method exhibits superior performance compared with the existing method(s).

preprint, 2019, arXiv

Implied volatility surface predictability: the case of commodity markets

Recent literature seeks to forecast implied volatility derived from equity, index, foreign exchange, and interest rate options using latent factor and parametric frameworks. Motivated by increased public attention borne out of the financialization of futures markets in the early 2000s, we investigate if these extant models can uncover predictable patterns in the implied volatility surfaces of the most actively traded commodity options between 2006 and 2016. Adopting a rolling out-of-sample forecasting framework that addresses the common multiple comparisons problem, we establish that, for energy and precious metals options, explicitly modeling the term structure of implied volatility using the Nelson-Siegel factors produces the most accurate forecasts.
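
For readers unfamiliar with the Nelson-Siegel factors, the sketch below computes the three standard loadings (level, slope, curvature) as a function of maturity and evaluates the resulting term-structure curve. The decay parameter `lam` and the factor values in `betas` are hypothetical, and the paper's exact specification for implied volatility may differ.

```python
import math


def nelson_siegel_loadings(tau, lam):
    """Level, slope and curvature loadings at maturity tau (tau > 0)."""
    x = lam * tau
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    slope = -math.expm1(-x) / x
    curvature = slope - math.exp(-x)
    return 1.0, slope, curvature


def nelson_siegel_curve(tau, betas, lam):
    """Term-structure value: each factor times its loading, summed."""
    return sum(b * l for b, l in zip(betas, nelson_siegel_loadings(tau, lam)))


# Hypothetical factor values (level, slope, curvature) and decay rate.
betas = (0.30, -0.10, 0.05)
lam = 0.5

for tau in (0.25, 1.0, 2.0, 5.0):
    print(tau, round(nelson_siegel_curve(tau, betas, lam), 4))
```

The slope loading starts near one and decays with maturity, while the curvature loading starts near zero, peaks at intermediate maturities, and then decays, so the three factors govern the long end, the short end, and the middle of the term structure respectively.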

preprint, 2019, arXiv

On function-on-function regression: Partial least squares approach

Functional data analysis tools, such as function-on-function regression models, have received considerable attention in various scientific fields because of their observed high-dimensional and complex data structures. Several statistical procedures, including least squares, maximum likelihood, and maximum penalized likelihood, have been proposed to estimate such function-on-function regression models. However, these estimation techniques produce unstable estimates in the case of degenerate functional data or are computationally intensive. To overcome these issues, we propose a partial least squares approach to estimate the model parameters in the function-on-function regression model. In the proposed method, the B-spline basis functions are utilized to convert discretely observed data into their functional forms. Generalized cross-validation is used to control the degrees of roughness. The finite-sample performance of the proposed method is evaluated using several Monte Carlo simulations and an empirical data analysis. The results reveal that the proposed method competes favorably with existing estimation techniques and some other available function-on-function regression models, with significantly shorter computational time.