Source author record

Yanrong Yang

Yanrong Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Methodology, math.ST, Statistics Theory, Applications, Computation, econ.EM

Catalog footprint

8 works, 6 topics, 4 close collaborators


Preprint, 2022, arXiv

Clustering and Forecasting Multiple Functional Time Series

Modelling and forecasting homogeneous age-specific mortality rates of multiple countries could lead to improvements in long-term forecasting. Data fed into joint models are often grouped according to nominal attributes, such as geographic regions, ethnic groups, and socioeconomic status, which may still contain heterogeneity and deteriorate the forecast results. To address this issue, our paper proposes a novel clustering technique, based on functional panel data modelling, that pursues homogeneity among multiple functional time series. Using a functional panel data model with fixed effects, we can extract common functional time series features. These common features could be decomposed into two components: the functional time trend and the mode of variations of functions (functional pattern). The functional time trend reflects the dynamics across time, while the functional pattern captures the fluctuations within curves. The proposed clustering method searches for homogeneous age-specific mortality rates of multiple countries by accounting for both the modes of variations and the temporal dynamics among curves. Through a Monte Carlo simulation, we demonstrate that the proposed clustering technique outperforms other existing methods and can handle complicated cases with slowly decaying eigenvalues. In the empirical data analysis, we find that the clustering results of age-specific mortality rates can be explained by a combination of geographic region, ethnic group, and socioeconomic status. We further show that our model produces more accurate forecasts than several benchmark methods in forecasting age-specific mortality rates.

Preprint, 2022, arXiv

Factor-augmented model for functional data

We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, in some situations the smoothing model is inadequate for recovering the smooth curve or capturing the data variation. These include cases where there is a large amount of measurement error, where the smoothing basis functions are incorrectly identified, or where step jumps in the functional mean levels are neglected. A factor-augmented smoothing model is proposed to address these challenges, and an iterative numerical estimation approach is implemented in practice. Including the factor model component in the proposed method addresses these problems, because a few common factors often drive the variation that the smoothing model cannot capture. Asymptotic theorems are also established to demonstrate the effects of including factor structures on the smoothing results. Specifically, we show that the smoothing coefficients projected on the complement space of the factor loading matrix are asymptotically normal. As a byproduct of independent interest, an estimator for the population covariance matrix of the raw data is presented based on the proposed model. Extensive simulation studies illustrate that these factor adjustments are essential in improving estimation accuracy and avoiding the curse of dimensionality. The superiority of our model is also shown in modeling Australian temperature data.

Preprint, 2022, arXiv

Robust PCA for High Dimensional Data based on Characteristic Transformation

In this paper, we propose a novel robust Principal Component Analysis (PCA) for high-dimensional data in the presence of various heterogeneities, especially heavy tails and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of classical PCA. Besides handling typical outliers, the proposed method has the unique advantage of dealing with heavy-tailed data whose covariance may not exist (it may be infinite, for instance). The proposed approach is also a case of the kernel principal component analysis (KPCA) method and inherits robustness and non-linearity from a bounded, non-linear kernel function. The merits of the new method are supported by statistical properties, including an upper bound on the excess error and the behavior of the large eigenvalues under a spiked covariance model. In addition, we show the advantages of our method over classical PCA through a variety of simulations. Finally, we apply the new robust PCA to classify mice with different genotypes in a biological study based on their protein expression data, and find that our method identifies abnormal mice more accurately than classical PCA.
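The sketch below illustrates why a bounded kernel limits the influence of extreme points. It is a minimal plain-Python example, not the authors' transformation. It assumes a Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). That kernel is bounded in (0, 1] and equals the characteristic function of a N(0, sigma^-2 I) vector evaluated at x - y. Whether this matches the paper's characteristic transformation is an assumption, and the bandwidth `sigma`, the helper names and the toy data are all hypothetical. The code compares the top eigenvalue of the classical sample covariance with the top eigenvalue of the centred kernel matrix (divided by n), before and after one gross outlier is added.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    # Assumed kernel: bounded in (0, 1], equal to the characteristic
    # function of N(0, sigma**-2 I) evaluated at x - y
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def covariance(data):
    # Classical p x p sample covariance; unbounded in the data
    n, p = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(p)]
    return [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in data) / (n - 1)
             for b in range(p)] for a in range(p)]


def centred_kernel(data, sigma=1.0):
    # H K H with H = I - 11'/n, the matrix kernel PCA diagonalises
    n = len(data)
    k = [[gaussian_kernel(x, y, sigma) for y in data] for x in data]
    row = [sum(r) / n for r in k]
    total = sum(row) / n
    return [[k[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]


def top_eigenvalue(m, iters=300):
    # Power iteration for the largest eigenvalue of a symmetric PSD matrix
    n = len(m)
    # Generic start: the all-ones vector lies in the null space of H K H
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # ||M v|| for a unit vector v
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return lam


def compare(data, sigma=1.0):
    # Returns (top covariance eigenvalue, top centred-kernel eigenvalue / n)
    return (top_eigenvalue(covariance(data)),
            top_eigenvalue(centred_kernel(data, sigma)) / len(data))


rng = random.Random(0)
clean = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
dirty = clean + [[1000.0, 0.0, 0.0]]  # hypothetical gross outlier
for name, data in [("clean", clean), ("with outlier", dirty)]:
    cov_top, ker_top = compare(data)
    print(f"{name}: covariance {cov_top:.1f}, kernel {ker_top:.3f}")
```

With these toy data, the outlier inflates the classical top eigenvalue by several orders of magnitude. The normalised kernel eigenvalue cannot exceed 1, because every kernel diagonal entry is 1 and the trace of the centred kernel matrix is therefore at most n.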

Preprint, 2021, arXiv

Decomposition of Bilateral Trade Flows Using a Three-Dimensional Panel Data Model

This study decomposes bilateral trade flows using a three-dimensional panel data model. In the setting where all three dimensions diverge to infinity, we propose an estimation approach that sequentially identifies the number of global shocks and of country-specific shocks, and we establish the corresponding asymptotic theory. From a practical point of view, separating pervasive from non-pervasive shocks in multi-dimensional panel data is crucial for a range of applications, such as international financial linkages and migration flows. In the numerical studies, we first conduct extensive simulations to examine the theoretical findings, and then use the proposed approach to investigate international trade flows of two major trading groups (APEC and EU) over 1982-2019 and to quantify the network of bilateral trade.

Preprint, 2021, arXiv

Factor-augmented Smoothing Model for Functional Data

We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, in some situations the smoothing model is inadequate for recovering the smooth curve or capturing the data variation. These include cases where there is a large amount of measurement error, where the smoothing basis functions are incorrectly identified, or where step jumps in the functional mean levels are neglected. To address these challenges, a factor-augmented smoothing model is proposed, and an iterative numerical estimation approach is implemented in practice. Including the factor model component in the proposed method addresses these problems, because a few common factors often drive the variation that the smoothing model cannot capture. Asymptotic theorems are also established to demonstrate the effects of including factor structures on the smoothing results. Specifically, we show that the smoothing coefficients projected on the complement space of the factor loading matrix are asymptotically normal. As a byproduct of independent interest, an estimator for the population covariance matrix of the raw data is presented based on the proposed model. Extensive simulation studies illustrate that these factor adjustments are essential in improving estimation accuracy and avoiding the curse of dimensionality. The superiority of our model is also shown in modeling Canadian weather data and Australian temperature data.

Preprint, 2021, arXiv

Mortality Forecasting using Factor Models: Time-varying or Time-invariant Factor Loadings?

Many existing mortality models, such as the Lee-Carter model and its variants, follow the framework of classical factor models. Latent common factors in factor models are defined as time-related mortality indices (such as $\kappa_t$ in the Lee-Carter model). Factor loadings, which capture the linear relationship between age variables and latent common factors (such as $\beta_x$ in the Lee-Carter model), are assumed to be time-invariant in the classical framework. This assumption is usually too restrictive in reality, as mortality datasets typically span a long period of time. Driving forces such as medical improvements for certain diseases, environmental changes and technological progress may significantly influence the relationships between different variables. In this paper, we first develop a factor model with time-varying factor loadings (time-varying factor model) as an extension of the classical factor model for mortality modelling. Two forecasting methods to extrapolate the factor loadings, the local regression method and the naive method, are proposed for the time-varying factor model. From the empirical data analysis, we find that the new model can capture the empirical feature of time-varying factor loadings and improve mortality forecasting over different horizons and countries. Further, we propose a novel approach based on change point analysis to estimate the optimal 'boundary' between short-term and long-term forecasting, which are favoured by the local linear regression method and the naive method, respectively. Additionally, simulation studies are provided to show the performance of the time-varying factor model under various scenarios.
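For reference, the sketch below fits the classical time-invariant baseline that this abstract starts from: log m_{x,t} = alpha_x + beta_x kappa_t + error. A common first-stage estimate takes alpha_x as the time average of the log rates for each age. It then takes beta_x and kappa_t from the leading singular pair of the centred matrix, normalised so that sum_x beta_x = 1 and sum_t kappa_t = 0. This is a minimal plain-Python sketch that uses power iteration. The function name and toy data are hypothetical, and it does not implement the paper's time-varying loadings or their extrapolation methods.

```python
import math


def lee_carter(log_m, iters=200):
    """First-stage classical Lee-Carter fit (hypothetical helper).

    log_m: list of age rows, each a list of log death rates over years.
    Returns (alpha, beta, kappa) with sum(beta) == 1 and sum(kappa) == 0.
    """
    ages, years = len(log_m), len(log_m[0])
    # alpha_x: average log rate over time for each age
    alpha = [sum(row) / years for row in log_m]
    # Centre each age row; z ~ beta kappa^T under the model
    z = [[log_m[x][t] - alpha[x] for t in range(years)] for x in range(ages)]
    # Power iteration on z z^T for the leading left singular vector u
    u = [float(x + 1) for x in range(ages)]
    for _ in range(iters):
        zt_u = [sum(z[x][t] * u[x] for x in range(ages)) for t in range(years)]
        new_u = [sum(z[x][t] * zt_u[t] for t in range(years)) for x in range(ages)]
        norm = math.sqrt(sum(v * v for v in new_u))
        u = [v / norm for v in new_u]
    # kappa_t = (z^T u)_t, so that z ~ u kappa^T; centring makes sum(kappa) = 0
    kappa = [sum(z[x][t] * u[x] for x in range(ages)) for t in range(years)]
    # Rescale so sum(beta) = 1; kappa absorbs the inverse scale
    s = sum(u)
    beta = [v / s for v in u]
    kappa = [v * s for v in kappa]
    return alpha, beta, kappa


# Hypothetical toy data generated exactly from the model (4 ages, 5 years)
alpha_true = [-6.0, -5.0, -4.0, -3.0]
beta_true = [0.1, 0.2, 0.3, 0.4]
kappa_true = [3.0, 1.5, 0.0, -1.5, -3.0]
log_m = [[a + b * k for k in kappa_true] for a, b in zip(alpha_true, beta_true)]

alpha_fit, beta_fit, kappa_fit = lee_carter(log_m)
print("beta:", [round(b, 6) for b in beta_fit])
print("kappa:", [round(k, 6) for k in kappa_fit])
```

Because beta_x is a single vector over the whole sample, this fit cannot show drift in the loadings. Refitting it on successive sub-periods is one simple way to check informally whether the estimated beta_x changes over time, which is the concern that motivates the time-varying model above.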

Preprint, 2015, arXiv

Independence test for high dimensional data based on regularized canonical correlation coefficients

This paper proposes a new statistic to test independence between two high-dimensional random vectors ${\mathbf{X}}:p_1\times1$ and ${\mathbf{Y}}:p_2\times1$. The proposed statistic is based on the sum of regularized sample canonical correlation coefficients of ${\mathbf{X}}$ and ${\mathbf{Y}}$. The asymptotic distribution of the statistic under the null hypothesis is established as a corollary of general central limit theorems (CLT) for the linear statistics of classical and regularized sample canonical correlation coefficients when $p_1$ and $p_2$ are both comparable to the sample size $n$. As applications of the developed independence test, various types of dependence structures, such as factor models, ARCH models and a general uncorrelated but dependent case, are investigated by simulation. As an empirical application, the proposed test detects cross-sectional dependence of daily stock returns between companies in different sections of the New York Stock Exchange (NYSE).
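The sketch below computes the quantity underlying this statistic in plain Python. Squared sample canonical correlations are the eigenvalues of $S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$, so their sum equals the trace of that matrix and no eigendecomposition is needed. As an illustrative assumption, the code regularises with a ridge-type shift, replacing $S_{xx}$ and $S_{yy}$ by $S_{xx} + tI$ and $S_{yy} + tI$. The paper's exact regularisation, centring and scaling may differ. It computes only the raw statistic, not the null distribution or a p-value, and all function names and data are hypothetical.

```python
import random


def cross_cov(a, b):
    # Sample cross-covariance (1/n scaling, an assumption) of columns of a (n x p) and b (n x q)
    n, p, q = len(a), len(a[0]), len(b[0])
    ma = [sum(r[i] for r in a) / n for i in range(p)]
    mb = [sum(r[j] for r in b) / n for j in range(q)]
    return [[sum((a[s][i] - ma[i]) * (b[s][j] - mb[j]) for s in range(n)) / n
             for j in range(q)] for i in range(p)]


def matmul(m1, m2):
    return [[sum(m1[i][k] * m2[k][j] for k in range(len(m2))) for j in range(len(m2[0]))]
            for i in range(len(m1))]


def inverse(m):
    # Gauss-Jordan elimination with partial pivoting
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge(m, t):
    # Assumed regularisation: add t to the diagonal
    return [[m[i][j] + (t if i == j else 0.0) for j in range(len(m))] for i in range(len(m))]


def regularized_cca_sum(x, y, t=0.0):
    # Sum of eigenvalues = trace of (Sxx + tI)^{-1} Sxy (Syy + tI)^{-1} Syx
    sxx, syy, sxy = cross_cov(x, x), cross_cov(y, y), cross_cov(x, y)
    syx = [list(col) for col in zip(*sxy)]
    m = matmul(matmul(inverse(ridge(sxx, t)), sxy), matmul(inverse(ridge(syy, t)), syx))
    return sum(m[i][i] for i in range(len(m)))


rng = random.Random(1)
x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
y_ind = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(50)]
print("Y = X, t = 0:", round(regularized_cca_sum(x, x), 6))
print("Y = X, t = 1:", round(regularized_cca_sum(x, x, t=1.0), 4))
print("independent Y:", round(regularized_cca_sum(x, y_ind), 4))
```

When Y = X and t = 0, every canonical correlation equals 1, so the sum equals $p_1$. A positive t shrinks each term below 1. This shrinkage is what keeps the statistic well defined when $p_1$ or $p_2$ is close to or above n and the sample covariance matrices are singular.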

Preprint, 2014, arXiv

High Dimensional Correlation Matrices: CLT and Its Applications

Statistical inference for sample correlation matrices is important in high-dimensional data analysis. Motivated by this, this paper establishes a new central limit theorem (CLT) for a linear spectral statistic (LSS) of high-dimensional sample correlation matrices for the case where the dimension $p$ and the sample size $n$ are comparable. This result is of independent interest in large-dimensional random matrix theory. We also apply the linear spectral statistic to an independence test for $p$ random variables, and then to an equivalence test for $p$ factor loadings and $n$ factors in a factor model. The finite-sample performance of the proposed test shows its applicability and effectiveness in practice. An empirical application testing the independence of household incomes from different cities in China is also conducted.
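A linear spectral statistic of a $p \times p$ sample correlation matrix $R$ is $\sum_{i=1}^{p} f(\lambda_i)$ over its eigenvalues. For $f(x) = x^2$ no eigendecomposition is needed, because $\sum_i \lambda_i^2 = \operatorname{tr}(R^2) = \sum_{i,j} r_{ij}^2$ for symmetric $R$. The sketch below computes this LSS for two toy data sets. The choice of f, the function names and the data are illustrative assumptions. The paper's centring and scaling for the CLT are not implemented.

```python
import math
import random


def sample_correlation(data):
    """Sample correlation matrix of an n x p data set (rows are observations)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centred)) for j in range(p)]
    return [[sum(r[i] * r[j] for r in centred) / (norms[i] * norms[j]) for j in range(p)]
            for i in range(p)]


def lss_square(r):
    """LSS with f(x) = x**2: sum of squared eigenvalues = tr(R^2)."""
    # R is symmetric, so tr(R R) equals the sum of its squared entries
    return sum(v * v for row in r for v in row)


rng = random.Random(2)
p, n = 5, 40
indep = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
# A shared component induces correlation across all p variables
common = [rng.gauss(0.0, 1.0) for _ in range(n)]
dep = [[c + 0.5 * rng.gauss(0.0, 1.0) for _ in range(p)] for c in common]

r_indep = sample_correlation(indep)
print("trace of R (always p):", round(sum(r_indep[i][i] for i in range(p)), 6))
print("LSS, independent:", round(lss_square(r_indep), 3))
print("LSS, common component:", round(lss_square(sample_correlation(dep)), 3))
```

The statistic is bounded between p (all off-diagonal correlations zero) and $p^2$, and it grows with cross-sectional correlation. This is why a suitably centred and scaled version can serve as an independence test statistic.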
