Researcher profile

Guojun Gan

Guojun Gan contributes to research discovery and scholarly infrastructure.

Published work

5 published items

preprint, 2021, arXiv

Applications of Clustering with Mixed Type Data in Life Insurance

Death benefits are generally the largest cash flow item affecting the financial statements of life insurers, some of which still do not have a systematic process to track and monitor death claims experience. In this article, we explore data clustering to examine and understand how actual death claims differ from expected, an early stage of developing a monitoring system crucial for risk management. We extend the $k$-prototypes clustering algorithm to draw inference from a life insurance dataset using only the insured's characteristics and policy information, without regard to known mortality. The extended algorithm efficiently handles categorical, numerical, and spatial attributes. The optimal clusters, chosen using the gap statistic, are then used to compare actual to expected death claims experience of the life insurance portfolio. Our empirical data contains observations, during 2014, of approximately 1.14 million policies with a total insured amount of over 650 billion dollars. For this portfolio, the algorithm produced three natural clusters, each showing lower actual than expected death claims but with differing variability. The analytical results give management a process to identify the policyholder attributes that dominate significant mortality deviations, and thereby support decisions on necessary actions.
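A minimal sketch of the standard k-prototypes dissimilarity may help make the mixed-type idea concrete. It combines squared Euclidean distance on numerical attributes with a weighted mismatch count on categorical attributes. The spatial extension described in the abstract is not reproduced here, and the function names, the weight `gamma`, and the example attributes are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]


def kprototypes_dissimilarity(
    num_x: Sequence[float],
    cat_x: Sequence[str],
    num_proto: Sequence[float],
    cat_proto: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Squared Euclidean distance on numeric parts plus gamma times
    the number of categorical mismatches (standard k-prototypes form)."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(num_x, num_proto))
    categorical_part = sum(1 for a, b in zip(cat_x, cat_proto) if a != b)
    return numeric_part + gamma * categorical_part


def assign_cluster(
    num_x: Sequence[float],
    cat_x: Sequence[str],
    prototypes: List[Prototype],
    gamma: float = 1.0,
) -> int:
    """Return the index of the closest prototype for one record."""
    distances = [
        kprototypes_dissimilarity(num_x, cat_x, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical prototypes: (scaled numeric attributes, categorical attributes).
example_prototypes: List[Prototype] = [
    ([0.0, 0.0], ["M", "nonsmoker"]),
    ([1.0, 2.0], ["M", "smoker"]),
]
```

In practice the numerical attributes would be standardized first, since `gamma` only balances the two parts sensibly when the numeric scales are comparable.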

preprint, 2021, arXiv

Compositional Data Regression in Insurance with Exponential Family PCA

Compositional data are multivariate observations that carry only relative information between components. Applying standard multivariate statistical methodology directly to compositional data can lead to paradoxes and misinterpretations. Compositional data also appear frequently in insurance, especially with telematics information. However, this type of data has not received the special treatment it deserves in most of the existing actuarial literature. In this paper, we investigate the use of exponential family principal component analysis (EPCA) to analyze compositional data in insurance. The method is applied to a dataset obtained from the U.S. Mine Safety and Health Administration. The numerical results show that EPCA is able to produce principal components that are significant predictors and that improve the prediction accuracy of the regression model. The EPCA method can be a promising tool for actuaries to analyze compositional data.
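To illustrate what "only relative information" means, the sketch below normalizes raw component amounts to a composition (the closure operation) and applies the centered log-ratio (CLR) transform, a standard tool for compositional data. This is not the EPCA method from the abstract. It only shows that rescaling the raw amounts leaves the composition unchanged. The function names are assumptions.

```python
import math
from typing import List, Sequence


def closure(parts: Sequence[float]) -> List[float]:
    """Rescale non-negative parts so that they sum to one."""
    total = sum(parts)
    if total <= 0 or any(p < 0 for p in parts):
        raise ValueError("parts must be non-negative with a positive sum")
    return [p / total for p in parts]


def clr(composition: Sequence[float]) -> List[float]:
    """Centered log-ratio transform: log of each part minus the mean log."""
    if any(p <= 0 for p in composition):
        raise ValueError("CLR requires strictly positive parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Two raw vectors that differ only by a factor of 10 give the same composition.
hours_a = closure([2.0, 6.0, 2.0])
hours_b = closure([20.0, 60.0, 20.0])
```

Zero components, which are common in practice, need separate handling before a log-ratio transform can be applied.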

preprint, 2020, arXiv

Analysis of Prescription Drug Utilization with Beta Regression Models

The healthcare sector in the U.S. is complex and large, generating about 20% of the country's gross domestic product. Healthcare analytics has been used by researchers and practitioners to better understand the industry. In this paper, we demonstrate the use of Beta regression models to study the utilization of brand name drugs in the U.S. and to understand how that utilization varies across different areas. The models are fitted to public datasets obtained from the Centers for Medicare & Medicaid Services and the Internal Revenue Service. Integrated Nested Laplace Approximation (INLA) is used to perform the inference. The numerical results show that Beta regression models fit the brand name drug claim rates well and that including spatial dependence improves their performance. Such models can be used to reflect the effect of prescription drug utilization when updating an insured's health risk in a risk scoring model.
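As a rough illustration of what a Beta regression model evaluates, the sketch below computes the log-likelihood contribution of one rate under the commonly used mean-precision parameterization, with a logit link for the mean. The spatial component and the INLA inference mentioned in the abstract are not reproduced, and the function names and the precision parameter `phi` are assumptions.

```python
import math


def mean_from_linear_predictor(eta: float) -> float:
    """Inverse logit link: map a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_loglik(y: float, mu: float, phi: float) -> float:
    """Log-density of y under Beta(mu * phi, (1 - mu) * phi).

    mu is the mean in (0, 1) and phi > 0 is the precision.
    """
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("mu must be in (0, 1) and phi must be positive")
    a = mu * phi
    b = (1.0 - mu) * phi
    return (
        math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log(1.0 - y)
    )
```

Summing `beta_loglik` over areas, with `mu` driven by covariates through `mean_from_linear_predictor`, gives the likelihood that a fitting routine would maximize or combine with priors.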

preprint, 2020, arXiv

Hybrid Tree-based Models for Insurance Claims

Two-part models and Tweedie generalized linear models (GLMs) have been used to model loss costs for short-term insurance contracts. Most portfolios of insurance claims contain a large proportion of zero claims, and the resulting imbalance reduces the prediction accuracy of these traditional approaches. This article proposes tree-based models with a hybrid structure, built by a two-step algorithm, as an alternative to these traditional models. The first step is the construction of a classification tree to build the probability model for frequency. In the second step, we employ elastic net regression models at each terminal node of the classification tree to build the distribution model for severity. This hybrid structure captures the benefits of tuning hyperparameters at each step of the algorithm; this allows for improved prediction accuracy, and tuning can be performed to meet specific business objectives. We compare the predictive performance of this hybrid tree-based structure with that of the traditional Tweedie model using both real and synthetic datasets. Our empirical results show that these hybrid tree-based models produce more accurate predictions without loss of intuitive interpretation.
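The two-step structure can be sketched at prediction time as follows. The sketch assumes both steps are already fitted. A hypothetical routing rule stands in for the classification tree, each terminal node holds a claim probability and a linear severity model, and the loss cost is taken as their product, which is the usual two-part combination but is an assumption here rather than something the abstract states. All names, thresholds, and numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class LeafModel:
    claim_probability: float        # step 1: probability of a claim in this leaf
    intercept: float                # step 2: severity regression within the leaf
    coefficients: Sequence[float]   # e.g. as produced by an elastic net fit


def route_to_leaf(features: Sequence[float]) -> str:
    """Hypothetical stand-in for a fitted classification tree.

    features[0] is assumed to be the policyholder's age.
    """
    return "younger" if features[0] < 25.0 else "other"


def predict_loss_cost(features: Sequence[float], leaves: Dict[str, LeafModel]) -> float:
    """Claim probability times predicted severity for the record's leaf."""
    leaf = leaves[route_to_leaf(features)]
    severity = leaf.intercept + sum(
        c * x for c, x in zip(leaf.coefficients, features)
    )
    # Severity cannot be negative, so clip the linear prediction at zero.
    return leaf.claim_probability * max(severity, 0.0)


# Illustrative fitted values, not taken from the paper.
example_leaves = {
    "younger": LeafModel(claim_probability=0.2, intercept=1000.0, coefficients=[10.0]),
    "other": LeafModel(claim_probability=0.05, intercept=500.0, coefficients=[0.0]),
}
```

Because each leaf carries its own severity model, the regression coefficients can differ across segments while the tree itself stays easy to read.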

preprint, 2020, arXiv

Skewed link regression models for imbalanced binary response with applications to life insurance

For a portfolio of life insurance policies observed for a stated period of time, e.g., one year, mortality is typically a rare event. When we examine the outcome of dying or not from such portfolios, we have an imbalanced binary response. The popular logistic and probit regression models can be inappropriate for an imbalanced binary response because model estimates may be biased, and if not addressed properly, this can lead to seriously adverse predictions. In this paper, we propose the use of skewed link regression models (Generalized Extreme Value, Weibull, and Frechet link models) as superior models for handling an imbalanced binary response. We adopt a fully Bayesian approach for the generalized linear models (GLMs) under the proposed link functions to help better explain the high skewness. To calibrate our proposed Bayesian models, we use a real dataset of death claims experience drawn from a life insurance company's portfolio. Bayesian estimates of the parameters were obtained using the Metropolis-Hastings algorithm, and the Deviance Information Criterion (DIC) was used for Bayesian model selection and comparison. For our mortality dataset, we find that these skewed link models are superior to the widely used binary models with standard link functions. We evaluate the predictive power of the different underlying models by measuring and comparing aggregated death counts and death benefits.
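The difference between a symmetric and a skewed link can be seen in a short sketch. It uses one common parameterization of the GEV link from the literature, p = 1 - F(-eta; xi), where F is the standard GEV distribution function with shape xi. Whether the paper uses exactly this form is an assumption, and the Bayesian fitting by Metropolis-Hastings is not shown.

```python
import math


def gev_cdf(x: float, xi: float) -> float:
    """Standard GEV distribution function with shape parameter xi."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-x))  # Gumbel limit as xi tends to 0
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def gev_inverse_link(eta: float, xi: float) -> float:
    """Death probability under the assumed GEV link, p = 1 - F(-eta; xi)."""
    return 1.0 - gev_cdf(-eta, xi)


def logistic_inverse_link(eta: float) -> float:
    """Symmetric logistic response function, for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))
```

With the logistic link, the probabilities at `eta` and `-eta` always sum to one. Under the GEV link they generally do not, and the shape parameter `xi` controls how quickly the response curve approaches 0 and 1, which is what lets it adapt to a rare event such as death.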