Researcher profile

Juan-Juan Cai

Juan-Juan Cai contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2020, arXiv

Interpretable random forest models through forward variable selection

Random forest is a popular prediction approach for handling high dimensional covariates. However, it often becomes infeasible to interpret the resulting high dimensional and non-parametric model. Aiming to obtain an interpretable predictive model, we develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure leads to the smallest set of variables that optimizes the CRPS risk by performing at each step a hypothesis test on a significant decrease in CRPS risk. We provide mathematical motivation for our method by proving that, in the population sense, the method attains the optimal set. Additionally, we show that the test is consistent provided that the random forest estimator of a quantile function is consistent. In a simulation study, we compare the performance of our method with an existing variable selection method, for different sample sizes and different correlation strengths of the covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power.
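The two ingredients named in this abstract, the CRPS loss and a stepwise forward search, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `empirical_crps` and `forward_select`, the user-supplied `risk` callable and the `min_decrease` threshold are all assumptions made for illustration. In particular, the fixed threshold stands in for the hypothesis test described in the abstract, and no random forest is fitted here.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence


def empirical_crps(samples: Sequence[float], y: float) -> float:
    """CRPS of the empirical distribution of `samples` against observation `y`.

    Uses the identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from F (here, the empirical distribution).
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    term_obs = sum(abs(x - y) for x in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


def forward_select(
    candidates: Iterable[str],
    risk: Callable[[FrozenSet[str]], float],
    min_decrease: float = 0.0,
) -> List[str]:
    """Greedy forward selection on an estimated risk.

    `risk` is a hypothetical callable returning an estimated CRPS risk for a
    set of variables. A variable is added only while the best candidate lowers
    the risk by more than `min_decrease`; this simple threshold is an
    assumption replacing the hypothesis test used in the paper.
    """
    selected: List[str] = []
    remaining = list(candidates)
    current = risk(frozenset())
    while remaining:
        # Pick the candidate whose addition gives the lowest estimated risk.
        best_var = min(remaining, key=lambda v: risk(frozenset(selected + [v])))
        best_risk = risk(frozenset(selected + [best_var]))
        if current - best_risk <= min_decrease:
            break  # no sufficiently large improvement: stop
        selected.append(best_var)
        remaining.remove(best_var)
        current = best_risk
    return selected
```

In practice `risk` would average `empirical_crps` over held-out observations, using predictive samples from a model fitted on the chosen variables. How that model is fitted and how the decrease is tested are left open here.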

Preprint, 2012, arXiv

Estimation of extreme risk regions under multivariate regular variation

When considering $d$ possibly dependent random variables, one is often interested in extreme risk regions, with very small probability $p$. We consider risk regions of the form $\{\mathbf{z}\in\mathbb{R}^d : f(\mathbf{z})\leq\beta\}$, where $f$ is the joint density and $\beta$ a small number. Estimation of such an extreme risk region is difficult since it contains hardly any or no data. Using extreme value theory, we construct a natural estimator of an extreme risk region and prove a refined form of consistency, given a random sample of multivariate regularly varying random vectors. In a detailed simulation and comparison study, the good performance of the procedure is demonstrated. We also apply our estimator to financial data.