Researcher profile

Geurt Jongbloed

Geurt Jongbloed contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
7 works
0 followers
5 topics
4 close collaborators

Published work

7 published items

Preprint · 2021 · arXiv

Statistical Integration of Heterogeneous Data with PO2PLS

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), which addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we implement a fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for the relationship between two datasets is proposed, and its asymptotic distribution is derived. Notably, several existing omics integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS outperforms alternatives in feature selection and prediction. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case-control study. Besides recovering known relationships, PO2PLS also identified novel ones. The methods are implemented in our R-package PO2PLS. Supplementary materials for this article are available online.

Preprint · 2020 · arXiv

Bayesian estimation of a decreasing density

Suppose $X_1,\dots, X_n$ is a random sample from a bounded and decreasing density $f_0$ on $[0,\infty)$. We are interested in estimating such $f_0$, with special interest in $f_0(0)$. This problem is encountered in various statistical applications and has received considerable attention in the statistical literature. It is well known that the maximum likelihood estimator is inconsistent at zero. This has led several authors to propose alternative estimators that are consistent. As any decreasing density can be represented as a scale mixture of uniform densities, a Bayesian estimator is obtained by endowing the mixing distribution with a Dirichlet process prior. Under this prior, we derive contraction rates of the posterior density at zero by carefully revising arguments presented in Salomond (2014). Various methods for estimating the density are compared in a simulation study. We apply the Bayesian procedure to the current durations data described in Keiding et al. (2012).
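The maximum likelihood estimator mentioned above is the Grenander estimator. It is the left derivative of the least concave majorant of the empirical CDF. Its value just right of zero is $\max_k k/(n X_{(k)})$, which does not converge to $f_0(0)$. The sketch below computes that estimator so the boundary problem can be seen directly. It is not the Bayesian procedure of the paper. The function name `grenander` and the assumption of strictly positive observations are ours.

```python
import numpy as np


def grenander(sample):
    """Grenander estimator (NPMLE of a decreasing density on [0, inf)).

    Returns the knots of the least concave majorant (LCM) of the empirical
    CDF and the estimated density on each segment between knots.
    Assumes strictly positive observations (hypothetical helper, not from the paper).
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Points of the empirical CDF, starting at the origin.
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    hull = [0]  # indices of the LCM vertices found so far
    for i in range(1, n + 1):
        # Pop the last vertex while the slopes fail to strictly decrease,
        # i.e. while it is not a concave corner of the majorant.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            left = (py[b] - py[a]) * (px[i] - px[b])
            right = (py[i] - py[b]) * (px[b] - px[a])
            if left <= right:
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    slopes = np.diff(py[hull]) / np.diff(knots)  # density value per segment
    return knots, slopes


# Usage: a standard exponential sample has f_0(0) = 1.
rng = np.random.default_rng(0)
knots, slopes = grenander(rng.exponential(size=500))
print("estimate at 0:", slopes[0])
```

Re-running with different seeds or larger `size` shows that `slopes[0]` keeps scattering above 1 rather than settling down. That is the inconsistency at zero that motivates the alternative estimators.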

Preprint · 2020 · arXiv

Interpretable random forest models through forward variable selection

Random forest is a popular prediction approach for handling high-dimensional covariates. However, the resulting high-dimensional, nonparametric model is often infeasible to interpret. Aiming to obtain an interpretable predictive model, we develop a forward variable selection method that uses the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure yields a smallest set of variables that optimizes the CRPS risk by performing, at each step, a hypothesis test for a significant decrease in CRPS risk. We provide mathematical motivation for the method by proving that, in the population sense, it attains the optimal set. Additionally, we show that the test is consistent provided that the random forest estimator of a quantile function is consistent. In a simulation study, we compare the performance of our method with an existing variable selection method for different sample sizes and different correlation strengths among covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands, where it selects about 10% of the covariates while retaining the same predictive power.
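The selection criterion above is CRPS risk. For any predictive distribution $F$ with a finite mean, it can be evaluated through the identity $\mathrm{CRPS}(F, y) = E|X - y| - \tfrac12 E|X - X'|$, where $X, X'$ are independent draws from $F$. This is a minimal sketch that assumes the forest's predictive distribution is given as weighted training responses, as in quantile regression forests. That representation and the helper names are our assumptions. The forward selection steps and the hypothesis test are not shown.

```python
import numpy as np


def crps_discrete(values, weights, y):
    """CRPS of the discrete distribution putting weights[i] on values[i],
    evaluated at observation y via CRPS = E|X - y| - 0.5 * E|X - X'|."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights form a probability vector
    expected_abs_error = np.sum(w * np.abs(v - y))
    spread = np.sum(np.outer(w, w) * np.abs(v[:, None] - v[None, :]))
    return expected_abs_error - 0.5 * spread


def mean_crps(forecasts, observations):
    """Empirical CRPS risk: average over (values, weights) forecasts and
    the matching observed outcomes (hypothetical helper)."""
    return float(np.mean([crps_discrete(v, w, y)
                          for (v, w), y in zip(forecasts, observations)]))
```

Lower is better. A stepwise procedure would compare `mean_crps` with and without a candidate covariate and add it only if the decrease is significant.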

Preprint · 2020 · arXiv

Isotonic regression for metallic microstructure data: estimation and testing under order restrictions

Investigating the main determinants of the mechanical performance of metals is not a simple task. Known, physically inspired qualitative relations between 2D microstructure characteristics and 3D mechanical properties can serve as the starting point of the investigation. Isotonic regression takes such ordering relations into account and leads to more efficient and accurate results when the underlying assumptions actually hold. The main goal of this paper is to test order relations in a model inspired by a materials science application. The statistical estimation procedure is described for three scenarios, according to what is known about the variances: a known variance ratio, completely unknown variances, and variances under order restrictions. New likelihood ratio tests are developed for the last two cases. Both parametric and nonparametric bootstrap approaches are developed for finding the distribution of the test statistics under the null hypothesis. Finally, an application to the relation between Geometrically Necessary Dislocations and the number of observed microstructure precipitations is shown.
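When the variances are known up to a common factor, order-restricted estimation of group means reduces to weighted isotonic regression. A standard way to compute it is the pool-adjacent-violators algorithm (PAVA). The sketch assumes a non-decreasing order and weights supplied by the user, for example $n_k/\sigma_k^2$ for group $k$; both choices are illustrative assumptions. It does not cover the unknown-variance cases or the likelihood ratio tests.

```python
import numpy as np


def pava(y, w=None):
    """Weighted least-squares isotonic (non-decreasing) fit via
    pool-adjacent-violators. Hypothetical helper for illustration."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Each block keeps its weighted mean, total weight and number of points.
    means, weights, sizes = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi)
        weights.append(wi)
        sizes.append(1)
        # Merge the last two blocks while they violate the ordering.
        while len(means) >= 2 and means[-2] > means[-1]:
            m2, w2, s2 = means.pop(), weights.pop(), sizes.pop()
            m1, w1, s1 = means.pop(), weights.pop(), sizes.pop()
            total = w1 + w2
            means.append((w1 * m1 + w2 * m2) / total)
            weights.append(total)
            sizes.append(s1 + s2)
    # Expand block means back to one fitted value per input point.
    return np.repeat(means, sizes)
```

For example, `pava([1, 3, 2, 4])` pools the violating pair into 2.5 and returns `[1, 2.5, 2.5, 4]`.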

Preprint · 2011 · arXiv

Smooth and non-smooth estimates of a monotone hazard

We discuss a number of estimates of the hazard under the assumption that the hazard is monotone on an interval $[0,a]$. The usual isotonic least squares estimators of the hazard are inconsistent at the boundary points $0$ and $a$. We use penalization to obtain uniformly consistent estimators. Moreover, we determine the optimal penalization constants, extending related work in this direction by Woodroofe and Sun (1993) and Woodroofe and Sun (1999). Two methods of obtaining smooth monotone estimates based on a non-smooth monotone estimator are discussed. One is based on kernel smoothing, the other on penalization.

Preprint · 2011 · arXiv

Smooth plug-in inverse estimators in the current status continuous mark model

We consider the problem of estimating the joint distribution function of the event time and a continuous mark variable when the event time is subject to case 1 interval censoring and the continuous mark variable is observed only if the event occurred before the time of inspection. The nonparametric maximum likelihood estimator in this model is known to be inconsistent. We study two alternative smooth estimators, based on the explicit (inverse) expression of the distribution function of interest in terms of the density of the observable vector. We derive the pointwise asymptotic distribution of both estimators.

Preprint · 2010 · arXiv

Maximum smoothed likelihood estimation and smoothed maximum likelihood estimation in the current status model

We consider the problem of estimating the distribution function, the density and the hazard rate of the (unobservable) event time in the current status model. A well-studied and natural nonparametric estimator for the distribution function in this model is the nonparametric maximum likelihood estimator (MLE). We study two alternative methods for the estimation of the distribution function, assuming some smoothness of the event time distribution. The first estimator is based on a maximum smoothed likelihood approach. The second method is based on smoothing the (discrete) MLE of the distribution function. These estimators can be used to estimate the density and hazard rate of the event time distribution based on the plug-in principle.
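The second route, smoothing the discrete MLE, can be sketched in two steps. In the current status model, each subject has an inspection time $T_i$ and an indicator $\Delta_i = 1\{X_i \le T_i\}$. The nonparametric MLE of $F$ is the isotonic regression of the $\Delta_i$ ordered by $T_i$. A smoothed version integrates a kernel CDF against the MLE's jumps: $\tilde F(t) = \int \mathbb{K}((t - x)/h)\, d\hat F_n(x)$, where $\mathbb{K}$ is the integrated kernel. This is a minimal sketch, not the paper's implementation. The Epanechnikov kernel, the bandwidth `h`, the assumption of distinct inspection times and the helper names are all our choices. Boundary corrections and the maximum smoothed likelihood estimator are not shown.

```python
import numpy as np


def pava(y):
    """Unweighted non-decreasing isotonic regression (pool adjacent violators)."""
    means, sizes = [], []
    for yi in np.asarray(y, dtype=float):
        means.append(yi)
        sizes.append(1)
        while len(means) >= 2 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((s1 * m1 + s2 * m2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)


def current_status_npmle(t, delta):
    """NPMLE of F at the sorted inspection times (distinct times assumed)."""
    order = np.argsort(t)
    t_sorted = np.asarray(t, dtype=float)[order]
    f_hat = pava(np.asarray(delta, dtype=float)[order])
    return t_sorted, f_hat


def integrated_epanechnikov(u):
    """CDF of the Epanechnikov kernel 0.75 * (1 - v^2) on [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3


def smoothed_mle(t_grid, t_sorted, f_hat, h):
    """Kernel-smoothed NPMLE: sum_j jump_j * IK((t - T_(j)) / h)."""
    jumps = np.diff(np.concatenate(([0.0], f_hat)))
    u = (np.asarray(t_grid, dtype=float)[:, None] - t_sorted[None, :]) / h
    return integrated_epanechnikov(u) @ jumps
```

Because the NPMLE's jumps are non-negative and the integrated kernel is non-decreasing, the smoothed estimate is itself monotone. Differentiating it with respect to `t` gives a plug-in density estimate.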