Source author record

Geurt Jongbloed

Geurt Jongbloed appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.ST, Statistics Theory, Methodology, Applications, Machine Learning

Catalog footprint

What is connected

12 works

5 topics

4 close collaborators


preprint, 2021, arXiv

Statistical Integration of Heterogeneous Data with PO2PLS

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high-dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), which addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we implement a fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for testing the relationship between two datasets is proposed, and its asymptotic distribution is derived. Notably, several existing omics integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case-control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS. Supplementary materials for this article are available online.

preprint, 2020, arXiv

Bayesian estimation of a decreasing density

Suppose $X_1,\dots, X_n$ is a random sample from a bounded and decreasing density $f_0$ on $[0,\infty)$. We are interested in estimating such $f_0$, with special interest in $f_0(0)$. This problem is encountered in various statistical applications and has gained considerable attention in the statistical literature. It is well known that the maximum likelihood estimator is inconsistent at zero. This has led several authors to propose alternative estimators which are consistent. As any decreasing density can be represented as a scale mixture of uniform densities, a Bayesian estimator is obtained by endowing the mixture distribution with the Dirichlet process prior. Assuming this prior, we derive contraction rates of the posterior density at zero by carefully revising arguments presented in Salomond (2014). Various methods for estimating the density are compared using a simulation study. We apply the Bayesian procedure to the current durations data described in Keiding et al. (2012).
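The scale-mixture representation mentioned in the abstract can be written out as follows. This is a standard identity rather than a formula quoted from the paper, and the symbol $G$ for the mixing distribution is an assumption introduced here for illustration.

```latex
% Any decreasing density f on [0, \infty) is a scale mixture of uniform densities:
% G is a probability distribution on (0, \infty) (notation assumed here).
f(x) \;=\; \int_{(0,\infty)} \frac{1}{\theta}\, \mathbf{1}_{[0,\theta]}(x)\, \mathrm{d}G(\theta),
\qquad x \ge 0,
% so the value at zero is
f(0) \;=\; \int_{(0,\infty)} \frac{1}{\theta}\, \mathrm{d}G(\theta),
% which is finite exactly when f is bounded.
```

Placing a Dirichlet process prior on $G$ then induces a prior on the decreasing density itself.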

preprint, 2020, arXiv

Interpretable random forest models through forward variable selection

Random forest is a popular prediction approach for handling high dimensional covariates. However, it often becomes infeasible to interpret the obtained high dimensional and non-parametric model. Aiming to obtain an interpretable predictive model, we develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure leads to a smallest set of variables that optimizes the CRPS risk by performing at each step a hypothesis test on a significant decrease in CRPS risk. We provide mathematical motivation for our method by proving that in the population sense the method attains the optimal set. Additionally, we show that the test is consistent provided that the random forest estimator of a quantile function is consistent. In a simulation study, we compare the performance of our method with an existing variable selection method, for different sample sizes and different correlation strengths of covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power.
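For readers unfamiliar with the CRPS, the sketch below evaluates it for a finite ensemble of forecast draws using the well-known kernel representation CRPS(F, y) = E|X - y| - 0.5 E|X - X'|. It is not the authors' implementation; the function name `crps_ensemble` is hypothetical, and the forward selection and testing steps described in the abstract are not reproduced.

```python
import numpy as np

def crps_ensemble(samples, y):
    """Empirical CRPS of an ensemble forecast against a scalar observation y.

    Expectations in CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| are replaced
    by averages over the ensemble members (all pairs, including i == j).
    """
    x = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(x - y))                       # E|X - y|
    term_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 * E|X - X'|
    return term_obs - term_spread

# A one-member ensemble reduces to the absolute error of a point forecast.
point_score = crps_ensemble([1.0], 3.0)
# A two-member ensemble straddling the observation is rewarded for its spread.
spread_score = crps_ensemble([0.0, 2.0], 1.0)
```

Lower values indicate better forecasts, which is why a decrease in average CRPS can serve as the criterion for adding a variable.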

preprint, 2020, arXiv

Isotonic regression for metallic microstructure data: estimation and testing under order restrictions

Investigating the main determinants of the mechanical performance of metals is not a simple task. Already known, physically inspired qualitative relations between 2D microstructure characteristics and 3D mechanical properties can act as the starting point of the investigation. Isotonic regression makes it possible to take ordering relations into account and leads to more efficient and accurate results when the underlying assumptions actually hold. The main goal in this paper is to test order relations in a model inspired by a materials science application. The statistical estimation procedure is described considering three different scenarios according to the knowledge of the variances: known variance ratio, completely unknown variances, variances under order restrictions. New likelihood ratio tests are developed in the last two cases. Both parametric and non-parametric bootstrap approaches are developed for finding the distribution of the test statistics under the null hypothesis. Finally, an application on the relation between Geometrically Necessary Dislocations and the number of observed microstructure precipitations is shown.
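As background on isotonic regression, the sketch below implements the classical pool-adjacent-violators algorithm (PAVA) for a weighted least-squares fit under a nondecreasing constraint. It is a generic illustration, not the estimation procedure of the paper (which also treats unknown and order-restricted variances); the function name `pava` is hypothetical.

```python
def pava(y, w=None):
    """Weighted least-squares fit of a nondecreasing sequence to y (PAVA)."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand the blocks back to one fitted value per observation.
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit

fitted = pava([1, 3, 2, 4])  # -> [1.0, 2.5, 2.5, 4.0]
```

The violating pair (3, 2) is pooled into its mean 2.5, while the values that already respect the ordering are left unchanged.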

preprint, 2015, arXiv

Nonparametric confidence intervals for monotone functions

We study nonparametric isotonic confidence intervals for monotone functions. In Banerjee and Wellner (2001) pointwise confidence intervals, based on likelihood ratio tests for the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the treatment of other models with monotone functions, and demonstrate our method by a new proof of the results in Banerjee and Wellner (2001) and also by constructing confidence intervals for monotone densities, for which theory still had to be developed. For the latter model we prove that the limit distribution of the LR test under the null hypothesis is the same as in the current status model. We compare the confidence intervals so obtained with confidence intervals using the smoothed maximum likelihood estimator (SMLE), using bootstrap methods. The 'Lagrange-modified' cusum diagrams, developed here, are an essential tool both for the computation of the restricted MLEs and for the development of the theory for the confidence intervals, based on the LR tests.

preprint, 2013, arXiv

On the identifiability of copulas in bivariate competing risks models

In competing risks models, the joint distribution of the event times is not identifiable even when the margins are fully known, which has been referred to as the "identifiability crisis in competing risks analysis" (Crowder, 1991). We model the dependence between the event times by an unknown copula and show that identification is actually possible within many frequently used families of copulas. The result is then extended to the case where one margin is unknown.

preprint, 2011, arXiv

A maximum smoothed likelihood estimator in the current status continuous mark model

We consider the problem of estimating the joint distribution function of the event time and a continuous mark variable based on censored data. More specifically, the event time is subject to current status censoring and the continuous mark is only observed in case inspection takes place after the event time. The nonparametric maximum likelihood estimator (MLE) in this model is known to be inconsistent. We propose and study an alternative likelihood based estimator, maximizing a smoothed log-likelihood, hence called a maximum smoothed likelihood estimator (MSLE). This estimator is shown to be well defined and consistent, and a simple algorithm is described that can be used to compute it. The MSLE is compared with other estimators in a small simulation study.

preprint, 2011, arXiv

Isotonic L_2-projection test for local monotonicity of a hazard

We introduce a new test statistic for testing the null hypothesis that the sampling distribution has an increasing hazard rate on a specified interval [0,a]. It is based on a comparison of the empirical distribution function with an isotonic estimate, using the restriction that the hazard is increasing, and measures the excursions of the empirical distribution above the isotonic estimate, due to local non-monotonicity. It is proved in the companion paper Groeneboom and Jongbloed (2011a) that the test statistic is asymptotically normal if the hazard is strictly increasing on the interval [0,a] and certain regularity conditions are satisfied. We discuss a bootstrap method for computing the critical values and compare the test, thus obtained, with other proposals in a simulation study.

preprint, 2011, arXiv

Smooth and non-smooth estimates of a monotone hazard

We discuss a number of estimates of the hazard under the assumption that the hazard is monotone on an interval [0,a]. The usual isotonic least squares estimators of the hazard are inconsistent at the boundary points 0 and a. We use penalization to obtain uniformly consistent estimators. Moreover, we determine the optimal penalization constants, extending related work in this direction by Woodroofe and Sun (1993) and Woodroofe and Sun (1999). Two methods of obtaining smooth monotone estimates based on a non-smooth monotone estimator are discussed. One is based on kernel smoothing, the other on penalization.

preprint, 2011, arXiv

Smooth plug-in inverse estimators in the current status continuous mark model

We consider the problem of estimating the joint distribution function of the event time and a continuous mark variable when the event time is subject to interval censoring case 1 and the continuous mark variable is only observed in case the event occurred before the time of inspection. The nonparametric maximum likelihood estimator in this model is known to be inconsistent. We study two alternative smooth estimators, based on the explicit (inverse) expression of the distribution function of interest in terms of the density of the observable vector. We derive the pointwise asymptotic distribution of both estimators.

preprint, 2011, arXiv

Testing monotonicity of a hazard: asymptotic distribution theory

Two new test statistics are introduced to test the null hypothesis that the sampling distribution has an increasing hazard rate on a specified interval [0,a]. These statistics are empirical L_1-type distances between the isotonic estimates, which use the monotonicity constraint, and either the empirical distribution function or the empirical cumulative hazard. They measure the excursions of the empirical estimates with respect to the isotonic estimates, due to local non-monotonicity. Asymptotic normality of the test statistics, if the hazard is strictly increasing on [0,a], is established under mild conditions. This is done by first approximating the global empirical distance by a distance with respect to the underlying distribution function. The resulting integral is treated as a sum of increasingly many local integrals to which a CLT can be applied. The behavior of the local integrals is determined by a canonical process: the difference between the stochastic process x -> W(x)+x^2, where W is standard two-sided Brownian motion, and its greatest convex minorant.
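The canonical process named at the end of the abstract can be visualised numerically. The sketch below simulates W(x) + x^2 on a grid and subtracts its greatest convex minorant, computed as the lower convex hull of the grid points. The window [-2, 2], the grid size, the seed and the function name `greatest_convex_minorant` are all assumptions chosen for illustration; a grid approximation only mimics the continuous-time process.

```python
import numpy as np

def greatest_convex_minorant(x, y):
    """Greatest convex minorant of the points (x[i], y[i]), evaluated at x.

    x must be strictly increasing. The minorant is the linear interpolant
    through the vertices of the lower convex hull.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hull = []  # indices of lower-hull vertices, left to right
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

rng = np.random.default_rng(0)          # fixed seed, for reproducibility only
x = np.linspace(-2.0, 2.0, 401)         # symmetric grid containing x = 0
dx = x[1] - x[0]
# Two-sided Brownian motion on the grid: cumulative Gaussian increments,
# re-centred so that W(0) = 0.
w = np.cumsum(rng.normal(0.0, np.sqrt(dx), size=x.size))
w -= w[x.size // 2]
path = w + x**2
gcm = greatest_convex_minorant(x, path)
excursion = path - gcm                  # nonnegative; zero at hull vertices
```

The array `excursion` is the discretised analogue of the difference between the process and its convex minorant.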

preprint, 2010, arXiv

Maximum smoothed likelihood estimation and smoothed maximum likelihood estimation in the current status model

We consider the problem of estimating the distribution function, the density and the hazard rate of the (unobservable) event time in the current status model. A well studied and natural nonparametric estimator for the distribution function in this model is the nonparametric maximum likelihood estimator (MLE). We study two alternative methods for the estimation of the distribution function, assuming some smoothness of the event time distribution. The first estimator is based on a maximum smoothed likelihood approach. The second method is based on smoothing the (discrete) MLE of the distribution function. These estimators can be used to estimate the density and hazard rate of the event time distribution based on the plug-in principle.
