Source author record

Ingrid Van Keilegom

Ingrid Van Keilegom appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

math.ST Statistics Theory Methodology

Catalog footprint

What is connected

15 works

3 topics

4 close collaborators

preprint, 2026, arXiv

Testing for sufficient follow-up in cure models with categorical covariates

In survival analysis, estimating the fraction of 'immune' or 'cured' subjects who will never experience the event of interest requires a sufficiently long follow-up period. A few statistical tests have been proposed to test the assumption of sufficient follow-up, i.e. whether the right extreme of the censoring distribution exceeds that of the survival time of the uncured subjects. However, in practice the problem remains challenging. To address this, a relaxed notion of 'practically' sufficient follow-up has been introduced recently, suggesting that the follow-up would be considered sufficiently long if the probability of the event occurring after the end of the study is very small. None of these existing tests incorporates covariate information, which might affect the cure rate and the survival times. We extend the test for 'practically' sufficient follow-up to settings with categorical covariates. While a straightforward intersection-union type test could reject the null hypothesis of insufficient follow-up only if this hypothesis is rejected for all covariate values, in practice this approach is overly conservative and lacks power. To improve upon this, we propose a novel test procedure that relies on the test decision for one properly chosen covariate value. Our approach relies on the assumption that the conditional density of the uncured survival time is a non-increasing function of time in the tail region. We show that both methods yield tests of asymptotic level $α$ and investigate their finite sample performance through simulations. The practical application of the methods is illustrated using a skin melanoma dataset.

preprint, 2023, arXiv

Density estimation and regression analysis on S^d in the presence of measurement error

This paper studies density estimation and regression analysis with contaminated data observed on the unit hypersphere S^d. Our methodology and theory are based on harmonic analysis on general S^d. We establish novel nonparametric density and regression estimators, and study their asymptotic properties including the rates of convergence and asymptotic distributions. We also provide asymptotic confidence intervals based on the asymptotic distributions of the estimators and on the empirical likelihood technique. We present practical details on implementation as well as the results of numerical studies.

preprint, 2022, arXiv

A 2-step estimation procedure for semiparametric mixture cure models

Cure models have been developed as an alternative modelling approach to conventional survival analysis in order to account for the presence of cured subjects that will never experience the event of interest. Mixture cure models, which model separately the cure probability and the survival of uncured subjects depending on a set of covariates, are particularly useful for distinguishing curative from life-prolonging effects. In practice, it is common to assume a parametric model for the cure probability and a semiparametric model for the survival of the susceptibles. Because of the latent cure status, maximum likelihood estimation is performed by means of the iterative EM algorithm. Here, we focus on the cure probabilities and propose a two-step procedure to improve upon the performance of the maximum likelihood estimator when the sample size is not large. The new method is based on the idea of presmoothing by first constructing a nonparametric estimator and then projecting it into the desired parametric class. We investigate the theoretical properties of the resulting estimator and show through an extensive simulation study for the logistic-Cox model that it outperforms the existing method. Practical use of the method is illustrated through two melanoma datasets.
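For orientation, the mixture cure model described in this abstract is commonly written as below. The notation is an assumption made here for illustration and is not taken from the paper: $p(x)$ is the probability of being uncured given covariates $x$, $S_u(t\mid z)$ is the survival function of the uncured given covariates $z$, and $S_0$ is an unspecified baseline survival function. The logistic and Cox forms correspond to the logistic-Cox model named in the abstract.

```latex
% Population survival: cured subjects (probability 1 - p(x)) never experience the event.
S_{\mathrm{pop}}(t \mid x, z) = 1 - p(x) + p(x)\, S_u(t \mid z)

% Parametric (logistic) model for the probability of being uncured.
p(x) = \frac{\exp(\gamma^\top x)}{1 + \exp(\gamma^\top x)}

% Semiparametric (Cox) model for the survival of the uncured subjects.
S_u(t \mid z) = S_0(t)^{\exp(\beta^\top z)}
```

As $t$ grows, $S_u(t\mid z)$ tends to zero, so $S_{\mathrm{pop}}$ levels off at the cure probability $1 - p(x)$.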

preprint, 2021, arXiv

A test for comparing conditional ROC curves with multidimensional covariates

The comparison of Receiver Operating Characteristic (ROC) curves is frequently used in the literature to compare the discriminatory capability of different classification procedures based on diagnostic variables. The performance of these variables can sometimes be influenced by the presence of other covariates, which should therefore be taken into account when making the comparison. A new non-parametric test is proposed here for testing the equality of two or more dependent ROC curves conditional on the value of a multidimensional covariate. Projections are used to transform the problem into a one-dimensional one that is easier to handle. Simulations are carried out to study the practical performance of the new methodology. A real data set of patients with pleural effusion is analysed to illustrate this procedure.

preprint, 2020, arXiv

A simulation-extrapolation approach for the mixture cure model with mismeasured covariates

We consider survival data from a population with cured subjects in the presence of mismeasured covariates. We use the mixture cure model to account for the individuals who will never experience the event and at the same time distinguish between the effect of the covariates on the cure probabilities and on survival times. In particular, for practical applications, it seems of interest to assume a logistic form of the incidence and a Cox proportional hazards model for the latency. To correct the estimators for the bias introduced by the measurement error, we use the SIMEX algorithm, which is a very general simulation-based method. It essentially estimates this bias by introducing additional error to the data and then recovers bias-corrected estimators through an extrapolation approach. The estimators are shown to be consistent and asymptotically normally distributed when the true extrapolation function is known. We investigate their finite sample performance through a simulation study and apply the proposed method to analyse the effect of the prostate specific antigen (PSA) on patients with prostate cancer.
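The simulation-extrapolation idea can be illustrated on a much simpler problem than the mixture cure model. The sketch below is not the paper's procedure. It assumes a linear regression with one covariate observed with additive normal error of known standard deviation `sigma_u`, and it uses a quadratic extrapolation function. The function name `simex_slope`, the grid of noise levels and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=50, seed=0):
    """Return (naive slope, SIMEX-corrected slope) for y regressed on mismeasured w.

    Assumes w = x + u with u ~ N(0, sigma_u^2) and sigma_u known (illustrative setup).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)

    def ols_slope(x, resp):
        # Least-squares slope of resp on x.
        return np.cov(x, resp, bias=True)[0, 1] / np.var(x)

    lam_grid = [0.0]
    slopes = [ols_slope(w, y)]  # lambda = 0 is the naive estimator
    for lam in lambdas:
        # Simulation step: add extra error with variance lam * sigma_u^2, refit, average.
        sims = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_rep)
        ]
        lam_grid.append(lam)
        slopes.append(float(np.mean(sims)))

    # Extrapolation step: fit a quadratic in lambda and evaluate at lambda = -1,
    # which corresponds to no measurement error.
    coefs = np.polyfit(lam_grid, slopes, deg=2)
    return slopes[0], float(np.polyval(coefs, -1.0))

# Simulated example: true slope 2, covariate observed with error.
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(5000)
w = x + 0.5 * rng.standard_normal(5000)
naive, corrected = simex_slope(w, y, sigma_u=0.5)
print(naive, corrected)
```

In this toy setting the naive slope is attenuated towards zero and the extrapolated value moves back towards the true slope. A quadratic extrapolant only approximates the true extrapolation function, which is why the abstract's asymptotic results are stated for the case where that function is known.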

preprint, 2020, arXiv

Specification testing in semi-parametric transformation models

In transformation regression models the response is transformed before fitting a regression model to covariates and transformed response. We assume such a model where the errors are independent of the covariates and the regression function is modeled nonparametrically. We suggest a test for goodness-of-fit of a parametric transformation class based on a distance between a nonparametric transformation estimator and the parametric class. We present asymptotic theory under the null hypothesis of validity of the semi-parametric model and under local alternatives. A bootstrap algorithm is suggested in order to apply the test. We also consider relevant hypotheses to distinguish between large and small distances of the parametric transformation class to the 'true' transformation.

preprint, 2020, arXiv

Testing parametric models in linear-directional regression

This paper presents a goodness-of-fit test for parametric regression models with scalar response and directional predictor, that is, a vector on a sphere of arbitrary dimension. The testing procedure is based on the weighted squared distance between a smooth and a parametric regression estimator, where the smooth regression estimator is obtained by a projected local approach. Asymptotic behavior of the test statistic under the null hypothesis and local alternatives is provided, jointly with a consistent bootstrap algorithm for application in practice. A simulation study illustrates the performance of the test in finite samples. The procedure is applied to test a linear model in text mining.

preprint, 2016, arXiv

Semiparametric Copula Quantile Regression for Complete or Censored Data

When facing multivariate covariates, general semiparametric regression techniques come in handy to propose flexible models that are not exposed to the curse of dimensionality. In this work a semiparametric copula-based estimator for conditional quantiles is investigated for complete or right-censored data. In spirit, the methodology extends the recent work of Noh et al. (2013) and Noh et al. (2015), as the main idea consists in appropriately defining the quantile regression in terms of a multivariate copula and marginal distributions. Prior estimation of the latter and simple plug-in lead to an easily implementable estimator expressed, for both contexts with or without censoring, as a weighted quantile of the observed response variable. In addition, and contrary to the initial suggestion in the literature, a semiparametric estimation scheme for the multivariate copula density is studied, motivated by the possible shortcomings of a purely parametric approach and driven by the regression context. The resulting quantile regression estimator has the valuable property of being automatically monotonic across quantile levels, and asymptotic normality for both complete and censored data is obtained under classical regularity conditions. Finally, numerical examples as well as a real data application are used to illustrate the validity and finite sample performance of the proposed procedure.

preprint, 2014, arXiv

Heteroscedastic semiparametric transformation models: estimation and testing for validity

In this paper we consider a heteroscedastic transformation model, where the transformation belongs to a parametric family of monotone transformations, the regression and variance function are modelled nonparametrically and the error is independent of the multidimensional covariates. In this model, we first consider the estimation of the unknown components of the model, namely the transformation parameter, regression and variance function and the distribution of the error. We show the asymptotic normality of the proposed estimators. Second, we propose tests for the validity of the model, and establish the limiting distribution of the test statistics under the null hypothesis. A bootstrap procedure is proposed to approximate the critical values of the tests. Finally, we carry out a simulation study to verify the small sample behavior of the proposed estimators and tests.

preprint, 2013, arXiv

On the identifiability of copulas in bivariate competing risks models

In competing risks models, the joint distribution of the event times is not identifiable even when the margins are fully known, which has been referred to as the "identifiability crisis in competing risks analysis" (Crowder, 1991). We model the dependence between the event times by an unknown copula and show that identification is actually possible within many frequently used families of copulas. The result is then extended to the case where one margin is unknown.

preprint, 2013, arXiv

Single index regression models in the presence of censoring depending on the covariates

Consider a random vector (X',Y)', where X is d-dimensional and Y is one-dimensional. We assume that Y is subject to random right censoring. The aim of this paper is twofold. First, we propose a new estimator of the joint distribution of (X',Y)'. This estimator overcomes the common curse-of-dimensionality problem, by using a new dimension reduction technique. Second, we assume that the relation between X and Y is given by a mean regression single index model, and propose a new estimator of the parameters in this model. The asymptotic properties of all proposed estimators are obtained.

preprint, 2012, arXiv

Uniform in bandwidth exact rates for a class of kernel estimators

Given an i.i.d. sample $(Y_i,Z_i)$, taking values in $\mathbb{R}^{d'}\times \mathbb{R}^d$, we consider a collection of Nadaraya-Watson kernel estimators of the conditional expectations $\mathbb{E}(\langle c_g(z),g(Y)\rangle+d_g(z)\mid Z=z)$, where $z$ belongs to a compact set $H\subset \mathbb{R}^d$, $g$ is a Borel function on $\mathbb{R}^{d'}$ and $c_g(\cdot),d_g(\cdot)$ are continuous functions on $\mathbb{R}^d$. Given two bandwidth sequences $h_n<\widetilde{h}_n$ fulfilling mild conditions, we obtain exact and explicit almost sure limit bounds for the deviations of these estimators around their expectations, uniformly in $g\in\mathcal{G},\;z\in H$ and $h_n\le h\le \widetilde{h}_n$ under mild conditions on the density $f_Z$, the class $\mathcal{G}$, the kernel $K$ and the functions $c_g(\cdot),d_g(\cdot)$. We apply this result to prove that smoothed empirical likelihood can be used to build confidence intervals for conditional probabilities $\mathbb{P}(Y\in C\mid Z=z)$, that hold uniformly in $z\in H,\; C\in \mathcal{C},\; h\in [h_n,\widetilde{h}_n]$. Here $\mathcal{C}$ is a Vapnik-Chervonenkis class of sets.
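As a concrete reference point, the basic Nadaraya-Watson estimator evaluated over a range of bandwidths can be sketched as follows. This is a simplified illustration and not the paper's general setting. It assumes a one-dimensional covariate, takes $g$ to be the identity with $c_g \equiv 1$ and $d_g \equiv 0$, and uses a Gaussian kernel purely for convenience. The function names `nadaraya_watson` and `fit_over_bandwidths` are hypothetical.

```python
import numpy as np

def nadaraya_watson(z0, z, y, h):
    """Kernel-weighted average of y at the evaluation points z0, with bandwidth h."""
    z0 = np.atleast_1d(np.asarray(z0, dtype=float))
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (z0[:, None] - z[None, :]) / h   # scaled distances, shape (len(z0), n)
    k = np.exp(-0.5 * u**2)              # Gaussian kernel weights (illustrative choice)
    return (k @ y) / k.sum(axis=1)

def fit_over_bandwidths(z0, z, y, bandwidths):
    """Evaluate the estimator for every bandwidth in a given range."""
    return {h: nadaraya_watson(z0, z, y, h) for h in bandwidths}

# Simulated example: noisy sine curve, three bandwidths between a lower and an upper value.
rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=500)
y = np.sin(np.pi * z) + 0.2 * rng.standard_normal(500)
fits = fit_over_bandwidths(np.array([-0.5, 0.0, 0.5]), z, y, bandwidths=(0.05, 0.1, 0.2))
for h, est in fits.items():
    print(h, est)
```

Results that hold uniformly in the bandwidth are useful because the bandwidth is typically chosen from the data, so a guarantee is needed for the whole range of candidate values rather than for a single fixed one.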

preprint, 2011, arXiv

Estimation of the Error Density in a Semiparametric Transformation Model

Consider the semiparametric transformation model $Λ_{θ_o}(Y)=m(X)+ε$, where $θ_o$ is an unknown finite dimensional parameter, the functions $Λ_{θ_o}$ and $m$ are smooth, $ε$ is independent of $X$, and $\mathbb{E}(ε)=0$. We propose a kernel-type estimator of the density of the error $ε$, and prove its asymptotic normality. The estimated errors, which lie at the basis of this estimator, are obtained from a profile likelihood estimator of $θ_o$ and a nonparametric kernel estimator of $m$. The practical performance of the proposed density estimator is evaluated in a simulation study.

preprint, 2011, arXiv

Nonparametric regression with filtered data

We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
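The "estimate the survivor function, then integrate" principle can be shown in its simplest unconditional form. The sketch below is not the paper's estimator: it ignores covariates and left truncation, handles right censoring only, and uses the Kaplan-Meier estimator in place of a conditional survivor estimate. It relies on the identity that the mean of a nonnegative variable is the integral of its survivor function, here restricted to the largest observed time. The function names are hypothetical.

```python
import numpy as np

def km_survivor(times, events):
    """Kaplan-Meier survivor estimate at the distinct event times."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    grid = np.unique(times[events])                                  # distinct event times
    at_risk = np.array([(times >= t).sum() for t in grid])           # subjects still at risk
    deaths = np.array([((times == t) & events).sum() for t in grid]) # events at each time
    surv = np.cumprod(1.0 - deaths / at_risk)
    return grid, surv

def restricted_mean(times, events):
    """Integrate the Kaplan-Meier step function from 0 to the largest observed time."""
    times = np.asarray(times, dtype=float)
    grid, surv = km_survivor(times, events)
    knots = np.concatenate(([0.0], grid, [times.max()]))
    heights = np.concatenate(([1.0], surv))  # value of the step function on each interval
    return float(np.sum(heights * np.diff(knots)))

# Without censoring the integral reduces to the sample mean.
print(restricted_mean([1, 2, 3, 4], [1, 1, 1, 1]))
# With one censored observation (event indicator 0) the estimate changes.
print(restricted_mean([1, 2, 3, 4], [1, 0, 1, 1]))
```

A conditional version would replace the Kaplan-Meier estimate with a covariate-dependent survivor estimate before integrating, which is the step the abstract describes.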

preprint, 2010, arXiv

A goodness-of-fit test for parametric and semi-parametric models in multiresponse regression

We propose an empirical likelihood test that is able to test the goodness of fit of a class of parametric and semi-parametric multiresponse regression models. The class includes as special cases fully parametric models; semi-parametric models, like the multiindex and the partially linear models; and models with shape constraints. Another feature of the test is that it allows both the response variable and the covariate to be multivariate, which means that multiple regression curves can be tested simultaneously. The test also allows the presence of infinite-dimensional nuisance functions in the model to be tested. It is shown that the empirical likelihood test statistic is asymptotically normally distributed under certain mild conditions and permits a wild bootstrap calibration. Despite the large size of the class of models to be considered, the empirical likelihood test enjoys good power properties against departures from a hypothesized model within the class.
