Source author record

Stefan Van Aelst

Stefan Van Aelst appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, Machine Learning, math.ST, Statistics Theory

Catalog footprint

What is connected

7 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Split Regression Modeling

Sparse methods are the standard approach for obtaining interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy at the cost of interpretability. However, the use of black-box methods has been heavily criticized for high-stakes decisions, and it has been argued that there does not have to be a trade-off between accuracy and interpretability. To combine high accuracy with interpretability, we generalize best subset selection to best split selection. Best split selection constructs a small number of sparse models, learned jointly from the data, which are then combined in an ensemble. It determines the models by splitting the available predictor variables among the different models when fitting the data. The proposed methodology results in an ensemble of sparse and diverse models that each provide a possible explanation for the relationship between the predictors and the response. The high computational cost of best split selection motivates the need for computationally tractable approximations. We evaluate a method developed by Christidis et al. (2020) which can be seen as a multi-convex relaxation of best split selection.
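As a rough illustration of the splitting idea in this abstract, the sketch below enumerates every way to assign a handful of predictors to a small number of disjoint sparse models and keeps the split with the smallest total loss. It is a brute-force toy, not the authors' algorithm. The objective (the sum of the per-model residual sums of squares), the size cap `max_size`, the absence of an intercept, the simulated data, and the function names `best_split_selection` and `ensemble_predict` are all assumptions made for this example.

```python
import itertools

import numpy as np


def best_split_selection(X, y, n_models=2, max_size=2):
    """Exhaustively search disjoint predictor splits (toy sizes only).

    Assumed objective: the sum of each model's residual sum of squares.
    Returns (total_loss, groups, coefs), where groups[g] lists the columns
    used by model g and coefs[g] holds its least-squares coefficients.
    """
    n, p = X.shape
    best_loss, best_groups, best_coefs = np.inf, None, None
    # Label 0 means "unused"; labels 1..n_models assign a predictor to a model.
    for labels in itertools.product(range(n_models + 1), repeat=p):
        groups = [[j for j in range(p) if labels[j] == g]
                  for g in range(1, n_models + 1)]
        # Every model must be non-empty and sparse (at most max_size columns).
        if any(len(g) == 0 or len(g) > max_size for g in groups):
            continue
        loss, coefs = 0.0, []
        for g in groups:
            coef, *_ = np.linalg.lstsq(X[:, g], y, rcond=None)
            loss += float(np.sum((y - X[:, g] @ coef) ** 2))
            coefs.append(coef)
        if loss < best_loss:
            best_loss, best_groups, best_coefs = loss, groups, coefs
    return best_loss, best_groups, best_coefs


def ensemble_predict(X, groups, coefs):
    """Average the predictions of the individual sparse models."""
    preds = [X[:, g] @ c for g, c in zip(groups, coefs)]
    return np.mean(preds, axis=0)


# Hypothetical data: six predictors, three of which drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.standard_normal(100)

loss, groups, coefs = best_split_selection(X, y, n_models=2, max_size=2)
y_hat = ensemble_predict(X, groups, coefs)
```

With p predictors and G models the loop visits (G + 1)^p labelings, so the search is only feasible for very small p. This is the computational cost that motivates the relaxation mentioned in the abstract.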

Preprint, 2021, arXiv

Robust optimal estimation of location from discretely sampled functional data

Estimating location is a central problem in functional data analysis, yet most current estimation procedures either unrealistically assume completely observed trajectories or lack robustness with respect to the many kinds of anomalies one can encounter in the functional setting. To remedy these deficiencies we introduce the first class of optimal robust location estimators based on discretely sampled functional data. The proposed method is based on M-type smoothing spline estimation with repeated measurements and is suitable for both commonly and independently observed trajectories that are subject to measurement error. We show that under suitable assumptions the proposed family of estimators is minimax rate optimal both for commonly and independently observed trajectories and we illustrate its highly competitive performance and practical usefulness in a Monte-Carlo study and a real-data example involving recent Covid-19 data.

Preprint, 2021, arXiv

Robust penalized spline estimation with difference penalties

Penalized spline estimation with discrete difference penalties (P-splines) is a popular estimation method for semiparametric models, but the classical least-squares estimator is highly sensitive to deviations from its ideal model assumptions. To remedy this deficiency, a broad class of P-spline estimators based on general loss functions is introduced and studied. Robust estimators are obtained by well-chosen loss functions, such as the Huber or Tukey loss function. A preliminary scale estimator can also be included in the loss function. It is shown that this class of P-spline estimators enjoys the same optimal asymptotic properties as least-squares P-splines, thereby providing strong theoretical motivation for its use. The proposed estimators may be computed very efficiently through a simple adaptation of well-established iterative least squares algorithms and exhibit excellent performance even in finite samples, as evidenced by a numerical study and a real-data example.
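The computation described in this abstract can be sketched as iteratively reweighted least squares on a B-spline basis with a difference penalty. The example below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the Huber loss with tuning constant 1.345, a preliminary scale taken as the normalized MAD of the residuals from a least-squares P-spline fit, a fixed penalty parameter `lam`, and a second-order difference penalty. The function names `bspline_basis` and `huber_pspline` and the simulated data are hypothetical.

```python
import numpy as np


def bspline_basis(x, n_segments=10, degree=3):
    """Equally spaced B-spline basis on [min(x), max(x)] (Cox-de Boor)."""
    lo, hi = x.min(), x.max()
    h = (hi - lo) / n_segments
    knots = lo + h * np.arange(-degree, n_segments + degree + 1)
    # Degree 0: indicator of each knot interval.
    B = ((x[:, None] >= knots[None, :-1])
         & (x[:, None] < knots[None, 1:])).astype(float)
    # Raise the degree one step at a time (uniform knot spacing h).
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-d - 1]) / (d * h)
        right = (knots[None, d + 1:] - x[:, None]) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), n_segments + degree)


def huber_pspline(x, y, lam=1.0, c=1.345, n_segments=10, degree=3,
                  n_iter=50, tol=1e-8):
    """Huber-type P-spline fit by iteratively reweighted least squares."""
    B = bspline_basis(x, n_segments, degree)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)  # second-order differences
    P = lam * D.T @ D
    # Least-squares P-spline as starting value.
    beta = np.linalg.solve(B.T @ B + P, B.T @ y)
    # Assumed preliminary scale: normalized MAD of the initial residuals.
    resid = y - B @ beta
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    scale = max(scale, 1e-12)
    for _ in range(n_iter):
        u = np.abs(y - B @ beta) / scale
        # Huber weights: 1 inside [-c, c], c / |u| outside.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        BtW = B.T * w
        beta_new = np.linalg.solve(BtW @ B + P, BtW @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return B @ beta, beta


# Hypothetical data: a sine curve with noise and a few gross outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
y[::20] += 8.0
fitted, coef = huber_pspline(x, y, lam=0.1)
```

Each iteration solves a penalized weighted least-squares problem, so the only change from a least-squares P-spline routine is the weight vector `w`. Observations with large standardized residuals are downweighted rather than discarded.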

Preprint, 2020, arXiv

M-type penalized splines with auxiliary scale estimation

Penalized spline smoothing is a popular and flexible method of obtaining estimates in nonparametric regression but the classical least-squares criterion is highly susceptible to model deviations and atypical observations. Penalized spline estimation with a resistant loss function is a natural remedy, yet to this day the asymptotic properties of M-type penalized spline estimators have not been studied. We show in this paper that M-type penalized spline estimators achieve the same rates of convergence as their least-squares counterparts, even with auxiliary scale estimation. We further find theoretical justification for the use of a small number of knots relative to the sample size. We illustrate the benefits of M-type penalized splines in a Monte-Carlo study and two real-data examples, which contain atypical observations.

Preprint, 2018, arXiv

Robust functional regression based on principal components

Functional data analysis is a fast evolving branch of modern statistics and the functional linear model has become popular in recent years. However, most estimation methods for this model rely on generalized least squares procedures and therefore are sensitive to atypical observations. To remedy this, we propose a two-step estimation procedure that combines robust functional principal components and robust linear regression. Moreover, we propose a transformation that reduces the curvature of the estimators and can be advantageous in many settings. For these estimators we prove Fisher-consistency at elliptical distributions and consistency under mild regularity conditions. The influence function of the estimators is investigated as well. Simulation experiments show that the proposed estimators have reasonable efficiency, protect against outlying observations, produce smooth estimates and perform well in comparison to existing approaches.

Preprint, 2014, arXiv

On the consistency of a spatial-type interval-valued median for random intervals

The sample $d_\theta$-median is a robust estimator of the central tendency or location of an interval-valued random variable. While the interval-valued sample mean can be highly influenced by outliers, this spatial-type interval-valued median remains much more reliable. In this paper, we show that under general conditions the sample $d_\theta$-median is a strongly consistent estimator of the $d_\theta$-median of an interval-valued random variable.
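To make the estimator in this abstract concrete, the sketch below computes a sample median of intervals with a Weiszfeld-type iteration. The abstract does not spell out the metric, so the code assumes the common mid/spread form, in which the squared distance between two intervals is the squared difference of their midpoints plus theta times the squared difference of their half-lengths. The median is then the interval that minimizes the sum of these distances to the sample. The function names `d_theta` and `d_theta_median`, the starting value, and the example data are hypothetical.

```python
import math


def d_theta(a, b, theta=1.0):
    """Assumed mid/spread distance between intervals a=(lo, hi), b=(lo, hi)."""
    mid_a, spr_a = (a[0] + a[1]) / 2, (a[1] - a[0]) / 2
    mid_b, spr_b = (b[0] + b[1]) / 2, (b[1] - b[0]) / 2
    return math.sqrt((mid_a - mid_b) ** 2 + theta * (spr_a - spr_b) ** 2)


def d_theta_median(intervals, theta=1.0, n_iter=200, tol=1e-10):
    """Interval minimizing the summed d_theta distance (Weiszfeld iteration)."""
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    sprs = [(hi - lo) / 2 for lo, hi in intervals]
    # Start from the interval-valued sample mean.
    m = sum(mids) / len(mids)
    s = sum(sprs) / len(sprs)
    for _ in range(n_iter):
        w_sum = m_new = s_new = 0.0
        for mi, si in zip(mids, sprs):
            dist = math.sqrt((mi - m) ** 2 + theta * (si - s) ** 2)
            w = 1.0 / max(dist, 1e-12)  # guard against division by zero
            w_sum += w
            m_new += w * mi
            s_new += w * si
        m_new /= w_sum
        s_new /= w_sum
        shift = math.hypot(m_new - m, s_new - s)
        m, s = m_new, s_new
        if shift < tol:
            break
    # The result is a weighted average of the data, so the spread stays >= 0.
    return (m - s, m + s)


# Hypothetical sample: four similar intervals and one gross outlier.
sample = [(0, 2), (1, 3), (2, 4), (1, 3), (100, 300)]
median_interval = d_theta_median(sample)
mean_interval = (sum(lo for lo, _ in sample) / len(sample),
                 sum(hi for _, hi in sample) / len(sample))
```

In this sample the single outlying interval drags the interval-valued mean far from the bulk of the data, while the median stays with the four similar intervals. This is the robustness property the abstract describes.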

Preprint, 2011, arXiv

On the stability of bootstrap estimators

It is shown that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability measures defined on a compact metric space into a complete separable metric space are stable in the sense of qualitative robustness. Support vector machines based on shifted loss functions are treated as special cases.