Researcher profile

Davy Paindaveine

Davy Paindaveine contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
14 works
0 followers
6 topics
4 close collaborators

Published work

14 published item(s)

preprint, 2023, arXiv

Revisiting the name variant of the two-children problem

Initially proposed by Martin Gardner in the 1950s, the famous two-children problem is often presented as a paradox in probability theory. A relatively recent variant of this paradox states that, while in a two-children family for which at least one child is a girl, the probability that the other child is a boy is $2/3$, this probability becomes $1/2$ if the first name of the girl is disclosed (provided that two sisters may not be given the same first name). We revisit this variant of the problem and show that, if one adopts a natural model for the way first names are given to girls, then the probability that the other child is a boy may take any value in $(0,2/3)$. By exploiting the concept of Schur-concavity, we study how this probability depends on model parameters.
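
The classical value $2/3$ quoted above can be checked by direct enumeration. The following is only a minimal sketch, assuming that the four ordered sex combinations of two children are equally likely; the function name is hypothetical, and the name-based model studied in the paper is not reproduced here.

```python
from fractions import Fraction
from itertools import product


def prob_other_is_boy():
    """P(other child is a boy | at least one girl), by exact enumeration."""
    # Ordered (older, younger) families, assumed equally likely.
    families = list(product("BG", repeat=2))
    with_girl = [f for f in families if "G" in f]
    girl_and_boy = [f for f in with_girl if "B" in f]
    return Fraction(len(girl_and_boy), len(with_girl))


print(prob_other_is_boy())  # 2/3
```

Three families contain at least one girl (BG, GB, GG), and two of them also contain a boy, which gives the ratio $2/3$.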

preprint, 2020, arXiv

On the behavior of extreme $d$-dimensional spatial quantiles under minimal assumptions

"Spatial" or "geometric" quantiles are the only multivariate quantiles coping with both high-dimensional data and functional data, also in the framework of multiple-output quantile regression. This work studies spatial quantiles in the finite-dimensional case, where the spatial quantile $\mu_{\alpha,u}(P)$ of the distribution $P$ taking values in $\mathbb{R}^d$ is a point in $\mathbb{R}^d$ indexed by an order $\alpha\in[0,1)$ and a direction $u$ in the unit sphere $\mathcal{S}^{d-1}$ of $\mathbb{R}^d$, or equivalently by a vector $\alpha u$ in the open unit ball of $\mathbb{R}^d$. Recently, Girard and Stupfler (2017) proved that (i) the extreme quantiles $\mu_{\alpha,u}(P)$ obtained as $\alpha\to 1$ exit all compact sets of $\mathbb{R}^d$ and that (ii) they do so in a direction converging to $u$. These results help to understand the nature of these quantiles: the first result is particularly striking as it holds even if $P$ has a bounded support, whereas the second one clarifies the delicate dependence of spatial quantiles on $u$. However, they were established under assumptions imposing that $P$ is non-atomic, so that it is unclear whether they hold for empirical probability measures. We improve on this by proving these results under much milder conditions, allowing for the sample case. This prevents using gradient condition arguments, which makes the proofs very challenging. We also weaken the well-known sufficient condition for uniqueness of finite-dimensional spatial quantiles.
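
To make the indexing by $\alpha$ and $u$ concrete, here is a minimal sketch of an empirical objective whose minimizer is a sample spatial quantile. It assumes the usual convention that the quantile minimizes the average of $\|x-q\| + \langle \alpha u, x-q\rangle$ over the sample; that convention, the function name and the toy data are assumptions made for illustration and are not taken from the abstract. In dimension one with $u=+1$ the minimizer should be an ordinary quantile of order $(1+\alpha)/2$, which the toy example checks.

```python
import math


def spatial_objective(q, data, alpha, u):
    """Average of ||x - q|| + <alpha * u, x - q> over the sample (assumed convention)."""
    shift = [alpha * c for c in u]
    total = 0.0
    for x in data:
        diff = [xi - qi for xi, qi in zip(x, q)]
        total += math.hypot(*diff) + sum(s * d for s, d in zip(shift, diff))
    return total / len(data)


# Toy check in dimension d = 1: the objective is convex and piecewise linear,
# so a minimizer can be searched for among the data points themselves.
data = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
alpha, u = 0.5, (1.0,)
best = min(data, key=lambda q: spatial_objective(q, data, alpha, u))
print(best)
```

With $\alpha = 0.5$ and $u = +1$ the search selects the point 4, which is the sample quantile of order $0.75$ for these five observations. In higher dimensions a numerical optimizer would be needed instead of a search over data points.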

preprint, 2015, arXiv

Nonparametrically consistent depth-based classifiers

We introduce a class of depth-based classification procedures that are of a nearest-neighbor nature. Depth, after symmetrization, indeed provides the center-outward ordering that is necessary and sufficient to define nearest neighbors. Like all their depth-based competitors, the resulting classifiers are affine-invariant, hence in particular are insensitive to unit changes. Unlike the former, however, the latter achieve Bayes consistency under virtually any absolutely continuous distribution - a concept we call nonparametric consistency, to stress the difference with the stronger universal consistency of the standard $k$NN classifiers. We investigate the finite-sample performances of the proposed classifiers through simulations and show that they outperform affine-invariant nearest-neighbor classifiers obtained through an obvious standardization construction. We illustrate the practical value of our classifiers on two real data examples. Finally, we briefly discuss the possible uses of our depth-based neighbors in other inference problems.

preprint, 2014, arXiv

Conditional quantile estimation through optimal quantization

In this paper, we use quantization to construct a nonparametric estimator of conditional quantiles of a scalar response $Y$ given a $d$-dimensional vector of covariates $X$. First we focus on the population level and show how optimal quantization of $X$, which consists in discretizing $X$ by projecting it on an appropriate grid of $N$ points, makes it possible to approximate conditional quantiles of $Y$ given $X$. We show that this approximation is arbitrarily good as $N$ goes to infinity and provide a rate of convergence for the approximation error. Then we turn to the sample case and define an estimator of conditional quantiles based on quantization ideas. We prove that this estimator is consistent for its fixed-$N$ population counterpart. The results are illustrated on a numerical example. Dominance of our estimators over local constant/linear ones and nearest neighbor ones is demonstrated through extensive simulations in the companion paper Charlier et al. (2014b).

preprint, 2014, arXiv

Depth-based Runs Tests for Bivariate Central Symmetry

McWilliams (1990) introduced a nonparametric procedure based on runs for the problem of testing univariate symmetry about the origin (equivalently, about an arbitrary specified center). His procedure first reorders the observations according to their absolute values, then rejects the null when the number of runs in the resulting series of signs is too small. This test is universally consistent and enjoys nice robustness properties, but is unfortunately limited to the univariate setup. In this paper, we extend McWilliams' procedure into tests of bivariate central symmetry. The proposed tests first reorder the observations according to their statistical depth in a symmetrized version of the sample, then reject the null when an original concept of simplicial runs is too small. Our tests are affine-invariant and have good robustness properties. In particular, they do not require any finite moment assumption. We derive their limiting null distribution, which establishes their asymptotic distribution-freeness. We study their finite-sample properties through Monte Carlo experiments, and conclude with some final comments.
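
The univariate procedure described above (reorder by absolute value, then count runs in the sign sequence) can be sketched as follows. This is a minimal illustration and not the bivariate, depth-based test of the paper. It assumes a continuous distribution, so no zeros and no ties in absolute value; under symmetry about the origin the number of sign changes is then Binomial$(n-1, 1/2)$, which gives an exact lower-tail p-value. The function names are hypothetical.

```python
from math import comb


def sign_runs_after_abs_ordering(xs):
    """Number of runs of equal signs once xs is sorted by absolute value."""
    if not xs:
        raise ValueError("need at least one observation")
    ordered = sorted(xs, key=abs)
    signs = [x > 0 for x in ordered]  # assumes no exact zeros in the sample
    changes = sum(a != b for a, b in zip(signs, signs[1:]))
    return changes + 1


def lower_tail_p_value(xs):
    """P(Binomial(n-1, 1/2) <= observed sign changes): small values reject symmetry."""
    n = len(xs)
    changes = sign_runs_after_abs_ordering(xs) - 1
    return sum(comb(n - 1, k) for k in range(changes + 1)) / 2 ** (n - 1)


sample = [0.5, -1.2, 2.0, -0.1, 3.3]
print(sign_runs_after_abs_ordering(sample), lower_tail_p_value(sample))
```

For this toy sample the ordered signs are -, +, -, +, +, giving 4 runs and a p-value of 15/16, so symmetry is not rejected. An asymmetric sample tends to produce long runs of one sign among the largest absolute values, hence few runs.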

preprint, 2014, arXiv

High-dimensional tests for spherical location and spiked covariance

Rotationally symmetric distributions on the $p$-dimensional unit hypersphere, extremely popular in directional statistics, involve a location parameter $\theta$ that indicates the direction of the symmetry axis. The most classical way of addressing the spherical location problem $H_0:\theta=\theta_0$, with $\theta_0$ a fixed location, is the so-called Watson test, which is based on the sample mean of the observations. This test enjoys many desirable properties, but its implementation requires the sample size $n$ to be large compared to the dimension $p$. This is a severe limitation, since more and more problems nowadays involve high-dimensional directional data (e.g., in genetics or text mining). In this work, we therefore introduce a modified Watson statistic that can cope with high-dimensionality. We derive its asymptotic null distribution as both $n$ and $p$ go to infinity. This is achieved in a universal asymptotic framework that allows $p$ to go to infinity arbitrarily fast (or slowly) as a function of $n$. We further show that our results also provide high-dimensional tests for a problem that has recently attracted much attention, namely that of testing that the covariance matrix of a multinormal distribution has a "$\theta_0$-spiked" structure. Finally, a Monte Carlo simulation study corroborates our asymptotic results.

preprint, 2014, arXiv

Probit transformation for nonparametric kernel estimation of the copula density

Copula modelling has become ubiquitous in modern statistics. Here, the problem of nonparametrically estimating a copula density is addressed. Arguably the most popular nonparametric density estimator, the kernel estimator is not suitable for the unit-square-supported copula densities, mainly because it is heavily affected by boundary bias issues. In addition, most common copulas admit unbounded densities, and kernel methods are not consistent in that case. In this paper, a kernel-type copula density estimator is proposed. It is based on the idea of transforming the uniform marginals of the copula density into normal distributions via the probit function, estimating the density in the transformed domain, which can be accomplished without boundary problems, and obtaining an estimate of the copula density through back-transformation. Although natural, a raw application of this procedure was, however, seen not to perform very well in the earlier literature. Here, it is shown that, if combined with local likelihood density estimation methods, the idea yields very good and easy to implement estimators, fixing boundary issues in a natural way and able to cope with unbounded copula densities. The asymptotic properties of the suggested estimators are derived, and a practical way of selecting the crucially important smoothing parameters is devised. Finally, extensive simulation studies and a real data analysis evidence their excellent performance compared to their main competitors.
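
The transformation idea described above can be sketched in a few lines. The code below implements only the raw version, which the abstract says does not perform well on its own: a fixed-bandwidth Gaussian product kernel estimate in the probit domain, followed by back-transformation, $\hat c(u,v) = \hat f(\Phi^{-1}(u), \Phi^{-1}(v)) / \{\varphi(\Phi^{-1}(u))\,\varphi(\Phi^{-1}(v))\}$. The local likelihood refinement proposed in the paper is not reproduced. The function name, the bandwidth `h` and the toy sample are assumptions made for illustration.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf is phi, inv_cdf is the probit function


def naive_probit_copula_density(u, v, sample, h=0.4):
    """Raw probit-transformation kernel estimate of a copula density at (u, v).

    `sample` holds pairs in the open unit square (e.g. pseudo-observations);
    `h` is a fixed bandwidth chosen arbitrarily for this sketch.
    """
    s, t = _STD.inv_cdf(u), _STD.inv_cdf(v)
    total = 0.0
    for ui, vi in sample:
        si, ti = _STD.inv_cdf(ui), _STD.inv_cdf(vi)
        total += _STD.pdf((s - si) / h) * _STD.pdf((t - ti) / h)
    f_hat = total / (len(sample) * h * h)  # density estimate in the probit domain
    return f_hat / (_STD.pdf(s) * _STD.pdf(t))  # back-transformation


# Toy sample: a regular grid in the open unit square (illustration only).
grid_sample = [(i / 10, j / 10) for i in range(1, 10) for j in range(1, 10)]
print(naive_probit_copula_density(0.3, 0.7, grid_sample))
```

Because the kernel smoothing happens on the whole plane rather than on the unit square, no boundary correction is needed in the transformed domain; the division by the normal densities carries the estimate back to the copula scale.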

preprint, 2012, arXiv

Optimal rank-based testing for principal components

This paper provides parametric and rank-based optimal tests for eigenvectors and eigenvalues of covariance or scatter matrices in elliptical families. The parametric tests extend the Gaussian likelihood ratio tests of Anderson (1963) and their pseudo-Gaussian robustifications by Davis (1977) and Tyler (1981, 1983). The rank-based tests address a much broader class of problems, where covariance matrices need not exist and principal components are associated with more general scatter matrices. The proposed tests are shown to outperform daily practice in terms of both validity and efficiency. This is achieved by utilizing the Le Cam theory of locally asymptotically normal experiments, in the nonstandard context, however, of a curved parametrization. The results we derive for curved experiments are of independent interest, and likely to apply in other contexts.

preprint, 2012, arXiv

Semiparametrically efficient inference based on signed ranks in symmetric independent component models

We consider semiparametric location-scatter models for which the $p$-variate observation is obtained as $X=\Lambda Z+\mu$, where $\mu$ is a $p$-vector, $\Lambda$ is a full-rank $p\times p$ matrix and the (unobserved) random $p$-vector $Z$ has marginals that are centered and mutually independent but are otherwise unspecified. As in blind source separation and independent component analysis (ICA), the parameter of interest throughout the paper is $\Lambda$. On the basis of $n$ i.i.d. copies of $X$, we develop, under a symmetry assumption on $Z$, signed-rank one-sample testing and estimation procedures for $\Lambda$. We exploit the uniform local and asymptotic normality (ULAN) of the model to define signed-rank procedures that are semiparametrically efficient under correctly specified densities. Yet, as is usual in rank-based inference, the proposed procedures remain valid (correct asymptotic size under the null, for hypothesis testing, and root-$n$ consistency, for point estimation) under a very broad range of densities. We derive the asymptotic properties of the proposed procedures and investigate their finite-sample behavior through simulations.

preprint, 2011, arXiv

A class of optimal tests for symmetry based on local Edgeworth approximations

The objective of this paper is to provide, for the problem of univariate symmetry (with respect to specified or unspecified location), a concept of optimality, and to construct tests achieving such optimality. This requires embedding symmetry into adequate families of asymmetric (local) alternatives. We construct such families by considering non-Gaussian generalizations of classical first-order Edgeworth expansions indexed by a measure of skewness such that (i) location, scale and skewness play well-separated roles (diagonality of the corresponding information matrices) and (ii) the classical tests based on the Pearson--Fisher coefficient of skewness are optimal in the vicinity of Gaussian densities.

preprint, 2010, arXiv

A Stochastic Analysis of some Two-Person Sports

We consider two-person sports where each rally is initiated by a server, the other player (the receiver) becoming the server when he/she wins a rally. Historically, these sports used a scoring based on the side-out scoring system, in which points are only scored by the server. Recently, however, some federations have switched to the rally-point scoring system, in which a point is scored on every rally. Like various authors before us, we study how much this change affects the game. Our approach is based on a rally-level analysis of the process through which, besides the well-known probability distribution of the scores, we also obtain the distribution of the number of rallies. This yields a comprehensive knowledge of the process at hand, and allows for an in-depth comparison of both scoring systems. In particular, our results help to explain why the transition from one scoring system to the other has more important implications than those predicted from game-winning probabilities alone. Some of our findings are quite surprising, and unattainable through Monte Carlo experiments. Our results are of high practical relevance to international federations and local tournament organizers alike, and also open the way to efficient estimation of the rally-winning probabilities, which should have a significant impact on the quality of ranking procedures.
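
The two scoring rules described above differ only in who may score on a rally, which a short simulation makes explicit. This is a minimal sketch and not the exact rally-level analysis of the paper: it assumes constant rally-winning probabilities for each server, a fixed target score and no win-by-two rule, and all names and parameters are hypothetical.

```python
import random


def play_game(p_a, p_b, target, system, first_server="A", rng=random):
    """Simulate one game and return (winner, number_of_rallies).

    p_a: probability that A wins a rally when A serves.
    p_b: probability that B wins a rally when B serves.
    system: "side-out" (only the server can score) or "rally-point".
    """
    if system not in ("side-out", "rally-point"):
        raise ValueError("unknown scoring system")
    score = {"A": 0, "B": 0}
    server = first_server
    rallies = 0
    while max(score.values()) < target:
        rallies += 1
        receiver = "B" if server == "A" else "A"
        p_server = p_a if server == "A" else p_b
        rally_winner = server if rng.random() < p_server else receiver
        # Side-out: a point is scored only when the server wins the rally.
        if system == "rally-point" or rally_winner == server:
            score[rally_winner] += 1
        server = rally_winner  # the rally winner serves next under both systems
    return max(score, key=score.get), rallies


rng = random.Random(0)
for system in ("side-out", "rally-point"):
    lengths = [play_game(0.55, 0.5, 15, system, rng=rng)[1] for _ in range(2000)]
    print(system, sum(lengths) / len(lengths))
```

Under side-out scoring a rally won by the receiver only transfers the serve, so games to the same target take more rallies on average, which is why the number of rallies matters when comparing the two systems.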

preprint, 2010, arXiv

Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth

A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the classical halfspace depth contours associated with the name of Tukey. This relation does not only allow for efficient depth contour computations by means of parametric linear programming, but also for transferring from the quantile to the depth universe such asymptotic results as Bahadur representations. Finally, linear programming duality opens the way to promising developments in depth-related multivariate rank-based inference.

preprint, 2007, arXiv

Semiparametrically efficient rank-based inference for shape II. Optimal R-estimation of shape

A class of R-estimators based on the concepts of multivariate signed ranks and the optimal rank-based tests developed in Hallin and Paindaveine [Ann. Statist. 34 (2006)] is proposed for the estimation of the shape matrix of an elliptical distribution. These R-estimators are root-n consistent under any radial density g, without any moment assumptions, and semiparametrically efficient at some prespecified density f. When based on normal scores, they are uniformly more efficient than the traditional normal-theory estimator based on empirical covariance matrices (the asymptotic normality of which, moreover, requires finite moments of order four), irrespective of the actual underlying elliptical density. They rely on an original rank-based version of Le Cam's one-step methodology which avoids the unpleasant nonparametric estimation of cross-information quantities that is generally required in the context of R-estimation. Although they are not strictly affine-equivariant, they are shown to be equivariant in a weak asymptotic sense. Simulations confirm their feasibility and excellent finite-sample performances.