Source author record

Davy Paindaveine

Davy Paindaveine appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST Statistics Theory math.PR Methodology math.HO stat.OT

Catalog footprint

What is connected

18 works

6 topics

4 close collaborators

preprint · 2023 · arXiv

Revisiting the name variant of the two-children problem

Initially proposed by Martin Gardner in the 1950s, the famous two-children problem is often presented as a paradox in probability theory. A relatively recent variant of this paradox states that, while in a two-children family for which at least one child is a girl, the probability that the other child is a boy is $2/3$, this probability becomes $1/2$ if the first name of the girl is disclosed (provided that two sisters may not be given the same first name). We revisit this variant of the problem and show that, if one adopts a natural model for the way first names are given to girls, then the probability that the other child is a boy may take any value in $(0,2/3)$. By exploiting the concept of Schur-concavity, we study how this probability depends on model parameters.
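
The values quoted above can be reproduced by exact enumeration. The sketch below is only an illustration under an assumed naming model, which is not necessarily the one used in the paper: the elder girl draws her name from `name_probs`, and a younger sister draws from the same distribution renormalized to exclude her sister's name. The function names and the example name distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product


def classical():
    """P(other child is a boy | at least one child is a girl)."""
    families = list(product("BG", repeat=2))          # BB, BG, GB, GG, equally likely
    with_girl = [f for f in families if "G" in f]
    mixed = [f for f in with_girl if "B" in f]
    return Fraction(len(mixed), len(with_girl))


def prob_other_is_boy(name_probs, target):
    """P(other child is a boy | at least one girl is named `target`).

    Assumed model: sisters never share a name; the younger sister's name is
    drawn from `name_probs` renormalized without the elder sister's name.
    """
    p_t = name_probs[target]
    quarter = Fraction(1, 4)
    # One girl and one boy (BG or GB): the girl is named `target` w.p. p_t.
    mixed = 2 * quarter * p_t
    # Two girls: either the elder is named `target`, or the elder gets another
    # name j and the younger then gets `target` w.p. p_t / (1 - p_j).
    younger = sum(p_j * p_t / (1 - p_j)
                  for name, p_j in name_probs.items() if name != target)
    two_girls = quarter * (p_t + younger)
    return mixed / (mixed + two_girls)


uniform = {name: Fraction(1, 4) for name in "ABCD"}
skewed = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}

print(classical())                      # 2/3
print(prob_other_is_boy(uniform, "A"))  # 1/2
print(prob_other_is_boy(skewed, "A"))   # 6/11
print(prob_other_is_boy(skewed, "B"))   # 6/13
```

Under this assumed model, equally likely names give back $1/2$, while unequal name frequencies move the probability away from it, which is consistent with the range $(0,2/3)$ stated in the abstract.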

preprint · 2020 · arXiv

On the behavior of extreme $d$-dimensional spatial quantiles under minimal assumptions

"Spatial" or "geometric" quantiles are the only multivariate quantiles coping with both high-dimensional data and functional data, also in the framework of multiple-output quantile regression. This work studies spatial quantiles in the finite-dimensional case, where the spatial quantile $μ_{α,u}(P)$ of the distribution $P$ taking values in $\mathbb{R}^d$ is a point in $\mathbb{R}^d$ indexed by an order $α\in[0,1)$ and a direction $u$ in the unit sphere $\mathcal{S}^{d-1}$ of $\mathbb{R}^d$, or equivalently by a vector $αu$ in the open unit ball of $\mathbb{R}^d$. Recently, Girard and Stupfler (2017) proved that (i) the extreme quantiles $μ_{α,u}(P)$ obtained as $α\to 1$ exit all compact sets of $\mathbb{R}^d$ and that (ii) they do so in a direction converging to $u$. These results help in understanding the nature of these quantiles: the first result is particularly striking as it holds even if $P$ has a bounded support, whereas the second one clarifies the delicate dependence of spatial quantiles on $u$. However, they were established under assumptions imposing that $P$ is non-atomic, so that it is unclear whether they hold for empirical probability measures. We improve on this by proving these results under much milder conditions, allowing for the sample case. This prevents using gradient condition arguments, which makes the proofs very challenging. We also weaken the well-known sufficient condition for uniqueness of finite-dimensional spatial quantiles.
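
The abstract does not restate the definition of $μ_{α,u}(P)$. For orientation, the display below gives the definition commonly used in the spatial-quantile literature; it is assumed here rather than quoted from the paper, and $X$ denotes a random vector with distribution $P$.

```latex
\mu_{\alpha,u}(P) \in \arg\min_{\mu \in \mathbb{R}^d}
  \Big( \mathrm{E}_P\big[ \|X-\mu\| - \|X\| \big] - \alpha \langle u, \mu \rangle \Big),
\qquad
\text{with gradient condition }
\mathrm{E}_P\!\left[ \frac{\mu - X}{\|\mu - X\|} \right] = \alpha u .
```

The gradient condition is only meaningful when $P$ puts no mass at $μ$, which fits the abstract's remark that gradient arguments are unavailable for empirical measures. For $d=1$ and $u=\pm 1$, the condition reads $2F(μ)-1=αu$, so the spatial quantile reduces to the usual quantile of order $(1+αu)/2$.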

preprint · 2016 · arXiv

Inference on the mode of weak directional signals: a Le Cam perspective on hypothesis testing near singularities

We revisit, in an original and challenging perspective, the problem of testing the null hypothesis that the mode of a directional signal is equal to a given value. Motivated by a real data example where the signal is weak, we consider this problem under asymptotic scenarios for which the signal strength goes to zero at an arbitrary rate $η_n$. Both under the null and the alternative, we focus on rotationally symmetric distributions. We show that, while they are asymptotically equivalent under fixed signal strength, the classical Wald and Watson tests exhibit very different (null and non-null) behaviours when the signal becomes arbitrarily weak. To fully characterize how challenging the problem is as a function of $η_n$, we adopt a Le Cam, convergence-of-statistical-experiments, point of view and show that the resulting limiting experiments crucially depend on $η_n$. In the light of these results, the Watson test is shown to be adaptively rate-consistent and essentially adaptively Le Cam optimal. Throughout, our theoretical findings are illustrated via Monte-Carlo simulations. The practical relevance of our results is also shown on the real data example that motivated the present work.

preprint · 2016 · arXiv

On high-dimensional sign tests

Sign tests are among the most successful procedures in multivariate nonparametric statistics. In this paper, we consider several testing problems in multivariate analysis, directional statistics and multivariate time series analysis, and we show that, under appropriate symmetry assumptions, the fixed-$p$ multivariate sign tests remain valid in the high-dimensional case. Remarkably, our asymptotic results are universal, in the sense that, unlike in most previous works in high-dimensional statistics, $p$ may go to infinity in an arbitrary way as $n$ does. We conduct simulations that (i) confirm our asymptotic results, (ii) reveal that, even for relatively large $p$, chi-square critical values are to be favoured over the (asymptotically equivalent) Gaussian ones and (iii) show that, for testing i.i.d.-ness against serial dependence in the high-dimensional case, Portmanteau sign tests outperform their competitors in terms of validity-robustness.

preprint · 2016 · arXiv

Testing uniformity on high-dimensional spheres against monotone rotationally symmetric alternatives

We consider the problem of testing uniformity on high-dimensional unit spheres. We are primarily interested in non-null issues. We show that rotationally symmetric alternatives lead to two Local Asymptotic Normality (LAN) structures. The first one is for fixed modal location $θ$ and allows one to derive locally asymptotically most powerful tests under specified $θ$. The second one, which addresses the Fisher-von Mises-Langevin (FvML) case, relates to the unspecified-$θ$ problem and shows that the high-dimensional Rayleigh test is locally asymptotically most powerful invariant. Under mild assumptions, we derive the asymptotic non-null distribution of this test, which allows us to extend away from the FvML case the asymptotic powers obtained there from Le Cam's third lemma. Throughout, we allow the dimension $p$ to go to infinity in an arbitrary way as a function of the sample size $n$. Some of our results also strengthen the local optimality properties of the Rayleigh test in low dimensions. We perform a Monte Carlo study to illustrate our asymptotic results. Finally, we treat an application related to testing for sphericity in high dimensions.
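
As a concrete reference point, the classical Rayleigh statistic for unit vectors $X_1,\ldots,X_n$ in $\mathbb{R}^p$ is $np\|\bar{X}\|^2$, compared with a chi-square distribution with $p$ degrees of freedom under uniformity. The sketch below computes it together with the centred and scaled version $(np\|\bar{X}\|^2-p)/\sqrt{2p}$, which equals $(\sqrt{2p}/n)\sum_{i<j}X_i'X_j$ for unit vectors. That the paper's high-dimensional Rayleigh test is based on exactly this standardized form is an assumption made here, since the abstract does not spell the statistic out; the function names are hypothetical.

```python
from math import sqrt


def rayleigh_statistic(sample):
    """Classical Rayleigh statistic n * p * ||mean||^2 for unit vectors."""
    n, p = len(sample), len(sample[0])
    mean = [sum(x[k] for x in sample) / n for k in range(p)]
    return n * p * sum(m * m for m in mean)


def standardized_rayleigh(sample):
    """Rayleigh statistic centred at p and scaled by sqrt(2p)."""
    p = len(sample[0])
    return (rayleigh_statistic(sample) - p) / sqrt(2 * p)


def pairwise_form(sample):
    """Same quantity written as (sqrt(2p) / n) * sum_{i<j} <X_i, X_j>."""
    n, p = len(sample), len(sample[0])
    inner_sum = sum(
        sum(a * b for a, b in zip(sample[i], sample[j]))
        for i in range(n) for j in range(i + 1, n)
    )
    return sqrt(2 * p) / n * inner_sum


orthogonal = [(1.0, 0.0), (0.0, 1.0)]   # two orthogonal unit vectors
aligned = [(1.0, 0.0), (1.0, 0.0)]      # two identical unit vectors

print(standardized_rayleigh(orthogonal), pairwise_form(orthogonal))  # 0.0 0.0
print(standardized_rayleigh(aligned), pairwise_form(aligned))        # 1.0 1.0
```

The pairwise form makes it visible that the statistic only depends on inner products between observations, which is why it stays computable when $p$ is much larger than $n$.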

preprint · 2015 · arXiv

Local bilinear multiple-output quantile/depth regression

A new quantile regression concept, based on a directional version of Koenker and Bassett's traditional single-output one, has been introduced in [Ann. Statist. (2010) 38 635-669] for multiple-output location/linear regression problems. The polyhedral contours provided by the empirical counterpart of that concept, however, cannot adapt to unknown nonlinear and/or heteroskedastic dependencies. This paper therefore introduces local constant and local linear (actually, bilinear) versions of those contours, both of which allow one to asymptotically recover the conditional halfspace depth contours that completely characterize the response's conditional distributions. Bahadur representation and asymptotic normality results are established. Illustrations are provided both on simulated and real data.

preprint · 2015 · arXiv

Nonparametrically consistent depth-based classifiers

We introduce a class of depth-based classification procedures that are of a nearest-neighbor nature. Depth, after symmetrization, indeed provides the center-outward ordering that is necessary and sufficient to define nearest neighbors. Like all their depth-based competitors, the resulting classifiers are affine-invariant, hence in particular are insensitive to unit changes. Unlike the former, however, the latter achieve Bayes consistency under virtually any absolutely continuous distribution, a concept we call nonparametric consistency, to stress the difference with the stronger universal consistency of the standard $k$NN classifiers. We investigate the finite-sample performances of the proposed classifiers through simulations and show that they outperform affine-invariant nearest-neighbor classifiers obtained through an obvious standardization construction. We illustrate the practical value of our classifiers on two real data examples. Finally, we briefly discuss the possible uses of our depth-based neighbors in other inference problems.

preprint · 2014 · arXiv

Conditional quantile estimation through optimal quantization

In this paper, we use quantization to construct a nonparametric estimator of conditional quantiles of a scalar response $Y$ given a $d$-dimensional vector of covariates $X$. First we focus on the population level and show how optimal quantization of $X$, which consists in discretizing $X$ by projecting it on an appropriate grid of $N$ points, allows one to approximate conditional quantiles of $Y$ given $X$. We show that this approximation is arbitrarily good as $N$ goes to infinity and provide a rate of convergence for the approximation error. Then we turn to the sample case and define an estimator of conditional quantiles based on quantization ideas. We prove that this estimator is consistent for its fixed-$N$ population counterpart. The results are illustrated on a numerical example. Dominance of our estimators over local constant/linear ones and nearest neighbor ones is demonstrated through extensive simulations in the companion paper Charlier et al. (2014b).

preprint · 2014 · arXiv

Depth-based Runs Tests for Bivariate Central Symmetry

McWilliams (1990) introduced a nonparametric procedure based on runs for the problem of testing univariate symmetry about the origin (equivalently, about an arbitrary specified center). His procedure first reorders the observations according to their absolute values, then rejects the null when the number of runs in the resulting series of signs is too small. This test is universally consistent and enjoys nice robustness properties, but is unfortunately limited to the univariate setup. In this paper, we extend McWilliams' procedure into tests of bivariate central symmetry. The proposed tests first reorder the observations according to their statistical depth in a symmetrized version of the sample, then reject the null when an original concept of simplicial runs is too small. Our tests are affine-invariant and have good robustness properties. In particular, they do not require any finite moment assumption. We derive their limiting null distribution, which establishes their asymptotic distribution-freeness. We study their finite-sample properties through Monte Carlo experiments, and conclude with some final comments.
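
The univariate procedure described at the start of this abstract is simple enough to sketch directly. The code below orders the observations by absolute value, counts the runs in the resulting sign sequence, and computes a lower-tail p-value. It assumes a continuous distribution (no zeros, no ties in absolute value), in which case the number of sign changes is Binomial$(n-1, 1/2)$ under symmetry about the origin. The function names are hypothetical, and the bivariate simplicial-runs extension proposed in the paper is not reproduced here.

```python
from math import comb


def runs_statistic(xs):
    """Number of runs in the signs of xs after sorting by absolute value."""
    signs = [x > 0 for x in sorted(xs, key=abs)]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


def lower_tail_p_value(xs):
    """P(runs <= observed) under symmetry about 0 (small values reject)."""
    n, r = len(xs), runs_statistic(xs)
    # runs <= r  <=>  sign changes <= r - 1, with changes ~ Binomial(n - 1, 1/2)
    return sum(comb(n - 1, k) for k in range(r)) / 2 ** (n - 1)


shifted = [-0.1, -0.2, 3.0, 4.0, 5.0]   # large values are all positive
alternating = [0.5, -1.0, 1.5, -2.0]    # signs alternate along |x|

print(runs_statistic(shifted), lower_tail_p_value(shifted))          # 2 0.3125
print(runs_statistic(alternating), lower_tail_p_value(alternating))  # 4 1.0
```

Few runs mean that the sign tends to stay the same across neighbouring magnitudes, which is the kind of asymmetry the test is designed to detect.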

preprint · 2014 · arXiv

High-dimensional tests for spherical location and spiked covariance

Rotationally symmetric distributions on the $p$-dimensional unit hypersphere, extremely popular in directional statistics, involve a location parameter $θ$ that indicates the direction of the symmetry axis. The most classical way of addressing the spherical location problem $H_0: θ=θ_0$, with $θ_0$ a fixed location, is the so-called Watson test, which is based on the sample mean of the observations. This test enjoys many desirable properties, but its implementation requires the sample size $n$ to be large compared to the dimension $p$. This is a severe limitation, since more and more problems nowadays involve high-dimensional directional data (e.g., in genetics or text mining). In this work, we therefore introduce a modified Watson statistic that can cope with high-dimensionality. We derive its asymptotic null distribution as both $n$ and $p$ go to infinity. This is achieved in a universal asymptotic framework that allows $p$ to go to infinity arbitrarily fast (or slowly) as a function of $n$. We further show that our results also provide high-dimensional tests for a problem that has recently attracted much attention, namely that of testing that the covariance matrix of a multinormal distribution has a "$θ_0$-spiked" structure. Finally, a Monte Carlo simulation study corroborates our asymptotic results.

preprint · 2014 · arXiv

Probit transformation for nonparametric kernel estimation of the copula density

Copula modelling has become ubiquitous in modern statistics. Here, the problem of nonparametrically estimating a copula density is addressed. Arguably the most popular nonparametric density estimator, the kernel estimator is not suitable for the unit-square-supported copula densities, mainly because it is heavily affected by boundary bias issues. In addition, most common copulas admit unbounded densities, and kernel methods are not consistent in that case. In this paper, a kernel-type copula density estimator is proposed. It is based on the idea of transforming the uniform marginals of the copula density into normal distributions via the probit function, estimating the density in the transformed domain, which can be accomplished without boundary problems, and obtaining an estimate of the copula density through back-transformation. Although natural, a raw application of this procedure was, however, seen not to perform very well in the earlier literature. Here, it is shown that, if combined with local likelihood density estimation methods, the idea yields very good and easy to implement estimators, fixing boundary issues in a natural way and able to cope with unbounded copula densities. The asymptotic properties of the suggested estimators are derived, and a practical way of selecting the crucially important smoothing parameters is devised. Finally, extensive simulation studies and a real data analysis evidence their excellent performance compared to their main competitors.
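
The transformation idea can be sketched in a few lines. The code below implements only the raw version that the abstract says performs poorly: a Gaussian product-kernel density estimate in the probit domain, divided by the product of standard normal densities to transform back. The local likelihood refinement proposed in the paper is not reproduced. The function name, the single bandwidth `h`, and the toy pseudo-observations are assumptions made for illustration.

```python
from math import exp, pi
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: probit is its inverse CDF


def probit_copula_density(u, v, pseudo_obs, h):
    """Raw probit-transformation kernel estimate of a copula density at (u, v).

    pseudo_obs: pairs (u_i, v_i) strictly inside the unit square.
    h: bandwidth of the Gaussian product kernel in the transformed domain.
    """
    s, t = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    total = 0.0
    for u_i, v_i in pseudo_obs:
        s_i, t_i = STD_NORMAL.inv_cdf(u_i), STD_NORMAL.inv_cdf(v_i)
        total += exp(-((s - s_i) ** 2 + (t - t_i) ** 2) / (2 * h * h))
    # Kernel density estimate of the transformed pairs at (s, t).
    f_hat = total / (len(pseudo_obs) * 2 * pi * h * h)
    # Back-transformation: divide by the Jacobian phi(s) * phi(t).
    return f_hat / (STD_NORMAL.pdf(s) * STD_NORMAL.pdf(t))


sample = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]  # toy data, symmetric under swapping
print(probit_copula_density(0.3, 0.6, sample, h=0.5))
```

Because the estimation happens on the whole plane, no kernel mass is lost at the edges of the unit square, which is the boundary-bias advantage the abstract refers to.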

preprint · 2012 · arXiv

Optimal rank-based testing for principal components

This paper provides parametric and rank-based optimal tests for eigenvectors and eigenvalues of covariance or scatter matrices in elliptical families. The parametric tests extend the Gaussian likelihood ratio tests of Anderson (1963) and their pseudo-Gaussian robustifications by Davis (1977) and Tyler (1981, 1983). The rank-based tests address a much broader class of problems, where covariance matrices need not exist and principal components are associated with more general scatter matrices. The proposed tests are shown to outperform daily practice both from the point of view of validity and from the point of view of efficiency. This is achieved by utilizing the Le Cam theory of locally asymptotically normal experiments, in the nonstandard context, however, of a curved parametrization. The results we derive for curved experiments are of independent interest, and likely to apply in other contexts.

preprint · 2012 · arXiv

Semiparametrically efficient inference based on signed ranks in symmetric independent component models

We consider semiparametric location-scatter models for which the $p$-variate observation is obtained as $X=ΛZ+μ$, where $μ$ is a $p$-vector, $Λ$ is a full-rank $p\times p$ matrix and the (unobserved) random $p$-vector $Z$ has marginals that are centered and mutually independent but are otherwise unspecified. As in blind source separation and independent component analysis (ICA), the parameter of interest throughout the paper is $Λ$. On the basis of $n$ i.i.d. copies of $X$, we develop, under a symmetry assumption on $Z$, signed-rank one-sample testing and estimation procedures for $Λ$. We exploit the uniform local and asymptotic normality (ULAN) of the model to define signed-rank procedures that are semiparametrically efficient under correctly specified densities. Yet, as is usual in rank-based inference, the proposed procedures remain valid (correct asymptotic size under the null, for hypothesis testing, and root-$n$ consistency, for point estimation) under a very broad range of densities. We derive the asymptotic properties of the proposed procedures and investigate their finite-sample behavior through simulations.

preprint · 2011 · arXiv

A class of optimal tests for symmetry based on local Edgeworth approximations

The objective of this paper is to provide, for the problem of univariate symmetry (with respect to specified or unspecified location), a concept of optimality, and to construct tests achieving such optimality. This requires embedding symmetry into adequate families of asymmetric (local) alternatives. We construct such families by considering non-Gaussian generalizations of classical first-order Edgeworth expansions indexed by a measure of skewness such that (i) location, scale and skewness play well-separated roles (diagonality of the corresponding information matrices) and (ii) the classical tests based on the Pearson–Fisher coefficient of skewness are optimal in the vicinity of Gaussian densities.

preprint · 2010 · arXiv

A Stochastic Analysis of some Two-Person Sports

We consider two-person sports where each rally is initiated by a server, the other player (the receiver) becoming the server when he/she wins a rally. Historically, these sports used a scoring based on the side-out scoring system, in which points are only scored by the server. Recently, however, some federations have switched to the rally-point scoring system in which a point is scored on every rally. Like various authors before us, we study how much this change affects the game. Our approach is based on a rally-level analysis of the process through which, besides the well-known probability distribution of the scores, we also obtain the distribution of the number of rallies. This yields a comprehensive knowledge of the process at hand, and allows for an in-depth comparison of both scoring systems. In particular, our results help to explain why the transition from one scoring system to the other has more important implications than those predicted from game-winning probabilities alone. Some of our findings are quite surprising, and unattainable through Monte Carlo experiments. Our results are of high practical relevance to international federations and local tournament organizers alike, and also open the way to efficient estimation of the rally-winning probabilities, which should have a significant impact on the quality of ranking procedures.
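
The two scoring systems can be compared on game-winning probabilities with a short recursion. The sketch below is an illustration under simplifying assumptions that are not taken from the paper: rallies are independent, player A wins a rally with probability `pa` when serving and player B with probability `pb` when serving, A serves first, and the game ends as soon as one player reaches `target` points (no two-point margin). It does not compute the distribution of the number of rallies studied in the paper, and the function name is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def win_probability(pa, pb, target, system):
    """P(A wins a game to `target` points), A serving first."""
    qa, qb = 1 - pa, 1 - pb
    if system == "rally-point":
        # Every rally yields a point; its winner serves next.
        a_next_on_a_serve, a_next_on_b_serve = pa, qb
    elif system == "side-out":
        # Only the server scores; the serve may change hands several times
        # before the next point (geometric series, needs qa * qb < 1).
        a_next_on_a_serve = pa / (1 - qa * qb)
        a_next_on_b_serve = qb * pa / (1 - qa * qb)
    else:
        raise ValueError(f"unknown scoring system: {system}")

    @lru_cache(maxsize=None)
    def win(a, b, a_serves):
        if a == target:
            return 1
        if b == target:
            return 0
        p = a_next_on_a_serve if a_serves else a_next_on_b_serve
        # In both systems, whoever scores the point serves the next rally.
        return p * win(a + 1, b, True) + (1 - p) * win(a, b + 1, False)

    return win(0, 0, True)


half = Fraction(1, 2)
print(win_probability(half, half, 1, "rally-point"))  # 1/2
print(win_probability(half, half, 1, "side-out"))     # 2/3
print(float(win_probability(half, half, 15, "side-out")))
```

With equally strong players, rally-point scoring gives the first server no edge in this model, whereas side-out scoring does, which is one simple way in which the two systems differ.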

preprint · 2010 · arXiv

Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth

A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the classical halfspace depth contours associated with the name of Tukey. This relation does not only allow for efficient depth contour computations by means of parametric linear programming, but also for transferring from the quantile to the depth universe such asymptotic results as Bahadur representations. Finally, linear programming duality opens the way to promising developments in depth-related multivariate rank-based inference.

preprint · 2010 · arXiv

Rejoinder to "Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth"

Rejoinder to "Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth" by M. Hallin, D. Paindaveine and M. Siman [arXiv:1002.4486]

preprint · 2007 · arXiv

Semiparametrically efficient rank-based inference for shape II. Optimal R-estimation of shape

A class of R-estimators based on the concepts of multivariate signed ranks and the optimal rank-based tests developed in Hallin and Paindaveine [Ann. Statist. 34 (2006)] is proposed for the estimation of the shape matrix of an elliptical distribution. These R-estimators are root-n consistent under any radial density g, without any moment assumptions, and semiparametrically efficient at some prespecified density f. When based on normal scores, they are uniformly more efficient than the traditional normal-theory estimator based on empirical covariance matrices (the asymptotic normality of which, moreover, requires finite moments of order four), irrespective of the actual underlying elliptical density. They rely on an original rank-based version of Le Cam's one-step methodology which avoids the unpleasant nonparametric estimation of cross-information quantities that is generally required in the context of R-estimation. Although they are not strictly affine-equivariant, they are shown to be equivariant in a weak asymptotic sense. Simulations confirm their feasibility and excellent finite-sample performances.