Source author record

Luc Pronzato

Luc Pronzato appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST, Statistics Theory, Computation, Machine Learning, Methodology, math.PR

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators


preprint · 2026 · arXiv

Non-asymptotic quantisation of spherically symmetric distributions

Zador's celebrated theorem is a cornerstone of optimal quantisation, establishing both the weak limit of the empirical distribution of an $n$-point optimal quantiser in $\mathbb{R}^d$ and the decay rate of the associated $L_s$-mean quantisation error. However, for large dimensions $d$, observing this asymptotic behaviour demands an astronomically large sample size $n$, which grows super-exponentially with $d$. Through a detailed analysis of the quantisation problem for spherically symmetric distributions, we demonstrate that for moderate $n$ random quantisers uniformly distributed on a sphere of suitable radius $r$ achieve exceptional performance. The expected distortion, expressed as a triple integral, can be computed with arbitrary precision, and the optimal radius $r$ can be efficiently determined numerically. Leveraging results from extreme-value theory, we derive approximations for $r$, particularly in scenarios where $n$ scales with $d$. Depending on the growth rate of $n$, $r$ may either converge to zero or approach a limiting value that is independent of $s$.
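As a rough illustration of the idea in this abstract, the sketch below estimates by plain Monte Carlo the expected distortion of an $n$-point quantiser drawn uniformly on a sphere of radius $r$, and scans a small grid of radii. It does not use the paper's triple-integral formula. The standard Gaussian target, the dimension, the grid and all function names are assumptions chosen for the example.

```python
import math
import random

def uniform_on_sphere(dim, radius, rng):
    """Draw one point uniformly on the sphere of the given radius (normalised Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [radius * v / norm for v in g]

def expected_distortion(radius, dim, n_points, s=2.0, n_rep=5, n_samples=200, seed=0):
    """Monte Carlo estimate of E[min_i ||X - q_i||^s].

    Assumptions for illustration only: X is standard Gaussian in R^dim and the
    average runs over n_rep independent random quantisers q_1, ..., q_n.
    """
    rng = random.Random(seed)  # same seed for every radius: common random numbers
    total = 0.0
    for _ in range(n_rep):
        quantiser = [uniform_on_sphere(dim, radius, rng) for _ in range(n_points)]
        for _ in range(n_samples):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            total += min(math.dist(x, q) for q in quantiser) ** s
    return total / (n_rep * n_samples)

# Scan a coarse grid of radii; radius 0 collapses the quantiser to the origin.
radii = [0.5 * j for j in range(9)]
distortions = [expected_distortion(r, dim=10, n_points=50) for r in radii]
best_radius = radii[distortions.index(min(distortions))]
```

With radius 0 the estimate is close to $E\|X\|^2 = d$, which gives a simple sanity check on the simulation; a finer grid or a one-dimensional optimiser could replace the coarse scan.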

preprint · 2022 · arXiv

Model predictivity assessment: incremental test-set selection and accuracy evaluation

Unbiased assessment of the predictivity of models learnt by supervised machine-learning methods requires knowledge of the learned function over a reserved test set (not used by the learning algorithm). The quality of the assessment depends, naturally, on the properties of the test set and on the error statistic used to estimate the prediction error. In this work we tackle both issues, proposing a new predictivity criterion that carefully weights the individual observed errors to obtain a global error estimate, and using incremental experimental design methods to "optimally" select the test points on which the criterion is computed. Several incremental constructions are studied, including greedy-packing (coffee-house design), support points and kernel herding techniques. Our results show that the incremental and weighted versions of the latter two, based on Maximum Mean Discrepancy concepts, yield superior performance. An industrial test case provided by the historical French electricity supplier (EDF) illustrates the practical relevance of the methodology, indicating that it is an efficient alternative to expensive cross-validation techniques.
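Of the incremental constructions named in this abstract, greedy packing is the simplest to sketch: each new test point is the candidate farthest from the points already selected. The version below works on a finite candidate set; the candidate set, the starting index and the function name are assumptions for illustration, and the weighted predictivity criterion of the paper is not reproduced.

```python
import math
import random

def coffee_house(candidates, n_points, start_index=0):
    """Greedy packing on a finite candidate set.

    Repeatedly adds the candidate whose distance to its nearest already-chosen
    point is largest. The starting point is an arbitrary choice made here.
    """
    chosen = [start_index]
    # min_dist[i] = distance from candidate i to its nearest chosen point
    min_dist = [math.dist(c, candidates[start_index]) for c in candidates]
    while len(chosen) < n_points:
        nxt = max(range(len(candidates)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, c in enumerate(candidates):
            min_dist[i] = min(min_dist[i], math.dist(c, candidates[nxt]))
    return chosen

# Hypothetical usage: pick 10 test points among 500 random candidates in [0, 1]^2.
rng = random.Random(0)
candidates = [(rng.random(), rng.random()) for _ in range(500)]
test_idx = coffee_house(candidates, 10)
```

Because the selection is incremental, the first few indices of `test_idx` form a usable smaller test set on their own.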

preprint · 2022 · arXiv

Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy

We analyse the performance of several iterative algorithms for the quantisation of a probability measure $μ$, based on the minimisation of a Maximum Mean Discrepancy (MMD). Our analysis includes kernel herding, greedy MMD minimisation and Sequential Bayesian Quadrature (SBQ). We show that the finite-sample-size approximation error, measured by the MMD, decreases as $1/n$ for SBQ and also for kernel herding and greedy MMD minimisation when using a suitable step-size sequence. The upper bound on the approximation error is slightly better for SBQ, but the other methods are significantly faster, with a computational cost that increases only linearly with the number of points selected. This is illustrated by two numerical examples, with the target measure $μ$ being uniform (a space-filling design application) and with $μ$ a Gaussian mixture. They suggest that the bounds derived in the paper are overly pessimistic, in particular for SBQ. The sources of this pessimism are identified but seem difficult to counter.
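A minimal sketch of kernel herding may help fix ideas. It assumes a finite candidate set, takes the target measure $μ$ to be the empirical measure of those candidates, uses a Gaussian kernel with an arbitrary bandwidth, and gives equal weights $1/n$ to the selected points. These choices, and the function names, are assumptions for the example rather than the settings analysed in the paper.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=0.2):
    """Gaussian kernel; the bandwidth is an arbitrary illustrative choice."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * bandwidth ** 2))

def kernel_herding(candidates, n_points, kernel=gaussian_kernel):
    """Kernel herding with mu taken as the empirical measure of the candidates.

    Returns the selected indices (repeats are allowed) and the MMD between mu
    and the uniform measure on the selected points after each step.
    """
    m = len(candidates)
    K = [[kernel(a, b) for b in candidates] for a in candidates]
    potential = [sum(row) / m for row in K]  # P_mu(x) = E_mu k(x, X)
    energy = sum(potential) / m              # E k(X, X') for X, X' ~ mu
    chosen, kernel_sum, mmd_path = [], [0.0] * m, []
    for n in range(n_points):
        # Herding step: maximise P_mu(x) - (1/(n+1)) * sum_i k(x, x_i)
        nxt = max(range(m), key=lambda i: potential[i] - kernel_sum[i] / (n + 1))
        chosen.append(nxt)
        for i in range(m):
            kernel_sum[i] += K[i][nxt]
        size = len(chosen)
        cross = sum(potential[j] for j in chosen) / size
        within = sum(kernel_sum[j] for j in chosen) / size ** 2
        mmd_path.append(math.sqrt(max(within - 2 * cross + energy, 0.0)))
    return chosen, mmd_path

rng = random.Random(1)
candidates = [(rng.random(), rng.random()) for _ in range(200)]
chosen, mmd_path = kernel_herding(candidates, 20)
```

The cost per step is linear in the number of candidates once the kernel matrix is available, and `mmd_path` can be inspected to see how the approximation error evolves with the number of points.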

preprint · 2020 · arXiv

Sequential online subsampling for thinning experimental designs

We consider a design problem where experimental conditions (design points $X_i$) are presented in the form of a sequence of i.i.d. random variables, generated with an unknown probability measure $μ$, and only a given proportion $α\in(0,1)$ can be selected. The objective is to select good candidates $X_i$ on the fly and maximize a concave function $Φ$ of the corresponding information matrix. The optimal solution corresponds to the construction of an optimal bounded design measure $ξ_α^*\leq μ/α$, with the difficulty that $μ$ is unknown and $ξ_α^*$ must be constructed online. The construction proposed relies on the definition of a threshold $τ$ on the directional derivative of $Φ$ at the current information matrix, the value of $τ$ being fixed by a certain quantile of the distribution of this directional derivative. Combination with recursive quantile estimation yields a nonlinear two-time-scale stochastic approximation method. It can be applied to very long design sequences since only the current information matrix and estimated quantile need to be stored. Convergence to an optimum design is proved. Various illustrative examples are presented.

preprint · 2015 · arXiv

An extended Generalised Variance, with Applications

We consider a measure $ψ_k$ of dispersion which extends the notion of Wilks' generalised variance, or entropy, for a $d$-dimensional distribution, and is based on the mean squared volume of simplices of dimension $k \le d$ formed by $k+1$ independent copies. We show how $ψ_k$ can be expressed in terms of the eigenvalues of the covariance matrix of the distribution, also when an $n$-point sample is used for its estimation, and prove its concavity when raised to a suitable power. Some properties of entropy-maximising distributions are derived, including a necessary and sufficient condition for optimality. Finally, we show how this measure of dispersion can be used for the design of optimal experiments, with equivalence to $A$- and $D$-optimal design for $k = 1$ and $k = d$ respectively. Simple illustrative examples are presented.
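The simplest case, $k = 1$, can be worked out by hand and shows how the eigenvalues enter. A one-dimensional simplex formed by two independent copies $x_1, x_2$ is a segment, so its squared volume is the squared length $\|x_1 - x_2\|^2$. The notation $Σ$ for the covariance matrix and $λ_1, \dots, λ_d$ for its eigenvalues is introduced here for the example, and finite second moments are assumed.

```latex
\begin{align*}
ψ_1 &= \mathbb{E}\,\|x_1 - x_2\|^2
     = \operatorname{trace}\bigl(\operatorname{Cov}(x_1 - x_2)\bigr) \\
    &= \operatorname{trace}(2Σ)
     = 2 \sum_{i=1}^{d} λ_i ,
\end{align*}
```

since $x_1 - x_2$ has zero mean and covariance $2Σ$ when $x_1$ and $x_2$ are independent with common covariance $Σ$.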

preprint · 2014 · arXiv

Optimum design accounting for the global nonlinear behavior of the model

Among the major difficulties that one may encounter when estimating parameters in a nonlinear regression model are the nonuniqueness of the estimator, its instability with respect to small perturbations of the observations and the presence of local optimizers of the estimation criterion. We show that these estimability issues can be taken into account at the design stage, through the definition of suitable design criteria. Extensions of $E$-, $c$- and $G$-optimality criteria are considered, which, when evaluated at a given $θ^0$ (local optimal design), account for the behavior of the model response $η(θ)$ for $θ$ far from $θ^0$. In particular, they ensure some protection against close-to-overlapping situations where $\|η(θ)-η(θ^0)\|$ is small for some $θ$ far from $θ^0$. These extended criteria are concave, and necessary and sufficient conditions for optimality (equivalence theorems) can be formulated. They are not differentiable, but when the design space is finite and the set $Θ$ of admissible $θ$ is discretized, optimal design forms a linear programming problem, which can be solved directly, or via relaxation when $Θ$ is just compact. Several examples are presented.

preprint · 2013 · arXiv

A delimitation of the support of optimal designs for Kiefer's $ϕ_p$-class of criteria

The paper extends the result of Harman and Pronzato [Stat. & Prob. Lett., 77:90--94, 2007], which corresponds to $p=0$, to all strictly concave criteria in Kiefer's $ϕ_p$-class. Let $ξ$ be any design on a compact set $X\subset\mathbb{R}^m$ with a nonsingular information matrix $\mathbf{M}(ξ)$, and let $δ$ be the maximum of the directional derivative $F_{ϕ_p}(ξ,x)$ over all $x\in X$. We show that any support point $x_*$ of a $ϕ_p$-optimal design satisfies the inequality $F_{ϕ_p}(ξ,x_*) \geq h_p[\mathbf{M}(ξ),δ]$, where the bound $h_p[\mathbf{M}(ξ),δ]$ is easily computed: it requires the determination of the unique root of a simple univariate equation (polynomial when $p$ is integer) in a given interval. The construction can be used to accelerate algorithms for $ϕ_p$-optimal design and is illustrated on an example with $A$-optimal design.

preprint · 2013 · arXiv

Efficient Prediction Designs for Random Fields

For estimation and prediction of random fields, it is increasingly acknowledged that the kriging variance may be a poor representative of true uncertainty. Experimental designs based on more elaborate criteria that are appropriate for empirical kriging are then often non-space-filling and very costly to determine. In this paper, we investigate the possibility of using a compound criterion inspired by an equivalence-theorem-type relation to build designs that are quasi-optimal for the empirical kriging variance, when space-filling designs become unsuitable. Two algorithms are proposed: the first relies on stochastic optimization to explicitly identify the Pareto front, while the second uses the surrogate criteria as a local heuristic to choose the points at which the (costly) true empirical kriging variance is effectively computed. We illustrate the performance of the algorithms presented on both a simple simulated example and a real oceanographic dataset.

preprint · 2012 · arXiv

A class of Rényi information estimators for multidimensional densities

A class of estimators of the Rényi and Tsallis entropies of an unknown distribution $f$ in $\mathbb{R}^m$ is presented. These estimators are based on the $k$th nearest-neighbor distances computed from a sample of $N$ i.i.d. vectors with distribution $f$. We show that entropies of any order $q$, including Shannon's entropy, can be estimated consistently with minimal assumptions on $f$. Moreover, we show that it is straightforward to extend the nearest-neighbor method to estimate the statistical distance between two distributions using one i.i.d. sample from each. (With Correction.)
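To make the nearest-neighbor idea concrete, here is a sketch of the Shannon case only, written in the classical Kozachenko-Leonenko form: the estimate is $\log(N-1) - Ψ(k) + \log V_m + \frac{m}{N}\sum_i \log ρ_i$, where $ρ_i$ is the distance from the $i$th sample point to its $k$th nearest neighbor, $V_m$ is the volume of the unit ball in $\mathbb{R}^m$ and $Ψ$ is the digamma function. The exact normalisation used in the paper may differ, and the function names, the brute-force neighbor search and the Gaussian test sample are assumptions for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(k):
    """Digamma function at a positive integer k: -gamma + sum_{j<k} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))

def knn_shannon_entropy(sample, k=3):
    """k-th nearest-neighbor estimate of Shannon entropy (in nats).

    Brute-force O(N^2 log N) neighbor search; assumes no duplicated points,
    so every k-th neighbor distance is strictly positive.
    """
    n, m = len(sample), len(sample[0])
    # log of the volume of the unit ball in R^m: pi^(m/2) / Gamma(m/2 + 1)
    log_unit_ball = (m / 2) * math.log(math.pi) - math.lgamma(m / 2 + 1)
    total = 0.0
    for i, x in enumerate(sample):
        dists = sorted(math.dist(x, y) for j, y in enumerate(sample) if j != i)
        total += math.log(dists[k - 1])
    return math.log(n - 1) - digamma_int(k) + log_unit_ball + m * total / n

# Hypothetical check on a 2-D standard Gaussian, whose entropy is log(2*pi*e).
rng = random.Random(2)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
estimate = knn_shannon_entropy(sample, k=3)
true_entropy = math.log(2 * math.pi * math.e)
```

For larger samples, a spatial index such as a k-d tree would replace the brute-force search without changing the estimator.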
