Researcher profile

Michael Gastpar

Michael Gastpar contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

Contraction of Rényi Divergences for Discrete Channels: Properties and Applications

This work explores properties of Strong Data-Processing constants for Rényi Divergences. Parallels are made with the well-studied $φ$-Divergences, and it is shown that the order $α$ of Rényi Divergences dictates whether certain properties of the contraction of $φ$-Divergences are mirrored or not. In particular, we demonstrate that when $α>1$, the contraction properties can deviate quite strikingly from those of $φ$-Divergences. We also uncover specific characteristics of contraction for the $\infty$-Rényi Divergence and relate it to $\varepsilon$-Local Differential Privacy. The results are then applied to bound the speed of convergence of Markov chains, where we argue that the contraction of Rényi Divergences offers a new perspective on the contraction of $L^α$-norms commonly studied in the literature.

preprint2026arXiv

Model non-collapse: Minimax bounds for recursive discrete distribution estimation

Learning discrete distributions from i.i.d. samples is a well-understood problem. However, advances in generative machine learning prompt an interesting new, non-i.i.d. setting: after receiving a certain number of samples, an estimated distribution is fixed, and samples from this estimate are drawn and introduced into the sample corpus, undifferentiated from real samples. Subsequent generations of estimators now face contaminated environments, a scenario referred to in the machine learning literature as self-consumption. Empirically, it has been observed that models in fully synthetic self-consuming loops collapse -- their performance deteriorates with each batch of training -- but accumulating data has been shown to prevent complete degeneration. This, in turn, begs the question: What happens when fresh real samples \textit{are} added at every stage? In this paper, we study the minimax loss of self-consuming discrete distribution estimation in such loops. We show that even when model collapse is consciously averted, the ratios between the minimax losses with and without source information can grow unbounded as the batch size increases. In the data accumulation setting, where all batches of samples are available for estimation, we provide minimax lower bounds and upper bounds that are order-optimal under mild conditions for the expected $\ell_2^2$ and $\ell_1$ losses at every stage. We provide conditions for regimes where there is a strict gap in the convergence rates compared to the corresponding oracle-assisted minimax loss where real and synthetic samples are differentiated, and provide examples where this gap is easily observed. We also provide a lower bound on the minimax loss in the data replacement setting, where only the latest batch of samples is available, and use it to find a lower bound for the worst-case loss for bounded estimate trajectories.

preprint2022arXiv

A Johnson--Lindenstrauss Framework for Randomly Initialized CNNs

How does the geometric representation of a dataset change after the application of each randomly initialized layer of a neural network? The celebrated Johnson--Lindenstrauss lemma answers this question for linear fully-connected neural networks (FNNs), stating that the geometry is essentially preserved. For FNNs with the ReLU activation, the angle between two inputs contracts according to a known mapping. The question for non-linear convolutional neural networks (CNNs) becomes much more intricate. To answer this question, we introduce a geometric framework. For linear CNNs, we show that the Johnson--Lindenstrauss lemma continues to hold, namely, that the angle between two inputs is preserved. For CNNs with ReLU activation, on the other hand, the behavior is richer: The angle between the outputs contracts, where the level of contraction depends on the nature of the inputs. In particular, after one layer, the geometry of natural images is essentially preserved, whereas for Gaussian correlated inputs, CNNs exhibit the same contracting behavior as FNNs with ReLU activation.

preprint2022arXiv

Finite Littlestone Dimension Implies Finite Information Complexity

We prove that every online learnable class of functions of Littlestone dimension $d$ admits a learning algorithm with finite information complexity. Towards this end, we use the notion of a globally stable algorithm. Generally, the information complexity of such a globally stable algorithm is large yet finite, roughly exponential in $d$. We also show there is room for improvement; for a canonical online learnable class, indicator functions of affine subspaces of dimension $d$, the information complexity can be upper bounded logarithmically in $d$.

preprint2022arXiv

From Generalisation Error to Transportation-cost Inequalities and Back

In this work, we connect the problem of bounding the expected generalisation error with transportation-cost inequalities. Exposing the underlying pattern behind both approaches we are able to generalise them and go beyond Kullback-Leibler Divergences/Mutual Information and sub-Gaussian measures. In particular, we are able to provide a result showing the equivalence between two families of inequalities: one involving functionals and one involving measures. This result generalises the one proposed by Bobkov and Götze that connects transportation-cost inequalities with concentration of measure. Moreover, it allows us to recover all standard generalisation error bounds involving mutual information and to introduce new, more general bounds, that involve arbitrary divergence measures.

preprint2022arXiv

Lower-bounds on the Bayesian Risk in Estimation Procedures via $f$-Divergences

We consider the problem of parameter estimation in a Bayesian setting and propose a general lower-bound that includes part of the family of $f$-Divergences. The results are then applied to specific settings of interest and compared to other notable results in the literature. In particular, we show that the known bounds using Mutual Information can be improved by using, for example, Maximal Leakage, Hellinger divergence, or generalizations of the Hockey-Stick divergence.

preprint2022arXiv

On Sibson's $α$-Mutual Information

We explore a family of information measures that stems from Rényi&#39;s $α$-Divergences with $α<0$. In particular, we extend the definition of Sibson&#39;s $α$-Mutual Information to negative values of $α$ and show several properties of these objects. Moreover, we highlight how this family of information measures is related to functional inequalities that can be employed in a variety of fields, including lower-bounds on the Risk in Bayesian Estimation Procedures.

preprint2022arXiv

Shannon Bounds on Lossy Gray-Wyner Networks

The Gray-Wyner network subject to a fidelity criterion is studied. Upper and lower bounds for the trade-offs between the private sum-rate and the common rate are obtained for arbitrary sources subject to mean-squared error distortion. The bounds meet exactly, leading to the computation of the rate region, when the source is jointly Gaussian. They meet partially when the sources are modeled via an additive Gaussian &#34;channel&#34;. The bounds are inspired from the Shannon bounds on the rate-distortion problem.

preprint2022arXiv

The Price of Distributed: Rate Loss in the CEO Problem

In the distributed remote (CEO) source coding problem, many separate encoders observe independently noisy copies of an underlying source. The rate loss is the difference between the rate required in this distributed setting and the rate that would be required in a setting where the encoders can fully cooperate. In this sense, the rate loss characterizes the price of distributed processing. We survey and extend the known results on the rate loss in various settings, with a particular emphasis on the case where the noise in the observations is Gaussian, but the underlying source is general.

preprint2021arXiv

Learning, compression, and leakage: Minimising classification error via meta-universal compression principles

Learning and compression are driven by the common aim of identifying and exploiting statistical regularities in data, which opens the door for fertile collaboration between these areas. A promising group of compression techniques for learning scenarios is normalised maximum likelihood (NML) coding, which provides strong guarantees for compression of small datasets - in contrast with more popular estimators whose guarantees hold only in the asymptotic limit. Here we consider a NML-based decision strategy for supervised classification problems, and show that it attains heuristic PAC learning when applied to a wide variety of models. Furthermore, we show that the misclassification rate of our method is upper bounded by the maximal leakage, a recently proposed metric to quantify the potential of data leakage in privacy-sensitive scenarios.

preprint2021arXiv

Lower bound on Wyner&#39;s Common Information

An important notion of common information between two random variables is due to Wyner. In this paper, we derive a lower bound on Wyner&#39;s common information for continuous random variables. The new bound improves on the only other general lower bound on Wyner&#39;s common information, which is the mutual information. We also show that the new lower bound is tight for the so-called &#34;Gaussian channels&#34; case, namely, when the joint distribution of the random variables can be written as the sum of a single underlying random variable and Gaussian noises. We motivate this work from the recent variations of Wyner&#39;s common information and applications to network data compression problems such as the Gray-Wyner network.

preprint2020arXiv

Common Information Components Analysis

We give an information-theoretic interpretation of Canonical Correlation Analysis (CCA) via (relaxed) Wyner&#39;s common information. CCA permits to extract from two high-dimensional data sets low-dimensional descriptions (features) that capture the commonalities between the data sets, using a framework of correlations and linear transforms. Our interpretation first extracts the common information up to a pre-selected resolution level, and then projects this back onto each of the data sets. In the case of Gaussian statistics, this procedure precisely reduces to CCA, where the resolution level specifies the number of CCA components that are extracted. This also suggests a novel algorithm, Common Information Components Analysis (CICA), with several desirable features, including a natural extension to beyond just two data sets.

preprint2020arXiv

Robust Generalization via $α$-Mutual Information

The aim of this work is to provide bounds connecting two probability measures of the same event using Rényi $α$-Divergences and Sibson&#39;s $α$-Mutual Information, a generalization of respectively the Kullback-Leibler Divergence and Shannon&#39;s Mutual Information. A particular case of interest can be found when the two probability measures considered are a joint distribution and the corresponding product of marginals (representing the statistically independent scenario). In this case, a bound using Sibson&#39;s $α-$Mutual Information is retrieved, extending a result involving Maximal Leakage to general alphabets. These results have broad applications, from bounding the generalization error of learning algorithms to the more general framework of adaptive data analysis, provided that the divergences and/or information measures used are amenable to such an analysis ({\it i.e.,} are robust to post-processing and compose adaptively). The generalization error bounds are derived with respect to high-probability events but a corresponding bound on expected generalization error is also retrieved.