Source author record

George Deligiannidis

George Deligiannidis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.PR, Computation, cond-mat.stat-mech, Computer Vision, eess.SP, Information Retrieval, Information Theory, math.DS, math.IT, math.OC, Methodology

Catalog footprint

What is connected

17 works

12 topics

4 close collaborators


preprint · 2024 · arXiv

Ranking In Generalized Linear Bandits

We study the ranking problem in generalized linear bandits. At each time, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal as both position and item dependencies result in a complex reward function. A very naive example is the lack of diversity when all the most attractive items are from the same category. We model the position and item dependencies in the ordered list and design UCB and Thompson Sampling type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies where position discount is a particular case, and connecting the ranking problem to graph theory.

preprint · 2023 · arXiv

A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs

U-Net architectures are ubiquitous in state-of-the-art deep learning; however, their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results which prove that average pooling corresponds to projection within the space of square-integrable functions and show that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs), which have a U-Net architecture, as a type of two-step forward Euler discretisation of multi-resolution diffusion processes which flow from a point mass, introducing sampling instabilities. We also demonstrate that HVAEs learn a representation of time which allows for improved parameter efficiency through weight-sharing. We use this observation to achieve state-of-the-art HVAE performance with half the number of parameters of existing models, exploiting the properties of our continuous-time formulation.
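The link between average pooling and projection can be illustrated in a small discrete setting. The following is a minimal sketch, not the paper's construction: it assumes a 1D signal of even length and a pooling window of 2, and the helper names (`average_pool`, `upsample`, `project`, `inner`) are hypothetical. Pooling followed by piecewise-constant upsampling acts as an orthogonal projection onto signals that are constant on each pair, and the residual lies in the span of the Haar wavelets (+1, -1) on each pair.

```python
def average_pool(signal):
    """Average pooling with window 2 and stride 2 (assumes even length)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(coarse):
    """Repeat each coarse value twice, giving a piecewise-constant signal."""
    return [value for value in coarse for _ in range(2)]


def project(signal):
    """Pool, then upsample: projection onto pairwise-constant signals."""
    return upsample(average_pool(signal))


def inner(u, v):
    """Euclidean inner product of two equal-length signals."""
    return sum(a * b for a, b in zip(u, v))


signal = [1.0, 3.0, 2.0, 6.0]
projected = project(signal)
# The residual is what the Haar detail coefficients would encode.
residual = [s - p for s, p in zip(signal, projected)]
print(projected, residual)
```

Applying `project` twice gives the same result as applying it once, and the residual has zero inner product with every pairwise-constant signal, which are the two defining properties of an orthogonal projection.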

preprint · 2022 · arXiv

Chained Generalisation Bounds

This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts, which can be obtained by lifting the regularity assumption from the loss onto its gradient. This allows us to re-derive the chaining mutual information bound from the literature, and to obtain novel chained information-theoretic generalisation bounds, based on the Wasserstein distance and other probability metrics. We show on some toy examples that the chained generalisation bound can be significantly tighter than its standard counterpart, particularly when the distribution of the hypotheses selected by the algorithm is very concentrated. Keywords: Generalisation bounds; Chaining; Information-theoretic bounds; Mutual information; Wasserstein distance; PAC-Bayes.

preprint · 2022 · arXiv

Conditional Simulation Using Diffusion Schrödinger Bridges

Denoising diffusion models have recently emerged as a powerful class of generative models. They provide state-of-the-art results, not only for unconditional simulation, but also when used to solve conditional simulation problems arising in a wide range of inverse problems. A limitation of these models is that they are computationally intensive at generation time as they require simulating a diffusion process over a long time horizon. When performing unconditional simulation, a Schrödinger bridge formulation of generative modeling leads to a theoretically grounded algorithm shortening generation time which is complementary to other proposed acceleration techniques. We extend the Schrödinger bridge framework to conditional simulation. We demonstrate this novel methodology on various applications including image super-resolution, optimal filtering for state-space models and the refinement of pre-trained networks. Our code can be found at https://github.com/vdeborto/cdsb.

preprint · 2022 · arXiv

Conditionally Gaussian PAC-Bayes

Recent studies have empirically investigated different methods to train stochastic neural networks on a classification task by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound. The present paper proposes a novel training algorithm that optimises the PAC-Bayesian bound, without relying on any surrogate loss. Empirical results show that this approach outperforms currently available PAC-Bayesian training methods.

preprint · 2021 · arXiv

Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge. While modeling the trajectories of SGD via stochastic differential equations (SDE) under heavy-tailed gradient noise has recently shed light on several peculiar characteristics of SGD, a rigorous treatment of the generalization properties of such SDEs in a learning theoretical framework is still missing. Aiming to bridge this gap, in this paper, we prove generalization bounds for SGD under the assumption that its trajectories can be well-approximated by a Feller process, which defines a rich class of Markov processes that includes several recent SDE representations (both Brownian and heavy-tailed) as special cases. We show that the generalization error can be controlled by the Hausdorff dimension of the trajectories, which is intimately linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail-index of the process can be used as a notion of "capacity metric". We support our theory with experiments on deep neural networks illustrating that the proposed capacity metric accurately estimates the generalization error, and it does not necessarily grow with the number of parameters unlike the existing capacity metrics in the literature.

preprint · 2020 · arXiv

Efficient Irreversible Monte Carlo samplers

We present two irreversible Markov chain Monte Carlo algorithms for general discrete state systems. One of the algorithms is based on the random-scan Gibbs sampler for discrete states and the other on its improved version, the Metropolized-Gibbs sampler. The algorithms incorporate the lifting framework with a skewed detailed balance condition and construct irreversible Markov chains that satisfy the balance condition. We have applied our algorithms to the 1D 4-state Potts model. The integrated autocorrelation times for magnetisation and energy density indicate a reduction of the dynamical scaling exponent from $z \approx 1$ to $z \approx 1/2$. In addition, we have generalized an irreversible Metropolis-Hastings algorithm with skewed detailed balance, initially introduced by Turitsyn et al. (2011) for the mean field Ising model, so that it is readily applicable to classical spin systems in general; application to the 1D 4-state Potts model indicates a square root reduction of the mixing time at high temperatures.

preprint · 2020 · arXiv

Ensemble Rejection Sampling

We introduce Ensemble Rejection Sampling, a scheme for exact simulation from the posterior distribution of the latent states of a class of non-linear non-Gaussian state-space models. Ensemble Rejection Sampling relies on a proposal for the high-dimensional state sequence built using ensembles of state samples. Although this algorithm can be interpreted as a rejection sampling scheme acting on an extended space, we show under regularity conditions that the expected computational cost to obtain an exact sample increases cubically with the length of the state sequence instead of exponentially for standard rejection sampling. We demonstrate this methodology by sampling exactly state sequences according to the posterior distribution of a stochastic volatility model and a non-linear autoregressive process. We also present an application to rare event simulation.

preprint · 2020 · arXiv

Simulated tempering with irreversible Gibbs sampling techniques

We present two novel algorithms for simulated tempering simulations, which break the detailed balance condition (DBC) but satisfy the skewed detailed balance to ensure invariance of the target distribution. The irreversible methods we present are based on Gibbs sampling and break DBC in the update scheme of the temperature swaps. We utilise three systems as a test bed for our methods: an MCMC simulation on a simple system described by a 1D double well potential, the Ising model and MD simulations on Alanine pentapeptide (ALA5). The relaxation times of inverse temperature, magnetic susceptibility and energy density for the Ising model indicate clear gains in sampling efficiency over conventional Gibbs sampling techniques with DBC and also over the conventionally used simulated tempering with the Metropolis-Hastings (MH) scheme. Simulations on ALA5 with a large number of temperatures indicate distinct gains in mixing times for inverse temperature and consequently the energy of the system compared to conventional MH. With no additional computational overhead, our methods were found to be more efficient alternatives to conventionally used simulated tempering methods with DBC. Our algorithms should be particularly advantageous in simulations of large systems with many temperature ladders, as our algorithms showed a more favorable constant scaling in Ising spin systems as compared with both reversible and irreversible MH algorithms. In future applications, our irreversible methods can also be easily tailored to utilize a given dynamical variable other than temperature to flatten rugged free energy landscapes.

preprint · 2020 · arXiv

Unbiased Markov chain Monte Carlo for intractable target distributions

Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for latent variable models and the exchange algorithm for a class of undirected graphical models. As with any MCMC algorithm, the resulting estimators are justified asymptotically in the limit of the number of iterations, but exhibit a bias for any fixed number of iterations due to the Markov chains starting outside of stationarity. This "burn-in" bias is known to complicate the use of parallel processors for MCMC computations. We show how to use coupling techniques to generate unbiased estimators in finite time, building on recent advances for generic MCMC algorithms. We establish the theoretical validity of some of these procedures by extending existing results to cover the case of polynomially ergodic Markov chains. The efficiency of the proposed estimators is compared with that of standard MCMC estimators, with theoretical arguments and numerical experiments including state space models and Ising models.

preprint · 2016 · arXiv

Which ergodic averages have finite asymptotic variance?

We show that the class of $L^2$ functions for which ergodic averages of a reversible Markov chain have finite asymptotic variance is determined by the class of $L^2$ functions for which ergodic averages of its associated jump chain have finite asymptotic variance. This allows us to characterize completely which ergodic averages have finite asymptotic variance when the Markov chain is an independence sampler. In addition, we obtain a simple sufficient condition for all ergodic averages of $L^2$ functions of the primary variable in a pseudo-marginal Markov chain to have finite asymptotic variance.

preprint · 2015 · arXiv

Optimal bounds for self-intersection local times

For a random walk $S_n, n\geq 0$ in $\mathbb{Z}^d$, let $l(n,x)$ be its local time at the site $x\in \mathbb{Z}^d$. Define the $\alpha$-fold self-intersection local time $L_n(\alpha) := \sum_{x} l(n,x)^{\alpha}$, and let $L_n(\alpha|\epsilon, d)$ be the corresponding quantity for $d$-dimensional simple random walk. Without imposing any moment conditions, we show that the variances of the local times $\operatorname{var}(L_n(\alpha))$ of any genuinely $d$-dimensional random walk are bounded above by the corresponding characteristics of the simple symmetric random walk in $\mathbb{Z}^d$, i.e. $\operatorname{var}(L_n(\alpha)) \leq C \operatorname{var}[L_n(\alpha|\epsilon, d)]\sim K_{d,\alpha}v_{d,\alpha}(n)$. In particular, variances of local times of all genuinely $d$-dimensional random walks, $d\geq 4$, are similar to the $4$-dimensional symmetric case $\operatorname{var}(L_n(\alpha)) = O(n)$. On the other hand, in dimensions $d\leq 3$ the resemblance to the simple random walk $\liminf_{n\to \infty} \operatorname{var}(L_n(\alpha))/v_{d,\alpha}(n)>0$ implies that the jumps must have zero mean and finite second moment.
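The definition of the self-intersection local time above can be computed directly from a sampled path. The sketch below is illustrative only: the function names are hypothetical, and it assumes the convention that the local time counts visits over the whole path $S_0, \ldots, S_n$ (the abstract does not say whether time 0 is included).

```python
import random
from collections import Counter


def local_times(path):
    """Return l(n, x): the number of visits to each site x along the path."""
    return Counter(path)


def self_intersection_local_time(path, alpha):
    """L_n(alpha) = sum over visited sites x of l(n, x) ** alpha."""
    return sum(count ** alpha for count in local_times(path).values())


def simple_random_walk(n, d, rng):
    """Simple symmetric random walk on Z^d from the origin; n steps, n + 1 sites."""
    position = [0] * d
    path = [tuple(position)]
    for _ in range(n):
        axis = rng.randrange(d)               # pick a coordinate uniformly
        position[axis] += rng.choice((-1, 1)) # step +1 or -1 along it
        path.append(tuple(position))
    return path


rng = random.Random(0)
path = simple_random_walk(1000, 2, rng)
print(self_intersection_local_time(path, 2))
```

With `alpha=1` the sum reduces to the number of sites in the path, which gives a quick sanity check on any implementation.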

preprint · 2015 · arXiv

Relative Complexity of Random Walks in Random Scenery in the absence of a weak invariance principle for the local times

We answer the question of Aaronson about the relative complexity of Random Walks in Random Sceneries driven by either aperiodic two-dimensional random walks, two-dimensional simple random walk, or by aperiodic random walks in the domain of attraction of the Cauchy distribution. A key step is proving that the range of the random walk satisfies the Fölner property almost surely.

preprint · 2014 · arXiv

Asymptotic variance of stationary reversible and normal Markov processes

We obtain necessary and sufficient conditions for the regular variation of the variance of partial sums of functionals of discrete and continuous-time stationary Markov processes with normal transition operators. We also construct a class of Metropolis-Hastings algorithms which satisfy a central limit theorem and invariance principle when the variance is not linear in $n$.

preprint · 2014 · arXiv

Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator

When an unbiased estimator of the likelihood is used within a Metropolis-Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of averages computed under this chain. Many Monte Carlo samples will typically result in Metropolis-Hastings averages with lower asymptotic variances than the corresponding Metropolis-Hastings averages using fewer samples. However, the computing time required to construct the likelihood estimator increases with the number of Monte Carlo samples. Under the assumption that the distribution of the additive noise introduced by the log-likelihood estimator is Gaussian with variance inversely proportional to the number of Monte Carlo samples and independent of the parameter value at which it is evaluated, we provide guidelines on the number of samples to select. We demonstrate our results by considering a stochastic volatility model applied to stock index returns.
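The setting described above can be sketched as a toy pseudo-marginal Metropolis-Hastings chain. This is an illustration under stated assumptions, not the paper's experiment: the target is a standard normal, the noise constant `gamma`, the step size and all function names are hypothetical, and the log-estimator noise is drawn as Gaussian with variance `gamma**2 / n_samples` and mean minus half that variance, so that the estimator is unbiased on the natural scale.

```python
import math
import random


def log_target(theta):
    """Exact log-density of the toy target N(0, 1), up to a constant."""
    return -0.5 * theta * theta


def noisy_log_target(theta, n_samples, rng, gamma=1.0):
    """Log of an unbiased estimate: exact value plus Gaussian noise.

    The noise has variance gamma^2 / n_samples and mean -variance / 2,
    so that E[exp(noise)] = 1 (assumed noise model, independent of theta).
    """
    variance = gamma ** 2 / n_samples
    noise = rng.gauss(-0.5 * variance, math.sqrt(variance))
    return log_target(theta) + noise


def pseudo_marginal_mh(n_iters, n_samples, rng, step=1.0):
    """Random-walk pseudo-marginal MH; returns the chain and acceptance rate."""
    theta = 0.0
    log_estimate = noisy_log_target(theta, n_samples, rng)
    chain, accepted = [], 0
    for _ in range(n_iters):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        proposal_log_estimate = noisy_log_target(proposal, n_samples, rng)
        log_ratio = proposal_log_estimate - log_estimate
        if rng.random() < math.exp(min(0.0, log_ratio)):
            # Keep the accepted estimate; it is reused, not refreshed.
            theta, log_estimate = proposal, proposal_log_estimate
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iters


for n in (1, 10, 100):
    _, rate = pseudo_marginal_mh(5000, n, random.Random(0))
    print(n, rate)
```

Running the loop with different `n_samples` values shows the trade-off in miniature: more samples per estimate reduce the noise in the acceptance ratio, while each iteration would cost proportionally more in a real model.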

preprint · 2013 · arXiv

Variance of partial sums of stationary sequences

Let $X_1,X_2,\ldots$ be a centred sequence of weakly stationary random variables with spectral measure $F$ and partial sums $S_n=X_1+\cdots+X_n$. We show that $\operatorname{var}(S_n)$ is regularly varying of index $\gamma$ at infinity, if and only if $G(x):=\int_{-x}^{x}F(\mathrm{d}x)$ is regularly varying of index $2-\gamma$ at the origin ($0<\gamma<2$).

preprint · 2010 · arXiv

An asymptotic variance of the self-intersections of random walks

We present a Darboux-Wiener type lemma and apply it to obtain an exact asymptotic for the variance of the self-intersection of one- and two-dimensional random walks. As a corollary, we obtain a central limit theorem for random walk in random scenery conjectured by Kesten and Spitzer in 1979.
