Researcher profile

George Deligiannidis

George Deligiannidis contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Trust 21 (Emerging) | Verification L1 | Unclaimed author
10 works
0 followers
9 topics
4 close collaborators

Published work

10 published items

Preprint | 2024 | arXiv

Ranking In Generalized Linear Bandits

We study the ranking problem in generalized linear bandits. At each time step, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal, as both position and item dependencies result in a complex reward function. A simple example is the lack of diversity when all the most attractive items are from the same category. We model the position and item dependencies in the ordered list and design UCB- and Thompson Sampling-type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies, of which position discount is a particular case, and a connection between the ranking problem and graph theory.
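A minimal sketch of the UCB idea in the simplest possible ranking setting, assuming independent Bernoulli click feedback on every displayed item and no position or item dependencies (exactly the simplification the paper moves beyond). The names `ucb_rank` and `simulate` and the click probabilities are hypothetical; this is a UCB1-style baseline, not the generalized linear algorithm from the paper.

```python
import math
import random

def ucb_rank(counts, sums, t, k):
    """Return the k items with the largest UCB1 index, best first."""
    def index(i):
        if counts[i] == 0:
            return math.inf  # show every item at least once
        mean = sums[i] / counts[i]
        return mean + math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(counts)), key=index, reverse=True)[:k]

def simulate(click_probs, k, rounds, seed=0):
    """Run the ranking loop and return how often each item was displayed."""
    rng = random.Random(seed)
    n = len(click_probs)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, rounds + 1):
        for item in ucb_rank(counts, sums, t, k):
            # Hypothetical feedback: each displayed item is clicked independently,
            # with no position discount and no interaction between items.
            reward = 1.0 if rng.random() < click_probs[item] else 0.0
            counts[item] += 1
            sums[item] += reward
    return counts

display_counts = simulate([0.9, 0.8, 0.2, 0.1, 0.05], k=2, rounds=3_000)
```

Because every item in the list is scored independently, this baseline would happily show two near-duplicate items side by side; modelling that interaction is what the paper's formulation adds.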

Preprint | 2023 | arXiv

A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs

U-Net architectures are ubiquitous in state-of-the-art deep learning; however, their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results which prove that average pooling corresponds to projection within the space of square-integrable functions and show that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs), which have a U-Net architecture, as a type of two-step forward Euler discretisation of multi-resolution diffusion processes which flow from a point mass, introducing sampling instabilities. We also demonstrate that HVAEs learn a representation of time which allows for improved parameter efficiency through weight-sharing. We use this observation to achieve state-of-the-art HVAE performance with half the number of parameters of existing models, exploiting the properties of our continuous-time formulation.
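The pooling/wavelet correspondence can be checked numerically in one dimension. This sketch assumes a discrete signal of even length and kernel-2, stride-2 average pooling; the helper names and the toy signal are hypothetical. Average pooling returns the orthonormal Haar approximation coefficients divided by √2, and upsampling the pooled signal gives the orthogonal projection onto signals that are constant on each pair of samples, so the residual sums to zero on every pair.

```python
import math

def average_pool(x):
    """1D average pooling with kernel size 2 and stride 2 (len(x) assumed even)."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def haar_approximation(x):
    """Approximation (scaling) coefficients of one level of the orthonormal Haar transform."""
    return [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]

def upsample(coarse):
    """Nearest-neighbour upsampling: repeat each coarse value twice."""
    return [v for v in coarse for _ in range(2)]

signal = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0]
pooled = average_pool(signal)                  # [3.0, 6.0, 2.0]
projection = upsample(pooled)                  # piecewise-constant approximation
residual = [s - p for s, p in zip(signal, projection)]
```

The residual lives entirely in the Haar detail (wavelet) subspace, which is the discrete analogue of the projection statement in the abstract.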

Preprint | 2022 | arXiv

Chained Generalisation Bounds

This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function and their chained counterparts, which can be obtained by lifting the regularity assumption from the loss onto its gradient. This allows us to re-derive the chaining mutual information bound from the literature, and to obtain novel chained information-theoretic generalisation bounds, based on the Wasserstein distance and other probability metrics. We show on some toy examples that the chained generalisation bound can be significantly tighter than its standard counterpart, particularly when the distribution of the hypotheses selected by the algorithm is very concentrated.

Keywords: Generalisation bounds; Chaining; Information-theoretic bounds; Mutual information; Wasserstein distance; PAC-Bayes.
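For orientation, the standard (unchained) mutual-information bound that chained bounds are typically compared against is the result of Xu and Raginsky (2017), stated here under its usual assumption that the loss $\ell(w, Z)$ is $\sigma$-sub-Gaussian under the data distribution $\mu$ for every hypothesis $w$. Here $S$ is an i.i.d. sample of size $n$, $W$ is the hypothesis returned by the algorithm, $L_\mu$ and $L_S$ are the population and empirical risks, and $I(S;W)$ is their mutual information. The chained refinement described in the abstract is not reproduced here.

```latex
\left| \mathbb{E}\big[ L_\mu(W) - L_S(W) \big] \right|
  \le \sqrt{\frac{2\sigma^2 \, I(S;W)}{n}}
```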

Preprint | 2022 | arXiv

Conditional Simulation Using Diffusion Schrödinger Bridges

Denoising diffusion models have recently emerged as a powerful class of generative models. They provide state-of-the-art results, not only for unconditional simulation, but also when used to solve conditional simulation problems arising in a wide range of inverse problems. A limitation of these models is that they are computationally intensive at generation time, as they require simulating a diffusion process over a long time horizon. When performing unconditional simulation, a Schrödinger bridge formulation of generative modeling leads to a theoretically grounded algorithm that shortens generation time and is complementary to other proposed acceleration techniques. We extend the Schrödinger bridge framework to conditional simulation. We demonstrate this novel methodology on various applications including image super-resolution, optimal filtering for state-space models and the refinement of pre-trained networks. Our code can be found at https://github.com/vdeborto/cdsb.
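On finite state spaces, the static Schrödinger bridge problem with a Gibbs reference kernel reduces to entropy-regularised optimal transport, which iterative proportional fitting (IPF, equivalently Sinkhorn scaling) solves by alternately matching each marginal. Diffusion Schrödinger bridge methods run an analogous IPF scheme over diffusion paths with learned drifts; this discrete sketch, with hypothetical names, toy marginals and `eps`, only illustrates the alternating-projection idea and is not the paper's conditional algorithm.

```python
import math

def sinkhorn(cost, mu, nu, eps=1.0, iters=500):
    """Entropic OT / static Schrodinger bridge between discrete marginals via IPF."""
    n, m = len(mu), len(nu)
    # Gibbs kernel of the reference coupling: K[i][j] = exp(-cost[i][j] / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows so the row marginal matches mu.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the column marginal matches nu.
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Coupling P[i][j] = u[i] * K[i][j] * v[j].
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

points = [0.0, 1.0, 2.0]
cost = [[(a - b) ** 2 for b in points] for a in points]
plan = sinkhorn(cost, mu=[0.5, 0.3, 0.2], nu=[0.2, 0.3, 0.5])
```

Each half-step is a projection (in KL divergence) onto the set of couplings with one prescribed marginal, which is the same structure IPF has on path space.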

Preprint | 2022 | arXiv

Conditionally Gaussian PAC-Bayes

Recent studies have empirically investigated different methods to train stochastic neural networks on a classification task by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound. The present paper proposes a novel training algorithm that optimises the PAC-Bayesian bound without relying on any surrogate loss. Empirical results show that this approach outperforms currently available PAC-Bayesian training methods.
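To make the objective concrete: a common PAC-Bayesian bound (McAllester's bound in Maurer's form) states that with probability at least $1-\delta$ over an i.i.d. sample of size $n$, for every posterior $Q$, $L(Q) \le \hat{L}(Q) + \sqrt{(\mathrm{KL}(Q\|P) + \ln(2\sqrt{n}/\delta))/(2n)}$, where $\hat{L}(Q)$ is the empirical misclassification error of the stochastic classifier. This sketch only evaluates that bound for a diagonal Gaussian posterior and prior; the function names and numbers are hypothetical, and the paper's contribution (optimising such a bound without a surrogate loss) is not attempted.

```python
import math

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) between Gaussians with diagonal covariances (standard deviations given)."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total

def pac_bayes_bound(empirical_risk, kl, n, delta=0.05):
    """McAllester-style bound (Maurer's form) on the expected risk of the stochastic classifier."""
    return empirical_risk + math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))

kl = kl_diag_gaussians([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
bound = pac_bayes_bound(0.10, kl, n=10_000)
```

The 0-1 empirical risk $\hat{L}(Q)$ is not differentiable in the posterior parameters, which is why surrogate losses are usually substituted during training.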

Preprint | 2021 | arXiv

Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge. While modeling the trajectories of SGD via stochastic differential equations (SDEs) under heavy-tailed gradient noise has recently shed light on several peculiar characteristics of SGD, a rigorous treatment of the generalization properties of such SDEs in a learning-theoretic framework is still missing. Aiming to bridge this gap, in this paper we prove generalization bounds for SGD under the assumption that its trajectories can be well approximated by a Feller process, which defines a rich class of Markov processes that includes several recent SDE representations (both Brownian and heavy-tailed) as special cases. We show that the generalization error can be controlled by the Hausdorff dimension of the trajectories, which is intimately linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail index of the process can be used as a notion of "capacity metric". We support our theory with experiments on deep neural networks illustrating that the proposed capacity metric accurately estimates the generalization error and, unlike the existing capacity metrics in the literature, does not necessarily grow with the number of parameters.
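The tail index itself has to be estimated from data. As an illustration only, this sketch swaps in the classical Hill estimator (not necessarily the estimator used in the paper) and applies it to synthetic Pareto samples; smaller estimates of $\alpha$ mean heavier tails. The sample size, `k`, and function name are hypothetical choices.

```python
import math
import random

def hill_estimator(samples, k):
    """Hill estimate of the tail index alpha from the k largest absolute values."""
    xs = sorted((abs(s) for s in samples), reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value acts as the tail threshold
    mean_log_excess = sum(math.log(xs[i] / threshold) for i in range(k)) / k
    return 1.0 / mean_log_excess

rng = random.Random(0)
samples = [rng.paretovariate(2.0) for _ in range(100_000)]  # true tail index alpha = 2
alpha_hat = hill_estimator(samples, k=2_000)
```

The choice of `k` trades bias (too many points leave the tail) against variance (too few points); on exact Pareto data the estimate is insensitive to it.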

Preprint | 2020 | arXiv

Efficient Irreversible Monte Carlo samplers

We present two irreversible Markov chain Monte Carlo algorithms for general discrete-state systems: one is based on the random-scan Gibbs sampler for discrete states and the other on its improved version, the Metropolized-Gibbs sampler. The algorithms incorporate the lifting framework with a skewed detailed balance condition and construct irreversible Markov chains that satisfy the balance condition. We have applied our algorithms to the 1D 4-state Potts model. The integrated autocorrelation times for magnetisation and energy density indicate a reduction of the dynamical scaling exponent from $z \approx 1$ to $z \approx 1/2$. In addition, we have generalized an irreversible Metropolis-Hastings algorithm with skewed detailed balance, initially introduced by Turitsyn et al. (2011) for the mean-field Ising model, so that it is now readily applicable to classical spin systems in general; application to the 1D 4-state Potts model indicates a square-root reduction of the mixing time at high temperatures.
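The lifting idea can be seen in its simplest form on a one-dimensional discrete target. In this sketch (a textbook lifted Metropolis walk, not the Gibbs-based algorithms of the paper), the state is augmented with a direction $\varepsilon \in \{+1, -1\}$; the chain keeps proposing moves in the current direction and reverses $\varepsilon$ only on rejection. This breaks detailed balance but satisfies the skewed version $\pi(x)\,T((x,+) \to (x+1,+)) = \pi(x+1)\,T((x+1,-) \to (x,-))$, so $\pi$ remains invariant. The target weights and function name are hypothetical.

```python
import random

def lifted_metropolis(weights, n_steps, seed=0):
    """Lifted (irreversible) Metropolis sampler on states 0..len(weights)-1.

    Returns the empirical frequency of each state over n_steps steps.
    """
    rng = random.Random(seed)
    x, direction = 0, 1
    counts = [0] * len(weights)
    for _ in range(n_steps):
        z = x + direction
        # Out-of-range proposals have zero target weight and are always rejected.
        w_z = weights[z] if 0 <= z < len(weights) else 0.0
        if rng.random() < w_z / weights[x]:
            x = z  # accept: keep moving in the same direction
        else:
            direction = -direction  # reject: flip the lifting variable
        counts[x] += 1
    return [c / n_steps for c in counts]

freqs = lifted_metropolis([1.0, 2.0, 3.0, 2.0, 1.0], n_steps=200_000)
```

Persisting in one direction suppresses the diffusive back-and-forth of a reversible random walk, which is the mechanism behind the reduced dynamical exponent reported in the abstract.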

Preprint | 2020 | arXiv

Ensemble Rejection Sampling

We introduce Ensemble Rejection Sampling, a scheme for exact simulation from the posterior distribution of the latent states of a class of non-linear non-Gaussian state-space models. Ensemble Rejection Sampling relies on a proposal for the high-dimensional state sequence built using ensembles of state samples. Although this algorithm can be interpreted as a rejection sampling scheme acting on an extended space, we show under regularity conditions that the expected computational cost to obtain an exact sample increases cubically with the length of the state sequence, instead of exponentially as for standard rejection sampling. We demonstrate this methodology by exactly sampling state sequences from the posterior distribution of a stochastic volatility model and a non-linear autoregressive process. We also present an application to rare event simulation.
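For contrast with the cubic cost of Ensemble Rejection Sampling, standard rejection sampling on a sequence of length $T$ typically needs an envelope constant that grows exponentially in $T$. This toy sketch, an illustration under hypothetical choices rather than the paper's method, targets $p(x_{1:T}) = \prod_t 2x_t$ on $[0,1]^T$ with a uniform proposal, so the envelope is $M = 2^T$ and the expected number of proposals per exact sample is $2^T$.

```python
import random

def rejection_sample_sequence(length, rng):
    """Exact draw from p(x_1..x_T) = prod_t 2*x_t on [0,1]^T by standard rejection sampling.

    Proposal: independent Uniform(0,1) coordinates; envelope constant M = 2**T.
    Returns the sample and the number of proposals it took.
    """
    trials = 0
    while True:
        trials += 1
        xs = [rng.random() for _ in range(length)]
        # Accept with probability p(xs) / (M * q(xs)) = prod(2x) / 2**T = prod(x).
        accept_prob = 1.0
        for x in xs:
            accept_prob *= x
        if rng.random() < accept_prob:
            return xs, trials

rng = random.Random(0)
draws = [rejection_sample_sequence(3, rng) for _ in range(4_000)]
mean_trials = sum(trials for _, trials in draws) / len(draws)  # 2**3 = 8 in expectation
```

Doubling $T$ squares the expected number of proposals, which is the exponential blow-up that the ensemble construction avoids.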

Preprint | 2020 | arXiv

Simulated tempering with irreversible Gibbs sampling techniques

We present two novel algorithms for simulated tempering that break the detailed balance condition (DBC) but satisfy skewed detailed balance to ensure invariance of the target distribution. The irreversible methods are based on Gibbs sampling and break DBC in the update scheme for the temperature swaps. We use three systems as a test bed for our methods: an MCMC simulation of a simple system described by a 1D double-well potential, the Ising model, and molecular dynamics (MD) simulations of alanine pentapeptide (ALA5). The relaxation times of inverse temperature, magnetic susceptibility and energy density for the Ising model indicate clear gains in sampling efficiency over conventional Gibbs sampling techniques with DBC and over the conventionally used simulated tempering with the Metropolis-Hastings (MH) scheme. Simulations of ALA5 with a large number of temperatures indicate distinct gains in mixing times for inverse temperature, and consequently for the energy of the system, compared with conventional MH. With no additional computational overhead, our methods were found to be more efficient alternatives to conventionally used simulated tempering methods with DBC. Our algorithms should be particularly advantageous in simulations of large systems with many temperature ladders, as they showed a more favorable constant scaling in Ising spin systems compared with both reversible and irreversible MH algorithms. In future applications, our irreversible methods can also be easily tailored to use a given dynamical variable other than temperature to flatten rugged free energy landscapes.

Preprint | 2020 | arXiv

Unbiased Markov chain Monte Carlo for intractable target distributions

Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for latent variable models and the exchange algorithm for a class of undirected graphical models. As with any MCMC algorithm, the resulting estimators are justified asymptotically in the limit of the number of iterations, but exhibit a bias for any fixed number of iterations due to the Markov chains starting outside of stationarity. This "burn-in" bias is known to complicate the use of parallel processors for MCMC computations. We show how to use coupling techniques to generate unbiased estimators in finite time, building on recent advances for generic MCMC algorithms. We establish the theoretical validity of some of these procedures by extending existing results to cover the case of polynomially ergodic Markov chains. The efficiency of the proposed estimators is compared with that of standard MCMC estimators, with theoretical arguments and numerical experiments including state space models and Ising models.
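The coupling construction can be sketched on a toy discrete target. Following the general recipe for unbiased MCMC with couplings (here with lag 1 and no burn-in offset), two random-walk Metropolis chains $X$ and $Y$ start from the same initial distribution, $X$ is advanced one step ahead, and later moves use a maximal coupling of the proposals plus a common uniform for the accept/reject step, so the chains stay together once they meet at time $\tau$. Under standard conditions, $H = h(X_0) + \sum_{t=1}^{\tau-1} \big(h(X_t) - h(Y_{t-1})\big)$ is then unbiased for $\mathbb{E}_\pi[h]$. The target, test function and helper names are hypothetical, and this generic scheme is not the paper's couplings for pseudo-marginal or exchange algorithms.

```python
import random

WEIGHTS = [1.0, 2.0, 3.0, 2.0, 1.0]  # hypothetical unnormalised target on {0,...,4}; mean is 2

def target(x):
    return WEIGHTS[x] if 0 <= x < len(WEIGHTS) else 0.0

def proposal_pmf(x, z):
    # Symmetric random-walk proposal: x - 1 or x + 1 with probability 1/2 each.
    return 0.5 if abs(z - x) == 1 else 0.0

def sample_proposal(x, rng):
    return x + rng.choice((-1, 1))

def maximal_coupling(x, y, rng):
    """Draw (zx, zy) with the correct marginals and maximal probability that zx == zy."""
    zx = sample_proposal(x, rng)
    if rng.uniform(0.0, proposal_pmf(x, zx)) < proposal_pmf(y, zx):
        return zx, zx
    while True:
        zy = sample_proposal(y, rng)
        if rng.uniform(0.0, proposal_pmf(y, zy)) >= proposal_pmf(x, zy):
            return zx, zy

def mh_step(x, rng):
    z = sample_proposal(x, rng)
    return z if rng.random() < target(z) / target(x) else x

def coupled_mh_step(x, y, rng):
    zx, zy = maximal_coupling(x, y, rng)
    u = rng.random()  # common uniform for both accept/reject decisions
    new_x = zx if u < target(zx) / target(x) else x
    new_y = zy if u < target(zy) / target(y) else y
    return new_x, new_y

def unbiased_estimate(h, x0, rng):
    """H = h(X_0) + sum_{t=1}^{tau-1} [h(X_t) - h(Y_{t-1})], with X_0 = Y_0 = x0."""
    estimate = h(x0)
    x, y = mh_step(x0, rng), x0  # (X_1, Y_0): X runs one step ahead of Y
    while x != y:                # stop at the meeting time tau
        estimate += h(x) - h(y)
        x, y = coupled_mh_step(x, y, rng)
    return estimate

rng = random.Random(1)
estimates = [unbiased_estimate(lambda s: s, 0, rng) for _ in range(20_000)]
mean_estimate = sum(estimates) / len(estimates)
```

Each replicate is independent and finishes in finite time, so replicates can be run on separate processors and simply averaged, which is the parallelism benefit the abstract refers to.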