Source author record

Ioannis Kontoyiannis

Ioannis Kontoyiannis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

21works
12topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

21 published item(s)

preprint2024arXiv

Relative entropy bounds for sampling with and without replacement

Sharp, nonasymptotic bounds are obtained for the relative entropy between the distributions of sampling with and without replacement from an urn with balls of $c\geq 2$ colors. Our bounds are asymptotically tight in certain regimes and, unlike previous results, they depend on the number of balls of each colour in the urn. The connection of these results with finite de Finetti-style theorems is explored, and it is observed that a sampling bound due to Stam (1978) combined with the convexity of relative entropy yield a new finite de Finetti bound in relative entropy, which achieves the optimal asymptotic convergence rate.

preprint2022arXiv

Bayesian Context Trees: Modelling and exact inference for discrete time series

We develop a new Bayesian modelling framework for the class of higher-order, variable-memory Markov chains, and introduce an associated collection of methodological tools for exact inference with discrete time series. We show that a version of the context tree weighting algorithm can compute the prior predictive likelihood exactly (averaged over both models and parameters), and two related algorithms are introduced, which identify the a posteriori most likely models and compute their exact posterior probabilities. All three algorithms are deterministic and have linear-time complexity. A family of variable-dimension Markov chain Monte Carlo samplers is also provided, facilitating further exploration of the posterior. The performance of the proposed methods in model selection, Markov order estimation and prediction is illustrated through simulation experiments and real-world applications with data from finance, genetics, neuroscience, and animal communication. The associated algorithms are implemented in the R package BCT.

preprint2022arXiv

Information in probability: Another information-theoretic proof of a finite de Finetti theorem

We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'

preprint2021arXiv

Fundamental Limits of Lossless Data Compression with Side Information

The problem of lossless data compression with side information available to both the encoder and the decoder is considered. The finite-blocklength fundamental limits of the best achievable performance are defined, in two different versions of the problem: Reference-based compression, when a single side information string is used repeatedly in compressing different source messages, and pair-based compression, where a different side information string is used for each source message. General achievability and converse theorems are established for arbitrary source-side information pairs. Nonasymptotic normal approximation expansions are proved for the optimal rate in both the reference-based and pair-based settings, for memoryless sources. These are stated in terms of explicit, finite-blocklength bounds, that are tight up to third-order terms. Extensions that go significantly beyond the class of memoryless sources are obtained. The relevant source dispersion is identified and its relationship with the conditional varentropy rate is established. Interestingly, the dispersion is different in reference-based and pair-based compression, and it is proved that the reference-based dispersion is in general smaller.

preprint2020arXiv

A Simple Network of Nodes Moving on the Circle

Two simple Markov processes are examined, one in discrete and one in continuous time, arising from idealized versions of a transmission protocol for mobile, delay-tolerant networks. We consider two independent walkers moving with constant speed on either the discrete or continuous circle, and changing directions at independent geometric (respectively, exponential) times. One of the walkers carries a message that wishes to travel as far and as fast as possible in the clockwise direction. The message stays with its current carrier unless the two walkers meet, the carrier is moving counter-clockwise, and the other walker is moving clockwise. In that case, the message jumps to the other walker. The long-term average clockwise speed of the message is computed. An explicit expression is derived via the solution of an associated boundary value problem in terms of the generator of the underlying Markov process. The average transmission cost is also similarly computed, measured as the long-term number of jumps the message makes per unit time. The tradeoff between speed and cost is examined, as a function of the underlying problem parameters.

preprint2020arXiv

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as Temporal Difference (TD) learning algorithms, are an important sub-class of general reinforcement learning methods. The algorithms introduced in this paper are intended to resolve two well-known difficulties of TD-learning approaches: Their slow convergence due to very high variance, and the fact that, for the problem of computing the relative value function, consistent algorithms exist only in special cases. First we show that the gradients of these value functions admit a representation that lends itself to algorithm design. Based on this result, a new class of differential TD-learning algorithms is introduced. For Markovian models on Euclidean space with smooth dynamics, the algorithms are shown to be consistent under general conditions. Numerical results show dramatic variance reduction when compared to standard methods.

preprint2020arXiv

Nonasymptotic Gaussian Approximation for Inference with Stable Noise

The results of a series of theoretical studies are reported, examining the convergence rate for different approximate representations of $α$-stable distributions. Although they play a key role in modelling random processes with jumps and discontinuities, the use of $α$-stable distributions in inference often leads to analytically intractable problems. The LePage series, which is a probabilistic representation employed in this work, is used to transform an intractable, infinite-dimensional inference problem into a conditionally Gaussian parametric problem. A major component of our approach is the approximation of the tail of this series by a Gaussian random variable. Standard statistical techniques, such as Expectation-Maximization, Markov chain Monte Carlo, and Particle Filtering, can then be applied. In addition to the asymptotic normality of the tail of this series, we establish explicit, nonasymptotic bounds on the approximation error. Their proofs follow classical Fourier-analytic arguments, using Esséen's smoothing lemma. Specifically, we consider the distance between the distributions of: $(i)$~the tail of the series and an appropriate Gaussian; $(ii)$~the full series and the truncated series; and $(iii)$~the full series and the truncated series with an added Gaussian term. In all three cases, sharp bounds are established, and the theoretical results are compared with the actual distances (computed numerically) in specific examples of symmetric $α$-stable distributions. This analysis facilitates the selection of appropriate truncations in practice and offers theoretical guarantees for the accuracy of resulting estimates. One of the main conclusions obtained is that, for the purposes of inference, the use of a truncated series together with an approximately Gaussian error term has superior statistical properties and is likely a preferable choice in practice.

preprint2020arXiv

Sharp Second-Order Pointwise Asymptotics for Lossless Compression with Side Information

The problem of determining the best achievable performance of arbitrary lossless compression algorithms is examined, when correlated side information is available at both the encoder and decoder. For arbitrary source-side information pairs, the conditional information density is shown to provide a sharp asymptotic lower bound for the description lengths achieved by an arbitrary sequence of compressors. This implies that, for ergodic source-side information pairs, the conditional entropy rate is the best achievable asymptotic lower bound to the rate, not just in expectation but with probability one. Under appropriate mixing conditions, a central limit theorem and a law of the iterated logarithm are proved, describing the inevitable fluctuations of the second-order asymptotically best possible rate. An idealised version of Lempel-Ziv coding with side information is shown to be universally first- and second-order asymptotically optimal, under the same conditions. These results are in part based on a new almost-sure invariance principle for the conditional information density, which may be of independent interest.

preprint2020arXiv

The Lévy State Space Model

In this paper we introduce a new class of state space models based on shot-noise simulation representations of non-Gaussian Lévy-driven linear systems, represented as stochastic differential equations. In particular a conditionally Gaussian version of the models is proposed that is able to capture heavy-tailed non-Gaussianity while retaining tractability for inference procedures. We focus on a canonical class of such processes, the $α$-stable Lévy processes, which retain important properties such as self-similarity and heavy-tails, while emphasizing that broader classes of non-Gaussian Lévy processes may be handled by similar methodology. An important feature is that we are able to marginalise both the skewness and the scale parameters of these challenging models from posterior probability distributions. The models are posed in continuous time and so are able to deal with irregular data arrival times. Example modelling and inference procedures are provided using Rao-Blackwellised sequential Monte Carlo applied to a two-dimensional Langevin model, and this is tested on real exchange rate data.

preprint2016arXiv

Approximating a Diffusion by a Hidden Markov Model

For a wide class of continuous-time Markov processes, including all irreducible hypoelliptic diffusions evolving on an open, connected subset of $\RL^d$, the following are shown to be equivalent: (i) The process satisfies (a slightly weaker version of) the classical Donsker-Varadhan conditions; (ii) The transition semigroup of the process can be approximated by a finite-state hidden Markov model, in a strong sense in terms of an associated operator norm; (iii) The resolvent kernel of the process is `$v$-separable', that is, it can be approximated arbitrarily well in operator norm by finite-rank kernels. Under any (hence all) of the above conditions, the Markov process is shown to have a purely discrete spectrum on a naturally associated weighted $L_\infty$ space.

preprint2016arXiv

Estimating the Directed Information and Testing for Causality

The problem of estimating the directed information rate between two discrete processes $\{X_n\}$ and $\{Y_n\}$ via the plug-in (or maximum-likelihood) estimator is considered. When the joint process $\{(X_n,Y_n)\}$ is a Markov chain of a given memory length, the plug-in estimator is shown to be asymptotically Gaussian and to converge at the optimal rate $O(1/\sqrt{n})$ under appropriate conditions; this is the first estimator that has been shown to achieve this rate. An important connection is drawn between the problem of estimating the directed information rate and that of performing a hypothesis test for the presence of causal influence between the two processes. Under fairly general conditions, the null hypothesis, which corresponds to the absence of causal influence, is equivalent to the requirement that the directed information rate be equal to zero. In that case a finer result is established, showing that the plug-in converges at the faster rate $O(1/n)$ and that it is asymptotically $χ^2$-distributed. This is proved by showing that this estimator is equal to (a scalar multiple of) the classical likelihood ratio statistic for the above hypothesis test. Finally it is noted that these results facilitate the design of an actual likelihood ratio test for the presence or absence of causal influence.

preprint2016arXiv

Thinning and Information Projections

In this paper we establish lower bounds on information divergence of a distribution on the integers from a Poisson distribution. These lower bounds are tight and in the cases where a rate of convergence in the Law of Thin Numbers can be computed the rate is determined by the lower bounds proved in this paper. General techniques for getting lower bounds in terms of moments are developed. The results about lower bound in the Law of Thin Numbers are used to derive similar results for the Central Limit Theorem.

preprint2015arXiv

Entropy bounds on abelian groups and the Ruzsa divergence

Over the past few years, a family of interesting new inequalities for the entropies of sums and differences of random variables has been developed by Ruzsa, Tao and others, motivated by analogous results in additive combinatorics. The present work extends these earlier results to the case of random variables taking values in $\mathbb{R}^n$ or, more generally, in arbitrary locally compact and Polish abelian groups. We isolate and study a key quantity, the Ruzsa divergence between two probability distributions, and we show that its properties can be used to extend the earlier inequalities to the present general setting. The new results established include several variations on the theme that the entropies of the sum and the difference of two independent random variables severely constrain each other. Although the setting is quite general, the result are already of interest (and new) for random vectors in $\mathbb{R}^n$. In that special case, quantitative bounds are provided for the stability of the equality conditions in the entropy power inequality; a reverse entropy power inequality for log-concave random vectors is proved; an information-theoretic analog of the Rogers-Shephard inequality for convex bodies is established; and it is observed that some of these results lead to new inequalities for the determinants of positive-definite matrices. Moreover, by considering the multiplicative subgroups of the complex plane, one obtains new inequalities for the differential entropies of products and ratios of nonzero, complex-valued random variables.

preprint2012arXiv

Lossless Data Compression at Finite Blocklengths

This paper provides an extensive study of the behavior of the best achievable rate (and other related fundamental limits) in variable-length lossless compression. In the non-asymptotic regime, the fundamental limits of fixed-to-variable lossless compression with and without prefix constraints are shown to be tightly coupled. Several precise, quantitative bounds are derived, connecting the distribution of the optimal codelengths to the source information spectrum, and an exact analysis of the best achievable rate for arbitrary sources is given. Fine asymptotic results are proved for arbitrary (not necessarily prefix) compressors on general mixing sources. Non-asymptotic, explicit Gaussian approximation bounds are established for the best achievable rate on Markov sources. The source dispersion and the source varentropy rate are defined and characterized. Together with the entropy rate, the varentropy rate serves to tightly approximate the fundamental non-asymptotic limits of fixed-to-variable compression for all but very small blocklengths.

preprint2012arXiv

Sumset and Inverse Sumset Inequalities for Differential Entropy and Mutual Information

The sumset and inverse sumset theories of Freiman, Plünnecke and Ruzsa, give bounds connecting the cardinality of the sumset $A+B=\{a+b\;;\;a\in A,\,b\in B\}$ of two discrete sets $A,B$, to the cardinalities (or the finer structure) of the original sets $A,B$. For example, the sum-difference bound of Ruzsa states that, $|A+B|\,|A|\,|B|\leq|A-B|^3$, where the difference set $A-B= \{a-b\;;\;a\in A,\,b\in B\}$. Interpreting the differential entropy $h(X)$ of a continuous random variable $X$ as (the logarithm of) the size of the effective support of $X$, the main contribution of this paper is a series of natural information-theoretic analogs for these results. For example, the Ruzsa sum-difference bound becomes the new inequality, $h(X+Y)+h(X)+h(Y)\leq 3h(X-Y)$, for any pair of independent continuous random variables $X$ and $Y$. Our results include differential-entropy versions of Ruzsa's triangle inequality, the Plünnecke-Ruzsa inequality, and the Balog-Szemerédi-Gowers lemma. Also we give a differential entropy version of the Freiman-Green-Ruzsa inverse-sumset theorem, which can be seen as a quantitative converse to the entropy power inequality. Versions of most of these results for the discrete entropy $H(X)$ were recently proved by Tao, relying heavily on a strong, functional form of the submodularity property of $H(X)$. Since differential entropy is {\em not} functionally submodular, in the continuous case many of the corresponding discrete proofs fail, in many cases requiring substantially new proof strategies. We find that the basic property that naturally replaces the discrete functional submodularity, is the data processing property of mutual information.

preprint2011arXiv

Log-concavity, ultra-log-concavity, and a maximum entropy property of discrete compound Poisson measures

Sufficient conditions are developed, under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. Recently, one of the authors [O. Johnson, {\em Stoch. Proc. Appl.}, 2007] used a semigroup approach to show that the Poisson has maximal entropy among all ultra-log-concave distributions with fixed mean. We show via a non-trivial extension of this semigroup approach that the natural analog of the Poisson maximum entropy property remains valid if the compound Poisson distributions under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. Sufficient conditions for compound distributions to be log-concave are discussed and applications to combinatorics are examined; new bounds are derived on the entropy of the cardinality of a random independent set in a claw-free graph, and a connection is drawn to Mason's conjecture for matroids. The present results are primarily motivated by the desire to provide an information-theoretic foundation for compound Poisson approximation and associated limit theorems, analogous to the corresponding developments for the central limit theorem and for Poisson approximation. Our results also demonstrate new links between some probabilistic methods and the combinatorial notions of log-concavity and ultra-log-concavity, and they add to the growing body of work exploring the applications of maximum entropy characterizations to problems in discrete mathematics.

preprint2010arXiv

Control Variates for Reversible MCMC Samplers

A general methodology is introduced for the construction and effective application of control variates to estimation problems involving data from reversible MCMC samplers. We propose the use of a specific class of functions as control variates, and we introduce a new, consistent estimator for the values of the coefficients of the optimal linear combination of these functions. The form and proposed construction of the control variates is derived from our solution of the Poisson equation associated with a specific MCMC scenario. The new estimator, which can be applied to the same MCMC sample, is derived from a novel, finite-dimensional, explicit representation for the optimal coefficients. The resulting variance-reduction methodology is primarily applicable when the simulated data are generated by a conjugate random-scan Gibbs sampler. MCMC examples of Bayesian inference problems demonstrate that the corresponding reduction in the estimation variance is significant, and that in some cases it can be quite dramatic. Extensions of this methodology in several directions are given, including certain families of Metropolis-Hastings samplers and hybrid Metropolis-within-Gibbs algorithms. Corresponding simulation examples are presented illustrating the utility of the proposed methods. All methodological and asymptotic arguments are rigorously justified under easily verifiable and essentially minimal conditions.

preprint2010arXiv

Notes on Using Control Variates for Estimation with Reversible MCMC Samplers

A general methodology is presented for the construction and effective use of control variates for reversible MCMC samplers. The values of the coefficients of the optimal linear combination of the control variates are computed, and adaptive, consistent MCMC estimators are derived for these optimal coefficients. All methodological and asymptotic arguments are rigorously justified. Numerous MCMC simulation examples from Bayesian inference applications demonstrate that the resulting variance reduction can be quite dramatic.

preprint2009arXiv

Thinning, Entropy and the Law of Thin Numbers

Renyi's "thinning" operation on a discrete random variable is a natural discrete analog of the scaling operation for continuous random variables. The properties of thinning are investigated in an information-theoretic context, especially in connection with information-theoretic inequalities related to Poisson approximation results. The classical Binomial-to-Poisson convergence (sometimes referred to as the "law of small numbers" is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is provided for this limit, and nonasymptotic bounds are also established. This development parallels, in part, the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem. In particular, a "thinning Markov chain" is introduced, and it is shown to play a role analogous to that of the Ornstein-Uhlenbeck process in connection to the entropy power inequality.

preprint2008arXiv

On the entropy and log-concavity of compound Poisson measures

Motivated, in part, by the desire to develop an information-theoretic foundation for compound Poisson approximation limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation), this work examines sufficient conditions under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. We show that the natural analog of the Poisson maximum entropy property remains valid if the measures under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. The proofs are largely based on ideas related to the semigroup approach introduced in recent work by Johnson for the Poisson family. Sufficient conditions are given for compound distributions to be log-concave, and specific examples are presented illustrating all the above results.

preprint2004arXiv

Entropy and the Law of Small Numbers

Two new information-theoretic methods are introduced for establishing Poisson approximation inequalities. First, using only elementary information-theoretic techniques it is shown that, when $S_n=\sum_{i=1}^nX_i$ is the sum of the (possibly dependent) binary random variables $X_1,X_2,...,X_n$, with $E(X_i)=p_i$ and $E(S_n)=\la$, then \ben D(P_{S_n}\|\Pol)\leq \sum_{i=1}^n p_i^2 + \Big[\sum_{i=1}^nH(X_i) - H(X_1,X_2,..., X_n)\Big], \een where $D(P_{S_n}\|{Po}(\la))$ is the relative entropy between the distribution of $S_n$ and the Poisson($\la$) distribution. The first term in this bound measures the individual smallness of the $X_i$ and the second term measures their dependence. A general method is outlined for obtaining corresponding bounds when approximating the distribution of a sum of general discrete random variables by an infinitely divisible distribution. Second, in the particular case when the $X_i$ are independent, the following sharper bound is established, \ben D(P_{S_n}\|\Pol)\leq \frac{1}λ \sum_{i=1}^n \frac{p_i^3}{1-p_i}, % \label{eq:abs2} \een and it is also generalized to the case when the $X_i$ are general integer-valued random variables. Its proof is based on the derivation of a subadditivity property for a new discrete version of the Fisher information, and uses a recent logarithmic Sobolev inequality for the Poisson distribution.