Source author record

Peter Harremoës

Peter Harremoës appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory · math.IT · math.ST · Statistics Theory · math.PR · Machine Learning · math.HO · q-fin.PM · quant-ph

Catalog footprint

What is connected

17 works

9 topics

4 close collaborators


preprint · 2022 · arXiv

Unnormalized Measures in Information Theory

Information theory is built on probability measures, and by definition a probability measure has total mass 1. Probability measures are used to model uncertainty, and one may ask how important it is that the total mass is one. We claim that the main reason to normalize measures is that probability measures are related to codes via Kraft's inequality. Using a minimum description length approach to statistics, we demonstrate that measures that are not normalized require a new interpretation, which we call the Poisson interpretation. With the Poisson interpretation many problems can be simplified, and the focus shifts from probabilities to mean values. We give examples of improved test procedures, improved inequalities, simplified algorithms, new projection results, and improvements in our description of quantum systems.
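Kraft's inequality is the link between measures and codes that the abstract points to. The following is a minimal standard-library sketch, with hypothetical helper names. Rounding $-\log_2 w_i$ up to an integer length $l_i$ gives $2^{-l_i} \le w_i$, so the Kraft sum is at most the total mass. A measure with mass at most 1 therefore always yields valid prefix-code lengths, while a measure with larger mass may not.

```python
import math


def shannon_lengths(weights):
    """Code lengths l_i = ceil(-log2 w_i) for positive weights w_i (hypothetical helper)."""
    return [math.ceil(-math.log2(w)) for w in weights]


def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality, sum_i 2**(-l_i); a prefix code with
    these lengths exists if and only if the sum is <= 1."""
    return sum(2.0 ** -l for l in lengths)


prob = [0.5, 0.25, 0.125, 0.125]  # probability measure, total mass 1
sub = [0.4, 0.2, 0.1]              # sub-normalized measure, total mass 0.7
over = [0.9, 0.8, 0.6]             # total mass 2.3 > 1

for name, w in [("prob", prob), ("sub", sub), ("over", over)]:
    print(name, shannon_lengths(w), kraft_sum(shannon_lengths(w)))
```

Here `prob` gets lengths [1, 2, 3, 3] with Kraft sum 1.0, `sub` gets [2, 3, 4] with sum 0.4375, and `over` gets [1, 1, 1] with sum 1.5. The last set of lengths cannot belong to any prefix code.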

preprint · 2020 · arXiv

Bounds on the Information Divergence for Hypergeometric Distributions

Hypergeometric distributions have many important applications, but they have not received sufficient attention in information theory. Hypergeometric distributions can be approximated by binomial distributions or Poisson distributions. In this paper we present upper and lower bounds on information divergence. These bounds are important for statistical testing and for a better understanding of the notion of exchangeability.
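The following sketch is a rough numerical illustration only and does not reproduce the paper's bounds. It computes the information divergence of a hypergeometric distribution from the binomial distribution with the same number of draws and the same success fraction. All function names are hypothetical. As the population grows with the success fraction held fixed, the divergence shrinks toward zero. This is the regime in which the binomial approximation is accurate.

```python
import math


def hypergeom_pmf(N, K, n):
    """Successes in n draws without replacement from N items, K of them successes."""
    total = math.comb(N, n)
    # math.comb returns 0 when k > K or n - k > N - K, so impossible counts get mass 0
    return [math.comb(K, k) * math.comb(N - K, n - k) / total for k in range(n + 1)]


def binom_pmf(n, p):
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def kl(p, q):
    """Information divergence D(p||q) in nats; assumes q_k > 0 wherever p_k > 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def hypergeom_binom_divergence(N, K, n):
    return kl(hypergeom_pmf(N, K, n), binom_pmf(n, K / N))


for N in (20, 200, 2000):
    print(N, hypergeom_binom_divergence(N, N // 2, 5))
```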

preprint · 2020 · arXiv

From Thermodynamic Sufficiency to Information Causality

The principle called information causality has been used to deduce Tsirelson's bound. In this paper we derive information causality from monotonicity of divergence and relate it to a more basic principle, thermodynamic sufficiency, concerning measurements on thermodynamic systems. This principle is more fundamental in the sense that it can be formulated for both unipartite and multipartite systems, while information causality is only defined for multipartite systems. Thermodynamic sufficiency is a strong condition that puts severe restrictions on the shape of the state space. We conjecture that these restrictions are severe enough that, under very weak regularity conditions, the condition can be used to deduce the complex Hilbert space formalism of quantum theory. Since the notion of sufficiency is relevant for all convex optimization problems, there are many examples where it does not apply.

preprint · 2016 · arXiv

Sufficiency on the Stock Market

It is well known that there are a number of relations between theoretical finance and information theory. Some of these relations are exact and some are approximate. In this paper we explore some of these relations and determine under which conditions they are exact. It turns out that portfolio theory always leads to Bregman divergences. The Bregman divergence is proportional to information divergence only in situations that are essentially equivalent to the type of gambling studied by Kelly. This can be related to an abstract sufficiency condition.
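The Kelly case can be checked with the textbook horse-race model. This model is an assumption chosen for illustration and is not the paper's general portfolio framework. Suppose a gambler bets the whole wealth, split in proportion to beliefs q, while the true win probabilities are p. The loss in expected log-growth relative to betting p then equals D(p||q) for any odds. The helper names below are hypothetical.

```python
import math


def growth_rate(p, b, odds):
    """Expected log-growth W(b) = sum_i p_i * ln(b_i * o_i) when a fraction b_i of
    wealth is bet on horse i at o_i-for-1 odds and the whole wealth is bet."""
    return sum(pi * math.log(bi * oi) for pi, bi, oi in zip(p, b, odds) if pi > 0)


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]   # true win probabilities
q = [0.4, 0.4, 0.2]   # mis-specified beliefs, used as betting fractions

for odds in ([2.0, 3.0, 6.0], [1.5, 4.0, 10.0]):
    regret = growth_rate(p, p, odds) - growth_rate(p, q, odds)
    print(odds, regret, kl(p, q))  # the two numbers agree for every choice of odds
```

The odds cancel in the difference because $\ln(p_i o_i) - \ln(q_i o_i) = \ln(p_i/q_i)$.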

preprint · 2016 · arXiv

Thinning and Information Projections

In this paper we establish lower bounds on the information divergence of a distribution on the integers from a Poisson distribution. These lower bounds are tight, and in the cases where a rate of convergence in the Law of Thin Numbers can be computed, the rate is determined by the lower bounds proved in this paper. General techniques for obtaining lower bounds in terms of moments are developed. The results about lower bounds in the Law of Thin Numbers are used to derive similar results for the Central Limit Theorem.
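The Law of Thin Numbers can be observed numerically. In the following standard-library sketch, all function names are hypothetical. A distribution P on {0, 1, 2} is convolved with itself n times and then thinned by 1/n, which keeps the mean fixed. The information divergence from the Poisson distribution with that mean then decreases as n grows. The sketch only illustrates convergence and does not compute the paper's lower bounds.

```python
import math


def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def thin(p, alpha):
    """alpha-thinning: each of the j counted units is kept independently with prob. alpha."""
    out = [0.0] * len(p)
    for j, pj in enumerate(p):
        for k in range(j + 1):
            out[k] += pj * math.comb(j, k) * alpha**k * (1 - alpha) ** (j - k)
    return out


def poisson_pmf(lam, k):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def divergence_from_poisson(p):
    """D(P || Poisson(mean of P)) in nats."""
    lam = sum(k * pk for k, pk in enumerate(p))
    return sum(pk * math.log(pk / poisson_pmf(lam, k)) for k, pk in enumerate(p) if pk > 0)


def thinned_sum(p, n):
    """n-fold convolution of p, thinned by 1/n (the mean stays equal to that of p)."""
    s = [1.0]
    for _ in range(n):
        s = convolve(s, p)
    return thin(s, 1.0 / n)


P = [0.5, 0.3, 0.2]  # mean 0.7
for n in (1, 2, 5, 20, 50):
    print(n, divergence_from_poisson(thinned_sum(P, n)))
```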

preprint · 2015 · arXiv

Lattices with non-Shannon Inequalities

We study the existence or absence of non-Shannon inequalities for variables that are related by functional dependencies. Although the power set on four variables is the smallest Boolean lattice with non-Shannon inequalities, there exist lattices with many more variables that have no non-Shannon inequalities. We search for conditions that ensure that no non-Shannon inequalities exist. It is demonstrated that neither 3-dimensional distributive lattices nor planar modular lattices can have non-Shannon inequalities. The existence of non-Shannon inequalities is related to the question of whether a lattice is isomorphic to a lattice of subgroups of a group.
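The four-variable case mentioned in the abstract is where the first known non-Shannon inequality, due to Zhang and Yeung, applies: 2I(C;D) ≤ I(A;B) + I(A;CD) + 3I(C;D|A) + I(C;D|B). The following Python sketch uses hypothetical helper names. It computes the entropies of marginals of a joint distribution of four binary variables and checks the inequality on random distributions. A check of this kind only confirms that the inequality holds on the samples. It cannot show that the inequality fails to follow from Shannon's inequalities, which is the property that makes it non-Shannon.

```python
import itertools
import math
import random


def entropy(joint, subset):
    """Shannon entropy (bits) of the marginal of `joint` on the variable indices in `subset`."""
    marginal = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marginal.values() if pr > 0)


def cond_mi(joint, x, y, z=()):
    """I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z); x, y, z are tuples of indices."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))


def zhang_yeung_gap(joint):
    """Right-hand side minus left-hand side of the Zhang-Yeung inequality."""
    A, B, C, D = (0,), (1,), (2,), (3,)
    rhs = (cond_mi(joint, A, B) + cond_mi(joint, A, C + D)
           + 3 * cond_mi(joint, C, D, A) + cond_mi(joint, C, D, B))
    return rhs - 2 * cond_mi(joint, C, D)


def random_joint(rng):
    """Random joint distribution of four binary variables with full support."""
    outcomes = list(itertools.product((0, 1), repeat=4))
    weights = [rng.random() + 1e-9 for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


rng = random.Random(0)
print(min(zhang_yeung_gap(random_joint(rng)) for _ in range(1000)))  # expected >= 0
```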

preprint · 2015 · arXiv

Proper Scoring and Sufficiency

Logarithmic score and information divergence appear in information theory, statistics, statistical mechanics, and portfolio theory. We demonstrate that all these topics involve some kind of optimization that leads directly to the use of Bregman divergences. If a sufficiency condition is also fulfilled, the Bregman divergence must be proportional to information divergence. The sufficiency condition has quite different consequences in the different areas of application, and often it is not fulfilled. Therefore the sufficiency condition can be used to explain when results can be transferred directly from one area to another and when differences will appear.
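A Bregman divergence is generated by a convex function F via $D_F(p,q) = F(p) - F(q) - \langle \nabla F(q), p - q\rangle$. The following sketch uses hypothetical helper names. Negative Shannon entropy, which corresponds to the logarithmic score, generates information divergence on probability vectors. The squared Euclidean norm, which corresponds to the quadratic (Brier) score, generates the squared Euclidean distance. For unnormalized vectors, the entropy generator gives $\sum p\ln(p/q) - \sum p + \sum q$ instead.

```python
import math


def bregman(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for a differentiable convex F."""
    return F(p) - F(q) - sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))


def neg_entropy(p):
    return sum(pi * math.log(pi) for pi in p if pi > 0)


def neg_entropy_grad(p):
    return [math.log(pi) + 1 for pi in p]  # requires every component of p > 0


def sq_norm(p):
    return sum(pi * pi for pi in p)


def sq_norm_grad(p):
    return [2 * pi for pi in p]


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.2, 0.5, 0.3]
q = [0.3, 0.3, 0.4]
print(bregman(neg_entropy, neg_entropy_grad, p, q), kl(p, q))
print(bregman(sq_norm, sq_norm_grad, p, q), sum((a - b) ** 2 for a, b in zip(p, q)))
```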

preprint · 2014 · arXiv

Minimum KL-divergence on complements of $L_1$ balls

Pinsker's widely used inequality upper-bounds the total variation distance $\|P-Q\|_1$ in terms of the Kullback-Leibler divergence $D(P\|Q)$. Although in general a bound in the reverse direction is impossible, in many applications the quantity of interest is actually $D^*(P,\varepsilon)$, defined, for an arbitrary fixed $P$, as the infimum of $D(P\|Q)$ over all distributions $Q$ that are $\varepsilon$-far away from $P$ in total variation. We show that $D^*(P,\varepsilon)\le C\varepsilon^2 + O(\varepsilon^3)$, where $C=C(P)=1/2$ for "balanced" distributions, thereby providing a kind of reverse Pinsker inequality. An application to large deviations is given, and some of the structural results may be of independent interest. Keywords: Pinsker inequality, Sanov's theorem, large deviations
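For a distribution on two points, $D^*(P,\varepsilon)$ can be computed directly, which illustrates the constant $C=1/2$ in the balanced case. As a simplifying assumption, the sketch below restricts $Q$ to the same two points and uses hypothetical names. Under this restriction $\|P-Q\|_1 = 2|p-q|$. Since $D(P\|Q)$ is convex in $q$ with its minimum at $q=p$, the infimum over the complement of the ball is attained on its boundary, $q = p \pm \varepsilon/2$. The sketch also checks Pinsker's inequality $D(P\|Q) \ge \tfrac12\|P-Q\|_1^2$ in natural logarithms.

```python
import math


def kl2(p, q):
    """D(P||Q) in nats for P = (p, 1 - p), Q = (q, 1 - q), with 0 < q < 1."""
    return sum(a * math.log(a / b) for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)


def d_star_two_point(p, eps):
    """min D(P||Q) over two-point Q with ||P - Q||_1 >= eps, taken on the boundary.
    Raises ValueError if eps is so large that no admissible q lies in (0, 1)."""
    candidates = [q for q in (p - eps / 2, p + eps / 2) if 0 < q < 1]
    return min(kl2(p, q) for q in candidates)


for eps in (0.2, 0.05, 0.01):
    d = d_star_two_point(0.5, eps)
    print(eps, d, d / eps**2)  # the ratio approaches 1/2 for the balanced P = (1/2, 1/2)
```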

preprint · 2014 · arXiv

Mutual information of Contingency Tables and Related Inequalities

For testing independence it is very popular to use either the $\chi^{2}$-statistic or the $G^{2}$-statistic (mutual information). Asymptotically both are $\chi^{2}$-distributed, so an obvious question is which of the two statistics has a distribution closer to the $\chi^{2}$-distribution. Surprisingly, the distribution of mutual information is much better approximated by a $\chi^{2}$-distribution than the distribution of the $\chi^{2}$-statistic is. For technical reasons we focus on the simplest case, with one degree of freedom. We introduce the signed log-likelihood and demonstrate that its distribution function can be related to the distribution function of a standard Gaussian by inequalities. For the hypergeometric distribution we formulate a general conjecture about how close the signed log-likelihood is to a standard Gaussian, and this conjecture gives much more accurate estimates of the tail probabilities of this type of distribution than previously published results. The conjecture has been verified numerically in all cases relevant for testing independence, and further evidence of its validity is given.
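Both statistics are easy to compute for a 2×2 table. In the following sketch, all names are hypothetical. $G^2 = 2\sum O\ln(O/E)$ equals $2N$ times the empirical mutual information in nats. For illustration, the signed log-likelihood is assumed to be $\operatorname{sign}(O_{11}-E_{11})\sqrt{G^2}$. This is the signed square root that is compared with a standard Gaussian when there is one degree of freedom.

```python
import math


def expected_counts(table):
    """Expected cell counts E_ij = (row_i total)(column_j total) / N under independence."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[r * c / n for c in cols] for r in rows]


def pearson_chi2(table):
    E = expected_counts(table)
    return sum((o - e) ** 2 / e for orow, erow in zip(table, E) for o, e in zip(orow, erow))


def g2(table):
    E = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for orow, erow in zip(table, E) for o, e in zip(orow, erow) if o > 0)


def mutual_information(table):
    """Empirical mutual information (nats) between the row and column variables."""
    n = sum(map(sum, table))
    rows = [sum(r) / n for r in table]
    cols = [sum(c) / n for c in zip(*table)]
    return sum(o / n * math.log(o / n / (rows[i] * cols[j]))
               for i, r in enumerate(table) for j, o in enumerate(r) if o > 0)


def signed_log_likelihood(table):
    """sign(O_11 - E_11) * sqrt(G^2) for a 2x2 table (one degree of freedom)."""
    diff = table[0][0] - expected_counts(table)[0][0]
    return math.copysign(math.sqrt(max(g2(table), 0.0)), diff)


t = [[10, 20], [30, 40]]
print(pearson_chi2(t), g2(t), 2 * 100 * mutual_information(t), signed_log_likelihood(t))
```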

preprint · 2014 · arXiv

Rényi Divergence and Kullback-Leibler Divergence

Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $σ$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
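For finite distributions, the Rényi divergence of order α is $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\ln\sum_i p_i^\alpha q_i^{1-\alpha}$, with order 1 defined by continuity as the Kullback-Leibler divergence. The following sketch uses natural logarithms and a hypothetical function name. It checks numerically that orders close to 1 approach the Kullback-Leibler value and that the divergence is nondecreasing in its order. Order 0 reduces to $-\ln Q(\{i : p_i>0\})$.

```python
import math


def renyi(p, q, alpha):
    """Rényi divergence of order alpha >= 0 (nats) for finite distributions with
    q_i > 0 wherever p_i > 0; order 1 is the Kullback-Leibler divergence."""
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # terms with p_i = 0 are skipped, so order 0 gives -ln Q(support of P)
    s = sum(pi**alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


p = [0.2, 0.5, 0.3]
q = [0.3, 0.3, 0.4]
orders = [0, 0.5, 1, 2, 5]
print([renyi(p, q, a) for a in orders])
print(renyi(p, q, 1 + 1e-6), renyi(p, q, 1))
```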

preprint · 2013 · arXiv

Extendable MDL

In this paper we show that the combination of the minimum description length principle and an exchangeability condition leads directly to the use of Jeffreys prior. This approach works in most cases, even when Jeffreys prior cannot be normalized. Kraft's inequality links codes and distributions, but a closer look at this inequality shows that this link only makes sense when sequences are considered as prefixes of potentially longer sequences. For technical reasons only results for exponential families are stated. Results on when Jeffreys prior can be normalized after conditioning on an initializing string are given. An exotic case where no initial string allows Jeffreys prior to be normalized is given, and some ways of handling such exotic cases are discussed.
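For the Bernoulli family, Jeffreys prior is Beta(1/2, 1/2). It is normalizable, and its sequential predictive distribution is the Krichevsky-Trofimov estimator. The following sketch uses this standard construction and hypothetical names. It is not the paper's treatment of general exponential families. The sketch checks the two properties the abstract relies on. First, exchangeability: the probability depends only on the counts, not on their order. Second, extendability: the probability of a string equals the sum of the probabilities of its two one-symbol extensions. Extendability is what makes the Kraft link consistent for prefixes.

```python
import itertools
import math
from fractions import Fraction


def kt_probability(bits):
    """Probability of a 0/1 sequence under the Bernoulli model with Jeffreys prior
    Beta(1/2, 1/2), built sequentially: P(next = b) = (count_b + 1/2) / (n + 1)."""
    prob = Fraction(1)
    counts = [0, 0]
    for b in bits:
        prob *= Fraction(2 * counts[b] + 1, 2 * sum(counts) + 2)
        counts[b] += 1
    return prob


def code_length_bits(bits):
    """Ideal code length -log2 P(bits); rounding up gives lengths satisfying Kraft."""
    return -math.log2(float(kt_probability(bits)))


print(kt_probability([1, 1]), code_length_bits([0, 1, 1, 0, 1]))
print(sum(kt_probability(s) for s in itertools.product((0, 1), repeat=3)))  # total mass 1
```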

preprint · 2012 · arXiv

Information Divergence is more chi squared distributed than the chi squared statistics

For testing goodness of fit it is very popular to use either the chi-square statistic or the G statistic (information divergence). Asymptotically both are chi-square distributed, so an obvious question is which of the two statistics has a distribution closer to the chi-square distribution. Surprisingly, when there is only one degree of freedom, the distribution of information divergence appears to be much better approximated by a chi-square distribution than the distribution of the chi-square statistic is. For random variables we introduce a new transformation that transforms several important distributions into new random variables that are almost Gaussian. For the binomial distributions and the Poisson distributions we formulate a general conjecture about how close their transforms are to the Gaussian. The conjecture is proved for Poisson distributions.
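For a single Poisson observation, the two Gaussian approximations can be compared with the exact tail. In the following sketch, the transformation is assumed to be the signed square root of twice the divergence $D = x\ln(x/\lambda) - (x-\lambda)$. This matches the signed log-likelihood of the contingency-table paper listed above, but the paper's exact definition may differ. All names are hypothetical.

```python
import math


def poisson_upper_tail(lam, x):
    """Exact P(X >= x) for X ~ Poisson(lam), x a non-negative integer."""
    return 1.0 - sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(x))


def normal_upper_tail(z):
    return 0.5 * math.erfc(z / math.sqrt(2))


def pearson_root(lam, x):
    return (x - lam) / math.sqrt(lam)


def signed_log_likelihood(lam, x):
    """sign(x - lam) * sqrt(2 D), D = x ln(x/lam) - (x - lam), the divergence of
    Poisson(x) from Poisson(lam), with 0 ln 0 taken as 0."""
    d = (x * math.log(x / lam) if x > 0 else 0.0) - (x - lam)
    return math.copysign(math.sqrt(max(2 * d, 0.0)), x - lam)


lam, x = 5.0, 10
exact = poisson_upper_tail(lam, x)
print(exact,
      normal_upper_tail(pearson_root(lam, x)),
      normal_upper_tail(signed_log_likelihood(lam, x)))
```

With λ = 5 and x = 10, the exact tail is about 0.032. The Gaussian tail at the Pearson root is about 0.013, and at the signed log-likelihood it is about 0.025. In this one case the latter is noticeably closer to the exact value.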

preprint · 2011 · arXiv

Is Zero a Natural Number?

It is argued that zero should be considered as a cardinal number but not an ordinal number. One should make a clear distinction between order types that are labels for well-ordered sets and ordinal numbers that are labels for the elements in these sets.

preprint · 2010 · arXiv

Joint Range of f-divergences

We provide a general method for evaluation of the joint range of f-divergences for two different functions f. Via topological arguments we prove that the joint range for general distributions equals the convex hull of the joint range achieved by the distributions on a two-element set. The joint range technique provides important inequalities between different f-divergences with various applications in information theory and statistics.
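An f-divergence is $D_f(P\|Q) = \sum_i q_i f(p_i/q_i)$ for a convex $f$ with $f(1)=0$. The following sketch uses hypothetical names. It tabulates the joint range of two f-divergences over pairs of distributions on a two-element set, sampled on a grid. According to the abstract, the joint range for general distributions is the convex hull of this two-point range. As a sanity check, every sampled (total variation, Kullback-Leibler) point satisfies Pinsker's inequality $D \ge 2\,\mathrm{TV}^2$.

```python
import math


def f_divergence(f, p, q):
    """D_f(P||Q) = sum_i q_i f(p_i / q_i) for finite distributions with q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def f_kl(t):
    return t * math.log(t) if t > 0 else 0.0


def f_tv(t):
    return abs(t - 1) / 2


def f_hellinger(t):
    return (math.sqrt(t) - 1) ** 2  # some authors include a factor 1/2


def two_point_joint_range(f1, f2, steps=50):
    """Grid of points (D_f1, D_f2) over pairs of distributions on a two-element set."""
    grid = [i / steps for i in range(1, steps)]
    return [(f_divergence(f1, [a, 1 - a], [b, 1 - b]),
             f_divergence(f2, [a, 1 - a], [b, 1 - b]))
            for a in grid for b in grid]


p = [0.2, 0.5, 0.3]
q = [0.3, 0.3, 0.4]
print(f_divergence(f_kl, p, q), f_divergence(f_tv, p, q), f_divergence(f_hellinger, p, q))
print(len(two_point_joint_range(f_tv, f_kl)))
```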

preprint · 2010 · arXiv

On Bahadur Efficiency of Power Divergence Statistics

It is proved that the information divergence statistic is infinitely more Bahadur efficient than the power divergence statistics of the orders $\alpha>1$, as long as the sequence of alternatives is contiguous with respect to the sequence of null hypotheses and the number of observations per bin increases to infinity not too slowly. This improves the earlier result in Harremoës and Vajda (2008), where the sequence of null hypotheses was assumed to be uniform and the restrictions on the numbers of observations per bin were sharper. Moreover, this paper also evaluates the Bahadur efficiency of the power divergence statistics of the remaining positive orders $0<\alpha\leq 1$. The statistics of these orders are mutually Bahadur-comparable, and all of them are more Bahadur efficient than the statistics of the orders $\alpha>1$. A detailed discussion of the technical definitions and conditions is given, some unclear points are resolved, and the results are illustrated by examples.
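The power divergence statistics of all orders can be written with one formula. The sketch below assumes the common parameterization $D_\alpha(P,Q) = \frac{1}{\alpha(\alpha-1)}\left(\sum_i p_i^\alpha q_i^{1-\alpha} - 1\right)$ for $\alpha>0$, $\alpha\neq 1$, with the limit at $\alpha=1$. The statistic is $T_\alpha = 2n\,D_\alpha(\hat P_n, Q)$. The paper's own normalization may differ, and the names are hypothetical. In this form, $\alpha=1$ gives the information divergence (G) statistic and $\alpha=2$ gives Pearson's $\chi^2$ statistic.

```python
import math


def power_divergence(p, q, alpha):
    """D_alpha(P, Q) for alpha > 0; alpha == 1 is the limit, i.e. information divergence."""
    if alpha <= 0:
        raise ValueError("this sketch only covers alpha > 0")
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi**alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return (s - 1) / (alpha * (alpha - 1))


def power_divergence_statistic(counts, q, alpha):
    """T_alpha = 2 n D_alpha(empirical distribution, Q) for observed bin counts."""
    n = sum(counts)
    return 2 * n * power_divergence([c / n for c in counts], q, alpha)


counts = [18, 30, 52]
q = [0.2, 0.3, 0.5]
for alpha in (0.5, 1, 2, 3):
    print(alpha, power_divergence_statistic(counts, q, alpha))
```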

preprint · 2010 · arXiv

On Pairs of $f$-divergences and their Joint Range

We compare two f-divergences and prove that their joint range is the convex hull of the joint range for distributions supported on only two points. Some applications of this result are given.

preprint · 2010 · arXiv

Rényi Divergence and Majorization

Rényi divergence is related to Rényi entropy much like information divergence (also called Kullback-Leibler divergence or relative entropy) is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as information divergence. We review the most important properties of Rényi divergence, including its relation to some other distances. We show how Rényi divergence appears when the theory of majorization is generalized from the finite to the continuous setting. Finally, Rényi divergence plays a role in analyzing the number of binary questions required to guess the values of a sequence of random variables.
