Researcher profile

Peter Harremoës

Peter Harremoës contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works · 0 followers · 8 topics · 4 close collaborators

Published work

14 published items

Preprint · 2022 · arXiv

Unnormalized Measures in Information Theory

Information theory is built on probability measures, and by definition a probability measure has total mass 1. Probability measures are used to model uncertainty, and one may ask how important it is that the total mass is one. We claim that the main reason to normalize measures is that probability measures are related to codes via Kraft's inequality. Using a minimum description length approach to statistics, we will demonstrate that measures that are not normalized require a new interpretation, which we call the Poisson interpretation. With the Poisson interpretation, many problems can be simplified. The focus will shift from probabilities to mean values. We give examples of improvements of test procedures, improved inequalities, simplified algorithms, new projection results, and improvements in our description of quantum systems.
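A minimal sketch of one way to read the Poisson interpretation, assuming (our reading, not necessarily the paper's formulation) that each entry of an unnormalized measure is treated as the mean of an independent Poisson count. Under that assumption the information divergence between the two Poisson models reduces to a sum of mu_i log(mu_i/nu_i) - mu_i + nu_i, which collapses to ordinary KL divergence when both measures have mass 1. The function names and the example measures are hypothetical.

```python
import math


def generalized_kl(mu, nu):
    """Divergence between unnormalized measures mu and nu, in nats.

    Each term is mu_i*log(mu_i/nu_i) - mu_i + nu_i. The correction terms
    -mu_i + nu_i sum to zero when both measures have total mass 1.
    Assumes nu_i > 0 wherever mu_i > 0.
    """
    total = 0.0
    for m, n in zip(mu, nu):
        if m > 0:
            total += m * math.log(m / n)
        total += n - m
    return total


def poisson_kl(lam, mean, kmax=200):
    """D(Poisson(lam) || Poisson(mean)) by direct, truncated summation; lam, mean > 0."""
    total = 0.0
    for k in range(kmax):
        log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
        log_q = k * math.log(mean) - mean - math.lgamma(k + 1)
        total += math.exp(log_p) * (log_p - log_q)
    return total


mu = [0.5, 2.0, 1.2]  # hypothetical unnormalized measure (mass 3.7)
nu = [1.0, 1.5, 0.8]  # hypothetical reference measure (mass 3.3)
print(generalized_kl(mu, nu))
print(sum(poisson_kl(m, n) for m, n in zip(mu, nu)))  # should agree with the line above
```

The two printed values agree because the divergence between products of independent Poisson distributions is the sum of the per-coordinate divergences, each of which has the closed form used in `generalized_kl`.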

Preprint · 2020 · arXiv

Bounds on the Information Divergence for Hypergeometric Distributions

The hypergeometric distributions have many important applications, but they have not received sufficient attention in information theory. Hypergeometric distributions can be approximated by binomial distributions or Poisson distributions. In this paper we present upper and lower bounds on information divergence. These bounds are important for statistical testing and for a better understanding of the notion of exchangeability.
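The abstract does not state the bounds themselves, so the sketch below only computes the kind of quantity they concern, assuming (our assumption) that the divergence of interest is from a hypergeometric distribution to its binomial approximation with the same marked fraction. Drawing n of N items without replacement, K of them marked, gives the hypergeometric law; drawing with replacement gives the binomial law. All function names are hypothetical.

```python
from math import comb, log


def hypergeom_pmf(k, N, K, n):
    """P(k marked items) when drawing n of N items without replacement, K marked."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """P(k successes) in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def kl_hypergeom_binom(N, K, n):
    """D(Hypergeometric(N, K, n) || Binomial(n, K/N)) in nats; requires 0 < K < N."""
    p = K / N
    total = 0.0
    # Sum over the support of the hypergeometric distribution only.
    for k in range(max(0, n - (N - K)), min(n, K) + 1):
        h = hypergeom_pmf(k, N, K, n)
        if h > 0:
            total += h * log(h / binom_pmf(k, n, p))
    return total


for N in (20, 100, 1000):
    print(N, kl_hypergeom_binom(N, N // 2, 10))
```

With the sample size fixed at 10, the printed divergence shrinks as the population grows, which is the regime where the binomial approximation is expected to be good.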

Preprint · 2020 · arXiv

From Thermodynamic Sufficiency to Information Causality

The principle called information causality has been used to deduce Tsirelson's bound. In this paper we derive information causality from monotonicity of divergence and relate it to more basic principles concerning measurements on thermodynamic systems. This principle is more fundamental in the sense that it can be formulated for both unipartite and multipartite systems, while information causality is only defined for multipartite systems. Thermodynamic sufficiency is a strong condition that puts such severe restrictions on the shape of the state space that we conjecture that, under very weak regularity conditions, it can be used to deduce the complex Hilbert space formalism of quantum theory. Since the notion of sufficiency is relevant for all convex optimization problems, there are many examples where it does not apply.

Preprint · 2015 · arXiv

Lattices with non-Shannon Inequalities

We study the existence or absence of non-Shannon inequalities for variables that are related by functional dependencies. Although the power set on four variables is the smallest Boolean lattice with non-Shannon inequalities, there exist lattices with many more variables without non-Shannon inequalities. We search for conditions that ensure that no non-Shannon inequalities exist. It is demonstrated that neither 3-dimensional distributive lattices nor planar modular lattices can have non-Shannon inequalities. The existence of non-Shannon inequalities is related to the question of whether a lattice is isomorphic to a lattice of subgroups of a group.

Preprint · 2014 · arXiv

Minimum KL-divergence on complements of $L_1$ balls

Pinsker's widely used inequality upper-bounds the total variation distance $\|P-Q\|_1$ in terms of the Kullback-Leibler divergence $D(P\|Q)$. Although a bound in the reverse direction is impossible in general, in many applications the quantity of interest is actually $D^*(P,\varepsilon)$, defined, for an arbitrary fixed $P$, as the infimum of $D(P\|Q)$ over all distributions $Q$ that are $\varepsilon$-far away from $P$ in total variation. We show that $D^*(P,\varepsilon)\le C\varepsilon^2 + O(\varepsilon^3)$, where $C=C(P)=1/2$ for "balanced" distributions, thereby providing a kind of reverse Pinsker inequality. An application to large deviations is given, and some of the structural results may be of independent interest.

Keywords: Pinsker inequality, Sanov's theorem, large deviations
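A small numerical check of the reverse-Pinsker behaviour for two-point distributions, assuming the distance is measured by the $L_1$ norm as in the abstract. For binary P and Q, the $L_1$ distance is 2|p - q| and D(P||Q) increases as q moves away from p, so the infimum over the complement of the ball is attained at q = p ± ε/2. This binary case is our simplification; the paper treats general P. Function names are hypothetical.

```python
import math


def kl_bernoulli(p, q):
    """D(Ber(p) || Ber(q)) in nats, for 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def d_star_binary(p, eps):
    """Infimum of D(P||Q) over binary Q with ||P - Q||_1 >= eps.

    The derivative of D in q is (q - p) / (q (1 - q)), so D grows as q moves
    away from p and the infimum sits on the boundary q = p +/- eps/2.
    """
    candidates = [q for q in (p - eps / 2, p + eps / 2) if 0 < q < 1]
    return min(kl_bernoulli(p, q) for q in candidates)


for eps in (0.1, 0.01, 0.001):
    print(eps, d_star_binary(0.5, eps) / eps**2)  # ratio approaches 1/2
```

For the balanced case p = 1/2 the divergence equals -log(1 - ε²)/2, so the printed ratio tends to the constant 1/2 quoted in the abstract; Pinsker's inequality guarantees the ratio never drops below 1/2.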

Preprint · 2014 · arXiv

Mutual information of Contingency Tables and Related Inequalities

For testing independence it is very popular to use either the $\chi^2$-statistic or the $G^2$-statistic (mutual information). Asymptotically both are $\chi^2$-distributed, so an obvious question is which of the two statistics has a distribution that is closest to the $\chi^2$-distribution. Surprisingly, the distribution of mutual information is much better approximated by a $\chi^2$-distribution than is the distribution of the $\chi^2$-statistic. For technical reasons we shall focus on the simplest case with one degree of freedom. We introduce the signed log-likelihood and demonstrate that its distribution function can be related to the distribution function of a standard Gaussian by inequalities. For the hypergeometric distribution we formulate a general conjecture about how close the signed log-likelihood is to a standard Gaussian, and this conjecture gives much more accurate estimates of the tail probabilities of this type of distribution than previously published results. The conjecture has been proved numerically in all cases relevant for testing independence, and further evidence of its validity is given.
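A minimal sketch of the two statistics for a contingency table, illustrating the standard identity $G^2 = 2N\,I(X;Y)$ with mutual information in nats. For a 2×2 table, the signed log-likelihood is taken here to be ±√G², with the sign convention (positive when the top-left count exceeds its expected value) being our assumption. Function names and the table are hypothetical.

```python
import math


def independence_stats(table):
    """Return (G2, chi2, mutual_information) for a 2-D contingency table.

    Mutual information is in nats and is computed from the empirical joint
    distribution, so G2 == 2 * N * mutual_information. Assumes every row and
    column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g2 = chi2 = mi = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:
                g2 += 2 * observed * math.log(observed / expected)
                # p_ij / (p_i. * p_.j) equals observed / expected
                mi += (observed / n) * math.log(observed / expected)
    return g2, chi2, mi


def signed_log_likelihood(table):
    """Signed square root of G2 for a 2x2 table (one degree of freedom).

    Sign convention (an assumption): positive when the top-left count
    exceeds its expected count under independence.
    """
    g2, _, _ = independence_stats(table)
    n = sum(sum(row) for row in table)
    expected_11 = sum(table[0]) * (table[0][0] + table[1][0]) / n
    sign = 1.0 if table[0][0] >= expected_11 else -1.0
    return sign * math.sqrt(max(g2, 0.0))  # guard against tiny negative rounding


table = [[30, 10], [10, 30]]  # hypothetical 2x2 counts
print(independence_stats(table))
print(signed_log_likelihood(table))
```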

Preprint · 2014 · arXiv

Rényi Divergence and Kullback-Leibler Divergence

Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $\sigma$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
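For discrete distributions, the Rényi divergence of order α ≠ 1 is log(Σ p_i^α q_i^(1-α))/(α - 1), with order 0 equal to -log Q(P > 0). The sketch below, a minimal illustration assuming q_i > 0 wherever p_i > 0, shows numerically that the order-1 limit recovers Kullback-Leibler divergence and that the divergence is nondecreasing in its order. The function names are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(P||Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi(p, q, alpha):
    """Rényi divergence of order alpha >= 0, in nats."""
    if alpha == 1:
        return kl(p, q)
    if alpha == 0:
        # Order 0: minus the log of the Q-probability of the support of P.
        return -math.log(sum(qi for pi, qi in zip(p, q) if pi > 0))
    s = sum(pi**alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


p = [0.5, 0.3, 0.2]  # hypothetical distributions
q = [0.2, 0.5, 0.3]
for alpha in (0, 0.5, 0.999999, 1, 2):
    print(alpha, renyi(p, q, alpha))
```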

Preprint · 2013 · arXiv

Extendable MDL

In this paper we show that the combination of the minimum description length principle and an exchangeability condition leads directly to the use of Jeffreys prior. This approach works in most cases, even when Jeffreys prior cannot be normalized. Kraft's inequality links codes and distributions, but a closer look at this inequality demonstrates that this link only makes sense when sequences are considered as prefixes of potentially longer sequences. For technical reasons, only results for exponential families are stated. Results on when Jeffreys prior can be normalized after conditioning on an initializing string are given. An exotic case where no initial string allows Jeffreys prior to be normalized is given, and some ways of handling such exotic cases are discussed.
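A standard textbook illustration (not taken from the paper) of normalizing Jeffreys prior by conditioning on an initializing string: for the Poisson family, Jeffreys prior is proportional to λ^(-1/2) and has infinite mass, but after one or more observed counts the posterior is a proper Gamma distribution. The resulting predictive distribution is negative binomial, which yields a sequential code for the remaining counts. The function names and the count sequence are hypothetical.

```python
import math


def jeffreys_poisson_predictive(k, total, n):
    """P(next count = k | n earlier counts summing to total), Jeffreys prior.

    Jeffreys prior for the Poisson mean is proportional to lam**-0.5, which
    has infinite mass. After n >= 1 observations the posterior is Gamma with
    shape total + 1/2 and rate n, which is proper, and the predictive law is
    negative binomial.
    """
    if n < 1:
        raise ValueError("condition on at least one observation first")
    r = total + 0.5
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(n / (n + 1)) - k * math.log(n + 1))
    return math.exp(log_p)


def code_length_bits(counts):
    """Bits to encode counts[1:] sequentially, using counts[0] as the initializing string."""
    total, n, bits = counts[0], 1, 0.0
    for k in counts[1:]:
        bits -= math.log2(jeffreys_poisson_predictive(k, total, n))
        total += k
        n += 1
    return bits


print(code_length_bits([2, 3, 1, 4, 0]))  # hypothetical count sequence
```

In this family any single initial count makes the posterior proper; the abstract's exotic case, where no initial string suffices, is not covered by this sketch.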

Preprint · 2012 · arXiv

Information Divergence is more chi squared distributed than the chi squared statistics

For testing goodness of fit it is very popular to use either the chi-square statistic or the G-statistic (information divergence). Asymptotically both are chi-square distributed, so an obvious question is which of the two statistics has a distribution that is closest to the chi-square distribution. Surprisingly, when there is only one degree of freedom, the distribution of information divergence seems to be much better approximated by a chi-square distribution than is the distribution of the chi-square statistic. For random variables we introduce a new transformation that transforms several important distributions into new random variables that are almost Gaussian. For binomial and Poisson distributions we formulate a general conjecture about how close their transforms are to the Gaussian. The conjecture is proved for Poisson distributions.
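A sketch for the Poisson case, assuming the transformation is the signed square root of twice the divergence, G(x) = sign(x - λ)·√(2(x log(x/λ) - x + λ)), i.e. a signed log-likelihood. The sketch tabulates the exact Poisson distribution function next to Gaussian approximations based on G and on the Pearson z-score (x - λ)/√λ; it is only a comparison and does not reproduce the paper's conjecture or proof. Function names are hypothetical.

```python
import math


def poisson_signed_log_likelihood(x, lam):
    """G(x) = sign(x - lam) * sqrt(2 * D), where D = x*log(x/lam) - x + lam."""
    d = (x * math.log(x / lam) if x > 0 else 0.0) - x + lam
    return math.copysign(math.sqrt(2 * max(d, 0.0)), x - lam)


def std_normal_cdf(z):
    """Distribution function of the standard Gaussian."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(j * math.log(lam) - lam - math.lgamma(j + 1)) for j in range(k + 1))


lam = 5.0  # hypothetical Poisson mean
for k in range(11):
    exact = poisson_cdf(k, lam)
    via_g = std_normal_cdf(poisson_signed_log_likelihood(k, lam))
    via_pearson = std_normal_cdf((k - lam) / math.sqrt(lam))
    print(k, round(exact, 4), round(via_g, 4), round(via_pearson, 4))
```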

Preprint · 2010 · arXiv

Joint Range of f-divergences

We provide a general method for evaluation of the joint range of f-divergences for two different functions f. Via topological arguments we prove that the joint range for general distributions equals the convex hull of the joint range achieved by the distributions on a two-element set. The joint range technique provides important inequalities between different f-divergences with various applications in information theory and statistics.
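The two-element reduction in the abstract suggests a simple way to visualize a joint range: sample pairs of binary distributions and record two f-divergences for each pair. The sketch below does this for total variation (f(t) = |t - 1|/2, half the L1 distance) and Kullback-Leibler divergence (f(t) = t log t). According to the abstract, the general joint range is the convex hull of such binary points; this is only a discretized picture, not the paper's evaluation method. Function names are hypothetical, and Q is assumed to have full support.

```python
import math


def f_divergence(p, q, f):
    """D_f(P||Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def f_kl(t):
    """Generator of Kullback-Leibler divergence, with 0*log(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0


def f_tv(t):
    """Generator of total variation distance (half the L1 norm)."""
    return 0.5 * abs(t - 1)


def binary_joint_range(steps=50):
    """(TV, KL) pairs for binary P on a grid and binary Q with full support."""
    points = []
    for i in range(steps + 1):
        for j in range(1, steps):
            p = (i / steps, 1 - i / steps)
            q = (j / steps, 1 - j / steps)
            points.append((f_divergence(p, q, f_tv), f_divergence(p, q, f_kl)))
    return points


points = binary_joint_range()
print(len(points), min(kl for _, kl in points), max(tv for tv, _ in points))
```

Every sampled point lies above the curve KL = 2·TV², which is Pinsker's inequality in this normalization; tighter lower boundaries of the joint range are what a method like the paper's is designed to find.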

Preprint · 2010 · arXiv

On Bahadur Efficiency of Power Divergence Statistics

It is proved that the information divergence statistic is infinitely more Bahadur efficient than the power divergence statistics of the orders $\alpha>1$, as long as the sequence of alternatives is contiguous with respect to the sequence of null hypotheses and the number of observations per bin does not increase to infinity too slowly. This improves the earlier result of Harremoës and Vajda (2008), where the sequence of null hypotheses was assumed to be uniform and the restrictions on the numbers of observations per bin were sharper. Moreover, this paper also evaluates the Bahadur efficiency of the power divergence statistics of the remaining positive orders $0<\alpha\le 1$. The statistics of these orders are mutually Bahadur-comparable, and all of them are more Bahadur efficient than the statistics of the orders $\alpha>1$. A detailed discussion of the technical definitions and conditions is given, some unclear points are resolved, and the results are illustrated by examples.
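A sketch of the power divergence statistics, assuming the common normalization D_α(P, Q) = (Σ p_i^α q_i^(1-α) - 1)/(α(α - 1)) and the statistic 2n·D_α(P̂_n, Q), where P̂_n is the empirical distribution. With this normalization, order 1 gives the information divergence (G) statistic and order 2 gives Pearson's chi-square. In the Cressie-Read parametrization, which `scipy.stats.power_divergence` uses through its `lambda_` argument, this should correspond to lambda_ = α - 1. The function names and counts are hypothetical, and the paper's Bahadur-efficiency analysis is not reproduced.

```python
import math


def power_divergence(p, q, alpha):
    """D_alpha(P, Q) for alpha > 0; alpha = 1 is the Kullback-Leibler limit."""
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi**alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return (s - 1) / (alpha * (alpha - 1))


def power_divergence_statistic(counts, null_probs, alpha):
    """2 n D_alpha(empirical distribution, null distribution)."""
    n = sum(counts)
    empirical = [c / n for c in counts]
    return 2 * n * power_divergence(empirical, null_probs, alpha)


counts = [18, 22, 30, 30]  # hypothetical bin counts
null = [0.25] * 4          # hypothetical uniform null hypothesis
for alpha in (0.5, 1, 2, 3):
    print(alpha, power_divergence_statistic(counts, null, alpha))
```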

Preprint · 2010 · arXiv

Rényi Divergence and Majorization

Rényi divergence is related to Rényi entropy much like information divergence (also called Kullback-Leibler divergence or relative entropy) is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as information divergence. We review the most important properties of Rényi divergence, including its relation to some other distances. We show how Rényi divergence appears when the theory of majorization is generalized from the finite to the continuous setting. Finally, Rényi divergence plays a role in analyzing the number of binary questions required to guess the values of a sequence of random variables.
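In the finite setting, P majorizes Q when, after sorting both in decreasing order, every partial sum of P is at least the corresponding partial sum of Q (with equal totals). A standard consequence is that the Rényi divergence from the uniform distribution, which equals log n minus the Rényi entropy, is at least as large for P as for Q at every order. The sketch checks this on a hypothetical pair; it covers only the finite case, not the continuous generalization described in the abstract. Function names are hypothetical.

```python
import math


def majorizes(p, q, tol=1e-12):
    """True if p majorizes q (same length, same total mass)."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    partial_p = partial_q = 0.0
    for a, b in zip(ps, qs):
        partial_p += a
        partial_q += b
        if partial_p < partial_q - tol:
            return False
    return abs(partial_p - partial_q) <= tol


def renyi_to_uniform(p, alpha):
    """D_alpha(P || U) in nats, which equals log(n) - H_alpha(P)."""
    n = len(p)
    if alpha == 1:
        return math.log(n) + sum(x * math.log(x) for x in p if x > 0)
    return math.log(n) + math.log(sum(x**alpha for x in p if x > 0)) / (alpha - 1)


P = [0.7, 0.2, 0.1]  # hypothetical distributions
Q = [0.5, 0.3, 0.2]
print(majorizes(P, Q), majorizes(Q, P))
for alpha in (0.5, 1, 2, 3):
    print(alpha, renyi_to_uniform(P, alpha), renyi_to_uniform(Q, alpha))
```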