Researcher profile

Frank Nielsen

Frank Nielsen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
14 works
0 followers
8 topics
4 close collaborators

Published work

14 published items

preprint · 2022 · arXiv

A note on Onicescu's informational energy and correlation coefficient in exponential families

The informational energy of Onicescu is a positive quantity that measures the amount of uncertainty of a random variable. But contrary to Shannon's entropy, the informational energy increases when randomness decreases. We report a closed-form formula for Onicescu's informational energy and its associated correlation coefficient when the probability distributions belong to an exponential family. We show how to instantiate the generic formula for several common exponential families.
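As a small illustration of the quantity discussed in this abstract (a sketch, not the paper's generic exponential-family formula), the informational energy $I(p)=\int p(x)^2\,dx$ of a univariate Gaussian with standard deviation $\sigma$ equals $1/(2\sigma\sqrt{\pi})$. The function names below are hypothetical, and the quadrature settings are arbitrary choices made for this example.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the univariate normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def informational_energy_numeric(mu, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of the integral of p(x)^2 on mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        p = gaussian_pdf(a + (i + 0.5) * h, mu, sigma)
        total += p * p
    return total * h

def informational_energy_gaussian(sigma):
    """Closed form for a Gaussian: 1 / (2 * sigma * sqrt(pi))."""
    return 1.0 / (2.0 * sigma * math.sqrt(math.pi))
```

Comparing `informational_energy_gaussian(0.5)` with `informational_energy_gaussian(2.0)` shows the behavior described in the abstract: the more concentrated (less random) distribution has the larger informational energy.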

preprint · 2022 · arXiv

A note on some information-theoretic divergences between Zeta distributions

We consider the zeta distributions which are discrete power law distributions that can be interpreted as the counterparts of the continuous Pareto distributions with unit scale. The family of zeta distributions forms a discrete exponential family with normalizing constants expressed using the Riemann zeta function. We report several information-theoretic measures between zeta distributions and study their underlying information geometry.

preprint · 2022 · arXiv

On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models

A common way to learn and analyze statistical models is to consider operations in the model parameter space. But what happens if we optimize in the parameter space and there is no one-to-one mapping between the parameter space and the underlying statistical model space? Such cases frequently occur for hierarchical models which include statistical mixtures or stochastic neural networks, and these models are said to be singular. Singular models reveal several important and well-studied problems in machine learning like the decrease in convergence speed of learning trajectories due to attractor behaviors. In this work, we propose a relative reparameterization technique of the parameter space, which yields a general method for extracting regular submodels from singular models. Our method enforces model identifiability during training and we study the learning dynamics for gradient descent and expectation maximization for Gaussian Mixture Models (GMMs) under relative parameterization, showing faster experimental convergence and an improved manifold shape of the dynamics around the singularity. Extending the analysis beyond GMMs, we furthermore analyze the Fisher information matrix under relative reparameterization and its influence on the generalization error, and show how the method can be applied to more complex models like deep neural networks.

preprint · 2022 · arXiv

The analytic dually flat space of the mixture family of two prescribed distinct Cauchy distributions

A smooth and strictly convex function on an open convex domain induces both (1) a Hessian manifold with respect to the standard flat Euclidean connection, and (2) a dually flat space of information geometry. We first review these constructions and illustrate how to instantiate them for (a) full regular exponential families from their partition functions, (b) regular homogeneous cones from their characteristic functions, and (c) mixture families from their Shannon negentropy functions. Although these structures can be explicitly built for many common examples of the first two classes, the differential entropy of a continuous statistical mixture with distinct prescribed density components sharing the same support is hitherto not known in closed form, hence forcing implementations of mixture family manifolds in practice to use Monte Carlo sampling. In this work, we report a notable exception: the family of mixtures defined as the convex combination of two prescribed and distinct Cauchy distributions. As a byproduct, we report a closed-form formula for the Jensen-Shannon divergence between two mixtures of two prescribed Cauchy components.

preprint · 2022 · arXiv

Tractable structured natural gradient descent using local parameterizations

Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations. We address this issue by using local-parameter coordinates to obtain a flexible and efficient NGD method that works well for a wide variety of structured parameterizations. We show four applications where our method (1) generalizes the exponential natural evolutionary strategy, (2) recovers existing Newton-like algorithms, (3) yields new structured second-order algorithms via matrix groups, and (4) gives new algorithms to learn covariances of Gaussian and Wishart-based distributions. We show results on a range of problems from deep learning, variational inference, and evolution strategies. Our work opens a new direction for scalable structured geometric methods.

preprint · 2021 · arXiv

Likelihood Ratio Exponential Families

The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling. Linking these two ideas, recent work has interpreted the geometric mixture path as an exponential family of distributions to analyze the thermodynamic variational objective (TVO). We extend these likelihood ratio exponential families to include solutions to rate-distortion (RD) optimization, the information bottleneck (IB) method, and recent rate-distortion-classification approaches which combine RD and IB. This provides a common mathematical framework for understanding these methods via the conjugate duality of exponential families and hypothesis testing. Further, we collect existing results to provide a variational representation of intermediate RD or TVO distributions as minimizing an expectation of KL divergences. This solution also corresponds to a size-power tradeoff using the likelihood ratio test and the Neyman-Pearson lemma. In thermodynamic integration bounds such as the TVO, we identify the intermediate distribution whose expected sufficient statistics match the log partition function.

preprint · 2021 · arXiv

On information projections between multivariate elliptical and location-scale families

We study information projections with respect to statistical $f$-divergences between any two location-scale families. We consider a multivariate generalization of the location-scale families which includes the elliptical and the spherical subfamilies. By using the action of the multivariate location-scale group, we show how to reduce the calculation of $f$-divergences between any two location-scale densities to canonical settings involving standard densities, and derive from this reduction fast Monte Carlo estimators of $f$-divergences with good properties. Finally, we prove that the minimum $f$-divergence between a prescribed density of a location-scale family and another location-scale family is independent of the prescribed location-scale parameter. We interpret this property geometrically.

preprint · 2021 · arXiv

On the Kullback-Leibler divergence between discrete normal distributions

Discrete normal distributions are defined as the distributions with prescribed means and covariance matrices which maximize entropy on the integer lattice support. The set of discrete normal distributions forms an exponential family with cumulant function related to the Riemann theta function. In this paper, we present several formulas for common statistical divergences between discrete normal distributions including the Kullback-Leibler divergence. In particular, we describe an efficient approximation technique for calculating the Kullback-Leibler divergence between discrete normal distributions via the Rényi $α$-divergences or the projective $γ$-divergences.

preprint · 2021 · arXiv

On the Kullback-Leibler divergence between location-scale densities

We show that the $f$-divergence between any two densities of potentially different location-scale families can be reduced to the calculation of the $f$-divergence between one standard density and another location-scale density. It follows that the $f$-divergence between two scale densities depends only on the scale ratio. We then report conditions on the standard distribution to get symmetric $f$-divergences: first, we prove that all $f$-divergences between densities of a location family are symmetric whenever the standard density is even, and second, we illustrate a generic symmetric property with the calculation of the Kullback-Leibler divergence between scale Cauchy distributions. Finally, we show that the minimum $f$-divergence of any query density of a location-scale family to another location-scale family is independent of the query location-scale parameters.
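The symmetry and scale-ratio properties mentioned in this abstract can be checked numerically for Cauchy densities. The sketch below assumes the known closed form $\mathrm{KL}(p_{l_1,s_1}:p_{l_2,s_2})=\log\frac{(s_1+s_2)^2+(l_1-l_2)^2}{4 s_1 s_2}$, which is not stated in the abstract itself, and compares it with a quadrature estimate. The function names and the number of quadrature steps are choices made for this example only.

```python
import math

def cauchy_pdf(x, loc, scale):
    """Density of the Cauchy distribution with location loc and scale scale > 0."""
    return scale / (math.pi * ((x - loc) ** 2 + scale ** 2))

def kl_cauchy_closed_form(l1, s1, l2, s2):
    """Assumed closed form of the KL divergence between two Cauchy densities."""
    return math.log(((s1 + s2) ** 2 + (l1 - l2) ** 2) / (4.0 * s1 * s2))

def kl_cauchy_numeric(l1, s1, l2, s2, steps=20_000):
    """Midpoint-rule estimate of KL(p1 : p2).

    The substitution x = l1 + s1 * tan(u) maps u in (-pi/2, pi/2) onto the real
    line and turns p1(x) dx into du / pi, so the heavy tails are handled exactly.
    """
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        u = -0.5 * math.pi + (i + 0.5) * h
        x = l1 + s1 * math.tan(u)
        total += math.log(cauchy_pdf(x, l1, s1) / cauchy_pdf(x, l2, s2))
    return total * h / math.pi
```

Under these assumptions, swapping the two argument pairs leaves the value unchanged, and for equal locations the value depends only on the ratio `s1 / s2`.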

preprint · 2020 · arXiv

Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family

It is well-known that the Bhattacharyya, Hellinger, Kullback-Leibler, $α$-divergences, and Jeffreys' divergences between densities belonging to the same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family. In this work, we report (dis)similarity formulas which bypass the explicit use of the cumulant function and highlight the role of quasi-arithmetic means and their multivariate mean operator extensions. In practice, these cumulant-free formulas are handy when implementing these (dis)similarities using legacy Application Programming Interfaces (APIs), since our method requires only a partial canonical factorization of the densities of the considered exponential family.

preprint · 2020 · arXiv

k-medoids and p-median clustering are solvable in polynomial time for a 2d Pareto front

This paper examines a common extension of k-medoids and k-median clustering in the case of a two-dimensional Pareto front, as generated by bi-objective optimization approaches. A characterization of optimal clusters is provided, which makes it possible to solve the optimization problems to optimality in polynomial time using a common dynamic programming algorithm. More precisely, with $N$ points to cluster into $K$ subsets, the algorithm is proven to run in $O(N^3)$ time and $O(KN)$ memory space when $K\geqslant 3$, with the case $K=2$ having a time complexity in $O(N^2)$. Furthermore, the dynamic programming algorithm can be sped up in practice by avoiding useless computations, without improving the complexity. Parallelization issues are also discussed, to speed up the algorithm in practice.
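The dynamic programming idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes that optimal clusters are contiguous runs of points once the front is sorted by its first coordinate, and it assumes Euclidean distance; neither detail is spelled out in the abstract. The function name is hypothetical, and the sketch stores an $N \times N$ cost table, so it does not match the memory bound quoted above.

```python
import math

def kmedoids_pareto_front(points, k):
    """Return (total_cost, clusters) for k contiguous clusters of a 2D Pareto front."""
    pts = sorted(points)  # sorting by x orders the points along the front
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    # prefix[m][t] = sum of distances from candidate medoid pts[m] to pts[0..t-1]
    prefix = []
    for m in range(n):
        row = [0.0]
        for t in range(n):
            row.append(row[-1] + math.dist(pts[m], pts[t]))
        prefix.append(row)

    # cost[i][j] = best single-medoid cost of the interval pts[i..j] (inclusive)
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            cost[i][j] = min(prefix[m][j + 1] - prefix[m][i] for m in range(i, j + 1))

    # best[c][j] = minimal cost of splitting pts[0..j-1] into c intervals
    best = [[math.inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last cluster is pts[i..j-1]
                candidate = best[c - 1][i] + cost[i][j - 1]
                if candidate < best[c][j]:
                    best[c][j] = candidate
                    cut[c][j] = i

    # Backtrack the cut positions to recover the clusters
    clusters = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        clusters.append(pts[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters
```

For example, `kmedoids_pareto_front([(0, 10), (1, 9), (2, 8), (10, 2), (11, 1), (12, 0)], 2)` splits the front at its large gap. Building the interval costs dominates the running time at $O(N^3)$ in this sketch.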

preprint · 2020 · arXiv

On Voronoi diagrams and dual Delaunay complexes on the information-geometric Cauchy manifolds

We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi square divergence, and a flat divergence derived from Tsallis' quadratic entropy related to the conformal flattening of the Fisher-Rao curved geometry. We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi square divergence, and the Kullback-Leibler divergence all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to the dual forward/reverse flat divergences amount to dual Bregman Voronoi diagrams, and their dual complexes are regular triangulations. The primal Bregman-Tsallis Voronoi diagram corresponds to the hyperbolic Voronoi diagram and the dual Bregman-Tsallis Voronoi diagram coincides with the ordinary Euclidean Voronoi diagram. Besides, we prove that the square root of the Kullback-Leibler divergence between Cauchy distributions yields a metric distance which is Hilbertian for the Cauchy scale families.

preprint · 2020 · arXiv

Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances

Distances between probability distributions that take into account the geometry of their sample space, like the Wasserstein or the Maximum Mean Discrepancy (MMD) distances, have received a lot of attention in machine learning as they can, for instance, be used to compare probability distributions with disjoint supports. In this paper, we study a class of statistical Hilbert distances that we term the Schoenberg-Rao distances, a generalization of the MMD that allows one to consider a broader class of kernels, namely the conditionally negative semi-definite kernels. In particular, we introduce a principled way to construct such kernels and derive novel closed-form distances between mixtures of Gaussian distributions. These distances, derived from the concave Rao's quadratic entropy, enjoy nice theoretical properties and possess interpretable hyperparameters which can be tuned for specific applications. Our method constitutes a practical alternative to Wasserstein distances and we illustrate its efficiency on a broad range of machine learning tasks such as density estimation, generative modeling and mixture simplification.

preprint · 2016 · arXiv

A series of maximum entropy upper bounds of the differential entropy

We present a series of closed-form maximum entropy upper bounds for the differential entropy of a continuous univariate random variable and study the properties of that series. We then show how to use those generic bounds for upper bounding the differential entropy of Gaussian mixture models. This requires calculating the raw moments and raw absolute moments of Gaussian mixtures in closed form, which may also be handy in statistical machine learning and information theory. We report on our experiments and discuss the tightness of those bounds.
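To make the idea of a maximum entropy upper bound concrete, the sketch below uses the classical Gaussian bound $h(X)\le\frac{1}{2}\log(2\pi e\,\mathrm{Var}[X])$ applied to a univariate Gaussian mixture, whose variance follows from its first two raw moments in closed form. This is only the best-known bound of this kind; the abstract does not detail the paper's series, so nothing here should be read as the paper's own formulas. The function names and quadrature settings are hypothetical choices for this example.

```python
import math

def gmm_pdf(x, weights, means, sigmas):
    """Density of a univariate Gaussian mixture."""
    total = 0.0
    for w, m, s in zip(weights, means, sigmas):
        z = (x - m) / s
        total += w * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return total

def gmm_variance(weights, means, sigmas):
    """Variance from the first two raw moments of the mixture."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, sigmas))
    return second - mean * mean

def gaussian_maxent_bound(weights, means, sigmas):
    """Entropy of the Gaussian with the same variance as the mixture."""
    return 0.5 * math.log(2.0 * math.pi * math.e * gmm_variance(weights, means, sigmas))

def gmm_entropy_numeric(weights, means, sigmas, pad=12.0, steps=20_000):
    """Midpoint-rule estimate of the differential entropy (in nats)."""
    a = min(m - pad * s for m, s in zip(means, sigmas))
    b = max(m + pad * s for m, s in zip(means, sigmas))
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        p = gmm_pdf(a + (i + 0.5) * h, weights, means, sigmas)
        if p > 0.0:  # skip points where the density underflows to zero
            total -= p * math.log(p)
    return total * h
```

For a single-component mixture the bound is attained, and for a genuinely bimodal mixture it is strict, which gives a quick sanity check on how loose this first bound can be.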