Source author record

Jonathan H. Manton

Jonathan H. Manton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, math.OC, math.ST, Statistics Theory, Machine Learning, math.HO, eess.SP, eess.SY, Methodology, physics.data-an

Catalog footprint

What is connected

18 works

11 topics

4 close collaborators

preprint 2026 arXiv

A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems

This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high-probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.

preprint 2023 arXiv

On Orthogonal Approximate Message Passing

Approximate Message Passing (AMP) is an efficient iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions, such as sparse systems. In AMP, a so-called Onsager term is added to keep estimation errors approximately Gaussian. Orthogonal AMP (OAMP) does not require this Onsager term, relying instead on an orthogonalization procedure to keep the current errors uncorrelated with (i.e., orthogonal to) past errors. In this paper, we show the generality and significance of the orthogonality in ensuring that errors are "asymptotically independently and identically distributed Gaussian" (AIIDG). This AIIDG property, which is essential for the attractive performance of OAMP, holds for separable functions. We present a simple and versatile procedure to establish the orthogonality through Gram-Schmidt (GS) orthogonalization, which is applicable to any prototype. We show that different AMP-type algorithms, such as expectation propagation (EP), turbo, AMP and OAMP, can be unified under the orthogonal principle. The simplicity and generality of OAMP provide efficient solutions for estimation problems beyond the classical linear models. As an example, we study the optimization of OAMP via the GS model and GS orthogonalization. More related applications will be discussed in a companion paper where new algorithms are developed for problems with multiple constraints and multiple measurement variables.
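The orthogonality idea in the abstract above can be illustrated with plain Gram-Schmidt on vectors. This is only a minimal sketch under stated assumptions: it orthogonalises one finite-dimensional "error" vector against a few past ones, whereas the paper applies GS to estimator prototypes. The names `dot`, `orthogonalise`, `past` and `new` are hypothetical and do not come from the paper.

```python
import random

def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def orthogonalise(new_error, past_errors, tol=1e-12):
    """Return new_error minus its projection onto span(past_errors)."""
    basis = []  # orthonormal basis for the span of the past errors
    for v in past_errors:
        u = list(v)
        for b in basis:
            coeff = dot(u, b)
            u = [ui - coeff * bi for ui, bi in zip(u, b)]
        norm = dot(u, u) ** 0.5
        if norm > tol:  # skip vectors that are numerically dependent
            basis.append([ui / norm for ui in u])
    residual = list(new_error)
    for b in basis:
        coeff = dot(residual, b)
        residual = [ri - coeff * bi for ri, bi in zip(residual, b)]
    return residual

# Hypothetical data: three past error vectors and one new error vector.
rng = random.Random(0)
past = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(3)]
new = [rng.gauss(0.0, 1.0) for _ in range(50)]
residual = orthogonalise(new, past)

# Inner products with every past vector should vanish up to rounding.
print([abs(dot(residual, p)) < 1e-9 for p in past])
```

The residual is uncorrelated with every past vector by construction, which is the property the abstract describes as keeping current errors orthogonal to past errors.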

preprint 2022 arXiv

Fast Rate Generalization Error Bounds: Variations on a Theme

A recent line of work, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{\lambda/n})$, where $\lambda$ is some information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered to be "slow", compared to a "fast rate" of $O(1/n)$ in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and a fast rate ($O(1/n)$) result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the key condition needed for the fast rate generalization error, which we call the $(\eta, c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of $O(\lambda/n)$ for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.
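For orientation, one widely cited bound of the square-root form described above can be written as follows. It is stated from general knowledge of this line of work, not taken from the abstract, so the sub-Gaussian assumption and the symbols $\sigma$, $S$, $W$ and $\mathrm{gen}$ should be read as assumptions.

```latex
% Assumption: the loss \ell(w, Z) is \sigma-sub-Gaussian for every hypothesis w.
% S = (Z_1, \dots, Z_n) is the i.i.d. training sample, W the learned hypothesis.
\[
  \bigl| \mathbb{E}[\mathrm{gen}(W, S)] \bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}} .
\]
% With \lambda proportional to I(W; S), this is a rate of O(\sqrt{\lambda / n}).
```

If $I(W; S)$ stays bounded as $n$ grows, the right-hand side decays like $1/\sqrt{n}$, which is the "slow" rate the abstract contrasts with $O(1/n)$.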

preprint 2022 arXiv

Tracking and regret bounds for online zeroth-order Euclidean and Riemannian optimisation

We study numerical optimisation algorithms that use zeroth-order information to minimise time-varying geodesically-convex cost functions on Riemannian manifolds. In the Euclidean setting, zeroth-order algorithms have received a lot of attention in both the time-varying and time-invariant cases. However, the extension to Riemannian manifolds is much less developed. We focus on Hadamard manifolds, which are a special class of Riemannian manifolds with global nonpositive curvature that offer convenient grounds for the generalisation of convexity notions. Specifically, we derive bounds on the expected instantaneous tracking error, and we provide algorithm parameter values that optimise the algorithm's performance. Our results illustrate how the manifold geometry in terms of the sectional curvature affects these bounds. Additionally, we provide dynamic regret bounds for this online optimisation setting. To the best of our knowledge, these are the first regret bounds even for the Euclidean version of the problem. Lastly, via numerical simulations, we demonstrate the applicability of our algorithm on an online Karcher mean problem.
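To make "zeroth-order information" and "tracking error" concrete, here is a minimal Euclidean sketch. It is not the paper's algorithm: the two-point random-direction gradient estimate, the slowly rotating quadratic cost, and every parameter value (`step_size`, `delta`, the drift rate 0.01) are assumptions chosen for illustration, and the Riemannian case is not covered.

```python
import math
import random

def sample_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def two_point_gradient(f, x, delta, rng):
    """Gradient estimate built from two function evaluations only."""
    dim = len(x)
    u = sample_unit_vector(dim, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = dim * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]

def track(num_steps=500, step_size=0.25, delta=1e-3, seed=0):
    """Follow the minimiser of a time-varying cost; return tracking errors."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    errors = []
    for t in range(num_steps):
        # Hypothetical cost: squared distance to a slowly rotating target.
        target = (math.cos(0.01 * t), math.sin(0.01 * t))

        def cost(p, c=target):
            return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

        g = two_point_gradient(cost, x, delta, rng)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
        errors.append(math.dist(x, target))
    return errors

errors = track()
print("mean tracking error over the last 100 steps:", sum(errors[-100:]) / 100)
```

The algorithm never evaluates a gradient; it only queries cost values at two nearby points per step, and the recorded distance to the moving minimiser is the instantaneous tracking error in this toy setting.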

preprint 2021 arXiv

Hidden Markov chains and fields with observations in Riemannian manifolds

Hidden Markov chain, or Markov field, models, with observations in a Euclidean space, play a major role across signal and image processing. The present work provides a statistical framework which can be used to extend these models, along with related, popular algorithms (such as the Baum-Welch algorithm), to the case where the observations lie in a Riemannian manifold. It is motivated by the potential use of hidden Markov chains and fields, with observations in Riemannian manifolds, as models for complex signals and images.

preprint 2020 arXiv

Asymptotic regime for impropriety tests of complex random vectors

Impropriety testing for complex-valued vectors has been considered lately due to potential applications ranging from digital communications to complex media imaging. This paper provides new results for such tests in the asymptotic regime, i.e. when the vector dimension and sample size grow commensurately to infinity. The studied tests are based on invariant statistics named impropriety coefficients. Limiting distributions for these statistics are derived, together with those of the Generalized Likelihood Ratio Test (GLRT) and Roy's test, in the Gaussian case. This characterization in the asymptotic regime also makes it possible to identify a phase transition in Roy's test, with potential application to the detection of a complex-valued low-rank subspace corrupted by proper noise in large datasets. Simulations illustrate the accuracy of the proposed asymptotic approximations.

preprint 2020 arXiv

Information-theoretic analysis for transfer learning

Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different distributions (denoted as $\mu$ and $\mu'$, respectively). In this work, we give an information-theoretic analysis of the generalization error and the excess risk of transfer learning algorithms, following a line of work initiated by Russo and Xu. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu\|\mu')$ plays an important role in characterizing the generalization error in the settings of domain adaptation. Specifically, we provide generalization error upper bounds for general transfer learning algorithms and extend the results to a specific empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the method to iterative, noisy gradient descent algorithms, and obtain upper bounds which can be easily calculated, only using parameters from the learning algorithms. A few illustrative examples are provided to demonstrate the usefulness of the results. In particular, our bound is tighter in specific classification problems than the bound derived using Rademacher complexity.
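As a small illustration of the quantity $D(\mu\|\mu')$ mentioned above, the sketch below evaluates the standard closed-form KL divergence between two univariate Gaussians. Choosing Gaussian source and target distributions is an assumption made only for this example, and `kl_gaussian` is a hypothetical helper name; the paper's bounds are not reproduced.

```python
import math

def kl_gaussian(mean_p, std_p, mean_q, std_q):
    """KL divergence D(P || Q) in nats for univariate Gaussians P and Q."""
    if std_p <= 0 or std_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)

# Identical distributions: no mismatch between training and testing data.
print(kl_gaussian(0.0, 1.0, 0.0, 1.0))

# A shift in the mean only.
print(kl_gaussian(0.0, 1.0, 1.0, 1.0))

# The divergence is not symmetric in its two arguments.
print(kl_gaussian(0.0, 1.0, 0.0, 2.0), kl_gaussian(0.0, 2.0, 0.0, 1.0))
```

The asymmetry matters in this setting because the order of the two arguments of the divergence cannot be swapped without changing its value.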

preprint 2020 arXiv

VAR estimators using binary measurements

In this paper, two novel algorithms to estimate a Gaussian Vector Autoregressive (VAR) model from 1-bit measurements are introduced. They are based on the Yule-Walker scheme modified to account for quantisation. The scalar case has been studied before. The main difficulty when going from the scalar to the vector case is how to estimate the ratios of the variances of pairwise components of the VAR model. The first method overcomes this difficulty by requiring the quantisation to be non-symmetric: each component of the VAR model output is replaced by a binary "zero" or a binary "one" depending on whether its value is greater than a strictly positive threshold. Different components of the VAR model can have different thresholds. As the choice of these thresholds has a strong influence on the performance, this first method is best suited for applications where the variance of each time series is approximately known prior to choosing the corresponding threshold. The second method relies instead on symmetric quantisations of not only each component of the VAR model but also of the pairwise differences of the components. These additional measurements are equivalent to a ranking of the instantaneous VAR model output, from the smallest component to the largest component. This avoids the need for choosing thresholds but requires additional hardware for quantising the components in pairs. Numerical simulations show the efficiency of both schemes.
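The scalar case mentioned above can be sketched with the classical arcsine law: for zero-mean jointly Gaussian variables with correlation $\rho$, the expected product of their signs is $(2/\pi)\arcsin\rho$. The code below uses that identity to recover the coefficient of a scalar AR(1) process from sign measurements alone. It is an assumption-laden illustration of the scalar setting, not either of the paper's two vector algorithms, and the function names are hypothetical.

```python
import math
import random

def simulate_ar1_signs(a, num_samples, seed=0, burn_in=100):
    """Simulate x_t = a * x_{t-1} + w_t and keep only the sign of x_t."""
    rng = random.Random(seed)
    x = 0.0
    signs = []
    for _ in range(num_samples + burn_in):
        x = a * x + rng.gauss(0.0, 1.0)
        signs.append(1 if x >= 0 else -1)
    return signs[burn_in:]  # discard the transient before stationarity

def estimate_ar1_from_signs(signs):
    """Invert the arcsine law on the lag-1 sign correlation."""
    n = len(signs) - 1
    mean_product = sum(signs[t] * signs[t + 1] for t in range(n)) / n
    # For a stationary AR(1) process the lag-1 correlation equals a.
    return math.sin(0.5 * math.pi * mean_product)

signs = simulate_ar1_signs(a=0.6, num_samples=50000, seed=1)
print("estimated AR(1) coefficient:", estimate_ar1_from_signs(signs))
```

Symmetric 1-bit quantisation discards all scale information, which is why only correlations survive here. This is consistent with the abstract's remark that variance ratios are the main difficulty in the vector case.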

preprint 2016 arXiv

Expansion-maximization-compression algorithm with spherical harmonics for single particle imaging with X-ray lasers

In 3D single particle imaging with X-ray free-electron lasers, particle orientation is not recorded during measurement but is instead recovered as a necessary step in the reconstruction of a 3D image from the diffraction data. Here we use harmonic analysis on the sphere to cleanly separate the angular and radial degrees of freedom of this problem, providing new opportunities to efficiently use data and computational resources. We develop the Expansion-Maximization-Compression algorithm into a shell-by-shell approach and implement an angular bandwidth limit that can be gradually raised during the reconstruction. We study the minimum number of patterns and minimum rotation sampling required for a desired angular and radial resolution. These extensions provide new avenues to improve computational efficiency and speed of convergence, which are critically important considering the very large datasets expected from experiment.

preprint 2015 arXiv

A Primer on Reproducing Kernel Hilbert Spaces

Reproducing kernel Hilbert spaces are elucidated without assuming prior familiarity with Hilbert spaces. Compared with extant pedagogic material, greater care is placed on motivating the definition of reproducing kernel Hilbert spaces and explaining when and why these spaces are efficacious. The novel viewpoint is that reproducing kernel Hilbert space theory studies extrinsic geometry, associating with each geometric configuration a canonical overdetermined coordinate system. This coordinate system varies continuously with changing geometric configurations, making it well-suited for studying problems whose solutions also vary continuously with changing geometry. This primer can also serve as an introduction to infinite-dimensional linear algebra because reproducing kernel Hilbert spaces have more properties in common with Euclidean spaces than do more general Hilbert spaces.

preprint 2015 arXiv

Density estimation on the rotation group using diffusive wavelets

This paper considers the problem of estimating probability density functions on the rotation group $SO(3)$. Two distinct approaches are proposed, one based on characteristic functions and the other on wavelets using the heat kernel. Expressions are derived for their Mean Integrated Squared Errors. The performance of the estimators is studied numerically and compared with the performance of an existing technique using the De La Vallée Poussin kernel estimator. The heat-kernel wavelet approach appears to offer the best convergence, with faster convergence to the optimal bound and guaranteed positivity of the estimated probability density function.

preprint 2015 arXiv

Isotropic Multiple Scattering Processes on Hyperspheres

This paper presents several results about isotropic random walks and multiple scattering processes on hyperspheres ${\mathbb S}^{p-1}$. It allows one to derive the Fourier expansions on ${\mathbb S}^{p-1}$ of these processes. A result of unimodality for the multiconvolution of symmetrical probability density functions (pdf) on ${\mathbb S}^{p-1}$ is also introduced. Such processes are then studied in the case where the scattering distribution is von Mises-Fisher (vMF). Asymptotic distributions for the multiconvolution of vMFs on ${\mathbb S}^{p-1}$ are obtained. Both the Fourier expansion and the asymptotic approximation allow us to compute estimation bounds for the parameters of Compound Cox Processes (CCP) on ${\mathbb S}^{p-1}$.

preprint 2014 arXiv

A Framework for Generalising the Newton Method and Other Iterative Methods from Euclidean Space to Manifolds

The Newton iteration is a popular method for minimising a cost function on Euclidean space. Various generalisations to cost functions defined on manifolds appear in the literature. In each case, the convergence rate of the generalised Newton iteration needed establishing from first principles. The present paper presents a framework for generalising iterative methods from Euclidean space to manifolds that ensures local convergence rates are preserved. It applies to any (memoryless) iterative method computing a coordinate independent property of a function (such as a zero or a local minimum). All possible Newton methods on manifolds are believed to come under this framework. Changes of coordinates, and not any Riemannian structure, are shown to play a natural role in lifting the Newton method to a manifold. The framework also gives new insight into the design of Newton methods in general.

preprint 2013 arXiv

A Primer on Stochastic Differential Geometry for Signal Processing

This primer explains how continuous-time stochastic processes (precisely, Brownian motion and other Ito diffusions) can be defined and studied on manifolds. No knowledge is assumed of either differential geometry or continuous-time processes. The arguably dry approach is avoided of first introducing differential geometry and only then introducing stochastic processes; both areas are motivated and developed jointly.

preprint 2013 arXiv

Differential Calculus, Tensor Products and the Importance of Notation

An efficient coordinate-free notation is elucidated for differentiating matrix expressions and other functions between higher-dimensional vector spaces. This method of differentiation is known, but not explained well, in the literature. Teaching it early in the curriculum would avoid the tedium of element-wise differentiation and provide a better footing for understanding more advanced applications of calculus. Additionally, it is shown to lead naturally to tensor products, a topic previously considered too difficult to motivate quickly in elementary ways.
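A short worked example may help show what coordinate-free differentiation of a matrix expression looks like. The function $f(X) = X^{\top} X$ is chosen here purely for illustration and is not taken from the paper; $H$ denotes an arbitrary direction (a matrix of the same size as $X$).

```latex
% Illustrative example (an assumption, not from the abstract):
% f(X) = X^\top X on real n x n matrices, H an arbitrary perturbation.
\begin{align*}
  f(X + H) &= (X + H)^{\top} (X + H) \\
           &= X^{\top} X + X^{\top} H + H^{\top} X + H^{\top} H, \\
  Df(X) \cdot H &= X^{\top} H + H^{\top} X .
\end{align*}
% The derivative is the part of f(X + H) - f(X) that is linear in H;
% the remaining term H^\top H is second order in H.
```

No matrix entry is ever differentiated individually, which is the tedium of element-wise differentiation that the abstract says this notation avoids.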

preprint 2012 arXiv

Optimisation Geometry

This article demonstrates how an understanding of the geometry of a family of cost functions can be used to develop efficient numerical algorithms for real-time optimisation. Crucially, it is not the geometry of the individual functions which is studied, but the geometry of the family as a whole. In some respects, this challenges the conventional divide between convex and non-convex optimisation problems because none of the cost functions in a family need be convex in order for efficient numerical algorithms to exist for optimising in real-time any function belonging to the family. The title "Optimisation Geometry" comes by analogy from the study of the geometry of a family of probability distributions being called information geometry.

preprint 2011 arXiv

An Introductory Review of Information Theory in the Context of Computational Neuroscience

This paper introduces several fundamental concepts in information theory from the perspective of their origins in engineering. Understanding such concepts is important in neuroscience for two reasons. Simply applying formulae from information theory without understanding the assumptions behind their definitions can lead to erroneous results and conclusions. Furthermore, this century will see a convergence of information theory and neuroscience; information theory will expand its foundations to incorporate more comprehensively biological processes thereby helping reveal how neuronal networks achieve their remarkable information processing abilities.

preprint 2009 arXiv

Decompounding on compact Lie groups

Noncommutative harmonic analysis is used to solve a nonparametric estimation problem stated in terms of compound Poisson processes on compact Lie groups. This problem of decompounding is a generalization of a similar classical problem. The proposed solution is based on a characteristic function method. The treated problem is important to recent models of the physical inverse problem of multiple scattering.
