Source author record

Nihat Ay

Nihat Ay appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

36 works

25 topics

4 close collaborators

Preprint · 2022 · arXiv

Invariance Properties of the Natural Gradient in Overparametrised Systems

The natural gradient field is a vector field that lives on a model equipped with a distinguished Riemannian metric, e.g. the Fisher-Rao metric, and represents the direction of steepest ascent of an objective function on the model with respect to this metric. In practice, one tries to obtain the corresponding direction on the parameter space by multiplying the ordinary gradient by the inverse of the Gram matrix associated with the metric. We refer to this vector on the parameter space as the natural parameter gradient. In this paper, we study when the pushforward of the natural parameter gradient is equal to the natural gradient. Furthermore, we investigate the invariance properties of the natural parameter gradient. Both questions are addressed in an overparametrised setting.
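
The two objects in this abstract can be compared on a toy model. The following is a minimal sketch, not the paper's construction: it assumes a softmax parametrisation of the probability simplex (n parameters for an (n-1)-dimensional model, hence overparametrised and with a singular Gram matrix), and it assumes the Moore-Penrose pseudo-inverse takes the place of the inverse Gram matrix. It computes the natural parameter gradient, pushes it forward through the Jacobian, and compares the result with the Fisher-Rao gradient on the simplex, p_i(∂_i f - Σ_j p_j ∂_j f). The objective f(p) = <r, p> and all numbers are hypothetical.

```python
import numpy as np


def softmax(theta):
    # Softmax map from R^n onto the open simplex: n parameters, (n-1)-dimensional model
    z = np.exp(theta - theta.max())
    return z / z.sum()


def softmax_jacobian(p):
    # dp_i / dtheta_j = p_i * (delta_ij - p_j)
    return np.diag(p) - np.outer(p, p)


def fisher_gram(theta):
    # Pullback of the Fisher-Rao metric: G_jk = sum_i (dp_i/dtheta_j)(dp_i/dtheta_k) / p_i
    p = softmax(theta)
    J = softmax_jacobian(p)
    return J.T @ np.diag(1.0 / p) @ J


def natural_parameter_gradient(theta, grad_p, rcond=1e-10):
    # Ordinary parameter gradient (chain rule), then the pseudo-inverse of the
    # singular Gram matrix; rcond discards the numerically zero eigenvalue
    p = softmax(theta)
    grad_theta = softmax_jacobian(p).T @ grad_p
    return np.linalg.pinv(fisher_gram(theta), rcond=rcond) @ grad_theta


def natural_gradient_on_simplex(p, grad_p):
    # Fisher-Rao gradient of f at p, written in the p-coordinates
    return p * (grad_p - p @ grad_p)


theta = np.array([0.3, -1.2, 0.8, 0.0])
r = np.array([1.0, 2.0, -0.5, 0.4])     # hypothetical objective f(p) = <r, p>, so grad_p f = r
p = softmax(theta)
v = natural_parameter_gradient(theta, r)
pushforward = softmax_jacobian(p) @ v
agrees = np.allclose(pushforward, natural_gradient_on_simplex(p, r))
```

In this particular toy case the Jacobian has full rank onto the model, and the pushforward coincides with the natural gradient even though the Gram matrix is singular (its kernel is the constant shift of theta). Whether this holds in general is exactly the question the paper studies.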

Preprint · 2022 · arXiv

Natural Reweighted Wake-Sleep

Helmholtz Machines (HMs) are a class of generative models composed of two Sigmoid Belief Networks (SBNs), acting respectively as an encoder and a decoder. These models are commonly trained using a two-step optimization algorithm called Wake-Sleep (WS) and, more recently, by improved versions such as Reweighted Wake-Sleep (RWS) and Bidirectional Helmholtz Machines (BiHM). The locality of the connections in an SBN induces sparsity in the Fisher Information Matrices associated with the probabilistic models, in the form of a fine-grained block-diagonal structure. In this paper we exploit this property to efficiently train SBNs and HMs using the natural gradient. We present a novel algorithm, called Natural Reweighted Wake-Sleep (NRWS), that corresponds to the geometric adaptation of its standard version. In a similar manner, we also introduce the Natural Bidirectional Helmholtz Machine (NBiHM). Unlike previous work, we show how, for HMs, the natural gradient can be computed efficiently without introducing any approximation in the structure of the Fisher information matrix. The experiments performed on standard datasets from the literature show a consistent improvement of NRWS and NBiHM not only with respect to their non-geometric baselines but also with respect to state-of-the-art training algorithms for HMs. The improvement is quantified both in terms of speed of convergence and in terms of the value of the log-likelihood reached after training.
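
The block-diagonal structure mentioned above can be seen on a single sigmoid layer with fixed binary parents. This is a minimal sketch under that simplification, not NRWS itself: for p(y | x) = ∏_i Bernoulli(y_i; σ(w_i · x)), the output units are conditionally independent, so cross terms between different units' weights vanish and each unit contributes its own block E_x[σ_i(1 - σ_i) x xᵀ]. The sketch checks this against a brute-force Fisher matrix and then solves each block separately; the layer sizes, learning rate and damping term are hypothetical.

```python
import itertools
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def exact_fisher(W, X):
    # Full Fisher matrix of p(y | x), averaged over the rows of X, by enumerating all y
    m, d = W.shape
    F = np.zeros((m * d, m * d))
    for x in X:
        s = sigmoid(W @ x)
        for y in itertools.product([0, 1], repeat=m):
            y = np.array(y)
            prob = np.prod(np.where(y == 1, s, 1 - s))
            score = np.outer(y - s, x).ravel()      # d log p / d W, flattened row by row
            F += prob * np.outer(score, score)
    return F / len(X)


def fisher_blocks(W, X):
    # Per-unit blocks F_i = mean_x s_i (1 - s_i) x x^T, no enumeration of y needed
    S = sigmoid(X @ W.T)                            # shape (N, m)
    return [(X * (S[:, i] * (1 - S[:, i]))[:, None]).T @ X / len(X)
            for i in range(W.shape[0])]


def natural_gradient_step(W, X, grad_W, lr=0.1, damping=1e-3):
    # Ascent step solving one small d x d system per unit (damping is an assumption)
    d = W.shape[1]
    step = np.stack([np.linalg.solve(F_i + damping * np.eye(d), g_i)
                     for F_i, g_i in zip(fisher_blocks(W, X), grad_W)])
    return W + lr * step


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))                         # 3 child units, 4 binary parents
X = rng.integers(0, 2, size=(5, 4)).astype(float)   # 5 sampled parent configurations
F = exact_fisher(W, X)
block_diag = np.zeros_like(F)
for i, F_i in enumerate(fisher_blocks(W, X)):
    block_diag[4 * i:4 * (i + 1), 4 * i:4 * (i + 1)] = F_i
W_new = natural_gradient_step(W, X, grad_W=rng.normal(size=W.shape))
```

Inverting m blocks of size d costs far less than inverting one (m·d) x (m·d) matrix, which is what makes exact natural gradients affordable in this setting.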

Preprint · 2021 · arXiv

Complexity as Causal Information Integration

Complexity measures in the context of the Integrated Information Theory of consciousness try to quantify the strength of the causal connections between different neurons. This is done by minimizing the KL-divergence between a full system and one without causal connections. Various measures have been proposed and compared in this setting. We discuss a class of information geometric measures that aim at assessing the intrinsic causal influences in a system. One promising candidate among these measures, denoted by $Φ_{CIS}$, is based on conditional independence statements and satisfies all of the properties that have been postulated as desirable. Unfortunately, it does not have a graphical representation, which makes it less intuitive and more difficult to analyze. We propose an alternative approach using a latent variable which models a common exterior influence. This leads to a measure $Φ_{CII}$, Causal Information Integration, that satisfies all of the required conditions. Our measure can be calculated using an iterative information geometric algorithm, the em-algorithm. Therefore, we are able to compare its behavior to existing integrated information measures.

Preprint · 2020 · arXiv

Confounding Ghost Channels and Causality: A New Approach to Causal Information Flows

Information theory provides a fundamental framework for the quantification of information flows through channels (formally, Markov kernels). However, quantities such as mutual information and conditional mutual information do not necessarily reflect the causal nature of such flows. We argue that this is often the result of conditioning based on sigma algebras that are not associated with the given channels. We propose a version of the (conditional) mutual information based on families of sigma algebras that are coupled with the underlying channel. This leads to filtrations which allow us to prove a corresponding causal chain rule as a basic requirement within the presented approach.

Preprint · 2020 · arXiv

On the Locality of the Natural Gradient for Deep Learning

We study the natural gradient method for learning in deep Bayesian networks, including neural networks. There are two natural geometries associated with such learning systems consisting of visible and hidden units. One geometry is related to the full system, the other one to the visible sub-system. These two geometries imply different natural gradients. In a first step, we demonstrate a significant simplification of the natural gradient with respect to the first geometry, due to locality properties of the Fisher information matrix. This simplification does not directly translate to a corresponding simplification with respect to the second geometry. We develop the theory for studying the relation between the two versions of the natural gradient and outline a method for the simplification of the natural gradient with respect to the second geometry based on the first one. This method suggests incorporating a recognition model as an auxiliary model for the efficient application of the natural gradient method in deep networks.

Preprint · 2016 · arXiv

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

It is well known that for any finite state Markov decision process (MDP) there is a memoryless deterministic policy that maximizes the expected reward. For partially observable Markov decision processes (POMDPs), optimal memoryless policies are generally stochastic. We study the expected reward optimization problem over the set of memoryless stochastic policies. We formulate this as a constrained linear optimization problem and develop a corresponding geometric framework. We show that any POMDP has an optimal memoryless policy of limited stochasticity, which allows us to reduce the dimensionality of the search space. Experiments demonstrate that this approach enables better and faster convergence of the policy gradient on the evaluated systems.
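
The statement that optimal memoryless policies in a POMDP are generally stochastic can be checked on a tiny example. The POMDP below is hypothetical and constructed only for illustration (it is not taken from the paper), and the sketch assumes the average reward per step under the stationary state distribution as the objective. Both states emit the same observation. Switching states pays 1, staying pays 0. Both deterministic memoryless policies get stuck in one state and earn 0, while choosing action 1 with probability q earns 2q(1 - q), which is maximal at q = 1/2.

```python
import numpy as np

# Hypothetical two-state POMDP with a single observation.
# T[a][s, s'] is the transition probability, R[s, a] the reward.
T = np.array([
    [[1.0, 0.0], [1.0, 0.0]],   # action 0: s0 stays, s1 moves to s0
    [[0.0, 1.0], [0.0, 1.0]],   # action 1: s0 moves to s1, s1 stays
])
R = np.array([
    [0.0, 1.0],                 # in s0, action 1 (switching) pays 1
    [1.0, 0.0],                 # in s1, action 0 (switching) pays 1
])


def average_reward(q):
    # Memoryless policy: action 1 with probability q, action 0 otherwise
    policy = np.array([1.0 - q, q])
    P = np.tensordot(policy, T, axes=1)     # induced state transition matrix
    r = R @ policy                          # expected one-step reward in each state
    # Stationary distribution: (P^T - I) mu = 0 together with sum(mu) = 1
    A = np.vstack([P.T - np.eye(2), np.ones(2)])
    b = np.array([0.0, 0.0, 1.0])
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(mu @ r)


grid = np.linspace(0.0, 1.0, 101)
values = [average_reward(q) for q in grid]
best_q = grid[int(np.argmax(values))]
```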

Preprint · 2016 · arXiv

Information Theoretically Aided Reinforcement Learning for Embodied Agents

Reinforcement learning for embodied agents is a challenging problem. The accumulated reward to be optimized is often a very rugged function, and gradient methods are impaired by many local optimizers. We demonstrate, in an experimental setting, that incorporating an intrinsic reward can smooth the optimization landscape while preserving the global optimizers of interest. We show that policy gradient optimization for locomotion in a complex morphology is significantly improved when supplementing the extrinsic reward by an intrinsic reward defined in terms of the mutual information of temporally consecutive sensor readings.
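
The intrinsic reward described above is a mutual information between consecutive sensor values. A minimal sketch of one way to estimate it, assuming discretised sensor readings and a plug-in (empirical frequency) estimator; the estimator and the function name are choices made for this illustration, not the paper's implementation.

```python
from collections import Counter
import math


def consecutive_mutual_information(readings):
    # Plug-in estimate, in bits, of I(S_t; S_{t+1}) from a discretised sensor stream
    pairs = list(zip(readings[:-1], readings[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(a for a, _ in pairs)
    future = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((past[a] / n) * (future[b] / n)))
    return mi


constant_stream = [0] * 100          # nothing to predict: estimate is 0 bits
alternating_stream = [0, 1] * 50     # next value fully determined: close to 1 bit
mi_constant = consecutive_mutual_information(constant_stream)
mi_alternating = consecutive_mutual_information(alternating_stream)
```

A stream that never changes carries no information about its next value, while a perfectly regular alternation carries about one bit, so the quantity rewards predictable but non-trivial sensor dynamics.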

Preprint · 2016 · arXiv

Iterative Scaling Algorithm for Channels

Here we define a procedure for evaluating KL-projections (I- and rI-projections) of channels. These can be useful in the decomposition of mutual information between inputs and outputs, e.g. to quantify synergies and interactions of different orders, as well as information integration and other related measures of complexity. The algorithm is a generalization of the standard iterative scaling algorithm, which we here extend from probability distributions to channels (also known as transition kernels).
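
For orientation, here is the standard iterative scaling (iterative proportional fitting) algorithm that the paper starts from, applied to a joint distribution rather than a channel. This is a minimal sketch, not the channel version. Starting from the uniform distribution, it repeatedly rescales q so that each two-variable marginal matches that of p. The limit is the projection minimising D(p‖q) over the pairwise-interaction family. For Z = X xor Y, all pairwise marginals are uniform, so the projection is uniform and 1 bit of divergence remains, a purely third-order (synergistic) dependency. The iteration count and the example distributions are assumptions.

```python
import itertools
import numpy as np


def ipf_pairwise(p, n_iter=200):
    # Iterative proportional fitting onto the family of pairwise interactions
    q = np.full_like(p, 1.0 / p.size)
    for _ in range(n_iter):
        for pair in itertools.combinations(range(p.ndim), 2):
            other = tuple(k for k in range(p.ndim) if k not in pair)
            target = p.sum(axis=other, keepdims=True)
            current = q.sum(axis=other, keepdims=True)
            ratio = np.divide(target, current, out=np.zeros_like(target), where=current > 0)
            q = q * ratio
    return q


def kl_bits(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


# Synergistic example: X, Y uniform bits and Z = X xor Y
p_xor = np.zeros((2, 2, 2))
for x, y in itertools.product([0, 1], repeat=2):
    p_xor[x, y, x ^ y] = 0.25
q_xor = ipf_pairwise(p_xor)

# A product distribution already lies in the pairwise family
p_prod = np.einsum("i,j,k->ijk", np.array([0.3, 0.7]), np.array([0.6, 0.4]), np.array([0.2, 0.8]))
q_prod = ipf_pairwise(p_prod)
```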

Preprint · 2016 · arXiv

The Umwelt of an Embodied Agent -- A Measure-Theoretic Definition

We consider a general model of the sensorimotor loop of an agent interacting with the world. This formalises Uexküll's notion of a "function-circle". Here, we assume a particular causal structure, mechanistically described in terms of Markov kernels. In this generality, we define two $σ$-algebras of events in the world that describe two respective perspectives: (1) the perspective of an external observer, (2) the intrinsic perspective of the agent. Not all aspects of the world, seen from the external perspective, are accessible to the agent. This is expressed by the fact that the second $σ$-algebra is a subalgebra of the first one. We propose the smaller one as a formalisation of Uexküll's "Umwelt" concept. We show that, under continuity and compactness assumptions, the global dynamics of the world can be simplified without changing the internal process. This simplification can serve as a minimal world model that the system must have in order to be consistent with the internal process.

Preprint · 2015 · arXiv

Evaluating Morphological Computation in Muscle and DC-motor Driven Models of Human Hopping

In the context of embodied artificial intelligence, morphological computation refers to processes which are conducted by the body (and environment) that otherwise would have to be performed by the brain. Exploiting environmental and morphological properties is an important feature of embodied systems. The main reason is that it allows the controller complexity to be reduced significantly. An important aspect of morphological computation is that it cannot be assigned to an embodied system per se, but that it is, as we show, behavior- and state-dependent. In this work, we evaluate two different measures of morphological computation that can be applied in robotic systems and in computer simulations of biological movement. As an example, these measures were evaluated on muscle and DC-motor driven hopping models. We show that a state-dependent analysis of the hopping behaviors provides additional insights that cannot be gained from the averaged measures alone. This work includes algorithms and computer code for the measures.

Preprint · 2015 · arXiv

Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.

Preprint · 2015 · arXiv

Hierarchical Quantification of Synergy in Channels

The decomposition of channel information into synergies of different order is an open, active problem in the theory of complex systems. Most approaches to the problem are based on information theory, and propose decompositions of mutual information between inputs and outputs in several ways, none of which is generally accepted yet. We propose a new point of view on the topic. We model a multi-input channel as a Markov kernel. We can project the channel onto a series of exponential families which form a hierarchical structure. This is carried out with tools from information geometry, in a way analogous to the projections of probability distributions introduced by Amari. A Pythagorean relation leads naturally to a decomposition of the mutual information between inputs and outputs into terms which represent single node information, pairwise interactions, and in general n-node interactions. The synergy measures introduced in this paper can be easily evaluated by an iterative scaling algorithm, which is a standard procedure in information geometry.

Preprint · 2015 · arXiv

Maximizing the divergence from a hierarchical model of quantum states

We study many-party correlations quantified in terms of the Umegaki relative entropy (divergence) from a Gibbs family known as a hierarchical model. We derive these quantities from the maximum-entropy principle which was used earlier to define the closely related irreducible correlation. We point out differences between quantum states and probability vectors which exist in hierarchical models, in the divergence from a hierarchical model and in local maximizers of this divergence. The differences are, respectively, missing factorization, discontinuity and reduction of uncertainty. We discuss global maximizers of the mutual information of separable qubit states.

Preprint · 2014 · arXiv

A Theory of Cheap Control in Embodied Systems

We present a framework for designing cheap control architectures for embodied agents. Our derivation is guided by the classical problem of universal approximation, whereby we explore the possibility of exploiting the agent's embodiment for a new and more efficient universal approximation of behaviors generated by sensorimotor control. This embodied universal approximation is compared with the classical non-embodied universal approximation. To exemplify our approach, we present a detailed quantitative case study for policy models defined in terms of conditional restricted Boltzmann machines. In contrast to non-embodied universal approximation, which requires an exponential number of parameters, in the embodied setting we are able to generate all possible behaviors with a drastically smaller model, thus obtaining cheap universal approximation. We test and corroborate the theory experimentally with a six-legged walking machine. The experiments show that the sufficient controller complexity predicted by our theory is tight, which means that the theory has direct practical implications. Keywords: cheap design, embodiment, sensorimotor loop, universal approximation, conditional restricted Boltzmann machine

Preprint · 2014 · arXiv

Expressive Power and Approximation Errors of Restricted Boltzmann Machines

We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative for the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with $n$ visible and $m$ hidden units is bounded from above by $n - \left\lfloor \log(m+1) \right\rfloor - \frac{m+1}{2^{\left\lfloor\log(m+1)\right\rfloor}} \approx (n -1) - \log(m+1)$. In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.
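
The bound above can be evaluated directly, and inverted to find the number of hidden units for a given tolerance, as the last sentence suggests. A minimal sketch: the logarithms are base 2 (consistent with the $2^{\lfloor\log(m+1)\rfloor}$ term), so values are in bits. Clamping at zero is our addition. It only uses the fact that a KL divergence is never negative, and the formula reaches 0 at $m = 2^{n-1}-1$. The function names are hypothetical.

```python
import math


def rbm_kl_upper_bound(n, m):
    # n - floor(log2(m+1)) - (m+1) / 2**floor(log2(m+1)), in bits
    k = (m + 1).bit_length() - 1          # exact floor(log2(m + 1)) for integers
    bound = n - k - (m + 1) / 2 ** k
    return max(0.0, bound)                # a KL divergence cannot be negative


def approx_bound(n, m):
    # The approximation (n - 1) - log2(m + 1) quoted in the abstract
    return (n - 1) - math.log2(m + 1)


def hidden_units_for_tolerance(n, tol):
    # Smallest m whose bound does not exceed tol; m = 2**(n-1) - 1 always gives 0
    if tol < 0:
        raise ValueError("tolerance must be non-negative")
    m = 0
    while rbm_kl_upper_bound(n, m) > tol:
        m += 1
    return m


bounds_n4 = [rbm_kl_upper_bound(4, m) for m in range(4)]
```

For n = 4 visible units the bound drops from 3 bits (no hidden units) to 2, 1.5 and 1 bit for m = 1, 2, 3, so a tolerance of 1 bit calls for 3 hidden units.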

Preprint · 2014 · arXiv

On the Fisher Metric of Conditional Probability Polytopes

We consider three different approaches to define natural Riemannian metrics on polytopes of stochastic matrices. First, we define a natural class of stochastic maps between these polytopes and give a metric characterization of Chentsov type in terms of invariance with respect to these maps. Second, we consider the Fisher metric defined on arbitrary polytopes through their embeddings as exponential families in the probability simplex. We show that these metrics can also be characterized by an invariance principle with respect to morphisms of exponential families. Third, we consider the Fisher metric resulting from embedding the polytope of stochastic matrices in a simplex of joint distributions by specifying a marginal distribution. All three approaches result in slight variations of products of Fisher metrics. This is consistent with the nature of polytopes of stochastic matrices, which are Cartesian products of probability simplices. The first approach yields a scaled product of Fisher metrics; the second, a product of Fisher metrics; and the third, a product of Fisher metrics scaled by the marginal distribution.
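
The third approach can be verified with a one-line computation. Embed a stochastic matrix $K$ into the joint simplex via $P_{ij} = \rho_i K_{ij}$ for a fixed marginal $\rho$. A tangent vector $A$ (rows summing to zero) maps to $\rho_i A_{ij}$, and the Fisher metric of the joint simplex gives $\sum_{ij} \rho_i^2 A_{ij} B_{ij} / (\rho_i K_{ij}) = \sum_i \rho_i \sum_j A_{ij} B_{ij} / K_{ij}$, a product of row-wise Fisher metrics scaled by the marginal. The sketch below checks this numerically on random data. The sizes and values are arbitrary.

```python
import numpy as np


def fisher_simplex(p, u, v):
    # Fisher metric on the probability simplex: sum_x u_x v_x / p_x
    return float(np.sum(u * v / p))


def conditional_fisher(rho, K, A, B):
    # Row-wise Fisher metrics of K, weighted by the input marginal rho
    return float(np.sum(rho[:, None] * A * B / K))


rng = np.random.default_rng(1)
K = rng.random((3, 4))
K /= K.sum(axis=1, keepdims=True)                      # stochastic matrix (rows sum to 1)
rho = np.array([0.2, 0.5, 0.3])                        # fixed input marginal
A = rng.normal(size=K.shape)
A -= A.mean(axis=1, keepdims=True)                     # tangent vector: rows sum to 0
B = rng.normal(size=K.shape)
B -= B.mean(axis=1, keepdims=True)

joint = rho[:, None] * K
lhs = fisher_simplex(joint.ravel(), (rho[:, None] * A).ravel(), (rho[:, None] * B).ravel())
rhs = conditional_fisher(rho, K, A, B)
```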

Preprint · 2014 · arXiv

Quantifying unique information

We propose new measures of shared information, unique information and synergistic information that can be used to decompose the multi-information of a pair of random variables $(Y,Z)$ with a third random variable $X$. Our measures are motivated by an operational idea of unique information which suggests that shared information and unique information should depend only on the pair marginal distributions of $(X,Y)$ and $(X,Z)$. Although this invariance property has not been studied before, it is satisfied by other proposed measures of shared information. The invariance property does not uniquely determine our new measures, but it implies that the functions that we define are bounds to any other measures satisfying the same invariance property. We study properties of our measures and compare them to other candidate measures.

Preprint · 2014 · arXiv

The Information Theory of Individuality

We consider biological individuality in terms of information theoretic and graphical principles. Our purpose is to extract, through an algorithmic decomposition, system-environment boundaries that support individuality. We infer or detect evolved individuals rather than assume that they exist. Given a set of consistent measurements over time, we discover a coarse-grained or quantized description of a system, inducing partitions (which can be nested). Legitimate individual partitions will propagate information from the past into the future, whereas spurious aggregations will not. Individuals are therefore defined in terms of ongoing, bounded information processing units rather than lists of static features or conventional replication-based definitions which tend to fail in the case of cultural change. One virtue of this approach is that it could expand the scope of what we consider adaptive or biological phenomena, particularly in the microscopic and macroscopic regimes of molecular and social phenomena.

Preprint · 2013 · arXiv

Information driven self-organization of complex robotic behaviors

Information theory is a powerful tool to express principles to drive autonomous systems because it is domain invariant and allows for an intuitive interpretation. This paper studies the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process as a driving force to generate behavior. We study nonlinear and nonstationary systems and introduce the time-local predictive information (TiPI), which allows us to derive exact results together with explicit update rules for the parameters of the controller in the dynamical systems framework. In this way the information principle, formulated at the level of behavior, is translated to the dynamics of the synapses. We underpin our results with a number of case studies with high-dimensional robotic systems. We show the spontaneous cooperativity in a complex physical system with decentralized control. Moreover, a jointly controlled humanoid robot develops a high behavioral variety depending on its physics and the environment it is dynamically embedded into. The behavior can be decomposed into a succession of low-dimensional modes that increasingly explore the behavior space. This is a promising way to avoid the curse of dimensionality, which hinders learning systems from scaling well.

Preprint · 2013 · arXiv

Information geometry and sufficient statistics

Information geometry provides a geometric approach to families of statistical models. The key geometric structures are the Fisher quadratic form and the Amari-Chentsov tensor. In statistics, the notion of sufficient statistic expresses the criterion for passing from one model to another without loss of information. This leads to the question of how the geometric structures behave under such sufficient statistics. While this is well studied in the finite sample size case, in the infinite case we encounter technical problems concerning the appropriate topologies. Here, we introduce notions of parametrized measure models and tensor fields on them that exhibit the right behavior under statistical transformations. Within this framework, we can then handle the topological issues and show that the Fisher metric and the Amari-Chentsov tensor on statistical models in the class of symmetric 2-tensor fields and 3-tensor fields can be uniquely (up to a constant) characterized by their invariance under sufficient statistics, thereby achieving a full generalization of the original result of Chentsov to infinite sample sizes. More generally, we decompose Markov morphisms between statistical models in terms of statistics. In particular, a monotonicity result for the Fisher information naturally follows.

Preprint · 2013 · arXiv

Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviours, because a maximisation of the PI corresponds to an exploration of morphology- and environment-dependent behavioural regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented, and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a large speed-up be achieved, and then at the cost of a loss in asymptotic performance.

Preprint · 2013 · arXiv

Maximal Information Divergence from Statistical Models defined by Neural Networks

We review recent results about the maximal values of the Kullback-Leibler information divergence from statistical models defined by neural networks, including naive Bayes models, restricted Boltzmann machines, deep belief networks, and various classes of exponential families. We illustrate approaches to compute the maximal divergence from a given model starting from simple sub- or super-models. We give a new result for deep and narrow belief networks with finite-valued units.

Preprint · 2013 · arXiv

Quantifying Morphological Computation

The field of embodied intelligence emphasises the importance of the morphology and environment with respect to the behaviour of a cognitive system. The contribution of the morphology to the behaviour, commonly known as morphological computation, is well-recognised in this community. We believe that the field would benefit from a formalisation of this concept as we would like to ask how much the morphology and the environment contribute to an embodied agent's behaviour, or how an embodied agent can maximise the exploitation of its morphology within its environment. In this work we derive two concepts of measuring morphological computation, and we discuss their relation to the Information Bottleneck Method. The first concept asks how much the world contributes to the overall behaviour and the second concept asks how much the agent's action contributes to a behaviour. Various measures are derived from the concepts and validated in two experiments which highlight their strengths and weaknesses.

Preprint · 2012 · arXiv

Robustness, Canalyzing Functions and Systems Design

We study a notion of robustness of a Markov kernel that describes a system of several input random variables and one output random variable. Robustness requires that the behaviour of the system does not change if one or several of the input variables are knocked out. If the system is required to be robust against too many knockouts, then the output variable cannot distinguish reliably between input states and must be independent of the input. We study how many input states the output variable can distinguish as a function of the required level of robustness. Gibbs potentials allow a mechanistic description of the behaviour of the system after knockouts. Robustness imposes structural constraints on these potentials. We show that interaction families of Gibbs potentials can describe robust systems. Given a distribution of the input random variables and the Markov kernel describing the system, we obtain a joint probability distribution. Robustness implies a number of conditional independence statements for this joint distribution. The set of all probability distributions corresponding to robust systems can be decomposed into a finite union of components, and we find parametrizations of the components. The decomposition corresponds to a primary decomposition of the conditional independence ideal and can be derived from more general results about generalized binomial edge ideals.

Preprint · 2011 · arXiv

Effective complexity of stationary process realizations

The concept of effective complexity of an object as the minimal description length of its regularities has been initiated by Gell-Mann and Lloyd. The regularities are modeled by means of ensembles, that is, probability distributions on finite binary strings. In our previous paper, we proposed a definition of effective complexity in precise terms of algorithmic information theory. Here we investigate the effective complexity of binary strings generated by stationary, in general not computable, processes. We show that under not too strong conditions long typical process realizations are effectively simple. Our results become most transparent in the context of coarse effective complexity, which is a modification of the original notion of effective complexity that uses fewer parameters in its definition. A similar modification of the related concept of sophistication has been suggested by Antunes and Fortnow.

Preprint · 2011 · arXiv

Process Dimension of Classical and Non-Commutative Processes

We treat observable operator models (OOM) and their non-commutative generalisation, which we call NC-OOMs. A natural characteristic of a stochastic process in the context of classical OOM theory is the process dimension. We investigate its properties within the more general formulation, which allows us to consider process dimension as a measure of complexity of non-commutative processes: we prove lower semi-continuity and derive an ergodic decomposition formula. Further, we obtain results on the close relationship between the canonical OOM and the concept of causal states which underlies the definition of statistical complexity. In particular, the topological statistical complexity, i.e. the logarithm of the number of causal states, turns out to be an upper bound to the logarithm of process dimension.
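
In the classical case, the process dimension can be probed numerically. This sketch assumes the standard OOM characterisation of process dimension as the rank of the Hankel-type matrix $H[u, v] = \Pr(uv)$ indexed by finite words $u, v$. Truncating to words of length at most L gives a lower bound. The processes are stationary Markov chains with directly observed states. An i.i.d. process factorises as $\Pr(uv) = \Pr(u)\Pr(v)$ and has rank 1. The period-2 process (a mixture of 0101... and 1010...) has rank 2. All parameters are illustrative.

```python
import itertools
import numpy as np


def word_probability(word, pi, P):
    # Probability of a word under a stationary Markov chain with observed states
    if not word:
        return 1.0
    prob = pi[word[0]]
    for a, b in zip(word[:-1], word[1:]):
        prob *= P[a, b]
    return prob


def hankel_rank(pi, P, max_len=3, tol=1e-10):
    # Rank of H[u, v] = Pr(uv) over all words of length <= max_len (a lower bound
    # on the process dimension under the assumption stated in the lead-in)
    alphabet = range(len(pi))
    words = [w for L in range(max_len + 1) for w in itertools.product(alphabet, repeat=L)]
    H = np.array([[word_probability(u + v, pi, P) for v in words] for u in words])
    return int(np.linalg.matrix_rank(H, tol=tol))


iid_pi = np.array([0.3, 0.7])
iid_P = np.array([[0.3, 0.7], [0.3, 0.7]])          # every row equals pi: i.i.d. process
periodic_pi = np.array([0.5, 0.5])
periodic_P = np.array([[0.0, 1.0], [1.0, 0.0]])     # deterministic alternation

rank_iid = hankel_rank(iid_pi, iid_P)
rank_periodic = hankel_rank(periodic_pi, periodic_P)
```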

Preprint · 2011 · arXiv

Robustness and Conditional Independence Ideals

We study notions of robustness of Markov kernels and probability distributions of a system that is described by $n$ input random variables and one output random variable. Markov kernels can be expanded in a series of potentials that describe the system's behaviour after knockouts. Robustness imposes structural constraints on these potentials. Robustness of probability distributions is defined via conditional independence statements. These statements can be studied algebraically. The corresponding conditional independence ideals are related to binary edge ideals. The set of robust probability distributions lies on an algebraic variety. We compute a Gröbner basis of this ideal and study the irreducible decomposition of the variety. These algebraic results allow us to parametrize the set of all robust probability distributions.

Preprint · 2011 · arXiv

Support Sets in Exponential Families and Oriented Matroid Theory

The closure of a discrete exponential family is described by a finite set of equations corresponding to the circuits of an underlying oriented matroid. These equations are similar to the equations used in algebraic statistics, although they need not be polynomial in the general case. This description allows for a combinatorial study of the possible support sets in the closure of an exponential family. If two exponential families induce the same oriented matroid, then their closures have the same support sets. Furthermore, the positive cocircuits give a parameterization of the closure of the exponential family.

Preprint · 2010 · arXiv

Higher coordination with less control - A result of information maximization in the sensorimotor loop

This work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which makes as few assumptions and restrictions as possible about the world and the underlying model. The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, non-communicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to more highly coordinated behavior of the physically connected robots than maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. We show that longer chains with less capable controllers outperform shorter chains with more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.

Preprint · 2010 · arXiv

Information-theoretic inference of common ancestors

A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds, that is, if every variable is independent of its non-descendants given its parents. In general, there is a whole class of DAGs that represents a given set of conditional independence relations. We are interested in properties of this class that can be derived from observations of a subsystem only. To this end, we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in any DAG representing some unknown larger system. More explicitly, we show that a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information. Within the causal interpretation of DAGs our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause to more than two variables. Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version.

Preprint · 2010 · arXiv

Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines

We improve recently published results about the resources of Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs) required to make them universal approximators. We show that any distribution $p$ on the set of binary vectors of length $n$ can be arbitrarily well approximated by an RBM with $k-1$ hidden units, where $k$ is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of $p$. In important cases this number is half of the cardinality of the support set of $p$. We construct a DBN with $\frac{2^n}{2(n-b)}$, $b \sim \log(n)$, hidden layers of width $n$ that is capable of approximating any distribution on $\{0,1\}^n$ arbitrarily well. This confirms a conjecture presented by Le Roux and Bengio (2010).
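
For small supports, the number $k$ in the RBM result can be computed by brute force. This sketch relies on an edge-cover argument that is our reformulation, not a statement from the abstract. Each pair of vectors at Hamming distance 1 covers at most two support vectors. Pairs that cover two support vectors form a matching inside the support, and pairs may use vectors outside the support, so $k = |S| - \nu(S)$, where $\nu(S)$ is the size of a maximum matching among support vectors at Hamming distance 1. The recursion is exponential and meant only for toy supports. The function names are hypothetical.

```python
import itertools


def differ_in_one_entry(u, v):
    return sum(a != b for a, b in zip(u, v)) == 1


def max_matching(vertices):
    # Maximum matching using only pairs at Hamming distance 1 (brute-force recursion)
    if not vertices:
        return 0
    v, rest = vertices[0], vertices[1:]
    best = max_matching(rest)                       # leave v unmatched
    for u in rest:
        if differ_in_one_entry(u, v):
            remaining = [w for w in rest if w != u]
            best = max(best, 1 + max_matching(remaining))
    return best


def min_pair_cover(support):
    # Minimal number k of distance-1 pairs whose union contains the support
    support = sorted(set(support))
    return len(support) - max_matching(support)


def rbm_hidden_units(support):
    # k - 1 hidden units, as in the abstract
    return min_pair_cover(support) - 1


full_cube = list(itertools.product([0, 1], repeat=3))
k_full = min_pair_cover(full_cube)                  # perfect matching: k = |S| / 2
k_antipodal = min_pair_cover([(0, 0, 0), (1, 1, 1)])
```

The full cube is an "important case" in the sense of the abstract: it has a perfect matching, so $k$ is half the support size (4 pairs, hence 3 hidden units). Two antipodal vectors share no neighbour and need one pair each.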

preprint2009arXiv

Quantifying structure in networks

We investigate exponential families of random graph distributions as a framework for the systematic quantification of structure in networks. In this paper we restrict ourselves to undirected unlabeled graphs. For these graphs, the counts of subgraphs with no more than $k$ links are sufficient statistics for the exponential families of graphs with interactions between at most $k$ links. In this framework we investigate the dependencies between several observables commonly used to quantify structure in networks, such as the degree distribution and the clustering and assortativity coefficients.
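For $k = 2$ the subgraphs with at most two links are single links, pairs of links sharing a vertex (2-stars), and pairs of disjoint links. The minimal Python sketch below computes these three counts and the unnormalized log-probability $\theta \cdot T(G)$ of an exponential family that uses them as sufficient statistics. It assumes simple undirected graphs given as edge lists on vertices $0, \dots, n-1$. The labels serve only as input, since the counts do not depend on them. The parameter names in `theta` are hypothetical.

```python
from math import comb


def subgraph_counts_up_to_two_links(n_vertices, edges):
    """Counts of subgraphs with one or two links in a simple undirected graph.

    edges: iterable of (u, v) pairs with u != v and vertices in 0..n_vertices-1."""
    links = {frozenset(e) for e in edges}  # drop duplicate or reversed entries
    degree = [0] * n_vertices
    for link in links:
        for v in link:
            degree[v] += 1
    m = len(links)
    # Two distinct links of a simple graph share at most one vertex: either they
    # form a 2-star centred at that vertex or they are disjoint.
    two_stars = sum(comb(d, 2) for d in degree)
    return {
        "edges": m,
        "two_stars": two_stars,
        "disjoint_edge_pairs": comb(m, 2) - two_stars,
    }


def log_weight(counts, theta):
    """Unnormalized log-probability theta . T(G) in an exponential family whose
    sufficient statistics T(G) are these counts (theta names are hypothetical)."""
    return sum(theta[name] * value for name, value in counts.items())


path = subgraph_counts_up_to_two_links(4, [(0, 1), (1, 2), (2, 3)])
print(path)  # {'edges': 3, 'two_stars': 2, 'disjoint_edge_pairs': 1}
print(log_weight(path, {"edges": -1.0, "two_stars": 0.5, "disjoint_edge_pairs": 0.0}))  # -2.0
```

Two graphs with equal counts receive the same weight, which is what it means for these counts to be sufficient statistics of such a family.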

preprint2008arXiv

Complexity Measures from Interaction Structures

We evaluate new complexity measures on the symbolic dynamics of coupled tent maps and cellular automata. These measures quantify complexity in terms of $k$-th order statistical dependencies that cannot be reduced to interactions between $k-1$ units. We demonstrate that these measures are able to identify complex dynamical regimes.
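One common way to quantify $k$-th order dependencies that cannot be reduced to interactions between $k-1$ units is the Kullback-Leibler divergence of a distribution from a hierarchical model with interactions up to order $k-1$. The sketch below assumes that reading of the measures and treats only the smallest case, three binary units with $k = 3$. It projects $p$ onto the pairwise interaction model by iterative proportional fitting and reports the divergence in bits. It is not the authors' code, the helper names are illustrative, and it does not simulate coupled tent maps or cellular automata.

```python
import itertools
import math

# Three binary units and the pairs of units a pairwise (k - 1 = 2) model may couple.
STATES = list(itertools.product((0, 1), repeat=3))
PAIRS = [(0, 1), (0, 2), (1, 2)]


def marginal(p, idx):
    """Marginal of p on the units listed in idx."""
    m = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        m[key] = m.get(key, 0.0) + px
    return m


def project_pairwise(p, sweeps=200):
    """Iterative proportional fitting started from the uniform distribution.

    Converges to the maximum-entropy distribution with the same pairwise
    marginals as p, i.e. the KL projection of p onto the closure of the
    pairwise interaction model."""
    q = {x: 1.0 / len(STATES) for x in STATES}
    for _ in range(sweeps):
        for idx in PAIRS:
            target, current = marginal(p, idx), marginal(q, idx)
            for x in STATES:
                key = tuple(x[i] for i in idx)
                if current[key] > 0:
                    q[x] *= target.get(key, 0.0) / current[key]
    return q


def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def third_order_dependence(p):
    """D(p || p_2): dependence among three units not reducible to pairs."""
    full = {x: p.get(x, 0.0) for x in STATES}
    return kl_bits(full, project_pairwise(full))


xor = {x: 0.25 for x in STATES if sum(x) % 2 == 0}
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(third_order_dependence(xor))     # 1.0
print(third_order_dependence(copies))  # 0.0
```

Every pairwise marginal of the XOR distribution is uniform, so its entire multi-information of 1 bit is third-order. A perfectly correlated triple, by contrast, is already captured by pairwise interactions and scores zero.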

preprint2008arXiv

Effective Complexity and its Relation to Logical Depth

Effective complexity measures the information content of the regularities of an object. It was introduced by M. Gell-Mann and S. Lloyd to avoid some of the disadvantages of Kolmogorov complexity, also known as algorithmic information content. In this paper, we give a precise formal definition of effective complexity and rigorous proofs of its basic properties. In particular, we show that incompressible binary strings are effectively simple, and we prove the existence of strings that have effective complexity close to their lengths. Furthermore, we show that effective complexity is related to Bennett's logical depth: if the effective complexity of a string $x$ exceeds a certain explicit threshold, then that string must have astronomically large depth; otherwise, the depth can be arbitrarily small.

preprint2008arXiv

Hierarchical Models, Marginal Polytopes, and Linear Codes

In this paper, we explore a connection between binary hierarchical models, their marginal polytopes, and codeword polytopes (the convex hulls of linear codes). The class of linear codes that are realizable by hierarchical models is determined. We classify all full-dimensional polytopes with the property that their vertices form a linear code and give an algorithm that determines them.
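For small lengths, the objects in this abstract are easy to generate. The minimal Python sketch below uses hypothetical helper names. It lists the codewords spanned by a binary generator matrix and checks whether a set of binary vectors is a linear code, meaning it contains the zero vector and is closed under addition mod 2. It also computes the dimension of the affine hull of the codewords over the rationals, which tells whether the codeword polytope is full-dimensional. It does not implement the classification algorithm from the paper.

```python
import itertools
from fractions import Fraction


def codewords(generator):
    """All codewords spanned (mod 2) by the rows of a binary generator matrix."""
    n = len(generator[0])
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(generator)):
        words.add(tuple(
            sum(c * row[j] for c, row in zip(coeffs, generator)) % 2
            for j in range(n)))
    return words


def is_linear_code(vectors):
    """True if the set contains the zero vector and is closed under addition mod 2."""
    vectors = {tuple(v) for v in vectors}
    if not vectors:
        return False
    n = len(next(iter(vectors)))
    if (0,) * n not in vectors:
        return False
    return all(tuple(a ^ b for a, b in zip(u, v)) in vectors
               for u in vectors for v in vectors)


def affine_dimension(points):
    """Dimension of the affine hull of the points, by exact Gaussian elimination over Q."""
    pts = [[Fraction(a) for a in p] for p in points]
    base = pts[0]
    rows = [[a - b for a, b in zip(p, base)] for p in pts[1:]]
    rank = 0
    for col in range(len(base)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


even_weight = codewords([[1, 1, 0], [0, 1, 1]])
print(sorted(even_weight))  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_linear_code(even_weight), affine_dimension(sorted(even_weight)))  # True 3
```

The even-weight code of length 3 spans a tetrahedron, so its codeword polytope is full-dimensional in $\mathbb{R}^3$. The repetition code $\{000, 111\}$ spans only a segment.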

preprint2007arXiv

Maximizing Multi-Information

Stochastic interdependence of a probability distribution on a product space is measured by its Kullback-Leibler distance from the exponential family of product distributions, a quantity called multi-information. Here we investigate low-dimensional exponential families that contain the maximizers of stochastic interdependence in their closure. Based on a detailed description of the structure of probability distributions with globally maximal multi-information, we obtain our main result: the exponential family of pure pair-interactions contains all global maximizers of the multi-information in its closure.
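For a distribution $p$ of $X_1, \dots, X_n$, the closest product distribution in Kullback-Leibler distance is the product of the marginals of $p$. The multi-information therefore equals $\sum_i H(X_i) - H(X_1, \dots, X_n)$. The minimal Python sketch below computes it in bits. It also evaluates it on a pure pair-interaction distribution proportional to $\exp(\beta \sum_{i<j} s_i s_j)$ with spins $s_i = 2x_i - 1$. The parameter $\beta$ and the helper names are illustrative assumptions, not taken from the paper. As $\beta$ grows, this distribution approaches the uniform distribution on $000$ and $111$, whose multi-information of $n - 1 = 2$ bits is the largest possible for three binary units.

```python
import itertools
import math


def entropy_bits(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def multi_information(p, n):
    """KL distance (bits) of p from the product of its marginals, which is its
    distance from the family of product distributions: sum_i H(X_i) - H(X)."""
    total = 0.0
    for i in range(n):
        marginal = {}
        for x, px in p.items():
            marginal[x[i]] = marginal.get(x[i], 0.0) + px
        total += entropy_bits(marginal)
    return total - entropy_bits(p)


def pair_interaction(beta, n=3):
    """Pure pair-interaction distribution p(x) ~ exp(beta * sum_{i<j} s_i s_j),
    with spins s_i = 2 x_i - 1 (beta is an illustrative parameter)."""
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        s = [2 * xi - 1 for xi in x]
        energy = sum(s[i] * s[j] for i, j in itertools.combinations(range(n), 2))
        weights[x] = math.exp(beta * energy)
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


maximizer = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(multi_information(maximizer, 3))  # 2.0
for beta in (0.0, 0.5, 2.0, 5.0):
    print(beta, multi_information(pair_interaction(beta), 3))
```

The upper bound of $n - 1$ bits for binary units follows from $H(X_1, \dots, X_n) \ge H(X_j)$ for every $j$. The maximizer shown is therefore a global one, and it lies in the closure of the pair-interaction family.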


36 published items

Invariance Properties of the Natural Gradient in Overparametrised Systems

Natural Reweighted Wake-Sleep

Complexity as Causal Information Integration

Confounding Ghost Channels and Causality: A New Approach to Causal Information Flows

On the Locality of the Natural Gradient for Deep Learning

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

Information Theoretically Aided Reinforcement Learning for Embodied Agents

Iterative Scaling Algorithm for Channels

The Umwelt of an Embodied Agent -- A Measure-Theoretic Definition

Evaluating Morphological Computation in Muscle and DC-motor Driven Models of Human Hopping

Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

Hierarchical Quantification of Synergy in Channels

Maximizing the divergence from a hierarchical model of quantum states

A Theory of Cheap Control in Embodied Systems

Expressive Power and Approximation Errors of Restricted Boltzmann Machines

On the Fisher Metric of Conditional Probability Polytopes

Quantifying unique information

The Information Theory of Individuality

Information driven self-organization of complex robotic behaviors

Information geometry and sufficient statistics

Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

Maximal Information Divergence from Statistical Models defined by Neural Networks

Quantifying Morphological Computation

Robustness, Canalyzing Functions and Systems Design

Effective complexity of stationary process realizations

Process Dimension of Classical and Non-Commutative Processes

Robustness and Conditional Independence Ideals

Support Sets in Exponential Families and Oriented Matroid Theory

Higher coordination with less control - A result of information maximization in the sensorimotor loop

Information-theoretic inference of common ancestors

Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines

Quantifying structure in networks

Complexity Measures from Interaction Structures

Effective Complexity and its Relation to Logical Depth

Hierarchical Models, Marginal Polytopes, and Linear Codes

Maximizing Multi-Information