Source author record

Shun-ichi Amari

Shun-ichi Amari appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, Machine Learning, Neurons and Cognition, cond-mat.stat-mech, math.DG, math.ST, Statistics Theory, Computer Vision, cond-mat.dis-nn, math.OC, nlin.AO

Catalog footprint

What is connected

15 works

12 topics

4 close collaborators

preprint, 2023, arXiv

Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel

A biological neural network in the cortex forms a neural field. Neurons in the field have their own receptive fields, and connection weights between two neurons are random but highly correlated when they are in close proximity in receptive fields. In this paper, we study such neural fields in a multilayer architecture and investigate their supervised learning. We empirically compare the performance of our field model with that of randomly connected deep networks. The behavior of a randomly connected network is investigated on the basis of the key idea of the neural tangent kernel regime, a recent development in the machine learning theory of over-parameterized networks; for most randomly connected neural networks, it is shown that global minima always exist in their small neighborhoods. We numerically show that this claim also holds for our neural fields. In more detail, our model has two structures: i) each neuron in a field has a continuously distributed receptive field, and ii) the initial connection weights are random but not independent, having correlations when the positions of neurons are close in each layer. We show that such a multilayer neural field is more robust than conventional models when input patterns are deformed by noise disturbances. Moreover, its generalization ability can be slightly superior to that of conventional models.

preprint, 2020, arXiv

Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective

It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the purpose of elucidating its structure. We show that high-dimensional geometry plays a magical role: When we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere reduces to a Gaussian distribution of negligibly small covariances.
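The closing claim of this abstract can be checked numerically. The sketch below is not taken from the paper: it assumes the standard fact that a normalized Gaussian vector is uniform on the unit sphere, and the function names and parameters (`n`, `k`, `num_samples`) are illustrative. It samples points on the unit sphere in R^n, keeps the first k coordinates as the low-dimensional projection, and estimates the per-coordinate variance, which should be close to 1/n and therefore negligible for large n.

```python
import math
import random


def sample_unit_sphere(n, rng):
    """Draw one point uniformly from the unit sphere in R^n.

    Normalizing a standard Gaussian vector gives a uniform point on the sphere.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def projected_coordinate_variance(n, k, num_samples, seed=0):
    """Estimate the variance of each coordinate of the projection onto R^k.

    The projection simply keeps the first k coordinates. Each coordinate has
    mean zero by symmetry, so the second moment is used as the variance.
    """
    rng = random.Random(seed)
    total = 0.0
    count = 0
    for _ in range(num_samples):
        point = sample_unit_sphere(n, rng)
        for x in point[:k]:
            total += x * x
            count += 1
    return total / count


if __name__ == "__main__":
    n = 500
    var = projected_coordinate_variance(n=n, k=2, num_samples=2000)
    # The ratio printed last should be close to 1 if the variance is about 1/n.
    print(var, 1.0 / n, var * n)
```

Increasing `n` while holding `k` fixed shrinks the estimated variance in proportion to 1/n, which is the sense in which the projected distribution concentrates near the origin.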

preprint, 2020, arXiv

Unified framework for the entropy production and the stochastic interaction based on information geometry

We show a relationship between the entropy production in stochastic thermodynamics and the stochastic interaction in integrated information theory. To clarify this relationship, we newly introduce an information geometric interpretation of the entropy production for a total system and the partial entropy productions for subsystems. We show that the violation of the additivity of the entropy productions is related to the stochastic interaction. This framework is a thermodynamic foundation of integrated information theory. We also show that our information geometric formalism leads to a novel expression of the entropy production related to an optimization problem minimizing the Kullback-Leibler divergence. We analytically illustrate this interpretation by using the spin model.

preprint, 2020, arXiv

Wasserstein statistics in 1D location-scale model

Wasserstein geometry and information geometry are two important structures introduced in a manifold of probability distributions. The former is defined by using the transportation cost between two distributions, so it reflects the metric structure of the base manifold on which distributions are defined. Information geometry is constructed based on the invariance criterion that the geometry is invariant under reversible transformations of the base space. Both have their own merits for applications. Statistical inference is constructed on information geometry, where the Fisher metric plays a fundamental role, whereas Wasserstein geometry is useful for applications to computer vision and AI. We propose statistical inference based on the Wasserstein geometry in the case that the base space is 1-dimensional. By using the location-scale model, we derive the $W$-estimator explicitly and study its asymptotic behaviors.

preprint, 2020, arXiv

Wasserstein Statistics in One-dimensional Location-Scale Model

Wasserstein geometry and information geometry are two important structures to be introduced in a manifold of probability distributions. Wasserstein geometry is defined by using the transportation cost between two distributions, so it reflects the metric of the base manifold on which the distributions are defined. Information geometry is defined to be invariant under reversible transformations of the base space. Both have their own merits for applications. In particular, statistical inference is based upon information geometry, where the Fisher metric plays a fundamental role, whereas Wasserstein geometry is useful in computer vision and AI applications. In this study, we analyze statistical inference based on the Wasserstein geometry in the case that the base space is one-dimensional. By using the location-scale model, we further derive the W-estimator that explicitly minimizes the transportation cost from the empirical distribution to a statistical model and study its asymptotic behaviors. We show that the W-estimator is consistent and explicitly give its asymptotic distribution by using the functional delta method. The W-estimator is Fisher efficient in the Gaussian case.
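As a rough illustration of the idea in this abstract, the sketch below fits a one-dimensional location-scale model by minimizing a transportation cost between the empirical distribution and the model. It is not the paper's derivation and rests on several assumptions: the cost is the squared distance (2-Wasserstein), the base distribution is the standard Gaussian, and the quantile integral is discretized at the midpoints (i + 0.5)/n. Under those assumptions the problem reduces to a least-squares fit of the sorted samples on the base quantiles. The name `w_estimate_location_scale` is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def w_estimate_location_scale(samples, base_inv_cdf):
    """Fit x = mu + sigma * z by matching quantiles in the least-squares sense.

    In one dimension the squared 2-Wasserstein distance is the integral of the
    squared difference of quantile functions. Here that integral is
    approximated on the grid (i + 0.5) / n, an assumption of this sketch.
    """
    xs = sorted(samples)
    n = len(xs)
    # Quantiles of the base (standardized) distribution at the grid midpoints.
    qs = [base_inv_cdf((i + 0.5) / n) for i in range(n)]
    x_bar = fmean(xs)
    q_bar = fmean(qs)
    # Ordinary least squares of the order statistics on the base quantiles.
    sxq = sum((x - x_bar) * (q - q_bar) for x, q in zip(xs, qs))
    sqq = sum((q - q_bar) ** 2 for q in qs)
    sigma = sxq / sqq
    mu = x_bar - sigma * q_bar
    return mu, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(2.0, 3.0) for _ in range(5000)]
    mu_hat, sigma_hat = w_estimate_location_scale(data, NormalDist().inv_cdf)
    print(mu_hat, sigma_hat)
```

Passing a different inverse CDF as `base_inv_cdf` changes the base distribution of the location-scale family without touching the fitting logic.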

preprint, 2015, arXiv

Bayesian Robust Tensor Factorization for Incomplete Multiway Data

We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing the robust predictive distribution over missing entries. The low-CP-rank tensor is modeled by multilinear interactions between multiple latent factors on which the column sparsity is enforced by a hierarchical prior, while the sparse tensor is modeled by a hierarchical view of the Student-$t$ distribution that associates an individual hyperparameter with each element independently. For model learning, we develop an efficient closed-form variational inference under a fully Bayesian treatment, which can effectively prevent the overfitting problem and scales linearly with data size. In contrast to existing related works, our method can perform model selection automatically and implicitly without the need to tune parameters. More specifically, it can discover the ground-truth CP rank and automatically adapt the sparsity-inducing priors to various types of outliers. In addition, the tradeoff between the low-rank approximation and the sparse representation can be optimized in the sense of maximum model evidence. Extensive experiments and comparisons with many state-of-the-art algorithms on both synthetic and real-world datasets demonstrate the superiority of our method from several perspectives.

preprint, 2015, arXiv

Measuring integrated information from the decoding perspective

Accumulating evidence indicates that the capacity to integrate information in the brain is a prerequisite for consciousness. Integrated Information Theory (IIT) of consciousness provides a mathematical approach to quantifying the information integrated in a system, called integrated information, $Φ$. Integrated information is defined theoretically as the amount of information a system generates as a whole, above and beyond the sum of the amount of information its parts independently generate. IIT predicts that the amount of integrated information in the brain should reflect levels of consciousness. Empirical evaluation of this theory requires computing integrated information from neural data acquired from experiments, although difficulties with using the original measure $Φ$ preclude such computations. Although some practical measures have been previously proposed, we found that these measures fail to satisfy the theoretical requirements as a measure of integrated information. Measures of integrated information should satisfy the lower and upper bounds as follows: The lower bound of integrated information should be 0 when the system does not generate information (no information) or when the system comprises independent parts (no integration). The upper bound of integrated information is the amount of information generated by the whole system and is realized when the amount of information generated independently by its parts equals 0. Here we derive the novel practical measure $Φ^*$ by introducing a concept of mismatched decoding developed from information theory. We show that $Φ^*$ is properly bounded from below and above, as required, as a measure of integrated information. We derive the analytical expression of $Φ^*$ under the Gaussian assumption, which makes it readily applicable to experimental data.

preprint, 2015, arXiv

Microscopic instability in recurrent neural networks

In a manner similar to the molecular chaos that underlies the stable thermodynamics of gases, a neuronal system may exhibit microscopic instability in individual neuronal dynamics while a macroscopic order of the entire population possibly remains stable. In this study, we analyze the microscopic stability of a network of neurons whose macroscopic activity obeys stable dynamics, expressing either a monostable, bistable, or periodic state. We reveal that the network exhibits a variety of dynamical states for microscopic instability residing in given stable macroscopic dynamics. The presence of a variety of dynamical states in such a simple random network implies more abundant microscopic fluctuations in real neural networks, which consist of more complex and hierarchically structured interactions.

preprint, 2015, arXiv

On conformal divergences and their population minimizers

Total Bregman divergences are a recent tweak of ordinary Bregman divergences originally motivated by applications that required invariance by rotations. They have displayed superior results compared to ordinary Bregman divergences on several clustering, computer vision, medical imaging and machine learning tasks. These preliminary results raise two important problems: First, report a complete characterization of the left and right population minimizers for this class of total Bregman divergences. Second, characterize a principled superset of total and ordinary Bregman divergences with good clustering properties, from which one could tailor the choice of a divergence to a particular application. In this paper, we provide and study one such superset with interesting geometric features, that we call conformal divergences, and focus on their left and right population minimizers. Our results are obtained in a recently coined $(u, v)$-geometric structure that is a generalization of the dually flat affine connections in information geometry. We characterize both analytically and geometrically the population minimizers. We prove that conformal divergences (resp. total Bregman divergences) are essentially exhaustive for their left (resp. right) population minimizers. We further report new results and extend previous results on the robustness to outliers of the left and right population minimizers, and discuss the role of the $(u, v)$-geometric structure in clustering. Additional results are also given.

preprint, 2013, arXiv

Achieving Precise Mechanical Control in Intrinsically Noisy Systems

How can precise control be realised in intrinsically noisy systems? Here, we develop a general theoretical framework that provides a way to achieve precise control in signal-dependent noisy environments. When the control signal has Poisson or supra-Poisson noise, precise control is not possible. If, however, the control signal has sub-Poisson noise, then precise control is possible. For this case, the precise control solution is not a function, but a rapidly varying random process that must be averaged with respect to a governing probability density functional. Our theoretical approach is applied to the control of straight-trajectory arm movement. Sub-Poisson noise in the control signal is shown to be capable of leading to precise control. Intriguingly, the control signal for this system has a natural counterpart, namely the bursting pulses of neurons (trains of Dirac-delta functions) in biological systems to achieve precise control performance.

preprint, 2013, arXiv

Curvature of Hessian Manifolds

We prove that, in dimensions greater than 2, the generic metric is not a Hessian metric and find a curvature condition on Hessian metrics in dimensions greater than 3. In particular we prove that the forms used to define the Pontryagin classes in terms of the curvature vanish on a Hessian manifold. By contrast all analytic Riemannian 2-metrics are Hessian metrics.

preprint, 2013, arXiv

Lp-Regularized Least Squares (0<p<1) and Critical Path

The least squares problem is formulated in terms of Lp quasi-norm regularization (0<p<1). Two formulations are considered: (i) an Lp-constrained optimization and (ii) an Lp-penalized (unconstrained) optimization. Due to the nonconvexity of the Lp quasi-norm, the solution paths of the regularized least squares problem are not ensured to be continuous. A critical path, which is a maximal continuous curve consisting of critical points, is therefore considered separately. The critical paths are piecewise smooth, as can be seen from the viewpoint of the variational method, and generally contain non-optimal points such as saddle points and local maxima as well as global/local minima. Along each critical path, the correspondence between the regularization parameters (which govern the 'strength' of regularization in the two formulations) is non-monotonic and, more specifically, it has multiplicity. Two paths of critical points connecting the origin and an ordinary least squares (OLS) solution are highlighted. One is a main path starting at an OLS solution, and the other is a greedy path starting at the origin. Part of the greedy path can be constructed with a generalized Minkowskian gradient. The breakpoints of the greedy path coincide with the step-by-step solutions generated by using orthogonal matching pursuit (OMP), thereby establishing a direct link between OMP and Lp-regularized least squares.

preprint, 2013, arXiv

State Concentration Exponent as a Measure of Quickness in Kauffman-type Networks

We study the dynamics of randomly connected networks composed of binary Boolean elements and those composed of binary majority vote elements. We elucidate their differences in both sparsely and densely connected cases. The quickness of large network dynamics is usually quantified by the length of transient paths, an analytically intractable measure. For discrete-time dynamics of networks of binary elements, we address this dilemma with an alternative unified framework by using a concept termed state concentration, defined as the exponent of the average number of t-step ancestors in state transition graphs. The state transition graph is defined by nodes corresponding to network states and directed links corresponding to transitions. Using this exponent, we interrogate the dynamics of random Boolean and majority vote networks. We find that extremely sparse Boolean networks and majority vote networks with arbitrary density achieve quickness, owing in part to long-tailed in-degree distributions. As a corollary, only relatively dense majority vote networks can achieve both quickness and robustness.

preprint, 2010, arXiv

Dually flat structure with escort probability and its application to alpha-Voronoi diagrams

This paper studies the geometrical structure of the manifold of escort probability distributions and shows its new applicability to information science. In order to realize escort probabilities we use a conformal transformation that flattens the so-called alpha-geometry of the space of discrete probability distributions, which well characterizes nonadditive statistics on the space. As a result, escort probabilities are proved to be flat coordinates of the usual probabilities for the derived dually flat structure. Finally, we demonstrate that escort probabilities with the new structure admit a simple algorithm to compute Voronoi diagrams and centroids with respect to alpha-divergences.

preprint, 2010, arXiv

Modeling Basal Ganglia for understanding Parkinsonian Reaching Movements

We present a computational model that highlights the role of basal ganglia (BG) in generating simple reaching movements. The model is cast within the reinforcement learning (RL) framework with the correspondence between RL components and neuroanatomy as follows: dopamine signal of substantia nigra pars compacta as the Temporal Difference error, striatum as the substrate for the Critic, and the motor cortex as the Actor. A key feature of this neurobiological interpretation is our hypothesis that the indirect pathway is the Explorer. Chaotic activity, originating from the indirect pathway part of the model, drives the wandering, exploratory movements of the arm. Thus the direct pathway subserves exploitation while the indirect pathway subserves exploration. The motor cortex becomes more and more independent of the corrective influence of BG, as training progresses. Reaching trajectories show diminishing variability with training. Reaching movements associated with Parkinson's disease (PD) are simulated by (a) reducing dopamine and (b) degrading the complexity of indirect pathway dynamics by switching it from chaotic to periodic behavior. Under the simulated PD conditions, the arm exhibits PD motor symptoms like tremor, bradykinesia and undershoot. The model echoes the notion that PD is a dynamical disease.
