Source author record

Georg Martius

Georg Martius appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

22works

20topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons

Cortical neurons are complex, multi-timescale processors wired into recurrent circuits, shaped by long evolutionary pressure under stringent biological constraints. Mainstream machine learning, by contrast, predominantly builds models from extremely simple units, a default inherited from early neural-network theory. We treat this as a normative architectural question. How should one split a fixed parameter budget $P$ between the number of units $N$, per-unit effective complexity $k_e$, and per-unit connectivity $k_c$? What controls the optimal allocation? This calls for a model in which per-unit complexity can be tuned independently of width and connectivity. Accordingly, we introduce the ELM Network, whose recurrent layer is built from Expressive Leaky Memory (ELM) neurons, chosen to mirror functional components of cortical neurons. The architecture allows for individually adjusting $N$, $k_e$, and $k_c$ and trains stably across orders of magnitude in scale. We evaluate the model on two qualitatively different sequence benchmarks: the neuromorphic SHD-Adding task and Enwik8 character-level language modeling. Performance improves monotonically along each of the three axes individually. Under a fixed budget, a clear non-trivial optimum emerges in their tradeoff, and larger budgets favor both more and more complex neurons. A closed-form information-theoretic model captures these tradeoffs and attributes the diminishing returns at two ends to: per-neuron signal-to-noise saturation and across-neuron redundancy. A hyperparameter sweep spanning three orders of magnitude in trainable parameters traces a near-Pareto-frontier scaling law consistent with the framework. This suggests that the simple-unit default in ML is not obviously optimal once this tradeoff surface is probed, and offers a normative lens on cortex's reliance on complex spatio-temporal integrators.

preprint2023arXiv

Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks

Humans are able to outperform robots in terms of robustness, versatility, and learning of new tasks in a wide variety of movements. We hypothesize that highly nonlinear muscle dynamics play a large role in providing inherent stability, which is favorable to learning. While recent advances have been made in applying modern learning techniques to muscle-actuated systems both in simulation as well as in robotics, so far, no detailed analysis has been performed to show the benefits of muscles when learning from scratch. Our study closes this gap and showcases the potential of muscle actuators for core robotics challenges in terms of data-efficiency, hyperparameter sensitivity, and robustness.

preprint2022arXiv

Developing hierarchical anticipations via neural network-based event segmentation

Humans can make predictions on various time scales and hierarchical levels. Thereby, the learning of event encodings seems to play a crucial role. In this work we model the development of hierarchical predictions via autonomously learned latent event codes. We present a hierarchical recurrent neural network architecture, whose inductive learning biases foster the development of sparsely changing latent state that compress sensorimotor sequences. A higher level network learns to predict the situations in which the latent states tend to change. Using a simulated robotic manipulator, we demonstrate that the system (i) learns latent states that accurately reflect the event structure of the data, (ii) develops meaningful temporal abstract predictions on the higher level, and (iii) generates goal-anticipatory behavior similar to gaze behavior found in eye-tracking studies with infants. The architecture offers a step towards the autonomous learning of compressed hierarchical encodings of gathered experiences and the exploitation of these encodings to generate adaptive behavior.

preprint2022arXiv

Inferring Markovian quantum master equations of few-body observables in interacting spin chains

Full information about a many-body quantum system is usually out-of-reach due to the exponential growth -- with the size of the system -- of the number of parameters needed to encode its state. Nonetheless, in order to understand the complex phenomenology that can be observed in these systems, it is often sufficient to consider dynamical or stationary properties of local observables or, at most, of few-body correlation functions. These quantities are typically studied by singling out a specific subsystem of interest and regarding the remainder of the many-body system as an effective bath. In the simplest scenario, the subsystem dynamics, which is in fact an open quantum dynamics, can be approximated through Markovian quantum master equations. Here, we formulate the problem of finding the generator of the subsystem dynamics as a variational problem, which we solve using the standard toolbox of machine learning for optimization. This dynamical or ``Lindblad" generator provides the relevant dynamical parameters for the subsystem of interest. Importantly, the algorithm we develop is constructed such that the learned generator implements a physically consistent open quantum time-evolution. We exploit this to learn the generator of the dynamics of a subsystem of a many-body system subject to a unitary quantum dynamics. We explore the capability of our method to recover the time-evolution of a two-body subsystem and exploit the physical consistency of the generator to make predictions on the stationary state of the subsystem dynamics.

preprint2022arXiv

On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks

Capturing aleatoric uncertainty is a critical part of many machine learning systems. In deep learning, a common approach to this end is to train a neural network to estimate the parameters of a heteroscedastic Gaussian distribution by maximizing the logarithm of the likelihood function under the observed data. In this work, we examine this approach and identify potential hazards associated with the use of log-likelihood in conjunction with gradient-based optimizers. First, we present a synthetic example illustrating how this approach can lead to very poor but stable parameter estimates. Second, we identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue. Third, we present an alternative formulation, termed $β$-NLL, in which each data point's contribution to the loss is weighted by the $β$-exponentiated variance estimate. We show that using an appropriate $β$ largely mitigates the issue in our illustrative example. Fourth, we evaluate this approach on a range of domains and tasks and show that it achieves considerable improvements and performs more robustly concerning hyperparameters, both in predictive RMSE and log-likelihood criteria.

preprint2022arXiv

Planning from Pixels in Environments with Combinatorially Hard Search Spaces

The ability to form complex plans based on raw visual input is a litmus test for current capabilities of artificial intelligence, as it requires a seamless combination of visual processing and abstract algorithmic execution, two traditionally separate areas of computer science. A recent surge of interest in this field brought advances that yield good performance in tasks ranging from arcade games to continuous control; these methods however do not come without significant issues, such as limited generalization capabilities and difficulties when dealing with combinatorially hard planning instances. Our contribution is two-fold: (i) we present a method that learns to represent its environment as a latent graph and leverages state reidentification to reduce the complexity of finding a good policy from exponential to linear (ii) we introduce a set of lightweight environments with an underlying discrete combinatorial structure in which planning is challenging even for humans. Moreover, we show that our methods achieves strong empirical generalization to variations in the environment, even across highly disadvantaged regimes, such as "one-shot" planning, or in an offline RL paradigm which only provides low-quality trajectories.

preprint2022arXiv

Self-supervised Reinforcement Learning with Independently Controllable Subgoals

To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.

preprint2022arXiv

Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

A common approach to prediction and planning in partially observable domains is to use recurrent neural networks (RNNs), which ideally develop and maintain a latent memory about hidden, task-relevant factors. We hypothesize that many of these hidden factors in the physical world are constant over time, changing only sparsely. To study this hypothesis, we propose Gated $L_0$ Regularized Dynamics (GateL0RD), a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states. The bias is implemented by means of a novel internal gating function and a penalty on the $L_0$ norm of latent state changes. We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks. GateL0RD tends to encode the underlying generative factors of the environment, ignores spurious temporal dependencies, and generalizes better, improving sampling efficiency and overall performance in model-based planning and reinforcement learning tasks. Moreover, we show that the developing latent states can be easily interpreted, which is a step towards better explainability in RNNs.

preprint2021arXiv

Demystifying Inductive Biases for $β$-VAE Based Architectures

The performance of $β$-Variational-Autoencoders ($β$-VAEs) and their variants on learning semantically meaningful, disentangled representations is unparalleled. On the other hand, there are theoretical arguments suggesting the impossibility of unsupervised disentanglement. In this work, we shed light on the inductive bias responsible for the success of VAE-based architectures. We show that in classical datasets the structure of variance, induced by the generating factors, is conveniently aligned with the latent directions fostered by the VAE objective. This builds the pivotal bias on which the disentangling abilities of VAEs rely. By small, elaborate perturbations of existing datasets, we hide the convenient correlation structure that is easily exploited by a variety of architectures. To demonstrate this, we construct modified versions of standard datasets in which (i) the generative factors are perfectly preserved; (ii) each image undergoes a mild transformation causing a small change of variance; (iii) the leading \textbf{VAE-based disentanglement architectures fail to produce disentangled representations whilst the performance of a non-variational method remains unchanged}. The construction of our modifications is nontrivial and relies on recent progress on mechanistic understanding of $β$-VAEs and their connection to PCA. We strengthen that connection by providing additional insights that are of stand-alone interest.

preprint2021arXiv

Neuro-algorithmic Policies enable Fast Combinatorial Generalization

Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, generalization to task variations is still lacking. Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. Furthermore, we show that for a certain subclass of the MDP framework, this can be alleviated by neuro-algorithmic architectures. Many control problems require long-term planning that is hard to solve generically with neural networks alone. We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver. These policies can be trained end-to-end by blackbox differentiation. We show that this type of architecture generalizes well to unseen variations in the environment already after seeing a few examples.

preprint2021arXiv

The dynamical regime and its importance for evolvability, task performance and generalization

It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems. We test this hypothesis by evolving foraging agents controlled by neural networks that can change the system's dynamical regime throughout evolution. Surprisingly, we find that all populations, regardless of their initial regime, evolve to be subcritical in simple tasks and even strongly subcritical populations can reach comparable performance. We hypothesize that the moderately subcritical regime combines the benefits of generalizability and adaptability brought by closeness to criticality with the stability of the dynamics characteristic for subcritical systems. By a resilience analysis, we find that initially critical agents maintain their fitness level even under environmental changes and degrade slowly with increasing perturbation strength. On the other hand, subcritical agents originally evolved to the same fitness, were often rendered utterly inadequate and degraded faster. We conclude that although the subcritical regime is preferable for a simple task, the optimal deviation from criticality depends on the task difficulty: for harder tasks, agents evolve closer to criticality. Furthermore, subcritical populations cannot find the path to decrease their distance to criticality. In summary, our study suggests that initializing models near criticality is important to find an optimal and flexible solution.

preprint2020arXiv

Control What You Can: Intrinsically Motivated Task-Planning Agent

We present a novel intrinsically motivated agent that learns how to control the environment in the fastest possible manner by optimizing learning progress. It learns what can be controlled, how to allocate time and attention, and the relations between objects using surprise based motivation. The effectiveness of our method is demonstrated in a synthetic as well as a robotic manipulation environment yielding considerably improved performance and smaller sample complexity. In a nutshell, our work combines several task-level planning agent structures (backtracking search on task graph, probabilistic road-maps, allocation of search efforts) with intrinsic motivation to achieve learning from scratch.

preprint2020arXiv

Fast Non-Parametric Learning to Accelerate Mixed-Integer Programming for Online Hybrid Model Predictive Control

Today's fast linear algebra and numerical optimization tools have pushed the frontier of model predictive control (MPC) forward, to the efficient control of highly nonlinear and hybrid systems. The field of hybrid MPC has demonstrated that exact optimal control law can be computed, e.g., by mixed-integer programming (MIP) under piecewise-affine (PWA) system models. Despite the elegant theory, online solving hybrid MPC is still out of reach for many applications. We aim to speed up MIP by combining geometric insights from hybrid MPC, a simple-yet-effective learning algorithm, and MIP warm start techniques. Following a line of work in approximate explicit MPC, the proposed learning-control algorithm, LNMS, gains computational advantage over MIP at little cost and is straightforward for practitioners to implement.

preprint2020arXiv

Sample-efficient Cross-Entropy Method for Real-time Planning

Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x less samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.

preprint2020arXiv

Self-supervised Visual Reinforcement Learning with Object-centric Representations

Autonomous agents need large repertoires of skills to act reasonably on new tasks that they have not seen before. However, acquiring these skills using only a stream of high-dimensional, unstructured, and unlabeled observations is a tricky challenge for any autonomous agent. Previous methods have used variational autoencoders to encode a scene into a low-dimensional vector that can be used as a goal for an agent to discover new skills. Nevertheless, in compositional/multi-object environments it is difficult to disentangle all the factors of variation into such a fixed-length representation of the whole scene. We propose to use object-centric representations as a modular and structured observation space, which is learned with a compositional generative world model. We show that the structure in the representations in combination with goal-conditioned attention policies helps the autonomous agent to discover and learn useful skills. These skills can be further combined to address compositional tasks like the manipulation of several different objects.

preprint2019arXiv

Analytical classical density functionals from an equation learning network

We explore the feasibility of using machine learning methods to obtain an analytic form of the classical free energy functional for two model fluids, hard rods and Lennard--Jones, in one dimension . The Equation Learning Network proposed in Ref. 1 is suitably modified to construct free energy densities which are functions of a set of weighted densities and which are built from a small number of basis functions with flexible combination rules. This setup considerably enlarges the functional space used in the machine learning optimization as compared to previous work 2 where the functional is limited to a simple polynomial form. As a result, we find a good approximation for the exact hard rod functional and its direct correlation function. For the Lennard--Jones fluid, we let the network learn (i) the full excess free energy functional and (ii) the excess free energy functional related to interparticle attractions. Both functionals show a good agreement with simulated density profiles for thermodynamic parameters inside and outside the training region.

preprint2016arXiv

Extrapolation and learning equations

In classical machine learning, regression is treated as a black box process of identifying a suitable function from a hypothesis set without attempting to gain insight into the mechanism connecting inputs and outputs. In the natural sciences, however, finding an interpretable function for a phenomenon is the prime goal as it allows to understand and generalize results. This paper proposes a novel type of function learning network, called equation learner (EQL), that can learn analytical expressions and is able to extrapolate to unseen domains. It is implemented as an end-to-end differentiable feed-forward network and allows for efficient gradient based training. Due to sparsity regularization concise interpretable expressions can be obtained. Often the true underlying source expression is identified.

preprint2016arXiv

Self-organized control for musculoskeletal robots

With the accelerated development of robot technologies, optimal control becomes one of the central themes of research. In traditional approaches, the controller, by its internal functionality, finds appropriate actions on the basis of the history of sensor values, guided by the goals, intentions, objectives, learning schemes, and so on planted into it. The idea is that the controller controls the world---the body plus its environment---as reliably as possible. However, in elastically actuated robots this approach faces severe difficulties. This paper advocates for a new paradigm of self-organized control. The paper presents a solution with a controller that is devoid of any functionalities of its own, given by a fixed, explicit and context-free function of the recent history of the sensor values. When applying this controller to a muscle-tendon driven arm-shoulder system from the Myorobotics toolkit, we observe a vast variety of self-organized behavior patterns: when left alone, the arm realizes pseudo-random sequences of different poses but one can also manipulate the system into definite motion patterns. But most interestingly, after attaching an object, the controller gets in a functional resonance with the object's internal dynamics: when given a half-filled bottle, the system spontaneously starts shaking the bottle so that maximum response from the dynamics of the water is being generated. After attaching a pendulum to the arm, the controller drives the pendulum into a circular mode. In this way, the robot discovers dynamical affordances of objects its body is interacting with. We also discuss perspectives for using this controller paradigm for intention driven behavior generation.

preprint2015arXiv

A novel plasticity rule can explain the development of sensorimotor intelligence

Grounding autonomous behavior in the nervous system is a fundamental challenge for neuroscience. In particular, the self-organized behavioral development provides more questions than answers. Are there special functional units for curiosity, motivation, and creativity? This paper argues that these features can be grounded in synaptic plasticity itself, without requiring any higher level constructs. We propose differential extrinsic plasticity (DEP) as a new synaptic rule for self-learning systems and apply it to a number of complex robotic systems as a test case. Without specifying any purpose or goal, seemingly purposeful and adaptive behavior is developed, displaying a certain level of sensorimotor intelligence. These surprising results require no system specific modifications of the DEP rule but arise rather from the underlying mechanism of spontaneous symmetry breaking due to the tight brain-body-environment coupling. The new synaptic rule is biologically plausible and it would be an interesting target for a neurobiolocal investigation. We also argue that this neuronal mechanism may have been a catalyst in natural evolution.

preprint2015arXiv

Quantifying Emergent Behavior of Autonomous Robots

Quantifying behaviors of robots which were generated autonomously from task-independent objective functions is an important prerequisite for objective comparisons of algorithms and movements of animals. The temporal sequence of such a behavior can be considered as a time series and hence complexity measures developed for time series are natural candidates for its quantification. The predictive information and the excess entropy are such complexity measures. They measure the amount of information the past contains about the future and thus quantify the nonrandom structure in the temporal sequence. However, when using these measures for systems with continuous states one has to deal with the fact that their values will depend on the resolution with which the systems states are observed. For deterministic systems both measures will diverge with increasing resolution. We therefore propose a new decomposition of the excess entropy in resolution dependent and resolution independent parts and discuss how they depend on the dimensionality of the dynamics, correlations and the noise level. For the practical estimation we propose to use estimates based on the correlation integral instead of the direct estimation of the mutual information using the algorithm by Kraskov et al. (2004) which is based on next neighbor statistics because the latter allows less control of the scale dependencies. Using our algorithm we are able to show how autonomous learning generates behavior of increasing complexity with increasing learning duration.

preprint2013arXiv

Information driven self-organization of complex robotic behaviors

Information theory is a powerful tool to express principles to drive autonomous systems because it is domain invariant and allows for an intuitive interpretation. This paper studies the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process as a driving force to generate behavior. We study nonlinear and nonstationary systems and introduce the time-local predicting information (TiPI) which allows us to derive exact results together with explicit update rules for the parameters of the controller in the dynamical systems framework. In this way the information principle, formulated at the level of behavior, is translated to the dynamics of the synapses. We underpin our results with a number of case studies with high-dimensional robotic systems. We show the spontaneous cooperativity in a complex physical system with decentralized control. Moreover, a jointly controlled humanoid robot develops a high behavioral variety depending on its physics and the environment it is dynamically embedded into. The behavior can be decomposed into a succession of low-dimensional modes that increasingly explore the behavior space. This is a promising way to avoid the curse of dimensionality which hinders learning systems to scale well.

preprint2013arXiv

Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviours, because a maximisation of the PI corresponds to an exploration of morphology- and environment-dependent behavioural regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks a great speed-up can be achieved at the cost of an asymptotic performance lost.

Georg Martius

What is connected

Connect this record

See the researcher in context

Building this map preview

22 published item(s)

Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons

Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks

Developing hierarchical anticipations via neural network-based event segmentation

Inferring Markovian quantum master equations of few-body observables in interacting spin chains

On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks

Planning from Pixels in Environments with Combinatorially Hard Search Spaces

Self-supervised Reinforcement Learning with Independently Controllable Subgoals

Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

Demystifying Inductive Biases for $β$-VAE Based Architectures

Neuro-algorithmic Policies enable Fast Combinatorial Generalization

The dynamical regime and its importance for evolvability, task performance and generalization

Control What You Can: Intrinsically Motivated Task-Planning Agent

Fast Non-Parametric Learning to Accelerate Mixed-Integer Programming for Online Hybrid Model Predictive Control

Sample-efficient Cross-Entropy Method for Real-time Planning

Self-supervised Visual Reinforcement Learning with Object-centric Representations

Analytical classical density functionals from an equation learning network

Extrapolation and learning equations

Self-organized control for musculoskeletal robots

A novel plasticity rule can explain the development of sensorimotor intelligence

Quantifying Emergent Behavior of Autonomous Robots

Information driven self-organization of complex robotic behaviors

Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis