Source author record

Naomi Ehrich Leonard

Naomi Ehrich Leonard appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC, Systems and Control, Machine Learning, Multiagent Systems, math.DS, eess.SY, nlin.AO, physics.soc-ph, Populations and Evolution, Quantitative Methods, Social and Information Networks, Biological Physics, math.PR, Neurons and Cognition, q-fin.MF, Robotics

Catalog footprint

What is connected

24 works

16 topics

4 close collaborators


preprint, 2022, arXiv

Breaking indecision in multi-agent, multi-option dynamics

How does a group of agents break indecision when deciding about options with qualities that are hard to distinguish? Biological and artificial multi-agent systems, from honeybees and bird flocks to bacteria, robots, and humans, often need to overcome indecision when choosing among options in situations in which the performance or even the survival of the group is at stake. Breaking indecision is also important because in a fully indecisive state agents are not biased toward any specific option and therefore the agent group is maximally sensitive and prone to adapt to inputs and changes in its environment. Here, we develop a mathematical theory to study how decisions arise from the breaking of indecision. Our approach is grounded in both equivariant and network bifurcation theory. We model decision from indecision as synchrony-breaking in influence networks in which each node is the value assigned by an agent to an option. First, we show that three universal decision behaviors, namely, deadlock, consensus, and dissensus, are the generic outcomes of synchrony-breaking bifurcations from a fully synchronous state of indecision in influence networks. Second, we show that all deadlock and consensus value patterns and some dissensus value patterns are predicted by the symmetry of the influence networks. Third, we show that there are also many 'exotic' dissensus value patterns. These patterns are predicted by network architecture, but not by network symmetries, through a new synchrony-breaking branching lemma. This is the first example of exotic solutions in an application. Numerical simulations of a novel influence network model illustrate our theoretical results.

preprint, 2022, arXiv

Decentralized Learning With Limited Communications for Multi-robot Coverage of Unknown Spatial Fields

This paper presents an algorithm for a team of mobile robots to simultaneously learn a spatial field over a domain and spatially distribute themselves to optimally cover it. Drawing from previous approaches that estimate the spatial field through a centralized Gaussian process, this work leverages the spatial structure of the coverage problem and presents a decentralized strategy where samples are aggregated locally by establishing communications through the boundaries of a Voronoi partition. We present an algorithm whereby each robot runs a local Gaussian process calculated from its own measurements and those provided by its Voronoi neighbors, which are incorporated into the individual robot's Gaussian process only if they provide sufficiently novel information. The performance of the algorithm is evaluated in simulation and compared with centralized approaches.

preprint, 2022, arXiv

On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimension

Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant challenge due to the intractability of the associated normalizing integral. In these scenarios, Hamiltonian Monte Carlo (HMC) sampling offers a computationally tractable way to generate data for training RL algorithms. In this paper, we introduce a framework, called Hamiltonian $Q$-Learning, that demonstrates, both theoretically and empirically, that $Q$ values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Furthermore, to exploit the underlying low-rank structure of the $Q$ function, Hamiltonian $Q$-Learning uses a matrix completion algorithm for reconstructing the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply $Q$-learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.

preprint, 2022, arXiv

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

A challenge in reinforcement learning (RL) is minimizing the cost of sampling associated with exploration. Distributed exploration reduces sampling complexity in multi-agent RL (MARL). We investigate the benefits to performance in MARL when exploration is fully decentralized. Specifically, we consider a class of online, episodic, tabular $Q$-learning problems under time-varying reward and transition dynamics, in which agents can communicate in a decentralized manner. We show that group performance, as measured by the bound on regret, can be significantly improved through communication when each agent uses a decentralized message-passing protocol, even when limited to sending information up to its $\gamma$-hop neighbors. We prove regret and sample complexity bounds that depend on the number of agents, communication network structure and $\gamma$. We show that incorporating more agents and more information sharing into the group learning scheme speeds up convergence to the optimal policy. Numerical simulations illustrate our results and validate our theoretical claims.

preprint, 2022, arXiv

Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control

Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers.

preprint, 2020, arXiv

A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it receives a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward through minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of analytical bounds using numerical simulations.

preprint, 2020, arXiv

A model-independent theory of consensus and dissensus decision making

We develop a model-independent framework to study the dynamics of decision-making in opinion networks for an arbitrary number of agents and an arbitrary number of options. Model-independence means that the analysis is not performed on a specific set of equations, in contrast to classical approaches to decision making that fix a specific model and analyze it. Rather, the general features of decision making in dynamical opinion networks can be derived starting from empirically testable hypotheses about the deciding agents, the available options, and the interactions among them. After translating these empirical hypotheses into algebraic ones, we use the tools of equivariant bifurcation theory to uncover model-independent properties of dynamical opinion networks. The model-independent results are illustrated on a novel analytical model that is constructed by plugging a generic sigmoidal nonlinearity, modeling boundedness of opinions and opinion perception, into the model-independent equivariant structure. Our analysis reveals richer and more flexible opinion-formation behavior as compared to model-dependent approaches. For instance, analysis reveals the possibility of switching between consensus and various forms of dissensus by modulation of the level of agent cooperativity and without requiring any particular ad-hoc interaction topology (e.g., structural balance). From a theoretical viewpoint, we prove new results in equivariant bifurcation theory. We construct an exhaustive list of axial subgroups for the action of $\mathbf{S}_n \times \mathbf{S}_3$ on $\mathbb{R}^{n-1}\otimes\mathbb{R}^{2}$. We also generalize this list to the action of $\mathbf{S}_n \times \mathbf{S}_k$ on $\mathbb{R}^{n-1}\otimes \mathbb{R}^{k-1}$, i.e., for $n$ agents and $k$ options, although without proving that in this case the list is exhaustive.

preprint, 2020, arXiv

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independent rewards. And we consider a constrained reward model in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms: coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to the performance of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and we propose a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of the agent locations in the communication graph.

preprint, 2020, arXiv

Distributed Learning: Sequential Decision Making in Resource-Constrained Environments

We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full communication. Moreover, we prove that under the proposed partial communication protocol the communication cost is $O(\log T)$, where $T$ is the time horizon of the decision-making process. This improves significantly on protocols with full communication, which incur a communication cost that is $O(T)$. We validate our theoretical results using numerical simulations.

preprint, 2020, arXiv

Influence Spread in the Heterogeneous Multiplex Linear Threshold Model

The linear threshold model (LTM) has been used to study spread on single-layer networks defined by one inter-agent sensing modality and agents homogeneous in protocol. We define and analyze the heterogeneous multiplex LTM to study spread on multi-layer networks with each layer representing a different sensing modality and agents heterogeneous in protocol. Protocols are designed to distinguish signals from different layers: an agent becomes active if a sufficient number of its neighbors in each of any $a$ of the $m$ layers is active. We focus on Protocol OR, when $a=1$, and Protocol AND, when $a=m$, which model agents that are most and least readily activated, respectively. We develop theory and algorithms to compute the size of the spread at steady state for any set of initially active agents and to analyze the role of distinguished sensing modalities, network structure, and heterogeneity. We show how heterogeneity manages the tension in spreading dynamics between sensitivity to inputs and robustness to disturbances.
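The activation rule described in this abstract can be sketched as a small fixed-point computation. This is only an illustrative sketch: the function name `multiplex_ltm_spread`, the use of a fractional threshold per agent, and the convention that a layer with no neighbors never counts as satisfied are assumptions, not details taken from the paper.

```python
def multiplex_ltm_spread(layers, threshold, protocol, seeds):
    """Return the steady-state active set of a multiplex linear threshold model.

    layers    -- list of adjacency dicts, one per layer: node -> set of neighbors
    threshold -- node -> fraction of a layer's neighbors that must be active
                 (assumed form of "a sufficient number of neighbors")
    protocol  -- node -> a, the number of layers that must meet the threshold
                 (a = 1 is Protocol OR, a = len(layers) is Protocol AND)
    seeds     -- iterable of initially active nodes
    """
    nodes = set(threshold)
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        # Activation is monotone (agents never deactivate), so the order in
        # which inactive nodes are visited does not change the final set.
        for node in nodes - active:
            satisfied_layers = 0
            for adjacency in layers:
                neighbors = adjacency.get(node, set())
                if not neighbors:
                    continue  # assumption: an empty layer cannot be satisfied
                active_count = sum(1 for v in neighbors if v in active)
                if active_count >= threshold[node] * len(neighbors):
                    satisfied_layers += 1
            if satisfied_layers >= protocol[node]:
                active.add(node)
                changed = True
    return active
```

Passing a different value of `a` for each node in `protocol` gives a population that is heterogeneous in protocol; setting every entry to 1 or to the number of layers recovers the all-OR and all-AND cases.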

preprint, 2019, arXiv

A Continuous Threshold Model of Cascade Dynamics

We present a continuous threshold model (CTM) of cascade dynamics for a network of agents with real-valued activity levels that change continuously in time. The model generalizes the linear threshold model (LTM) from the literature, where an agent becomes active (adopts an innovation) if the fraction of its neighbors that are active is above a threshold. With the CTM we study the influence on cascades of heterogeneity in thresholds for a network comprised of a chain of three clusters of agents, each distinguished by a different threshold. The system is most sensitive to change as the dynamics pass through a bifurcation point: if the bifurcation is supercritical the response will be contained, while if the bifurcation is subcritical the response will be a cascade. We show that there is a subcritical bifurcation, thus a cascade, in response to an innovation if there is a large enough disparity between the thresholds of sufficiently large clusters on either end of the chain; otherwise the response will be contained.

preprint, 2016, arXiv

A martingale analysis of first passage times of time-dependent Wiener diffusion models

Research in psychology and neuroscience has successfully modeled decision making as a process of noisy evidence accumulation to a decision bound. While there are several variants and implementations of this idea, the majority of these models make use of a noisy accumulation between two absorbing boundaries. A common assumption of these models is that decision parameters, e.g., the rate of accumulation (drift rate), remain fixed over the course of a decision, allowing the derivation of analytic formulas for the probabilities of hitting the upper or lower decision threshold, and the mean decision time. There is reason to believe, however, that many types of behavior would be better described by a model in which the parameters were allowed to vary over the course of the decision process. In this paper, we use martingale theory to derive formulas for the mean decision time, hitting probabilities, and first passage time (FPT) densities of a Wiener process with time-varying drift between two time-varying absorbing boundaries. This model was first studied by Ratcliff (1980) in the two-stage form, and here we consider the same model for an arbitrary number of stages (i.e. intervals of time during which parameters are constant). Our calculations enable direct computation of mean decision times and hitting probabilities for the associated multistage process. We also provide a review of how martingale theory may be used to analyze similar models employing Wiener processes by re-deriving some classical results. In concert with a variety of numerical tools already available, the current derivations should encourage mathematical analysis of more complex models of decision making with time-varying evidence.
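For orientation, the fixed-parameter case that this abstract takes as its starting point can be checked numerically. The sketch below is not the multistage method of the paper. It assumes a single stage with constant nonzero drift, symmetric thresholds at plus and minus `threshold`, and a start at zero, and it compares the well-known closed-form hitting probability and mean decision time against an Euler-Maruyama simulation. All function and parameter names are hypothetical.

```python
import math
import random


def ddm_closed_form(drift, noise, threshold):
    """Classical results for dx = drift*dt + noise*dW with x(0) = 0,
    absorbed at +threshold or -threshold (drift must be nonzero)."""
    k = drift * threshold / noise ** 2
    p_upper = 1.0 / (1.0 + math.exp(-2.0 * k))       # probability of hitting +threshold
    mean_time = (threshold / drift) * math.tanh(k)   # mean first passage time
    return p_upper, mean_time


def ddm_monte_carlo(drift, noise, threshold, dt=1e-3, trials=500, seed=0):
    """Euler-Maruyama estimate of the same two quantities."""
    rng = random.Random(seed)
    step_sd = noise * math.sqrt(dt)
    upper_hits = 0
    total_time = 0.0
    for _ in range(trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + step_sd * rng.gauss(0.0, 1.0)
            t += dt
        upper_hits += x >= threshold
        total_time += t
    return upper_hits / trials, total_time / trials
```

The simulated values carry both sampling error and a small time-discretization bias, so they should only be expected to agree with the closed form approximately. Replacing the constant `drift` inside the loop with a piecewise-constant function of `t` would give a simulation of a multistage process of the kind the abstract describes.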

preprint, 2016, arXiv

Satisficing in multi-armed bandit problems

Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.

preprint, 2015, arXiv

Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis

We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the assumptions in the Bayesian prior on the performance of the upper credible limit (UCL) algorithm and a new correlated UCL algorithm. We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms. Our results show how priors and correlation structure can be leveraged to improve performance.

preprint, 2015, arXiv

Joint Centrality Distinguishes Optimal Leaders in Noisy Networks

We study the performance of a network of agents tasked with tracking an external unknown signal in the presence of stochastic disturbances and under the condition that only a limited subset of agents, known as leaders, can measure the signal directly. We investigate the optimal leader selection problem for a prescribed maximum number of leaders, where the optimal leader set minimizes total system error defined as steady-state variance about the external signal. In contrast to previously established greedy algorithms for optimal leader selection, our results rely on an expression of total system error in terms of properties of the underlying network graph. We demonstrate that the performance of any given set of leaders depends on their influence as determined by a new graph measure of centrality of a set. We define the joint centrality of a set of nodes in a network graph such that a leader set with maximal joint centrality is an optimal leader set. In the case of a single leader, we prove that the optimal leader is the node with maximal information centrality. In the case of multiple leaders, we show that the nodes in the optimal leader set balance high information centrality with a coverage of the graph. For special cases of graphs, we solve explicitly for optimal leader sets. We illustrate with examples.

preprint, 2014, arXiv

Collective Decision-Making in Ideal Networks: The Speed-Accuracy Tradeoff

We study collective decision-making in a model of human groups, with network interactions, performing two alternative choice tasks. We focus on the speed-accuracy tradeoff, i.e., the tradeoff between a quick decision and a reliable decision, for individuals in the network. We model the evidence aggregation process across the network using a coupled drift diffusion model (DDM) and consider the free response paradigm in which individuals take their time to make the decision. We develop reduced DDMs as decoupled approximations to the coupled DDM and characterize their efficiency. We determine high probability bounds on the error rate and the expected decision time for the reduced DDM. We show the effect of the decision-maker's location in the network on their decision-making performance under several threshold selection criteria. Finally, we extend the coupled DDM to the coupled Ornstein-Uhlenbeck model for decision-making in two alternative choice tasks with recency effects, and to the coupled race model for decision-making in multiple alternative choice tasks.

preprint, 2014, arXiv

Cooperative learning in multi-agent systems from intermittent measurements

Motivated by the problem of tracking a direction in a decentralized way, we consider the general problem of cooperative learning in multi-agent systems with time-varying connectivity and intermittent measurements. We propose a distributed learning protocol capable of learning an unknown vector $\mu$ from noisy measurements made independently by autonomous nodes. Our protocol is completely distributed and able to cope with the time-varying, unpredictable, and noisy nature of inter-agent communication, and intermittent noisy measurements of $\mu$. Our main result bounds the learning speed of our protocol in terms of the size and combinatorial features of the (time-varying) networks connecting the nodes.

preprint, 2013, arXiv

A New Notion of Effective Resistance for Directed Graphs-Part I: Definition and Properties

The graphical notion of effective resistance has found wide-ranging applications in many areas of pure mathematics, applied mathematics and control theory. By the nature of its construction, effective resistance can only be computed in undirected graphs and yet in several areas of its application, directed graphs arise as naturally as (or more naturally than) undirected ones. In part I of this work, we propose a generalization of effective resistance to directed graphs that preserves its control-theoretic properties in relation to consensus-type dynamics. We proceed to analyze the dependence of our algebraic definition on the structural properties of the graph and the relationship between our construction and a graphical distance. The results make possible the calculation of effective resistance between any two nodes in any directed graph and provide a solid foundation for the application of effective resistance to problems involving directed graphs.

preprint, 2013, arXiv

A New Notion of Effective Resistance for Directed Graphs-Part II: Computing Resistances

In Part I of this work we defined a generalization of the concept of effective resistance to directed graphs, and we explored some of the properties of this new definition. Here, we use the theory developed in Part I to compute effective resistances in some prototypical directed graphs. This exploration highlights cases where our notion of effective resistance for directed graphs behaves analogously to our experience from undirected graphs, as well as cases where it behaves in unexpected ways.

preprint, 2013, arXiv

Adaptive Network Dynamics and Evolution of Leadership in Collective Migration

The evolution of leadership in migratory populations depends not only on costs and benefits of leadership investments but also on the opportunities for individuals to rely on cues from others through social interactions. We derive an analytically tractable adaptive dynamic network model of collective migration with fast timescale migration dynamics and slow timescale adaptive dynamics of individual leadership investment and social interaction. For large populations, our analysis of bifurcations with respect to investment cost explains the observed hysteretic effect associated with recovery of migration in fragmented environments. Further, we show a minimum connectivity threshold above which there is evolutionary branching into leader and follower populations. For small populations, we show how the topology of the underlying social interaction network influences the emergence and location of leaders in the adaptive system. Our model and analysis can describe other adaptive network dynamics involving collective tracking or collective learning of a noisy, unknown signal, and likewise can inform the design of robotic networks where agents use decentralized strategies that balance direct environmental measurements with agent interactions.

preprint, 2013, arXiv

Starling flock networks manage uncertainty in consensus at low cost

Flocks of starlings exhibit a remarkable ability to maintain cohesion as a group in highly uncertain environments and with limited, noisy information. Recent work demonstrated that individual starlings within large flocks respond to a fixed number of nearest neighbors, but until now it was not understood why this number is seven. We analyze robustness to uncertainty of consensus in empirical data from multiple starling flocks and show that the flock interaction networks with six or seven neighbors optimize the trade-off between group cohesion and individual effort. We can distinguish these numbers of neighbors from fewer or greater numbers using our systems-theoretic approach to measuring robustness of interaction networks as a function of the network structure, i.e., who is sensing whom. The metric quantifies the disagreement within the network due to disturbances and noise during consensus behavior and can be evaluated over a parameterized family of hypothesized sensing strategies (here the parameter is number of neighbors). We use this approach to further show that for the range of flocks studied the optimal number of neighbors does not depend on the number of birds within a flock; rather, it depends on the shape, notably the thickness, of the flock. The results suggest that robustness to uncertainty may have been a factor in the evolution of flocking for starlings. More generally, our results elucidate the role of the interaction network on uncertainty management in collective behavior, and motivate the application of our approach to other biological networks.

preprint, 2012, arXiv

Node Classification in Networks of Stochastic Evidence Accumulators

This paper considers a network of stochastic evidence accumulators, each represented by a drift-diffusion model accruing evidence towards a decision in continuous time by observing a noisy signal and by exchanging information with other units according to a fixed communication graph. We bring into focus the relationship between the location of each unit in the communication graph and its certainty as measured by the inverse of the variance of its state. We show that node classification according to degree distributions or geodesic distances cannot faithfully capture node ranking in terms of certainty. Instead, all possible paths connecting each unit with the rest in the network must be incorporated. We make this precise by proving that node classification according to information centrality provides a rank ordering with respect to node certainty, thereby affording a direct interpretation of the certainty level of each unit in terms of the structural properties of the underlying communication graph.
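The ranking measure named in this abstract, information centrality, can be computed directly from the graph Laplacian. The sketch below uses the standard definition from the network-science literature (invert the Laplacian plus the all-ones matrix, then combine diagonal entries, trace, and row sum). It assumes an undirected, connected, unweighted graph given as an edge list, and the helper names `invert` and `information_centrality` are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (pure Python)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; is the graph connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def information_centrality(n, edges):
    """Information centrality of nodes 0..n-1 of an undirected connected graph."""
    laplacian = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        laplacian[i][i] += 1.0
        laplacian[j][j] += 1.0
        laplacian[i][j] -= 1.0
        laplacian[j][i] -= 1.0
    # C = (L + J)^{-1}, where J is the all-ones matrix.
    c = invert([[laplacian[i][j] + 1.0 for j in range(n)] for i in range(n)])
    trace = sum(c[i][i] for i in range(n))
    row_sum = sum(c[0])  # every row of C has the same sum
    return [1.0 / (c[i][i] + (trace - 2.0 * row_sum) / n) for i in range(n)]
```

On the three-node path `information_centrality(3, [(0, 1), (1, 2)])`, the middle node receives the largest value, which matches the intuition that it is the best-connected unit once all paths are taken into account.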

preprint, 2012, arXiv

Nonuniform Coverage Control on the Line

This paper investigates control laws allowing mobile, autonomous agents to optimally position themselves on the line for distributed sensing in a nonuniform field. We show that a simple static control law, based only on local measurements of the field by each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/movement/computation rounds that is essentially quadratic in the number of agents. Further, we exhibit a dynamic control law which, under slightly stronger assumptions on the capabilities and knowledge of each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/communication/computation/movement rounds that is essentially linear in the number of agents. Crucially, both algorithms are fully distributed and robust to unpredictable loss and addition of agents.

preprint, 2011, arXiv

Rearranging trees for robust consensus

In this paper, we use the H2 norm associated with a communication graph to characterize the robustness of consensus to noise. In particular, we restrict our attention to trees and by systematic attention to the effect of local changes in topology, we derive a partial ordering for undirected trees according to the H2 norm. Our approach for undirected trees provides a constructive method for deriving an ordering for directed trees. Further, our approach suggests a decentralized manner in which trees can be rearranged in order to improve their robustness.
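To make the tree comparison concrete, the following sketch evaluates the H2 norm of noisy consensus on an undirected tree. It relies on two assumptions that are not stated in the abstract: that the H2 norm of consensus dynamics driven by unit white noise on a connected undirected graph with $N$ nodes equals $\sqrt{K_f/(2N)}$, where $K_f$ is the sum of effective resistances over all node pairs, and that on a tree the effective resistance between two nodes equals their path length. The function name `tree_h2_norm` is hypothetical.

```python
import math
from collections import deque


def tree_h2_norm(n, edges):
    """H2 norm of noisy consensus on an undirected tree with nodes 0..n-1.

    Assumes H2 = sqrt(K_f / (2 n)), with K_f the sum of pairwise effective
    resistances, which on a tree is the sum of pairwise path lengths.
    """
    if len(edges) != n - 1:
        raise ValueError("a tree on n nodes has exactly n - 1 edges")
    adjacency = {i: [] for i in range(n)}
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    total = 0
    for source in range(n):
        # Breadth-first search gives path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    pair_distance_sum = total / 2  # each unordered pair was counted twice
    return math.sqrt(pair_distance_sum / (2 * n))
```

Under these assumptions a four-node star scores lower (is more robust) than a four-node path, which illustrates how rearranging a tree can change its position in the ordering.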
