Topic overview

Multiagent Systems

1840 works, 5855 researchers, 0 institutions

Topic snapshot

What this area looks like now

1840 works
5855 authors
0 experts visible
0 communities

Papers in this area

24 featured work(s)

preprint 2020 arXiv

Underwater Caging and Capture for Autonomous Underwater Vehicles

In this paper, we consider the problem of caging and eventual capture of an underwater entity using multiple Autonomous Underwater Vehicles (AUVs) in a 3D water volume. We solve this problem both with and without taking bathymetry into account. Our proposed algorithm for range-limited sensing in 3D environments captures a finite-speed entity based on sparse and irregular observations. After an isolated initial sighting of the entity, the uncertainty of its whereabouts grows while deployment of the AUV system is underway. To contain the entity, an initial cage, or barrier of sensing footprints, is created around the initial sighting, using islands and other terrain as part of the cage if available. After the initial cage is established, the system waits for a second sighting, and the possible opportunity to create a smaller, shrinkable cage. This process continues until at some point it is possible to create this smaller cage, resulting in capture, meaning the entity is sensed directly and continuously. We present a set of algorithms for addressing the scenario above, and illustrate their performance on a set of examples. The proposed algorithm is a combination of solutions to the mi

preprint 2020 arXiv

Evaluation of the cumulated impacts on the marine resource of a socio-ecological coral system: approach by agent-based modeling

In the context of climate change and significant changes in human activities around the world, coral reefs are subject to many disruptions. We develop here a tool to help decision-making in Moorea (French Polynesia), based on multi-agent modeling. We model the trophic interactions with a Lotka-Volterra model, and also the interactions between fishermen, trophic groups and tourist operators. The results are generated through global, temporal (time series), and spatial (GIS maps) outputs. The model produced here can be transposed to other ecological and economic situations, and other geographical areas, by modifying the parameters and changing the input map data.
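
The trophic part of such a model can be illustrated with the classic two-species Lotka-Volterra equations. The sketch below is only a minimal illustration: the two-species form, the explicit Euler integration, the function names and all parameter values are assumptions made here, not the multi-group model or the calibration used for Moorea.

```python
def lotka_volterra_step(prey, predator, dt, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """One explicit Euler step of the classic predator-prey equations.

    d(prey)/dt     = alpha * prey - beta * prey * predator
    d(predator)/dt = delta * prey * predator - gamma * predator
    All parameter values are hypothetical.
    """
    d_prey = alpha * prey - beta * prey * predator
    d_predator = delta * prey * predator - gamma * predator
    return prey + dt * d_prey, predator + dt * d_predator


def simulate(prey0, predator0, steps, dt=0.001, **params):
    """Return the list of (prey, predator) states, starting with the initial state."""
    trajectory = [(prey0, predator0)]
    for _ in range(steps):
        prey, predator = trajectory[-1]
        trajectory.append(lotka_volterra_step(prey, predator, dt, **params))
    return trajectory


# With the default parameters the coexistence equilibrium is
# prey = gamma / delta = 20 and predator = alpha / beta = 10.
at_equilibrium = simulate(20.0, 10.0, steps=100)
oscillating = simulate(10.0, 5.0, steps=1000)
```

Starting exactly at the equilibrium leaves both populations unchanged, while any other positive starting point produces the familiar oscillation. An agent-based layer (fishermen, tourist operators) would then act on these populations as additional harvest or disturbance terms.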

preprint 2020 arXiv

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an NxM matrix. These utilities are unknown to the players. In each turn players select an arm and receive a noisy observation of their utility for it. However, if any other players selected the same arm that turn, all colliding players receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a $\log\log T$ factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.
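
The setting described in this abstract can be made concrete with a small sketch of the collision rule and of the max-min fair matching that the players are trying to learn. This is not the paper's distributed algorithm: the brute-force search below assumes the utility matrix is known and is only a reference oracle for tiny instances. The function names and the example matrix are hypothetical.

```python
import random
from itertools import permutations


def play_round(utilities, choices, noise=0.0, rng=random):
    """Observed reward of each player for one turn.

    utilities[p][a] is player p's expected utility for arm a; choices[p] is
    the arm picked by player p. Players sharing an arm all receive zero.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: every colliding player gets zero
        else:
            rewards.append(utilities[player][arm] + noise * rng.gauss(0.0, 1.0))
    return rewards


def max_min_matching(utilities):
    """Brute-force the collision-free assignment maximizing the minimum utility."""
    n_players, n_arms = len(utilities), len(utilities[0])
    best_arms, best_value = None, float("-inf")
    for arms in permutations(range(n_arms), n_players):
        value = min(utilities[p][a] for p, a in enumerate(arms))
        if value > best_value:
            best_arms, best_value = arms, value
    return best_arms, best_value


utilities = [[0.9, 0.2, 0.1],
             [0.8, 0.7, 0.1]]
fair_arms, fair_value = max_min_matching(utilities)
collided = play_round(utilities, [0, 0])
matched = play_round(utilities, list(fair_arms))
```

Both players prefer arm 0, so greedy play collides and yields nothing; the fair matching sends the second player to arm 1, which raises the worst-off player's utility to 0.7.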

preprint 2020 arXiv

FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis

Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents' behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.

preprint 2020 arXiv

Graph Neural Networks for 3D Multi-Object Tracking

3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent work often uses a tracking-by-detection pipeline, where the feature of each object is extracted independently to compute an affinity matrix. Then, the affinity matrix is passed to the Hungarian algorithm for data association. A key process of this pipeline is to learn discriminative features for different objects in order to reduce confusion during data association. To that end, we propose two innovative techniques: (1) instead of obtaining the features for each object independently, we propose a novel feature interaction mechanism by introducing Graph Neural Networks; (2) instead of obtaining the features from either 2D or 3D space as in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space. Through experiments on the KITTI dataset, our proposed method achieves state-of-the-art 3D MOT performance. Our project website is at http://www.xinshuoweng.com/projects/GNN3DMOT.

preprint 2020 arXiv

A Robust Gradient Tracking Method for Distributed Optimization over Directed Networks

In this paper, we consider the problem of distributed consensus optimization over multi-agent networks with directed network topology. Assuming each agent has a local cost function that is smooth and strongly convex, the global objective is to minimize the average of all the local cost functions. To solve the problem, we introduce a robust gradient tracking method (R-Push-Pull) adapted from the recently proposed Push-Pull/AB algorithm. R-Push-Pull inherits the advantages of Push-Pull and enjoys linear convergence to the optimal solution with exact communication. Under noisy information exchange, R-Push-Pull is more robust than the existing gradient tracking based algorithms; the solutions obtained by each agent reach a neighborhood of the optimum in expectation exponentially fast under a constant stepsize policy. We provide a numerical example that demonstrates the effectiveness of R-Push-Pull.
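
For readers unfamiliar with gradient tracking, the following sketch shows a plain Push-Pull-style iteration with exact communication on a three-node directed ring. It is not R-Push-Pull and has none of its noise robustness. The scalar quadratic costs, the mixing weights, the step size and all names are assumptions chosen for illustration; on this particular ring the same matrix happens to be both row- and column-stochastic, so it is used for both mixing steps.

```python
def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def quadratic_grad(a, b):
    """Gradient of the hypothetical local cost f(x) = 0.5 * a * (x - b)**2."""
    return lambda x: a * (x - b)


def gradient_tracking(R, C, grads, x0, step, iterations):
    """Push-Pull-style gradient tracking with exact communication.

    R (row-stochastic) mixes the decision variables, C (column-stochastic)
    mixes the gradient trackers y, which estimate the average gradient.
    """
    x = list(x0)
    g = [grad(xi) for grad, xi in zip(grads, x)]
    y = list(g)  # trackers start at the local gradients
    for _ in range(iterations):
        x = mat_vec(R, [xi - step * yi for xi, yi in zip(x, y)])
        g_new = [grad(xi) for grad, xi in zip(grads, x)]
        mixed = mat_vec(C, y)
        y = [m + gn - go for m, gn, go in zip(mixed, g_new, g)]
        g = g_new
    return x


# Directed ring with self-loops: node 0 hears node 2, node 1 hears node 0,
# node 2 hears node 1.
W = [[0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]
grads = [quadratic_grad(1.0, 1.0), quadratic_grad(2.0, 2.0), quadratic_grad(3.0, 3.0)]
estimates = gradient_tracking(W, W, grads, x0=[0.0, 0.0, 0.0], step=0.05, iterations=2000)
optimum = (1.0 * 1.0 + 2.0 * 2.0 + 3.0 * 3.0) / (1.0 + 2.0 + 3.0)
```

Each agent only evaluates its own gradient, yet all three estimates agree on the minimizer of the average cost, because the trackers preserve the sum of the local gradients at every step.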

preprint 2020 arXiv

Integrated Self-Organized Traffic Light Controllers for Signalized Intersections

Detecting the arrival of emergency vehicles on roads has been the focus of many researchers. It is important to detect the arrival of an emergency vehicle (e.g., an ambulance) at a traffic light so that it can be given a green light to pass through. Many researchers have suggested and patented emergency vehicle detection systems; however, to our knowledge, none of them has addressed the effect of giving extra green time to one road while queues build up on the others. This paper considers the problem of finding a better traffic light phase plan to stabilize or recover the situation at an affected intersection after the presence of an emergency vehicle has been resolved. A hardware setup and a novel messaging protocol are proposed for installation on roads and vehicles to collect real-time road data. In addition, a novel decision-making protocol uses the collected data to produce a better traffic light phase plan for an intersection. The phase plan involves two main decisions: which light has a higher priority to be green in the next phase, and how long the green phase should be. After simulating the proposed system using our customized simulator written in Matl

preprint 2020 arXiv

Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning

A large portion of passenger requests is reportedly unserviced, partially due to vacant for-hire drivers' cruising behavior during the passenger seeking process. This paper aims to model the multi-driver repositioning task through a mean field multi-agent reinforcement learning (MARL) approach that captures competition among multiple agents. Because the direct application of MARL to the multi-driver system under a given reward mechanism will likely yield a suboptimal equilibrium due to the selfishness of drivers, this study proposes a reward design scheme with which a more desired equilibrium can be reached. To effectively solve the bilevel optimization problem with upper level as the reward design and the lower level as a multi-agent system, a Bayesian optimization (BO) algorithm is adopted to speed up the learning process. We then apply the bilevel optimization model to two case studies, namely, e-hailing driver repositioning under service charge and multiclass taxi driver repositioning under NYC congestion pricing. In the first case study, the model is validated by the agreement between the derived optimal control from BO and that from an analytical solution. With a simple p

preprint 2020 arXiv

SmartSON: A Smart contract driven incentive management framework for Self-Organizing Networks

This article proposes a self-organizing collaborative computing network with an approach to enhance the expectation of a collaborating node for joining the self-organizing network. The proposed approach relies on Ethereum cryptocurrency and Smart Contract to enhance the expectation of collaborating nodes by monetizing the services provided to the self-organizing network. Furthermore, an escrow based smart contract is formalized in the proposed framework to sustain monetary trust between collaborating nodes. The proposed scheme can enforce an autonomic incentive management mechanism for any type of self-organizing network, such as self-organizing clouds, ad-hoc networks, self-organizing federated cloud networks, self-organizing federated learning networks, and self-organizing D2D networks, to name a few. Considering the distributed nature of these self-organizing networks and the Ethereum blockchain network, a distributed agent-based methodology is materialized in the proposed framework. Following this, a proof of concept implementation for the general case of a self-organizing cloud is presented. Lastly, the article provides some insights into possible future directions us

preprint 2020 arXiv

Enhanced or distorted wisdom of crowds? An agent-based model of opinion formation under social influence

We propose an agent-based model of collective opinion formation to study the wisdom of crowds under social influence. The opinion of an agent is a continuous positive value, denoting its subjective answer to a factual question. The wisdom of crowds states that the average of all opinions is close to the truth, i.e. the correct answer. But if agents have the chance to adjust their opinion in response to the opinions of others, this effect can be destroyed. Our model investigates this scenario by evaluating two competing effects: (i) agents tend to keep their own opinion (individual conviction $β$), (ii) they tend to adjust their opinion if they have information about the opinions of others (social influence $α$). For the latter, two different regimes (full information vs. aggregated information) are compared. Our simulations show that social influence only in rare cases enhances the wisdom of crowds. Most often, we find that agents converge to a collective opinion that is even farther away from the true answer. So, under social influence the wisdom of crowds can be systematically wrong.

preprint 2020 arXiv

Automated Trajectory Synthesis for UAV Swarms Based on Resilient Data Collection Objectives

The use of Unmanned Aerial Vehicles (UAVs) for collecting data from remotely located sensor systems is emerging. The data can be time-sensitive and require to be transmitted to a data processing center. However, planning the trajectory of a collaborative UAV swarm depends on multi-fold constraints, such as data collection requirements, UAV maneuvering capacity, and budget limitation. Since a UAV may fail or be compromised, it is important to provide necessary resilience to such contingencies, thus ensuring data security. It is important to provide the UAVs with efficient spatio-temporal trajectories so that they can efficiently cover necessary data sources. In this work, we present Synth4UAV, a formal approach for automated synthesis of efficient trajectories for a UAV swarm by logically modeling the aerial space and data point topology, UAV moves, and associated constraints in terms of the turning and climbing angle, fuel usage, data collection point coverage, data freshness, and resiliency properties. We use efficient, logical formulas to encode and solve the complex model. The solution to the model provides the routing and maneuvering plan for each UAV, including the time to vis

preprint 2020 arXiv

End-to-End 3D Multi-Object Tracking and Trajectory Forecasting

3D multi-object tracking (MOT) and trajectory forecasting are two critical components in modern 3D perception systems. We hypothesize that it is beneficial to unify both tasks under one framework to learn a shared feature representation of agent interaction. To evaluate this hypothesis, we propose a unified solution for 3D MOT and trajectory forecasting which also incorporates two additional novel computational units. First, we employ a feature interaction technique by introducing Graph Neural Networks (GNNs) to capture the way in which multiple agents interact with one another. The GNN is able to model complex hierarchical interactions, improve the discriminative feature learning for MOT association, and provide socially-aware context for trajectory forecasting. Second, we use a diversity sampling function to improve the quality and diversity of our forecasted trajectories. The learned sampling function is trained to efficiently extract a variety of outcomes from a generative trajectory distribution and helps avoid the problem of generating many duplicate trajectory samples. We show that our method achieves state-of-the-art performance on the KITTI dataset. Our project website is

preprint 2020 arXiv

Decentralized Stochastic Gradient Tracking for Non-convex Empirical Risk Minimization

This paper studies a decentralized stochastic gradient tracking (DSGT) algorithm for non-convex empirical risk minimization problems over a peer-to-peer network of nodes, which is in sharp contrast to the existing DSGT only for convex problems. To ensure exact convergence and handle the variance among decentralized datasets, each node performs a stochastic gradient (SG) tracking step by using a mini-batch of samples, where the batch size is designed to be proportional to the size of the local dataset. We explicitly evaluate the convergence rate of DSGT with respect to the number of iterations in terms of algebraic connectivity of the network, mini-batch size, gradient variance, etc. Under certain conditions, we further show that DSGT has a network independence property in the sense that the network topology only affects the convergence rate up to a constant factor. Hence, the convergence rate of DSGT can be comparable to the centralized SGD method. Moreover, a linear speedup of DSGT with respect to the number of nodes is achievable for some scenarios. Numerical experiments for neural networks and logistic regression problems on CIFAR-10 finally illustrate the advantages of DSGT.

preprint 2020 arXiv

Collaborative Multi-Robot Systems for Search and Rescue: Coordination and Perception

Autonomous or teleoperated robots have been playing increasingly important roles in civil applications in recent years. Across the different civil domains where robots can support human operators, one of the areas where they can have more impact is in search and rescue (SAR) operations. In particular, multi-robot systems have the potential to significantly improve the efficiency of SAR personnel with faster search of victims, initial assessment and mapping of the environment, real-time monitoring and surveillance of SAR operations, or establishing emergency communication networks, among other possibilities. SAR operations encompass a wide variety of environments and situations, and therefore heterogeneous and collaborative multi-robot systems can provide the most advantages. In this paper, we review and analyze the existing approaches to multi-robot SAR support, from an algorithmic perspective and putting an emphasis on the methods enabling collaboration among the robots as well as advanced perception through machine vision and multi-agent active perception. Furthermore, we put these algorithms in the context of the different challenges and constraints that various types of robots

preprint 2020 arXiv

A Deep Multi-Agent Reinforcement Learning Approach to Autonomous Separation Assurance

A novel deep multi-agent reinforcement learning framework is proposed to identify and resolve conflicts among a variable number of aircraft in a high-density, stochastic, and dynamic sector. Currently the sector capacity is constrained by human air traffic controller's cognitive limitation. We investigate the feasibility of a new concept (autonomous separation assurance) and a new approach to push the sector capacity above human cognitive limitation. We propose the concept of using distributed vehicle autonomy to ensure separation, instead of a centralized sector air traffic controller. Our proposed framework utilizes Proximal Policy Optimization (PPO) that we modify to incorporate an attention network. This allows the agents to have access to variable aircraft information in the sector in a scalable, efficient approach to achieve high traffic throughput under uncertainty. Agents are trained using a centralized learning, decentralized execution scheme where one neural network is learned and shared by all agents. The proposed framework is validated on three challenging case studies in the BlueSky air traffic control environment. Numerical results show the proposed framework sign

preprint 2020 arXiv

AllenAct: A Framework for Embodied AI Research

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities. This growth has been facilitated by the creation of a large number of simulated environments (such as AI2-THOR, Habitat and CARLA), tasks (like point navigation, instruction following, and embodied question answering), and associated leaderboards. While this diversity has been beneficial and organic, it has also fragmented the community: a huge amount of effort is required to do something as simple as taking a model trained in one environment and testing it in another. This discourages good science. We introduce AllenAct, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research. AllenAct provides first-class support for a growing collection of embodied environments, tasks and algorithms, provides reproductions of state-of-the-art models and includes extensive documentation, tutorials, start-up code, and pre-trained models. We ho

preprint 2020 arXiv

Algorithmic Approaches to Reconfigurable Assembly Systems

Assembly of large scale structural systems in space is understood as critical to serving applications that cannot be deployed from a single launch. Recent literature proposes the use of discrete modular structures for in-space assembly and relatively small scale robotics that are able to modify and traverse the structure. This paper addresses the algorithmic problems in scaling reconfigurable space structures built through robotic construction, where reconfiguration is defined as the problem of transforming an initial structure into a different goal configuration. We analyze different algorithmic paradigms and present corresponding abstractions and graph formulations, examining specialized algorithms that consider discretized space and time steps. We then discuss fundamental design trades for different computational architectures, such as centralized versus distributed, and present two representative algorithms as concrete examples for comparison. We analyze how those algorithms achieve different objective functions and goals, such as minimization of total distance traveled, maximization of fault-tolerance, or minimization of total time spent in assembly. This is meant to offer an

preprint 2020 arXiv

Continuous Deep Hierarchical Reinforcement Learning for Ground-Air Swarm Shepherding

The control and guidance of multi-robots (swarm) is a non-trivial problem due to the complexity inherent in the coupled interaction among the group. Whether the swarm is cooperative or non-cooperative, lessons can be learnt from sheepdogs herding sheep. Biomimicry of shepherding offers computational methods for swarm control with the potential to generalize and scale in different environments. However, learning to shepherd is complex due to the large search space that a machine learner is faced with. We present a deep hierarchical reinforcement learning approach for shepherding, whereby an unmanned aerial vehicle (UAV) learns to act as an aerial sheepdog to control and guide a swarm of unmanned ground vehicles (UGVs). The approach extends our previous work on machine education to decompose the search space into a hierarchically organized curriculum. Each lesson in the curriculum is learnt by a deep reinforcement learning model. The hierarchy is formed by fusing the outputs of the model. The approach is demonstrated first in a high-fidelity robotic-operating-system (ROS)-based simulation environment, then with physical UGVs and a UAV in an in-door testing facility. We investigate th

preprint 2020 arXiv

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly o
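
The monotonicity constraint described in this abstract can be sketched in a few lines. The version below is a simplification: it uses fixed raw weights with one hidden layer, whereas the abstract describes a learned mixing network, and all names and numbers are hypothetical. What it does show is the structural idea of taking absolute values of the weights so that the joint value never decreases when any per-agent value increases.

```python
import math


def elu(x):
    """ELU activation; monotonically increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """Combine per-agent values into a joint value using non-negative weights.

    Raw weights may be negative; abs() enforces non-negativity. Biases are
    unconstrained because they do not depend on the agent values.
    """
    hidden = [
        elu(sum(abs(w) * q for w, q in zip(row, agent_qs)) + bias)
        for row, bias in zip(w1, b1)
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


# Hypothetical raw parameters for two agents and two hidden units.
w1 = [[0.5, -1.2], [-0.3, 0.8]]
b1 = [0.1, -0.4]
w2 = [-0.7, 0.9]
b2 = 0.2

q_base = monotonic_mix([1.0, 2.0], w1, b1, w2, b2)
q_agent0_up = monotonic_mix([1.5, 2.0], w1, b1, w2, b2)
q_agent1_up = monotonic_mix([1.0, 2.5], w1, b1, w2, b2)
```

Because the joint value is monotonic in each agent's value, every agent picking the action with its own highest value also maximizes the joint value, which is what makes decentralised greedy execution consistent with the centralised estimate.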

preprint 2020 arXiv

DLMP-based Coordination Procedure for Decentralized Demand Response under Distribution Network Constraints

Load aggregators are independent private entities whose goal is to optimize energy consumption flexibilities offered by multiple residential consumers. Although aggregators optimize their decisions in a decentralized way, they are indirectly linked together if their respective consumers belong to the same distribution grid. This is an important issue for the distribution system operator (DSO): being in charge of the reliability of the distribution network, it has to ensure that the decentralized decisions taken do not violate the grid constraints and do not increase the global system costs. From the information point of view, the network state and characteristics are confidential to the DSO, which makes a decentralized solution even more relevant. To address this issue, we propose a decentralized coordination mechanism between the DSO and multiple aggregators that computes the optimal demand response profiles while solving the optimal power flow problem. The procedure, based on distribution locational marginal prices (DLMP), preserves the decentralized structure of information and decisions, and leads to a feasible and optimal solution for both the aggregators and the DSO. The procedure is analy

preprint 2020 arXiv

Finding Core Members of Cooperative Games using Agent-Based Modeling

Agent-based modeling (ABM) is a powerful paradigm to gain insight into social phenomena. One area to which ABM has rarely been applied is coalition formation. Traditionally, coalition formation is modeled using cooperative game theory. In this paper, a heuristic algorithm is developed that can be embedded into an ABM to allow the agents to find coalitions. The resultant coalition structures are comparable to those found by cooperative game theory solution approaches, specifically, the core. A heuristic approach is required due to the computational complexity of finding a cooperative game theory solution, which limits its application to only about a score of agents. The ABM paradigm provides a platform in which simple rules and interactions between agents can produce a macro-level effect without the large computational requirements. As such, it can be an effective means for approximating cooperative game solutions for large numbers of agents. Our heuristic algorithm combines agent-based modeling and cooperative game theory to help find agent partitions that are members of a game's core solution. The accuracy of our heuristic algorithm can be determined by comparing its outcomes to the
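
To see why exact core computations scale poorly, consider the standard membership test for a payoff allocation in a transferable-utility game: the allocation must distribute exactly the grand coalition's value, and no coalition may be able to earn more on its own. The sketch below is an exhaustive check, not the paper's heuristic; the function name and the three-player example game are hypothetical.

```python
from itertools import combinations


def in_core(payoffs, value, tol=1e-9):
    """Check whether a payoff allocation lies in the core.

    payoffs maps each player to its payoff; value maps a frozenset of players
    to the worth of that coalition. The loop visits every proper, non-empty
    coalition, so the cost grows exponentially with the number of players.
    """
    players = sorted(payoffs)
    grand = frozenset(players)
    if abs(sum(payoffs.values()) - value(grand)) > tol:
        return False  # not efficient: the grand coalition's value is not fully shared
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            if sum(payoffs[p] for p in coalition) < value(frozenset(coalition)) - tol:
                return False  # this coalition would do better by deviating
    return True


def example_value(coalition):
    """Hypothetical game: singletons earn 0, pairs earn 60, all three earn 100."""
    return {0: 0, 1: 0, 2: 60, 3: 100}[len(coalition)]


balanced = {"a": 34, "b": 33, "c": 33}
lopsided = {"a": 80, "b": 10, "c": 10}
balanced_ok = in_core(balanced, example_value)
lopsided_ok = in_core(lopsided, example_value)
```

The lopsided split fails because players b and c together receive 20 but could secure 60 as a pair. With n players the check visits on the order of 2^n coalitions, which is consistent with the abstract's remark that exact approaches are limited to roughly a score of agents.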

preprint 2020 arXiv

Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication

With the rise of online e-commerce platforms, more and more customers prefer to shop online. To sell more products, online platforms introduce various modules to recommend items with different properties such as huge discounts. A web page often consists of different independent modules. The ranking policies of these modules are decided by different teams and optimized individually without cooperation, which might result in competition between modules. Thus, the global policy of the whole page could be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach with the restriction that different modules cannot communicate. Our contributions are three-fold. Firstly, inspired by a solution concept in game theory named correlated equilibrium, we design a signal network to promote cooperation of all modules by generating signals (vectors) for different modules. Secondly, an entropy-regularized version of the signal network is proposed to coordinate agents' exploration of the optimal global policy. Furthermore, experiments based on real-world e-commerce data demonstrate that our algorithm obtains superior performance over baselines.

preprint 2020 arXiv

Utilitarian Welfare and Representation Guarantees of Approval-Based Multiwinner Rules

To choose a suitable multiwinner voting rule is a hard and ambiguous task. Depending on the context, it varies widely what constitutes the choice of an "optimal" subset of alternatives. In this paper, we provide a quantitative analysis of multiwinner voting rules using methods from the theory of approximation algorithms: we estimate how well multiwinner rules approximate two extreme objectives, a representation criterion defined via the Approval Chamberlin-Courant rule and a utilitarian criterion defined via Multiwinner Approval Voting. With both theoretical and experimental methods, we classify multiwinner rules in terms of their quantitative alignment with these two opposing objectives. Our results provide fundamental information about the nature of multiwinner rules and, in particular, about the necessary tradeoffs when choosing such a rule.
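
The two objectives named in this abstract are easy to state in code. Under Multiwinner Approval Voting a committee scores one point for every (voter, approved member) pair, while under Approval Chamberlin-Courant it scores one point for every voter who approves at least one member. The sketch below brute-forces the best committee for each; the helper names and the small election are hypothetical, and ties are broken by enumeration order.

```python
from itertools import combinations


def av_score(ballots, committee):
    """Utilitarian score: total number of approvals received by committee members."""
    return sum(len(ballot & committee) for ballot in ballots)


def cc_score(ballots, committee):
    """Representation score: number of voters approving at least one member."""
    return sum(1 for ballot in ballots if ballot & committee)


def best_committee(ballots, candidates, size, score):
    """Exhaustively pick the committee of the given size with the highest score."""
    options = [frozenset(c) for c in combinations(sorted(candidates), size)]
    return max(options, key=lambda committee: score(ballots, committee))


# Three voters approve {a, b}; one voter approves only c.
ballots = [frozenset("ab")] * 3 + [frozenset("c")]
candidates = {"a", "b", "c"}

av_winner = best_committee(ballots, candidates, 2, av_score)
cc_winner = best_committee(ballots, candidates, 2, cc_score)
```

In this election the utilitarian optimum {a, b} collects 6 approvals but leaves one voter unrepresented, while the representation optimum {a, c} covers all 4 voters with only 4 approvals. That gap is the kind of tradeoff the abstract quantifies.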

preprint 2020 arXiv

High Accuracy Traffic Light Controller for Increasing the Given Green Time Utilization

Traffic congestion has become one of the major problems in urban cities owing to the increasing number of vehicles in those cities, obsolete technologies used on their roads, inappropriate road design, and many other reasons. This has urged the need for a more accurate traffic light controlling system, one that will help in maintaining high stability at all levels of demand. This paper introduces a dynamic traffic light phase plan protocol for Single-Isolated Intersections. The developed controlling method was compared with four other methods and showed good performance in terms of reducing the average and maximum queue lengths, optimizing the given green time amount as needed, and increasing the intersection's throughput (that is, increasing the given green time utilization). In addition, it maintained good traffic light stability at all levels of demand.

People in this topic

12 visible researcher(s)