Source author record

Eugene Vinitsky

Eugene Vinitsky appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Multiagent Systems Robotics Computational Complexity Data Structures and Algorithms Machine Learning Systems and Control

Catalog footprint

What is connected

4works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Unified Automatic Control of Vehicular Systems with Reinforcement Learning

Emerging vehicular systems with increasing proportions of automated components present opportunities for optimal control to mitigate congestion and increase efficiency. There has been a recent interest in applying deep reinforcement learning (DRL) to these nonlinear dynamical systems for the automatic design of effective control strategies. Despite conceptual advantages of DRL being model-free, studies typically nonetheless rely on training setups that are painstakingly specialized to specific vehicular systems. This is a key challenge to efficient analysis of diverse vehicular and mobility systems. To this end, this article contributes a streamlined methodology for vehicular microsimulation and discovers high performance control strategies with minimal manual design. A variable-agent, multi-task approach is presented for optimization of vehicular Partially Observed Markov Decision Processes. The methodology is experimentally validated on mixed autonomy traffic systems, where fractions of vehicles are automated; empirical improvement, typically 15-60% over a human driving baseline, is observed in all configurations of six diverse open or closed traffic systems. The study reveals numerous emergent behaviors resembling wave mitigation, traffic signaling, and ramp metering. Finally, the emergent behaviors are analyzed to produce interpretable control strategies, which are validated against the learned control strategies.

preprint2021arXiv

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

A wide range of reinforcement learning (RL) problems - including robustness, transfer learning, unsupervised RL, and emergent complexity - require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent's learning progress, and minimax adversarial training leads to worst-case environments that are often unsolvable. To generate structured, solvable environments for our protagonist agent, we introduce a second, antagonist agent that is allied with the environment-generating adversary. The adversary is motivated to generate environments which maximize regret, defined as the difference between the protagonist and antagonist agent's return. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.

preprint2021arXiv

Flow: A Modular Learning Framework for Mixed Autonomy Traffic

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, the progression of these impacts, as AVs are adopted, is not well understood. Numerous technical challenges arise from the goal of analyzing the partial adoption of autonomy: partial control and observation, multi-vehicle interactions, and the sheer variety of scenarios represented by real-world networks. To shed light into near-term AV impacts, this article studies the suitability of deep reinforcement learning (RL) for overcoming these challenges in a low AV-adoption regime. A modular learning framework is presented, which leverages deep RL to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (stop-and-go traffic jams, lane changing, intersections). Learned control laws are found to improve upon human driving performance, in terms of system-level velocity, by up to 57% with only 4-7% adoption of AVs. Furthermore, in single-lane traffic, a small neural network control law with only local observation is found to eliminate stop-and-go traffic - surpassing all known model-based controllers to achieve near-optimal performance - and generalize to out-of-distribution traffic densities.

preprint2020arXiv

On the Approximability of Time Disjoint Walks

We introduce the combinatorial optimization problem Time Disjoint Walks (TDW), which has applications in collision-free routing of discrete objects (e.g., autonomous vehicles) over a network. This problem takes as input a digraph $G$ with positive integer arc lengths, and $k$ pairs of vertices that each represent a trip demand from a source to a destination. The goal is to find a walk and delay for each demand so that no two trips occupy the same vertex at the same time, and so that a min-max or min-sum objective over the trip durations is realized. We focus here on the min-sum variant of Time Disjoint Walks, although most of our results carry over to the min-max case. We restrict our study to various subclasses of DAGs, and observe that there is a sharp complexity boundary between Time Disjoint Walks on oriented stars and on oriented stars with the central vertex replaced by a path. In particular, we present a poly-time algorithm for min-sum and min-max TDW on the former, but show that min-sum TDW on the latter is NP-hard. Our main hardness result is that for DAGs with max degree $Δ\leq3$, min-sum Time Disjoint Walks is APX-hard. We present a natural approximation algorithm for the same class, and provide a tight analysis. In particular, we prove that it achieves an approximation ratio of $Θ(k/\log k)$ on bounded-degree DAGs, and $Θ(k)$ on DAGs and bounded-degree digraphs.