Researcher profile

Zhenyu Shou

Zhenyu Shou contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Multi-Agent Reinforcement Learning for Markov Routing Games: A New Modeling Paradigm For Dynamic Traffic Assignment

This paper aims to develop a paradigm that models the learning behavior of intelligent agents (including but not limited to autonomous vehicles, connected and automated vehicles, or human-driven vehicles with intelligent navigation systems where human drivers follow the navigation instructions completely) with a utility-optimizing goal and the system's equilibrating processes in a routing game among atomic selfish agents. Such a paradigm can assist policymakers in devising optimal operational and planning countermeasures under both normal and abnormal circumstances. To this end, we develop a Markov routing game (MRG) in which each agent learns and updates her own en-route path choice policy while interacting with others in transportation networks. To efficiently solve MRG, we formulate it as multi-agent reinforcement learning (MARL) and devise a mean field multi-agent deep Q learning (MF-MA-DQL) approach that captures the competition among agents. The linkage between the classical DUE paradigm and our proposed Markov routing game (MRG) is discussed. We show that the routing behavior of intelligent agents is shown to converge to the classical notion of predictive dynamic user equilibrium (DUE) when traffic environments are simulated using dynamic loading models (DNL). In other words, the MRG depicts DUEs assuming perfect information and deterministic environments propagated by DNL models. Four examples are solved to illustrate the algorithm efficiency and consistency between DUE and the MRG equilibrium, on a simple network without and with spillback, the Ortuzar Willumsen (OW) Network, and a real-world network near Columbia University's campus in Manhattan of New York City.

preprint2020arXiv

Long-Term Prediction of Lane Change Maneuver Through a Multilayer Perceptron

Behavior prediction plays an essential role in both autonomous driving systems and Advanced Driver Assistance Systems (ADAS), since it enhances vehicle&#39;s awareness of the imminent hazards in the surrounding environment. Many existing lane change prediction models take as input lateral or angle information and make short-term (< 5 seconds) maneuver predictions. In this study, we propose a longer-term (5~10 seconds) prediction model without any lateral or angle information. Three prediction models are introduced, including a logistic regression model, a multilayer perceptron (MLP) model, and a recurrent neural network (RNN) model, and their performances are compared by using the real-world NGSIM dataset. To properly label the trajectory data, this study proposes a new time-window labeling scheme by adding a time gap between positive and negative samples. Two approaches are also proposed to address the unstable prediction issue, where the aggressive approach propagates each positive prediction for certain seconds, while the conservative approach adopts a roll-window average to smooth the prediction. Evaluation results show that the developed prediction model is able to capture 75% of real lane change maneuvers with an average advanced prediction time of 8.05 seconds.

preprint2020arXiv

Optimal Passenger-Seeking Policies on E-hailing Platforms Using Markov Decision Process and Imitation Learning

Vacant taxi drivers&#39; passenger seeking process in a road network generates additional vehicle miles traveled, adding congestion and pollution into the road network and the environment. This paper aims to employ a Markov Decision Process (MDP) to model idle e-hailing drivers&#39; optimal sequential decisions in passenger-seeking. Transportation network companies (TNC) or e-hailing (e.g., Didi, Uber) drivers exhibit different behaviors from traditional taxi drivers because e-hailing drivers do not need to actually search for passengers. Instead, they reposition themselves so that the matching platform can match a passenger. Accordingly, we incorporate e-hailing drivers&#39; new features into our MDP model. The reward function used in the MDP model is uncovered by leveraging an inverse reinforcement learning technique. We then use 44,160 Didi drivers&#39; 3-day trajectories to train the model. To validate the effectiveness of the model, a Monte Carlo simulation is conducted to simulate the performance of drivers under the guidance of the optimal policy, which is then compared with the performance of drivers following one baseline heuristic, namely, the local hotspot strategy. The results show that our model is able to achieve a 17.5% improvement over the local hotspot strategy in terms of the rate of return. The proposed MDP model captures the supply-demand ratio considering the fact that the number of drivers in this study is sufficiently large and thus the number of unmatched orders is assumed to be negligible. To better incorporate the competition among multiple drivers into the model, we have also devised and calibrated a dynamic adjustment strategy of the order matching probability.

preprint2020arXiv

Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning

A large portion of passenger requests is reportedly unserviced, partially due to vacant for-hire drivers&#39; cruising behavior during the passenger seeking process. This paper aims to model the multi-driver repositioning task through a mean field multi-agent reinforcement learning (MARL) approach that captures competition among multiple agents. Because the direct application of MARL to the multi-driver system under a given reward mechanism will likely yield a suboptimal equilibrium due to the selfishness of drivers, this study proposes a reward design scheme with which a more desired equilibrium can be reached. To effectively solve the bilevel optimization problem with upper level as the reward design and the lower level as a multi-agent system, a Bayesian optimization (BO) algorithm is adopted to speed up the learning process. We then apply the bilevel optimization model to two case studies, namely, e-hailing driver repositioning under service charge and multiclass taxi driver repositioning under NYC congestion pricing. In the first case study, the model is validated by the agreement between the derived optimal control from BO and that from an analytical solution. With a simple piecewise linear service charge, the objective of the e-hailing platform can be increased by 8.4%. In the second case study, an optimal toll charge of $5.1 is solved using BO, which improves the objective of city planners by 7.9%, compared to that without any toll charge. Under this optimal toll charge, the number of taxis in the NYC central business district is decreased, indicating a better traffic condition, without substantially increasing the crowdedness of the subway system.

preprint2020arXiv

Sensor Fusion of Camera and Cloud Digital Twin Information for Intelligent Vehicles

With the rapid development of intelligent vehicles and Advanced Driving Assistance Systems (ADAS), a mixed level of human driver engagements is involved in the transportation system. Visual guidance for drivers is essential under this situation to prevent potential risks. To advance the development of visual guidance systems, we introduce a novel sensor fusion methodology, integrating camera image and Digital Twin knowledge from the cloud. Target vehicle bounding box is drawn and matched by combining results of object detector running on ego vehicle and position information from the cloud. The best matching result, with a 79.2% accuracy under 0.7 Intersection over Union (IoU) threshold, is obtained with depth image served as an additional feature source. Game engine-based simulation results also reveal that the visual guidance system could improve driving safety significantly cooperate with the cloud Digital Twin system.