Researcher profile

Wei Zhan

Wei Zhan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
24works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

24 published item(s)

preprint2022arXiv

Analyzing and Enhancing Closed-loop Stability in Reactive Simulation

Simulation has played an important role in efficiently evaluating self-driving vehicles in terms of scalability. Existing methods mostly rely on heuristic-based simulation, where traffic participants follow certain human-encoded rules that fail to generate complex human behaviors. Therefore, the reactive simulation concept is proposed to bridge the human behavior gap between simulation and real-world traffic scenarios by leveraging real-world data. However, these reactive models can easily generate unreasonable behaviors after a few steps of simulation, where we regard the model as losing its stability. To the best of our knowledge, no work has explicitly discussed and analyzed the stability of the reactive simulation framework. In this paper, we aim to provide a thorough stability analysis of the reactive simulation and propose a solution to enhance the stability. Specifically, we first propose a new reactive simulation framework, where we discover that the smoothness and consistency of the simulated state sequences are crucial factors to stability. We then incorporate the kinematic vehicle model into the framework to improve the closed-loop stability of the reactive simulation. Furthermore, along with commonly-used metrics, several novel metrics are proposed in this paper to better analyze the simulation performance.

preprint2022arXiv

DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection

While numerous 3D detection works leverage the complementary relationship between RGB images and point clouds, developments in the broader framework of semi-supervised object recognition remain uninfluenced by multi-modal fusion. Current methods develop independent pipelines for 2D and 3D semi-supervised learning despite the availability of paired image and point cloud frames. Observing that the distinct characteristics of each sensor cause them to be biased towards detecting different objects, we propose DetMatch, a flexible framework for joint semi-supervised learning on 2D and 3D modalities. By identifying objects detected in both sensors, our pipeline generates a cleaner, more robust set of pseudo-labels that both demonstrates stronger performance and stymies single-modality error propagation. Further, we leverage the richer semantics of RGB images to rectify incorrect 3D class predictions and improve localization of 3D boxes. Evaluating on the challenging KITTI and Waymo datasets, we improve upon strong semi-supervised learning methods and observe higher quality pseudo-labels. Code will be released at https://github.com/Divadi/DetMatch

preprint2022arXiv

Domain Knowledge Driven Pseudo Labels for Interpretable Goal-Conditioned Interactive Trajectory Prediction

Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure the safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have gained increasing attention due to their advantage in performance and their ability to capture the multimodality in trajectory distribution. In this work, we study the joint trajectory prediction problem with the goal-conditioned framework. In particular, we introduce a conditional-variational-autoencoder-based (CVAE) model to explicitly encode different interaction modes into the latent space. However, we discover that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address these issues, we propose a novel approach to avoid KL vanishing and induce an interpretable interactive latent space with pseudo labels. The proposed pseudo labels allow us to incorporate domain knowledge on interaction in a flexible manner. We motivate the proposed method using an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.

preprint2022arXiv

Efficient Game-Theoretic Planning with Prediction Heuristic for Socially-Compliant Autonomous Driving

Planning under social interactions with other agents is an essential problem for autonomous driving. As the actions of the autonomous vehicle in the interactions affect and are also affected by other agents, autonomous vehicles need to efficiently infer the reaction of the other agents. Most existing approaches formulate the problem as a generalized Nash equilibrium problem solved by optimization-based methods. However, they demand too much computational resource and easily fall into the local minimum due to the non-convexity. Monte Carlo Tree Search (MCTS) successfully tackles such issues in game-theoretic problems. However, as the interaction game tree grows exponentially, the general MCTS still requires a huge amount of iterations to reach the optima. In this paper, we introduce an efficient game-theoretic trajectory planning algorithm based on general MCTS by incorporating a prediction algorithm as a heuristic. On top of it, a social-compliant reward and a Bayesian inference algorithm are designed to generate diverse driving behaviors and identify the other driver's driving preference. Results demonstrate the effectiveness of the proposed framework with datasets containing naturalistic driving behavior in highly interactive scenarios.

preprint2022arXiv

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning

Offline Reinforcement learning (RL) has shown potent in many safe-critical tasks in robotics where exploration is risky and expensive. However, it still struggles to acquire skills in temporally extended tasks. In this paper, we study the problem of offline RL for temporally extended tasks. We propose a hierarchical planning framework, consisting of a low-level goal-conditioned RL policy and a high-level goal planner. The low-level policy is trained via offline RL. We improve the offline training to deal with out-of-distribution goals by a perturbed goal sampling process. The high-level planner selects intermediate sub-goals by taking advantages of model-based planning methods. It plans over future sub-goal sequences based on the learned value function of the low-level policy. We adopt a Conditional Variational Autoencoder to sample meaningful high-dimensional sub-goal candidates and to solve the high-level long-term strategy optimization problem. We evaluate our proposed method in long-horizon driving and robot navigation tasks. Experiments show that our method outperforms baselines with different hierarchical designs and other regular planners without hierarchy in these complex tasks.

preprint2022arXiv

Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models

3D point-clouds and 2D images are different visual representations of the physical world. While human vision can understand both representations, computer vision models designed for 2D image and 3D point-cloud understanding are quite different. Our paper explores the potential of transferring 2D model architectures and weights to understand 3D point-clouds, by empirically investigating the feasibility of the transfer, the benefits of the transfer, and shedding light on why the transfer works. We discover that we can indeed use the same architecture and pretrained weights of a neural net model to understand both images and point-clouds. Specifically, we transfer the image-pretrained model to a point-cloud model by copying or inflating the weights. We find that finetuning the transformed image-pretrained models (FIP) with minimal efforts -- only on input, output, and normalization layers -- can achieve competitive performance on 3D point-cloud classification, beating a wide range of point-cloud models that adopt task-specific architectures and use a variety of tricks. When finetuning the whole model, the performance improves even further. Meanwhile, FIP improves data efficiency, reaching up to 10.0 top-1 accuracy percent on few-shot classification. It also speeds up the training of point-cloud models by up to 11.1x for a target accuracy (e.g., 90 % accuracy). Lastly, we provide an explanation of the image to point-cloud transfer from the aspect of neural collapse. The code is available at: \url{https://github.com/chenfengxu714/image2point}.

preprint2022arXiv

Interventional Behavior Prediction: Avoiding Overly Confident Anticipation in Interactive Prediction

Conditional behavior prediction (CBP) builds up the foundation for a coherent interactive prediction and planning framework that can enable more efficient and less conservative maneuvers in interactive scenarios. In CBP task, we train a prediction model approximating the posterior distribution of target agents' future trajectories conditioned on the future trajectory of an assigned ego agent. However, we argue that CBP may provide overly confident anticipation on how the autonomous agent may influence the target agents' behavior. Consequently, it is risky for the planner to query a CBP model. Instead, we should treat the planned trajectory as an intervention and let the model learn the trajectory distribution under intervention. We refer to it as the interventional behavior prediction (IBP) task. Moreover, to properly evaluate an IBP model with offline datasets, we propose a Shapley-value-based metric to verify if the prediction model satisfies the inherent temporal independence of an interventional distribution. We show that the proposed metric can effectively identify a CBP model violating the temporal independence, which plays an important role when establishing IBP benchmarks.

preprint2022arXiv

Parallel Repetition For All 3-Player Games Over Binary Alphabet

We prove that for every 3-player game with binary questions and answers and value $<1$, the value of the $n$-fold parallel repetition of the game decays polynomially fast to 0. That is, for every such game, there exists a constant $c>0$, such that the value of the $n$-fold parallel repetition of the game is at most $n^{-c}$. Along the way to proving this theorem, we prove two additional parallel repetition theorems for multiplayer games, that may be of independent interest: Playerwise Connected Games (with any number of players and any Alphabet size): We identify a large class of multiplayer games and prove that for every game with value $<1$ in that class, the value of the $n$-fold parallel repetition of the game decays polynomially fast to 0. More precisely, our result applies for playerwise connected games, with any number of players and any alphabet size. The class of playerwise connected games is strictly larger than the class of connected games that was defined in [DHVY17] and for which exponentially fast decay bounds are known [DHVY17]. For playerwise connected games that are not connected, only inverse Ackermann decay bounds were previously known [Ver96]. Exponential Bounds for the Anti-Correlation Game: In the 3-player anti-correlation game, two out of three players are given $1$ as input, and the remaining player is given $0$. The two players who were given $1$ must produce different outputs in $\{0,1\}$. We prove that the value of the $n$-fold parallel repetition of that game decays exponentially fast to 0. Only inverse Ackermann decay bounds were previously known [Ver96]. This game was studied and motivated in several previous works. In particular, Holmgren and Yang gave it as an example for a 3-player game whose non-signaling value (is smaller than 1 and yet) does not decrease at all under parallel repetition [HY19].

preprint2022arXiv

Polynomial Bounds On Parallel Repetition For All 3-Player Games With Binary Inputs

We prove that for every 3-player (3-prover) game $\mathcal G$ with value less than one, whose query distribution has the support $\mathcal S = \{(1,0,0), (0,1,0), (0,0,1)\}$ of hamming weight one vectors, the value of the $n$-fold parallel repetition $\mathcal G^{\otimes n}$ decays polynomially fast to zero; that is, there is a constant $c = c(\mathcal G)>0$ such that the value of the game $\mathcal G^{\otimes n}$ is at most $n^{-c}$. Following the recent work of Girish, Holmgren, Mittal, Raz and Zhan (STOC 2022), our result is the missing piece that implies a similar bound for a much more general class of multiplayer games: For $\textbf{every}$ 3-player game $\mathcal G$ over $\textit{binary questions}$ and $\textit{arbitrary answer lengths}$, with value less than 1, there is a constant $c = c(\mathcal G)>0$ such that the value of the game $\mathcal G^{\otimes n}$ is at most $n^{-c}$. Our proof technique is new and requires many new ideas. For example, we make use of the Level-$k$ inequalities from Boolean Fourier Analysis, which, to the best of our knowledge, have not been explored in this context prior to our work.

preprint2022arXiv

PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map

Deep learning has recently achieved significant progress in trajectory forecasting. However, the scarcity of trajectory data inhibits the data-hungry deep-learning models from learning good representations. While mature representation learning methods exist in computer vision and natural language processing, these pre-training methods require large-scale data. It is hard to replicate these approaches in trajectory forecasting due to the lack of adequate trajectory data (e.g., 34K samples in the nuScenes dataset). To work around the scarcity of trajectory data, we resort to another data modality closely related to trajectories-HD-maps, which is abundantly provided in existing datasets. In this paper, we propose PreTraM, a self-supervised pre-training scheme via connecting trajectories and maps for trajectory forecasting. Specifically, PreTraM consists of two parts: 1) Trajectory-Map Contrastive Learning, where we project trajectories and maps to a shared embedding space with cross-modal contrastive learning, and 2) Map Contrastive Learning, where we enhance map representation with contrastive learning on large quantities of HD-maps. On top of popular baselines such as AgentFormer and Trajectron++, PreTraM boosts their performance by 5.5% and 6.9% relatively in FDE-10 on the challenging nuScenes dataset. We show that PreTraM improves data efficiency and scales well with model size.

preprint2022arXiv

Socially-Compatible Behavior Design of Autonomous Vehicles with Verification on Real Human Data

As more and more autonomous vehicles (AVs) are being deployed on public roads, designing socially compatible behaviors for them is becoming increasingly important. In order to generate safe and efficient actions, AVs need to not only predict the future behaviors of other traffic participants, but also be aware of the uncertainties associated with such behavior prediction. In this paper, we propose an uncertain-aware integrated prediction and planning (UAPP) framework. It allows the AVs to infer the characteristics of other road users online and generate behaviors optimizing not only their own rewards, but also their courtesy to others, and their confidence regarding the prediction uncertainties. We first propose the definitions for courtesy and confidence. Based on that, their influences on the behaviors of AVs in interactive driving scenarios are explored. Moreover, we evaluate the proposed algorithm on naturalistic human driving data by comparing the generated behavior against ground truth. Results show that the online inference can significantly improve the human-likeness of the generated behaviors. Furthermore, we find that human drivers show great courtesy to others, even for those without right-of-way. We also find that such driving preferences vary significantly in different cultures.

preprint2022arXiv

SST-Calib: Simultaneous Spatial-Temporal Parameter Calibration between LIDAR and Camera

With information from multiple input modalities, sensor fusion-based algorithms usually out-perform their single-modality counterparts in robotics. Camera and LIDAR, with complementary semantic and depth information, are the typical choices for detection tasks in complicated driving environments. For most camera-LIDAR fusion algorithms, however, the calibration of the sensor suite will greatly impact the performance. More specifically, the detection algorithm usually requires an accurate geometric relationship among multiple sensors as the input, and it is often assumed that the contents from these sensors are captured at the same time. Preparing such sensor suites involves carefully designed calibration rigs and accurate synchronization mechanisms, and the preparation process is usually done offline. In this work, a segmentation-based framework is proposed to jointly estimate the geometrical and temporal parameters in the calibration of a camera-LIDAR suite. A semantic segmentation mask is first applied to both sensor modalities, and the calibration parameters are optimized through pixel-wise bidirectional loss. We specifically incorporated the velocity information from optical flow for temporal parameters. Since supervision is only performed at the segmentation level, no calibration label is needed within the framework. The proposed algorithm is tested on the KITTI dataset, and the result shows an accurate real-time calibration of both geometric and temporal parameters.

preprint2022arXiv

Towards General and Efficient Active Learning

Active learning selects the most informative samples to exploit limited annotation budgets. Existing work follows a cumbersome pipeline that repeats the time-consuming model training and batch data selection multiple times. In this paper, we challenge this status quo by proposing a novel general and efficient active learning (GEAL) method following our designed new pipeline. Utilizing a publicly available pretrained model, our method selects data from different datasets with a single-pass inference of the same model without extra training or supervision. To capture subtle local information, we propose knowledge clusters extracted from intermediate features. Free from the troublesome batch selection strategy, all data samples are selected in one-shot through a distance-based sampling in the fine-grained knowledge cluster level. This whole process is faster than prior arts by hundreds of times. Extensive experiments verify the effectiveness of our method on object detection, image classification, and semantic segmentation. Our code is publicly available in https://github.com/yichen928/GEAL_active_learning.

preprint2022arXiv

Transferable and Adaptable Driving Behavior Prediction

While autonomous vehicles still struggle to solve challenging situations during on-road driving, humans have long mastered the essence of driving with efficient, transferable, and adaptable driving capability. By mimicking humans&#39; cognition model and semantic understanding during driving, we propose HATN, a hierarchical framework to generate high-quality, transferable, and adaptable predictions for driving behaviors in multi-agent dense-traffic environments. Our hierarchical method consists of a high-level intention identification policy and a low-level trajectory generation policy. We introduce a novel semantic sub-task definition and generic state representation for each sub-task. With these techniques, the hierarchical framework is transferable across different driving scenarios. Besides, our model is able to capture variations of driving behaviors among individuals and scenarios by an online adaptation module. We demonstrate our algorithms in the task of trajectory prediction for real traffic data at intersections and roundabouts from the INTERACTION dataset. Through extensive numerical studies, it is evident that our method significantly outperformed other methods in terms of prediction accuracy, transferability, and adaptability. Pushing the state-of-the-art performance by a considerable margin, we also provide a cognitive view of understanding the driving behavior behind such improvement. We highlight that in the future, more research attention and effort are deserved for transferability and adaptability. It is not only due to the promising performance elevation of prediction and planning algorithms, but more fundamentally, they are crucial for the scalable and general deployment of autonomous vehicles.

preprint2022arXiv

What Matters for 3D Scene Flow Network

3D scene flow estimation from point clouds is a low-level 3D motion perception task in computer vision. Flow embedding is a commonly used technique in scene flow estimation, and it encodes the point motion between two consecutive frames. Thus, it is critical for the flow embeddings to capture the correct overall direction of the motion. However, previous works only search locally to determine a soft correspondence, ignoring the distant points that turn out to be the actual matching ones. In addition, the estimated correspondence is usually from the forward direction of the adjacent point clouds, and may not be consistent with the estimated correspondence acquired from the backward direction. To tackle these problems, we propose a novel all-to-all flow embedding layer with backward reliability validation during the initial scene flow estimation. Besides, we investigate and compare several design choices in key components of the 3D scene flow network, including the point similarity calculation, input elements of predictor, and predictor & refinement level design. After carefully choosing the most effective designs, we are able to present a model that achieves the state-of-the-art performance on FlyingThings3D and KITTI Scene Flow datasets. Our proposed model surpasses all existing methods by at least 38.2% on FlyingThings3D dataset and 24.7% on KITTI Scene Flow dataset for EPE3D metric. We release our codes at https://github.com/IRMVLab/3DFlow.

preprint2021arXiv

A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding

Detecting dynamic objects and predicting static road information such as drivable areas and ground heights are crucial for safe autonomous driving. Previous works studied each perception task separately, and lacked a collective quantitative analysis. In this work, we show that it is possible to perform all perception tasks via a simple and efficient multi-task network. Our proposed network, LidarMTL, takes raw LiDAR point cloud as inputs, and predicts six perception outputs for 3D object detection and road understanding. The network is based on an encoder-decoder architecture with 3D sparse convolution and deconvolution operations. Extensive experiments verify the proposed method with competitive accuracies compared to state-of-the-art object detectors and other task-specific networks. LidarMTL is also leveraged for online localization. Code and pre-trained model have been made available at https://github.com/frankfengdi/LidarMTL.

preprint2021arXiv

Diverse Critical Interaction Generation for Planning and Planner Evaluation

Generating diverse and comprehensive interacting agents to evaluate the decision-making modules is essential for the safe and robust planning of autonomous vehicles~(AV). Due to efficiency and safety concerns, most researchers choose to train interactive adversary~(competitive or weakly competitive) agents in simulators and generate test cases to interact with evaluated AVs. However, most existing methods fail to provide both natural and critical interaction behaviors in various traffic scenarios. To tackle this problem, we propose a styled generative model RouteGAN that generates diverse interactions by controlling the vehicles separately with desired styles. By altering its style coefficients, the model can generate trajectories with different safety levels serve as an online planner. Experiments show that our model can generate diverse interactions in various scenarios. We evaluate different planners with our model by testing their collision rate in interaction with RouteGAN planners of multiple critical levels.

preprint2021arXiv

Interaction-Aware Behavior Planning for Autonomous Vehicles Validated with Real Traffic Data

Autonomous vehicles (AVs) need to interact with other traffic participants who can be either cooperative or aggressive, attentive or inattentive. Such different characteristics can lead to quite different interactive behaviors. Hence, to achieve safe and efficient autonomous driving, AVs need to be aware of such uncertainties when they plan their own behaviors. In this paper, we formulate such a behavior planning problem as a partially observable Markov Decision Process (POMDP) where the cooperativeness of other traffic participants is treated as an unobservable state. Under different cooperativeness levels, we learn the human behavior models from real traffic data via the principle of maximum likelihood. Based on that, the POMDP problem is solved by Monte-Carlo Tree Search. We verify the proposed algorithm in both simulations and real traffic data on a lane change scenario, and the results show that the proposed algorithm can successfully finish the lane changes without collisions.

preprint2020arXiv

Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving

In the past decades, we have witnessed significant progress in the domain of autonomous driving. Advanced techniques based on optimization and reinforcement learning (RL) become increasingly powerful at solving the forward problem: given designed reward/cost functions, how should we optimize them and obtain driving policies that interact with the environment safely and efficiently. Such progress has raised another equally important question: \emph{what should we optimize}? Instead of manually specifying the reward functions, it is desired that we can extract what human drivers try to optimize from real traffic data and assign that to autonomous vehicles to enable more naturalistic and transparent interaction between humans and intelligent agents. To address this issue, we present an efficient sampling-based maximum-entropy inverse reinforcement learning (IRL) algorithm in this paper. Different from existing IRL algorithms, by introducing an efficient continuous-domain trajectory sampler, the proposed algorithm can directly learn the reward functions in the continuous domain while considering the uncertainties in demonstrated trajectories from human drivers. We evaluate the proposed algorithm on real driving data, including both non-interactive and interactive scenarios. The experimental results show that the proposed algorithm achieves more accurate prediction performance with faster convergence speed and better generalization compared to other baseline IRL algorithms.

preprint2020arXiv

Inferring Spatial Uncertainty in Object Detection

The availability of real-world datasets is the prerequisite for developing object detection methods for autonomous driving. While ambiguity exists in object labels due to error-prone annotation process or sensor observation noises, current object detection datasets only provide deterministic annotations without considering their uncertainty. This precludes an in-depth evaluation among different object detection methods, especially for those that explicitly model predictive probability. In this work, we propose a generative model to estimate bounding box label uncertainties from LiDAR point clouds, and define a new representation of the probabilistic bounding box through spatial distribution. Comprehensive experiments show that the proposed model represents uncertainties commonly seen in driving scenarios. Based on the spatial distribution, we further propose an extension of IoU, called the Jaccard IoU (JIoU), as a new evaluation metric that incorporates label uncertainty. Experiments on the KITTI and the Waymo Open Datasets show that JIoU is superior to IoU when evaluating probabilistic object detectors.

preprint2020arXiv

Lower Bounds for XOR of Forrelations

The Forrelation problem, introduced by Aaronson [A10] and Aaronson and Ambainis [AA15], is a well studied problem in the context of separating quantum and classical models. Variants of this problem were used to give exponential separations between quantum and classical query complexity [A10, AA15]; quantum query complexity and bounded-depth circuits [RT19]; and quantum and classical communication complexity [GRT19]. In all these separations, the lower bound for the classical model only holds when the advantage of the protocol (over a random guess) is more than $\approx 1/\sqrt{N}$, that is, the success probability is larger than $\approx 1/2 + 1/\sqrt{N}$. To achieve separations when the classical protocol has smaller advantage, we study in this work the XOR of $k$ independent copies of the Forrelation function (where $k\ll N$). We prove a very general result that shows that any family of Boolean functions that is closed under restrictions, whose Fourier mass at level $2k$ is bounded by $α^k$, cannot compute the XOR of $k$ independent copies of the Forrelation function with advantage better than $O\left(\frac{α^k}{N^{k/2}}\right)$. This is a strengthening of a result of [CHLT19], that gave a similar result for $k=1$, using the technique of [RT19]. As an application of our result, we give the first example of a partial Boolean function that can be computed by a simultaneous-message quantum protocol of cost $\mbox{polylog}(N)$ (when players share $\mbox{polylog}(N)$ EPR pairs), however, any classical interactive randomized protocol of cost at most $\tilde{o}(N^{1/4})$, has quasipolynomially small advantage over a random guess. We also give the first example of a partial Boolean function that has a quantum query algorithm of cost $\mbox{polylog}(N)$, and such that, any constant-depth circuit of quasipolynomial size has quasipolynomially small advantage over a random guess.

preprint2020arXiv

Towards Better Performance and More Explainable Uncertainty for 3D Object Detection of Autonomous Vehicles

In this paper, we propose a novel form of the loss function to increase the performance of LiDAR-based 3d object detection and obtain more explainable and convincing uncertainty for the prediction. The loss function was designed using corner transformation and uncertainty modeling. With the new loss function, the performance of our method on the val split of KITTI dataset shows up to a 15% increase in terms of Average Precision (AP) comparing with the baseline using simple L1 Loss. In the study of the characteristics of predicted uncertainties, we find that generally more accurate prediction of the bounding box is usually accompanied by lower uncertainty. The distribution of corner uncertainties agrees on the distribution of the point cloud in the bounding box, which means the corner with denser observed points has lower uncertainty. Moreover, our method also learns the constraint from the cuboid geometry of the bounding box in uncertainty prediction. Finally, we propose an efficient Bayesian updating method to recover the uncertainty for the original parameters of the bounding boxes which can help to provide probabilistic results for the planning module.

preprint2020arXiv

UrbanLoco: A Full Sensor Suite Dataset for Mapping and Localization in Urban Scenes

Mapping and localization is a critical module of autonomous driving, and significant achievements have been reached in this field. Beyond Global Navigation Satellite System (GNSS), research in point cloud registration, visual feature matching, and inertia navigation has greatly enhanced the accuracy and robustness of mapping and localization in different scenarios. However, highly urbanized scenes are still challenging: LIDAR- and camera-based methods perform poorly with numerous dynamic objects; the GNSS-based solutions experience signal loss and multipath problems; the inertia measurement units (IMU) suffer from drifting. Unfortunately, current public datasets either do not adequately address this urban challenge or do not provide enough sensor information related to mapping and localization. Here we present UrbanLoco: a mapping/localization dataset collected in highly-urbanized environments with a full sensor-suite. The dataset includes 13 trajectories collected in San Francisco and Hong Kong, covering a total length of over 40 kilometers. Our dataset includes a wide variety of urban terrains: urban canyons, bridges, tunnels, sharp turns, etc. More importantly, our dataset includes information from LIDAR, cameras, IMU, and GNSS receivers. Now the dataset is publicly available through the link in the footnote. Dataset Link: https://advdataset2019.wixsite.com/urbanloco.

preprint2019arXiv

Generic Tracking and Probabilistic Prediction Framework and Its Application in Autonomous Driving

Accurately tracking and predicting behaviors of surrounding objects are key prerequisites for intelligent systems such as autonomous vehicles to achieve safe and high-quality decision making and motion planning. However, there still remain challenges for multi-target tracking due to object number fluctuation and occlusion. To overcome these challenges, we propose a constrained mixture sequential Monte Carlo (CMSMC) method in which a mixture representation is incorporated in the estimated posterior distribution to maintain multi-modality. Multiple targets can be tracked simultaneously within a unified framework without explicit data association between observations and tracking targets. The framework can incorporate an arbitrary prediction model as the implicit proposal distribution of the CMSMC method. An example in this paper is a learning-based model for hierarchical time-series prediction, which consists of a behavior recognition module and a state evolution module. Both modules in the proposed model are generic and flexible so as to be applied to a class of time-series prediction problems where behaviors can be separated into different levels. Finally, the proposed framework is applied to a numerical case study as well as a task of on-road vehicle tracking, behavior recognition, and prediction in highway scenarios. Instead of only focusing on forecasting trajectory of a single entity, we jointly predict continuous motions for interactive entities simultaneously. The proposed approaches are evaluated from multiple aspects, which demonstrate great potential for intelligent vehicular systems and traffic surveillance systems.