Researcher profile

Klaus Dietmayer

Klaus Dietmayer contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
28 works
0 followers
11 topics
4 close collaborators

Published work

28 published items

preprint · 2023 · arXiv

SCENE: Reasoning about Traffic Scenes using Heterogeneous Graph Neural Networks

Understanding traffic scenes requires considering heterogeneous information about dynamic agents and the static infrastructure. In this work we propose SCENE, a methodology to encode diverse traffic scenes in heterogeneous graphs and to reason about these graphs using a heterogeneous Graph Neural Network encoder and task-specific decoders. The heterogeneous graphs, whose structures are defined by an ontology, consist of different nodes with type-specific node features and different relations with type-specific edge features. In order to exploit all the information given by these graphs, we propose to use cascaded layers of graph convolution. The result is an encoding of the scene. Task-specific decoders can be applied to predict desired attributes of the scene. Extensive evaluation on two diverse binary node classification tasks shows the main strength of this methodology: despite being generic, it even manages to outperform task-specific baselines. The further application of our methodology to the task of node classification in various knowledge graphs shows its transferability to other domains.

preprint · 2022 · arXiv

A Multi-Task Recurrent Neural Network for End-to-End Dynamic Occupancy Grid Mapping

A common approach for modeling the environment of an autonomous vehicle is the dynamic occupancy grid map, in which the surroundings are divided into cells, each containing the occupancy and velocity state of its location. Despite the advantage of modeling arbitrarily shaped objects, the algorithms used rely on hand-designed inverse sensor models, and semantic information is missing. Therefore, we introduce a multi-task recurrent neural network to predict grid maps providing occupancies, velocity estimates, semantic information and the driveable area. During training, our network architecture, which is a combination of convolutional and recurrent layers, processes sequences of raw lidar data, which is represented as bird's eye view images with several height channels. The multi-task network is trained in an end-to-end fashion to predict occupancy grid maps without the usual preprocessing steps consisting of removing ground points and applying an inverse sensor model. In our evaluations, we show that our learned inverse sensor model is able to overcome some limitations of a geometric inverse sensor model in terms of representing object shapes and modeling freespace. Moreover, we report a better runtime performance and more accurate semantic predictions for our end-to-end approach, compared to our network relying on measurement grid maps as input data.
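
For readers unfamiliar with the hand-designed inverse sensor models that this abstract contrasts with, the following is a minimal sketch of the textbook log-odds occupancy update along a single lidar ray. It is not the paper's method or its learned model. The function names, the one-dimensional ray layout and the probabilities 0.3 and 0.7 are assumptions chosen for illustration.

```python
import math


def logit(p):
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def inverse_sensor_model(num_cells, hit_index, p_free=0.3, p_occ=0.7):
    """Hand-designed model for one ray (illustrative values, not from the paper).

    Cells in front of the hit are likely free, the hit cell is likely
    occupied, and cells behind the hit stay unknown (0.5).
    """
    probs = []
    for i in range(num_cells):
        if i < hit_index:
            probs.append(p_free)
        elif i == hit_index:
            probs.append(p_occ)
        else:
            probs.append(0.5)
    return probs


def update_log_odds(log_odds, measurement_probs):
    """Fuse one measurement into the grid, assuming a uniform prior of 0.5."""
    return [l + logit(p) for l, p in zip(log_odds, measurement_probs)]


def to_probability(log_odds_value):
    """Convert log-odds back into an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))


if __name__ == "__main__":
    grid = [0.0] * 4                       # log-odds 0 means probability 0.5
    ray = inverse_sensor_model(4, hit_index=2)
    for _ in range(2):                     # the same ray observed twice
        grid = update_log_odds(grid, ray)
    print([round(to_probability(l), 3) for l in grid])
```

Repeated consistent measurements push free cells toward 0 and the hit cell toward 1, while cells behind the hit remain at 0.5. An end-to-end network as described in the abstract would replace the fixed rules in `inverse_sensor_model` with learned ones.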

preprint · 2022 · arXiv

A Spatio-Temporal Multilayer Perceptron for Gesture Recognition

Gesture recognition is essential for the interaction of autonomous vehicles with humans. While the current approaches focus on combining several modalities like image features, keypoints and bone vectors, we present a neural network architecture that delivers state-of-the-art results with body skeleton input data only. We propose the spatio-temporal multilayer perceptron for gesture recognition in the context of autonomous vehicles. Given 3D body poses over time, we define temporal and spatial mixing operations to extract features in both domains. Additionally, the importance of each time step is re-weighted with Squeeze-and-Excitation layers. An extensive evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach. Furthermore, we deploy our model to our autonomous vehicle to show its real-time capability and stable execution.

preprint · 2022 · arXiv

CRAT-Pred: Vehicle Trajectory Prediction with Crystal Graph Convolutional Neural Networks and Multi-Head Self-Attention

Predicting the motion of surrounding vehicles is essential for autonomous vehicles, as it governs their own motion plan. Current state-of-the-art vehicle prediction models rely heavily on map information. In reality, however, this information is not always available. We therefore propose CRAT-Pred, a multi-modal and non-rasterization-based trajectory prediction model, specifically designed to effectively model social interactions between vehicles, without relying on map information. CRAT-Pred applies a graph convolution method originating from the field of material science to vehicle prediction, allowing edge features to be leveraged efficiently, and combines it with multi-head self-attention. Compared to other map-free approaches, the model achieves state-of-the-art performance with a significantly lower number of model parameters. In addition, we quantitatively show that the self-attention mechanism is able to learn social interactions between vehicles, with the weights representing a measurable interaction score. The source code is publicly available.

preprint · 2022 · arXiv

Dynamic Occupancy Grid Mapping with Recurrent Neural Networks

Modeling and understanding the environment is an essential task for autonomous driving. In addition to the detection of objects, in complex traffic scenarios the motion of other road participants is of special interest. Therefore, we propose to use a recurrent neural network to predict a dynamic occupancy grid map, which divides the vehicle's surroundings into cells, each containing the occupancy probability and a velocity estimate. During training, our network is fed with sequences of measurement grid maps, which encode the lidar measurements of a single time step. Due to the combination of convolutional and recurrent layers, our approach is capable of using spatial and temporal information for the robust detection of the static and dynamic environment. In order to apply our approach with measurements from a moving ego-vehicle, we propose a method for ego-motion compensation that is applicable in neural network architectures with recurrent layers working on different resolutions. In our evaluations, we compare our approach with a state-of-the-art particle-based algorithm on a large publicly available dataset to demonstrate the improved accuracy of velocity estimates and the more robust separation of the environment into static and dynamic areas. Additionally, we show that our proposed method for ego-motion compensation leads to comparable results in scenarios with a stationary and with a moving ego-vehicle.

preprint · 2022 · arXiv

Globally Optimal Multi-Scale Monocular Hand-Eye Calibration Using Dual Quaternions

In this work, we present an approach for monocular hand-eye calibration from per-sensor ego-motion based on dual quaternions. Due to non-metrically scaled translations of monocular odometry, a scaling factor has to be estimated in addition to the rotation and translation calibration. For this, we derive a quadratically constrained quadratic program that allows a combined estimation of all extrinsic calibration parameters. Using dual quaternions leads to low run-times due to their compact representation. Our problem formulation further allows multiple scalings to be estimated simultaneously for different sequences of the same sensor setup. Based on our problem formulation, we derive both a fast local and a globally optimal solving approach. Finally, our algorithms are evaluated and compared to state-of-the-art approaches on simulated and real-world data, e.g., the EuRoC MAV dataset.
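
For orientation, the constraint behind this kind of calibration can be written in the classical hand-eye form AX = XB. The notation below is an assumption made for illustration and is not taken from the paper: (R_a, t_a) is the ego-motion of a metrically scaled sensor, (R_b, t_b) the ego-motion of the monocular camera, (R_x, t_x) the sought extrinsic calibration, and \alpha the unknown scale of the monocular translation.

```latex
\begin{align}
  R_a R_x &= R_x R_b, \\
  R_a t_x + t_a &= \alpha \, R_x t_b + t_x .
\end{align}
```

The dual-quaternion formulation named in the abstract encodes rotation and translation jointly. The matrix form is shown here only because it is the more widely known statement of the same constraint.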

preprint · 2022 · arXiv

MEAT: Maneuver Extraction from Agent Trajectories

Advances in learning-based trajectory prediction are enabled by large-scale datasets. However, in-depth analysis of such datasets is limited. Moreover, the evaluation of prediction models is limited to metrics averaged over all samples in the dataset. We propose an automated methodology that extracts maneuvers (e.g., left turn, lane change) from agent trajectories in such datasets. The methodology considers information about the agent dynamics and information about the lane segments the agent traveled along. Although it is possible to use the resulting maneuvers for training classification networks, we use them here, by way of example, for extensive trajectory dataset analysis and maneuver-specific evaluation of multiple state-of-the-art trajectory prediction models. Additionally, an analysis of the datasets and an evaluation of the prediction models based on the agent dynamics is provided.

preprint · 2022 · arXiv

Motion Estimation in Occupancy Grid Maps in Stationary Settings Using Recurrent Neural Networks

In this work, we tackle the problem of modeling the vehicle environment as a dynamic occupancy grid map in complex urban scenarios using recurrent neural networks. Dynamic occupancy grid maps represent the scene in a bird's eye view, where each grid cell contains the occupancy probability and the two-dimensional velocity. As input data, our approach relies on measurement grid maps, which contain occupancy probabilities generated from lidar measurements. Given this configuration, we propose a recurrent neural network architecture to predict a dynamic occupancy grid map, i.e. filtered occupancy and velocity of each cell, by using a sequence of measurement grid maps. Our network architecture contains convolutional long short-term memories in order to sequentially process the input, makes use of spatial context, and captures motion. In the evaluation, we quantify improvements in estimating the velocity of braking and turning vehicles compared to the state of the art. Additionally, we demonstrate that our approach provides more consistent velocity estimates for dynamic objects as well as less erroneous velocity estimates in static areas.

preprint · 2022 · arXiv

MotionMixer: MLP-based 3D Human Body Pose Forecasting

In this work, we present MotionMixer, an efficient 3D human body pose forecasting model based solely on multi-layer perceptrons (MLPs). MotionMixer learns the spatial-temporal 3D body pose dependencies by sequentially mixing both modalities. Given a stacked sequence of 3D body poses, a spatial-MLP extracts fine grained spatial dependencies of the body joints. The interaction of the body joints over time is then modelled by a temporal MLP. The spatial-temporal mixed features are finally aggregated and decoded to obtain the future motion. To calibrate the influence of each time step in the pose sequence, we make use of squeeze-and-excitation (SE) blocks. We evaluate our approach on Human3.6M, AMASS, and 3DPW datasets using the standard evaluation protocols. For all evaluations, we demonstrate state-of-the-art performance, while having a model with a smaller number of parameters. Our code is available at: https://github.com/MotionMLP/MotionMixer
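
The idea of re-weighting time steps with a squeeze-and-excitation block can be sketched in a few lines. This is a generic illustration under stated assumptions and not the MotionMixer implementation: the function name `se_reweight_time_steps`, the single bottleneck layer and the weight matrices `w1` and `w2` are hypothetical, and a real model would learn the weights rather than receive them as arguments.

```python
import math


def se_reweight_time_steps(features, w1, w2):
    """Scale each time step of a T x C feature map by a gate in (0, 1).

    features: list of T rows, each with C channel values.
    w1: H x T weight matrix of the bottleneck layer (hypothetical).
    w2: T x H weight matrix of the expansion layer (hypothetical).
    """
    # Squeeze: summarize every time step by the mean over its channels.
    squeezed = [sum(row) / len(row) for row in features]

    # Excitation, part 1: bottleneck layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(w_row, squeezed)))
        for w_row in w1
    ]

    # Excitation, part 2: expansion layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w_row, hidden))))
        for w_row in w2
    ]

    # Scale: multiply all channels of a time step by that step's gate.
    return [[g * v for v in row] for g, row in zip(gates, features)]


if __name__ == "__main__":
    poses = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]   # T = 3 steps, C = 2 channels
    w1 = [[0.5, -0.5, 1.0]]                        # H = 1 hidden unit
    w2 = [[1.0], [0.0], [-1.0]]
    for row in se_reweight_time_steps(poses, w1, w2):
        print(row)
```

The gates depend on the whole sequence through the squeeze step, so a time step can be emphasized or suppressed based on its context rather than on its own values alone.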

preprint · 2022 · arXiv

Robust 3D Object Detection in Cold Weather Conditions

Adverse weather conditions can negatively affect LiDAR-based object detectors. In this work, we focus on the phenomenon of vehicle gas exhaust condensation in cold weather conditions. This everyday effect can distort the estimation of object sizes and orientations and introduce ghost object detections, compromising the reliability of state-of-the-art object detectors. We propose to solve this problem by using data augmentation and a novel training loss term. To effectively train deep neural networks, a large set of labeled data is needed. In case of adverse weather conditions, this process can be extremely laborious and expensive. We address this issue in two steps: First, we present a gas exhaust data generation method based on 3D surface reconstruction and sampling which allows us to generate large sets of gas exhaust clouds from a small pool of labeled data. Second, we introduce a point cloud augmentation process that can be used to add gas exhaust to datasets recorded in good weather conditions. Finally, we formulate a new training loss term that leverages the augmented point cloud to increase object detection robustness by penalizing predictions that include noise. In contrast to other works, our method can be used with both grid-based and point-based detectors. Moreover, since our approach does not require any network architecture changes, inference times remain unchanged. Experimental results on real data show that our proposed method greatly increases robustness to gas exhaust and noisy data.

preprint · 2022 · arXiv

Self-Assessment for Single-Object Tracking in Clutter Using Subjective Logic

Reliable tracking algorithms are essential for automated driving. However, the existing consistency measures are not sufficient to meet the increasing safety demands in the automotive sector. Therefore, this work presents a novel method for self-assessment of single-object tracking in clutter based on Kalman filtering and subjective logic. A key feature of the approach is that it additionally provides a measure of the collected statistical evidence in its online reliability scores. In this way, various aspects of reliability, such as the correctness of the assumed measurement noise, detection probability, and clutter rate, can be monitored in addition to the overall assessment based on the available evidence. Here, we present a mathematical derivation of the reference distribution used in our self-assessment module for our studied problem. Moreover, we introduce a formula that describes how a threshold should be chosen for the degree of conflict, the subjective logic comparison measure used for the reliability decision making. Our approach is evaluated in a challenging simulation scenario designed to model adverse weather conditions. The simulations show that our method can significantly improve the reliability checking of single-object tracking in clutter in several aspects.

preprint · 2022 · arXiv

Situation-Aware Environment Perception for Decentralized Automation Architectures

Advances in the field of environment perception for automated agents have resulted in an ongoing increase in generated sensor data. The available computational resources to process these data are bound to become insufficient for real-time applications. Reducing the amount of data to be processed by identifying the most relevant data based on the agents' situation, often referred to as situation-awareness, has gained increasing research interest, and the importance of complementary approaches is expected to increase further in the near future. In this work, we extend the applicability range of our recently introduced concept for situation-aware environment perception to the decentralized automation architecture of the UNICARagil project. Considering the specific driving capabilities of the vehicle and using real-world data on target hardware in a post-processing manner, we provide an estimate for the daily reduction in power consumption that accumulates to 36.2%. While achieving these promising results, we additionally show the need to consider scalability in data processing in the design of software modules as well as in the design of functional systems if the benefits of situation-awareness shall be leveraged optimally.

preprint · 2021 · arXiv

DeepCLR: Correspondence-Less Architecture for Deep End-to-End Point Cloud Registration

This work addresses the problem of point cloud registration using deep neural networks. We propose an approach to predict the alignment between two point clouds with overlapping data content, but displaced origins. Such point clouds originate, for example, from consecutive measurements of a LiDAR mounted on a moving platform. The main difficulty in deep registration of raw point clouds is the fusion of template and source point cloud. Our proposed architecture applies flow embedding to tackle this problem, which generates features that describe the motion of each template point. These features are then used to predict the alignment in an end-to-end fashion without extracting explicit point correspondences between both input clouds. We rely on the KITTI odometry and ModelNet40 datasets for evaluating our method on various point distributions. Our approach achieves state-of-the-art accuracy and the lowest run-time of the compared methods.

preprint · 2021 · arXiv

Graph-based Motion Planning for Automated Vehicles using Multi-model Branching and Admissible Heuristics

Automated driving in urban scenarios requires efficient planning algorithms able to handle complex situations in real-time. A popular approach is to use graph-based planning methods in order to obtain a rough trajectory which is subsequently optimized. A key aspect is the generation of trajectories implementing comfortable and safe behavior already during graph-search while keeping computation times low. To capture this aspect, on the one hand, a branching strategy is presented in this work that leads to better performance in terms of quality of resulting trajectories and runtime. On the other hand, admissible heuristics are shown which guide the graph-search efficiently, where the solution remains optimal.
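
Why an admissible heuristic keeps the solution optimal can be seen in a small A* search. The sketch below is a generic grid example and not the planner from the paper: the grid, the unit step cost and the function name `astar_path_cost` are assumptions for illustration. The Manhattan distance never overestimates the remaining cost on a 4-connected grid with unit steps, which is what makes it admissible.

```python
import heapq


def astar_path_cost(grid, start, goal):
    """Return the cost of a shortest 4-connected path, or None if none exists.

    grid: list of rows, where 0 marks a free cell and 1 an obstacle.
    start, goal: (row, column) tuples.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: a lower bound on the true remaining cost.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_cost = {start: 0}
    open_heap = [(heuristic(start), 0, start)]   # entries are (f, g, cell)

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            return cost
        if cost > best_cost[cell]:
            continue                              # outdated heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None


if __name__ == "__main__":
    GRID = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
    ]
    print(astar_path_cost(GRID, (0, 0), (3, 3)))
```

The heuristic only changes the order in which nodes are expanded. Because it never overestimates, the first time the goal is taken from the queue its cost is already minimal.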

preprint · 2021 · arXiv

Motion Classification and Height Estimation of Pedestrians Using Sparse Radar Data

A complete overview of the surrounding vehicle environment is important for driver assistance systems and highly autonomous driving. Fusing results of multiple sensor types like camera, radar and lidar is crucial for increasing the robustness. The detection and classification of objects like cars, bicycles or pedestrians has been analyzed in the past for many sensor types. Beyond that, it is also helpful to refine these classes and distinguish for example between different pedestrian types or activities. This task is usually performed on camera data, though recent developments are based on radar spectrograms. However, for most automotive radar systems, it is only possible to obtain radar targets instead of the original spectrograms. This work demonstrates that it is possible to estimate the body height of walking pedestrians using 2D radar targets. Furthermore, different pedestrian motion types are classified.

preprint · 2021 · arXiv

Online Extrinsic Calibration based on Per-Sensor Ego-Motion Using Dual Quaternions

In this work, we propose an approach for extrinsic sensor calibration from per-sensor ego-motion estimates. Our problem formulation is based on dual quaternions, enabling two different online capable solving approaches. We provide a certifiable globally optimal and a fast local approach along with a method to verify the globality of the local approach. Additionally, means for integrating previous knowledge, for example, a common ground plane for planar sensor motion, are described. Our algorithms are evaluated on simulated data and on a publicly available dataset containing RGB-D camera images. Further, our online calibration approach is tested on the KITTI odometry dataset, which provides data of a lidar and two stereo camera systems mounted on a vehicle. Our evaluation confirms the short run time, state-of-the-art accuracy, as well as online capability of our approach while retaining the global optimality of the solution at any time.

preprint · 2020 · arXiv

Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of "what to fuse", "when to fuse", and "how to fuse" remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://boschresearch.github.io/multimodalperception/.

preprint · 2020 · arXiv

Extended Existence Probability Using Digital Maps for Object Verification

A main task for automated vehicles is accurate and robust environment perception. In particular, error-free detection and modeling of other traffic participants is of great importance to drive safely in any situation. For this purpose, multi-object tracking algorithms, based on object detections from raw sensor measurements, are commonly used. However, false object hypotheses can occur due to a high density of different traffic participants in complex, arbitrary scenarios. For this reason, the presented approach introduces a probabilistic model to verify the existence of a tracked object. To this end, an object verification module is introduced, where the influences of multiple digital map elements on a track's existence are evaluated. Finally, a probabilistic model fuses the various influences and estimates an extended existence probability for every track. In addition, a Bayes Net is implemented as a directed graphical model to highlight this work's expandability. The presented approach reduces the number of false positives while retaining true positives. Real-world data is used to evaluate and to highlight the benefits of the presented approach, especially in urban scenarios.

preprint · 2020 · arXiv

Inferring Spatial Uncertainty in Object Detection

The availability of real-world datasets is the prerequisite for developing object detection methods for autonomous driving. While ambiguity exists in object labels due to error-prone annotation process or sensor observation noises, current object detection datasets only provide deterministic annotations without considering their uncertainty. This precludes an in-depth evaluation among different object detection methods, especially for those that explicitly model predictive probability. In this work, we propose a generative model to estimate bounding box label uncertainties from LiDAR point clouds, and define a new representation of the probabilistic bounding box through spatial distribution. Comprehensive experiments show that the proposed model represents uncertainties commonly seen in driving scenarios. Based on the spatial distribution, we further propose an extension of IoU, called the Jaccard IoU (JIoU), as a new evaluation metric that incorporates label uncertainty. Experiments on the KITTI and the Waymo Open Datasets show that JIoU is superior to IoU when evaluating probabilistic object detectors.
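
The step from IoU to an overlap measure on spatial distributions can be illustrated with a generalized Jaccard index on a discretized grid: the sum of cell-wise minima divided by the sum of cell-wise maxima. This is a sketch of that general idea and not the paper's JIoU definition. How the paper builds the spatial distributions from LiDAR points is not reproduced here, and the helper names and the soft-edged example are assumptions for illustration.

```python
def generalized_jaccard(p, q):
    """Overlap of two non-negative maps given as equally sized lists of rows."""
    pairs = [(a, b) for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q)]
    numerator = sum(min(a, b) for a, b in pairs)
    denominator = sum(max(a, b) for a, b in pairs)
    return numerator / denominator if denominator > 0 else 0.0


def box_mask(rows, cols, r0, c0, r1, c1, value=1.0):
    """Grid that is `value` inside the half-open box [r0, r1) x [c0, c1)."""
    return [
        [value if r0 <= r < r1 and c0 <= c < c1 else 0.0 for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    label = box_mask(4, 4, 0, 0, 2, 2)        # deterministic label box
    detection = box_mask(4, 4, 1, 1, 3, 3)    # deterministic detection box
    # For 0/1 masks the measure equals the ordinary IoU: 1 shared cell of 7.
    print(generalized_jaccard(label, detection))

    # A label whose border cells are uncertain (hypothetical soft values).
    soft_label = [row[:] for row in label]
    soft_label[1][1] = 0.5
    print(generalized_jaccard(soft_label, detection))
```

With hard 0/1 masks the measure reduces to the familiar IoU, while fractional values let uncertain regions of a label count only partially toward the overlap.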

preprint · 2020 · arXiv

Kalman Filter Meets Subjective Logic: A Self-Assessing Kalman Filter Using Subjective Logic

Self-assessment is a key to safety and robustness in automated driving. In order to design safer and more robust automated driving functions, the goal is to self-assess the performance of each module in a whole automated driving system. One crucial component in automated driving systems is the tracking of surrounding objects, where the Kalman filter is the most fundamental tracking algorithm. For Kalman filters, some classical online consistency measures exist for self-assessment, which are based on classical probability theory. However, these classical approaches lack the ability to measure the explicit statistical uncertainty within the self-assessment, which is an important quality measure, particularly, if only a small number of samples is available for the self-assessment. In this work, we propose a novel online self-assessment method using subjective logic, which is a modern extension of probabilistic logic that explicitly models the statistical uncertainty. Thus, by embedding classical Kalman filtering into subjective logic, our method additionally features an explicit measure for statistical uncertainty in the self-assessment.
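
The classical online consistency measures mentioned in this abstract can be illustrated with the normalized innovation squared (NIS) of a scalar Kalman filter. The sketch below shows only that classical measure and not the subjective-logic method the paper proposes. The static state model, the function name `kalman_step` and the noise values are assumptions for illustration.

```python
def kalman_step(x, p, z, q, r):
    """One predict and update cycle of a scalar Kalman filter.

    x, p: prior state estimate and its variance.
    z: new measurement.
    q, r: process and measurement noise variances (assumed known).
    Returns the posterior estimate, its variance and the NIS value.
    """
    # Predict with a static state model: the state is expected to stay put.
    x_pred = x
    p_pred = p + q

    # Innovation and its variance.
    innovation = z - x_pred
    s = p_pred + r

    # NIS: for a consistent filter with a scalar measurement this follows
    # a chi-square distribution with one degree of freedom (mean 1).
    nis = innovation * innovation / s

    # Update.
    gain = p_pred / s
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, nis


if __name__ == "__main__":
    x, p = 0.0, 1.0
    for z in (0.4, -0.2, 0.1, 3.5):          # the last value is an outlier
        x, p, nis = kalman_step(x, p, z, q=0.01, r=0.25)
        print(round(x, 3), round(p, 3), round(nis, 3))
```

A NIS value far above the chi-square mean, as for the last measurement, hints at a mismatch between the assumed and the actual noise. As the abstract points out, such a score carries no explicit statement of how much statistical evidence supports it.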

preprint · 2020 · arXiv

Labels Are Not Perfect: Improving Probabilistic Object Detection via Label Uncertainty

Reliable uncertainty estimation is crucial for robust object detection in autonomous driving. However, previous works on probabilistic object detection either learn predictive probability for bounding box regression in an un-supervised manner, or use simple heuristics to do uncertainty regularization. This leads to unstable training or suboptimal detection performance. In this work, we leverage our previously proposed method for estimating uncertainty inherent in ground truth bounding box parameters (which we call label uncertainty) to improve the detection accuracy of a probabilistic LiDAR-based object detector. Experimental results on the KITTI dataset show that our method surpasses both the baseline model and the models based on simple heuristics by up to 3.6% in terms of Average Precision.

preprint · 2020 · arXiv

Leveraging Uncertainties for Deep Multi-modal Object Detection in Autonomous Driving

This work presents a probabilistic deep neural network that combines LiDAR point clouds and RGB camera images for robust, accurate 3D object detection. We explicitly model uncertainties in the classification and regression tasks, and leverage uncertainties to train the fusion network via a sampling mechanism. We validate our method on three datasets with challenging real-world driving scenarios. Experimental results show that the predicted uncertainties reflect complex environmental uncertainty like difficulties of a human expert to label objects. The results also show that our method consistently improves the Average Precision by up to 7% compared to the baseline method. When sensors are temporally misaligned, the sampling method improves the Average Precision by up to 20%, showing its high robustness against noisy sensor inputs.

preprint · 2020 · arXiv

Robust Semantic Segmentation in Adverse Weather Conditions by means of Fast Video-Sequence Segmentation

Computer vision tasks such as semantic segmentation perform very well in good weather conditions, but they struggle to maintain this performance when the weather turns bad. One possibility to obtain more robust and reliable results in adverse weather conditions is to use video-segmentation approaches instead of commonly used single-image segmentation methods. Video-segmentation approaches capture temporal information of the previous video frames in addition to current image information, and hence, they are more robust against disturbances, especially if they occur in only a few frames of the video sequence. However, video-segmentation approaches, which are often based on recurrent neural networks, can no longer be applied in real-time applications, since the recurrent structures in the network are computationally expensive. For instance, the inference time of the LSTM-ICNet, in which recurrent units are placed at proper positions in the single-segmentation approach ICNet, increases by up to 61 percent compared to the basic ICNet. Hence, in this work, the LSTM-ICNet is sped up by modifying the recurrent units of the network so that it becomes real-time capable again. Experiments on different datasets and various weather conditions show that the inference time can be decreased by about 23 percent by these modifications, while they achieve performance similar to that of the LSTM-ICNet and outperform the single-segmentation approach enormously in adverse weather conditions.

preprint · 2020 · arXiv

Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather

The fusion of multimodal sensor streams, such as camera, lidar, and radar measurements, plays a critical role in object detection for autonomous vehicles, which base their decision making on these inputs. While existing methods exploit redundant information in good environmental conditions, they fail in adverse weather where the sensory streams can be asymmetrically distorted. These rare "edge-case" scenarios are not represented in available datasets, and existing fusion architectures are not designed to handle them. To address this challenge we present a novel multimodal dataset acquired in over 10,000km of driving in northern Europe. Although this dataset is the first large multimodal dataset in adverse weather, with 100k labels for lidar, camera, radar, and gated NIR sensors, it does not facilitate training as extreme weather is rare. To this end, we present a deep fusion network for robust fusion without a large corpus of labeled training data covering all asymmetric distortions. Departing from proposal-level fusion, we propose a single-shot model that adaptively fuses features, driven by measurement entropy. We validate the proposed method, trained on clean data, on our extensive validation dataset. Code and data are available here https://github.com/princeton-computational-imaging/SeeingThroughFog.

preprint · 2020 · arXiv

Uncertainty depth estimation with gated images for 3D reconstruction

Gated imaging is an emerging sensor technology for self-driving cars that provides high-contrast images even under adverse weather influence. It has been shown that this technology can even generate high-fidelity dense depth maps with accuracy comparable to scanning LiDAR systems. In this work, we extend the recent Gated2Depth framework with aleatoric uncertainty providing an additional confidence measure for the depth estimates. This confidence can help to filter out uncertain estimations in regions without any illumination. Moreover, we show that training on dense depth maps generated by LiDAR depth completion algorithms can further improve the performance.

preprint · 2020 · arXiv

Uncertainty Estimation in One-Stage Object Detection

Environment perception is the task for intelligent vehicles on which all subsequent steps rely. A key part of perception is to safely detect other road users such as vehicles, pedestrians, and cyclists. With modern deep learning techniques, huge progress has been made in this field over the last years. However, such deep learning based object detection models cannot predict how certain they are in their predictions, potentially hampering the performance of later steps such as tracking or sensor fusion. We present a viable approach to estimate uncertainty in a one-stage object detector, while improving the detection performance of the baseline approach. The proposed model is evaluated on a large-scale automotive pedestrian dataset. Experimental results show that the uncertainty output by our system is coupled with detection accuracy and the occlusion level of pedestrians.

preprint · 2020 · arXiv

Using Machine Learning to Detect Ghost Images in Automotive Radar

Radar sensors are an important part of driver assistance systems and intelligent vehicles due to their robustness against all kinds of adverse conditions, e.g., fog, snow, rain, or even direct sunlight. This robustness is achieved by a substantially larger wavelength compared to light-based sensors such as cameras or lidars. As a side effect, many surfaces act like mirrors at this wavelength, resulting in unwanted ghost detections. In this article, we present a novel approach to detect these ghost objects by applying data-driven machine learning algorithms. For this purpose, we use a large-scale automotive data set with annotated ghost objects. We show that we can use a state-of-the-art automotive radar classifier in order to detect ghost objects alongside real objects. Furthermore, we are able to reduce the amount of false positive detections caused by ghost images in some settings.

preprint · 2018 · arXiv

Environment Perception Framework Fusing Multi-Object Tracking, Dynamic Occupancy Grid Maps and Digital Maps

Autonomously driving vehicles require a complete and robust perception of the local environment. A main challenge is to perceive any other road users, where multi-object tracking or occupancy grid maps are commonly used. The presented approach combines both methods to compensate for false positives and obtain a complementary environment perception. Therefore, an environment perception framework is introduced that defines a common representation, extracts objects from a dynamic occupancy grid map and fuses them with tracks of a Labeled Multi-Bernoulli filter. Finally, a confidence value is developed that validates object estimates using different constraints regarding physical possibilities, method-specific characteristics and contextual information from a digital map. Experimental results with real-world data highlight the robustness and significance of the presented fusing approach, utilizing the confidence value in rural and urban scenarios.