Source author record

Fabian Herzog

Fabian Herzog appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision eess.IV

Catalog footprint

What is connected

5works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Face Morphing: Fooling a Face Recognition System Is Simple!

State-of-the-art face recognition (FR) approaches have shown remarkable results in predicting whether two faces belong to the same identity, yielding accuracies between 92% and 100% depending on the difficulty of the protocol. However, the accuracy drops substantially when exposed to morphed faces, specifically generated to look similar to two identities. To generate morphed faces, we integrate a simple pretrained FR model into a generative adversarial network (GAN) and modify several loss functions for face morphing. In contrast to previous works, our approach and analyses are not limited to pairs of frontal faces with the same ethnicity and gender. Our qualitative and quantitative results affirm that our approach achieves a seamless change between two faces even in unconstrained scenarios. Despite using features from a simpler FR model for face morphing, we demonstrate that even recent FR systems struggle to distinguish the morphed face from both identities obtaining an accuracy of only 55-70%. Besides, we provide further insights into how knowing the FR system makes it particularly vulnerable to face morphing attacks.

preprint2022arXiv

Synthehicle: Multi-Vehicle Multi-Camera Tracking in Virtual Cities

Smart City applications such as intelligent traffic routing or accident prevention rely on computer vision methods for exact vehicle localization and tracking. Due to the scarcity of accurately labeled data, detecting and tracking vehicles in 3D from multiple cameras proves challenging to explore. We present a massive synthetic dataset for multiple vehicle tracking and segmentation in multiple overlapping and non-overlapping camera views. Unlike existing datasets, which only provide tracking ground truth for 2D bounding boxes, our dataset additionally contains perfect labels for 3D bounding boxes in camera- and world coordinates, depth estimation, and instance, semantic and panoptic segmentation. The dataset consists of 17 hours of labeled video material, recorded from 340 cameras in 64 diverse day, rain, dawn, and night scenes, making it the most extensive dataset for multi-target multi-camera tracking so far. We provide baselines for detection, vehicle re-identification, and single- and multi-camera tracking. Code and data are publicly available.

preprint2022arXiv

Towards a Deeper Understanding of Skeleton-based Gait Recognition

Gait recognition is a promising biometric with unique properties for identifying individuals from a long distance by their walking patterns. In recent years, most gait recognition methods used the person's silhouette to extract the gait features. However, silhouette images can lose fine-grained spatial information, suffer from (self) occlusion, and be challenging to obtain in real-world scenarios. Furthermore, these silhouettes also contain other visual clues that are not actual gait features and can be used for identification, but also to fool the system. Model-based methods do not suffer from these problems and are able to represent the temporal motion of body joints, which are actual gait features. The advances in human pose estimation started a new era for model-based gait recognition with skeleton-based gait recognition. In this work, we propose an approach based on Graph Convolutional Networks (GCNs) that combines higher-order inputs, and residual networks to an efficient architecture for gait recognition. Extensive experiments on the two popular gait datasets, CASIA-B and OUMVLP-Pose, show a massive improvement (3x) of the state-of-the-art (SotA) on the largest gait dataset OUMVLP-Pose and strong temporal modeling capabilities. Finally, we visualize our method to understand skeleton-based gait recognition better and to show that we model real gait features.

preprint2021arXiv

Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos

Understanding actions and gestures in video streams requires temporal reasoning of the spatial content from different time instants, i.e., spatiotemporal (ST) modeling. In this survey paper, we have made a comparative analysis of different ST modeling techniques for action and gecture recognition tasks. Since Convolutional Neural Networks (CNNs) are proved to be an effective tool as a feature extractor for static images, we apply ST modeling techniques on the features of static images from different time instants extracted by CNNs. All techniques are trained end-to-end together with a CNN feature extraction part and evaluated on two publicly available benchmarks: The Jester and the Something-Something datasets. The Jester dataset contains various dynamic and static hand gestures, whereas the Something-Something dataset contains actions of human-object interactions. The common characteristic of these two benchmarks is that the designed architectures need to capture the full temporal content of videos in order to correctly classify actions/gestures. Contrary to expectations, experimental results show that Recurrent Neural Network (RNN) based ST modeling techniques yield inferior results compared to other techniques such as fully convolutional architectures. Codes and pretrained models of this work are publicly available.

preprint2021arXiv

Lightweight Multi-Branch Network for Person Re-Identification

Person Re-Identification aims to retrieve person identities from images captured by multiple cameras or the same cameras in different time instances and locations. Because of its importance in many vision applications from surveillance to human-machine interaction, person re-identification methods need to be reliable and fast. While more and more deep architectures are proposed for increasing performance, those methods also increase overall model complexity. This paper proposes a lightweight network that combines global, part-based, and channel features in a unified multi-branch architecture that builds on the resource-efficient OSNet backbone. Using a well-founded combination of training techniques and design choices, our final model achieves state-of-the-art results on CUHK03 labeled, CUHK03 detected, and Market-1501 with 85.1% mAP / 87.2% rank1, 82.4% mAP / 84.9% rank1, and 91.5% mAP / 96.3% rank1, respectively.

Fabian Herzog

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Face Morphing: Fooling a Face Recognition System Is Simple!

Synthehicle: Multi-Vehicle Multi-Camera Tracking in Virtual Cities

Towards a Deeper Understanding of Skeleton-based Gait Recognition

Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos

Lightweight Multi-Branch Network for Person Re-Identification