Researcher profile

Christian Herglotz

Christian Herglotz contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2023arXiv

Learning Frequency-Specific Quantization Scaling in VVC for Standard-Compliant Task-driven Image Coding

Today, visual data is often analyzed by a neural network without any human being involved, which demands for specialized codecs. For standard-compliant codec adaptations towards certain information sinks, HEVC or VVC provide the possibility of frequency-specific quantization with scaling lists. This is a well-known method for the human visual system, where scaling lists are derived from psycho-visual models. In this work, we employ scaling lists when performing VVC intra coding for neural networks as information sink. To this end, we propose a novel data-driven method to obtain optimal scaling lists for arbitrary neural networks. Experiments with Mask R-CNN as information sink reveal that coding the Cityscapes dataset with the proposed scaling lists result in peak bitrate savings of 8.9 % over VVC with constant quantization. By that, our approach also outperforms scaling lists optimized for the human visual system. The generated scaling lists can be found under https://github.com/FAU-LMS/VCM_scaling_lists.

preprint2022arXiv

A Bitstream Feature Based Model for Video Decoding Energy Estimation

In this paper we show that a small amount of bit stream features can be used to accurately estimate the energy consumption of state-of-the-art software and hardware accelerated decoder implementations for four different video codecs. By testing the estimation performance on HEVC, H.264, H.263, and VP9 we show that the proposed model can be used for any hybrid video codec. We test our approach on a high amount of different test sequences to prove the general validity. We show that less than 20 features are sufficient to obtain mean estimation errors that are smaller than 8%. Finally, an example will show the performance trade-offs in terms of rate, distortion, and decoding energy for all tested codecs.

preprint2022arXiv

A Low-Parametric Model for Bit-Rate Estimation of VVC Residual Coding

There are many tasks within video compression which require fast bit rate estimation. As an example, rate-control algorithms are only feasible because it is possible to estimate the required bit rate without needing to encode the entire block. With residual coding technology becoming more and more sophisticated, the corresponding bit rate models require more advanced features. In this work, we propose a set of four features together with a linear model, which is able to estimate the rate of arbitrary residual blocks which were compressed using the VVC standard. Our method outperforms other methods which were used for the same task both in terms of mean absolute error and mean relative error. Our model deviates by less than 4 bit on average over a large dataset of natural images.

preprint2022arXiv

A Novel Viewport-Adaptive Motion Compensation Technique for Fisheye Video

Although fisheye cameras are in high demand in many application areas due to their large field of view, many image and video signal processing tasks such as motion compensation suffer from the introduced strong radial distortions. A recently proposed projection-based approach takes the fisheye projection into account to improve fisheye motion compensation. However, the approach does not consider the large field of view of fisheye lenses that requires the consideration of different motion planes in 3D space. We propose a novel viewport-adaptive motion compensation technique that applies the motion vectors in different perspective viewports in order to realize these motion planes. Thereby, some pixels are mapped to so-called virtual image planes and require special treatment to obtain reliable mappings between the perspective viewports and the original fisheye image. While the state-of-the-art ultra wide-angle compensation is sufficiently accurate, we propose a virtual image plane compensation that leads to perfect mappings. All in all, we achieve average gains of +2.40 dB in terms of PSNR compared to the state of the art in fisheye motion compensation.

preprint2022arXiv

Analysis of Neural Image Compression Networks for Machine-to-Machine Communication

Video and image coding for machines (VCM) is an emerging field that aims to develop compression methods resulting in optimal bitstreams when the decoded frames are analyzed by a neural network. Several approaches already exist improving classic hybrid codecs for this task. However, neural compression networks (NCNs) have made an enormous progress in coding images over the last years. Thus, it is reasonable to consider such NCNs, when the information sink at the decoder side is a neural network as well. Therefore, we build-up an evaluation framework analyzing the performance of four state-of-the-art NCNs, when a Mask R-CNN is segmenting objects from the decoded image. The compression performance is measured by the weighted average precision for the Cityscapes dataset. Based on that analysis, we find that networks with leaky ReLU as non-linearity and training with SSIM as distortion criteria results in the highest coding gains for the VCM task. Furthermore, it is shown that the GAN-based NCN architecture achieves the best coding performance and even out-performs the recently standardized Versatile Video Coding (VVC) for the given scenario.

preprint2022arXiv

Decoding-Energy-Rate-Distortion Optimization for Video Coding

This paper presents a method for generating coded video bit streams requiring less decoding energy than conventionally coded bit streams. To this end, we propose extending the standard rate-distortion optimization approach to also consider the decoding energy. In the encoder, the decoding energy is estimated during runtime using a feature-based energy model. These energy estimates are then used to calculate decoding-energy-rate-distortion costs that are minimized by the encoder. This ultimately leads to optimal trade-offs between these three parameters. Therefore, we introduce the mathematical theory for describing decoding-energy-rate-distortion optimization and the proposed encoder algorithm is explained in detail. For rate-energy control, a new encoder parameter is introduced. Finally, measurements of the software decoding process for HEVC-coded bit streams are performed. Results show that this approach can lead to up to 30% of decoding energy reduction at a constant visual objective quality when accepting a bitrate increase at the same order of magnitude.

preprint2022arXiv

Estimating the HEVC Decoding Energy Using the Decoder Processing Time

This paper presents a method to accurately estimate the required decoding energy for a given HEVC software decoding solution. We show that the decoder's processing time as returned by common C++ and UNIX functions is a highly suitable parameter to obtain valid estimations for the actual decoding energy. We verify this hypothesis by performing an exhaustive measurement series using different decoder setups and video bit streams. Our findings can be used by developers and researchers in the search for new energy saving video compression algorithms.

preprint2022arXiv

Estimation of Non-Functional Properties for Embedded Hardware with Application to Image Processing

In recent years, due to a higher demand for portable devices, which provide restricted amounts of processing capacity and battery power, the need for energy and time efficient hard- and software solutions has increased. Preliminary estimations of time and energy consumption can thus be valuable to improve implementations and design decisions. To this end, this paper presents a method to estimate the time and energy consumption of a given software solution, without having to rely on the use of a traditional Cycle Accurate Simulator (CAS). Instead, we propose to utilize a combination of high-level functional simulation with a mechanistic extension to include non-functional properties: Instruction counts from virtual execution are multiplied with corresponding specific energies and times. By evaluating two common image processing algorithms on an FPGA-based CPU, where a mean relative estimation error of 3% is achieved for cacheless systems, we show that this estimation tool can be a valuable aid in the development of embedded processor architectures. The tool allows the developer to reach well-suited design decisions regarding the optimal processor hardware configuration for a given algorithm at an early stage in the design process.

preprint2022arXiv

Joint Optimization of Rate, Distortion, and Decoding Energy for HEVC Intraframe Coding

This paper presents a novel algorithm that aims at minimizing the required decoding energy by exploiting a general energy model for HEVC-decoder solutions. We incorporate the energy model into the HEVC encoder such that it is capable of constructing a bit stream whose decoding process consumes less energy than the decoding process of a conventional bit stream. To achieve this, we propose to extend the traditional Rate-Distortion-Optimization scheme to a Decoding-Energy-Rate-Distortion approach. To obtain fast encoding decisions in the optimization process, we derive a fixed relation between the quantization parameter and the Lagrange multiplier for energy optimization. Our experiments show that this concept is applicable for intraframe-coded videos and that for local playback as well as online streaming scenarios, up to 15% of the decoding energy can be saved at the expense of a bitrate increase of approximately the same magnitude.

preprint2022arXiv

Modeling the Energy Consumption of HEVC Intra Decoding

Battery life is one of the major limitations to mobile device use, which makes research on energy efficient soft- and hardware an important task. This paper investigates the energy required by a CPU when decoding compressed bitstream videos on mobile platforms. A model is derived that describes the energy consumption of the new HEVC decoder for intra coded videos. We show that the relative estimation error of the model is smaller than 3.2% and that the model can be used to build encoders aiming at minimizing decoding energy.

preprint2022arXiv

Modeling the Energy Consumption of the HEVC Decoding Process

In this paper, we present a bit stream feature based energy model that accurately estimates the energy required to decode a given HEVC-coded bit stream. Therefore, we take a model from literature and extend it by explicitly modeling the inloop filters, which was not done before. Furthermore, to prove its superior estimation performance, it is compared to seven different energy models from literature. By using a unified evaluation framework we show how accurately the required decoding energy for different decoding systems can be approximated. We give thorough explanations on the model parameters and explain how the model variables are derived. To show the modeling capabilities in general, we test the estimation performance for different decoding software and hardware solutions, where we find that the proposed model outperforms the models from literature by reaching frame-wise mean estimation errors of less than 7% for software and less than 15% for hardware based systems.

preprint2022arXiv

Multi-Objective Design Space Exploration for the Optimization of the HEVC Mode Decision Process

Finding the best possible encoding decisions for compressing a video sequence is a highly complex problem. In this work, we propose a multi-objective Design Space Exploration (DSE) method to automatically find HEVC encoder implementations that are optimized for several different criteria. The DSE shall optimize the coding mode evaluation order of the mode decision process and jointly explore early skip conditions to minimize the four objectives a) bitrate, b) distortion, c) encoding time, and d) decoding energy. In this context, we use a SystemC-based actor model of the HM test model encoder for the evaluation of each explored solution. The evaluation that is based on real measurements shows that our framework can automatically generate encoder solutions that save more than 60% of encoding time or 3% of decoding energy when accepting bitrate increases of around 3%.

preprint2022arXiv

On Intra Video Coding and In-loop Filtering for Neural Object Detection Networks

Classical video coding for satisfying humans as the final user is a widely investigated field of studies for visual content, and common video codecs are all optimized for the human visual system (HVS). But are the assumptions and optimizations also valid when the compressed video stream is analyzed by a machine? To answer this question, we compared the performance of two state-of-the-art neural detection networks when being fed with deteriorated input images coded with HEVC and VVC in an autonomous driving scenario using intra coding. Additionally, the impact of the three VVC in-loop filters when coding images for a neural network is examined. The results are compared using the mean average precision metric to evaluate the object detection performance for the compressed inputs. Throughout these tests, we found that the Bjøntegaard Delta Rate savings with respect to PSNR of 22.2 % using VVC instead of HEVC cannot be reached when coding for object detection networks with only 13.6% in the best case. Besides, it is shown that disabling the VVC in-loop filters SAO and ALF results in bitrate savings of 6.4 % compared to the standard VTM at the same mean average precision.

preprint2022arXiv

Rate-Distortion Optimal Transform Coefficient Selection for Unoccupied Regions in Video-Based Point Cloud Compression

This paper presents a novel method to determine rate-distortion optimized transform coefficients for efficient compression of videos generated from point clouds. The method exploits a generalized frequency selective extrapolation approach that iteratively determines rate-distortion-optimized coefficients for all basis functions of two-dimensional discrete cosine and sine transforms. The method is applied to blocks containing both occupied and unoccupied pixels in video based point cloud compression for HEVC encoding. In the proposed algorithm, only the values of the transform coefficients are changed such that resulting bit streams are compliant to the V-PCC standard. For all-intra coded point clouds, bitrate savings of more than 4% for geometry and more than 6% for texture error metrics with respect to standard encoding can be observed. These savings are more than twice as high as savings obtained using competing methods from literature. In the randomaccess case, our proposed method outperforms competing V-PCC methods by more than 0.5%.

preprint2022arXiv

Robust Deep Neural Object Detection and Segmentation for Automotive Driving Scenario with Compressed Image Data

Deep neural object detection or segmentation networks are commonly trained with pristine, uncompressed data. However, in practical applications the input images are usually deteriorated by compression that is applied to efficiently transmit the data. Thus, we propose to add deteriorated images to the training process in order to increase the robustness of the two state-of-the-art networks Faster and Mask R-CNN. Throughout our paper, we investigate an autonomous driving scenario by evaluating the newly trained models on the Cityscapes dataset that has been compressed with the upcoming video coding standard Versatile Video Coding (VVC). When employing the models that have been trained with the proposed method, the weighted average precision of the R-CNNs can be increased by up to 3.68 percentage points for compressed input images, which corresponds to bitrate savings of nearly 48 %.

preprint2022arXiv

Saliency-Driven Versatile Video Coding for Neural Object Detection

Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once~(YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame. From extensive simulations we find that, compared to the reference VVC with a constant quality, up to 29 % of bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. Besides, we compare YOLO against other, more traditional saliency detection methods.

preprint2022arXiv

Video Coding for Machines with Feature-Based Rate-Distortion Optimization

Common state-of-the-art video codecs are optimized to deliver a low bitrate by providing a certain quality for the final human observer, which is achieved by rate-distortion optimization (RDO). But, with the steady improvement of neural networks solving computer vision tasks, more and more multimedia data is not observed by humans anymore, but directly analyzed by neural networks. In this paper, we propose a standard-compliant feature-based RDO (FRDO) that is designed to increase the coding performance, when the decoded frame is analyzed by a neural network in a video coding for machine scenario. To that extent, we replace the pixel-based distortion metrics in conventional RDO of VTM-8.0 with distortion metrics calculated in the feature space created by the first layers of a neural network. Throughout several tests with the segmentation network Mask R-CNN and single images from the Cityscapes dataset, we compare the proposed FRDO and its hybrid version HFRDO with different distortion measures in the feature space against the conventional RDO. With HFRDO, up to 5.49 % bitrate can be saved compared to the VTM-8.0 implementation in terms of Bjøntegaard Delta Rate and using the weighted average precision as quality metric. Additionally, allowing the encoder to vary the quantization parameter results in coding gains for the proposed HFRDO of up 9.95 % compared to conventional VTM.