Researcher profile

André Kaup

André Kaup contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
70 works
0 followers
7 topics
4 close collaborators

Published work

70 published items

preprint, 2026, arXiv

TAFA-GSGC: Group-wise Scalable Point Cloud Geometry Compression with Progressive Residual Refinement

Scalable compression is essential for bandwidth-adaptive transmission, yet most learned codecs are optimized for a fixed rate-distortion point, making rate adaptation costly due to re-encoding or maintaining multiple bitstreams. In this work, we propose TAFA-GSGC, a scalable learned point cloud geometry codec that enables multi-quality decoding from a single bitstream and a single trained model. TAFA-GSGC combines layered residual refinement with channel-group entropy coding, and introduces a Target-Aligned Feature Aggregation module to reduce cross-layer redundancy in enhancement residuals. Our framework supports up to 9 decodable quality levels with monotonic quality improvement as more subbitstreams are received, while maintaining strong compression efficiency. Compared with the PCGCv2 baseline, TAFA-GSGC demonstrates improved RD performance, achieving average BD-rate reductions of 4.99% and 5.92% in terms of D1-PSNR and D2-PSNR, respectively.

preprint, 2023, arXiv

Analysis of displacement compensation methods for wavelet lifting of medical 3-D thorax CT volume data

A huge advantage of the wavelet transform in image and video compression is its scalability. Wavelet-based coding of medical computed tomography (CT) data becomes more and more popular. While much effort has been spent on encoding of the wavelet coefficients, the extension of the transform by a compensation method as in video coding has not gained much attention so far. We will analyze two compensation methods for medical CT data and compare the characteristics of the displacement compensated wavelet transform with video data. We will show that for thorax CT data the transform coding gain can be improved by a factor of 2 and the quality of the lowpass band can be improved by 8 dB in terms of PSNR compared to the original transform without compensation.

preprint, 2023, arXiv

Centroid adapted frequency selective extrapolation for reconstruction of lost image areas

Lost image areas with different size and arbitrary shape can occur in many scenarios such as error-prone communication, depth-based image rendering or motion compensated wavelet lifting. The goal of image reconstruction is to restore these lost image areas as close to the original as possible. Frequency selective extrapolation is a block-based method for efficiently reconstructing lost areas in images. So far, the actual shape of the lost area is not considered directly. We propose a centroid adaption to enhance the existing frequency selective extrapolation algorithm that takes the shape of lost areas into account. To enlarge the test set for evaluation we further propose a method to generate arbitrarily shaped lost areas. On our large test set, we obtain an average reconstruction gain of 1.29 dB.

preprint, 2023, arXiv

Efficient lossless coding of highpass bands from block-based motion compensated wavelet lifting using JPEG 2000

Lossless image coding is a crucial task especially in the medical area, e.g., for volumes from Computed Tomography or Magnetic Resonance Tomography. Besides lossless coding, compensated wavelet lifting offers a scalable representation of such huge volumes. While compensation methods increase the details in the lowpass band, they also vary the characteristics of the wavelet coefficients, so an adaptation of the coefficient coder should be considered. We propose a simple invertible extension for JPEG 2000 that can reduce the file size for lossless coding of the highpass band by 0.8% on average, with a peak rate saving of 1.1%.

preprint, 2023, arXiv

Graph-based compensated wavelet lifting for 3-D+t medical CT data

An efficient scalable data representation is an important task especially in the medical area, e.g. for volumes from Computed Tomography (CT) or Magnetic Resonance Tomography (MRT), when a downscaled version of the original signal is needed. Image and video coders based on wavelet transforms provide an adequate way to naturally achieve scalability. This paper presents a new approach for improving the visual quality of the lowpass band by using a novel graph-based method for motion compensation, which is an important step considering data compression. We compare different kinds of neighborhoods for graph construction and demonstrate that a higher amount of referenced nodes increases the quality of the lowpass band while the mean energy of the highpass band decreases. We show that for cardiac CT data the proposed method outperforms a traditional mesh-based approach of motion compensation by approximately 11 dB in terms of PSNR of the lowpass band. Also the mean energy of the highpass band decreases by around 30%.

preprint, 2023, arXiv

Improving mesh-based motion compensation by using edge adaptive graph-based compensated wavelet lifting for medical data sets

Medical applications like Computed Tomography (CT) or Magnetic Resonance Tomography (MRT) often require an efficient scalable representation of their huge output volumes in the further processing chain of medical routine. A downscaled version of such a signal can be obtained by using image and video coders based on wavelet transforms. The visual quality of the resulting lowpass band, which shall be used as a representative, can be improved by applying motion compensation methods during the transform. This paper presents a new approach of using the distorted edge lengths of a mesh-based compensated grid instead of the approximated intensity values of the underlying frame to perform a motion compensation. We will show that an edge adaptive graph-based compensation and its usage for compensated wavelet lifting improves the visual quality of the lowpass band by approximately 2.5 dB compared to the traditional mesh-based compensation, while the additional file size required for coding the motion information does not change.

preprint, 2023, arXiv

Learning Frequency-Specific Quantization Scaling in VVC for Standard-Compliant Task-driven Image Coding

Today, visual data is often analyzed by a neural network without any human being involved, which demands specialized codecs. For standard-compliant codec adaptations towards certain information sinks, HEVC or VVC provide the possibility of frequency-specific quantization with scaling lists. This is a well-known method for the human visual system, where scaling lists are derived from psycho-visual models. In this work, we employ scaling lists when performing VVC intra coding for neural networks as information sink. To this end, we propose a novel data-driven method to obtain optimal scaling lists for arbitrary neural networks. Experiments with Mask R-CNN as information sink reveal that coding the Cityscapes dataset with the proposed scaling lists results in peak bitrate savings of 8.9% over VVC with constant quantization. By that, our approach also outperforms scaling lists optimized for the human visual system. The generated scaling lists can be found under https://github.com/FAU-LMS/VCM_scaling_lists.

preprint, 2023, arXiv

On the influence of clipping in lossless predictive and wavelet coding of noisy images

Especially in lossless image coding, the obtainable compression ratio strongly depends on the amount of noise included in the data, as all noise has to be coded, too. Different approaches exist for lossless image coding. We analyze the compression performance of three kinds of approaches, namely direct entropy, predictive and wavelet-based coding. The results from our theoretical model are compared to simulated results from standard algorithms that are based on the three approaches. As long as no clipping occurs, more bits are needed for lossless compression with increasing noise. We will show that for very noisy signals it is more advantageous to directly use an entropy coder without advanced preprocessing steps.

preprint, 2023, arXiv

Optimal Filter Selection for Multispectral Object Classification Using Fast Binary Search

When designing multispectral imaging systems for classifying different spectra, it is necessary to choose a small number of filters from a set with several hundred different ones. Tackling this problem by full search leads to a tremendous number of possibilities to check and is NP-hard. In this paper we introduce a novel fast binary search for optimal filter selection that guarantees a minimum distance metric between the different spectra to classify. In our experiments, this procedure reaches the same optimal solution as with full search at much lower complexity. The desired number of filters influences the full search in factorial order while the fast binary search stays constant. Thus, fast binary search makes it possible to find the optimal solution of all combinations in an adequate amount of time and avoids prevailing heuristics. Moreover, our fast binary search algorithm outperforms other filter selection techniques in terms of misclassified spectra in a real-world classification problem.

preprint, 2022, arXiv

A Bitstream Feature Based Model for Video Decoding Energy Estimation

In this paper we show that a small number of bit stream features can be used to accurately estimate the energy consumption of state-of-the-art software and hardware accelerated decoder implementations for four different video codecs. By testing the estimation performance on HEVC, H.264, H.263, and VP9 we show that the proposed model can be used for any hybrid video codec. We test our approach on a large number of different test sequences to prove the general validity. We show that fewer than 20 features are sufficient to obtain mean estimation errors that are smaller than 8%. Finally, an example will show the performance trade-offs in terms of rate, distortion, and decoding energy for all tested codecs.

preprint, 2022, arXiv

A Fast Algorithm for Selective Signal Extrapolation with Arbitrary Basis Functions

Signal extrapolation is an important task in digital signal processing for extending known signals into unknown areas. The Selective Extrapolation is a very effective algorithm to achieve this. Thereby, the extrapolation is obtained by generating a model of the signal to be extrapolated as weighted superposition of basis functions. Unfortunately, this algorithm is computationally very expensive and, up to now, efficient implementations exist only for basis function sets that emanate from discrete transforms. Within the scope of this contribution, a novel efficient solution for Selective Extrapolation is presented for utilization with arbitrary basis functions. The proposed algorithm mathematically behaves identically to the original Selective Extrapolation, but is several orders of magnitude faster. Furthermore, it is able to outperform existing fast transform domain algorithms which are limited to basis function sets that belong to the corresponding transform. With that, the novel algorithm allows for an efficient use of arbitrary basis functions, even if they are only numerically defined.

preprint, 2022, arXiv

A Low-Parametric Model for Bit-Rate Estimation of VVC Residual Coding

There are many tasks within video compression which require fast bit rate estimation. As an example, rate-control algorithms are only feasible because it is possible to estimate the required bit rate without needing to encode the entire block. With residual coding technology becoming more and more sophisticated, the corresponding bit rate models require more advanced features. In this work, we propose a set of four features together with a linear model, which is able to estimate the rate of arbitrary residual blocks which were compressed using the VVC standard. Our method outperforms other methods which were used for the same task both in terms of mean absolute error and mean relative error. Our model deviates by less than 4 bits on average over a large dataset of natural images.

preprint, 2022, arXiv

A Novel End-To-End Network for Reconstruction of Non-Regularly Sampled Image Data Using Locally Fully Connected Layers

Quarter sampling and three-quarter sampling are novel sensor concepts that enable the acquisition of higher resolution images without increasing the number of pixels. This is achieved by non-regularly covering parts of each pixel of a low-resolution sensor such that only one quadrant or three quadrants of the sensor area of each pixel is sensitive to light. Combining a properly designed mask and a high-quality reconstruction algorithm, a higher image quality can be achieved than using a low-resolution sensor and subsequent upsampling. For the latter case, the image quality can be further enhanced using super resolution algorithms such as the very deep super resolution network (VDSR). In this paper, we propose a novel end-to-end neural network to reconstruct high resolution images from non-regularly sampled sensor data. The network is a concatenation of a locally fully connected reconstruction network (LFCR) and a standard VDSR network. Altogether, using a three-quarter sampling sensor with our novel neural network layout, the image quality in terms of PSNR for the Urban100 dataset can be increased by 2.96 dB compared to the state-of-the-art approach. Compared to a low-resolution sensor with VDSR, a gain of 1.11 dB is achieved.

preprint, 2022, arXiv

A Novel Viewport-Adaptive Motion Compensation Technique for Fisheye Video

Although fisheye cameras are in high demand in many application areas due to their large field of view, many image and video signal processing tasks such as motion compensation suffer from the introduced strong radial distortions. A recently proposed projection-based approach takes the fisheye projection into account to improve fisheye motion compensation. However, the approach does not consider the large field of view of fisheye lenses that requires the consideration of different motion planes in 3D space. We propose a novel viewport-adaptive motion compensation technique that applies the motion vectors in different perspective viewports in order to realize these motion planes. Thereby, some pixels are mapped to so-called virtual image planes and require special treatment to obtain reliable mappings between the perspective viewports and the original fisheye image. While the state-of-the-art ultra wide-angle compensation is sufficiently accurate, we propose a virtual image plane compensation that leads to perfect mappings. All in all, we achieve average gains of +2.40 dB in terms of PSNR compared to the state of the art in fisheye motion compensation.

preprint, 2022, arXiv

Adaptive frequency prior for frequency selective reconstruction of images from non-regular subsampling

Image signals are typically defined on a rectangular two-dimensional grid. However, there exist scenarios where this is not fulfilled and where the image information is only available for a non-regular subset of pixel positions. For processing, transmitting or displaying such an image signal, a re-sampling to a regular grid is required. Recently, Frequency Selective Reconstruction (FSR) has been proposed as a very effective sparsity-based algorithm for solving this under-determined problem. For this, FSR iteratively generates a model of the signal in the Fourier-domain. In this context, a fixed frequency prior inspired by the optical transfer function is used for favoring low-frequency content. However, this fixed prior is often too strict and may lead to a reduced reconstruction quality. To resolve this weakness, this paper proposes an adaptive frequency prior which takes the local density of the available samples into account. The proposed adaptive prior allows for a very high reconstruction quality, yielding gains of up to 0.6 dB PSNR over the fixed prior, independently of the density of the available samples. Compared to other state-of-the-art algorithms, visually noticeable gains of several dB are possible.

preprint, 2022, arXiv

Adaptive joint spatio-temporal error concealment for video communication

In the past years, video communication has found its application in an increasing number of environments. Unfortunately, some of them are error-prone and the risk of block losses caused by transmission errors is ubiquitous. To reduce the effects of these block losses, a new spatio-temporal error concealment algorithm is presented. The algorithm uses spatial as well as temporal information for extrapolating the signal into the lost areas. The extrapolation is carried out in two steps: first, a preliminary temporal extrapolation is performed, which is then used to generate a model of the original signal, using the spatial neighborhood of the lost block. By applying the spatial refinement, a significantly higher concealment quality can be achieved, resulting in a gain of up to 5.2 dB in PSNR compared to the unrefined underlying pure temporal extrapolation.

preprint, 2022, arXiv

Analysis of Neural Image Compression Networks for Machine-to-Machine Communication

Video and image coding for machines (VCM) is an emerging field that aims to develop compression methods resulting in optimal bitstreams when the decoded frames are analyzed by a neural network. Several approaches already exist improving classic hybrid codecs for this task. However, neural compression networks (NCNs) have made enormous progress in coding images over the last years. Thus, it is reasonable to consider such NCNs when the information sink at the decoder side is a neural network as well. Therefore, we build an evaluation framework analyzing the performance of four state-of-the-art NCNs when a Mask R-CNN is segmenting objects from the decoded image. The compression performance is measured by the weighted average precision for the Cityscapes dataset. Based on that analysis, we find that networks with leaky ReLU as non-linearity and training with SSIM as distortion criterion result in the highest coding gains for the VCM task. Furthermore, it is shown that the GAN-based NCN architecture achieves the best coding performance and even outperforms the recently standardized Versatile Video Coding (VVC) for the given scenario.

preprint, 2022, arXiv

Automatic Registration of Images with Inconsistent Content Through Line-Support Region Segmentation and Geometrical Outlier Removal

The implementation of automatic image registration is still difficult in various applications. In this paper, an automatic image registration approach through line-support region segmentation and geometrical outlier removal (ALRS-GOR) is proposed. This new approach is designed to address the problems associated with the registration of images with affine deformations and inconsistent content, such as remote sensing images with different spectral content or noise interference, or map images with inconsistent annotations. To begin with, line-support regions, namely straight regions whose points share roughly the same image gradient angle, are extracted to address the issues of inconsistent content existing in images. To alleviate the incompleteness of line segments, an iterative strategy with multi-resolution is employed to preserve global structures that are masked at full resolution by image details or noise. Then, Geometrical Outlier Removal (GOR) is developed to provide reliable feature point matching, which is based on affine-invariant geometrical classifications for corresponding matches initialized by SIFT. The candidate outliers are selected by comparing the disparity of accumulated classifications among all matches, instead of conventional methods which only rely on local geometrical relations. Various image sets have been considered in this paper for the evaluation of the proposed approach, including aerial images with simulated affine deformations, remote sensing optical and synthetic aperture radar images taken at different situations (multispectral, multisensor, and multitemporal), and map images with inconsistent annotations. Experimental results demonstrate the superior performance of the proposed method over the existing approaches for the whole data set.

preprint, 2022, arXiv

Complex-Valued Frequency Selective Extrapolation for Fast Image and Video Signal Extrapolation

Signal extrapolation tasks arise in miscellaneous manners in the field of image and video signal processing. But, due to the widespread use of low-power and mobile devices, the computational complexity of an algorithm plays a crucial role in selecting an algorithm for a given problem. Within the scope of this contribution, we introduce the complex-valued Frequency Selective Extrapolation for fast image and video signal extrapolation. This algorithm iteratively generates a generic complex-valued model of the signal to be extrapolated as weighted superposition of Fourier basis functions. We further show that this algorithm is up to 10 times faster than the existent real-valued Frequency Selective Extrapolation that takes the real-valued nature of the input signals into account during the model generation. At the same time, the quality which is achievable by the complex-valued model generation is similar to the quality of the real-valued model generation.
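The iterative model generation described above can be illustrated with a heavily simplified 1-D sketch. It is a matching-pursuit-style approximation and not the published algorithm: the uniform weighting of known samples, the fixed damping factor `gamma`, the iteration count and all function names are assumptions made for illustration.

```python
import cmath
import math
from typing import List, Sequence


def fit_fourier_model(signal: Sequence[float], known: Sequence[bool],
                      iterations: int = 100, gamma: float = 0.5) -> List[complex]:
    """Iteratively build a complex-valued model as a superposition of Fourier
    basis functions, fitted only to the known samples (simplified sketch)."""
    n = len(signal)
    support = [m for m in range(n) if known[m]]
    if not support:
        raise ValueError("at least one sample must be known")
    # Fourier basis functions phi_k[m] = exp(2j*pi*k*m/n)
    basis = [[cmath.exp(2j * math.pi * k * m / n) for m in range(n)]
             for k in range(n)]
    residual = {m: complex(signal[m]) for m in support}
    coeffs = [0j] * n
    for _ in range(iterations):
        # Project the residual onto every basis function over the known samples.
        proj = [sum(residual[m] * basis[k][m].conjugate() for m in support)
                / len(support) for k in range(n)]
        # Select the basis function that removes the most residual energy.
        best = max(range(n), key=lambda k: abs(proj[k]))
        step = gamma * proj[best]  # damped update (gamma is an assumed constant)
        coeffs[best] += step
        for m in support:
            residual[m] -= step * basis[best][m]
    return [sum(coeffs[k] * basis[k][m] for k in range(n)) for m in range(n)]


def extrapolate(signal: Sequence[float], known: Sequence[bool],
                iterations: int = 100, gamma: float = 0.5) -> List[float]:
    """Keep known samples and fill unknown ones with the real part of the model."""
    model = fit_fourier_model(signal, known, iterations, gamma)
    return [s if k else v.real for s, k, v in zip(signal, known, model)]


# Hypothetical example: a cosine with two missing samples.
n = 16
signal = [math.cos(2 * math.pi * 2 * m / n) for m in range(n)]
known = [m not in (5, 6) for m in range(n)]
restored = extrapolate(signal, known)
```

Because each update uses a damping factor between 0 and 2, the residual energy on the known samples decreases in every iteration; how close the filled-in samples come to the true signal depends on the signal and the iteration count.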

preprint, 2022, arXiv

Content-Adaptive Motion Compensated Frequency Selective Extrapolation for Error Concealment in Video Communication

If digital video data is transmitted over unreliable channels such as the internet or wireless terminals, the risk of severe image distortion due to transmission errors is ubiquitous. To cope with this, error concealment can be applied on the distorted data at the receiver. In this contribution we propose a novel spatio-temporal error concealment algorithm, the Content-Adaptive Motion Compensated Frequency Selective Extrapolation. The algorithm operates in two stages, where first the motion in a distorted sequence is estimated. After that, a model of the signal is generated for concealing the distortion. The novel algorithm is based on an already existing error concealment algorithm. But by adapting the model generation to the content of a sequence, the novel algorithm is able to exploit the remaining information, which is still available in the distorted sequence, more effectively compared to the original algorithm. In doing so, a visually noticeable gain of up to 0.51 dB PSNR compared to the underlying algorithm and more than 3 dB compared to other error concealment algorithms can be achieved.

preprint, 2022, arXiv

Declipping of Speech Signals Using Frequency Selective Extrapolation

The reconstruction of clipped speech signals is an important task in audio signal processing to achieve an enhanced audio quality for further processing. In this paper, Frequency Selective Extrapolation (FSE), which is commonly used for error concealment or the reconstruction of incomplete image data, is adapted to be able to restore audio signals which are distorted from clipping. For this, FSE generates a model of the signal as an iterative superposition of Fourier basis functions. Clipped samples can then be replaced by estimated samples from the model. The performance of the proposed algorithm is evaluated by using different speech test data sets. Compared to other state-of-the-art declipping algorithms, this leads to a maximum gain in SNR of up to 3.5 dB and an average gain of 1 dB.

preprint, 2022, arXiv

Decoding-Energy-Rate-Distortion Optimization for Video Coding

This paper presents a method for generating coded video bit streams requiring less decoding energy than conventionally coded bit streams. To this end, we propose extending the standard rate-distortion optimization approach to also consider the decoding energy. In the encoder, the decoding energy is estimated during runtime using a feature-based energy model. These energy estimates are then used to calculate decoding-energy-rate-distortion costs that are minimized by the encoder. This ultimately leads to optimal trade-offs between these three parameters. Therefore, we introduce the mathematical theory for describing decoding-energy-rate-distortion optimization and the proposed encoder algorithm is explained in detail. For rate-energy control, a new encoder parameter is introduced. Finally, measurements of the software decoding process for HEVC-coded bit streams are performed. Results show that this approach can lead to up to 30% of decoding energy reduction at a constant visual objective quality when accepting a bitrate increase at the same order of magnitude.
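As a rough illustration of extending rate-distortion optimization with a decoding-energy term, the sketch below assumes an additive cost J = D + lambda_rate * R + lambda_energy * E. The additive form, the two weights, the `CodingOption` type and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CodingOption:
    """One candidate encoder decision with its estimated costs (hypothetical)."""
    name: str
    distortion: float
    rate_bits: float
    decoding_energy: float


def derd_cost(opt: CodingOption, lambda_rate: float, lambda_energy: float) -> float:
    # Assumed additive cost: distortion plus weighted rate plus weighted energy.
    return (opt.distortion
            + lambda_rate * opt.rate_bits
            + lambda_energy * opt.decoding_energy)


def choose_option(options: Iterable[CodingOption],
                  lambda_rate: float, lambda_energy: float) -> CodingOption:
    """Pick the candidate with the smallest combined cost."""
    return min(options, key=lambda o: derd_cost(o, lambda_rate, lambda_energy))


# Illustrative numbers only.
options = [
    CodingOption("mode_a", distortion=10.0, rate_bits=100.0, decoding_energy=5.0),
    CodingOption("mode_b", distortion=12.0, rate_bits=110.0, decoding_energy=2.0),
]
rd_only = choose_option(options, lambda_rate=0.1, lambda_energy=0.0)
energy_aware = choose_option(options, lambda_rate=0.1, lambda_energy=2.0)
print(rd_only.name, energy_aware.name)
```

With the energy weight set to zero the choice reduces to ordinary rate-distortion optimization; raising it shifts the decision towards the option that is cheaper to decode, at the price of a higher rate.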

preprint, 2022, arXiv

Denoising-based image reconstruction from pixels located at non-integer positions

Digital images are commonly represented as regular 2D arrays, so pixels are organized in form of a matrix addressed by integers. However, there are many image processing operations, such as rotation or motion compensation, that produce pixels at non-integer positions. Typically, image reconstruction techniques cannot handle samples at non-integer positions. In this paper, we propose to use triangulation-based reconstruction as initial estimate that is later refined by a novel adaptive denoising framework. Simulations reveal that improvements of up to more than 1.8 dB (in terms of PSNR) are achieved with respect to the initial estimate.

preprint, 2022, arXiv

Design Techniques for Incremental Non-Regular Image Sampling Patterns

Even though image signals are typically acquired on a regular two dimensional grid, there exist many scenarios where non-regular sampling is possible. Non-regular sampling can remove aliasing. In terms of the non-regular sampling patterns, there is a high degree of freedom in how to actually arrange the sampling positions. In literature, random patterns show higher reconstruction quality compared to regular patterns due to reduced aliasing effects. On the downside, random patterns feature large void areas which is also disadvantageous. In the scope of this work, we present two techniques to design optimized non-regular image sampling patterns for arbitrary sampling densities. Both techniques create incremental sampling patterns, i.e., one pixel position is added in each step until the desired sampling density is reached. Our proposed patterns increase the reconstruction quality by more than +0.5 dB in PSNR for a broad density range. Visual comparisons are provided.
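The idea of an incremental pattern, where one pixel position is added per step, can be sketched as follows. The selection rule used here (always add the pixel farthest from all previously selected positions, which avoids large void areas) is an assumption for illustration and not necessarily either of the two techniques proposed in the paper; the function name and parameters are likewise hypothetical.

```python
from typing import List, Tuple


def incremental_pattern(height: int, width: int, density: float,
                        first: Tuple[int, int] = (0, 0)) -> List[Tuple[int, int]]:
    """Return sampling positions in the order they were added, until the
    requested density (fraction of pixels, 0 < density <= 1) is reached."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    target = max(1, round(density * height * width))
    # Squared distance from every pixel to its nearest selected position.
    dist = [[float("inf")] * width for _ in range(height)]
    order: List[Tuple[int, int]] = []
    current = first
    while True:
        order.append(current)
        if len(order) >= target:
            break
        cr, cc = current
        best, best_d = current, -1.0
        for r in range(height):
            for c in range(width):
                d = (r - cr) ** 2 + (c - cc) ** 2
                if d < dist[r][c]:
                    dist[r][c] = d
                if dist[r][c] > best_d:  # farthest pixel so far
                    best_d, best = dist[r][c], (r, c)
        current = best
    return order


pattern_25 = incremental_pattern(8, 8, 0.25)
pattern_50 = incremental_pattern(8, 8, 0.50)
```

Because the pattern grows one position at a time, the pattern for a lower density is a prefix of the pattern for a higher density, so a single ordered list serves every density.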

preprint, 2022, arXiv

Distributed Parallel Image Signal Extrapolation Framework using Message Passing Interface

This paper introduces a framework for distributed parallel image signal extrapolation. Since high-quality image signal processing often comes along with a high computational complexity, a parallel execution is desirable. The proposed framework allows for the application of existing image signal extrapolation algorithms without the need to modify them for a parallel processing. The unaltered application of existing algorithms is achieved by dividing input images into overlapping tiles which are distributed to compute nodes via Message Passing Interface. In order to keep the computational overhead low, a novel image tiling algorithm is proposed. Using this algorithm, a nearly optimum tiling is possible at a very small processing time. For showing the efficacy of the framework, it is used for parallelizing a high-complexity extrapolation algorithm. Simulation results show that the proposed framework has no negative impact on extrapolation quality while at the same time offering good scaling behavior on compute clusters.
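Dividing an image into overlapping tiles can be sketched as below. This is a plain uniform grid with a fixed overlap, not the tiling algorithm proposed in the paper, and it leaves out the distribution of tiles via Message Passing Interface; the function names and parameters are hypothetical.

```python
from typing import List, Tuple


def tile_ranges(length: int, tile: int, overlap: int) -> List[Tuple[int, int]]:
    """Half-open [start, stop) ranges of at most `tile` samples that cover
    [0, length), with neighbouring ranges sharing `overlap` samples."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= 0:
        return []
    step = tile - overlap
    ranges = []
    start = 0
    while True:
        stop = min(start + tile, length)
        ranges.append((start, stop))
        if stop == length:
            break
        start += step
    return ranges


def overlapping_tiles(height: int, width: int, tile: int,
                      overlap: int) -> List[Tuple[int, int, int, int]]:
    """All tiles as (row_start, row_stop, col_start, col_stop)."""
    return [(r0, r1, c0, c1)
            for r0, r1 in tile_ranges(height, tile, overlap)
            for c0, c1 in tile_ranges(width, tile, overlap)]


print(tile_ranges(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
tiles = overlapping_tiles(10, 5, 4, 1)
```

Each tile could then be handed to an unmodified extrapolation routine on a separate compute node, with the overlap supplying the neighbourhood needed at tile borders.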

preprint, 2022, arXiv

Dynamic Non-Regular Sampling Sensor Using Frequency Selective Reconstruction

Both a high spatial and a high temporal resolution of images and videos are desirable in many applications such as entertainment systems, monitoring manufacturing processes, or video surveillance. Due to the limited throughput of pixels per second, however, there is always a trade-off between acquiring sequences with a high spatial resolution at a low temporal resolution or vice versa. In this paper, a modified sensor concept is proposed which is able to acquire both a high spatial and a high temporal resolution. This is achieved by dynamically reading out only a subset of pixels in a non-regular order to obtain a high temporal resolution. A full high spatial resolution is then obtained by performing a subsequent three-dimensional reconstruction of the partially acquired frames. The main benefit of the proposed dynamic readout is that for each frame, different sampling points are available which is advantageous since this information can significantly enhance the reconstruction quality of the proposed reconstruction algorithm. Using the proposed dynamic readout strategy, gains in PSNR of up to 8.55 dB are achieved compared to a static readout strategy. Compared to other state-of-the-art techniques like frame rate up-conversion or super-resolution which are also able to reconstruct sequences with both a high spatial and a high temporal resolution, average gains in PSNR of up to 6.58 dB are possible.

preprint, 2022, arXiv

Enhanced Image Reconstruction From Quarter Sampling Measurements Using An Adapted Very Deep Super Resolution Network

Quarter sampling is a novel sensor concept that enables the acquisition of higher resolution images without increasing the number of pixels. This is achieved by covering three quarters of each pixel of a low-resolution sensor such that only one quadrant of the sensor area of each pixel is sensitive to light. By randomly masking different parts, effectively a non-regular sampling of a higher resolution image is performed. Combining a properly designed mask and a high-quality reconstruction algorithm, a higher image quality can be achieved than using a low-resolution sensor and subsequent upsampling. For the latter case, the image quality can be enhanced using super resolution algorithms. Recently, algorithms based on machine learning such as the Very Deep Super Resolution network (VDSR) proved to be successful for this task. In this work, we transfer the concepts of VDSR to the special case of quarter sampling. Besides adapting the network layout to take advantage of the case of quarter sampling, we introduce a novel data augmentation technique enabled by quarter sampling. Altogether, using the quarter sampling sensor, the image quality in terms of PSNR can be increased by +0.67 dB for the Urban100 dataset compared to using a low-resolution sensor with VDSR.
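A quarter sampling mask of the kind described above can be sketched as follows: each low-resolution pixel corresponds to a 2x2 block of the high-resolution grid, and exactly one randomly chosen quadrant per block stays sensitive to light. The uniform random choice, the function name and the seed parameter are assumptions; the paper refers to a properly designed mask, which this sketch does not attempt to reproduce.

```python
import random
from typing import List


def quarter_sampling_mask(rows: int, cols: int, seed: int = 0) -> List[List[int]]:
    """Return a (2*rows) x (2*cols) binary mask with one sampled position (1)
    in every 2x2 block, i.e. one light-sensitive quadrant per sensor pixel."""
    rng = random.Random(seed)
    mask = [[0] * (2 * cols) for _ in range(2 * rows)]
    quadrants = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for r in range(rows):
        for c in range(cols):
            dr, dc = rng.choice(quadrants)
            mask[2 * r + dr][2 * c + dc] = 1
    return mask


mask = quarter_sampling_mask(4, 6)
sampled = sum(sum(row) for row in mask)  # one sample per low-resolution pixel
```

The mask keeps a quarter of the high-resolution positions, so the number of measured values equals the number of low-resolution pixels while their positions form a non-regular pattern.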

preprint, 2022, arXiv

Estimating the HEVC Decoding Energy Using the Decoder Processing Time

This paper presents a method to accurately estimate the required decoding energy for a given HEVC software decoding solution. We show that the decoder's processing time as returned by common C++ and UNIX functions is a highly suitable parameter to obtain valid estimations for the actual decoding energy. We verify this hypothesis by performing an exhaustive measurement series using different decoder setups and video bit streams. Our findings can be used by developers and researchers in the search for new energy saving video compression algorithms.

preprint2022arXiv

Estimation of Non-Functional Properties for Embedded Hardware with Application to Image Processing

In recent years, due to a higher demand for portable devices, which provide restricted amounts of processing capacity and battery power, the need for energy and time efficient hard- and software solutions has increased. Preliminary estimations of time and energy consumption can thus be valuable to improve implementations and design decisions. To this end, this paper presents a method to estimate the time and energy consumption of a given software solution, without having to rely on the use of a traditional Cycle Accurate Simulator (CAS). Instead, we propose to utilize a combination of high-level functional simulation with a mechanistic extension to include non-functional properties: Instruction counts from virtual execution are multiplied with corresponding specific energies and times. By evaluating two common image processing algorithms on an FPGA-based CPU, where a mean relative estimation error of 3% is achieved for cacheless systems, we show that this estimation tool can be a valuable aid in the development of embedded processor architectures. The tool allows the developer to reach well-suited design decisions regarding the optimal processor hardware configuration for a given algorithm at an early stage in the design process.
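The mechanistic step described above, multiplying instruction counts by specific energies and times, can be sketched as follows. The instruction classes and all per-instruction values are hypothetical placeholders, not figures from the paper.

```python
def estimate_energy_and_time(instruction_counts, specific_energy_nj,
                             specific_time_ns):
    """Sum count * specific cost over all instruction classes.

    instruction_counts: mapping from instruction class to executed count
    specific_energy_nj: mapping from instruction class to energy per
                        instruction in nanojoules
    specific_time_ns:   mapping from instruction class to time per
                        instruction in nanoseconds
    """
    energy_nj = sum(count * specific_energy_nj[op]
                    for op, count in instruction_counts.items())
    time_ns = sum(count * specific_time_ns[op]
                  for op, count in instruction_counts.items())
    return energy_nj, time_ns


# Hypothetical counts from a functional simulation run.
counts = {"add": 1000, "load": 200}
energy_table = {"add": 1.0, "load": 3.0}   # nJ per instruction (made up)
time_table = {"add": 10.0, "load": 20.0}   # ns per instruction (made up)
energy_nj, time_ns = estimate_energy_and_time(counts, energy_table,
                                              time_table)
```

The per-instruction tables would have to be calibrated once for the target processor; the counts then come from the high-level simulation rather than from a cycle-accurate one.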

preprint2022arXiv

Evaluation of Video Coding for Machines without Ground Truth

In the emerging field of video coding for machines, video datasets with pristine video quality and high-quality annotations are required for a comprehensive evaluation. However, existing video datasets with detailed annotations are severely limited in size and video quality. Thus, current methods have to either evaluate their codecs on still images or on already compressed data. To mitigate this problem, we propose an evaluation method based on pseudo ground-truth data from the field of semantic segmentation to the evaluation of video coding for machines. Through extensive evaluation, this paper shows that the proposed ground-truth-agnostic evaluation method results in an acceptable absolute measurement error below 0.7 percentage points on the Bjontegaard Delta Rate compared to using the true ground truth for mid-range bitrates. We evaluate on the three tasks of semantic segmentation, instance segmentation, and object detection. Lastly, we utilize the ground-truth-agnostic method to measure the coding performances of the VVC compared against HEVC on the Cityscapes sequences. This reveals that the coding position has a significant influence on the task performance.

preprint2022arXiv

Fast orthogonality deficiency compensation for improved frequency selective image extrapolation

The purpose of this paper is to introduce a very efficient algorithm for signal extrapolation. It can widely be used in many applications in image and video communication, e.g. for concealment of block errors caused by transmission errors or for prediction in video coding. The signal extrapolation is performed by extending a signal from a limited number of known samples into areas beyond these samples. For this purpose, a finite set of orthogonal basis functions is used and the known part of the signal is projected onto them. Since the basis functions are not orthogonal regarding the area of the known samples, the projection does not lead to the real portion a basis function has of the signal. The proposed algorithm efficiently copes with this non-orthogonality, resulting in very good objective and visual extrapolation results for edges, smooth areas, as well as structured areas. Compared to an existing implementation, this algorithm has a significantly lower computational complexity without any degradation in quality. The processing time can be reduced by a factor larger than 100.
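The projection-based extrapolation described above can be illustrated with a simplified one-dimensional sketch. It is not the authors' implementation: the cosine basis, the binary weighting of known samples, and the constant damping factor `gamma` used to account for the non-orthogonality are all assumptions made for illustration.

```python
import math


def extrapolate(signal, known, iterations=60, gamma=0.5):
    """Greedy model generation over the known samples only.

    signal: list of sample values (entries at unknown positions are ignored)
    known:  list of booleans, True where the sample is available
    gamma:  assumed constant damping of each projection coefficient
    """
    n = len(signal)
    # Cosine basis functions, orthogonal on the full block of n samples.
    basis = [[math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]
             for k in range(n)]
    # Energy of each basis function restricted to the known samples.
    energy = [sum(b[i] ** 2 for i in range(n) if known[i]) for b in basis]
    model = [0.0] * n
    for _ in range(iterations):
        residual = [signal[i] - model[i] if known[i] else 0.0
                    for i in range(n)]
        best_k, best_gain, best_coeff = None, 0.0, 0.0
        for k, b in enumerate(basis):
            if energy[k] == 0.0:
                continue
            # Projection of the residual onto basis function k.
            coeff = sum(residual[i] * b[i] for i in range(n)) / energy[k]
            gain = coeff * coeff * energy[k]  # residual energy decrease
            if gain > best_gain:
                best_k, best_gain, best_coeff = k, gain, coeff
        if best_k is None:
            break  # residual is exactly zero
        for i in range(n):
            model[i] += gamma * best_coeff * basis[best_k][i]
    # Keep known samples, fill unknown ones from the model.
    return [signal[i] if known[i] else model[i] for i in range(n)]


# Synthetic test signal: one cosine with three missing samples.
n = 16
truth = [2.0 * math.cos(math.pi * (i + 0.5) * 3 / n) for i in range(n)]
known = [i not in (6, 7, 8) for i in range(n)]
observed = [truth[i] if known[i] else 0.0 for i in range(n)]
filled = extrapolate(observed, known)
```

Because the model is built from basis functions defined on the whole block, it continues the signal into the unknown positions; the damping factor keeps each step from overshooting when the basis functions overlap on the known area.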

preprint2022arXiv

Fast Reconstruction of Three-Quarter Sampling Measurements Using Recurrent Local Joint Sparse Deconvolution and Extrapolation

Recently, non-regular three-quarter sampling has been shown to deliver an increased image quality of image sensors by using differently oriented L-shaped pixels compared to the same number of square pixels. A three-quarter sampling sensor can be understood as a conventional low-resolution sensor where one quadrant of each square pixel is opaque. Subsequent to the measurement, the data can be reconstructed on a regular grid with twice the resolution in both spatial dimensions using an appropriate reconstruction algorithm. For this reconstruction, local joint sparse deconvolution and extrapolation (L-JSDE) has been shown to perform very well. As a disadvantage, L-JSDE requires long computation times of several dozen minutes per megapixel. In this paper, we propose a faster version of L-JSDE called recurrent L-JSDE (RL-JSDE) which is a reformulation of L-JSDE. For reasonable recurrent measurement patterns, RL-JSDE provides significant speedups on both CPU and GPU without sacrificing image quality. Compared to L-JSDE, 20-fold and 733-fold speedups are achieved on CPU and GPU, respectively.

preprint2022arXiv

Frequency selective extrapolation with residual filtering for image error concealment

The purpose of signal extrapolation is to estimate unknown signal parts from known samples. This task is especially important for error concealment in image and video communication. For obtaining a high quality reconstruction, assumptions have to be made about the underlying signal in order to solve this underdetermined problem. Among existent reconstruction algorithms, frequency selective extrapolation (FSE) achieves high performance by assuming that image signals can be sparsely represented in the frequency domain. However, FSE does not take into account the low-pass behaviour of natural images. In this paper, we propose a modified FSE that takes this prior knowledge into account for the modelling, yielding significant PSNR gains.

preprint2022arXiv

Frequency-Selective Mesh-to-Mesh Resampling for Color Upsampling of Point Clouds

With the increased use of virtual and augmented reality applications, the importance of point cloud data rises. High-quality capturing of point clouds is still expensive and thus, the need for point cloud super-resolution or point cloud upsampling techniques emerges. In this paper, we propose an interpolation scheme for color upsampling of three-dimensional color point clouds. As a point cloud represents an object's surface in three-dimensional space, we first conduct a local transform of the surface into a two-dimensional plane. Secondly, we propose to apply a novel Frequency-Selective Mesh-to-Mesh Resampling (FSMMR) technique for the interpolation of the points in 2D. FSMMR generates a model of weighted superpositions of basis functions on scattered points. This model is then evaluated for the final points in order to increase the resolution of the original point cloud. Evaluation shows that our approach outperforms common interpolation schemes. Visual comparisons of the jaguar point cloud underline the quality of our upsampling results. The high performance of FSMMR holds for various sampling densities of the input point cloud.

preprint2022arXiv

Increasing Imaging Resolution by Non-Regular Sampling and Joint Sparse Deconvolution and Extrapolation

Increasing the resolution of image sensors has been a never-ending struggle for many years. In this paper, we propose a novel image sensor layout which allows for the acquisition of images at a higher resolution and improved quality. For this, the image sensor makes use of non-regular sampling which reduces the impact of aliasing. Therewith, it allows for capturing details which would not be possible with state-of-the-art sensors of the same number of pixels. The non-regular sampling is achieved by rotating prototype pixel cells in a non-regular fashion. As not the whole area of the pixel cell is sensitive to light, a non-regular spatial integration of the incident light is obtained. Based on the sensor output data, a high-resolution image can be reconstructed by performing a deconvolution with respect to the integration area and an extrapolation of the information to the insensitive regions of the pixels. To solve this challenging task, we introduce a novel joint sparse deconvolution and extrapolation algorithm. The union of non-regular sampling and the proposed reconstruction allows for achieving a higher resolution and therewith an improved imaging quality.

preprint2022arXiv

Iterative Optimization of Quarter Sampling Masks for Non-Regular Sampling Sensors

Non-regular sampling can reduce aliasing at the expense of noise. Recently, it has been shown that non-regular sampling can be carried out using a conventional regular imaging sensor when the surface of its individual pixels is partially covered. This technique is called quarter sampling (also 1/4 sampling), since only one quarter of each pixel is sensitive to light. For this purpose, the choice of a proper sampling mask is crucial to achieve a high reconstruction quality. In the scope of this work, we present an iterative algorithm to improve an arbitrary quarter sampling mask which results in a continuous increase of the reconstruction quality. In terms of the reconstruction algorithms, we test two simple algorithms, namely, linear interpolation and nearest neighbor interpolation, as well as two more sophisticated algorithms, namely, steering kernel regression and frequency selective extrapolation. Besides PSNR gains of +0.31 dB to +0.68 dB relative to a random quarter sampling mask resulting from our optimized mask, visually noticeable enhancements are perceptible.
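A random quarter sampling mask of the kind used as the starting point above can be sketched in a few lines. This only illustrates the mask layout (one sensitive quadrant per low-resolution pixel); it does not implement the iterative optimization, and the function name and seed parameter are hypothetical.

```python
import random


def random_quarter_sampling_mask(rows, cols, seed=0):
    """Build a binary mask on the high-resolution grid (2*rows x 2*cols).

    Each low-resolution pixel covers a 2x2 block of high-resolution
    positions; exactly one randomly chosen quadrant is marked sensitive (1).
    """
    rng = random.Random(seed)
    quadrants = [(0, 0), (0, 1), (1, 0), (1, 1)]
    mask = [[0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            dr, dc = rng.choice(quadrants)
            mask[2 * r + dr][2 * c + dc] = 1
    return mask


# A 4 x 5 low-resolution sensor yields an 8 x 10 high-resolution mask.
mask = random_quarter_sampling_mask(4, 5)
```

An optimization procedure would then modify which quadrant is selected in each block and keep the changes that improve the reconstruction quality.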

preprint2022arXiv

Joint Optimization of Rate, Distortion, and Decoding Energy for HEVC Intraframe Coding

This paper presents a novel algorithm that aims at minimizing the required decoding energy by exploiting a general energy model for HEVC-decoder solutions. We incorporate the energy model into the HEVC encoder such that it is capable of constructing a bit stream whose decoding process consumes less energy than the decoding process of a conventional bit stream. To achieve this, we propose to extend the traditional Rate-Distortion-Optimization scheme to a Decoding-Energy-Rate-Distortion approach. To obtain fast encoding decisions in the optimization process, we derive a fixed relation between the quantization parameter and the Lagrange multiplier for energy optimization. Our experiments show that this concept is applicable for intraframe-coded videos and that for local playback as well as online streaming scenarios, up to 15% of the decoding energy can be saved at the expense of a bitrate increase of approximately the same magnitude.
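The extension from rate-distortion optimization to a decoding-energy-rate-distortion decision can be sketched as a mode choice over a combined cost. The additive cost form J = D + λ_R · R + λ_E · E, the class and function names, and all numbers are assumptions for illustration; the paper's exact formulation and its relation between the quantization parameter and the Lagrange multiplier are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distortion: float       # e.g. sum of squared errors
    rate_bits: float        # bits needed to code the block
    decoding_energy: float  # estimated decoding energy from an energy model


def derd_cost(c, lambda_rate, lambda_energy):
    """Assumed cost: distortion plus weighted rate plus weighted energy."""
    return (c.distortion + lambda_rate * c.rate_bits
            + lambda_energy * c.decoding_energy)


def choose_mode(candidates, lambda_rate, lambda_energy):
    """Pick the candidate with the lowest combined cost."""
    return min(candidates,
               key=lambda c: derd_cost(c, lambda_rate, lambda_energy))


# Hypothetical coding modes for one block.
modes = [
    Candidate("mode_a", distortion=10.0, rate_bits=100.0,
              decoding_energy=5.0),
    Candidate("mode_b", distortion=12.0, rate_bits=90.0,
              decoding_energy=2.0),
]
rd_choice = choose_mode(modes, lambda_rate=0.1, lambda_energy=0.0)
derd_choice = choose_mode(modes, lambda_rate=0.1, lambda_energy=1.0)
```

With the energy weight set to zero the decision reduces to conventional rate-distortion optimization; a positive weight shifts the choice toward modes that are cheaper to decode.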

preprint2022arXiv

Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting

Many applications in image processing require resampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping among others. A state-of-the-art high quality model-based resampling technique is frequency-selective mesh-to-grid resampling which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies more than high ones. Thereby, resampling artefacts like ringing are suppressed reliably and the resampling quality increases. On average, the new AFSMR is conceptually simpler and gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling while being approximately 14.5 times faster.

preprint2022arXiv

Learning True Rate-Distortion-Optimization for End-To-End Image Compression

Even though rate-distortion optimization is a crucial part of traditional image and video compression, not many approaches exist which transfer this concept to end-to-end-trained image compression. Most frameworks contain static compression and decompression models which are fixed after training, so efficient rate-distortion optimization is not possible. In a previous work, we proposed RDONet, which enables an RDO approach comparable to adaptive block partitioning in HEVC. In this paper, we enhance the training by introducing low-complexity estimations of the RDO result into the training. Additionally, we propose fast and very fast RDO inference modes. With our novel training method, we achieve average rate savings of 19.6% in MS-SSIM over the previous RDONet model, which equals rate savings of 27.3% over a comparable conventional deep image coder.

preprint2022arXiv

Modeling the Energy Consumption of HEVC Intra Decoding

Battery life is one of the major limitations to mobile device use, which makes research on energy efficient soft- and hardware an important task. This paper investigates the energy required by a CPU when decoding compressed bitstream videos on mobile platforms. A model is derived that describes the energy consumption of the new HEVC decoder for intra coded videos. We show that the relative estimation error of the model is smaller than 3.2% and that the model can be used to build encoders aiming at minimizing decoding energy.

preprint2022arXiv

Modeling the Energy Consumption of the HEVC Decoding Process

In this paper, we present a bit stream feature based energy model that accurately estimates the energy required to decode a given HEVC-coded bit stream. Therefore, we take a model from literature and extend it by explicitly modeling the inloop filters, which was not done before. Furthermore, to prove its superior estimation performance, it is compared to seven different energy models from literature. By using a unified evaluation framework we show how accurately the required decoding energy for different decoding systems can be approximated. We give thorough explanations on the model parameters and explain how the model variables are derived. To show the modeling capabilities in general, we test the estimation performance for different decoding software and hardware solutions, where we find that the proposed model outperforms the models from literature by reaching frame-wise mean estimation errors of less than 7% for software and less than 15% for hardware based systems.

preprint2022arXiv

Motion Compensated Frequency Selective Extrapolation for Error Concealment in Video Coding

Although wireless and IP-based access to video content gives a new degree of freedom to the viewers, the risk of severe block losses caused by transmission errors is always present. The purpose of this paper is to present a new method for concealing block losses in erroneously received video sequences. For this, a motion compensated data set is generated around the lost block. Based on this aligned data set, a model of the signal is created that continues the signal into the lost areas. Since spatial as well as temporal information is used for the model generation, the proposed method is superior to methods that use either spatial or temporal information for concealment. Furthermore, it outperforms current state-of-the-art spatio-temporal concealment algorithms by up to 1.4 dB in PSNR.

preprint2022arXiv

Motion Compensated Three-Dimensional Frequency Selective Extrapolation for Improved Error Concealment in Video Communication

During transmission of video data over error-prone channels, the risk of getting severe image distortions due to transmission errors is ubiquitous. To deal with image distortions at the decoder side, error concealment is applied. This article presents Motion Compensated Three-Dimensional Frequency Selective Extrapolation, a novel spatio-temporal error concealment algorithm. The algorithm uses fractional-pel motion estimation and compensation as an initial step, followed by the generation of a model of the distorted signal. The model generation is conducted by an enhanced version of Three-Dimensional Frequency Selective Extrapolation, an existing error concealment algorithm. Compared to this existing algorithm, the proposed one yields an improvement in concealment quality of up to 1.64 dB PSNR. Altogether, the incorporation of motion compensation and the improved model generation extends the already high extrapolation quality of the underlying Frequency Selective Extrapolation, resulting in a gain of more than 3 dB compared to other well-known error concealment algorithms.

preprint2022arXiv

Multi-Objective Design Space Exploration for the Optimization of the HEVC Mode Decision Process

Finding the best possible encoding decisions for compressing a video sequence is a highly complex problem. In this work, we propose a multi-objective Design Space Exploration (DSE) method to automatically find HEVC encoder implementations that are optimized for several different criteria. The DSE shall optimize the coding mode evaluation order of the mode decision process and jointly explore early skip conditions to minimize the four objectives a) bitrate, b) distortion, c) encoding time, and d) decoding energy. In this context, we use a SystemC-based actor model of the HM test model encoder for the evaluation of each explored solution. The evaluation that is based on real measurements shows that our framework can automatically generate encoder solutions that save more than 60% of encoding time or 3% of decoding energy when accepting bitrate increases of around 3%.
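Since the exploration minimizes four objectives at once, candidate encoder configurations are typically compared by Pareto dominance rather than by a single score. The sketch below shows only that comparison step; the tuple layout (bitrate, distortion, encoding time, decoding energy) and all values are hypothetical, and the search strategy of the actual DSE framework is not modeled.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    All objectives are minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (bitrate, distortion, encoding time, decoding energy) tuples.
solutions = [
    (1.0, 1.0, 1.0, 1.0),
    (2.0, 2.0, 2.0, 2.0),   # worse everywhere, dominated by the first
    (0.5, 3.0, 1.0, 1.0),   # lower bitrate, higher distortion: a trade-off
]
front = pareto_front(solutions)
```

A designer would then pick from the non-dominated set according to the accepted trade-off, for example a small bitrate increase in exchange for shorter encoding time.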

preprint2022arXiv

Multiple Selection Approximation for Improved Spatio-Temporal Prediction in Video Coding

In this contribution, a novel spatio-temporal prediction algorithm for video coding is introduced. This algorithm exploits temporal as well as spatial redundancies for effectively predicting the signal to be encoded. To achieve this, the algorithm operates in two stages. Initially, motion compensated prediction is applied on the block being encoded. Afterwards this preliminary temporal prediction is refined by forming a joint model of the initial predictor and the spatially adjacent already transmitted blocks. The novel algorithm is able to outperform earlier refinement algorithms in speed and prediction quality. Compared to pure motion compensated prediction, the mean data rate can be reduced by up to 15% and up to 1.16 dB gain in PSNR can be achieved for the considered sequences.

preprint2022arXiv

Multiple Selection Extrapolation for Improved Spatial Error Concealment

This contribution introduces a novel signal extrapolation algorithm and its application to image error concealment. The signal extrapolation is carried out by iteratively generating a model of the signal suffering from distortion. Thereby, the model results from a weighted superposition of two-dimensional basis functions, where in every iteration step a set of these is selected and the approximation residual is projected onto the subspace they span. The algorithm is an improvement to the Frequency Selective Extrapolation that has proven to be an effective method for concealing lost or distorted image regions. Compared to this algorithm, the novel algorithm is able to reduce the processing time by a factor larger than three, while still preserving the very high extrapolation quality.

preprint2022arXiv

Novel Consistency Check For Fast Recursive Reconstruction Of Non-Regularly Sampled Video Data

Quarter sampling is a novel sensor design that allows for an acquisition of higher resolution images without increasing the number of pixels. When being used for video data, one out of four pixels is measured in each frame. Effectively, this leads to a non-regular spatio-temporal sub-sampling. Compared to purely spatial or temporal sub-sampling, this allows for an increased reconstruction quality, as aliasing artifacts can be reduced. For the fast reconstruction of such sensor data with a fixed mask, a recursive variant of frequency selective reconstruction (FSR) was proposed. Here, pixels measured in previous frames are projected into the current frame to support its reconstruction. In doing so, the motion between the frames is computed using template matching. Since some of the motion vectors may be erroneous, it is important to perform a proper consistency check. In this paper, we propose faster consistency checking methods as well as a novel recursive FSR that uses the projected pixels differently than in the literature and can handle dynamic masks. Altogether, we are able to significantly increase the reconstruction quality by +1.01 dB compared to the state-of-the-art recursive reconstruction method using a fixed mask. Compared to a single frame reconstruction, an average gain of about +1.52 dB is achieved for dynamic masks. At the same time, the computational complexity of the consistency checks is reduced by a factor of 13 compared to the literature algorithm.

preprint2022arXiv

On Intra Video Coding and In-loop Filtering for Neural Object Detection Networks

Classical video coding for satisfying humans as the final user is a widely investigated field of studies for visual content, and common video codecs are all optimized for the human visual system (HVS). But are the assumptions and optimizations also valid when the compressed video stream is analyzed by a machine? To answer this question, we compared the performance of two state-of-the-art neural detection networks when being fed with deteriorated input images coded with HEVC and VVC in an autonomous driving scenario using intra coding. Additionally, the impact of the three VVC in-loop filters when coding images for a neural network is examined. The results are compared using the mean average precision metric to evaluate the object detection performance for the compressed inputs. Throughout these tests, we found that the Bjøntegaard Delta Rate savings of 22.2% with respect to PSNR obtained by using VVC instead of HEVC cannot be reached when coding for object detection networks, where the savings are only 13.6% in the best case. Besides, it is shown that disabling the VVC in-loop filters SAO and ALF results in bitrate savings of 6.4% compared to the standard VTM at the same mean average precision.

preprint2022arXiv

Optimized and Parallelized Processing Order for Improved Frequency Selective Signal Extrapolation

In recent years, multi-core processor designs have found their way into many computing devices. To exploit the capabilities of such devices in the best possible way, signal processing algorithms have to be adapted to an operation in parallel tasks. In this contribution, an optimized processing order is proposed for Frequency Selective Extrapolation, a powerful signal extrapolation algorithm. Using this optimized order, the extrapolation can be carried out in parallel. The algorithm scales very well, resulting in an acceleration of a factor of up to 7.7 for an eight core computer. Additionally, the optimized processing order aims at reducing the propagation of extrapolation errors over consecutive losses. Thus, in addition to the acceleration, a visually noticeable improvement in quality of up to 0.5 dB PSNR can be achieved.
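The reported acceleration can be put in context with a small worked calculation. The parallel efficiency follows directly from the two numbers in the abstract; the parallel fraction is an additional interpretation that assumes Amdahl's law applies, which the abstract does not claim.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of the ideal linear speedup that is actually reached."""
    return speedup / cores


def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def amdahl_parallel_fraction(speedup, cores):
    """Invert Amdahl's law: parallel fraction implied by a speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Figures from the abstract: a factor of up to 7.7 on eight cores.
efficiency = parallel_efficiency(7.7, 8)        # about 0.96
fraction = amdahl_parallel_fraction(7.7, 8)     # about 0.994 under Amdahl
```

Under that assumption, more than 99% of the work would run in parallel, which is consistent with the statement that the algorithm scales very well.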

preprint2022arXiv

Optimized processing order for 3D hole filling in video sequences using frequency selective extrapolation

A problem often arising in video communication is the reconstruction of missing or distorted areas in a video sequence. Such holes of unavailable pixels may be caused for example by transmission errors of coded video data or undesired objects like logos. In order to close the holes given neighboring available content, a signal extrapolation has to be performed. The best quality can be achieved, if spatial as well as temporal information is used for the reconstruction. However, the question always is in which order to process the extrapolation to obtain the best result. In this paper, an optimized processing order is introduced for improving the extrapolation quality of Three-dimensional Frequency Selective Extrapolation. Using the proposed optimized order, holes in video sequences can be closed from the outer margin to the center, leading to a higher reconstruction quality, and visually noticeable gains of more than 0.5 dB PSNR are possible.

preprint2022arXiv

Orthogonality Deficiency Compensation for Improved Frequency Selective Image Extrapolation

This paper describes a very efficient algorithm for image signal extrapolation. It can be used for various applications in image and video communication, e.g. the concealment of data corrupted by transmission errors or prediction in video coding. The extrapolation is performed on a limited number of known samples and extends the signal beyond these samples. To this end, the signal from the known samples is iteratively projected onto different basis functions in order to generate a model of the signal. As the basis functions are not orthogonal with respect to the area of the known samples, we propose a new extension, the orthogonality deficiency compensation, to cope with the non-orthogonality. Using this extension, very good extrapolation results for structured as well as for smooth areas are achievable. This algorithm improves PSNR by up to 2 dB and gives a better visual quality for concealment of block losses compared to previously existing extrapolation algorithms.

preprint2022arXiv

Rate-Distortion Optimal Transform Coefficient Selection for Unoccupied Regions in Video-Based Point Cloud Compression

This paper presents a novel method to determine rate-distortion optimized transform coefficients for efficient compression of videos generated from point clouds. The method exploits a generalized frequency selective extrapolation approach that iteratively determines rate-distortion-optimized coefficients for all basis functions of two-dimensional discrete cosine and sine transforms. The method is applied to blocks containing both occupied and unoccupied pixels in video based point cloud compression for HEVC encoding. In the proposed algorithm, only the values of the transform coefficients are changed such that resulting bit streams are compliant with the V-PCC standard. For all-intra coded point clouds, bitrate savings of more than 4% for geometry and more than 6% for texture error metrics with respect to standard encoding can be observed. These savings are more than twice as high as savings obtained using competing methods from literature. In the random-access case, our proposed method outperforms competing V-PCC methods by more than 0.5%.

preprint2022arXiv

Real-Time Frequency Selective Reconstruction through Register-Based Argmax Calculation

Frequency Selective Reconstruction (FSR) is a state-of-the-art algorithm for solving diverse image reconstruction tasks, where a subset of pixel values in the image is missing. However, it entails a high computational complexity due to its iterative, blockwise procedure to reconstruct the missing pixel values. Although the complexity of FSR can be considerably decreased by performing its computations in the frequency domain, the reconstruction procedure still takes multiple seconds up to multiple minutes depending on the parameterization. However, FSR has the potential for a massive parallelization greatly improving its reconstruction time. In this paper, we introduce a novel highly parallelized formulation of FSR adapted to the capabilities of modern GPUs and propose a considerably accelerated calculation of the inherent argmax calculation. Altogether, we achieve a 100-fold speed-up, which enables the usage of FSR for real-time applications.

preprint2022arXiv

Reconstruction of images taken by a pair of non-regular sampling sensors using correlation based matching

Multi-view image acquisition systems with two or more cameras can be rather costly due to the number of high resolution image sensors that are required. Recently, it has been shown that by covering a low resolution sensor with a non-regular sampling mask and by using an efficient algorithm for image reconstruction, a high resolution image can be obtained. In this paper, a stereo image reconstruction setup for multi-view scenarios is proposed. A scene is captured by a pair of non-regular sampling sensors and by incorporating information from the adjacent view, the reconstruction quality can be increased. Compared to a state-of-the-art single-view reconstruction algorithm, this leads to a visually noticeable average gain in PSNR of 0.74 dB.

preprint2022arXiv

Reconstruction of Videos Taken by a Non-Regular Sampling Sensor

Recently, it has been shown that a high resolution image can be obtained without the usage of a high resolution sensor. The main idea has been that a low resolution sensor is covered with a non-regular sampling mask followed by a reconstruction of the incomplete high resolution image captured this way. In this paper, a multi-frame reconstruction approach is proposed where a video is taken by a non-regular sampling sensor and fully reconstructed afterwards. By utilizing the temporal correlation between neighboring frames, the reconstruction quality can be further enhanced. Compared to a state-of-the-art single-frame reconstruction approach, this leads to a visually noticeable gain in PSNR of up to 1.19 dB on average.

preprint2022arXiv

Recursive Frequency Selective Reconstruction of Non-Regularly Sampled Video Data

High resolution images can be acquired using a non-regular sampling sensor which consists of an underlying low resolution sensor that is covered with a non-regular sampling mask. The reconstructed high resolution image is then obtained during post-processing. Recently, it has been shown that the temporal correlation between neighboring frames can be exploited in order to enhance the reconstruction quality of non-regularly sampled video data. In this paper, a new recursive multi-frame reconstruction approach is proposed in order to further increase the reconstruction quality. By using a new reference order, previously reconstructed frames can be used for the subsequent motion estimation and a new weighting function allows for the incorporation of multiple pixels projected onto the same position. With the new recursive multi-frame approach, a visually noticeable average gain in PSNR of up to 1.13 dB with respect to a state-of-the-art single-frame reconstruction approach can be achieved. Compared to the existing multi-frame approach, a gain of 0.31 dB is possible. SSIM results show the same behavior as PSNR results. Additionally, the pre-reconstruction step of the existing multi-frame approach can be avoided and the new algorithm is, in general, capable of real-time processing.

preprint2022arXiv

Reducing Randomness of Non-Regular Sampling Masks for Image Reconstruction

Increasing spatial image resolution is an often required, yet challenging task in image acquisition. Recently, it has been shown that it is possible to obtain a high resolution image by covering a low resolution sensor with a non-regular sampling mask. Due to the masking, however, some pixel information in the resulting high resolution image is not available and has to be reconstructed by an efficient image reconstruction algorithm in order to get a fully reconstructed high resolution image. In this paper, the influence of different sampling masks with a reduced randomness of the non-regularity on the image reconstruction process is evaluated. Simulation results show that it is sufficient to use sampling masks that are non-regular only on a smaller scale. These sampling masks lead to a visually noticeable gain in PSNR compared to arbitrarily chosen sampling masks which are non-regular over the whole image sensor size. At the same time, they simplify the manufacturing process and allow for efficient storage.

preprint2022arXiv

Reliability-based Mesh-to-Grid Image Reconstruction

This paper presents a novel method for the reconstruction of images from samples located at non-integer positions, called a mesh. This is a common scenario for many image processing applications, such as super-resolution, warping or virtual view generation in multi-camera systems. The proposed method relies on a set of initial estimates that are later refined by a new reliability-based content-adaptive framework that employs denoising in order to reduce the reconstruction error. The reliability of the initial estimate is computed so that stronger denoising is applied to less reliable estimates. The proposed technique can improve the reconstruction quality by more than 2 dB (in terms of PSNR) with respect to the initial estimate and it outperforms the state-of-the-art denoising-based refinement by up to 0.7 dB.
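
For readers comparing the dB figures quoted across these abstracts, the following is a minimal sketch of the standard PSNR definition, 10·log10(peak² / MSE). The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions made for illustration; the papers may use a different peak value or evaluate only a region of the image.

```python
import math

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as nested lists of numbers; peak is the maximum
    possible pixel value (255 is assumed here for 8-bit data).
    """
    flat_ref = [v for row in reference for v in row]
    flat_rec = [v for row in reconstruction for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_rec)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[10, 20], [30, 40]]
estimate = [[12, 20], [30, 40]]  # one pixel off by 2, so the MSE is 1
print(psnr(reference, estimate))
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to halving the mean squared error.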

preprint2022arXiv

Resampling Images to a Regular Grid from a Non-Regular Subset of Pixel Positions Using Frequency Selective Reconstruction

Even though image signals are typically defined on a regular two-dimensional grid, there also exist many scenarios where this is not the case and the amplitude of the image signal is available only for a non-regular subset of pixel positions. In such a case, a resampling of the image to a regular grid has to be carried out. This is necessary since almost all algorithms and technologies for processing, transmitting or displaying image signals rely on the samples being available on a regular grid. Thus, it is of great importance to reconstruct the image on this regular grid so that the reconstruction comes closest to the case that the signal has been originally acquired on the regular grid. In this paper, Frequency Selective Reconstruction is introduced for solving this challenging task. This algorithm reconstructs image signals by exploiting the property that small areas of images can be represented sparsely in the Fourier domain. By further taking into account the basic properties of the Optical Transfer Function of imaging systems, a sparse model of the signal is iteratively generated. In doing so, the proposed algorithm is able to achieve a very high reconstruction quality, in terms of PSNR and SSIM as well as in terms of visual quality. Simulation results show that the proposed algorithm is able to outperform state-of-the-art reconstruction algorithms and gains of more than 1 dB PSNR are possible.

preprint2022arXiv

Reusing the H.264/AVC deblocking filter for efficient spatio-temporal prediction in video coding

The prediction step is a very important part of hybrid video codecs for effectively compressing video sequences. While existing video codecs predict either in temporal or in spatial direction only, the compression efficiency can be increased by a combined spatio-temporal prediction. In this paper we propose an algorithm for reusing the H.264/AVC deblocking filter for spatio-temporal prediction. Reusing this highly optimized filter allows for a very low computational complexity of this prediction mode and an average rate reduction of up to 7.2% can be achieved.

preprint2022arXiv

RFVTM: A Recovery and Filtering Vertex Trichotomy Matching for Remote Sensing Image Registration

Reliable feature point matching is a vital yet challenging process in feature-based image registration. In this paper, a robust feature point matching algorithm called Recovery and Filtering Vertex Trichotomy Matching (RFVTM) is proposed to remove outliers and retain sufficient inliers for remote sensing images. A novel affine invariant descriptor called vertex trichotomy descriptor is proposed on the basis that geometrical relations between any vertices and lines are preserved after affine transformations, which is constructed by mapping each vertex into trichotomy sets. The outlier removals in Vertex Trichotomy Matching (VTM) are implemented by iteratively comparing the disparity of corresponding vertex trichotomy descriptors. Some inliers mistakenly validated by a large amount of outliers are removed in VTM iterations, and several residual outliers close to correct locations cannot be excluded with the same graph structures. Therefore, a recovery and filtering strategy is designed to recover some inliers based on identical vertex trichotomy descriptors and restricted transformation errors. Assisted with the additional recovered inliers, residual outliers can also be filtered out during the process of reaching identical graph for the expanded vertex sets. Experimental results demonstrate the superior performance on precision and stability of this algorithm under various conditions, such as remote sensing images with large transformations, duplicated patterns, or inconsistent spectral content.

preprint2022arXiv

Robust Deep Neural Object Detection and Segmentation for Automotive Driving Scenario with Compressed Image Data

Deep neural object detection or segmentation networks are commonly trained with pristine, uncompressed data. However, in practical applications the input images are usually deteriorated by compression that is applied to efficiently transmit the data. Thus, we propose to add deteriorated images to the training process in order to increase the robustness of the two state-of-the-art networks Faster and Mask R-CNN. Throughout our paper, we investigate an autonomous driving scenario by evaluating the newly trained models on the Cityscapes dataset that has been compressed with the upcoming video coding standard Versatile Video Coding (VVC). When employing the models that have been trained with the proposed method, the weighted average precision of the R-CNNs can be increased by up to 3.68 percentage points for compressed input images, which corresponds to bitrate savings of nearly 48 %.

preprint2022arXiv

Saliency-Driven Versatile Video Coding for Neural Object Detection

Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame. From extensive simulations we find that, compared to the reference VVC with a constant quality, up to 29 % of bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. Besides, we compare YOLO against other, more traditional saliency detection methods.

preprint2022arXiv

Scalable Kernel-Based Minimum Mean Square Error Estimator for Accelerated Image Error Concealment

Error concealment is of great importance for block-based video systems, such as DVB or video streaming services. In this paper, we propose a novel scalable spatial error concealment algorithm that aims at obtaining high quality reconstructions with reduced computational burden. The proposed technique exploits the excellent reconstructing abilities of the kernel-based minimum mean square error (K-MMSE) estimator. We propose to decompose this approach into a set of hierarchically stacked layers. The first layer performs the basic reconstruction that the subsequent layers can eventually refine. In addition, we design a layer management mechanism, based on profiles, that dynamically adapts the use of higher layers to the visual complexity of the area being reconstructed. The proposed technique outperforms other state-of-the-art algorithms and produces high quality reconstructions, equivalent to K-MMSE, while requiring around one tenth of its computational time.

preprint2022arXiv

Spatio-temporal error concealment in video by denoised temporal extrapolation refinement

In video communication, the concealment of distortions caused by transmission errors is important for allowing for a pleasant visual quality and for reducing error propagation. In this article, Denoised Temporal Extrapolation Refinement is introduced as a novel spatiotemporal error concealment algorithm. The algorithm operates in two steps. First, temporal error concealment is used for obtaining an initial estimate. Afterwards, a spatial denoising algorithm is used for reducing the imperfectness of the temporal extrapolation. For this, Non-Local Means denoising is used which is extended by a spiral scan processing order and is improved by an adaptation step for taking the preliminary temporal extrapolation into account. In doing so, a spatio-temporal error concealment results. By making use of the refinement, a visually noticeable average gain of 1 dB over pure temporal error concealment is possible. With this, the algorithm also is able to clearly outperform other spatio-temporal error concealment algorithms.

preprint2022arXiv

Spatio-temporal prediction in video coding by best approximation

Within the scope of this contribution we propose a novel efficient spatio-temporal prediction algorithm for video coding. The algorithm operates in two stages. First, motion compensation is performed on the block to be predicted in order to exploit temporal correlations. Afterwards, in order to exploit spatial correlations, this preliminary estimate is spatially refined by forming a joint model of the motion compensated block and spatially adjacent already decoded blocks. Compared to an earlier refinement algorithm, the novel one needs only very few iterations, leading to a speedup by a factor of 17. The implementation of this new algorithm into H.264/AVC leads to a reduction in data rate of up to nearly 13% for the considered sequences.

preprint2022arXiv

Spatio-temporal prediction in video coding by non-local means refined motion compensation

The prediction step is a very important part of hybrid video codecs. In this contribution, a novel spatio-temporal prediction algorithm is introduced. For this, the prediction is carried out in two steps. Firstly, a preliminary temporal prediction is conducted by motion compensation. Afterwards, spatial refinement is carried out for incorporating spatial redundancies from already decoded neighboring blocks. Thereby, the spatial refinement is achieved by applying Non-Local Means denoising to the union of the motion compensated block and the already decoded blocks. Including the spatial refinement into H.264/AVC, a rate reduction of up to 14 % or, respectively, a gain of up to 0.7 dB PSNR compared to unrefined motion compensated prediction can be achieved.

preprint2022arXiv

Spatio-temporal prediction in video coding by spatially refined motion compensation

The purpose of this contribution is to introduce a new method of signal prediction in video coding. Unlike most existing prediction methods that use either temporal or spatial correlations to generate the prediction signal, the proposed method uses spatial and temporal correlations at the same time. The spatio-temporal prediction is obtained by first performing motion compensation for a macroblock, followed by a refinement step that pays attention to the correlations between the macroblock and its surroundings. At the decoder, the refinement step can be performed in the same manner, thus no additional side information has to be transmitted. Implementation of the spatial refinement step into the H.264/AVC video codec leads to a reduction in data rate of up to nearly 15% and an increase in PSNR of up to 0.75 dB, compared to pure motion compensated prediction.

preprint2022arXiv

Texture-Dependent Frequency Selective Reconstruction of Non-Regularly Sampled Images

There exist many scenarios where pixel information is available only on a non-regular subset of pixel positions. For further processing, however, it is required to reconstruct such images on a regular grid. Besides many other algorithms, frequency selective reconstruction can be applied for this task. It performs a block-wise generation of a sparse signal model as an iterative superposition of Fourier basis functions and uses this model to replace missing or corrupted pixels in an image. In this paper, it is shown that it is not required to spend the same amount of iterations on both homogeneous and heterogeneous regions. Hence, a new texture-dependent approach for frequency selective reconstruction is introduced that distributes the number of iterations depending on the texture of the regions to be reconstructed. Compared to the original frequency selective reconstruction and depending on the number of iterations, visually noticeable gains in PSNR of up to 1.47 dB can be achieved.

preprint2022arXiv

Video Coding for Machines with Feature-Based Rate-Distortion Optimization

Common state-of-the-art video codecs are optimized to deliver a low bitrate by providing a certain quality for the final human observer, which is achieved by rate-distortion optimization (RDO). But, with the steady improvement of neural networks solving computer vision tasks, more and more multimedia data is not observed by humans anymore, but directly analyzed by neural networks. In this paper, we propose a standard-compliant feature-based RDO (FRDO) that is designed to increase the coding performance when the decoded frame is analyzed by a neural network in a video coding for machines scenario. To that end, we replace the pixel-based distortion metrics in conventional RDO of VTM-8.0 with distortion metrics calculated in the feature space created by the first layers of a neural network. Throughout several tests with the segmentation network Mask R-CNN and single images from the Cityscapes dataset, we compare the proposed FRDO and its hybrid version HFRDO with different distortion measures in the feature space against the conventional RDO. With HFRDO, up to 5.49 % bitrate can be saved compared to the VTM-8.0 implementation in terms of Bjøntegaard Delta Rate and using the weighted average precision as quality metric. Additionally, allowing the encoder to vary the quantization parameter results in coding gains for the proposed HFRDO of up to 9.95 % compared to conventional VTM.
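
The effect of swapping the distortion metric inside RDO can be shown with a minimal sketch. Conventional RDO selects the coding option that minimizes the Lagrangian cost J = D + λ·R; the sketch keeps that rule and only exchanges D. Everything else is an assumption made for illustration: the `Candidate` type, the one-dimensional "blocks", and in particular `toy_features`, which uses simple neighbor differences as a hypothetical stand-in for the first layers of a neural network. It is not the FRDO or HFRDO implementation in VTM-8.0.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Candidate:
    name: str                  # hypothetical label for a coding option
    rate_bits: float           # bits needed to signal this option
    decoded: Sequence[float]   # reconstructed samples for this option

def sse(a, b):
    """Sum of squared errors, the usual pixel-domain distortion."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def toy_features(block):
    """Hypothetical feature extractor: differences of neighboring samples."""
    return [block[i + 1] - block[i] for i in range(len(block) - 1)]

def feature_sse(a, b):
    """Distortion measured in the (toy) feature space instead of on pixels."""
    return sse(toy_features(a), toy_features(b))

def rdo_select(original, candidates, lam, distortion: Callable):
    """Return the candidate minimizing J = D + lam * R."""
    return min(candidates,
               key=lambda c: distortion(original, c.decoded) + lam * c.rate_bits)

original = [10, 20, 30, 40]
candidates = [
    Candidate("offset", 4, [12, 22, 32, 42]),   # constant shift, edges kept
    Candidate("jitter", 4, [10, 21, 30, 41]),   # small pixel error, edges bent
]
pixel_choice = rdo_select(original, candidates, 1.0, sse)
feature_choice = rdo_select(original, candidates, 1.0, feature_sse)
print(pixel_choice.name, feature_choice.name)
```

In this toy setup the pixel-based cost prefers the candidate with the smaller sample error, while the feature-based cost prefers the one that preserves the differences between neighboring samples, which is the kind of shift in encoder decisions that a feature-space distortion is meant to produce.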