Source author record

Nicki Holighaus

Nicki Holighaus appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Sound, eess.AS, math.FA, math.NA, Mathematical Software, eess.SP, Information Theory, math.IT, Numerical Analysis, Machine Learning, math.CO, math.GR, math.NT, Social and Information Networks

Catalog footprint

12 works · 14 topics · 4 close collaborators


Preprint · 2023 · arXiv

Grid-Based Decimation for Wavelet Transforms with Stably Invertible Implementation

The constant ratio of center frequency to bandwidth (Q-factor) of wavelet transforms provides a very natural representation for audio data. However, invertible wavelet transforms have either required non-uniform decimation, leading to irregular data structures that are cumbersome to work with, or excessively high oversampling with unacceptable computational overhead. Here, we present a novel decimation strategy for wavelet transforms that leads to stable representations with oversampling rates close to one and uniform decimation. Specifically, we show that finite implementations of the resulting representation are energy-preserving in the sense of frame theory. The obtained wavelet coefficients can be stored in a time-frequency matrix with a natural interpretation of columns as time frames and rows as frequency channels. This matrix structure immediately grants access to a large number of algorithms that are successfully used in time-frequency audio processing but could not previously be used jointly with wavelet transforms. We demonstrate the application of our method in processing based on nonnegative matrix factorization, in onset detection, and in phaseless reconstruction.

Preprint · 2022 · arXiv

Audio inpainting of music by means of neural networks

We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained separately on audio signals containing music and on signals containing musical instruments, with 64-ms gaps. The input to the DNN was the context, i.e., the signal surrounding the gap, transformed into time-frequency (TF) coefficients. Our results were compared to those obtained from a reference method based on linear predictive coding (LPC). For music, our DNN significantly outperformed the reference method, demonstrating the general usability of the proposed DNN structure for inpainting complex audio signals such as music.
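The LPC reference mentioned above can be illustrated with a minimal sketch: fit a linear predictor to the samples before the gap by least squares and run it forward into the gap. This is an assumption-laden simplification (forward extrapolation only, least-squares fit, hypothetical helper names `lpc_fit` and `lpc_extrapolate`), not the reference implementation used in the paper, which may for instance combine forward and backward predictions.

```python
import numpy as np

def lpc_fit(context: np.ndarray, order: int) -> np.ndarray:
    """Least-squares predictor a with x[n] ~= sum_k a[k] * x[n - 1 - k] (hypothetical helper)."""
    # Each row holds [x[n-1], x[n-2], ..., x[n-order]] for one target sample x[n].
    rows = [context[n - order:n][::-1] for n in range(order, len(context))]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), context[order:], rcond=None)
    return coeffs

def lpc_extrapolate(context: np.ndarray, coeffs: np.ndarray, gap_len: int) -> np.ndarray:
    """Run the predictor forward into the gap, feeding back its own outputs."""
    order = len(coeffs)
    history = list(context[-order:])  # oldest ... newest
    predicted = []
    for _ in range(gap_len):
        value = sum(coeffs[k] * history[-1 - k] for k in range(order))
        history.append(value)
        predicted.append(value)
    return np.array(predicted)

# Usage: a 440 Hz tone at 16 kHz, 1024 context samples, then a 64-ms (1024-sample) gap.
fs = 16000
n = np.arange(2048)
signal = np.sin(2 * np.pi * 440 * n / fs)
context, truth = signal[:1024], signal[1024:]
# A pure sinusoid obeys an exact order-2 recurrence, so order 2 suffices for this toy case.
pred = lpc_extrapolate(context, lpc_fit(context, order=2), gap_len=len(truth))
```

Real music is far less predictable than a single tone, which is why much higher predictor orders are normally used and why such linear extrapolation degrades over long gaps.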

Preprint · 2022 · arXiv

Audio Inpainting via $\ell_1$-Minimization and Dictionary Learning

Audio inpainting refers to signal processing techniques that aim at restoring missing or corrupted consecutive samples in audio signals. Prior works have shown that $\ell_1$-minimization with appropriate weighting is capable of solving audio inpainting problems, both for the analysis and the synthesis models. These models assume that audio signals are sparse with respect to some redundant dictionary and exploit that sparsity for inpainting purposes. Remaining within the sparsity framework, we utilize dictionary learning to further increase the sparsity and combine it with weighted $\ell_1$-minimization adapted for audio inpainting to compensate for the loss of energy within the gap after restoration. Our experiments demonstrate that our approach outperforms its original counterpart in terms of signal-to-distortion ratio (SDR) and objective difference grade (ODG).
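The synthesis-model formulation can be sketched with a plain iterative soft-thresholding (ISTA) solver for $\min_z \tfrac12\|M(Dz) - y\|_2^2 + \lambda\|z\|_1$, where $M$ keeps only the reliable samples. This sketch assumes a fixed orthonormal DCT dictionary, unweighted $\ell_1$, and an arbitrary $\lambda$; it deliberately omits the weighting and the learned dictionary that the abstract describes, and the helper names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II atoms as columns, so a signal is synthesized as x = D @ z."""
    t = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    D[:, 0] /= np.sqrt(2.0)
    return D

def soft_threshold(v: np.ndarray, thr: float) -> np.ndarray:
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def ista_inpaint(y, mask, D, lam=0.01, n_iter=1000):
    """ISTA for min_z 0.5 * ||mask * (D z) - y||^2 + lam * ||z||_1 (unweighted, synthesis model)."""
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (mask * (D @ z) - y)   # gradient of the data-fidelity term
        z = soft_threshold(z - grad, lam)   # step size 1 is safe because ||mask * D|| <= 1
    return D @ z

# Usage: a 2-sparse signal in the DCT dictionary with 8 missing samples (28..35).
N = 64
D = dct_matrix(N)
x = 3.0 * D[:, 5] + 1.5 * D[:, 12]
known = np.ones(N, dtype=bool)
known[28:36] = False
mask = known.astype(float)
x_hat = ista_inpaint(mask * x, mask, D)
```

The energy loss inside the gap that the abstract mentions is visible in such solvers: the $\ell_1$ penalty shrinks coefficients, which is what a weighted variant is meant to compensate.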

Preprint · 2022 · arXiv

Fast Matching Pursuit with Multi-Gabor Dictionaries

Finding the best K-sparse approximation of a signal in a redundant dictionary is an NP-hard problem. Suboptimal greedy matching pursuit (MP) algorithms are generally used for this task. In this work, we present an acceleration technique and an implementation of the matching pursuit algorithm acting on a multi-Gabor dictionary, i.e., a concatenation of several Gabor-type time-frequency dictionaries, each consisting of translations and modulations of a possibly different window, with possibly different time and frequency shift parameters. The technique is based on precomputing and thresholding inner products between atoms and on updating the residual directly in the coefficient domain, i.e., without the round trip to the signal domain. Since the proposed acceleration technique involves an approximate update step, we provide theoretical and experimental results illustrating the convergence of the resulting algorithm. The implementation is written in C (compatible with C99 and C++11), and we also provide Matlab and GNU Octave interfaces. For some settings, the implementation is up to 70 times faster than the standard Matching Pursuit Toolkit (MPTK).
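The idea of updating the residual in the coefficient domain follows from linearity: if $r' = r - \alpha d_k$, then $D^\top r' = D^\top r - \alpha\, G_{:,k}$ with the Gram matrix $G = D^\top D$. The sketch below assumes a generic random unit-norm dictionary and an exact, dense Gram matrix; the thresholded inner products and the multi-Gabor structure that make the paper's implementation fast are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def mp_signal_domain(x, D, n_atoms):
    """Plain matching pursuit: the residual is kept in the signal domain."""
    residual = np.array(x, dtype=float)
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual                 # inner products with all atoms
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coeffs, residual

def mp_coefficient_domain(x, D, n_atoms):
    """Same greedy steps, but the correlations are updated with the Gram matrix."""
    gram = D.T @ D                            # precomputed atom inner products
    corr = D.T @ np.asarray(x, dtype=float)   # correlations of the initial residual
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        k = int(np.argmax(np.abs(corr)))
        alpha = corr[k]
        coeffs[k] += alpha
        corr = corr - alpha * gram[:, k]      # D^T (r - alpha d_k) = D^T r - alpha G[:, k]
    return coeffs

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                # matching pursuit assumes unit-norm atoms
x = rng.standard_normal(64)
c_sig, residual = mp_signal_domain(x, D, n_atoms=20)
c_coef = mp_coefficient_domain(x, D, n_atoms=20)
```

Both variants select the same atoms; the coefficient-domain version never synthesizes the residual, which pays off when most entries of `gram` can be dropped, as they can for well-localized time-frequency atoms.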

Preprint · 2022 · arXiv

Non-iterative Filter Bank Phase (Re)Construction

Signal reconstruction from magnitude-only measurements presents a long-standing problem in signal processing. In this contribution, we propose a phase (re)construction method for filter banks with uniform decimation and controlled frequency variation. The suggested procedure extends the recently introduced phase-gradient heap integration and relies on a phase-magnitude relationship for filter bank coefficients obtained from Gaussian filters. Admissible filter banks are modeled as the discretization of certain generalized translation-invariant systems, for which we derive the phase-magnitude relationship explicitly. The implementation for discrete signals is described and the performance of the algorithm is evaluated on a range of real and synthetic signals.

Preprint · 2022 · arXiv

Phase Vocoder Done Right

The phase vocoder (PV) is a widely used technique for processing audio signals. It employs a short-time Fourier transform (STFT) analysis-modify-synthesis loop and is typically used for time-scaling of signals by using different time steps for STFT analysis and synthesis. The main challenge of the PV in this application is the correction of the STFT phase. In this paper, we introduce a novel method for phase correction based on phase gradient estimation and its integration. The method requires neither explicit peak picking and tracking nor the detection of transients and their separate treatment. Nevertheless, it does not suffer from the typical phase vocoder artifacts, even for extreme time-stretching factors.
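For context, the classical phase-vocoder correction that such methods improve on estimates an instantaneous frequency per bin from the phase advance between two analysis frames and uses it to advance the synthesis phase. The sketch below shows only that classical per-bin step (Hann window, hypothetical function names); it is not the phase-gradient integration proposed in the paper.

```python
import numpy as np

def princarg(phase):
    """Wrap phase values to the interval [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi

def instantaneous_frequency(x, start, hop, n_fft, k):
    """Classical PV estimate (cycles/sample) near bin k from two frames `hop` samples apart."""
    window = np.hanning(n_fft)
    X0 = np.fft.rfft(window * x[start:start + n_fft])
    X1 = np.fft.rfft(window * x[start + hop:start + hop + n_fft])
    omega_k = 2 * np.pi * k / n_fft               # nominal bin frequency (rad/sample)
    dphi = np.angle(X1[k]) - np.angle(X0[k])      # measured (wrapped) phase advance
    deviation = princarg(dphi - omega_k * hop)    # offset from the expected advance
    return (omega_k + deviation / hop) / (2 * np.pi)

f0 = 0.1234                                       # true frequency in cycles per sample
x = np.cos(2 * np.pi * f0 * np.arange(4096))
n_fft, hop = 1024, 128
k = int(round(f0 * n_fft))                        # bin closest to the tone
est = instantaneous_frequency(x, start=0, hop=hop, n_fft=n_fft, k=k)
# For time stretching, the synthesis phase at bin k would then advance by
# 2 * pi * est * synthesis_hop per frame.
```

Applied independently to every bin, this estimate loses the phase coherence between neighboring bins and across transients, which is the source of the typical artifacts that the abstract refers to.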

Preprint · 2016 · arXiv

A Perceptually Motivated Filter Bank with Perfect Reconstruction for Audio Signal Processing

Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. To approximate the auditory frequency resolution in the signal chain, some applications rely on perceptually motivated FBs, the gammatone FB being a popular example. However, most perceptually motivated FBs allow only partial signal reconstruction at high redundancies and/or offer poor resistance to sub-channel processing. This paper introduces an oversampled perceptually motivated FB enabling perfect reconstruction, efficient FB design, and adaptable redundancy. The filters are directly constructed in the frequency domain and linearly distributed on a perceptual frequency scale (e.g., ERB, Bark, or Mel scale). The proposed design allows for various filter shapes, uniform or non-uniform FB settings, and large down-sampling factors. For redundancies $\geq 3$, perfect reconstruction is achieved by computing the canonical dual FB analytically. For lower redundancies, perfect reconstruction is achieved using an iterative method. Experiments show performance improvements of the proposed approach over the gammatone FB in terms of reconstruction error and resistance to sub-channel processing, especially at low redundancies.
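A minimal sketch of the frequency-domain construction: Hann-shaped responses placed uniformly on the ERB-number scale of Glasberg and Moore, $E(f) = 21.4\log_{10}(1 + 0.00437 f)$. To keep it short, the sketch assumes no decimation at all, in which case the canonical dual filters reduce to $H_k = G_k / \sum_j |G_j|^2$. The paper's design with down-sampling, and its iterative method for low redundancies, are not reproduced. The filter shape, the spacing of 1 ERB, and the helper names are assumptions.

```python
import numpy as np

def hz_to_erb(f):
    """ERB-number scale (Glasberg & Moore, 1990)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f))

def erb_filter_bank(n_samples, fs, spacing=1.0):
    """Hann-shaped responses, uniformly spaced on the ERB scale, sampled on the rfft bins."""
    erb = hz_to_erb(np.fft.rfftfreq(n_samples, d=1.0 / fs))
    centers = np.arange(0.0, erb[-1] + spacing, spacing)
    u = (erb[None, :] - centers[:, None]) / (2 * spacing)   # overlap keeps sum(G**2) > 0
    return np.where(np.abs(u) < 1, 0.5 + 0.5 * np.cos(np.pi * u), 0.0)

def analyze(x, G):
    """Undecimated analysis: one sub-band signal per row."""
    return np.fft.irfft(G * np.fft.rfft(x), n=len(x), axis=-1)

def synthesize(subbands, G):
    """Synthesis with the canonical dual H_k = G_k / sum_j |G_j|^2 (undecimated case)."""
    H = G / np.sum(G ** 2, axis=0)
    spectra = np.fft.rfft(subbands, axis=-1)
    return np.fft.irfft(np.sum(H * spectra, axis=0), n=subbands.shape[-1])

fs, n_samples = 16000, 4096
G = erb_filter_bank(n_samples, fs)
x = np.random.default_rng(1).standard_normal(n_samples)
x_rec = synthesize(analyze(x, G), G)
```

Because the filters overlap on the ERB axis, the denominator never vanishes, so the division is well defined across the whole band from 0 Hz to the Nyquist frequency.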

Preprint · 2016 · arXiv

Frame Theory for Signal Processing in Psychoacoustics

This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one hand, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for audio signal processing. On the other hand, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
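Two of the basic frame-theory notions mentioned above, frame bounds and the canonical dual, can be checked numerically for a finite frame in $\mathbb{R}^d$. The optimal bounds are the extreme eigenvalues of the frame operator $S = DD^\top$, and synthesis with the dual vectors $S^{-1}d_k$ inverts the analysis $x \mapsto (\langle x, d_k\rangle)_k$. The random frame and helper names below are illustrative assumptions.

```python
import numpy as np

def frame_bounds(D):
    """Optimal frame bounds: extreme eigenvalues of the frame operator S = D D^T."""
    eigenvalues = np.linalg.eigvalsh(D @ D.T)
    return eigenvalues[0], eigenvalues[-1]

def canonical_dual(D):
    """Canonical dual frame S^{-1} D (columns are the dual frame vectors)."""
    return np.linalg.solve(D @ D.T, D)

rng = np.random.default_rng(2)
D = rng.standard_normal((8, 20))          # 20 frame vectors in R^8 (redundancy 2.5)
x = rng.standard_normal(8)
coefficients = D.T @ x                    # analysis: <x, d_k>
x_rec = canonical_dual(D) @ coefficients  # synthesis with the canonical dual frame

# A union of two orthonormal bases is a tight frame with A = B = 2.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
A_tight, B_tight = frame_bounds(np.hstack([np.eye(8), Q]))
```

In a filter bank setting, the columns of `D` would be translated filter impulse responses, and the frame bounds describe how evenly the filters cover the frequency axis.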

Preprint · 2014 · arXiv

Representing and counting the subgroups of the group Z_m x Z_n

We deduce a simple representation and the invariant factor decompositions of the subgroups of the group $\Bbb{Z}_m \times \Bbb{Z}_n$, where $m$ and $n$ are arbitrary positive integers. We obtain formulas for the total number of subgroups and the number of subgroups of a given order.
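A closed form for the total number of subgroups of $\mathbb{Z}_m \times \mathbb{Z}_n$ is $s(m,n) = \sum_{a \mid m,\ b \mid n} \gcd(a,b)$. For example, $\mathbb{Z}_2 \times \mathbb{Z}_2$ has $1 + 1 + 1 + 2 = 5$ subgroups. The sketch below checks this formula against brute-force enumeration for small $m, n$. It relies on the fact that every subgroup here is generated by at most two elements. The function names are hypothetical, and the sketch does not reproduce the representation derived in the paper.

```python
from itertools import product
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def count_subgroups_formula(m, n):
    """Closed form: sum of gcd(a, b) over all divisors a of m and b of n."""
    return sum(gcd(a, b) for a in divisors(m) for b in divisors(n))

def count_subgroups_brute_force(m, n):
    """Enumerate subgroups <g, h> = <g> + <h> of Z_m x Z_n."""
    L = m * n // gcd(m, n)  # every element order divides lcm(m, n)
    cyclic = {frozenset(((a * g1) % m, (a * g2) % n) for a in range(L))
              for g1, g2 in product(range(m), range(n))}
    subgroups = set()
    for C1, C2 in product(cyclic, repeat=2):
        subgroups.add(frozenset(((u1 + v1) % m, (u2 + v2) % n)
                                for u1, u2 in C1 for v1, v2 in C2))
    return len(subgroups)
```

Usage: `count_subgroups_formula(4, 4)` gives 15, and for prime $p$ the formula reduces to $p + 3$. The brute force is only practical for small groups, which is where a closed form helps.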

Preprint · 2013 · arXiv

Efficient algorithms for discrete Gabor transforms on a nonseparable lattice

The Discrete Gabor Transform (DGT) is the most commonly used transform for signal analysis and synthesis using a linear frequency scale. It turns out that the involved operators are rich in structure if one samples the discrete phase space on a subgroup. Most of the literature focuses on separable subgroups; in this paper, we survey existing methods for a generalization to arbitrary groups and present an improvement on existing methods. Comparisons are made with respect to the computational complexity and the running time of optimized implementations in the C programming language. The new algorithms have the lowest known computational complexity for nonseparable lattices, and the implementations are freely available for download. By summarizing general background information on the state of the art, this article can also be seen as a research survey, sharing with readers experience in the numerical work in Gabor analysis.

Preprint · 2013 · arXiv

Spectrum-Adapted Tight Graph Wavelet and Vertex-Frequency Frames

We consider the problem of designing spectral graph filters for the construction of dictionaries of atoms that can be used to efficiently represent signals residing on weighted graphs. While the filters used in previous spectral graph wavelet constructions are only adapted to the length of the spectrum, the filters proposed in this paper are adapted to the distribution of graph Laplacian eigenvalues, and therefore lead to atoms with better discriminatory power. Our approach is to first characterize a family of systems of uniformly translated kernels in the graph spectral domain that give rise to tight frames of atoms generated via generalized translation on the graph. We then warp the uniform translates with a function that approximates the cumulative spectral density function of the graph Laplacian eigenvalues. We use this approach to construct computationally efficient, spectrum-adapted, tight vertex-frequency and graph wavelet frames. We give numerous examples of the resulting spectrum-adapted graph filters, and also present an illustrative example of vertex-frequency analysis using the proposed construction.
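The warping step can be illustrated on a small graph. The sketch uses uniform translates $g(y - k)$ of the kernel $g(y) = \cos(\pi y/2)$ on $|y| \le 1$, whose squares sum to one on $[0, K-1]$. It warps them with $y = (K-1)\,\mathrm{CDF}(\lambda)$ and checks that the resulting filters form a tight (Parseval) frame. Several assumptions apply: the kernel choice, the empirical CDF obtained by linear interpolation of the sorted eigenvalues, the path graph, and the full eigendecomposition (feasible only for small graphs). The helper names are hypothetical, and the paper's own approximation of the spectral density is not reproduced.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = Deg - A of an unweighted path graph."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def warped_kernels(eigenvalues, n_filters):
    """Uniform half-cosine translates, warped by an empirical spectral CDF."""
    lam = np.sort(eigenvalues)
    cdf = np.interp(eigenvalues, lam, np.linspace(0.0, 1.0, len(lam)))
    y = (n_filters - 1) * cdf                       # warped spectral coordinate
    u = y[None, :] - np.arange(n_filters)[:, None]
    # cos(pi u / 2) on |u| <= 1: the squared translates sum to 1 on [0, n_filters - 1]
    return np.where(np.abs(u) <= 1, np.cos(np.pi * u / 2), 0.0)

rng = np.random.default_rng(3)
lam, U = np.linalg.eigh(path_graph_laplacian(16))
G = warped_kernels(lam, n_filters=5)    # row k holds g_k(lambda_i) for all eigenvalues
f = rng.standard_normal(16)
f_hat = U.T @ f                         # graph Fourier transform
coeffs = [U @ (g * f_hat) for g in G]   # filtered signals, one per kernel
energy = sum(float(np.sum(c ** 2)) for c in coeffs)
```

Since the identity $\sum_k g_k(\lambda)^2 = 1$ holds pointwise, any monotone warp preserves tightness. The warp only redistributes the filters so that more of them fall where the eigenvalues are dense.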

Preprint · 2012 · arXiv

A framework for invertible, real-time constant-Q transforms

Audio signal processing frequently requires time-frequency representations, and in many applications a non-linear spacing of frequency bands is preferable. This paper introduces a framework for efficient implementation of invertible signal transforms allowing for non-uniform and in particular non-linear frequency resolution. Non-uniformity in frequency is realized by applying nonstationary Gabor frames with adaptivity in the frequency domain. The realization of a perfectly invertible constant-Q transform is described in detail. To achieve real-time processing independent of signal length, slice-wise processing of the full input signal is proposed and referred to as the sliCQ transform. By applying frame theory and FFT-based processing, the presented approach overcomes the computational inefficiency and lack of invertibility of classical constant-Q transform implementations. Numerical simulations evaluate the efficiency of the proposed algorithm, and the method's applicability is illustrated by experiments on real-life audio signals.
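The non-linear frequency layout of a constant-Q transform can be sketched in a few lines. Center frequencies are spaced geometrically, $f_k = f_{\min}\,2^{k/B}$ with $B$ bins per octave, and bandwidths grow in proportion to $f_k$, so that $Q = f_k/\Delta_k$ stays constant. The bandwidth convention used here (the distance between the two neighboring centers) is an assumption, and the nonstationary Gabor analysis, its inversion, and the sliCQ slicing are not shown.

```python
import numpy as np

def constant_q_layout(f_min, f_max, bins_per_octave):
    """Geometrically spaced center frequencies with bandwidth proportional to frequency."""
    n_bins = int(np.floor(bins_per_octave * np.log2(f_max / f_min))) + 1
    centers = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    # Assumed convention: bandwidth equals the distance between the neighboring centers.
    bandwidths = centers * (2.0 ** (1 / bins_per_octave) - 2.0 ** (-1 / bins_per_octave))
    return centers, bandwidths

centers, bandwidths = constant_q_layout(32.7, 8000.0, bins_per_octave=12)
q = centers / bandwidths   # the same value for every bin
```

Because bandwidths differ per channel, each channel needs its own time step, which is why a nonstationary Gabor frame, rather than an ordinary STFT, is used to make such a layout invertible.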
