Source author record

Srđan Kitić

Srđan Kitić appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Sound · eess.AS · eess.SP · Machine Learning

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators

Preprint · 2022 · arXiv

A Survey of Sound Source Localization with Deep Learning Methods

This article is a survey on deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environments, where reverberation and diffuse noise are present. We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network architecture, the type of input features, the output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy. This way, an interested reader can easily comprehend the vast panorama of deep learning-based sound source localization methods. Tables summarizing the literature survey are provided at the end of the paper for a quick search of methods with a given set of target characteristics.

Preprint · 2022 · arXiv

Echo-enabled Direction-of-Arrival and range estimation of a mobile source in Ambisonic domain

Range estimation of a far field sound source in a reverberant environment is known to be a notoriously difficult problem, hence most localization methods are only capable of estimating the source's Direction-of-Arrival (DoA). In an earlier work, we have demonstrated that, under certain restrictive acoustic conditions and given the orientation of a reflecting surface, one can exploit the dominant acoustic reflection to evaluate the DoA and the distance to a static sound source in the Ambisonic domain. In this article, we leverage the recently presented Generalized Time-domain Velocity Vector (GTVV) representation to estimate these quantities for a moving sound source without a priori knowledge of the reflectors' orientations. We show that the trajectories of a moving source and the corresponding reflections are spatially and temporally related, which can be used to infer the absolute delay of the propagating source signal and, therefore, approximate the microphone-to-source distance. Experiments on real sound data confirm the validity of the proposed approach.

Preprint · 2022 · arXiv

Generalized Time Domain Velocity Vector

We introduce and analyze the Generalized Time Domain Velocity Vector (GTVV), an extension of the previously presented acoustic multipath footprint extracted from Ambisonic recordings. GTVV is better adapted to adverse acoustic conditions, and enables efficient parameter estimation of multiple plane wave components in the recorded multichannel mixture. Experiments on simulated data confirm the predicted theoretical advantages of these new spatio-temporal features.

Preprint · 2020 · arXiv

A Comparative Study of Multilateration Methods for Single-Source Localization in Distributed Audio

In this article we analyze the state of the art in multilateration, the family of localization methods enabled by range difference observations. These methods are computationally efficient, signal-independent, and flexible with regard to the number of sensing nodes and their spatial arrangement. However, the multilateration problem does not admit a closed-form solution in the general case, and the localization performance is conditioned on the accuracy of range difference estimates. For that reason, we consider a simplified use case where multiple distributed microphones capture the signal coming from a near field sound source, and discuss the robustness of these methods to estimation errors. In addition to surveying the relevant bibliography, we present the results of a small-scale benchmark of a few "mainstream" multilateration algorithms, based on an in-house Room Impulse Response dataset.

Preprint · 2020 · arXiv

Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings

We present a CNN architecture for speech enhancement from multichannel first-order Ambisonics mixtures. The data-dependent spatial filters, deduced from a mask-based approach, are used to help an automatic speech recognition engine cope with adverse conditions of reverberation and competing speakers. The mask predictions are provided by a neural network, fed with rough estimates of speech and noise amplitude spectra, under the assumption of known directions of arrival. This study evaluates replacing the previously investigated recurrent LSTM network with a convolutional U-net under more demanding conditions, with an additional second competing speaker. We show that, due to more accurate short-term mask prediction, the U-net architecture brings some improvements in terms of word error rate. Moreover, results indicate that the use of dilated convolutional layers is beneficial in difficult situations with two interfering speakers, and/or where the target and interferences are close to each other in terms of angular distance. These results also come with a two-fold reduction in the number of parameters.

Preprint · 2020 · arXiv

Scattering Features for Multimodal Gait Recognition

We consider the problem of identifying people on the basis of their walk (gait) pattern. Classical approaches to tackle this problem are based on, e.g., video recordings or piezoelectric sensors embedded in the floor. In this work, we rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively. The contribution of this work is twofold. First, we propose a feature extraction method based on an (untrained) shallow scattering network, specially tailored for the gait signals. Second, we demonstrate that fusing the two modalities improves identification in the practically relevant open set scenario.

Preprint · 2020 · arXiv

Time Domain Velocity Vector for Retracing the Multipath Propagation

We propose a conceptually and computationally simple form of sound velocity that offers a readable view of the interference between direct and indirect sound waves. Unlike most approaches in the literature, it jointly exploits both active and reactive sound intensity measurements, as typically derived from a first-order Ambisonics recording. This representation has potential both as a valuable tool for directly analyzing sound multipath propagation and as a new spatial feature format for machine learning algorithms in audio and acoustics. As a showcase, we demonstrate that the Direction-of-Arrival and the range of a sound source can be estimated as a development of this approach. To the best knowledge of the authors, this is the first time that range is estimated from an Ambisonics recording.

Preprint · 2015 · arXiv

Sparsity and cosparsity for audio declipping: a flexible non-convex approach

This work investigates the empirical performance of sparse synthesis versus sparse analysis regularization for the ill-posed inverse problem of audio declipping. We develop a versatile non-convex heuristic which can be readily used with both data models. Based on this algorithm, we report that, in most cases, the two models perform almost identically in terms of signal enhancement. However, the analysis version is shown to be amenable to real-time audio processing, when certain analysis operators are considered. Both versions outperform state-of-the-art methods in the field, especially for severely saturated signals.
