Researcher profile

Srđan Kitić

Srđan Kitić contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 - Emerging · Verification L1 · Unclaimed author
7 works
0 followers
4 topics
4 close collaborators

Published work

7 published items

Preprint · 2022 · arXiv

A Survey of Sound Source Localization with Deep Learning Methods

This article is a survey on deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environments, where reverberation and diffuse noise are present. We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network architecture, the type of input features, the output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy. This way, an interested reader can easily comprehend the vast panorama of deep learning-based sound source localization methods. Tables summarizing the literature survey are provided at the end of the paper for a quick search of methods with a given set of target characteristics.
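To make the "classification" output strategy concrete, here is a minimal sketch that assumes a hypothetical network producing one score per 1° azimuth bin (360 outputs) and decodes up to a given number of source directions by picking circular local maxima above a threshold. The grid, the threshold, and the name `decode_doas` are illustrative assumptions, not taken from the survey.

```python
import numpy as np


def decode_doas(scores: np.ndarray, max_sources: int, threshold: float = 0.5) -> list:
    """Return the azimuths (degrees) of the strongest peaks above `threshold`.

    `scores` holds one value per 1-degree azimuth bin (length 360), as a
    hypothetical classification network with sigmoid outputs might produce.
    Peaks are searched on a circular grid, so 359 and 0 degrees are neighbors.
    """
    left = np.roll(scores, 1)    # value at azimuth - 1
    right = np.roll(scores, -1)  # value at azimuth + 1
    is_peak = (scores >= left) & (scores > right) & (scores >= threshold)
    peaks = np.flatnonzero(is_peak)
    strongest_first = peaks[np.argsort(scores[peaks])[::-1]]
    return [int(p) for p in strongest_first[:max_sources]]


# Usage: two synthetic "posterior bumps" at 30 and 250 degrees.
az = np.arange(360)


def bump(center_deg: float, width_deg: float = 5.0) -> np.ndarray:
    # Wrapped angular distance in degrees, so the bump is circular.
    wrapped = np.rad2deg(np.angle(np.exp(1j * np.deg2rad(az - center_deg))))
    return np.exp(-0.5 * (wrapped / width_deg) ** 2)


scores = 0.9 * bump(30) + 0.7 * bump(250)
print(decode_doas(scores, max_sources=2))  # [30, 250]
```

A regression strategy would instead output the angles (or Cartesian coordinates) directly, which avoids grid quantization but makes a variable number of sources harder to handle.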

Preprint · 2022 · arXiv

Echo-enabled Direction-of-Arrival and range estimation of a mobile source in Ambisonic domain

Range estimation of a far-field sound source in a reverberant environment is notoriously difficult, which is why most localization methods can only estimate the source's Direction-of-Arrival (DoA). In earlier work, we demonstrated that, under certain restrictive acoustic conditions and given the orientation of a reflecting surface, one can exploit the dominant acoustic reflection to estimate both the DoA and the distance to a static sound source in the Ambisonic domain. In this article, we leverage the recently presented Generalized Time-domain Velocity Vector (GTVV) representation to estimate these quantities for a moving sound source without a priori knowledge of the reflectors' orientations. We show that the trajectories of a moving source and of the corresponding reflections are spatially and temporally related, which can be used to infer the absolute delay of the propagating source signal and, therefore, to approximate the microphone-to-source distance. Experiments on real sound data confirm the validity of the proposed approach.
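Why a known reflector orientation makes range recoverable can be illustrated with a simple image-source argument. This is a sketch under idealized assumptions (one planar reflector with known unit normal, exact DoAs for the direct path and the echo, and their exact relative delay), not the procedure used in either paper. The mirror image of the source shares the source's component parallel to the wall, so r0·|P u0| = r1·|P u1| with P the projection onto the wall plane; this fixes the ratio r1/r0, and the delay fixes the difference r1 - r0. The function name and the speed of sound are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def range_from_reflection(u_direct, u_reflect, normal, delay_s, c=SPEED_OF_SOUND):
    """Source distance from the array (at the origin) using one planar echo.

    Idealized image-source model: the echo arrives from the mirror image of
    the source across a wall with unit normal `normal`; `delay_s` is the
    arrival-time difference between the echo and the direct sound.
    """
    u0 = np.asarray(u_direct, dtype=float)
    u1 = np.asarray(u_reflect, dtype=float)
    n = np.asarray(normal, dtype=float)
    u0, u1, n = u0 / np.linalg.norm(u0), u1 / np.linalg.norm(u1), n / np.linalg.norm(n)
    proj = np.eye(3) - np.outer(n, n)  # keeps only the wall-parallel component
    par0 = np.linalg.norm(proj @ u0)
    par1 = np.linalg.norm(proj @ u1)
    if par1 < 1e-12 or np.isclose(par0, par1):
        raise ValueError("degenerate geometry: range is not identifiable")
    # Source and image share their wall-parallel part: r0 * par0 == r1 * par1.
    ratio = par0 / par1  # equals r1 / r0
    # The echo path is longer by c * delay_s: r1 - r0 == c * delay_s.
    return c * delay_s / (ratio - 1.0)


# Usage: source above a floor at z = -1, array at the origin.
source = np.array([2.0, 1.0, 0.5])
image = np.array([2.0, 1.0, -2.5])  # mirror of the source across z = -1
r0, r1 = np.linalg.norm(source), np.linalg.norm(image)
estimate = range_from_reflection(source / r0, image / r1, [0.0, 0.0, 1.0],
                                 (r1 - r0) / SPEED_OF_SOUND)
print(round(estimate, 3))  # 2.291
```

With real data, DoA and delay errors propagate strongly when the ratio is close to 1, which hints at why the "restrictive acoustic conditions" mentioned above matter.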

Preprint · 2022 · arXiv

Generalized Time Domain Velocity Vector

We introduce and analyze the Generalized Time Domain Velocity Vector (GTVV), an extension of the previously presented acoustic multipath footprint extracted from Ambisonic recordings. GTVV is better adapted to adverse acoustic conditions and enables efficient parameter estimation of multiple plane-wave components in the recorded multichannel mixture. Experiments on simulated data confirm the predicted theoretical advantages of these new spatio-temporal features.

Preprint · 2020 · arXiv

A Comparative Study of Multilateration Methods for Single-Source Localization in Distributed Audio

In this article, we analyze the state of the art in multilateration, the family of localization methods enabled by range difference observations. These methods are computationally efficient, signal-independent, and flexible with regard to the number of sensing nodes and their spatial arrangement. However, the multilateration problem does not admit a closed-form solution in the general case, and localization performance depends on the accuracy of the range difference estimates. For that reason, we consider a simplified use case where multiple distributed microphones capture the signal coming from a near-field sound source, and we discuss the robustness of these methods to estimation errors. In addition to surveying the relevant literature, we present the results of a small-scale benchmark of a few "mainstream" multilateration algorithms, based on an in-house Room Impulse Response dataset.
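As an illustration of the range-difference formulation, the sketch below implements one classical closed-form approach, the unconstrained linear least-squares estimator, in 2D; it is not claimed to be one of the benchmarked algorithms, and the microphone layout and function name are hypothetical. With the reference microphone at the origin, each observation d_i = ||s - m_i|| - ||s|| can be squared into an equation that is linear in the source position s and the reference range r0: 2 m_i·s + 2 d_i r0 = ||m_i||² - d_i².

```python
import numpy as np


def multilaterate_ls(mics: np.ndarray, range_diffs: np.ndarray) -> np.ndarray:
    """Unconstrained least-squares source estimate from range differences.

    mics: (M, D) positions of the non-reference microphones, relative to the
        reference microphone, which therefore sits at the origin.
    range_diffs: (M,) observed d_i = ||s - m_i|| - ||s||, in meters.
    Returns the estimated source position, shape (D,).
    """
    mics = np.asarray(mics, dtype=float)
    d = np.asarray(range_diffs, dtype=float)
    # One row per microphone: 2 m_i . s + 2 d_i r0 = ||m_i||^2 - d_i^2,
    # with unknowns [s, r0] solved jointly.
    A = np.hstack([2.0 * mics, 2.0 * d[:, None]])
    b = np.sum(mics**2, axis=1) - d**2
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution[:-1]  # drop r0, which is not forced to equal ||s|| here


# Usage with noiseless range differences (hypothetical 5-microphone layout).
mics = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]])
source = np.array([0.3, 2.0])
d = np.linalg.norm(source - mics, axis=1) - np.linalg.norm(source)
estimate = multilaterate_ls(mics, d)
print(estimate)  # matches `source` up to floating-point error
```

Because this estimator ignores the constraint r0 = ||s||, it degrades with noisy range differences, which is one motivation for the constrained and iterative variants found in the literature.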

Preprint · 2020 · arXiv

Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings

We present a CNN architecture for speech enhancement from multichannel first-order Ambisonics mixtures. Data-dependent spatial filters, derived from a mask-based approach, help an automatic speech recognition engine cope with adverse conditions of reverberation and competing speakers. The mask predictions are provided by a neural network fed with rough estimates of the speech and noise amplitude spectra, under the assumption of known directions of arrival. This study evaluates replacing the previously investigated recurrent LSTM network with a convolutional U-net, under more challenging conditions that include a second competing speaker. We show that, owing to more accurate short-term mask prediction, the U-net architecture brings some improvement in word error rate. Moreover, results indicate that dilated convolutional layers are beneficial in difficult situations with two interfering speakers and/or where the target and the interferences are close to each other in terms of angular distance. Furthermore, these results come with a two-fold reduction in the number of parameters.
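The abstract does not spell out which spatial filter is derived from the masks. As an illustration only, here is one widespread choice, a mask-based MVDR beamformer for a single frequency bin: the mask weights the multichannel STFT frames to estimate speech and noise spatial covariance matrices, the steering vector is taken as the principal eigenvector of the speech covariance, and the weights are w = Φn⁻¹d / (dᴴΦn⁻¹d). The function name, shapes, oracle mask, and diagonal loading are assumptions.

```python
import numpy as np


def mask_mvdr_weights(Y: np.ndarray, speech_mask: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """MVDR beamformer weights for one frequency bin, driven by a T-F mask.

    Y: (C, T) complex STFT values of C channels (e.g. the 4 FOA channels).
    speech_mask: (T,) values in [0, 1]; the noise mask is taken as 1 - speech_mask.
    """
    C = Y.shape[0]
    noise_mask = 1.0 - speech_mask
    # Mask-weighted spatial covariance matrices, each (C, C).
    phi_s = (speech_mask * Y) @ Y.conj().T / max(speech_mask.sum(), eps)
    phi_n = (noise_mask * Y) @ Y.conj().T / max(noise_mask.sum(), eps)
    phi_n = phi_n + eps * np.trace(phi_n).real / C * np.eye(C)  # diagonal loading
    # Steering vector: principal eigenvector of the speech covariance
    # (np.linalg.eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(phi_s)
    steering = eigvecs[:, -1]
    num = np.linalg.solve(phi_n, steering)
    return num / (steering.conj() @ num)  # enforces w^H d = 1 (distortionless)


# Usage on synthetic data: one point source plus white sensor noise,
# with an oracle mask standing in for the network's prediction.
rng = np.random.default_rng(0)
C, T = 4, 200
d_true = rng.standard_normal(C) + 1j * rng.standard_normal(C)
speech = rng.standard_normal(T) + 1j * rng.standard_normal(T)
active = rng.random(T) < 0.5  # frames in which the target is active
noise = 0.1 * (rng.standard_normal((C, T)) + 1j * rng.standard_normal((C, T)))
Y = np.outer(d_true, speech * active) + noise
w = mask_mvdr_weights(Y, active.astype(float))
enhanced = w.conj() @ Y  # beamformer output, shape (T,)
```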

Preprint · 2020 · arXiv

Scattering Features for Multimodal Gait Recognition

We consider the problem of identifying people on the basis of their walk (gait) pattern. Classical approaches to this problem are based on, e.g., video recordings or piezoelectric sensors embedded in the floor. In this work, we rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively. The contribution of this work is twofold. First, we propose a feature extraction method based on an (untrained) shallow scattering network, specifically tailored to gait signals. Second, we demonstrate that fusing the two modalities improves identification in the practically relevant open-set scenario.
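A shallow scattering network computes, at its first order, the modulus of band-pass (wavelet) filter outputs followed by low-pass averaging: S1x(λ, t) = (|x ⋆ ψλ| ⋆ φ)(t). The sketch below implements only that first-order layer, with Gabor-like filters that are Gaussian in frequency and a boxcar average for φ. The filter shapes, center frequencies, quality factor, and averaging length are illustrative assumptions, not the gait-tailored design from the paper.

```python
import numpy as np


def first_order_scattering(x, fs, centers_hz, q=4.0, avg_len=256):
    """First-order scattering-style features |x * psi_lambda| * phi.

    x: 1-D real signal sampled at fs Hz.
    centers_hz: center frequencies of Gabor-like band-pass filters psi_lambda,
        Gaussian in frequency with constant quality factor q (std = center / q).
    avg_len: length of a boxcar low-pass phi; frames do not overlap.
    Returns an array of shape (len(centers_hz), len(x) // avg_len).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    spectrum = np.fft.fft(x)
    n_frames = n // avg_len
    features = []
    for fc in centers_hz:
        # Analytic band-pass filter: Gaussian bump kept on positive frequencies only.
        psi_hat = np.exp(-0.5 * ((freqs - fc) / (fc / q)) ** 2) * (freqs > 0)
        envelope = np.abs(np.fft.ifft(spectrum * psi_hat))  # modulus nonlinearity
        frames = envelope[: n_frames * avg_len].reshape(n_frames, avg_len)
        features.append(frames.mean(axis=1))  # low-pass averaging (phi)
    return np.array(features)


fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t)  # 1 s test tone at 500 Hz
S1 = first_order_scattering(x, fs, centers_hz=[125, 250, 500, 1000, 2000])
print(S1.shape)  # (5, 31)
print(int(np.argmax(S1.mean(axis=1))))  # 2, the 500 Hz band
```

The "untrained" aspect is visible here: nothing is learned, the filters are fixed, and the modulus plus averaging yields features that are stable to small time shifts.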

Preprint · 2020 · arXiv

Time Domain Velocity Vector for Retracing the Multipath Propagation

We propose a conceptually and computationally simple form of sound velocity that offers a readable view of the interference between direct and indirect sound waves. Unlike most approaches in the literature, it jointly exploits both active and reactive sound intensity measurements, as typically derived from a first-order Ambisonics recording. This representation has potential both as a tool for directly analyzing multipath sound propagation and as a new spatial feature format for machine learning algorithms in audio and acoustics. As a showcase, we demonstrate that the Direction-of-Arrival and the range of a sound source can be estimated by building on this approach. To the best of the authors' knowledge, this is the first time that range is estimated from an Ambisonics recording.
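The idea of a time-domain velocity vector can be illustrated with an idealized first-order Ambisonics model, which is an assumption of this sketch rather than the paper's exact formulation (normalization and sign conventions may differ): the omnidirectional channel W carries the pressure, the dipole channels (X, Y, Z) carry the pressure weighted by each plane wave's unit arrival direction, and the velocity vector is the frequency-wise ratio V(f) = [X, Y, Z](f) / W(f), brought back to the time domain by an inverse FFT. For a direct path from u0 and one echo from u1 with relative gain g (|g| < 1) and delay τ, expanding 1/(1 + g e^(-jωτ)) as a geometric series gives V(t) = u0 δ(t) - Σ_{k≥1} (-g)^k (u1 - u0) δ(t - kτ). The sample at t = 0 therefore returns the direct-path DoA, and echo terms appear at multiples of τ.

```python
import numpy as np


def time_domain_velocity_vector(w: np.ndarray, xyz: np.ndarray) -> np.ndarray:
    """Inverse FFT of the frequency-wise ratio [X, Y, Z](f) / W(f).

    w: (N,) omnidirectional (pressure) channel.
    xyz: (3, N) first-order dipole channels.
    Assumes W(f) has no zeros, which holds below because the echo is weaker
    than the direct sound (|g| < 1).
    """
    W = np.fft.fft(w)
    V = np.fft.fft(xyz, axis=1) / W  # W broadcasts over the 3 components
    return np.real(np.fft.ifft(V, axis=1))


# Idealized impulse responses: direct path from u0, one echo from u1.
N, tau, g = 1024, 50, 0.5  # length, echo delay (samples), echo gain
u0 = np.array([1.0, 0.0, 0.0])
u1 = np.array([0.0, 0.6, 0.8])
w = np.zeros(N)
w[0], w[tau] = 1.0, g
xyz = np.zeros((3, N))
xyz[:, 0], xyz[:, tau] = u0, g * u1
tdvv = time_domain_velocity_vector(w, xyz)
print(tdvv[:, 0])    # close to u0: the direct-path DoA
print(tdvv[:, tau])  # close to g * (u1 - u0): the first echo term
```

Since the echo delay τ shows up directly as a peak location, a representation of this kind exposes multipath timing that a purely frequency-domain DoA estimator would not.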