Source author record

Birger Kollmeier

Birger Kollmeier appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.AS Sound Artificial Intelligence Computation and Language Machine Learning physics.med-ph

Catalog footprint

What is connected

5works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Objective comparison of auditory profiles using manifold learning and intrinsic measures

Assigning individuals with hearing impairment to auditory profiles can support a better understanding of the causes and consequences of hearing loss and facilitate profile-based hearing-aid fitting. However, the factors influencing auditory profile generation remain insufficiently understood, and existing profiling frameworks have rarely been compared systematically. This study therefore investigated the impact of two key factors - the clustering method and the number of profiles - on auditory profile generation. In addition, eight established auditory profiling frameworks were systematically reviewed and compared using intrinsic statistical measures and manifold learning techniques. Frameworks were evaluated with respect to internal consistency (i.e., grouping similar individuals) and cluster separation (i.e., clear differentiation between groups). To ensure comparability, all analyses were conducted on a common open-access dataset, the extended Oldenburg Hearing Health Record (OHHR), comprising 1,127 participants (mean age = 67.2 years, SD = 12.0). Results showed that both the clustering method and the chosen number of profiles substantially influenced the resulting auditory profiles. Among purely audiogram-based approaches, the Bisgaard auditory profiles demonstrated the strongest clustering performance, whereas audiometric phenotypes performed worst. Among frameworks incorporating supra-threshold information in addition to the audiogram, the Hearing4All auditory profiles were advantageous, combining a near-optimal number of profile classes (N = 13) with high clustering quality, as indicated by a low Davies-Bouldin index. In conclusion, manifold learning and intrinsic measures enable systematic comparison of auditory profiling frameworks and identify the Hearing4All auditory profile as a promising approach for future research.

preprint2022arXiv

Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features

For a better understanding of the mechanisms underlying speech perception and the contribution of different signal features, computational models of speech recognition have a long tradition in hearing research. Due to the diverse range of situations in which speech needs to be recognized, these models need to be generalizable across many acoustic conditions, speakers, and languages. This contribution examines the importance of different features for speech recognition predictions of plain and Lombard speech for English in comparison to Cantonese in stationary and modulated noise. While Cantonese is a tonal language that encodes information in spectro-temporal features, the Lombard effect is known to be associated with spectral changes in the speech signal. These contrasting properties of tonal languages and the Lombard effect form an interesting basis for the assessment of speech recognition models. Here, an automatic speech recognition-based ASR model using spectral or spectro-temporal features is evaluated with empirical data. The results indicate that spectro-temporal features are crucial in order to predict the speaker-specific speech recognition threshold SRT$_{50}$ in both Cantonese and English as well as to account for the improvement of speech recognition in modulated noise, while effects due to Lombard speech can already be predicted by spectral features.

preprint2022arXiv

Prediction of speech intelligibility with DNN-based performance measures

This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model does not require the clean speech reference nor the word labels during testing as the ASR decoding step, which finds the most likely sequence of words given phoneme posterior probabilities, is omitted. The model is evaluated via the root-mean-squared error between the predicted and observed speech reception thresholds from eight normal-hearing listeners. The recognition task consists of identifying noisy words from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from speech-shaped stationary noise to a single-talker masker. The prediction performance is compared to five established models and an ASR-model using word labels. Two combinations of features and networks were tested. Both include temporal information either at the feature level (amplitude modulation filterbanks and a feed-forward network) or captured by the architecture (mel-spectrograms and a time-delay deep neural network, TDNN). The TDNN model is on par with the DNN while reducing the number of parameters by a factor of 37; this optimization allows parallel streams on dedicated hearing aid hardware as a forward-pass can be computed within the 10ms of each frame. The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.

preprint2021arXiv

DARF: A data-reduced FADE version for simulations of speech recognition thresholds with real hearing aids

Developing and selecting hearing aids is a time consuming process which is simplified by using objective models. Previously, the framework for auditory discrimination experiments (FADE) accurately simulated benefits of hearing aid algorithms with root mean squared prediction errors below 3 dB. One FADE simulation requires several hours of (un)processed signals, which is obstructive when the signals have to be recorded. We propose and evaluate a data-reduced FADE version (DARF) which facilitates simulations with signals that cannot be processed digitally, but that can only be recorded in real-time. DARF simulates one speech recognition threshold (SRT) with about 30 minutes of recorded and processed signals of the (German) matrix sentence test. Benchmark experiments were carried out to compare DARF and standard FADE exhibiting small differences for stationary maskers (1 dB), but larger differences with strongly fluctuating maskers (5 dB). Hearing impairment and hearing aid algorithms seemed to reduce the differences. Hearing aid benefits were simulated in terms of speech recognition with three pairs of real hearing aids in silence ($\geq$8 dB), in stationary and fluctuating maskers in co-located (stat. 2 dB; fluct. 6 dB), and spatially separated speech and noise signals (stat. $\geq$8 dB; fluct. 8 dB). The simulations were plausible in comparison to data from literature, but a comparison with empirical data is still open. DARF facilitates objective SRT simulations with real devices with unknown signal processing in real environments. Yet, a validation of DARF for devices with unknown signal processing is still pending since it was only tested with three similar devices. Nonetheless, DARF could be used for improving as well as for developing or model-based fitting of hearing aids.

preprint2020arXiv

The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research

We present a database of acoustic transfer functions of the Hearpiece, an openly available multi-microphone multi-driver in-the-ear earpiece for hearing device research. The database includes HRTFs for 87 incidence directions as well as responses of the drivers, all measured at the four microphones of the Hearpiece as well as the eardrum in the occluded and open ear. The transfer functions were measured in both ears of 25 human subjects and a KEMAR with anthropometric pinnae for five reinsertions of the device. We describe the measurements of the database and analyse derived acoustic parameters of the device. All regarded transfer functions are subject to differences between subjects as well as variations due to reinsertion into the same ear. Also, the results show that KEMAR measurements represent a median human ear well for all assessed transfer functions. The database is a rich basis for development, evaluation and robustness analysis of multiple hearing device algorithms and applications. The database is openly available at https://doi.org/10.5281/zenodo.3733191.