Researcher profile

Patrick A. Naylor

Patrick A. Naylor contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2023arXiv

Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification

This paper studies modulation spectrum features ($Φ$) and mel-frequency cepstral coefficients ($Ψ$) in joint speaker diarization and identification (JSID). JSID is important as speaker diarization on its own to distinguish speakers is insufficient for many applications, it is often necessary to identify speakers as well. Machine learning models are set up using convolutional neural networks (CNNs) on $Φ$ and recurrent neural networks $\unicode{x2013}$ long short-term memory (LSTMs) on $Ψ$, then concatenating into fully connected layers. Experiment 1 shows models on both $Φ$ and $Ψ$ have better diarization error rates (DERs) than models on either alone; a CNN on $Φ$ has DER 29.09\%, compared to 27.78\% for a LSTM on $Ψ$ and 19.44\% for a model on both. Experiment 1 also investigates aleatoric uncertainties and shows the model on both $Φ$ and $Ψ$ has mean entropy 0.927~bits (out of 4~bits) for correct predictions compared to 1.896~bits for incorrect predictions which, along with entropy histogram shapes, shows the model helpfully indicates where it is uncertain. Experiment 2 investigates epistemic uncertainties as well as aleatoric using Monte Carlo dropout (MCD). It compares models on both $Φ$ and $Ψ$ with models trained on x-vectors ($X$), before applying Kalman filter smoothing on epistemic uncertainties for resegmentation and model ensembles. While the two models on $X$ (DERs 10.23\% and 9.74\%) outperform those on $Φ$ and $Ψ$ (DER 17.85\%) after their individual Kalman filter smoothing, combining them using a Kalman filter smoothing method improves the DER to 9.29\%. Aleatoric uncertainties are higher for incorrect predictions. Both Experiments show models on $Φ$ do not distinguish overlapping speakers as well as anticipated. However, Experiment 2 shows model ensembles do better with overlapping speakers than individual models do.

preprint2022arXiv

Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

We present a novel multi-channel front-end based on channel shortening with theWeighted Prediction Error (WPE) method followed by a fixed MVDR beamformer used in combination with a recently proposed self-attention-based channel combination (SACC) scheme, for tackling the distant ASR problem. We show that the proposed system used as part of a ContextNet based end-to-end (E2E) ASR system outperforms leading ASR systems as demonstrated by a 21.6% reduction in relative WER on a multi-channel LibriSpeech playback dataset. We also show how dereverberation prior to beamforming is beneficial and compare the WPE method with a modified neural channel shortening approach. An analysis of the non-intrusive estimate of the signal C50 confirms that the 8 channel WPE method provides significant dereverberation of the signals (13.6 dB improvement). We also show how the weights of the SACC system allow the extraction of accurate spatial information which can be beneficial for other speech processing applications like diarization.

preprint2020arXiv

Time-Frequency Analysis and Parameterisation of Knee Sounds for Non-invasive Detection of Osteoarthritis

Objective: In this work the potential of non-invasive detection of knee osteoarthritis is investigated using the sounds generated by the knee joint during walking. Methods: The information contained in the time-frequency domain of these signals and its compressed representations is exploited and their discriminant properties are studied. Their efficacy for the task of normal vs abnormal signal classification is evaluated using a comprehensive experimental framework. Based on this, the impact of the feature extraction parameters on the classification performance is investigated using Classification and Regression Trees (CART), Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classifiers. Results: It is shown that classification is successful with an area under the Receiver Operating Characteristic (ROC) curve of 0.92. Conclusion: The analysis indicates improvements in classification performance when using non-uniform frequency scaling and identifies specific frequency bands that contain discriminative features. Significance: Contrary to other studies that focus on sit-to-stand movements and knee flexion/extension, this study used knee sounds obtained during walking. The analysis of such signals leads to non-invasive detection of knee osteoarthritis with high accuracy and could potentially extend the range of available tools for the assessment of the disease as a more practical and cost effective method without requiring clinical setups.