Source author record

Jon Barker

Jon Barker appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.AS, Sound, Artificial Intelligence, astro-ph.GA, astro-ph.HE, astro-ph.SR, Computation and Language, eess.SP, Quantitative Methods

Catalog footprint

What is connected

6 works

9 topics

4 close collaborators

preprint, 2023, arXiv

Quantifying the dust in SN 2012aw and iPTF14hls with ORBYTS

Core-collapse supernovae (CCSNe) are potentially capable of producing large quantities of dust, with strong evidence that ejecta dust masses can grow significantly over extended periods of time. Red-blue asymmetries in the broad emission lines of CCSNe can be modelled using the Monte Carlo radiative transfer code DAMOCLES, to determine ejecta dust masses. To facilitate easier use of DAMOCLES, we present a Tkinter graphical user interface (GUI) running DAMOCLES. The GUI was tested by high school students as part of the Original Research By Young Twinkle Students (ORBYTS) programme, who used it to measure the dust masses formed at two epochs in two Type IIP CCSNe: SN 2012aw and iPTF14hls, demonstrating that a wide range of people can contribute significantly to scientific advancement. Bayesian methods were used to quantify uncertainties on our model parameters. From the presence of a red scattering wing in the day 1863 H$\alpha$ profile of SN 2012aw, we were able to constrain the dust composition to large (radius $>0.1\,\mu$m) silicate grains, with a dust mass of $6.0^{+21.9}_{-3.6}\times10^{-4}\,M_\odot$. From the day 1158 H$\alpha$ profile of SN 2012aw, we found a dust mass of $3.0^{+14}_{-2.5}\times10^{-4}\,M_\odot$. For iPTF14hls, we found a day 1170 dust mass of $8.1^{+81}_{-7.6}\times10^{-5}\,M_\odot$ for a dust composition consisting of 50% amorphous carbon and 50% astronomical silicate. At 1000 days post explosion, SN 2012aw and iPTF14hls have formed less dust than SN 1987A, suggesting that SN 1987A could form larger dust masses than other Type IIPs.
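The dust masses in this abstract are quoted with asymmetric uncertainties and different powers of ten, which makes them hard to compare at a glance. The sketch below is only a reading aid: it expands each quoted value into a lower and upper bound in solar masses. The class name `AsymmetricEstimate` and the dictionary `DUST_MASSES` are hypothetical helpers introduced for illustration; the numbers are copied from the abstract, and nothing here reproduces DAMOCLES or the Bayesian analysis.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AsymmetricEstimate:
    """Central value with asymmetric uncertainties sharing one scale factor.

    Hypothetical helper: (central +plus -minus) x scale, in solar masses.
    """
    central: float
    plus: float
    minus: float
    scale: float

    def interval(self) -> Tuple[float, float]:
        """Return (lower, upper) bounds in solar masses."""
        lower = (self.central - self.minus) * self.scale
        upper = (self.central + self.plus) * self.scale
        return lower, upper


# Values transcribed from the abstract above.
DUST_MASSES = {
    "SN 2012aw, day 1863": AsymmetricEstimate(6.0, 21.9, 3.6, 1e-4),
    "SN 2012aw, day 1158": AsymmetricEstimate(3.0, 14.0, 2.5, 1e-4),
    "iPTF14hls, day 1170": AsymmetricEstimate(8.1, 81.0, 7.6, 1e-5),
}

if __name__ == "__main__":
    for label, estimate in DUST_MASSES.items():
        lower, upper = estimate.interval()
        print(f"{label}: {lower:.1e} to {upper:.1e} solar masses")
```

Written this way, the three intervals overlap heavily, which is consistent with the wide uncertainties the abstract reports.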

preprint, 2022, arXiv

Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition

End-to-end models have achieved significant improvements in automatic speech recognition. One common method to improve the performance of these models is expanding the data space through data augmentation. Meanwhile, front-ends inspired by human hearing have also demonstrated improvements for automatic speech recognisers. In this work, a well-verified auditory-based model, which can simulate various hearing abilities, is investigated for the purpose of data augmentation for end-to-end speech recognition. By introducing the auditory model into the data augmentation process, end-to-end systems are encouraged to ignore variation in the signal that cannot be heard and thereby focus on robust features for speech recognition. Two mechanisms in the auditory model, spectral smearing and loudness recruitment, are studied on the LibriSpeech dataset with a transformer-based end-to-end model. The results show that the proposed augmentation methods can bring a statistically significant improvement over the performance of the state-of-the-art SpecAugment.

preprint, 2022, arXiv

Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

Accurate objective speech intelligibility prediction algorithms are of great interest for many applications, such as speech enhancement for hearing aids. Most algorithms measure the signal-to-noise ratios or correlations between the acoustic features of clean reference signals and degraded signals. However, these hand-picked acoustic features are usually not explicitly correlated with recognition. Meanwhile, deep neural network (DNN) based automatic speech recognisers (ASR) are approaching human performance in some speech recognition tasks. This work leverages the hidden representations from DNN-based ASR as features for speech intelligibility prediction in hearing-impaired listeners. Experiments based on a hearing aid intelligibility database show that the proposed method makes better predictions than a widely used binaural measure based on short-time objective intelligibility (STOI).

preprint, 2022, arXiv

Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction

Non-intrusive intelligibility prediction is important for its application in realistic scenarios, where a clean reference signal is difficult to access. The construction of many non-intrusive predictors requires either ground truth intelligibility labels or clean reference signals for supervised learning. In this work, we leverage an unsupervised uncertainty estimation method for predicting speech intelligibility, which does not require intelligibility labels or reference signals to train the predictor. Our experiments demonstrate that the uncertainty from state-of-the-art end-to-end automatic speech recognition (ASR) models is highly correlated with speech intelligibility. The proposed method is evaluated on two databases, and the results show that the unsupervised uncertainty measures of ASR models are more correlated with speech intelligibility from listening results than the predictions made by widely used intrusive methods.
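The abstract does not say which uncertainty measure is used, so the following is only a minimal sketch of one common choice: the mean Shannon entropy of a recogniser's per-token posterior distributions. The function names `entropy` and `mean_entropy` and the toy posteriors are assumptions made for illustration, not the authors' method. The idea is that a recogniser that is unsure about what it heard yields flatter posteriors and therefore higher entropy, which could serve as a label-free proxy for low intelligibility.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Average entropy over a sequence of per-token posterior distributions.

    Illustrative uncertainty score: higher means the (hypothetical)
    recogniser was less certain about its output.
    """
    if not posteriors:
        raise ValueError("posteriors must contain at least one distribution")
    return sum(entropy(p) for p in posteriors) / len(posteriors)


# Toy posteriors over a 4-symbol vocabulary (made-up numbers).
confident = [[0.97, 0.01, 0.01, 0.01], [0.94, 0.02, 0.02, 0.02]]
uncertain = [[0.40, 0.30, 0.20, 0.10], [0.25, 0.25, 0.25, 0.25]]

if __name__ == "__main__":
    print(f"confident utterance: {mean_entropy(confident):.3f} nats")
    print(f"uncertain utterance: {mean_entropy(uncertain):.3f} nats")
```

No training labels or clean reference signals enter this computation, which is the property the abstract highlights for non-intrusive prediction.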

preprint, 2021, arXiv

The Use of Voice Source Features for Sung Speech Recognition

In this paper, we ask whether vocal source features (pitch, shimmer, jitter, etc.) can improve the performance of automatic sung speech recognition, arguing that conclusions previously drawn from spoken speech studies may not be valid in the sung speech domain. We first use a parallel singing/speaking corpus (NUS-48E) to illustrate differences in sung vs spoken voicing characteristics including pitch range, syllable duration, vibrato, jitter and shimmer. We then use this analysis to inform speech recognition experiments on the sung speech DSing corpus, using a state-of-the-art acoustic model and augmenting conventional features with various voice source parameters. Experiments are run with three standard (increasingly large) training sets: DSing1 (15.1 hours), DSing3 (44.7 hours) and DSing30 (149.1 hours). Pitch combined with degree of voicing produces a significant decrease in WER from 38.1% to 36.7% when training with DSing1; however, the smaller decreases in WER observed when training with the larger, more varied DSing3 and DSing30 sets were not statistically significant. Voicing quality characteristics did not improve recognition performance, although analysis suggests that they do contribute to an improved discrimination between voiced/unvoiced phoneme pairs.
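For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recogniser output, divided by the number of reference words. The sketch below implements that standard definition and also computes the relative reduction implied by the DSing1 figures quoted above. The function names are hypothetical, and the example sentences are made up; this is not the scoring tool used in the paper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
    # DSing1 figures from the abstract: 38.1% WER down to 36.7% WER.
    print(f"{relative_reduction(38.1, 36.7):.1%} relative reduction")
```

The absolute drop of 1.4 percentage points on DSing1 corresponds to a relative reduction of a little under 4%.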

preprint, 2020, arXiv

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. The speech material is the same as the previous CHiME-5 recordings, except that the arrays are now accurately synchronized. The material was elicited using a dinner party scenario, with efforts taken to capture data that is representative of natural conversational speech. This paper provides a baseline description of the CHiME-6 challenge for both segmented multispeaker speech recognition (Track 1) and unsegmented multispeaker speech recognition (Track 2). Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.