Source author record

Nathan Howard

Nathan Howard appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

eess.AS Sound eess.SP physics.plasm-ph

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

preprint, 2022, arXiv

A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

Acoustic Echo Cancellation (AEC) is essential for accurate recognition of queries spoken to a smart speaker that is playing out audio. Previous work has shown that a neural AEC model operating on log-mel spectral features (denoted "logmel" hereafter) can greatly improve Automatic Speech Recognition (ASR) accuracy when optimized with an auxiliary loss utilizing a pre-trained ASR model encoder. In this paper, we develop a conformer-based waveform-domain neural AEC model inspired by the "TasNet" architecture. The model is trained by jointly optimizing Negative Scale-Invariant SNR (SISNR) and ASR losses on a large speech dataset. On a realistic rerecorded test set, we find that cascading a linear adaptive AEC and a waveform-domain neural AEC is very effective, giving 56-59% word error rate (WER) reduction over the linear AEC alone. On this test set, the 1.6M parameter waveform-domain neural AEC also improves over a larger 6.5M parameter logmel-domain neural AEC model by 20-29% in easy to moderate conditions. By operating on smaller frames, the waveform neural model is able to perform better at smaller sizes and is better suited for applications where memory is limited.
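The negative SISNR term named in the abstract can be illustrated with a minimal sketch. It uses the commonly cited definition of scale-invariant SNR (zero-mean signals, projection of the estimate onto the target) and is not taken from the paper itself. The function name `negative_si_snr` and the `eps` stabilizer are assumptions made for illustration, and the ASR loss that the paper optimizes jointly is not modeled here.

```python
import math


def negative_si_snr(estimate, target, eps=1e-8):
    """Return the negative scale-invariant SNR in dB (lower is better).

    Illustrative only: this follows the commonly used SI-SNR definition,
    not necessarily the exact variant used in the paper.
    """
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("estimate and target must be non-empty and equal length")

    # Remove the mean so that a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target: the projection is the "signal" part.
    dot_et = sum(a * b for a, b in zip(e, t))
    energy_t = sum(b * b for b in t)
    scale = dot_et / (energy_t + eps)

    signal_energy = scale * scale * energy_t
    # Everything in the estimate that is not explained by the target is "noise".
    noise_energy = sum((a - scale * b) ** 2 for a, b in zip(e, t))

    return -10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    target = [math.sin(0.1 * i) for i in range(200)]
    noise = [math.cos(1.7 * i) for i in range(200)]
    estimate = [s + 0.1 * v for s, v in zip(target, noise)]
    print(negative_si_snr(estimate, target))
```

Because the estimate is projected onto the target before the ratio is formed, rescaling the estimate by a constant leaves the loss essentially unchanged, which is the property that makes the measure scale-invariant.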

preprint, 2022, arXiv

Mask scalar prediction for improving robust automatic speech recognition

Using neural network based acoustic frontends for improving robustness of streaming automatic speech recognition (ASR) systems is challenging because of the causality constraints and the resulting distortion that the frontend processing introduces in speech. Time-frequency masking based approaches have been shown to work well, but they need additional hyper-parameters to scale the mask to limit speech distortion. Such mask scalars are typically hand-tuned and chosen conservatively. In this work, we present a technique to predict mask scalars using an ASR-based loss in an end-to-end fashion, with minimal increase in the overall model size and complexity. We evaluate the approach on two robust ASR tasks: multichannel enhancement in the presence of speech and non-speech noise, and acoustic echo cancellation (AEC). Results show that the presented algorithm consistently improves word error rate (WER) without the need for any additional tuning over strong baselines that use hand-tuned hyper-parameters: up to 16% for multichannel enhancement in noisy conditions, and up to 7% for AEC.
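To make the idea of a mask scalar concrete, here is a small sketch of one way a scalar can soften a time-frequency mask. The abstract does not state the exact form of the scaling, so raising the mask to a power is an assumption chosen for illustration, as are the function name `apply_scaled_mask` and its arguments. In the paper the scalar is predicted by the model using an ASR-based loss; here it is simply passed in.

```python
def apply_scaled_mask(noisy_magnitudes, mask, mask_scalar):
    """Apply a softened mask to per-bin magnitudes.

    Assumed form (hypothetical, for illustration): enhanced = mask ** mask_scalar * noisy.
    With mask values in [0, 1]:
      mask_scalar = 0 leaves the input untouched (no suppression, no distortion),
      mask_scalar = 1 applies the mask at full strength.
    """
    if len(noisy_magnitudes) != len(mask):
        raise ValueError("noisy_magnitudes and mask must have the same length")
    if not 0.0 <= mask_scalar <= 1.0:
        raise ValueError("mask_scalar is expected to lie in [0, 1]")
    if any(m < 0.0 or m > 1.0 for m in mask):
        raise ValueError("mask values are expected to lie in [0, 1]")

    return [(m ** mask_scalar) * x for x, m in zip(noisy_magnitudes, mask)]


if __name__ == "__main__":
    noisy = [1.0, 2.0, 4.0]
    mask = [0.25, 1.0, 0.0]
    for scalar in (0.0, 0.5, 1.0):
        print(scalar, apply_scaled_mask(noisy, mask, scalar))
```

A hand-tuned system would fix `mask_scalar` conservatively for all inputs; the approach described in the abstract instead predicts it, which removes that tuning step.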

preprint, 2020, arXiv

The Dependence of the Impurity Transport on the Dominant Turbulent Regime in ELM-y H-mode Discharges

Laser blow-off injections of aluminum and tungsten have been performed on the DIII-D tokamak to investigate the variation of impurity transport in a set of dedicated ion and electron heating scans with a fixed value of the external torque. The particle transport is quantified via a Bayesian inference method which, constrained by a combination of charge exchange recombination spectroscopy, soft X-ray measurements, and VUV spectroscopy, provides a detailed uncertainty quantification of the transport coefficients. Contrasting discharge phases with dominant electron heating and dominant ion heating reveals a factor of 30 increase in midradius impurity diffusion and a 3-fold drop in the impurity confinement time when additional electron heating is applied. Further, the calculated stationary aluminum density profiles reverse from peaked in the electron-heated case to hollow in the ion-heated case, following a similar trend to the electron and carbon density profiles. Comparable values of core diffusion have been observed for W and Al ions, while differences in the propagation dynamics of these impurities are attributed to pedestal and edge transport. Modeling of the core transport with the nonlinear gyrokinetic code CGYRO [J. Candy and E. Belli, J. Comput. Phys. 324, 73 (2016)] significantly underpredicts the magnitude of the variation in Al transport. The experiment demonstrates a 3-times steeper increase of impurity diffusion with additional electron heat flux, and 10-times lower diffusion in the ion-heated case, than predicted by the modeling. However, the CGYRO model correctly predicts that the Al diffusion dramatically increases below the linear threshold for the transition from the ion temperature gradient (ITG) to trapped electron mode (TEM).