Source author record

Brian King

Brian King appears in the imported research catalog. Authorship, coauthor, and topic links are available, although the profile has not yet been claimed.

Researcher (unclaimed source record)

Topics: Computation and Language, eess.AS, Sound, physics.app-ph, physics.optics

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint (arXiv, 2022)

Compute Cost Amortized Transformer for Streaming ASR

We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture that achieves efficient neural inference through compute cost amortization. The architecture dynamically creates sparse computation pathways at inference time, so compute resources are used selectively throughout decoding; this yields significant reductions in compute with minimal impact on accuracy. The fully differentiable architecture is trained end-to-end together with a lightweight arbitrator mechanism that operates at the frame level and makes a dynamic decision for each input. A tunable loss function regularizes the overall level of compute against predictive performance. We report empirical results for the compute-amortized Transformer-Transducer (T-T) model on LibriSpeech data. Our best model achieves a 60% compute cost reduction with a relative word error rate (WER) increase of only 3%.
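
The frame-level gating idea can be illustrated with a minimal plain-Python sketch. It assumes a hypothetical `arbitrator(h, layer_index)` score that is hard-thresholded at inference, residual layers that can be skipped through an identity path, and a regularizer that adds `lam` times the mean gate probability to the task loss. None of these names or exact forms come from the paper, whose arbitrator is trained differentiably inside the Transformer-Transducer.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]
Layer = Callable[[Frame], Frame]
# Hypothetical arbitrator: returns a score in [0, 1] for (current frame state, layer index).
Arbitrator = Callable[[Frame, int], float]


def amortized_forward(
    frames: Sequence[Frame],
    layers: Sequence[Layer],
    arbitrator: Arbitrator,
    threshold: float = 0.5,
) -> Tuple[List[Frame], float]:
    """Run each frame through residual layers, skipping those the arbitrator gates off.

    Returns the per-frame outputs and the fraction of layer evaluations executed.
    """
    outputs: List[Frame] = []
    executed = 0
    for frame in frames:
        h = list(frame)
        for i, layer in enumerate(layers):
            if arbitrator(h, i) >= threshold:
                # Residual update h <- h + layer(h); this is where compute is spent.
                h = [a + b for a, b in zip(h, layer(h))]
                executed += 1
            # Otherwise h passes through unchanged (identity path, no layer compute).
        outputs.append(h)
    total = len(frames) * len(layers)
    return outputs, (executed / total if total else 0.0)


def compute_regularized_loss(task_loss: float, gate_probs: Sequence[float], lam: float) -> float:
    """Task loss plus lam times the expected fraction of executed layers (mean gate probability)."""
    expected_compute = sum(gate_probs) / len(gate_probs)
    return task_loss + lam * expected_compute


if __name__ == "__main__":
    add_one: Layer = lambda h: [1.0 for _ in h]
    # Toy arbitrator: only spend compute on frames with non-zero energy.
    gate: Arbitrator = lambda h, i: 1.0 if sum(h) > 0 else 0.0
    outs, frac = amortized_forward([[1.0, 2.0], [0.0, 0.0]], [add_one, add_one], gate)
    print(outs, frac)  # [[3.0, 4.0], [0.0, 0.0]] 0.5
```

Raising `lam` pushes training toward lower gate probabilities and therefore less compute, which is the accuracy/compute trade-off the tunable loss is described as controlling.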

Preprint (arXiv, 2021)

End-to-End Multi-Channel Transformer for Speech Recognition

Transformers are powerful neural architectures that can integrate different modalities through attention mechanisms. In this paper, we apply the Transformer architecture to multi-channel speech recognition, using attention layers to integrate the spectral and spatial information collected from different microphones. Our multi-channel Transformer network consists of three main parts: channel-wise self-attention layers (CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder attention layers (EDA). The CSA and CCA layers encode the contextual relationships within each channel and between channels, respectively, across time. The channel-attended outputs of the CSA and CCA layers are then fed into the EDA layers, which help decode the next token given the preceding ones. Experiments on a far-field in-house dataset show that our method outperforms the baseline single-channel Transformer, as well as super-directive and neural beamformers cascaded with Transformers.
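
The difference between CSA and CCA can be illustrated with plain scaled dot-product attention. This is a minimal sketch, not the authors' implementation: it omits learned projections, multiple heads, positional information, and the EDA decoder layers, and the way the CCA step gathers keys and values (concatenating the other channels' frames along time) is an assumption made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time frames, columns are feature dimensions


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row returns a softmax-weighted mix of value rows."""
    d = len(queries[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def channel_wise_self_attention(channels: List[Matrix]) -> List[Matrix]:
    """CSA-style step: each channel attends over its own frames (within one channel, across time)."""
    return [attention(x, x, x) for x in channels]


def cross_channel_attention(channels: List[Matrix]) -> List[Matrix]:
    """CCA-style step (assumed form): queries come from channel c; keys and values are the
    frames of every other channel, concatenated along the time axis. Needs >= 2 channels."""
    out: List[Matrix] = []
    for c, x in enumerate(channels):
        others = [frame for o, y in enumerate(channels) if o != c for frame in y]
        out.append(attention(x, others, others))
    return out
```

With two toy "microphones", `cross_channel_attention([[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]])` lets each channel's frames draw their output entirely from the other channel, whereas the CSA step mixes information only within a channel.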

Preprint (arXiv, 2020)

Elucidating the Behavior of Nanophotonic Structures Through Explainable Machine Learning Algorithms

A central challenge in the development of nanophotonic structures is identifying the optimal design for a target functionality and understanding the physical mechanisms that enable the optimized device's capabilities. Previously investigated design methods for nanophotonic structures, including both conventional optimization approaches and nascent machine learning (ML) strategies, have made progress, yet they remain 'black boxes' that lack explanations for their predictions. Here we demonstrate that convolutional neural networks (CNNs) trained to predict the electromagnetic response of classes of metal-dielectric-metal metamaterials, including complex freeform designs, can be explained to reveal deeper insights into the underlying physics of nanophotonic structures. Using an explainable AI (XAI) approach, we show that we can identify how important specific spatial regions of a nanophotonic structure are for the presence or absence of an absorption peak. Our results highlight that ML strategies can be used for physics discovery, as well as for design optimization, in optics and photonics.
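
One common way to attribute a prediction to spatial regions is occlusion analysis: overwrite a patch of the input and measure how much the predicted quantity changes. The sketch below uses that technique as a stand-in; the abstract does not say which XAI method was used, and `model` is any hypothetical function mapping a 2D design grid to a scalar, such as a predicted absorption-peak strength.

```python
from typing import Callable, List

Grid = List[List[float]]  # 2D design map, e.g. 1.0 = metal, 0.0 = dielectric (illustrative encoding)


def occlusion_importance(
    design: Grid, model: Callable[[Grid], float], patch: int = 1, fill: float = 0.0
) -> Grid:
    """Score each patch by how much the model output drops when that patch is overwritten with `fill`."""
    base = model(design)
    rows, cols = len(design), len(design[0])
    importance = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            cells = [(r, c) for r in range(r0, min(r0 + patch, rows))
                     for c in range(c0, min(c0 + patch, cols))]
            occluded = [row[:] for row in design]  # copy so the original design is untouched
            for r, c in cells:
                occluded[r][c] = fill
            delta = base - model(occluded)  # positive: region supports the predicted response
            for r, c in cells:
                importance[r][c] = delta
    return importance


if __name__ == "__main__":
    # Toy "model" whose output depends only on the top-left 2x2 region.
    toy = lambda g: g[0][0] + g[0][1] + g[1][0] + g[1][1]
    print(occlusion_importance([[1.0] * 3 for _ in range(3)], toy))
```

For the toy model, only the top-left cells receive non-zero importance, which is the kind of region-level attribution the abstract describes for absorption peaks.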

Preprint (arXiv, 2020)

Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time. Performance improvements over vanilla LSTM architectures have been reported by prepending a stack of frequency-LSTM (FLSTM) layers to the time LSTM. These FLSTM layers can learn a more robust input feature for the time LSTM layers by modeling time-frequency correlations in the acoustic input signals. A drawback of FLSTM-based architectures, however, is that they operate at a predefined, tuned window size and stride, referred to in this paper as a 'view'. We present a simple and efficient modification that combines the outputs of multiple FLSTM stacks with different views into a dimensionality-reduced feature representation. The proposed multi-view FLSTM architecture can model a wider range of time-frequency correlations than a single-view FLSTM model. When trained on 50K hours of English far-field speech data with CTC loss followed by sMBR sequence training, the multi-view FLSTM acoustic model yields relative word error rate (WER) improvements of 3-7% over an optimized single-view FLSTM model across different speaker and acoustic environment scenarios, while retaining a similar computational footprint.
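
The notion of a 'view' can be made concrete with a small sketch. It assumes each view slides a window over the frequency bins of a single frame, replaces each FLSTM stack with a hypothetical per-window `summarize` function, and uses a fixed linear projection for the dimensionality-reduction step; in the actual model the FLSTM stacks are trained networks, and the abstract does not specify how the reduction is implemented.

```python
from typing import Callable, List, Sequence, Tuple

View = Tuple[int, int]  # (window size in frequency bins, stride in bins)


def frequency_windows(bins: Sequence[float], size: int, stride: int) -> List[List[float]]:
    """Slice one frame's frequency bins into windows for a single view."""
    return [list(bins[s:s + size]) for s in range(0, len(bins) - size + 1, stride)]


def multi_view_features(
    bins: Sequence[float], views: Sequence[View], summarize: Callable[[List[float]], float]
) -> List[float]:
    """Run a per-window summarizer (stand-in for an FLSTM stack) under each view and concatenate."""
    features: List[float] = []
    for size, stride in views:
        for window in frequency_windows(bins, size, stride):
            features.append(summarize(window))
    return features


def project(features: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear dimensionality reduction: `weights` has one row per output dimension."""
    for row in weights:
        if len(row) != len(features):
            raise ValueError("each weight row must match the feature length")
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


if __name__ == "__main__":
    mean = lambda w: sum(w) / len(w)
    feats = multi_view_features([1.0, 2.0, 3.0, 4.0], [(2, 2), (4, 4)], mean)
    print(feats)                              # [1.5, 3.5, 2.5]
    print(project(feats, [[1.0, 1.0, 1.0]]))  # [7.5]
```

The narrow view `(2, 2)` captures local spectral structure, while the wide view `(4, 4)` summarizes the whole band; concatenating both before the projection is what lets a multi-view frontend see correlations at more than one frequency scale.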

Preprint (arXiv, 2020)

Multiplexed Supercell Metasurface Design and Optimization with Tandem Residual Networks

Complex nanophotonic structures hold the potential to deliver exquisitely tailored optical responses for a range of applications. Metal-insulator-metal (MIM) metasurfaces arranged in supercells, for instance, can be tailored through geometry and material choice to exhibit a variety of absorption properties and resonant wavelengths. This flexibility, however, comes with a vast space of design possibilities that classical design paradigms struggle to navigate effectively. To overcome this challenge, we demonstrate a tandem residual network approach that efficiently generates multiplexed supercells through inverse design. Using a training dataset of several thousand full-wave electromagnetic simulations in a design space of over three trillion possible designs, the deep learning model can accurately generate a wide range of complex supercell designs given a spectral target. Beyond inverse design, the approach can also be used to explore the structure-property relationships of broadband absorption and emission in such supercell configurations. This study thus demonstrates the feasibility of high-dimensional supercell inverse design with deep neural networks, an approach applicable to complex nanophotonic structures composed of multiple subunit elements that may exhibit coupling.
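
In a tandem network, the inverse (spectrum-to-design) model is trained through a pretrained forward (design-to-spectrum) model, so it is penalized only on the spectrum its proposed design produces. A common motivation is that different designs can yield the same spectrum, which makes training directly on design labels ambiguous. The minimal sketch below shows only that objective; `forward` and `inverse` are hypothetical stand-ins for the trained residual networks, and gradient-based training is omitted.

```python
from typing import Callable, List, Sequence

Spectrum = List[float]
Design = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def tandem_loss(
    target: Spectrum,
    inverse: Callable[[Spectrum], Design],
    forward: Callable[[Design], Spectrum],
) -> float:
    """Tandem objective: map target -> design with the inverse model, then design -> spectrum
    with a frozen, pretrained forward model, and compare the result with the target."""
    design = inverse(target)
    return mse(forward(design), target)


if __name__ == "__main__":
    # Toy forward model where two different designs give the same "spectrum".
    forward = lambda d: [d[0] + d[1], d[0] * d[1]]
    print(tandem_loss([3.0, 2.0], lambda s: [1.0, 2.0], forward))  # 0.0
    print(tandem_loss([3.0, 2.0], lambda s: [2.0, 1.0], forward))  # 0.0
    print(tandem_loss([3.0, 2.0], lambda s: [2.0, 2.0], forward))  # 2.5
```

Both `[1.0, 2.0]` and `[2.0, 1.0]` reach zero loss, so the inverse model is free to settle on either valid design instead of being pulled toward an average of the two.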
