Source author record

Chander Chandak

Chander Chandak appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.AS Sound Computation and Language Emerging Technologies Hardware Architecture

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Streaming End-to-End Bilingual ASR Systems with Joint Language Identification

Multilingual ASR technology simplifies model training and deployment, but its accuracy is known to depend on the availability of language information at runtime. Since language identity is seldom known beforehand in real-world scenarios, it must be inferred on-the-fly with minimum latency. Furthermore, in voice-activated smart assistant systems, language identity is also required for downstream processing of ASR output. In this paper, we introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification (LID) using the recurrent neural network transducer (RNN-T) architecture. On the input side, embeddings from pretrained acoustic-only LID classifiers are used to guide RNN-T training and inference, while on the output side, language targets are jointly modeled with ASR targets. The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India. Experiments show that for English-Spanish, the bilingual joint ASR-LID architecture matches monolingual ASR and acoustic-only LID accuracies. For the more challenging (owing to within-utterance code switching) case of English-Hindi, English ASR and LID metrics show degradation. Overall, in scenarios where users switch dynamically between languages, the proposed architecture offers a promising simplification over running multiple monolingual ASR models and an LID classifier in parallel.

preprint2020arXiv

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

This paper presents our modeling and architecture approaches for building a highly accurate low-latency language identification system to support multilingual spoken queries for voice assistants. A common approach to solve multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language. Conventionally, LID relies on acoustic only information to detect input language. We propose an approach that learns and combines acoustic level representations with embeddings estimated on ASR hypotheses resulting in up to 50% relative reduction of identification error rate, compared to a model that uses acoustic only features. Furthermore, to reduce the processing cost and latency, we exploit a streaming architecture to identify the spoken language early when the system reaches a predetermined confidence level, alleviating the need to run multiple ASR systems until the end of input query. The combined acoustic and text LID, coupled with our proposed streaming runtime architecture, results in an average of 1500ms early identification for more than 50% of utterances, with almost no degradation in accuracy. We also show improved results by adopting a semi-supervised learning (SSL) technique using the newly proposed model architecture as a teacher model.

preprint2020arXiv

Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM architecture. The streaming capability is achieved by using unidirectional LSTM layers and a causal mean aggregation layer to form the final utterance-level prediction up to the current frame. In order to avoid redundant computation during online streaming inference, we use a caching mechanism for every convolution operation. Experimental results on a device-directed vs. non device-directed task show that the proposed model yields an equal error rate reduction of 41% compared to our previous best model on this task. Furthermore, we show that the proposed model is able to accurately predict earlier in time compared to the attention-based models.

preprint2014arXiv

Complexity Analysis of Reversible Logic Synthesis

Reversible logic circuit is a necessary construction for achieving ultra low power dissipation as well as for prominent post-CMOS computing technologies such as Quantum computing. Consequently automatic synthesis of a Boolean function using elementary reversible logic gates has received significant research attention in recent times, creating the domain of reversible logic synthesis. In this paper, we study the complexity of reversible logic synthesis. The problem is separately studied for bounded-ancilla and ancilla-free optimal synthesis approaches. The computational complexity for both cases are linked to known/presumed hard problems. Finally, experiments are performed with a shortest-path based reversible logic synthesis approach and a (0-1) ILP-based formulation.

preprint2013arXiv

Designing Parity Preserving Reversible Circuits

Making a reversible circuit fault-tolerant is much more difficult than classical circuit and there have been only a few works in the area of parity-preserving reversible logic design. Moreover, all of these designs are ad hoc, based on some pre-defined parity preserving reversible gates as building blocks. In this paper, we for the first time propose a novel and systematic approach towards parity preserving reversible circuits design. We provide some related theoretical results and give two algorithms, one from reversible specification to parity preserving reversible specification and another from irreversible specification to parity preserving reversible specification. We also evaluate the effectiveness of our approach by extensive experimental results.

Chander Chandak

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Streaming End-to-End Bilingual ASR Systems with Joint Language Identification

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

Complexity Analysis of Reversible Logic Synthesis

Designing Parity Preserving Reversible Circuits