Source author record

Desh Raj

Desh Raj appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: eess.AS, Sound, Computation and Language, hep-ph

Catalog footprint

What is connected

9 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions. In this paper, we investigate it for multi-turn meetings containing multiple speakers using the Streaming Unmixing and Recognition Transducer (SURT) model, and show that naively extending the single-turn model to this harder setting incurs a performance penalty. As a solution, we propose the dual-path (DP) modeling strategy first used for time-domain speech separation. We experiment with LSTM- and Transformer-based DP models, and show that they improve word error rate (WER) performance while yielding faster convergence. We also explore training strategies such as chunk width randomization and curriculum learning for these models, and demonstrate their importance through ablation studies. Finally, we evaluate our models on the LibriCSS meeting data, where they perform competitively with offline separation-based methods.

Preprint, 2021, arXiv

Frustratingly Easy Noise-aware Training of Acoustic Models

Environmental noise and reverberation have a detrimental effect on the performance of automatic speech recognition (ASR) systems. Multi-condition training of neural network-based acoustic models is used to deal with this problem, but it requires many-fold data augmentation, resulting in increased training time. In this paper, we propose utterance-level noise vectors for noise-aware training of acoustic models in hybrid ASR. Our noise vectors are obtained by combining the means of speech frames and silence frames in the utterance, where the speech/silence labels may be obtained from a GMM-HMM model trained for ASR alignments, such that no extra computation is required beyond averaging of feature vectors. We show through experiments on AMI and Aurora-4 that this simple adaptation technique can result in 6-7% relative WER improvement. We implement several embedding-based adaptation baselines proposed in the literature, and show that our method outperforms them on both datasets. Finally, we extend our method to the online ASR setting by using frame-level maximum likelihood for the mean estimation.
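The averaging step described in this abstract can be illustrated with a minimal sketch. It assumes that "combining" the two means is a plain concatenation and that per-frame speech/silence labels are already available. Both are assumptions made for illustration, since the abstract does not spell out either detail, and the names `mean_vector` and `noise_vector` are hypothetical.

```python
def mean_vector(frames, dim):
    """Average a list of feature frames; return zeros if the list is empty."""
    if not frames:
        return [0.0] * dim
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def noise_vector(features, is_speech):
    """Build an utterance-level vector from speech and silence frame means.

    features:  list of T frames, each a list of D floats
    is_speech: list of T booleans (True = speech frame, False = silence)
    Assumption: the two means are concatenated into a 2*D vector.
    """
    dim = len(features[0])
    speech = [f for f, s in zip(features, is_speech) if s]
    silence = [f for f, s in zip(features, is_speech) if not s]
    return mean_vector(speech, dim) + mean_vector(silence, dim)


# Toy utterance: two speech frames followed by two silence frames.
feats = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.2, 0.4]]
labels = [True, True, False, False]
vec = noise_vector(feats, labels)
print(vec)
```

In this sketch the vector costs only one pass over the features, which matches the abstract's point that nothing beyond averaging is needed once alignments exist.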

Preprint, 2021, arXiv

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refine each subsystem so that all five become competitive and complementary. After the DOVER-Lap based system combination, it achieved diarization error rates of 11.58% and 14.09% in Track 1 full and core, and 16.94% and 20.01% in Track 2 full and core, respectively. With these results, we won second place in all the tasks of the challenge.

Preprint, 2020, arXiv

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. Speech material is the same as the previous CHiME-5 recordings except for accurate array synchronization. The material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech. This paper provides a baseline description of the CHiME-6 challenge for both segmented multispeaker speech recognition (Track 1) and unsegmented multispeaker speech recognition (Track 2). Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.

Preprint, 2020, arXiv

New Hybrid Textures for Neutrino Mass Matrices

We perform a systematic investigation of the texture structures of the Majorana neutrino mass matrix $M_ν$ having two texture zeros and an equality between two nonzero matrix elements, in the light of recent neutrino oscillation data. Among forty-two possible textures, it is found that only eight textures are compatible with the current experimental data at 3$σ$ C.L. Out of these phenomenologically viable textures, six follow normal mass ordering while the remaining two satisfy the inverted mass ordering of the neutrino mass spectrum. In the numerical analysis, we carry out a scan over the possible space of all viable patterns. We present the implications of each allowed pattern for the three mixing angles (solar, reactor and atmospheric), leptonic CP-violation, neutrino mass scale and the neutrinoless double beta decay, indicating strong correlations between oscillation parameters. The symmetry realization of one of the viable textures is also presented.

Preprint, 2020, arXiv

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge

This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. We explore multi-array processing techniques at each stage of the pipeline, such as multi-array guided source separation (GSS) for enhancement and acoustic model training data, posterior fusion for speech activity detection, PLDA score fusion for diarization, and lattice combination for automatic speech recognition (ASR). We also report results with different acoustic model architectures, and integrate other techniques such as online multi-channel weighted prediction error (WPE) dereverberation and variational Bayes-hidden Markov model (VB-HMM) based overlap assignment to deal with reverberation and overlapping speakers, respectively. As a result of these efforts, our ASR systems achieve word error rates of 40.5% and 67.5% on tracks 1 and 2, respectively, on the evaluation set. This is an improvement of 10.8% and 10.4% absolute over the challenge baselines for the respective tracks.
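The abstract quotes absolute WER improvements, which are easy to confuse with relative ones. The short sketch below derives the implied baseline WERs and the corresponding relative reductions from the four numbers in the abstract. The helper names are hypothetical, and the derived figures are arithmetic consequences of the quoted numbers rather than values stated in the abstract.

```python
def baseline_from_absolute_gain(system_wer, absolute_gain):
    """Baseline WER implied by a system WER and an absolute improvement."""
    return system_wer + absolute_gain


def relative_reduction(baseline_wer, system_wer):
    """Relative WER reduction, in percent of the baseline WER."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# (system WER %, absolute improvement %) as quoted in the abstract.
results = {"track 1": (40.5, 10.8), "track 2": (67.5, 10.4)}

summary = {}
for track, (wer, gain) in results.items():
    base = baseline_from_absolute_gain(wer, gain)
    summary[track] = (base, relative_reduction(base, wer))
    print(f"{track}: implied baseline {base:.1f}%, "
          f"relative reduction {summary[track][1]:.1f}%")
```

Under this arithmetic the implied baselines are roughly 51.3% and 77.9%, so the same absolute gain of about 10 points corresponds to a noticeably larger relative reduction on track 1 than on track 2.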

Preprint, 2019, arXiv

Probing the Information Encoded in X-vectors

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by x-vector embeddings. We probe these embeddings for information related to the speaker, channel, transcription (sentence, words, phones), and meta information about the utterance (duration and augmentation type), and compare these with the information encoded by i-vectors across a varying number of dimensions. We also study the effect of data augmentation during extractor training on the information captured by x-vectors. Experiments on the RedDots data set show that x-vectors capture spoken content and channel-related information, while performing well on speaker verification tasks.

Preprint, 2016, arXiv

Deviations in Tribimaximal Mixing From Sterile Neutrino Sector

We explore the possibility of generating a non-zero $U_{e3}$ element of the neutrino mixing matrix from tribimaximal neutrino mixing by adding a light sterile neutrino to the active neutrinos. Small active-sterile mixing can provide the necessary deviation from tribimaximal mixing to generate a non-zero $θ_{13}$ and atmospheric mixing $θ_{23}$ different from maximal. Assuming no CP-violation, we study the phenomenological impact of sterile neutrinos in the context of current neutrino oscillation data. The tribimaximal pattern is broken in such a manner that the second column of tribimaximal mixing remains intact in the neutrino mixing matrix.
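For orientation, the tribimaximal pattern that the abstract starts from can be written out explicitly. The form below uses one common sign convention and is supplied here as background, not quoted from the paper; other conventions differ by signs of rows or columns.

$$U_{\mathrm{TBM}} = \begin{pmatrix} \sqrt{2/3} & 1/\sqrt{3} & 0 \\ -1/\sqrt{6} & 1/\sqrt{3} & -1/\sqrt{2} \\ -1/\sqrt{6} & 1/\sqrt{3} & 1/\sqrt{2} \end{pmatrix}$$

In this form $U_{e3} = 0$, so $θ_{13} = 0$, and $\sin^2 θ_{23} = 1/2$ (maximal atmospheric mixing), which is why a deviation is needed to reproduce a non-zero $θ_{13}$. The second column, $(1, 1, 1)^T/\sqrt{3}$, is the one the abstract says remains intact.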

Preprint, 2015, arXiv

Neutrino Mass Matrices with Two Vanishing Elements/Cofactors

We study the phenomenological implications of the recent neutrino data for class B of two texture zeros and two vanishing cofactors for Majorana neutrinos in the flavor basis. We find that classes $B_1$ ($B_2$) of two texture zeros and classes $B_5$ ($B_6$) of two vanishing cofactors have similar predictions for neutrino oscillation parameters for the same mass hierarchy. Similar predictions for classes $B_3$ ($B_4$) of two texture zeros and classes $B_3$ ($B_4$) of two vanishing cofactors are expected. However, a preference for a shift in the quadrant of the Dirac-type CP violating phase ($δ$) in contrast to the earlier analysis has been predicted for a relatively large value of the reactor neutrino mixing angle ($θ_{13}$) for class B of two texture zeros and two vanishing cofactors for an inverted mass spectrum. No such shift in the quadrant of $δ$ has been found for the normal mass spectrum.
