Source author record

Muhammad Umar Farooq

Muhammad Umar Farooq appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language eess.AS Computer Vision Information Theory math.IT

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

CTC-DID: CTC-Based Arabic dialect identification for streaming applications

This paper proposes a Dialect Identification (DID) approach inspired by the Connectionist Temporal Classification (CTC) loss function as used in Automatic Speech Recognition (ASR). CTC-DID frames the dialect identification task as a limited-vocabulary ASR system, where dialect tags are treated as a sequence of labels for a given utterance. For training, the repetition of dialect tags in transcriptions is estimated either using a proposed Language-Agnostic Heuristic (LAH) approach or a pre-trained ASR model. The method is evaluated on the low-resource Arabic Dialect Identification (ADI) task, with experimental results demonstrating that an SSL-based CTC-DID model, trained on a limited dataset, outperforms both fine-tuned Whisper and ECAPA-TDNN models. Notably, CTC-DID also surpasses these models in zero-shot evaluation on the Casablanca dataset. The proposed approach is found to be more robust to shorter utterances and is shown to be easily adaptable for streaming, real-time applications, with minimal performance degradation.

preprint2022arXiv

Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Multilingual automatic speech recognition (ASR) systems mostly benefit low resource languages but suffer degradation in performance across several languages relative to their monolingual counterparts. Limited studies have focused on understanding the languages behaviour in the multilingual speech recognition setups. In this paper, a novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities. This technique measures the similarities between posterior distributions from various monolingual acoustic models against a target speech signal. Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form. The analysis observes that the languages closeness can not be truly estimated by the volume of overlapping phonemes set. Entropy analysis of the proposed mapping networks exhibits that a language with lesser overlap can be more amenable to cross-lingual transfer, and hence more beneficial in the multilingual setup. Finally, the proposed posterior transformation approach is leveraged to fuse monolingual models for a target language. A relative improvement of ~8% over monolingual counterpart is achieved.

preprint2022arXiv

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Multilingual speech recognition has drawn significant attention as an effective way to compensate data scarcity for low-resource languages. End-to-end (e2e) modelling is preferred over conventional hybrid systems, mainly because of no lexicon requirement. However, hybrid DNN-HMMs still outperform e2e models in limited data scenarios. Furthermore, the problem of manual lexicon creation has been alleviated by publicly available trained models of grapheme-to-phoneme (G2P) and text to IPA transliteration for a lot of languages. In this paper, a novel approach of hybrid DNN-HMM acoustic models fusion is proposed in a multilingual setup for the low-resource languages. Posterior distributions from different monolingual acoustic models, against a target language speech signal, are fused together. A separate regression neural network is trained for each source-target language pair to transform posteriors from source acoustic model to the target language. These networks require very limited data as compared to the ASR training. Posterior fusion yields a relative gain of 14.65% and 6.5% when compared with multilingual and monolingual baselines respectively. Cross-lingual model fusion shows that the comparable results can be achieved without using posteriors from the language dependent ASR.

preprint2020arXiv

Generalized LDPC Codes with Convolutional Code Constraints

Braided convolutional codes (BCCs) are a class of spatially coupled turbo-like codes that can be described by a $(2,3)$-regular compact graph. In this paper, we introduce a family of $(d_v,d_c)$-regular GLDPC codes with convolutional code constraints (CC-GLDPC codes), which form an extension of classical BCCs to arbitrary regular graphs. In order to characterize the performance in the waterfall and error floor regions, we perform an analysis of the density evolution thresholds as well as the finite-length ensemble weight enumerators and minimum distances of the ensembles. In particular, we consider various ensembles of overall rate $R=1/3$ and $R=1/2$ and study the trade-off between variable node degree and strength of the component codes. We also compare the results to corresponding classical LDPC codes with equal degrees and rates. It is observed that for the considered LDPC codes with variable node degree $d_v>2$, we can find a CC-GLDPC code with smaller $d_v$ that offers similar or better performance in terms of BP and MAP thresholds at the expense of a negligible loss in the minimum distance.

preprint2020arXiv

Motion Corrected Multishot MRI Reconstruction Using Generative Networks with Sensitivity Encoding

Multishot Magnetic Resonance Imaging (MRI) is a promising imaging modality that can produce a high-resolution image with relatively less data acquisition time. The downside of multishot MRI is that it is very sensitive to subject motion and even small amounts of motion during the scan can produce artifacts in the final MR image that may cause misdiagnosis. Numerous efforts have been made to address this issue; however, all of these proposals are limited in terms of how much motion they can correct and the required computational time. In this paper, we propose a novel generative networks based conjugate gradient SENSE (CG-SENSE) reconstruction framework for motion correction in multishot MRI. The proposed framework first employs CG-SENSE reconstruction to produce the motion-corrupted image and then a generative adversarial network (GAN) is used to correct the motion artifacts. The proposed method has been rigorously evaluated on synthetically corrupted data on varying degrees of motion, numbers of shots, and encoding trajectories. Our analyses (both quantitative as well as qualitative/visual analysis) establishes that the proposed method significantly robust and outperforms state-of-the-art motion correction techniques and also reduces severalfold of computational times.

Muhammad Umar Farooq

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

CTC-DID: CTC-Based Arabic dialect identification for streaming applications

Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Generalized LDPC Codes with Convolutional Code Constraints

Motion Corrected Multishot MRI Reconstruction Using Generative Networks with Sensitivity Encoding