Researcher profile

Ahmed Ali

Ahmed Ali contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
9 topics
4 close collaborators

Published work

9 published items

preprint · 2023 · arXiv

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that automatic speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system has many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common method for addressing CS is to train an ASR system with the available transcribed CS speech, along with monolingual data. In this work, we propose a zero-shot learning methodology for CS-ASR by augmenting the monolingual data with artificially generated CS text. We base our approach on random lexical replacements and the Equivalence Constraint (EC), exploiting aligned translation pairs to generate random and grammatically valid CS content. Our empirical results show a 65.5% relative reduction in language model perplexity and a 7.7% relative reduction in ASR WER on two ecologically valid CS test sets. The human evaluation of the text generated using EC suggests that more than 80% is of adequate quality.
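The random lexical replacement idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `rate` parameter, and the toy lexicon are hypothetical, and the Equivalence Constraint check on grammatical validity is not modeled here.

```python
import random


def random_lexical_replacement(tokens, translations, rate=0.3, seed=None):
    """Replace each token that has an aligned translation with probability `rate`.

    `translations` maps a source word to a list of candidate translations.
    Tokens without an entry are always kept unchanged.
    """
    rng = random.Random(seed)
    mixed = []
    for token in tokens:
        candidates = translations.get(token)
        if candidates and rng.random() < rate:
            mixed.append(rng.choice(candidates))
        else:
            mixed.append(token)
    return mixed


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"book": ["kitab"], "house": ["bayt"]}
sentence = "the book is in the house".split()
print(" ".join(random_lexical_replacement(sentence, toy_lexicon, rate=0.5, seed=0)))
```

A real pipeline would draw the candidate pairs from word alignments over parallel text rather than from a hand-written dictionary.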

preprint · 2022 · arXiv

Balanced End-to-End Monolingual pre-training for Low-Resourced Indic Languages Code-Switching Speech Recognition

Success in designing Code-Switching (CS) ASR often depends on the availability of transcribed CS resources. This dependency hampers the development of ASR for low-resourced languages such as Bengali and Hindi. In this paper, we exploit transfer learning to design End-to-End (E2E) CS ASR systems for the two low-resourced language pairs using different monolingual speech data and a small set of noisy CS data. We trained the CS-ASR in two steps: (i) building a robust bilingual ASR system using a convolution-augmented transformer (Conformer) based acoustic model and an n-gram language model, and (ii) fine-tuning the entire E2E ASR with limited noisy CS data. We tested the proposed method using the noisy CS data released for the Hindi-English and Bengali-English pairs in the Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages (MUCS 2021) and achieved 3rd place in the CS track. Unlike the leading two systems, which benefited from crawling YouTube and learning transliteration pairs, our proposed transfer learning approach focused on using only the limited CS data, with no data cleaning or data re-segmentation. Our approach achieved a 14.1% relative gain in word error rate (WER) in Hindi-English and 27.1% in Bengali-English. We provide detailed guidelines on the steps to fine-tune the self-attention based model with limited data for ASR. Moreover, we release the code and recipe used in this paper.

preprint · 2022 · arXiv

ClassSPLOM -- A Scatterplot Matrix to Visualize Separation of Multiclass Multidimensional Data

In multiclass classification of multidimensional data, the user wants to build a model of the classes to predict the label of unseen data. The model is trained on the data and tested on unseen data with known labels to evaluate its quality. The results are visualized as a confusion matrix, which shows how many data labels have been predicted correctly or confused with other classes. The multidimensional nature of the data prevents the direct visualization of the classes, so we design ClassSPLOM to give more perceptual insights about the classification results. It uses the Scatterplot Matrix (SPLOM) metaphor to visualize a Linear Discriminant Analysis projection of the data for each pair of classes and a set of Receiver Operating Characteristic (ROC) curves to evaluate their trustworthiness. We illustrate ClassSPLOM on a use case in Arabic dialect identification.
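The confusion matrix described in this abstract can be sketched in a few lines. This is an illustrative example only, not part of ClassSPLOM: the function name and the dialect labels below are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (classes, matrix), where matrix[i][j] counts items whose true
    class is classes[i] and whose predicted class is classes[j]."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    return classes, matrix


# Hypothetical labels, for illustration only.
truth = ["EGY", "EGY", "GLF", "LEV"]
guess = ["EGY", "GLF", "GLF", "LEV"]
classes, matrix = confusion_matrix(truth, guess)
for name, row in zip(classes, matrix):
    print(name, row)
```

Diagonal entries count correct predictions; off-diagonal entries show which pairs of classes are confused, which is the kind of class pair that a per-pair projection would then inspect more closely.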

preprint · 2022 · arXiv

Creating Speech-to-Speech Corpus from Dubbed Series

Dubbed series have gained considerable popularity in recent years, with strong support from major media service providers. This popularity is fueled by studies showing that dubbed versions of TV shows are more popular than their subtitled equivalents. We propose an unsupervised approach to construct a speech-to-speech corpus, aligned at the short-segment level, to produce a parallel speech corpus in the source and target languages. Our methodology exploits video frames, speech recognition, machine translation, and noisy-frame removal algorithms to match segments in both languages. To verify the performance of the proposed method, we apply it to long and short dubbed clips. Out of 36 hours of TR-AR dubbed series, our pipeline was able to generate 17 hours of paired segments, which is about 47% of the corpus. We applied our method to another language pair, EN-AR, to ensure it is robust and not tuned for a specific language or a specific corpus. Regardless of the language pair, the accuracy of the paired segments was around 70% when evaluated using human subjective evaluation. The corpus will be freely available for the research community.

preprint · 2020 · arXiv

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions. In contrast to modular ASR systems, which contain separately-trained components for acoustic modeling, pronunciation lexicon, and language modeling, the end-to-end paradigm is both conceptually simpler and has the potential benefit of training the entire system on the end task. However, such neural network models are more opaque: it is not clear how to interpret the role of different parts of the network and what information it learns during training. In this paper, we analyze the learned internal representations in an end-to-end ASR model. We evaluate the representation quality in terms of several classification tasks, comparing phonemes and graphemes, as well as different articulatory features. We study two languages (English and Arabic) and three datasets, finding remarkable consistency in how different properties are represented in different layers of the deep neural network.

preprint · 2020 · arXiv

Interpretation of $Y_b (10750)$ as a tetraquark and its production mechanism

Recently, the Belle Collaboration has updated the analysis of the cross sections for the processes $e^+ e^- \to Υ(nS)\, π^+ π^-$ ($n = 1,\, 2,\, 3$) in the $e^+ e^-$ center-of-mass energy range from 10.52 to 11.02 GeV. A new structure, called here $Y_b (10750)$, with the mass $M (Y_b) = (10752.7 \pm 5.9^{+0.7}_{-1.1})$ MeV and the Breit-Wigner width $Γ(Y_b) = (35.5^{+17.6 +3.9}_{-11.3 -3.3})$ MeV was observed \cite{Abdesselam:2019gth}. We interpret $Y_b (10750)$ as a compact $J^{PC} = 1^{--}$ state with a dominant tetraquark component. The mass eigenstate $Y_b (10750)$ is treated as a linear combination of the diquark-antidiquark and $b \bar b$ components due to the mixing via gluonic exchanges, shown recently to arise in the limit of a large number of quark colors. The mixing angle between $Y_b$ and $Υ(5S)$ can be estimated from the electronic width, recently determined to be $Γ_{ee} (Y_b) = (13.7 \pm 1.8)$ eV. The mixing provides a plausible mechanism for $Y_b (10750)$ production in high energy collisions from its $b \bar b$ component, and we work out the Drell-Yan and prompt production cross sections for $p p \to Y_b (10750) \to Υ(nS)\, π^+ π^-$ at the LHC. The resonant part of the dipion invariant mass spectrum in $Y_b (10750) \to Υ(1S)\, π^+ π^-$ and the corresponding angular distribution of the $π^+$-meson in the dipion rest frame are presented as an example.

preprint · 2020 · arXiv

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Alternatively, we can profile entire news outlets and look for those that are likely to publish fake or biased content. This approach makes it possible to detect likely "fake news" the moment it is published, by simply checking the reliability of its source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context. Here, we study the impact of both, namely (i) what was written (i.e., what was published by the target medium, and how it describes itself on Twitter) vs. (ii) who read it (i.e., analyzing the readers of the target medium on Facebook, Twitter, and YouTube). We further study (iii) what was written about the target medium on Wikipedia. The evaluation results show that what was written matters most, and that putting all information sources together yields huge improvements over the current state-of-the-art.

preprint · 2020 · arXiv

Word Error Rate Estimation Without ASR Output: e-WER2

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive. In this paper, we continue our effort in estimating WER using acoustic, lexical and phonotactic features. Our novel approach to estimating the WER uses a multistream end-to-end architecture. We report results for systems using internal speech decoder features (glass-box), systems without speech decoder features (black-box), and systems without access to the ASR system (no-box). The no-box system learns a joint acoustic-lexical representation from phoneme recognition results along with MFCC acoustic features to estimate WER. Considering WER per sentence, our no-box system achieves 0.56 Pearson correlation with the reference evaluation and 0.24 root mean square error (RMSE) across 1,400 sentences. The overall WER estimated by e-WER2 is 30.9% for a three-hour test set, while the WER computed using the reference transcriptions was 28.5%.
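For readers unfamiliar with the metrics named in this abstract, the sketch below shows the standard reference-based WER (word-level edit distance divided by reference length) together with the Pearson correlation and RMSE used to compare estimated and reference values. It is a minimal illustration with hypothetical function names and does not reproduce the e-WER2 estimator itself.

```python
import math


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rmse(xs, ys):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


print(word_error_rate("the cat sat", "the dog sat"))
```

Per-sentence estimated WERs and reference WERs can be passed to `pearson` and `rmse` to obtain the kind of agreement figures quoted in the abstract.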

preprint · 2019 · arXiv

Mass spectrum of the hidden-charm pentaquarks in the compact diquark model

The LHCb collaboration have recently updated their analysis of the resonant $J/ψp$ mass spectrum in the decay $Λ_b^0 \to J/ψp K^-$, making use of their combined Run 1 and Run 2 data. In the updated analysis, three narrow states, $P_c (4312)^+$, $P_c (4440)^+$, and $P_c (4457)^+$, are observed. The spin-parity assignments of these states are not yet known. We interpret these narrow resonances as compact hidden-charm diquark-diquark-antiquark pentaquarks. Using an effective Hamiltonian, based on constituent quarks and diquarks, we calculate the pentaquark mass spectrum for the complete $SU (3)_F$ lowest $S$- and $P$-wave multiplets, taking into account dominant spin-spin, spin-orbit, orbital and tensor interactions. The resulting spectrum is very rich and we work out the quark flavor compositions, masses, and $J^P$ quantum numbers of the pentaquarks. However, heavy quark symmetry restricts the observable states in $Λ_b$-baryon decays, as well as in the decays of the other weakly-decaying $b$-baryons, $Ξ_b$ and $Ω_b$. In addition, some of the pentaquark states are estimated to lie below the $J/ψp$ threshold in $Λ_b$-decays (and corresponding thresholds in $Ξ_b$- and $Ω_b$-decays). They decay via $c \bar c$ annihilation into light hadrons or a dilepton pair, and are expected to be narrower than the $P_c$-states observed. We anticipate their discovery, as well as that of the other pentaquark states present in the spectrum, at the LHC and, in the long-term future, at a Tera-$Z$ factory.