Source author record

Ali Abavisani

Ali Abavisani appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.AS · Sound · Computation and Language · eess.SP · Machine Learning · Quantitative Methods

Catalog footprint

What is connected

2 works

6 topics

4 close collaborators

Preprint · 2022 · arXiv

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that end, we conducted mono-, multi-, and crosslingual experiments on a set of 13 phonetically diverse languages and several in-depth analyses. We found a number of universal phone tokens (IPA symbols) that are well-recognized cross-linguistically. Through a detailed analysis of results, we conclude that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.

Preprint · 2019 · arXiv

The role of cue enhancement and frequency fine-tuning in hearing impaired phone recognition

A speech-based hearing test is designed to identify the error-prone phones of each individual hearing-impaired (HI) ear. Only tokens that are robust at the experimental noise levels were chosen for the test. The noise robustness of a token is measured as its SNR90: the ratio of signal to speech-weighted noise at which a normal-hearing (NH) listener recognizes the token with 90% accuracy on average. Two sets of tokens, T1 and T2, containing the same consonant-vowel (CV) syllables but spoken by different talkers with distinct SNR90 values, were presented with flat gain at each listener's most comfortable level. We studied the effects of frequency fine-tuning of the primary cue by presenting tokens of the same consonant but with different vowels and similar SNR90. Additionally, we investigated the role of changing the intensity of the primary cue in HI phone recognition by presenting tokens from both sets T1 and T2. On average, 92% of tokens improved when we replaced a CV with the same CV spoken by a more robust talker. Using CVs with similar SNR90, tokens improved on average by 75%, 71%, 63%, and 72% when we replaced the vowels /A/, /ae/, /I/, and /E/, respectively. The confusion pattern in each case provides insight into how these changes affect phone recognition in each HI ear. We propose prescribing hearing-aid amplification tailored to individual HI ears, based on the confusion pattern, the response to cue enhancement, and the response to frequency fine-tuning of the cue.
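SNR90 is defined in the abstract above as the SNR at which NH listeners recognize a token with 90% accuracy on average. The abstract does not say how that threshold is read off measured data, so the following is only a minimal sketch under stated assumptions: it assumes mean NH accuracy is available at a handful of tested SNRs and linearly interpolates between the two measured points that bracket 90%. The function name `estimate_snr90` and the example numbers are hypothetical, not the authors' procedure or data.

```python
from typing import Optional, Sequence


def estimate_snr90(
    snrs_db: Sequence[float],
    accuracies: Sequence[float],
    target: float = 0.90,
) -> Optional[float]:
    """Hypothetical helper: estimate the SNR (dB) where mean accuracy reaches `target`.

    Assumes `accuracies[i]` is the mean NH recognition accuracy (0..1) measured
    at `snrs_db[i]`. Uses linear interpolation at the lowest-SNR upward crossing.
    """
    # Sort measurements by SNR so neighbouring points bracket the threshold.
    points = sorted(zip(snrs_db, accuracies))
    if not points:
        return None

    # Target already met at the lowest tested SNR: SNR90 is at or below this value.
    if points[0][1] >= target:
        return float(points[0][0])

    # Find the first pair of adjacent SNRs whose accuracies straddle the target.
    for (s0, a0), (s1, a1) in zip(points, points[1:]):
        if a0 < target <= a1:
            # Linear interpolation between (s0, a0) and (s1, a1).
            return s0 + (target - a0) / (a1 - a0) * (s1 - s0)

    # Accuracy never reaches the target within the tested SNR range.
    return None


if __name__ == "__main__":
    # Made-up example: accuracy crosses 90% between -6 dB and 0 dB.
    print(estimate_snr90([-18, -12, -6, 0], [0.30, 0.60, 0.85, 0.95]))
```

With these illustrative numbers the estimate is about -3 dB. Linear interpolation is the simplest choice; fitting a sigmoid psychometric function to all points would be less sensitive to noise in any single measurement, at the cost of an extra fitting step.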