Source author record

Jan "Honza" Černocký

Jan "Honza" Černocký appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: eess.AS, Machine Learning, Sound, Artificial Intelligence

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Speaker adaptation for Wav2vec2 based dysarthric ASR

Dysarthric speech recognition poses major challenges due to the lack of training data and a heavy mismatch in speaker characteristics. Recent ASR systems have benefited from readily available pretrained models such as wav2vec2 to improve recognition performance. Speaker adaptation using fMLLR and xvectors has provided major gains for dysarthric speech with very little adaptation data. However, the integration of fMLLR features or xvectors during wav2vec2 fine-tuning is yet to be explored. In this work, we propose a simple adaptation network for fine-tuning wav2vec2 using fMLLR features. The adaptation network is also flexible enough to handle other speaker-adaptive features such as xvectors. Experimental analysis shows steady improvements with our proposed approach across all impairment severity levels, and it attains 57.72% WER for high severity on the UASpeech dataset. We also performed experiments on a German dataset to substantiate the consistency of our proposed approach across diverse domains.
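The result above is reported as word error rate (WER). As a reminder of what that figure measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.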

Preprint, 2022, arXiv

Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries

In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation. More specifically, we use the full VoxCeleb recordings and the names of the celebrities appearing in each video, without knowledge of the time intervals in which the celebrities appear. We show that by combining a baseline speaker diarization algorithm that requires no training or parameter tuning, a modified loss with aggregation over segments, and a two-stage training approach, we are able to train a competitive ResNet-based embedding extractor. Finally, we experiment with two different aggregation functions and analyze their behaviour in terms of their gradients.
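The abstract does not name the two aggregation functions, so the sketch below uses two common choices from multiple-instance learning, max and log-mean-exp, purely as illustrative stand-ins. It shows why looking at gradients is informative: given per-segment scores for the labeled speaker, max sends the whole gradient to a single segment, while log-mean-exp spreads it over segments with softmax weights. All function names here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_aggregate(scores):
    """Return max(scores) and its gradient with respect to each score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    # Only the winning segment receives gradient.
    grad = [1.0 if i == best else 0.0 for i in range(len(scores))]
    return scores[best], grad


def log_mean_exp_aggregate(scores):
    """Return log(mean(exp(scores))) and its gradient w.r.t. each score."""
    m = max(scores)
    value = m + math.log(sum(math.exp(s - m) for s in scores) / len(scores))
    # d/ds_i log(sum_j exp(s_j)) = softmax(s)_i; the 1/N factor is a constant
    # inside the log, so it does not change the gradient.
    grad = softmax(scores)
    return value, grad
```

For example, `max_aggregate([0.5, 2.0, -1.0])` updates only the second segment, whereas `log_mean_exp_aggregate` on the same scores gives every segment a nonzero share of the gradient.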

Preprint, 2020, arXiv

BUT Opensat 2019 Speech Recognition System

The paper describes the BUT Automatic Speech Recognition (ASR) systems submitted to the OpenSAT evaluations in two domain categories: low-resourced languages and public safety communications. The first was challenging due to a lack of training data, so various architectures and multilingual approaches were employed, and their combination led to superior performance. The second was challenging because the recordings were made in extreme conditions, including a specific channel, speakers under stress and high levels of noise. Data augmentation was necessary to reach reasonably good performance.
