Source author record

Kilol Gupta

Kilol Gupta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation and Language, eess.AS, Machine Learning, Sound

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

UserLibri: A Dataset for ASR Personalization Using Only Text

Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg. We release this User-Specific LibriSpeech (UserLibri) dataset to aid future personalization research. LibriSpeech audio-transcript pairs are grouped into 55 users from the test-clean dataset and 52 users from test-other. We are able to lower the average word error rate per user across both sets in streaming and nonstreaming models, including an improvement of 2.5 for the harder set of test-other users when streaming.

Preprint, 2020, arXiv

Context-Aware Attention for Understanding Twitter Abuse

The original goal of any social media platform is to help users engage in healthy and meaningful conversations. But more often than not, it has been found that it becomes an avenue for wanton attacks. We want to alleviate this issue, and hence we try to provide a detailed analysis of how abusive behavior can be monitored on Twitter. The complexity of natural language constructs makes this task challenging. We show how applying contextual attention to Long Short-Term Memory networks helps us achieve near state-of-the-art results on multiple benchmark abuse detection datasets from Twitter.

Preprint, 2020, arXiv

Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features

This paper presents a simple yet effective method to achieve prosody transfer from a reference speech signal to synthesized speech. The main idea is to incorporate well-known acoustic correlates of prosody such as pitch and loudness contours of the reference speech into a modern neural text-to-speech (TTS) synthesizer such as Tacotron2 (TC2). More specifically, a small set of acoustic features are extracted from reference audio and then used to condition a TC2 synthesizer. The trained model is evaluated using subjective listening tests and a novel objective evaluation of prosody transfer is proposed. Listening tests show that the synthesized speech is rated as highly natural and that prosody is successfully transferred from the reference speech signal to the synthesized signal.
