Researcher profile

Yichen Han

Yichen Han contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
5 topics
4 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

In recent years, neural-network-based methods for multi-speaker text-to-speech synthesis (TTS) have made significant progress. However, the speaker encoder models currently used in these methods still cannot capture enough speaker information. In this paper, we focus on accurate speaker encoder modeling and propose an end-to-end method that can generate high-quality speech with better speaker similarity for both seen and unseen speakers. The proposed architecture consists of three separately trained components: a speaker encoder based on the state-of-the-art ECAPA-TDNN model, which is derived from the speaker verification task; a FastSpeech2-based synthesizer; and a HiFi-GAN vocoder. A comparison among different speaker encoder models shows that our proposed method achieves better naturalness and similarity. To evaluate our synthesized speech efficiently, we are the first to adopt deep-learning-based automatic MOS evaluation methods to assess our results, and these methods show great potential for automatic speech quality assessment.

Preprint, 2022, arXiv

Towards Visualization of Time-Series Ecological Momentary Assessment (EMA) Data on Standalone Voice-First Virtual Assistants

Population aging is an increasingly important consideration for health care in the 21st century, and continuing to access and interact with digital health information is a key challenge for aging populations. Voice-based Intelligent Virtual Assistants (IVAs) show promise for improving the Quality of Life (QoL) of older adults, and coupled with Ecological Momentary Assessments (EMA) they can be effective for collecting important health information from older adults, especially for repeated time-based events. However, this same EMA data is hard for older adults to access: although the newest IVAs are equipped with a display, the effectiveness of visualizing time-series EMA data on standalone IVAs has not been explored. To investigate the potential opportunities for visualizing time-series EMA data on standalone IVAs, we designed a prototype system in which older adults can query and examine time-series EMA data on Amazon Echo Show, a widely used, commercially available standalone screen-based IVA. We conducted a preliminary semi-structured interview with a geriatrician and an older adult, and identified three findings that should be carefully considered when designing such visualizations.