Researcher profile

Kai-Chun Liu

Kai-Chun Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

This study presents a deep learning-based speech signal-processing mobile application known as CITISEN. The CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), allowing CITISEN to be used as a platform for utilizing and evaluating SE models and flexibly extend the models to address various noise environments and users. For SE, a pretrained SE model downloaded from the cloud server is used to effectively reduce noise components from instant or saved recordings provided by users. For encountering unseen noise or speaker environments, the MA function is applied to promote CITISEN. A few audio samples recording on a noisy environment are uploaded and used to adapt the pretrained SE model on the server. Finally, for BNC, CITISEN first removes the background noises through an SE model and then mixes the processed speech with new background noise. The novel BNC function can evaluate SE performance under specific conditions, cover people's tracks, and provide entertainment. The experimental results confirmed the effectiveness of SE, MA, and BNC functions. Compared with the noisy speech signals, the enhanced speech signals achieved about 6\% and 33\% of improvements, respectively, in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ). With MA, the STOI and PESQ could be further improved by approximately 6\% and 11\%, respectively. Finally, the BNC experiment results indicated that the speech signals converted from noisy and silent backgrounds have a close scene identification accuracy and similar embeddings in an acoustic scene classification model. Therefore, the proposed BNC can effectively convert the background noise of a speech signal and be a data augmentation method when clean speech signals are unavailable.

preprint2022arXiv

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement

Multimodal learning has been proven to be an effective method to improve speech enhancement (SE) performance, especially in challenging situations such as low signal-to-noise ratios, speech noise, or unseen noise types. In previous studies, several types of auxiliary data have been used to construct multimodal SE systems, such as lip images, electropalatography, or electromagnetic midsagittal articulography. In this paper, we propose a novel EMGSE framework for multimodal SE, which integrates audio and facial electromyography (EMG) signals. Facial EMG is a biological signal containing articulatory movement information, which can be measured in a non-invasive way. Experimental results show that the proposed EMGSE system can achieve better performance than the audio-only SE system. The benefits of fusing EMG signals with acoustic signals for SE are notable under challenging circumstances. Furthermore, this study reveals that cheek EMG is sufficient for SE.