Source author record

Junmo Lee

Junmo Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.AS · Sound · Artificial Intelligence · eess.SP · Emerging Technologies · Machine Learning

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

Preprint · 2023 · arXiv

Avocodo: Generative Adversarial Network for Artifact-free Vocoder

Neural vocoders based on the generative adversarial network (GAN) have been widely used because of their fast inference speed and lightweight networks, while still generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform a multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we discovered that a multi-scale analysis that focuses on the low-frequency bands causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the quality of the synthesized speech waveform. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based vocoders and propose a GAN-based vocoder, called Avocodo, that synthesizes high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate speech waveforms from different perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also use a pseudo quadrature mirror filter bank to obtain downsampled multi-band speech waveforms while avoiding aliasing. According to the experimental results, Avocodo outperforms baseline GAN-based vocoders both objectively and subjectively, while reproducing speech with fewer artifacts.
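The aliasing problem mentioned in the Avocodo abstract can be illustrated with a small NumPy experiment. This is a minimal sketch under stated assumptions, not Avocodo's method: it decimates a pure tone by 2 with and without an ideal (FFT brick-wall) low-pass filter, whereas the paper uses a pseudo quadrature mirror filter bank. The sample rate, tone frequency and helper names (`make_tone`, `naive_decimate`, `filtered_decimate`, `peak_frequency`) are hypothetical choices for illustration.

```python
import numpy as np

FS = 16_000      # assumed original sample rate (Hz)
TONE_HZ = 7_000  # assumed tone above the post-decimation Nyquist frequency of 4 kHz

def make_tone(fs: int, freq: float, seconds: float = 1.0) -> np.ndarray:
    """Return a pure sine tone sampled at fs."""
    t = np.arange(int(fs * seconds)) / fs
    return np.sin(2 * np.pi * freq * t)

def naive_decimate(x: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th sample with no anti-aliasing filter."""
    return x[::factor]

def filtered_decimate(x: np.ndarray, fs: int, factor: int) -> np.ndarray:
    """Ideal low-pass (zero FFT bins at or above the new Nyquist), then decimate."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spectrum[freqs >= fs / (2 * factor)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))[::factor]

def peak_frequency(x: np.ndarray, fs: float) -> float:
    """Frequency (Hz) of the largest-magnitude bin in the spectrum of x."""
    mags = np.abs(np.fft.rfft(x))
    return float(np.fft.rfftfreq(len(x), d=1 / fs)[np.argmax(mags)])

x = make_tone(FS, TONE_HZ)
y_naive = naive_decimate(x, 2)
y_filtered = filtered_decimate(x, FS, 2)

print(peak_frequency(y_naive, FS / 2))    # 1000.0: the 7 kHz tone aliases to 1 kHz
print(float(np.max(np.abs(y_filtered))))  # close to 0: the tone was removed before decimation
```

Without filtering, energy above the new Nyquist frequency folds back into the low band as a spurious 1 kHz component, which is the kind of artifact the abstract attributes to naive multi-scale downsampling.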

Preprint · 2022 · arXiv

Novel Weight Update Scheme for Hardware Neural Network based on Synaptic Devices Having Abrupt LTP or LTD Characteristics

Mitigating nonlinear weight update characteristics is one of the main challenges in designing neural networks based on synaptic devices. This paper presents a novel weight update method, named the conditional reverse update scheme (CRUS), for hardware neural networks (HNNs) consisting of synaptic devices with highly nonlinear or abrupt conductance update characteristics. We formulate a linear optimization of the conductance in synaptic devices to reduce the average deviation of the weight changes from those calculated by the stochastic gradient descent (SGD) algorithm. We introduce a metric called update noise (UN) to analyze the training dynamics. We then design a weight update rule that reduces the UN averaged over the training process. The optimized network achieves >90% accuracy on the MNIST dataset under highly nonlinear long-term potentiation (LTP) and long-term depression (LTD) conditions, while using inaccurate and infrequent conductance sensing. Furthermore, the proposed method achieves better accuracy than previously reported nonlinear weight update mitigation techniques under the same hardware specifications and device conditions. It is also robust to temporal variations in conductance updates. We expect our scheme to relax design requirements in device and circuit engineering and to serve as a practical technique for future HNNs.

Preprint · 2020 · arXiv

Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning

This paper proposes a controllable end-to-end text-to-speech (TTS) system that controls the speaking speed of synthesized speech (speed-controllable TTS; SCTTS) by taking a sentence-level speaking-rate value as an additional input. The speaking-rate value, defined as the ratio of the number of input phonemes to the length of the input speech, is used in the proposed system to control the speaking speed. Furthermore, by adopting a global style token-based style encoder, the proposed SCTTS system can control the speaking speed while retaining other speech attributes, such as pitch. The proposed SCTTS does not require any additional well-trained model or external speech database to extract phoneme-level duration information, and it can be trained in an end-to-end manner. In addition, our listening tests on fast-, normal-, and slow-speed speech showed that SCTTS generates more natural speech than other phoneme duration control approaches that increase or decrease duration at the same rate for the entire sentence, especially for slow-speed speech.
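The speaking-rate value described above is a simple ratio. A minimal Python sketch, assuming speech length is measured in seconds (the abstract does not state the unit; frames would work the same way) and using a hypothetical helper name `speaking_rate`:

```python
def speaking_rate(num_phonemes: int, speech_seconds: float) -> float:
    """Sentence-level speaking rate: input phonemes per unit of speech length.

    Assumption: length is in seconds; the source abstract does not give the unit.
    """
    if speech_seconds <= 0:
        raise ValueError("speech length must be positive")
    return num_phonemes / speech_seconds

# Hypothetical utterance: 30 phonemes spoken over 2.5 s.
normal = speaking_rate(30, 2.5)  # 12.0 phonemes per second
# The same 30 phonemes stretched to 3.75 s (slower speech) give a lower rate.
slow = speaking_rate(30, 3.75)   # 8.0 phonemes per second
print(normal, slow)              # 12.0 8.0
```

Because the value is computed per sentence, it describes the whole utterance's pace; the abstract contrasts this single conditioning value with approaches that uniformly scale phoneme durations.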

Preprint · 2020 · arXiv

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently developed GAN-based vocoder, MelGAN, produces speech waveforms in real time. However, it often produces waveforms that are insufficient in quality or inconsistent with the acoustic characteristics of the input mel spectrogram. VocGAN is nearly as fast as MelGAN, but it significantly improves the quality and consistency of the output waveform. VocGAN applies a multi-scale waveform generator and a hierarchically-nested discriminator to learn multiple levels of acoustic properties in a balanced way. It also applies a joint conditional and unconditional objective, which has shown successful results in high-resolution image synthesis. In experiments, VocGAN synthesizes speech waveforms 416.7x faster than real time on a GTX 1080Ti GPU and 3.24x faster than real time on a CPU. Compared with MelGAN, it also exhibits significantly improved quality in multiple evaluation metrics, including mean opinion score (MOS), with minimal additional overhead. Additionally, compared with Parallel WaveGAN, another recently developed high-fidelity vocoder, VocGAN is 6.98x faster on a CPU and achieves a higher MOS.
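The real-time speed-ups quoted above translate directly into wall-clock time. A back-of-the-envelope Python sketch (`synthesis_seconds` is a hypothetical helper; it simply divides audio duration by the reported speed-up and ignores any fixed overhead such as model loading):

```python
def synthesis_seconds(audio_seconds: float, speedup_vs_realtime: float) -> float:
    """Time needed to synthesize `audio_seconds` of audio at a given speed-up over real time."""
    return audio_seconds / speedup_vs_realtime

# Speed-ups reported in the VocGAN abstract, applied to 10 s of speech.
gpu = synthesis_seconds(10.0, 416.7)  # GTX 1080Ti: roughly 0.024 s
cpu = synthesis_seconds(10.0, 3.24)   # CPU: roughly 3.09 s
print(f"GPU: {gpu * 1000:.1f} ms, CPU: {cpu:.2f} s")  # GPU: 24.0 ms, CPU: 3.09 s
```

Any speed-up above 1x means audio is produced faster than it plays back, which is what "real-time" refers to here.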
