Source author record

Cheng Yu

Cheng Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.AP, eess.AS, Sound, Machine Learning, Computation and Language, Artificial Intelligence, Computer Vision, eess.SP

Catalog footprint

What is connected

24 works

8 topics

4 close collaborators

preprint, 2026, arXiv

DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing

Recent image editing models have achieved strong visual fidelity but often struggle with tasks requiring complex reasoning. To investigate and enhance the reasoning-grounded planning for image editing, we propose DDA-Thinker, a Thinker-centric framework designed for the independent optimization of a planning module (Thinker) over a fixed generative model (Editor). This decoupled Thinker-centric paradigm facilitates a controlled analysis of the planning module and makes its contribution under a fixed Editor easier to assess. To effectively guide this Thinker, we introduce a dual-atomic reinforcement learning framework. This framework decomposes feedback into two distinct atomic rewards implemented through verifiable checklists: a cognitive-atomic reward to directly assess the quality of the Thinker's executable plan, which serves as the actionable outcome of the Thinker's reasoning, and a visual-atomic reward to assess the final image quality. To improve checklist quality, our checklist synthesis is grounded not only in the source image and user instruction but also in a rational reference description of the ideal post-edit scene. To support this training, we further develop a two-stage data curation pipeline that first synthesizes a diverse and reasoning-focused dataset, then applies difficulty-aware refinement to curate an effective training curriculum for reinforcement learning. Extensive experiments on reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench, demonstrate that our approach substantially improves overall performance. Our method enables a community model to achieve results competitive with strong proprietary models, highlighting the practical potential of Thinker-centric optimization under a fixed-editor setting.

preprint, 2022, arXiv

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and investigate the generalization capability of our models to other datasets with noise characteristics unseen during training.

preprint, 2022, arXiv

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by VAE, CUC-VAE allows sampling from an utterance-specific prior distribution conditioned on cross-utterance information, which allows the prosody features generated by the TTS system to be related to the context and is more similar to how humans naturally produce prosody. The performance of CUC-VAE is evaluated via a qualitative listening test for naturalness, intelligibility and quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity with clear margins.

preprint, 2022, arXiv

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

Speech enhancement (SE) performance has improved considerably owing to the use of deep learning models as a base function. Herein, we propose a perceptual contrast stretching (PCS) approach to further improve SE performance. The PCS is derived based on the critical band importance function and is applied to modify the targets of the SE model. Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance. Compared with post-processing-based implementations, incorporating PCS into the training phase preserves performance and reduces online computation. Notably, PCS can be combined with different SE model architectures and training criteria. Furthermore, PCS does not affect the causality or convergence of SE model training. Experimental results on the VoiceBank-DEMAND dataset show that the proposed method can achieve state-of-the-art performance on both causal (PESQ score = 3.07) and noncausal (PESQ score = 3.35) SE tasks.

preprint, 2022, arXiv

Speech Recovery for Real-World Self-powered Intermittent Devices

Incomplete speech inputs severely degrade the performance of related speech signal processing applications. Although many studies have addressed this issue, they controlled the missing-data conditions through simulation with self-defined masking lengths or sizes. Moreover, the masking definitions differ across these experimental settings. This paper presents a novel intermittent speech recovery (ISR) system for real-world self-powered intermittent devices. The ISR system applies three stages (interpolation, enhancement, and combination) for speech reconstruction. The experimental results show that our recovery system increases speech quality by up to 591.7% and speech intelligibility by up to 80.5%. Most importantly, the proposed ISR system improves the WER scores by up to 52.6%. The promising results not only confirm the effectiveness of the reconstruction but also encourage the use of these battery-free wearable/IoT devices.

preprint, 2021, arXiv

Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

Multi-task learning (MTL) and attention mechanisms have been shown to effectively extract robust acoustic features for various speech-related tasks in noisy environments. In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI). The proposed ATM system consists of three parts: SE, SI, and attention-Net (AttNet). The SE part is composed of a long short-term memory (LSTM) model, and a deep neural network (DNN) model is used to develop the SI and AttNet parts. The overall ATM system first extracts the representative features, then enhances the speech signals in LSTM-SE and identifies the speaker in DNN-SI. The AttNet computes weights based on DNN-SI to prepare better representative features for LSTM-SE. We tested the proposed ATM system on Taiwan Mandarin hearing in noise test sentences. The evaluation results confirmed that the proposed system can effectively enhance the speech quality and intelligibility of a given noisy input. Moreover, the accuracy of the SI can also be notably improved by using the proposed ATM system.

preprint, 2021, arXiv

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications. Therefore, our study applies a modified Transformer to a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence it is replaced by convolutional layers. To further improve the perceptual evaluation of speech quality (PESQ) scores of enhanced speech, the $L_1$ pre-trained Transformer is fine-tuned using a MetricGAN framework. The proposed MetricGAN can be treated as a general post-processing module to further boost the objective scores of interest. The experiments were conducted using the data sets provided by the organizer of the Deep Noise Suppression (DNS) challenge. Experimental results demonstrated that the proposed system outperformed the challenge baseline by a large margin in both subjective and objective evaluations.

preprint, 2021, arXiv

Dissipative solutions to the compressible isentropic Navier-Stokes equations

The existence of dissipative solutions to the compressible isentropic Navier-Stokes equations is established in this paper. This notion was inspired by the concept of dissipative solutions to the incompressible Euler equations introduced by Lions (1996, Section 4.4). Our method is to recover such solutions by passing to the limit in approximate solutions, using a compactness argument.

preprint, 2021, arXiv

Global ill-posedness for a dense set of initial data to the Isentropic system of gas dynamics

In dimensions $n=2$ and $3$, we show that for any initial datum belonging to a dense subset of the energy space, there exist infinitely many global-in-time admissible weak solutions to the isentropic Euler system whenever $1<γ\leq 1+\frac2n$. This result can be regarded as a compressible counterpart of the one obtained by Szekelyhidi--Wiedemann (ARMA, 2012) for incompressible flows. As in the incompressible result, the admissibility condition is defined in its integral form. Our result is based on a generalization of a key step of the convex integration procedure. This generalization makes it possible, even in the compressible case, to convex integrate any smooth positive Reynolds stress. A large family of subsolutions can then be considered. These subsolutions can be generated, for instance, via regularization of any weak inviscid limit of an associated compressible Navier--Stokes system with degenerate viscosities.

preprint, 2021, arXiv

Inviscid limit of the inhomogeneous incompressible Navier-Stokes equations under the weak Kolmogorov hypothesis in $\mathbb{R}^3$

In this paper, we consider the inviscid limit of the inhomogeneous incompressible Navier-Stokes equations under the weak Kolmogorov hypothesis in $\mathbb{R}^3$. In particular, we first deduce the Kolmogorov-type hypothesis in $\mathbb{R}^3$, which yields uniform bounds on the $α$th-order fractional derivatives of $\sqrt{ρ^μ}\,\mathbf{u}^μ$ in $L^2_x$ for some $α>0$, independent of the viscosity. These uniform bounds provide strong convergence of $\sqrt{ρ^μ}\,\mathbf{u}^μ$ in $L^2$. This shows that the inviscid limit is a weak solution to the corresponding Euler equations.

preprint, 2020, arXiv

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

Previous studies have shown that integrating video signals as a complementary modality can improve speech enhancement (SE) performance. However, video clips usually contain large amounts of data and incur a high computational cost, and thus may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while manifesting speech-phoneme structures, and thus complements its air-conducted counterpart. In this study, we propose a novel multi-modal SE structure in the time domain that leverages bone- and air-conducted signals. In addition, we examine two ensemble-learning-based strategies, early fusion (EF) and late fusion (LF), to integrate the two types of speech signals, and adopt a deep learning-based fully convolutional network to conduct the enhancement. The experimental results on the Mandarin corpus indicate that this newly presented multi-modal SE structure (integrating bone- and air-conducted signals) significantly outperforms its single-source SE counterparts (with a bone- or air-conducted signal only) on various speech evaluation metrics. In addition, adopting an LF strategy rather than an EF strategy in this multi-modal SE structure achieves better results.
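
The difference between the two fusion strategies named in this abstract can be illustrated with a toy sketch. Everything below is hypothetical: the function names, the stand-in "models" (simple averaging and scaling), and the sample values are assumptions made for illustration, not the fully convolutional networks or the data used in the paper.

```python
from typing import Callable, List, Tuple

Signal = List[float]

def early_fusion(air: Signal, bone: Signal,
                 model: Callable[[List[Tuple[float, float]]], Signal]) -> Signal:
    """Early fusion: stack both signals at the input and run ONE model."""
    stacked = list(zip(air, bone))  # one (air, bone) pair per time step
    return model(stacked)

def late_fusion(air: Signal, bone: Signal,
                air_model: Callable[[Signal], Signal],
                bone_model: Callable[[Signal], Signal],
                combine: Callable[[Signal, Signal], Signal]) -> Signal:
    """Late fusion: run a separate model per signal, then merge the outputs."""
    return combine(air_model(air), bone_model(bone))

# Toy stand-ins for trained networks (assumptions, for illustration only).
def average_channels(frames: List[Tuple[float, float]]) -> Signal:
    return [sum(frame) / len(frame) for frame in frames]

def half_gain(signal: Signal) -> Signal:
    return [0.5 * s for s in signal]

def add_signals(a: Signal, b: Signal) -> Signal:
    return [x + y for x, y in zip(a, b)]

air = [1.0, 2.0, 3.0]   # hypothetical air-conducted samples
bone = [3.0, 2.0, 1.0]  # hypothetical bone-conducted samples

ef_out = early_fusion(air, bone, average_channels)
lf_out = late_fusion(air, bone, half_gain, half_gain, add_signals)
```

With these linear stand-ins the two strategies give the same output; with nonlinear learned models they generally differ, which is why the paper compares them empirically.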

preprint, 2020, arXiv

Waveform-based Voice Activity Detection Exploiting Fully Convolutional networks with Multi-Branched Encoders

In this study, we propose an encoder-decoder structured system with fully convolutional networks to implement voice activity detection (VAD) directly on the time-domain waveform. The proposed system processes the input waveform to classify its segments as either speech or non-speech. This novel waveform-based VAD algorithm, abbreviated "WVAD", has two main distinguishing features. First, compared to most conventional VAD systems that use spectral features, the raw waveforms employed in WVAD contain more comprehensive information and are thus expected to facilitate more accurate speech/non-speech predictions. Second, based on the multi-branched architecture, WVAD can be extended by using an ensemble of encoders, referred to as WEVAD, that incorporates multiple attribute information in utterances and can thus yield better VAD performance for specified acoustic conditions. We evaluated the presented WVAD and WEVAD for the VAD task on two datasets. First, the experiments conducted on AURORA2 reveal that WVAD outperforms many state-of-the-art VAD algorithms. Next, the TMHINT task confirms that, by combining multiple attributes in utterances, WEVAD performs even better than WVAD.

preprint, 2019, arXiv

Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

Most recent studies on deep learning based speech enhancement (SE) have focused on improving denoising performance. However, successful SE applications require striking a desirable balance between denoising performance and computational cost in real scenarios. In this study, we propose a novel parameter pruning (PP) technique, which removes redundant channels in a neural network. In addition, a parameter quantization (PQ) technique is applied to reduce the size of a neural network by representing weights with fewer cluster centroids. Because the two techniques are derived from different concepts, PP and PQ can be integrated to provide even more compact SE models. The experimental results show that the PP and PQ techniques produce a compacted SE model that is only 10.03% of the size of the original model, with minor performance losses of 1.43% (from 0.70 to 0.69) for STOI and 3.24% (from 1.85 to 1.79) for PESQ. The promising results suggest that the PP and PQ techniques can be used in an SE system on devices with limited storage and computation resources.
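
The idea of representing weights with fewer cluster centroids, and the relative losses quoted above, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses plain 1-D k-means (Lloyd's algorithm) with evenly spaced initial centroids, which may differ from the clustering used in the paper, and the function names and example weights are hypothetical.

```python
from typing import List, Tuple

def nearest(value: float, centroids: List[float]) -> int:
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))

def quantize_weights(weights: List[float], k: int,
                     iters: int = 20) -> Tuple[List[float], List[int]]:
    """Cluster scalar weights into k centroids with 1-D k-means.

    Returns the centroids and, for each weight, the index of its centroid,
    so each weight can be stored as a small index instead of a float.
    """
    lo, hi = min(weights), max(weights)
    # Assumption: deterministic, evenly spaced initialisation over the range.
    if k == 1:
        centroids = [(lo + hi) / 2]
    else:
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        assignments = [nearest(w, centroids) for w in weights]
        for j in range(k):
            members = [w for w, a in zip(weights, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = sum(members) / len(members)
    # Final assignment against the converged centroids.
    return centroids, [nearest(w, centroids) for w in weights]

def relative_loss(before: float, after: float) -> float:
    """Relative drop from before to after, in percent."""
    return (before - after) / before * 100

# Hypothetical weights, not taken from any model in the paper.
weights = [-1.0, -0.9, 0.0, 0.1, 0.9, 1.0]
centroids, codes = quantize_weights(weights, k=3)
dequantized = [centroids[c] for c in codes]

stoi_loss = relative_loss(0.70, 0.69)  # figures quoted in the abstract
pesq_loss = relative_loss(1.85, 1.79)
```

Here six floating-point weights are stored as three centroids plus six small indices; the two `relative_loss` calls reproduce the 1.43% and 3.24% figures stated in the abstract.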

preprint, 2016, arXiv

A new proof to the energy conservation for the Navier-Stokes equations

In this paper we give a new proof of energy conservation for weak solutions of the incompressible Navier-Stokes equations. This result was first proved by Shinbrot. The new proof relies on a lemma introduced by Lions.
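
For reference, the statement at issue can be written out as follows. The notation is assumed here and is not taken from the abstract: $\mathbf{u}$ is the velocity, $p$ the pressure, $ν>0$ the viscosity, $\mathbf{u}_0$ the initial datum and $Ω$ the spatial domain. The equations are

$$
\partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u} + \nabla p = ν\,\Delta \mathbf{u}, \qquad \nabla\cdot\mathbf{u} = 0 .
$$

Leray-Hopf weak solutions are known to satisfy only the energy inequality

$$
\frac12 \int_{Ω} |\mathbf{u}(t,x)|^2 \, dx + ν \int_0^t \int_{Ω} |\nabla \mathbf{u}(s,x)|^2 \, dx \, ds \;\le\; \frac12 \int_{Ω} |\mathbf{u}_0(x)|^2 \, dx ,
$$

and energy conservation means that this holds with equality for all $t$. The abstract does not state the additional integrability hypothesis on $\mathbf{u}$ under which equality is proved, so none is given here.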

preprint, 2016, arXiv

The weak solution to a Boltzmann type equation and its energy conservation

In this paper, we study the initial value problem for a Boltzmann type equation with a nonlinear degenerate damping. We prove the existence of global weak solutions with large initial data in three-dimensional space. We rely on a variant of the Gronwall inequality and the $L^p$ regularity of average velocities to derive the compactness of solutions to a suitable approximation. This allows us to recover a weak solution by passing to the limit. After the existence result, we also prove energy conservation for the weak solution under certain conditions.

preprint, 2015, arXiv

Existence of Global Weak Solutions for 3D Degenerate Compressible Navier-Stokes Equations

In this paper, we prove the existence of global weak solutions for 3D compressible Navier-Stokes equations with degenerate viscosity. The method is based on the Bresch and Desjardins entropy conservation. The main contribution of this paper is to derive the Mellet-Vasseur type inequality for the weak solutions, even if it is not verified by the first level of approximation. This provides existence of global solutions in time, for the compressible Navier-Stokes equations, for any $γ>1$, in three dimensional space, with large initial data possibly vanishing on the vacuum. This solves an open problem proposed by Lions.

preprint, 2015, arXiv

Global weak solutions to compressible quantum Navier-Stokes equations with damping

The global-in-time existence of weak solutions to the barotropic compressible quantum Navier-Stokes equations with damping is proved for large data in three-dimensional space. The model consists of the compressible Navier-Stokes equations with degenerate viscosity, a nonlinear third-order differential operator with the quantum Bohm potential, and the damping terms. The existence of global weak solutions to this system is shown using the Faedo-Galerkin method and a compactness argument. This system is also an important approximate system for the compressible Navier-Stokes equations. It will help us to prove the existence of global weak solutions to the compressible Navier-Stokes equations with degenerate viscosity in three-dimensional space.

preprint, 2013, arXiv

Almost sure existence of Navier-Stokes Equations with randomized data in the whole space

This paper considers the supercritical Navier-Stokes equations posed in the whole space $\mathbb{R}^d$, with suitably randomized initial data, in the weak solution setting. The global weak solutions are constructed for a large set of initial data in $H^{-s}(\mathbb{R}^d)$ for some $s>0$ via a probabilistic argument, and this in turn implies the almost sure existence.

preprint, 2013, arXiv

Global weak solution for a coupled compressible Navier-Stokes and Q-tensor system

In this paper, we study a coupled compressible Navier-Stokes/Q-tensor system modeling the nematic liquid crystal flow in a three-dimensional bounded spatial domain. The existence and long time dynamics of globally defined weak solutions for the coupled system are established, using weak convergence methods, compactness and interpolation arguments. The symmetry and traceless properties of the Q-tensor play key roles in this process.

preprint, 2013, arXiv

Global weak solutions to the inhomogeneous Navier-Stokes-Vlasov equations

A fluid-particle system of the inhomogeneous Navier-Stokes equations and Vlasov equation in the three dimensional space is considered in this paper. The coupling arises from the drag force in the fluid equations and the acceleration in the Vlasov equation. An initial-boundary value problem is studied in a bounded domain with large data. The existence of global weak solutions is established through an approximation scheme, energy estimates, and weak convergence.

preprint, 2012, arXiv

Global weak solutions to the Navier-Stokes-Vlasov equations

In this paper, a system of particles coupled with a fluid is considered. The particles are described by a Vlasov equation, and the fluid is governed by the forced Navier-Stokes equations. The interaction with the fluid phase governed by the Navier-Stokes equations is taken into account through a source term. The resulting system, namely the Navier-Stokes-Vlasov equations, is shown to have global weak solutions in three spatial dimensions, and to have a unique global solution in two spatial dimensions.

preprint, 2012, arXiv

Global well-posedness for the two dimensional Navier-Stokes-Vlasov Equations

The global well-posedness for the incompressible Navier-Stokes-Vlasov equations in two spatial dimensions is established by a priori estimates, the characteristic method and the semigroup analysis.

preprint, 2011, arXiv

Global weak solution and large-time behavior for the compressible flow of liquid crystals

The three-dimensional equations for the compressible flow of liquid crystals are considered. An initial-boundary value problem is studied in a bounded domain with large data. The existence and large-time behavior of a global weak solution are established through a three-level approximation, energy estimates, and weak convergence for the adiabatic exponent $γ>\frac32$.

preprint, 2011, arXiv

Incompressible limit for the compressible flow of liquid crystals

The connection between the compressible flow of liquid crystals with low Mach number and the incompressible flow of liquid crystals is studied in a bounded domain. In particular, the convergence of weak solutions of the compressible flow of liquid crystals to the weak solutions of the incompressible flow of liquid crystals is proved when the Mach number approaches zero; that is, the incompressible limit is justified for weak solutions in a bounded domain.
