Source author record

Chao Deng

Chao Deng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.AP, Computation and Language, Computer Vision, eess.AS, Networking and Internet Architecture, Sound, Artificial Intelligence, eess.SP, Machine Learning, math.DS, q-fin.PM

Catalog footprint

What is connected

14 works

11 topics

4 close collaborators

preprint, 2026, arXiv

DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec

Codec-based language models (LMs) have revolutionized text-to-speech (TTS). However, standard codecs entangle timbre and prosody, which hinders independent control in continuation-based LMs. To tackle this challenge, we propose DisCo-Speech, a zero-shot controllable TTS framework featuring a disentangled speech codec (DisCodec) and an LM-based generator. The core component DisCodec employs a two-stage design: 1) tri-factor disentanglement to separate speech into content, prosody, and timbre subspaces via parallel encoders and hybrid losses; and 2) fusion and reconstruction that merges content and prosody into unified content-prosody tokens suitable for LM prediction, while jointly optimizing reconstruction to address the disentanglement-reconstruction trade-off. This allows the LM to perform prosodic continuation from a style prompt while the decoder injects target timbre, enabling flexible zero-shot control. Experiments demonstrate that DisCo-Speech achieves competitive voice cloning and superior zero-shot prosody control. By resolving the core entanglement at the codec level, DisCo-Speech provides a robust foundation for controllable speech synthesis.

preprint, 2024, arXiv

Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face Recognition

With the increasing availability of consumer depth sensors, 3D face recognition (FR) has attracted more and more attention. However, the data acquired by these sensors are often coarse and noisy, making them impractical to use directly. In this paper, we introduce an innovative Depth map denoising network (DMDNet) based on the Denoising Implicit Image Function (DIIF) to reduce noise and enhance the quality of facial depth images for low-quality 3D FR. After generating clean depth faces using DMDNet, we further design a powerful recognition network called Lightweight Depth and Normal Fusion network (LDNFNet), which incorporates a multi-branch fusion block to learn unique and complementary features between different modalities such as depth and normal images. Comprehensive experiments conducted on four distinct low-quality databases demonstrate the effectiveness and robustness of our proposed methods. Furthermore, when combining DMDNet and LDNFNet, we achieve state-of-the-art results on the Lock3DFace database.

preprint, 2022, arXiv

A CTC Triggered Siamese Network with Spatial-Temporal Dropout for Speech Recognition

Siamese networks have shown effective results in unsupervised visual representation learning. These models are designed to learn an invariant representation of two augmentations of one input by maximizing their similarity. In this paper, we propose an effective Siamese network to improve the robustness of end-to-end automatic speech recognition (ASR). We introduce spatial-temporal dropout to apply a stronger perturbation within the Siamese-ASR framework. In addition, we relax the similarity regularization so that it maximizes the similarity of distributions only on the frames where connectionist temporal classification (CTC) spikes occur, rather than on all frames. The proposed architecture is evaluated on two benchmarks, AISHELL-1 and Librispeech, resulting in 7.13% and 6.59% relative character error rate (CER) and word error rate (WER) reductions, respectively. Analysis shows that the proposed approach yields better uniformity in the trained model and noticeably enlarges the CTC spikes.
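The relaxed similarity regularization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of symmetric KL divergence as the similarity measure, the use of one branch's argmax to locate spikes, and the toy posteriors are all assumptions made for illustration.

```python
import math

def ctc_spike_frames(posteriors, blank=0):
    """Return indices of frames whose most likely label is not the blank symbol."""
    spikes = []
    for t, frame in enumerate(posteriors):
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != blank:
            spikes.append(t)
    return spikes

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def spike_triggered_similarity_loss(branch_a, branch_b, blank=0):
    """Mean symmetric KL between two branches, restricted to spike frames of branch_a.

    Assumption: spikes are located with branch_a only; frames dominated by the
    blank symbol contribute nothing to the loss.
    """
    frames = ctc_spike_frames(branch_a, blank)
    if not frames:
        return 0.0
    total = 0.0
    for t in frames:
        total += 0.5 * (kl_divergence(branch_a[t], branch_b[t])
                        + kl_divergence(branch_b[t], branch_a[t]))
    return total / len(frames)

# Toy per-frame posteriors over (blank, label 1, label 2) for two augmented views.
branch_a = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.95, 0.03, 0.02], [0.2, 0.1, 0.7]]
branch_b = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.1, 0.7]]

print(ctc_spike_frames(branch_a))
print(spike_triggered_similarity_loss(branch_a, branch_b))
```

In this toy input the two views disagree only on blank-dominated frames, so the restricted loss ignores the disagreement, which is the effect the relaxation is meant to have.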

preprint, 2022, arXiv

GenAD: General Representations of Multivariate Time Series for Anomaly Detection

The reliability of wireless base stations in China Mobile is of vital importance, because cell phone users are connected to the stations and the behaviors of the stations are directly related to user experience. Although the monitoring of station behaviors can be realized by anomaly detection on multivariate time series, building a general unsupervised anomaly detection model with a higher F1-score remains a challenging task because of the complex correlations and varied temporal patterns of multivariate series in large-scale stations. In this paper, we propose a General representation of multivariate time series for Anomaly Detection (GenAD). First, we pre-train a general model on large-scale wireless base stations with self-supervision, which can be easily transferred to anomaly detection for a specific station with a small amount of training data. Second, we employ Multi-Correlation Attention and Time-Series Attention to represent the correlations and temporal patterns of the stations. With the above innovations, GenAD increases the F1-score by 9% in total on real-world datasets in China Mobile, while the performance does not significantly degrade on public datasets with only 10% of the training data.

preprint, 2022, arXiv

Meta Auxiliary Learning for Low-resource Spoken Language Understanding

Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task and usually suffers from data scarcity. We exploit an ASR and NLU joint training method based on meta auxiliary learning to improve the performance of the low-resource SLU task by taking advantage only of abundant manual transcriptions of speech data. One obvious advantage of such a method is that it provides a flexible framework for implementing a low-resource SLU training task without requiring access to any further semantic annotations. In particular, an NLU model is taken as the label generation network to predict intent and slot tags from texts; a multi-task network trains the ASR task and the SLU task synchronously from speech; and the predictions of the label generation network are delivered to the multi-task network as semantic targets. The efficiency of the proposed algorithm is demonstrated with experiments on the public CATSLU dataset, where it produces more suitable ASR hypotheses for the downstream NLU task.

preprint, 2022, arXiv

OPAL: Occlusion Pattern Aware Loss for Unsupervised Light Field Disparity Estimation

Light field disparity estimation is an essential task in computer vision with various applications. Although supervised learning-based methods have achieved both higher accuracy and efficiency than traditional optimization-based methods, the dependency on ground-truth disparity for training limits the overall generalization performance, especially in real-world scenarios where the ground-truth disparity is hard to capture. In this paper, we argue that unsupervised methods can achieve comparable accuracy and, more importantly, much higher generalization capacity and efficiency than supervised methods. Specifically, we present the Occlusion Pattern Aware Loss, named OPAL, which successfully extracts and encodes the general occlusion patterns inherent in the light field for loss calculation. OPAL enables: i) accurate and robust estimation by effectively handling occlusions without using any ground-truth information for training and ii) much more efficient performance by significantly reducing the network parameters required for accurate inference. In addition, a transformer-based network and a refinement module are proposed for achieving even more accurate results. Extensive experiments demonstrate that our method not only significantly improves the accuracy compared with the SOTA unsupervised methods, but also possesses strong generalization capacity, even for real-world data, compared with supervised methods. Our code will be made publicly available.

preprint, 2020, arXiv

Relative wealth concerns with partial information and heterogeneous priors

We establish a Nash equilibrium in a market with $ N $ agents with the performance criteria of relative wealth level when the market return is unobservable. Each investor has a random prior belief on the return rate of the risky asset. The investors can be heterogeneous in both the mean and variance of the prior. By a separation result and a martingale argument, we show that the optimal investment strategy under a stochastic return rate model can be characterized by a fully-coupled linear FBSDE. Two sets of deep neural networks are used for the numerical computation to first find each investor's estimate of the mean return rate and then solve the FBSDEs. We establish the existence and uniqueness result for the class of FBSDEs with stochastic coefficients and solve the utility game under partial information using deep neural network function approximators. We demonstrate the efficiency and accuracy by a base-case comparison with the solution from the finite difference scheme in the linear case and apply the algorithm to the general case of a nonlinear hidden variable process. Simulations of investment strategies show a herd effect in which investors trade more aggressively under relativeness concerns. Statistical properties of the investment strategies and the portfolio performance, including the Sharpe ratios and the Variance Risk ratios (VRRs), are examined. We observe that the agent with the most accurate prior estimate is likely to lead the herd, and the effect of competition on heterogeneous agents varies more with market characteristics compared to the homogeneous case.

preprint, 2020, arXiv

Smart Prediction of the Complaint Hotspot Problem in Mobile Network

In mobile networks, a complaint hotspot problem can affect service for thousands of users and lead to significant economic losses and bulk complaints. In this paper, we propose an approach to predict a customer complaint based on real-time user signalling data. By analyzing the network and the user service procedure, 30 key data fields related to user experience have been extracted from XDR data collected from the S1 interface. Furthermore, we augment these basic features with derived features for user experience evaluation, such as one-hot features, statistical features and differential features. Considering the problem of unbalanced data, we use LightGBM as our prediction model. LightGBM has strong generalization ability and was designed to handle unbalanced data. The experiments we conducted demonstrate the effectiveness and efficiency of this proposal. This approach has been deployed in daily routine operation to locate the scope of hot complaint problems as well as to report affected users and areas.
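The three kinds of derived features named above (one-hot, statistical and differential) can be sketched as follows. This is an illustrative sketch only: the field names `procedure_type` and `latency_ms`, the category list and the sample record are hypothetical and do not come from the paper, which does not list its 30 fields here.

```python
from statistics import mean, pstdev

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]

def statistical_features(series):
    """Summary statistics of a numeric series within one observation window."""
    return {"mean": mean(series), "std": pstdev(series),
            "min": min(series), "max": max(series)}

def differential_features(series):
    """First-order differences between consecutive samples."""
    return [b - a for a, b in zip(series, series[1:])]

def build_feature_vector(record, categories):
    """Concatenate one-hot, statistical and differential features for one user."""
    stats = statistical_features(record["latency_ms"])
    diffs = differential_features(record["latency_ms"])
    largest_jump = max(diffs, key=abs) if diffs else 0
    return (one_hot(record["procedure_type"], categories)
            + [stats["mean"], stats["std"], stats["min"], stats["max"]]
            + [largest_jump])

# Hypothetical record: one categorical field and one short numeric series.
categories = ["attach", "tau", "service_request"]
record = {"procedure_type": "attach", "latency_ms": [20, 22, 35, 30]}
vector = build_feature_vector(record, categories)
print(vector)
```

A vector of this shape, one per user and time window, is the kind of input a gradient-boosted classifier would then be trained on.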

preprint, 2016, arXiv

Variational Autoencoders for Semi-supervised Text Classification

Although the semi-supervised variational autoencoder (SemiVAE) works in image classification tasks, it fails in text classification tasks when a vanilla LSTM is used as its decoder. From the perspective of reinforcement learning, it is verified that the decoder's capability to distinguish between different categorical labels is essential. Therefore, the Semi-supervised Sequential Variational Autoencoder (SSVAE) is proposed, which increases this capability by feeding the label into its decoder RNN at each time-step. Two specific decoder structures are investigated and both of them are verified to be effective. In addition, in order to reduce the computational complexity of training, a novel optimization method is proposed, which estimates the gradient of the unlabeled objective function by sampling, along with two variance reduction techniques. Experimental results on the Large Movie Review Dataset (IMDB) and the AG's News corpus show that the proposed approach significantly improves the classification accuracy compared with purely supervised classifiers, and achieves competitive performance against previous advanced methods. State-of-the-art results can be obtained by integrating other pretraining-based methods.

preprint, 2013, arXiv

Well-posedness and ill-posedness of the 3D generalized Navier-Stokes equations in Triebel-Lizorkin spaces

In this paper, we study the Cauchy problem of the 3-dimensional (3D) generalized incompressible Navier-Stokes equations (gNS) in the Triebel-Lizorkin space $\dot{F}^{-α,r}_{q_α}(\mathbb{R}^3)$ with $(α,r)\in(1,5/4)\times[2,\infty]$ and $q_α=\frac{3}{α-1}$. Our work establishes a dichotomy of well-posedness and ill-posedness depending on whether $r=2$ or $r>2$. Specifically, by combining the new endpoint bilinear estimates in $L^{q_α}_x L^2_T$ with the characterization of Triebel-Lizorkin space via fractional semigroup, we prove the well-posedness of the gNS in $\dot{F}^{-α,r}_{q_α}(\mathbb{R}^3)$ for $r=2$. On the other hand, for any $r>2$, we show that the solution to the gNS can develop norm inflation in the sense that arbitrarily small initial data in the spaces $\dot{F}^{-α,r}_{q_α}(\mathbb{R}^3)$ can lead the corresponding solution to become arbitrarily large after an arbitrarily short time. In particular, such a dichotomy of Triebel-Lizorkin spaces also holds for the classical N-S equations, i.e. $α=1$. Thus the Triebel-Lizorkin space framework naturally provides a better connection between the well-known Koch-Tataru $BMO^{-1}$ well-posedness result and the Bourgain-Pavlović $\dot{B}_\infty^{-1,\infty}$ ill-posedness result.
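The verbal description of norm inflation above is commonly formalized along the following lines. This is the usual textbook-style formulation written in the abstract's notation; the parameter $\delta$ and the time $T$ are symbols introduced here for illustration, and the paper's precise statement may differ.

```latex
% Norm inflation in X = \dot{F}^{-α,r}_{q_α}(\mathbb{R}^3), r > 2 (illustrative formulation):
% for every \delta > 0 there exist initial data u_0 and a time T with 0 < T < \delta
% such that the corresponding solution u of the gNS satisfies
\[
  \|u_0\|_{\dot{F}^{-α,r}_{q_α}(\mathbb{R}^3)} \le \delta
  \qquad\text{and}\qquad
  \|u(T)\|_{\dot{F}^{-α,r}_{q_α}(\mathbb{R}^3)} \ge \frac{1}{\delta}.
\]
```

Letting $\delta\to0$ makes the data arbitrarily small, the time arbitrarily short and the solution arbitrarily large, which is incompatible with continuous dependence on the initial data in that space.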

preprint, 2012, arXiv

Global Well-posedness of the Parabolic-parabolic Keller-Segel Model in $L^{1}(R^2)\times{L}^{\infty}(R^2)$ and $H^1_b(R^2)\times{H}^1(R^2)$

In this paper, we study global well-posedness of the two-dimensional Keller-Segel model in Lebesgue space and Sobolev space. Recall that in the paper "Existence and uniqueness theorem on mild solutions to the Keller-Segel system in the scaling invariant space, J. Differential Equations, 252 (2012), 1213--1228", Kozono, Sugiyama & Wachi studied global well-posedness of the $n$-dimensional ($n\ge3$) Keller-Segel system and posed a question about even the local-in-time existence for the Keller-Segel system with $L^1(R^2)\times{L}^\infty(R^2)$ initial data. Here we give an affirmative answer to this question: in fact, we show the global-in-time existence and uniqueness for $L^1(R^2)\times{L}^{\infty}(R^2)$ initial data. Furthermore, we prove that for any $H^1_b(R^2) \times {H}^1(R^2)$ initial data with $H^1_b(R^2):=H^1(R^2)\cap{L}^\infty(R^2)$, there also exists a unique global mild solution to the parabolic-parabolic Keller-Segel model. The estimates of ${\sup_{t>0}}t^{1-\frac{n}{p}}\|u\|_{L^p}$ for $(n,p)=(2,\infty)$ and the introduced special half norm, i.e. $\sup_{t>0}t^{1/2}(1+t)^{-1/2}\|\nabla{v}\|_{L^\infty}$, are crucial in our proof.

preprint, 2012, arXiv

Well-posedness of a Parabolic-hyperbolic Keller-Segel System in the Sobolev Space Framework

We study the global strong solutions to a 3-dimensional parabolic-hyperbolic Keller-Segel model with initial data close to a stable equilibrium with perturbations belonging to $L^2(\mathbb R^3)\times H^1(\mathbb{R}^3)$. We obtain global well-posedness and a decay property. Furthermore, if the mean value of the initial cell density is smaller than a suitable constant, then the chemical concentration decays exponentially to zero as $t$ goes to infinity. Proofs of the main results are based on an application of the Fourier analysis method to uniform estimates for a linearized parabolic-hyperbolic system, and also on the smoothing effect of the cell density as well as the damping effect of the chemical concentration.

preprint, 2011, arXiv

Random-data Cauchy Problem for the Periodic Navier-Stokes Equations with Initial Data in Negative-order Sobolev Spaces

In this paper we study existence of solutions of the initial-boundary value problems of the Navier-Stokes equations with a periodic boundary value condition for initial data in the Sobolev spaces $\mathcal{H}^{s}(\mathbb{T}^N)$ with a negative order $-1<s<0$, where $N=2, 3$. By using the randomization approach of N. Burq and N. Tzvetkov, we prove that for almost all $ω\inΩ$, where $Ω$ is the sample space of a probability space $(Ω,\mathcal{A},p)$, for the randomized initial data $\vec{f}^ω\in\mathcal{H}_σ^{s}(\mathbb{T}^N)$ with $-1<s<0$, such a problem has a unique local solution.

preprint, 2011, arXiv

Well-posedness of the Viscous Boussinesq System in Besov Spaces of Negative Order Near Index $s=-1$

This paper is concerned with well-posedness of the Boussinesq system. We prove that the $n$-dimensional ($n\ge2$) Boussinesq system is well-posed for small initial data $(\vec{u}_0,θ_0)$ ($\nabla\cdot\vec{u}_0=0$) either in $({B}^{-1}_{\infty,1}\cap{B^{-1,1}_{\infty,\infty}})\times{B}^{-1}_{p,r}$ or in ${B^{-1,1}_{\infty,\infty}}\times{B}^{-1,ε}_{p,\infty}$ if $r\in[1,\infty]$, $ε>0$ and $p\in(\frac{n}{2},\infty)$, where $B^{s,ε}_{p,q}$ ($s\in\mathbb{R}$, $1\leq p,q\leq\infty$, $ε>0$) is a logarithmic modification of the standard Besov space $B^{s}_{p,q}$. We also prove that this system is well-posed for small initial data in $({B}^{-1}_{\infty,1}\cap{B^{-1,1}_{\infty,\infty}})\times({B}^{-1}_{\frac{n}{2},1}\cap{B^{-1,1}_{\frac{n}{2},\infty}})$.
