Source author record

Zehua Chen

Zehua Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, Unclaimed source record

Machine Learning, Sound, Computation and Language, eess.AS, math.ST, physics.chem-ph, Statistics Theory, Artificial Intelligence, eess.SP, Methodology

Catalog footprint

What is connected

9 works

10 topics

4 close collaborators

preprint, 2025, arXiv

Audio Super-Resolution with Latent Bridge Models

Audio super-resolution (SR), i.e., upsampling the low-resolution (LR) waveform to the high-resolution (HR) version, has recently been explored with diffusion and bridge models, while previous methods often suffer from sub-optimal upsampling quality due to their uninformative generation prior. Towards high-quality audio super-resolution, we present a new system with latent bridge models (LBMs), where we compress the audio waveform into a continuous latent space and design an LBM to enable a latent-to-latent generation process that naturally matches the LR-to-HR upsampling process, thereby fully exploiting the instructive prior information contained in the LR waveform. To further enhance the training results despite the limited availability of HR samples, we introduce frequency-aware LBMs, where the prior and target frequency are taken as model input, enabling LBMs to explicitly learn an any-to-any upsampling process at the training stage. Furthermore, we design cascaded LBMs and present two prior augmentation strategies, where we make the first attempt to unlock the audio upsampling beyond 48 kHz and empower a seamless cascaded SR process, providing higher flexibility for audio post-production. Comprehensive experimental results evaluated on the VCTK, ESC-50, and Song-Describer benchmark datasets and two internal test sets demonstrate that we achieve state-of-the-art objective and perceptual quality for any-to-48kHz SR across speech, audio, and music signals, as well as setting the first record for any-to-192kHz audio SR. Demo at https://AudioLBM.github.io/.

preprint, 2025, arXiv

Exploiting the Prior of Generative Time Series Imputation

Time series imputation, i.e., filling the missing values of a time recording, finds various applications in electricity, finance, and weather modelling. Previous methods have introduced generative models such as diffusion probabilistic models and Schrödinger bridge models to conditionally generate the missing values from Gaussian noise or directly from linear interpolation results. However, as their prior is not informative to the ground-truth target, their generation process inevitably suffers from an increased burden and limited imputation accuracy. In this work, we present Bridge-TS, building a data-to-data generation process for generative time series imputation and exploiting the design of the prior with two novel designs. First, we propose an expert prior, leveraging a pretrained transformer-based module as an expert to fill the missing values with a deterministic estimation, and then taking the results as the prior of the ground-truth target. Second, we explore compositional priors, utilizing several pretrained models to provide different estimation results, and then combining them in the data-to-data generation process to achieve a compositional priors-to-target imputation process. Experiments conducted on several benchmark datasets such as ETT, Exchange, and Weather show that Bridge-TS reaches a new record of imputation accuracy in terms of mean square error and mean absolute error, demonstrating the superiority of improving the prior for generative time series imputation.
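The abstract mentions linear interpolation results as one of the priors used by earlier methods, and reports accuracy as mean square error and mean absolute error. The sketch below illustrates only those two pieces on a toy series. It is not the Bridge-TS method; the function names and the convention of marking missing values with None are assumptions made for this illustration.

```python
def linear_interpolation_prior(values):
    """Fill None entries by linear interpolation between the nearest
    observed neighbors; gaps at the edges copy the nearest observed value."""
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * values[left] + w * values[right]
    return filled


def masked_errors(estimate, truth, missing_idx):
    """Return (MSE, MAE) computed only over the originally missing positions."""
    diffs = [estimate[i] - truth[i] for i in missing_idx]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return mse, mae


# Toy example: a quadratic series with two hidden points.
truth = [0.0, 1.0, 4.0, 9.0, 16.0]
observed = [0.0, None, 4.0, None, 16.0]
prior = linear_interpolation_prior(observed)
mse, mae = masked_errors(prior, truth, missing_idx=[1, 3])
```

On this toy series the interpolated prior is off by 1 at each hidden point, which is the kind of gap between prior and target that a more informative prior would aim to shrink.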

preprint, 2023, arXiv

Incorporating Nuclear Quantum Effects in Molecular Dynamics with a Constrained Minimized Energy Surface

The accurate incorporation of nuclear quantum effects in large-scale molecular dynamics (MD) simulations remains a significant challenge. Recently, we combined constrained nuclear-electronic orbital (CNEO) theory with classical MD and obtained a new approach (CNEO-MD) that can accurately and efficiently incorporate nuclear quantum effects into classical simulations. In this Letter, we provide the theoretical foundation for CNEO-MD by developing an alternative formulation of the equations of motion for MD. In this new formulation, the expectation values of quantum nuclear positions evolve classically on an effective energy surface that is obtained from a constrained energy minimization procedure when solving for the quantum nuclear wave function, thus enabling the incorporation of nuclear quantum effects in classical MD simulations. For comparison with other existing approaches, we examined a series of model systems and found that this new MD approach is significantly more accurate than the conventional way of performing classical MD, and it also generally outperforms centroid MD and ring-polymer MD in describing vibrations in these model systems.

preprint, 2022, arXiv

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training

Denoising diffusion probabilistic models (diffusion models for short) require a large number of iterations in inference to achieve the generation quality that matches or surpasses the state-of-the-art generative models, which invariably results in slow inference speed. Previous approaches aim to optimize the choice of inference schedule over a few iterations to speed up inference. However, this results in reduced generation quality, mainly because the inference process is optimized separately, without jointly optimizing with the training process. In this paper, we propose InferGrad, a diffusion model for vocoder that incorporates the inference process into training, to reduce the inference iterations while maintaining high generation quality. More specifically, during training, we generate data from random noise through a reverse process under inference schedules with a few iterations, and impose a loss to minimize the gap between the generated and ground-truth data samples. Then, unlike existing approaches, the training of InferGrad considers the inference process. The advantages of InferGrad are demonstrated through experiments on the LJSpeech dataset showing that InferGrad achieves better voice quality than the baseline WaveGrad under the same conditions, and maintains the same voice quality as the baseline with a $3$x speedup ($2$ iterations for InferGrad vs $6$ iterations for WaveGrad).

preprint, 2022, arXiv

Multireference Density Functional Theory for Describing Ground and Excited States with Renormalized Singles

We applied renormalized singles (RS) in multireference density functional theory (DFT) to calculate accurate energies of ground and excited states. The multireference DFT approach determines the total energy of the $N$-electron system as the sum of the ($N-2$)-electron energy from a density functional approximation (DFA) and the two-electron addition energies from the particle-particle Tamm-Dancoff approximation (ppTDA), naturally including multireference description. The ppTDA@RS-DFA approach uses the RS Hamiltonian capturing all singles contributions in calculating two-electron addition energies, and its total energy is optimized with the optimized effective potential method. It significantly improves the original ppTDA@DFA. For ground states, ppTDA@RS-DFA properly describes the dissociation curves tested and the double bond rotation of ethylene. For excited states, ppTDA@RS-DFA provides accurate excitation energies and largely eliminates the DFA dependence. ppTDA@RS-DFA thus provides an efficient multireference approach to systems with static correlation.

preprint, 2022, arXiv

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up by minimizing the number of inference steps but at the cost of sample quality. In this work, to improve the inference speed of DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPM which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from the ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and two more challenging datasets with multiple speakers (LibriTTS) and high sampling rate (VCTK). Experimental results show that in comparison with other speed-up methods of DDPMs: 1) ResGrad achieves better sample quality with the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech faster than baseline methods by more than 10 times. Audio samples are available at https://resgrad1.github.io/.
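The change of generation target described in this abstract, from the ground-truth mel-spectrogram to the residual, can be pictured with a small sketch. It shows only the bookkeeping: how a residual target would be formed and how a predicted residual would be added back to the existing model's output. The diffusion model itself is not implemented, spectrograms are plain nested lists (frames by mel bins), and all function names are hypothetical.

```python
def residual_target(ground_truth_mel, tts_mel):
    """Training target under the residual formulation:
    ground truth minus the existing TTS model's output, element by element."""
    return [
        [g - t for g, t in zip(gt_frame, tts_frame)]
        for gt_frame, tts_frame in zip(ground_truth_mel, tts_mel)
    ]


def refine(tts_mel, predicted_residual):
    """Plug-and-play refinement: add the predicted residual to the
    unchanged output of the existing TTS model."""
    return [
        [t + r for t, r in zip(tts_frame, res_frame)]
        for tts_frame, res_frame in zip(tts_mel, predicted_residual)
    ]


# Toy 2-frame, 2-bin spectrograms (values chosen to be exact in floating point).
ground_truth = [[1.0, 2.0], [3.0, 4.0]]
tts_output = [[0.5, 2.5], [3.0, 3.0]]

target = residual_target(ground_truth, tts_output)
# Stand-in for the learned model: assume it predicts the residual perfectly.
refined = refine(tts_output, target)
```

With a perfect residual prediction the refined spectrogram equals the ground truth; in practice the quality of the refinement depends on how well the residual model approximates this target.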

preprint, 2011, arXiv

Extended BIC for linear regression models with diverging number of relevant features and high or ultra-high feature spaces

In many conventional scientific investigations with high or ultra-high dimensional feature spaces, the relevant features, though sparse, are large in number compared with classical statistical problems, and the magnitude of their effects tapers off. It is reasonable to model the number of relevant features as a diverging sequence when sample size increases. In this article, we investigate the properties of the extended Bayes information criterion (EBIC) (Chen and Chen, 2008) for feature selection in linear regression models with diverging number of relevant features in high or ultra-high dimensional feature spaces. The selection consistency of the EBIC in this situation is established. The application of EBIC to feature selection is considered in a two-stage feature selection procedure. Simulation studies are conducted to demonstrate the performance of the EBIC together with the two-stage feature selection procedure in finite sample cases.
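The abstract does not restate the criterion it studies. As a rough illustration, the sketch below computes the EBIC in the form usually attributed to Chen and Chen (2008) for a Gaussian linear model: n log(RSS/n) + k log n + 2 gamma log C(p, k), where k is the number of selected features out of p candidates. The function names, the argument layout, and the choice to pass the residual sum of squares directly are assumptions made for this example.

```python
import math


def log_binomial(p, k):
    """Natural log of the binomial coefficient C(p, k), computed with
    lgamma so that very large p does not create huge integers."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic_linear(rss, n, p, k, gamma=1.0):
    """EBIC score of a linear model with k selected features.

    rss   -- residual sum of squares of the fitted submodel
    n     -- sample size
    p     -- total number of candidate features
    k     -- number of features in the submodel
    gamma -- weight of the model-space penalty (gamma = 0 gives ordinary BIC)
    """
    if not 0 <= k <= p:
        raise ValueError("k must satisfy 0 <= k <= p")
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    goodness_of_fit = n * math.log(rss / n)
    bic_penalty = k * math.log(n)
    model_space_penalty = 2.0 * gamma * log_binomial(p, k)
    return goodness_of_fit + bic_penalty + model_space_penalty


# Example: 3 features selected out of 1000 candidates, 100 samples.
score_bic = ebic_linear(rss=50.0, n=100, p=1000, k=3, gamma=0.0)
score_ebic = ebic_linear(rss=50.0, n=100, p=1000, k=3, gamma=1.0)
```

The extra term grows with the number of candidate submodels of size k, so for the same fit the EBIC penalizes a selection from a very large feature space more heavily than BIC does.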

preprint, 2011, arXiv

Selection Consistency of EBIC for GLIM with Non-canonical Links and Diverging Number of Parameters

In this article, we investigate the properties of the EBIC in variable selection for generalized linear models with non-canonical links and diverging number of parameters in ultra-high dimensional feature space. The selection consistency of the EBIC in this situation is established under moderate conditions. The finite sample performance of the EBIC coupled with a forward selection procedure is demonstrated through simulation studies and a real data analysis.

preprint, 2011, arXiv

Sequential Lasso for feature selection with ultra-high dimensional feature space

We propose a novel approach, Sequential Lasso, for feature selection in linear regression models with ultra-high dimensional feature spaces. We investigate in this article the asymptotic properties of Sequential Lasso and establish its selection consistency. Like other sequential methods, the implementation of Sequential Lasso is not limited by the dimensionality of the feature space. It has advantages over other sequential methods. The simulation studies comparing Sequential Lasso with other sequential methods are reported.
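The abstract does not describe the individual steps of Sequential Lasso, so the following is only a generic sketch of a sequential selection loop, not the authors' procedure. It assumes, for illustration, that each step admits the not-yet-selected feature with the largest absolute inner product with the current residual, and that the residual is then made orthogonal to all selected features. The function names and the fixed step count are hypothetical, and the features are assumed to be on a comparable scale.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def greedy_sequential_select(columns, y, n_steps):
    """Select up to n_steps feature indices, one per step.

    columns -- list of feature columns, each a list of n values
    y       -- response vector of length n
    """
    selected = []
    basis = []  # orthonormal basis of the span of the selected columns
    residual = list(y)
    for _ in range(n_steps):
        best_j, best_score = None, 0.0
        for j, col in enumerate(columns):
            if j in selected:
                continue
            score = abs(dot(col, residual))
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:  # residual is orthogonal to every remaining feature
            break
        # Gram-Schmidt: orthogonalize the new column against the current basis.
        q = list(columns[best_j])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm < 1e-12:  # new column adds no new direction
            break
        q = [qi / norm for qi in q]
        basis.append(q)
        selected.append(best_j)
        # Remove the newly explained component from the residual.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return selected


# Toy example: y depends strongly on feature 2 and weakly on feature 0.
columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
y = [1, 0, 3, 0]
order = greedy_sequential_select(columns, y, n_steps=2)
```

Each step only needs inner products between the residual and the remaining columns, which is consistent with the abstract's remark that sequential methods are not limited by the dimensionality of the feature space.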

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint