Researcher profile

Cheng Gong

Cheng Gong contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
13 topics
4 close collaborators

Published work

11 published items

preprint, 2026, arXiv

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis

While emotional text-to-speech (TTS) has made significant progress, most existing research remains limited to utterance-level emotional expression and fails to support word-level control. Achieving word-level expressive control poses fundamental challenges, primarily due to the complexity of modeling multi-emotion transitions and the scarcity of annotated datasets that capture intra-sentence emotional and prosodic variation. In this paper, we propose WeSCon, the first self-training framework that enables word-level control of both emotion and speaking rate in a pretrained zero-shot TTS model, without relying on datasets containing intra-sentence emotion or speed transitions. Our method introduces a transition-smoothing strategy and a dynamic speed control mechanism to guide the pretrained TTS model in performing word-level expressive synthesis through a multi-round inference process. To further simplify the inference, we incorporate a dynamic emotional attention bias mechanism and fine-tune the model via self-training, thereby activating its ability for word-level expressive control in an end-to-end manner. Experimental results show that WeSCon effectively overcomes data scarcity, achieving state-of-the-art performance in word-level emotional expression control while preserving the strong zero-shot synthesis capabilities of the original TTS model.

preprint, 2022, arXiv

An Ensemble Learning Framework for Vehicle Trajectory Prediction in Interactive Scenarios

Precisely modeling interactions and accurately predicting the trajectories of surrounding vehicles are essential to the decision-making and path-planning of intelligent vehicles. This paper proposes a novel framework based on ensemble learning to improve the performance of trajectory prediction in interactive scenarios. The framework is termed Interactive Ensemble Trajectory Predictor (IETP). IETP assembles interaction-aware trajectory predictors as base learners to build an ensemble learner. First, each base learner in IETP observes the historical trajectories of vehicles in the scene. Then, each base learner handles interactions between vehicles to predict trajectories. Finally, an ensemble learner is built to predict trajectories by applying two ensemble strategies to the predictions from all base learners. Predictions generated by the ensemble learner are the final outputs of IETP. In this study, three experiments using different data are conducted based on the NGSIM dataset. Experimental results show that IETP improves prediction accuracy and decreases the variance of errors compared to the base learners. In addition, IETP exceeds baseline models with 50% of the training data, indicating that IETP is data-efficient. The implementation of IETP is publicly available at https://github.com/BIT-Jack/IETP.
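As a rough illustration of how predictions from several base learners can be combined, the sketch below averages trajectories point by point. The abstract does not say which two ensemble strategies IETP uses, so plain and weighted averaging are assumptions chosen for illustration, and the function names `average_ensemble` and `weighted_ensemble` are hypothetical.

```python
from statistics import fmean

# A trajectory is a list of (x, y) positions, one per future time step.
# `predictions` holds one trajectory per base learner, all of equal length.


def average_ensemble(predictions):
    """Average the base learners' predictions at each time step."""
    return [
        (fmean(p[0] for p in step), fmean(p[1] for p in step))
        for step in zip(*predictions)
    ]


def weighted_ensemble(predictions, weights):
    """Weighted average; the weights are hypothetical (e.g. validation scores)."""
    total = sum(weights)
    return [
        (
            sum(w * p[0] for w, p in zip(weights, step)) / total,
            sum(w * p[1] for w, p in zip(weights, step)) / total,
        )
        for step in zip(*predictions)
    ]


# Two base learners, each predicting two future positions.
base_predictions = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0), (3.0, 3.0)],
]
mean_trajectory = average_ensemble(base_predictions)
weighted_trajectory = weighted_ensemble(base_predictions, [1.0, 3.0])
```

Averaging is one common way an ensemble can reduce the variance of individual predictors, which is consistent with the variance reduction reported in the abstract, though the actual IETP strategies may differ.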

preprint, 2022, arXiv

Gravitational wave constraints on Lorentz and parity violations in gravity: high-order spatial derivative cases

High-order spatial derivatives are of crucial importance for constructing the low-energy effective action of a Lorentz- or parity-violating theory of quantum gravity. One example is Hořava-Lifshitz gravity, in which one has to consider at least sixth-order spatial derivatives in the gravitational action in order to make the theory power-counting renormalizable. In this paper, we consider the Lorentz- and parity-violating effects on the propagation of gravitational waves (GWs) due to the fifth- and sixth-order spatial derivatives, respectively. For this purpose, we calculate the corresponding Lorentz- and parity-violating waveforms of GWs produced by the coalescence of compact binaries. Using these modified waveforms, we perform full Bayesian inference with the help of the open-source software Bilby on selected GW events from binary black hole (BBH) and binary neutron star (BNS) mergers in the LIGO-Virgo catalogs GWTC-1 and GWTC-2. Overall, we do not find any significant evidence of Lorentz or parity violation due to the fifth- and sixth-order spatial derivatives, and thus place lower bounds on the energy scales of $M_{\rm LV} > 2.4 \times 10^{-16} \; {\rm GeV}$ for Lorentz violation and $M_{\rm PV} > 1.0 \times 10^{-14} \; {\rm GeV}$ for parity violation at the 90% confidence level. Both represent the first constraints on the fifth- and sixth-order spatial derivative terms, respectively, in the framework of spatial covariant gravity using GW observational data.

preprint, 2021, arXiv

DiDiSpeech: A Large Scale Mandarin Speech Corpus

This paper introduces a new open-source Mandarin speech corpus called DiDiSpeech. It consists of about 800 hours of speech data at a 48 kHz sampling rate from 6000 speakers, together with the corresponding texts. All speech data in the corpus was recorded in a quiet environment and is suitable for various speech processing tasks, such as voice conversion, multi-speaker text-to-speech and automatic speech recognition. We conduct experiments on multiple speech tasks and evaluate the performance, showing that the corpus is promising for both academic research and practical applications. The corpus is available at https://outreach.didichuxing.com/research/opendata/.

preprint, 2021, arXiv

Fundamental group of Galois covers of degree $6$ surfaces

In this paper we consider the Galois covers of algebraic surfaces of degree 6, with all associated planar degenerations. We compute the fundamental groups of those Galois covers, using their degeneration. We show that for 8 types of degenerations the fundamental group of the Galois cover is non-trivial and for 20 types it is trivial. Moreover, we compute the Chern numbers of all the surfaces with this type of degeneration and prove that the signatures of all their Galois covers are negative. We formulate a conjecture regarding the structure of the fundamental groups of the Galois covers based on our findings. With an appendix by the authors listing the detailed computations and an appendix by Guo Zhiming classifying degree 6 planar degenerations.

preprint, 2020, arXiv

Driver Behavior Modelling at the Urban Intersection via Canonical Correlation Analysis

The urban intersection is a typically dynamic and complex scenario for intelligent vehicles, involving a variety of driving behaviors and traffic participants. Accurately modelling driver behavior at the intersection is essential for intelligent transportation systems (ITS). Previous research mainly focuses on using attention mechanisms to model the degree of correlation. In this research, a canonical correlation analysis (CCA)-based framework is proposed. The value of the canonical correlation is used for feature selection. A Gaussian mixture model and Gaussian process regression are applied for driver behavior modelling. Two experiments using simulated and naturalistic driving data are designed for verification. Experimental results are consistent with the driver's judgment. Comparative studies show that the proposed framework achieves better performance.

preprint, 2020, arXiv

High-precision target positioning system for unmanned vehicles based on binocular vision

Unmanned vehicles often need to locate targets with high precision during operation. In an unmanned material handling workshop, the vehicle needs to perform high-precision pose estimation of the workpiece in order to grasp it accurately. In this context, this paper proposes a high-precision target positioning system for unmanned vehicles based on binocular vision. The system uses a region-based stereo matching algorithm to obtain a disparity map, and uses the RANSAC algorithm to extract position and posture features, which achieves estimation of the position and attitude of a six-degree-of-freedom cylindrical workpiece. To verify the system, this paper measures the accuracy and calculation time of the output results for the cylinder in different poses. The experimental data show that the position accuracy of the system is 0.61-1.17 mm and the angular accuracy is 1.95-5.13°, which indicates that the system can achieve high-precision positioning.

preprint, 2020, arXiv

Resonant Asymmetric All-Dielectric Metasurface for Boosting Third-Harmonic Generation

Resonant metasurfaces have received extensive attention due to their sharp spectral features and extraordinary field enhancement. In this work, by breaking the in-plane symmetry of silicon nanopillars, we achieve a sharp Fano resonance. The far-field radiation and near-field distribution of the metasurfaces are calculated and analyzed to further uncover their resonant performance. Moreover, the theoretical derivation and simulation exhibit an inverse quadratic dependence of the Q-factors on the asymmetry parameters, revealing that the resonance is governed by symmetry-protected bound states in the continuum. Finally, we experimentally demonstrate the sharp resonance and employ it to efficiently boost third-harmonic generation. This enhancement can be attributed to the strong optical intensity enhancement inside the metasurface.
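The inverse quadratic dependence mentioned in the abstract can be written out as a short worked relation. The symbols are assumptions made here for illustration: $\alpha$ denotes the asymmetry parameter and $Q_0$ a structure-dependent constant, neither of which is named in the abstract.

```latex
Q(\alpha) = Q_0 \, \alpha^{-2}
\quad\Longrightarrow\quad
\frac{Q(\alpha/2)}{Q(\alpha)}
  = \frac{Q_0 \, (\alpha/2)^{-2}}{Q_0 \, \alpha^{-2}}
  = 4
```

Under this scaling, halving the asymmetry quadruples the Q-factor, and $Q$ grows without bound as $\alpha \to 0$, which is the behavior expected of a symmetry-protected bound state in the continuum.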

preprint, 2020, arXiv

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and final accuracy is complex and non-convex, which makes it difficult to optimize directly. Minimizing the direct quantization loss (DQL) of the coefficient data is an effective local optimization method, but previous works often neglect accurate control of the DQL, resulting in a higher loss of final DNN model accuracy. In this paper, we propose a novel metric called Vector Loss. Based on this new metric, we develop a new quantization solution called VecQ, which can guarantee minimal direct quantization loss and better model accuracy. In addition, to speed up the proposed quantization process during model training, we accelerate it with a parameterized probability estimation method and template-based derivation calculation. We evaluate our proposed algorithm on MNIST, CIFAR, ImageNet, IMDB movie review and THUCNews text data sets with numerical DNN models. The results demonstrate that our proposed quantization solution is more accurate and effective than the state-of-the-art approaches, with more flexible bitwidth support. Moreover, the evaluation of our quantized models on Saliency Object Detection (SOD) tasks maintains comparable feature extraction quality with up to 16$\times$ weight size reduction.
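To make the notion of direct quantization loss concrete, the sketch below quantizes a small weight vector with plain uniform rounding and measures the squared L2 error. This is not VecQ or its Vector Loss metric, which the abstract does not define; the squared-error definition of DQL, the uniform quantizer, and both function names are assumptions for illustration only.

```python
def quantize_uniform(weights, bits):
    """Round each weight to the nearest of 2**bits evenly spaced levels.

    The levels span [min(weights), max(weights)]. This is a baseline
    quantizer used for illustration, not the VecQ method.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # all weights equal: nothing to quantize
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def direct_quantization_loss(weights, quantized):
    """Squared L2 distance between original and quantized weights (assumed definition)."""
    return sum((w - q) ** 2 for w, q in zip(weights, quantized))


weights = [0.0, 0.2, 0.9, 1.0]
loss_1bit = direct_quantization_loss(weights, quantize_uniform(weights, 1))
loss_2bit = direct_quantization_loss(weights, quantize_uniform(weights, 2))
```

In this example the 2-bit loss is smaller than the 1-bit loss, which illustrates the bitwidth and error trade-off the abstract refers to. A method such as VecQ aims to choose the quantized values so that this kind of loss is minimized for a given bitwidth.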

preprint, 2013, arXiv

Band alignment of two-dimensional transition metal dichalcogenides: application in tunnel field effect transistors

Tunnel field effect transistors (TFETs) based on vertical stacking of two-dimensional materials are of interest for low-power logic devices. The monolayer transition metal dichalcogenides (TMDs) with sizable band gaps show promise in building p-n junctions (couples) for TFET applications. Band alignment information is essential for realizing broken gap junctions with excellent electron tunneling efficiencies. Promising couples composed of monolayer TMDs are suggested to be VIB-MeX2 (Me = W, Mo; X = Te, Se) as the n-type source and IVB-MeX2 (Me = Zr, Hf; X = S, Se) as the p-type drain by density functional theory calculations.

preprint, 2011, arXiv

Field emission from atomically thin edges of reduced graphene oxide

Point sources exhibit low threshold electron emission due to local field enhancement at the tip. The development and implementation of tip emitters have been hampered by the need to position them sufficiently apart to achieve field enhancement, limiting the number of emission sites and therefore the overall current. Here we report low threshold field (< 0.1 V/µm) emission of multiple electron beams from atomically thin edges of reduced graphene oxide (rGO). Field emission microscopy (FEM) measurements show evidence for interference from emission sites that are separated by a few nanometers, suggesting that the emitted electron beams may be coherent. Based on our high-resolution transmission electron microscopy, infrared spectroscopy and simulation results, field emission from the rGO edge is attributed to a stable and unique aggregation of oxygen groups in the form of cyclic edge ethers. Such closely spaced electron beams from rGO offer prospects for novel applications and understanding the physics of linear electron sources.