Researcher profile

Kai Zhu

Kai Zhu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
13 works
0 followers
11 topics
4 close collaborators

Published work

13 published items

preprint · 2026 · arXiv

EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing

Speech-driven 3D facial animation aims to generate realistic and expressive facial motions directly from audio. While recent methods achieve high-quality lip synchronization, they often rely on discrete emotion categories, limiting continuous and fine-grained emotional control. We present EditEmoTalk, a controllable speech-driven 3D facial animation framework with continuous emotion editing. The key idea is a boundary-aware semantic embedding that learns the normal directions of inter-emotion decision boundaries, enabling a continuous expression manifold for smooth emotion manipulation. Moreover, we introduce an emotional consistency loss that enforces semantic alignment between the generated motion dynamics and the target emotion embedding through a mapping network, ensuring faithful emotional expression. Extensive experiments demonstrate that EditEmoTalk achieves superior controllability, expressiveness, and generalization while maintaining accurate lip synchronization. Code and pretrained models will be released.

preprint · 2026 · arXiv

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid evaluation pipelines, preventing systematic and reliable assessment of modern MSAV models. To bridge these gaps, we introduce MSAVBench, the first comprehensive benchmark and adaptive hybrid evaluation framework for multi-shot audio-video generation. Our benchmark spans four key dimensions (video, audio, shot, and reference), covering diverse task settings, varying shot counts of up to 15, and challenging non-realistic scenarios. Our evaluation framework improves robustness through an adaptive self-correction mechanism for shot segmentation, instance-wise rubrics for subjective metrics, and tool-grounded evidence extraction for complex judgments. Furthermore, MSAVBench achieves high alignment with human judgments, reaching a Spearman rank correlation of 91.5%. Our systematic evaluation of 19 state-of-the-art closed- and open-source models shows that current systems still struggle with director-level control and fine-grained audio-visual synchronization, while modular or agentic generation pipelines offer a promising path toward narrowing the gap between open- and closed-source models. We will release the benchmark data and evaluation code to facilitate future research.
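For readers unfamiliar with the agreement measure quoted in this abstract, the following is a minimal sketch of how a Spearman rank correlation between human ratings and an automatic metric can be computed. The scores are hypothetical illustration values, not data from the paper, and the helper names (`ranks`, `pearson`, `spearman`) are assumptions of this sketch rather than part of any MSAVBench code.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five generated videos (not taken from the paper).
human_scores = [4.5, 3.0, 2.0, 5.0, 1.0]
metric_scores = [0.80, 0.65, 0.70, 0.95, 0.10]
rho = spearman(human_scores, metric_scores)
print(f"Spearman rho = {rho:.3f}")
```

With these illustrative inputs, only two of the five items swap rank order between the two lists, so the correlation is high but below 1. A reported value of 91.5% corresponds to a coefficient of 0.915 on this scale.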

preprint · 2024 · arXiv

A Data-driven dE/dx Simulation with Normalizing Flow

In high-energy physics, precise measurements rely on highly reliable detector simulations. Traditionally, these simulations involve incorporating experimental data to model detector responses and fine-tuning them. However, due to the complexity of the experimental data, tuning the simulation can be challenging. One crucial aspect for charged particle identification is the measurement of energy deposition per unit length (referred to as dE/dx). This paper proposes a data-driven dE/dx simulation method using the Normalizing Flow technique, which can learn the dE/dx distribution directly from experimental data. This method both eliminates the need for manual tuning of the dE/dx simulation and achieves high-precision simulation.

preprint · 2022 · arXiv

FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization

MLP-like models, built entirely upon multi-layer perceptrons, have recently been revisited and exhibit performance comparable to transformers. They are among the most promising architectures because of their excellent trade-off between network capability and efficiency in large-scale recognition tasks. However, their generalization performance on heterogeneous tasks is inferior to that of other architectures (e.g., CNNs and transformers) due to the extensive retention of domain information. To address this problem, we propose a novel frequency-aware MLP architecture, in which the domain-specific features are filtered out in the transformed frequency domain, augmenting the invariant descriptor for label prediction. Specifically, we design an adaptive Fourier filter layer, in which a learnable frequency filter is utilized to adjust the amplitude distribution by optimizing both the real and imaginary parts. A low-rank enhancement module is further proposed to rectify the filtered features by adding the low-frequency components from a singular value decomposition (SVD). Finally, a momentum update strategy is utilized to stabilize the optimization against fluctuations of model parameters and inputs through output distillation with weighted historical states. To the best of our knowledge, we are the first to propose an MLP-like backbone for domain generalization. Extensive experiments on three benchmarks demonstrate significant generalization performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
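The core operation behind a frequency filter of the kind this abstract describes can be sketched as: transform to the frequency domain, multiply each frequency bin by a complex weight, and transform back. The sketch below is an assumption-laden illustration, not the FAMLP layer itself: it works on a 1D signal, uses a plain discrete Fourier transform, and fixes the weights by hand, whereas the abstract describes weights whose real and imaginary parts are learned. The names `dft`, `idft`, and `fourier_filter` are hypothetical.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a sequence of numbers."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft, including the 1/n normalization."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def fourier_filter(signal, weights):
    """Scale each frequency bin by a complex weight, then return to the signal domain.

    Only the real part of the result is kept; it equals the full result when
    the weights are conjugate-symmetric, as they are in the examples below.
    """
    spectrum = dft(signal)
    filtered = [s * w for s, w in zip(spectrum, weights)]
    return [value.real for value in idft(filtered)]


signal = [1.0, 2.0, 3.0, 4.0]

# All-ones weights leave the signal unchanged (up to rounding).
unchanged = fourier_filter(signal, [1, 1, 1, 1])

# Keeping only the zero-frequency bin leaves the signal mean at every position.
dc_only = fourier_filter(signal, [1, 0, 0, 0])

print(unchanged)
print(dc_only)
```

In a trainable version, the entries of `weights` would be parameters updated by gradient descent, which is how such a layer could learn to suppress frequency components that carry domain-specific information.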

preprint · 2022 · arXiv

Selecting the physical solution via $η$-$η'$ mixing

Based on $η$-$η'$ mixing analysis, we propose a novel method to extract the physical solutions for the hadronic properties of the $Y(4230)$ resonance from the experimental data. Experimentally, multiple solutions have been reported in the decays of $Y(4230) \to η J/ψ$ and $Y(4230) \to η' J/ψ$. Utilizing our method, we determine a unique solution for the process $Y(4230) \to η' J/ψ$. Likewise, two solutions for the process $Y(4230) \to η J/ψ$ are preferred among the originally reported three solutions under the assumption that $Y(4230)$ does not have an $s\bar{s}$ component.

preprint · 2022 · arXiv

Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning

Non-exemplar class-incremental learning aims to recognize both the old and new classes when old class samples cannot be saved. It is a challenging task since representation optimization and feature retention can only be achieved under supervision from new classes. To address this problem, we propose a novel self-sustaining representation expansion scheme. Our scheme consists of a structure reorganization strategy that fuses main-branch expansion and side-branch updating to maintain the old features, and a main-branch distillation scheme to transfer the invariant knowledge. Furthermore, a prototype selection mechanism is proposed to enhance the discrimination between the old and new classes by selectively incorporating new samples into the distillation process. Extensive experiments on three benchmarks demonstrate significant incremental performance, outperforming the state-of-the-art methods by a margin of 3%, 3% and 6%, respectively.

preprint · 2022 · arXiv

Tensor amplitudes for partial wave analysis of $ψ\toΔ\barΔ$ within helicity frame

We have derived the tensor amplitudes for partial wave analysis of $ψ\toΔ\barΔ$, $Δ\to p π$ within the helicity frame, as well as the amplitudes for the other decay sequences with the same final states. These formulae are practical for experiments measuring $ψ$ decays into $p \bar{p}π^+ π^-$ final states, such as BESIII with its recently collected huge $J/ψ$ and $ψ(2S)$ data samples.

preprint · 2021 · arXiv

An Adaptive Interpolation Scheme for Wideband Frequency Sweep in Electromagnetic Simulations

An adaptive interpolation scheme is proposed to accurately calculate the wideband responses in electromagnetic simulations. In the proposed scheme, the sampling points are first carefully divided into several groups based on their responses to avoid the Runge phenomenon and the error fluctuations, and then different interpolation strategies are used to calculate the responses in the whole frequency band. If the relative error does not satisfy the predefined threshold in a specific frequency band, it is refined until the error criterion is met. A detailed error analysis is also presented to verify the accuracy of the interpolation scheme. Finally, two numerical examples, an antenna radiation problem and a filter simulation, are carried out to validate its accuracy and efficiency.
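The Runge phenomenon mentioned in this abstract can be reproduced with a few lines of code. The sketch below is only an illustration of the phenomenon, not the paper's grouping scheme: it interpolates the standard test function 1/(1 + 25x²) on [-1, 1] with a single polynomial through 11 points, once with equally spaced nodes and once with Chebyshev nodes (a well-known remedy that is swapped in here for comparison). All function names are hypothetical.

```python
import math


def runge(x):
    """Classic test function whose equispaced interpolants oscillate near the ends."""
    return 1.0 / (1.0 + 25.0 * x * x)


def lagrange_eval(nodes, values, x):
    """Evaluate the Lagrange interpolating polynomial through (nodes, values) at x."""
    total = 0.0
    for i, xi in enumerate(nodes):
        term = values[i]
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def max_error(nodes, n_test=1001):
    """Largest absolute interpolation error on a fine uniform grid over [-1, 1]."""
    values = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / (n_test - 1) for k in range(n_test)]
    return max(abs(lagrange_eval(nodes, values, x) - runge(x)) for x in grid)


n = 11  # number of sampling points, i.e. a degree-10 polynomial
equispaced = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
chebyshev = [math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]

err_equi = max_error(equispaced)
err_cheb = max_error(chebyshev)
print(f"equispaced nodes: max error = {err_equi:.3f}")
print(f"Chebyshev nodes:  max error = {err_cheb:.3f}")
```

The equispaced interpolant develops large oscillations near the interval ends, while the Chebyshev-node interpolant stays much closer to the function. This is the kind of failure that a scheme must avoid when it interpolates a frequency response from sampled points.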

preprint · 2021 · arXiv

Systematic errors induced by the elliptical power-law model in galaxy-galaxy strong lens modeling

The elliptical power-law (EPL) model of the mass in a galaxy is widely used in strong gravitational lensing analyses. However, the distribution of mass in real galaxies is more complex. We quantify the biases due to this model mismatch by simulating and then analysing mock Hubble Space Telescope imaging of lenses with mass distributions inferred from SDSS-MaNGA stellar dynamics data. We find accurate recovery of source galaxy morphology, except for a slight tendency to infer sources to be more compact than their true size. The Einstein radius of the lens is also robustly recovered with 0.1% accuracy, as is the global density slope, with 2.5% relative systematic error, compared to the 3.4% intrinsic dispersion. However, asymmetry in real lenses also leads to a spurious fitted 'external shear' with typical strength $γ_{\rm ext}=0.015$. Furthermore, time delays inferred from lens modelling without measurements of stellar dynamics are typically underestimated by $\sim$5%. Using such measurements from a sub-sample of 37 lenses would bias measurements of the Hubble constant $H_0$ by $\sim$9%. Although this work is based on a particular set of MaNGA galaxies, and the specific value of the detected biases may change for another set of strong lenses, our results strongly suggest the next generation cosmography needs to use more complex lens mass models.

preprint · 2020 · arXiv

Amplitudes separation and strong-electromagnetic relative phase in the $ψ(2S)$ decays into baryons

The strong, electromagnetic and mixed strong-electromagnetic amplitudes of the $ψ(2S)$ decays into baryon-anti-baryon pairs have been obtained by exploiting all available data sets in the framework of an effective Lagrangian model. We observed that at the $ψ(2S)$ mass the QCD regime is not completely perturbative, as can be inferred from the relative strength of the strong and the mixed strong-electromagnetic amplitudes. Recently a similar conclusion has also been reached for the $J/ψ$ decays. The relative phase between the strong and the electromagnetic amplitudes is $φ= (58\pm 8)^\circ$, to be compared with $φ= (73\pm 8)^\circ$ obtained for the $J/ψ$. On the other hand, in the case of the $ψ(2S)$ meson, different values of the ratio between strong and mixed strong-electromagnetic amplitudes are phenomenologically required, while for the $J/ψ$ meson only one ratio was enough to describe the data. Finally, we also observed a peculiar behavior of the mixed strong-electromagnetic amplitudes of the decays $ψ(2S)\toΣ^+ \overline Σ^-$ and $ψ(2S)\toΣ^- \overline Σ^+$.

preprint · 2020 · arXiv

An analysis of carrier dynamics in methylammonium lead triiodide perovskite solar cells using cross-correlation noise spectroscopy

Using cross-correlation current noise spectroscopy, we have investigated carrier dynamics in methylammonium lead triiodide solar cells. This method provides spatial selectivity for devices with a planar multi-layered structure, effectively amplifying current noise contributions coming from the most resistive element of the stack. In the studied solar cells, we observe near full-scale shot noise, indicating the dominance of noise generation by a single source, likely the interface between the perovskite and the spiro-OMeTAD hole-transport layer. We argue that the strong 1/f noise term has contributions both from the perovskite layer and interfaces. It displays non-ideal dependence on photocurrent, $S \propto I^{1.4}$ (instead of the usual $S \propto I^2$), which is likely due to current-induced halide migration. Finally, we observe generation-recombination noise. The relaxation time of this process grows linearly with photocurrent, which allows us to attribute this contribution to bimolecular recombination in the perovskite bulk absorption layer. Extrapolating our results, we estimate that at the standard 1 sun illumination the electron-hole recombination time is 5 microseconds.

preprint · 2020 · arXiv

Lessons on Star-forming Ultra-diffuse Galaxies from The Stacked Spectra of Sloan Digital Sky Survey

We investigate the average properties of 28 star-forming ultra-diffuse galaxies (UDGs) located in low-density environments by stacking their spectra from the Sloan Digital Sky Survey. These relatively isolated UDGs, with stellar masses of $\log_{10}(M_*/M_{\odot})\sim 8.57\pm0.29$, have an average total stellar metallicity [M/H]$\sim -0.82\pm0.14$, iron metallicity [Fe/H]$\sim -1.00\pm0.16$, stellar age $t_*\sim5.2\pm0.5$ Gyr, $α$-enhancement [$α$/Fe]$\sim 0.24\pm0.10$, and oxygen abundance 12+log(O/H)$\sim 8.16\pm0.06$, as well as a central stellar velocity dispersion of $54\pm12$ km/s. On the star-formation rate versus stellar mass diagram, these UDGs lie below the star-forming main sequence extrapolated from massive spirals, but roughly follow the main sequence of low-surface-brightness dwarf galaxies. We find that these star-forming UDGs are not particularly metal-poor or metal-rich for their stellar masses, as compared with the metallicity-mass relations of typical nearby dwarfs. With the UDG data of this work and previous studies, we also find a coarse correlation between [Fe/H] and magnesium-element enhancement [Mg/Fe] for UDGs: [Mg/Fe]$\simeq-0.43(\pm0.26)$[Fe/H]$-0.14(\pm0.40)$.
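As a quick worked example of the linear relation quoted at the end of this abstract, the sketch below evaluates it at the stacked iron metallicity [Fe/H] = -1.00 reported in the same abstract. It uses the central values of the slope and intercept only and ignores the quoted uncertainties, which are large. The function name `mg_fe_from_fe_h` is a hypothetical helper.

```python
def mg_fe_from_fe_h(fe_h, slope=-0.43, intercept=-0.14):
    """Central-value estimate of [Mg/Fe] from [Fe/H] using the quoted linear fit."""
    return slope * fe_h + intercept


stacked_fe_h = -1.00  # stacked [Fe/H] value quoted in the abstract
predicted_mg_fe = mg_fe_from_fe_h(stacked_fe_h)
print(f"[Mg/Fe] at [Fe/H] = {stacked_fe_h:.2f}: {predicted_mg_fe:.2f}")
```

With the central values, the relation gives [Mg/Fe] of about 0.29 at [Fe/H] = -1.00. Given uncertainties of 0.26 on the slope and 0.40 on the intercept, this should be read as a rough indication, consistent with the abstract's description of the correlation as coarse.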

preprint · 2020 · arXiv

One-Shot Texture Retrieval with Global Context Metric

In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image. To address this problem, we present an OS-TR network that encodes both the reference and query images in order to segment textures of the reference category. Unlike the existing texture encoding methods that integrate CNNs with orderless pooling, we propose a directionality-aware module to capture the texture variations in each direction, resulting in a spatially invariant representation. To segment new categories given only a few examples, we incorporate a self-gating mechanism into a relation network to exploit global context information for adjusting per-channel modulation weights of local relation features. Extensive experiments on benchmark texture datasets and real scenarios demonstrate the above-par segmentation performance and robust cross-domain generalization of our proposed method.