Source author record

Zhen Xiao

Zhen Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Machine Learning, Computation and Language, Computer Vision, eess.AS, Methodology, Multiagent Systems, Robotics, Sound

Catalog footprint

What is connected

7 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis

High-quality, real-time talking head synthesis remains a fundamental challenge in computer vision. Existing reconstruction- and rendering-based methods typically rely on identity-specific models, limiting cross-identity generalization. To address this issue, we propose SDTalk, a one-shot 3D Gaussian Splatting (3DGS)-based framework that generalizes to unseen identities without personalized training or fine-tuning. Our framework comprises two modules with a two-stage training strategy. In the first stage, we incorporate structured facial priors into the reconstruction module and separately predict 3DGS parameters for visible and occluded regions, enabling complete head reconstruction from a single image. In the second stage, we introduce a dual-branch motion field to model coarse and fine facial dynamics, improving detail fidelity and lip synchronization. Experiments demonstrate that SDTalk surpasses existing methods in both visual quality and inference efficiency.

preprint · 2026 · arXiv

Sub-Footprint Effect Correction in FW-LiDAR Point Clouds via Intra-Footprint Target Unmixing

Sub-footprint target mixing within a laser footprint significantly increases LiDAR intensity uncertainty, especially in complex environments where heterogeneous materials inside one footprint cause nonlinear distortions that impair intensity-based applications. However, the forward mixing inherent to the single-pixel detection mode of LiDAR systems blurs sub-footprint contributions, making sub-footprint effects difficult to address effectively in existing studies. To address this issue, we introduce a novel, physics-based framework that explicitly resolves sub-footprint intensity correction in full-waveform LiDAR (FW-LiDAR) point clouds. The key innovation is to make the otherwise implicit intra-footprint mixing process explicit: we first develop a spatiotemporal laser-beam distribution model to physically characterize within-footprint forward mixing of multi-target returns. Building on this formulation, we incorporate ancillary information including waveform parameters and surface geometry as constraints to pose a well-defined inverse unmixing problem and decompose each footprint into fractional contributions from multiple sub-targets. We then recover sub-footprint-corrected intensities by inverting the observed mixtures through a unified combination of parametric and model-driven approaches. To the best of our knowledge, few prior studies explicitly establish sub-footprint inversion and correction within a single laser footprint, and our framework offers a principled, physics-grounded solution. Experiments on both controlled and real-world LiDAR datasets demonstrate that the proposed method significantly enhances semantic separability across heterogeneous targets and intensity consistency across homogeneous targets.

preprint · 2023 · arXiv

Transformer in Transformer as Backbone for Deep Reinforcement Learning

Designing better deep networks and designing better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network from several modules such as CNN, LSTM and Attention. Recent methods combine the Transformer with these modules for better performance. However, training a network composed of mixed modules requires tedious optimization skills, which makes these methods inconvenient to use in practice. In this paper, we propose to design pure Transformer-based networks for deep RL, aiming at providing off-the-shelf backbones for both the online and offline settings. Specifically, the Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT can achieve satisfactory performance in different settings consistently.
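
The inner/outer cascade described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single-head attention, the mean pooling and the random weights are all assumptions made for illustration, and residual connections, layer normalization, feed-forward layers and positional embeddings are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    # Single-head scaled dot-product self-attention over (n_tokens, d).
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def tit_features(history, inner_w, outer_w):
    # history has shape (T, P, d): T observations, each split into P patch tokens.
    # Inner stage (assumed): attend within one observation, mean-pool to one vector.
    per_obs = np.stack(
        [self_attention(obs, *inner_w).mean(axis=0) for obs in history]
    )
    # Outer stage (assumed): attend across the T pooled observation vectors.
    return self_attention(per_obs, *outer_w)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
T, P, d = 4, 9, 8
inner_w = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
outer_w = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
history = rng.normal(size=(T, P, d))

features = tit_features(history, inner_w, outer_w)
print(features.shape)  # (4, 8): one feature vector per step of the history
```

In a deep RL agent, a policy or value head would read these per-step features. The abstract does not specify that wiring, so it is omitted here.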

preprint · 2021 · arXiv

Speech-language Pre-training for End-to-end Spoken Language Understanding

End-to-end (E2E) spoken language understanding (SLU) can infer semantics directly from the speech signal without cascading an automatic speech recognizer (ASR) with a natural language understanding (NLU) module. However, paired utterance recordings and corresponding semantics may not always be available or sufficient to train an E2E SLU model in a real production environment. In this paper, we propose to unify a well-optimized E2E ASR encoder (speech) and a pre-trained language model encoder (language) into a transformer decoder. The unified speech-language pre-trained model (SLP) is continually enhanced on limited labeled data from a target domain by using a conditional masked language model (MLM) objective, and thus can effectively generate a sequence of intent, slot type, and slot value for a given input speech at inference time. The experimental results on two public corpora show that our approach to E2E SLU is superior to the conventional cascaded method. It also outperforms the present state-of-the-art approaches to E2E SLU with much less paired data.

preprint · 2020 · arXiv

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Social psychology and real experiences show that cognitive consistency plays an important role in keeping human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration, and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.

preprint · 2020 · arXiv

Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing

In cooperative multi-agent reinforcement learning (MARL), how to design a suitable reward signal to accelerate learning and stabilize convergence is a critical problem. The global reward signal assigns the same global reward to all agents without distinguishing their contributions, while the local reward signal provides different local rewards to each agent based solely on individual behavior. Both reward assignment approaches have shortcomings: the former might encourage lazy agents, while the latter might produce selfish agents. In this paper, we study the reward design problem in cooperative MARL based on packet routing environments. Firstly, we show that the above two reward signals are prone to produce suboptimal policies. Then, inspired by some observations and considerations, we design some mixed reward signals, which can be used off the shelf to learn better policies. Finally, we turn the mixed reward signals into their adaptive counterparts, which achieve the best results in our experiments. Other reward signals are also discussed in this paper. As reward design is a very fundamental problem in RL and especially in MARL, we hope that MARL researchers can rethink the rewards used in their systems.
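
The contrast between global, local and mixed reward signals can be made concrete with a small sketch. The abstract does not state how the paper's mixed signals are built, so the convex combination below, the weight alpha and the function names are assumptions for illustration only, and the global reward is assumed here to be the team mean.

```python
from typing import List

def global_reward(local_rewards: List[float]) -> List[float]:
    # Every agent receives the same team-level signal (assumed: the team mean).
    team = sum(local_rewards) / len(local_rewards)
    return [team] * len(local_rewards)

def mixed_reward(local_rewards: List[float], alpha: float) -> List[float]:
    # Assumed form: alpha weights the agent's own reward, (1 - alpha) the team mean.
    # alpha = 1.0 recovers the local signal, alpha = 0.0 the global one.
    team = sum(local_rewards) / len(local_rewards)
    return [alpha * r + (1.0 - alpha) * team for r in local_rewards]

# Hypothetical two-agent example: one agent delivers a packet, the other idles.
local = [1.0, 0.0]
print(global_reward(local))      # [0.5, 0.5]
print(mixed_reward(local, 0.5))  # [0.75, 0.25]
```

With the global signal the idle agent is paid as much as the active one (the "lazy agent" risk), while a purely local signal ignores team outcome (the "selfish agent" risk). An intermediate alpha trades the two off. An adaptive variant would change alpha during training, but the abstract gives no schedule, so none is shown.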

preprint · 2016 · arXiv

Constrained Nonlinear and Mixed Effects Differential Equation Models for Dynamic Cell Polarity Signaling

The key to tip growth in eukaryotes is the polarized distribution on the plasma membrane of a particle named ROP1. This distribution is the result of a positive feedback loop, whose mechanism can be described by a Differential Equation parametrized by two meaningful parameters, kpf and knf. We introduce a mechanistic Integro-Differential Equation (IDE) derived from a spatiotemporal model of cell polarity and we show how this model can be fitted to real data, i.e., ROP1 intensities measured on pollen tubes. At first, we provide an existence and uniqueness result for the solution of our IDE model under certain conditions. Interestingly, this analysis gives a tractable expression for the likelihood, and our approach can be seen as the estimation of a constrained nonlinear model. Moreover, we introduce population variability through a constrained nonlinear mixed model. We then propose a constrained Least Squares method to fit the model for the single pollen tube case, and two methods, constrained Method of Moments and constrained Restricted Maximum Likelihood (REML), to fit the model for the multiple pollen tubes case. The performance of all three methods is studied through simulations, and the methods are applied to an in-house multiple pollen tubes dataset generated at UC Riverside.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint