Researcher profile

Weixuan Chen

Weixuan Chen is listed as an author of three arXiv preprints on semantic token communication, omni-modal audio-visual encoding, and 6-DoF pose estimation.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 15 - Unverified · Verification L1 · Unclaimed author
3 works
0 followers
3 topics
4 close collaborators

Published work

3 published items

Preprint · 2026 · arXiv

Evolving Token Communication with Parametric Memory Network

Token communication has emerged as a promising framework for efficient wireless transmission by representing source data as compact semantic tokens. However, transmitting full semantic tokens still incurs considerable communication overhead. In this paper, we propose an evolving semantic token communication system with a parametric memory network over MIMO fading channels. Specifically, only an equal-length prefix of each semantic token is transmitted, which reduces transmission cost while preserving a consistent token structure for receiver-side recovery. At the receiver, a parametric memory network is introduced to reconstruct the missing suffix information from the received token prefixes, where semantic memory is stored implicitly in the network parameters. To realize this design, full semantic tokens are first organized into a codebook, and truncated tokens are paired with the codeword labels of their corresponding full tokens. Based on these token-label pairs, kNN-based teacher distributions are constructed to fine-tune a pretrained GPT-2-based recovery module, which learns to infer the codeword distribution of each incomplete token and recover the corresponding complete semantic token. In addition, an online evolution strategy is developed to periodically update the parametric memory network and the entire system using newly observed test samples, thereby improving adaptability under distribution shifts. Experimental results demonstrate that the proposed method consistently outperforms the existing evolving memory benchmark under different channel conditions and channel bandwidth ratios, with up to 1.09 dB PSNR improvement.
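A minimal sketch of how a kNN-based teacher distribution over codeword labels could be built from truncated token prefixes and their paired labels. The abstract does not specify the distance, the value of k, or the weighting, so squared Euclidean distance, a softmax over the k nearest stored prefixes, pooling of neighbours that share a label, and the helper names `truncate_prefix` and `knn_teacher_distribution` are all assumptions; the GPT-2-based recovery module and the MIMO channel are not modelled.

```python
import math
from typing import List, Sequence, Tuple


def truncate_prefix(token: Sequence[float], prefix_len: int) -> List[float]:
    """Keep the same-length prefix of a semantic token (the part assumed to be transmitted)."""
    return list(token[:prefix_len])


def knn_teacher_distribution(
    query: Sequence[float],
    datastore: Sequence[Tuple[Sequence[float], int]],
    num_codewords: int,
    k: int = 3,
    temperature: float = 1.0,
) -> List[float]:
    """Soft target over codeword labels from the k stored prefixes nearest to `query`.

    Hypothetical construction: `datastore` holds (truncated prefix, codeword label) pairs.
    """
    # Squared Euclidean distance from the query prefix to every stored prefix.
    scored = [
        (sum((a - b) ** 2 for a, b in zip(query, prefix)), label)
        for prefix, label in datastore
    ]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Softmax over negative distances, shifted by the smallest distance for stability.
    d_min = nearest[0][0]
    weights = [(math.exp(-(d - d_min) / temperature), label) for d, label in nearest]
    total = sum(w for w, _ in weights)
    # Neighbours that share a codeword label pool their probability mass.
    probs = [0.0] * num_codewords
    for w, label in weights:
        probs[label] += w / total
    return probs


# Hypothetical toy data: full 4-dim tokens with codeword labels, truncated to 2-dim prefixes.
train = [
    ([0.0, 0.0, 0.0, 0.0], 0),
    ([0.0, 0.0, 5.0, 5.0], 1),
    ([0.1, 0.0, 4.9, 5.0], 1),
    ([3.0, 3.0, 3.0, 3.0], 2),
]
store = [(truncate_prefix(tok, 2), lab) for tok, lab in train]
teacher = knn_teacher_distribution(truncate_prefix([0.0, 0.0, 5.0, 5.0], 2), store, num_codewords=3)
```

In this toy the prefix [0, 0] cannot separate codewords 0 and 1, so the teacher is a soft distribution rather than a one-hot label; predicting such a distribution for each incomplete token is what the abstract says the recovery module learns.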

Preprint · 2026 · arXiv

OmniEncoder: See, Hear, and Feel Continuous Motion Like Humans With One Encoder

Recent advances in omni-modal large language models have enabled remarkable progress in joint vision-audio understanding. However, prevailing architectures rely on modality-specific encoders with a video-coarse, audio-dense design (sampling visual frames at 1–2 fps while processing audio waveforms at 25 fps), resulting in systems that perceive video frame by frame, modality by modality, rather than holistically as humans do. Such a discrepancy leaves models with impoverished cross-modal interaction during encoding and an inability to capture fine-grained visual motion. To bridge this gap, we present Omni-Encoder, a unified Transformer backbone designed to co-embed visual and audio signals at a symmetrical 25 fps within a shared latent space. This architecture leverages three core innovations (the Omni-Encoder Token Template, Omni-RoPE, and Temporal Window Shifting) to effectively reconcile the dual challenges of modality disentanglement and computational efficiency. Experiments demonstrate that, compared to the modality-specific baseline Qwen2.5-Omni under the same input token budget to the LLM decoder, Omni-Encoder delivers substantial gains on visual continuous understanding tasks, such as sign language recognition and fine-grained sports action analysis, while maintaining competitive performance on established audio-visual benchmarks such as AVQA and Speaker Identification and Localization. These results suggest that unified omnivorous encoding offers a promising direction for building omni-modal models that more closely reflect the integrated nature of human perception.
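A small arithmetic sketch of why coarse visual sampling can miss fast motion. It only counts uniformly sampled frame timestamps; it says nothing about how Omni-Encoder tokenizes frames (the abstract compares models under the same LLM token budget, so frame counts are not token counts). The 10 s clip, the 0.1 to 0.3 s gesture window, and the helper names are hypothetical.

```python
from typing import List, Sequence


def sample_times(duration_s: float, fps: int) -> List[float]:
    """Timestamps (seconds) of uniformly sampled frames over a clip starting at t = 0."""
    return [i / fps for i in range(int(duration_s * fps))]


def frames_in_window(times: Sequence[float], start_s: float, end_s: float) -> int:
    """Number of sampled frames that fall inside the closed window [start_s, end_s]."""
    return sum(1 for t in times if start_s <= t <= end_s)


# Hypothetical 10 s clip: coarse 2 fps video versus the 25 fps rate used for audio.
coarse = sample_times(10.0, 2)
dense = sample_times(10.0, 25)
gesture = (0.1, 0.3)  # hypothetical 0.2 s hand movement
coarse_hits = frames_in_window(coarse, *gesture)
dense_hits = frames_in_window(dense, *gesture)
```

Over the 10 s clip the 2 fps stream yields 20 frames and the 25 fps stream 250; none of the 2 fps frames lands inside the 0.2 s gesture, while five of the 25 fps frames do (at 0.12, 0.16, 0.20, 0.24 and 0.28 s).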

Preprint · 2020 · arXiv

Neural Mesh Refiner for 6-DoF Pose Estimation

How can we effectively utilise 2D monocular image information to recover the 6D pose (6-DoF) of visual objects? Deep learning has been shown to be effective for robust and real-time monocular pose estimation. Oftentimes, the network learns to regress the 6-DoF pose using a naive loss function. However, because directly regressed poses lack geometric scene understanding, the mesh rendered from the 3D object is misaligned with the 2D instance segmentation result, e.g., the predicted bounding boxes and masks. This paper bridges the gap between 2D mask generation and 3D location prediction via a differentiable neural mesh renderer. We utilise the overlay between the accurate mask prediction and the less accurate mesh prediction to iteratively optimise the directly regressed 6D pose, with a focus on translation estimation. By leveraging geometry, we demonstrate that our technique significantly improves direct regression performance on the difficult task of translation estimation and achieves state-of-the-art results on the Peking University/Baidu - Autonomous Driving dataset and the ApolloScape 3D Car Instance dataset. The code can be found at https://bit.ly/2IRihfU.
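A toy sketch of refining translation against a mask, not the paper's method: the differentiable neural mesh renderer is replaced by an analytic pinhole projection of a fronto-parallel W x H rectangle, the mask is reduced to its bounding box, and gradients come from central finite differences with a backtracking step. The function names, object size, focal length and poses are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project_box(t: Sequence[float], size: Tuple[float, float], focal: float) -> Box:
    """Pinhole projection of a fronto-parallel W x H rectangle centred at translation t."""
    tx, ty, tz = t
    w, h = size
    u, v = focal * tx / tz, focal * ty / tz  # projected centre (principal point at 0, 0)
    hw, hh = focal * w / (2 * tz), focal * h / (2 * tz)  # half-extents shrink with depth
    return (u - hw, v - hh, u + hw, v + hh)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes with positive area."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def refine_translation(t0, size, focal, mask_box, steps=200, lr=1.0, eps=1e-4):
    """Reduce 1 - IoU between the projected box and the mask box by gradient descent on t."""
    def loss(t):
        return 1.0 - box_iou(project_box(t, size, focal), mask_box)

    t = list(t0)
    for _ in range(steps):
        base = loss(t)
        # Central finite differences stand in for a differentiable renderer's gradients.
        grad = []
        for i in range(3):
            up, down = list(t), list(t)
            up[i] += eps
            down[i] -= eps
            grad.append((loss(up) - loss(down)) / (2 * eps))
        # Backtracking: halve the step until the loss decreases and depth stays positive.
        step = lr
        while step > 1e-8:
            cand = [ti - step * gi for ti, gi in zip(t, grad)]
            if cand[2] > 0 and loss(cand) < base:
                t = cand
                break
            step *= 0.5
        else:
            break  # no improving step found; stop early
    return tuple(t)


# Hypothetical example: the mask box comes from the true pose (1.0, 0.5, 10.0);
# the regressed pose is off in all three coordinates, most of all in depth.
SIZE, FOCAL = (1.8, 1.5), 500.0
mask = project_box((1.0, 0.5, 10.0), SIZE, FOCAL)
t_init = (1.2, 0.4, 12.0)
t_refined = refine_translation(t_init, SIZE, FOCAL, mask)
```

Because depth enters every projected coordinate through 1/tz, the size and position of the overlap carry information about translation that a regressed pose alone may get wrong; the backtracking rule guarantees the IoU never decreases from one iteration to the next.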