Researcher profile

Weitao Xu

Weitao Xu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

In model-based learning, the agent learns behaviors by simulating trajectories based on world model predictions. Standard world models typically learn a stationary transition function that maps states and actions to next states, when an action and an outcome frequently co-occur in training data, the model tends to internalize this correlation as a general causal rule while ignoring action preconditions. In interactive environments, however, agent actions can reshape the future affordance space. At each timestep, an action may becomes executable only after its prerequisites are met, or non-executable when they are destroyed. We term such events structure-changing events (SC events). As a result, a conventional world model often fails to determine whether a given action is executable in the current state, especially in multi-step predictions. Each imagined step is conditioned on an incorrect affordance state, and therefore the prediction error compounds over the rollout horizon. In this paper, we propose AGWM (Affordance-Grounded World Model), which learns an abstract affordance structure represented as a DAG of prerequisite dependencies to explicitly track the dynamic executability of actions. Experiments on game-based simulated environments demonstrate the effectiveness of our method by achieving lower multi-step prediction error, better generalization to novel configurations, and improved interpretability.

preprint2026arXiv

SwiftChannel: Algorithm-Hardware Co-Design for Deep Learning-Based 5G Channel Estimation

Channel estimation is crucial in 5G communication networks for optimizing transmission parameters and ensuring reliable, high-speed communication. However, the use of multiple-input and multiple-output (MIMO) and millimeter-wave (mmWave) in 5G networks presents challenges in achieving accurate estimation under strict latency requirements on resource-limited hardware platforms. To address these challenges, we propose SwiftChannel, an algorithm-hardware co-design framework that integrates a hardware-friendly deep learning-based channel estimator with a dedicated accelerator. Our approach employs a convolutional neural network enhanced with a parameter-free attention mechanism, which effectively reconstructs full-resolution spatial-frequency domain channel matrices from low-resolution least squares (LS) estimates. We further develop a multi-stage model compression pipeline combining knowledge distillation, convolution re-parameterization, and quantization-aware training, resulting in substantial model size reduction with negligible accuracy loss. The hardware accelerator, implementing the compressed model and the LS estimator on FPGA platforms using High-level Synthesis (HLS), features a fine-grained pipeline architecture and optimized dataflow strategies. Tested on a Zynq UltraScale+ RFSoC, the accelerator achieves sub-millisecond latency, providing up to 24x speed-up and over 33x improvement in energy efficiency compared to GPU-based solutions. Extensive evaluations demonstrate that the proposed design generalizes not only across various noise levels and user mobilities, but also to a variety of unseen channel profiles, outperforming state-of-the-art baselines. By unifying algorithmic innovation with hardware-aware design, our work presents a future-proof channel estimation solution for 5G MIMO systems.

preprint2026arXiv

ViM-Q: Scalable Algorithm-Hardware Co-Design for Vision Mamba Model Inference on FPGA

Vision Mamba (ViM) models offer a compelling efficiency advantage over Transformers by leveraging the linear complexity of State Space Models (SSMs), yet efficiently deploying them on FPGAs remains challenging. Linear layers struggle with dynamic activation outliers that render static quantization ineffective, while uniform quantization fails to capture the weight distribution at low bit-widths. Furthermore, while associative scan accelerates SSMs on GPUs, its memory access patterns are misaligned with the streaming dataflow required by FPGAs. To address these challenges, we present ViM-Q, a scalable algorithm-hardware co-design for end-to-end ViM inference on the edge. We introduce a hardware-aware quantization scheme combining dynamic per-token activation quantization and per-channel smoothing to mitigate outliers, alongside a custom 4-bit per-block Additive Power-of-Two (APoT) weight quantization. The models are deployed on a runtime-parameterizable FPGA accelerator featuring a linear engine employing a Lookup-Table (LUT) unit to replace multiplications with shift-add operations, and a fine-grained pipelined SSM engine that parallelizes the state dimension while preserving sequential recurrence. Crucially, the hardware supports runtime configuration, adapting to diverse dimensions and input resolutions across the ViM family. Implemented on an AMD ZCU102 FPGA, ViM-Q achieves an average 4.96x speedup and 59.8x energy efficiency gain over a quantized NVIDIA RTX 3090 GPU baseline for low-batch inference on ViM-tiny. This co-design shows a viable path for deploying ViM models on resource-constrained edge devices.

preprint2020arXiv

A Multi-view CNN-based Acoustic Classification System for Automatic Animal Species Identification

Automatic identification of animal species by their vocalization is an important and challenging task. Although many kinds of audio monitoring system have been proposed in the literature, they suffer from several disadvantages such as non-trivial feature selection, accuracy degradation because of environmental noise or intensive local computation. In this paper, we propose a deep learning based acoustic classification framework for Wireless Acoustic Sensor Network (WASN). The proposed framework is based on cloud architecture which relaxes the computational burden on the wireless sensor node. To improve the recognition accuracy, we design a multi-view Convolution Neural Network (CNN) to extract the short-, middle-, and long-term dependencies in parallel. The evaluation on two real datasets shows that the proposed architecture can achieve high accuracy and outperforms traditional classification systems significantly when the environmental noise dominate the audio signal (low SNR). Moreover, we implement and deploy the proposed system on a testbed and analyse the system performance in real-world environments. Both simulation and real-world evaluation demonstrate the accuracy and robustness of the proposed acoustic classification system in distinguishing species of animals.

preprint2020arXiv

A Novel Emergency Light Based Smart Building Solution: Design, Implementation and Use Cases

Deployment of Internet of Things (IoT) in smart buildings has received considerable interest from both the academic community and commercial sectors. Unfortunately, widespread adoption of current smart building solutions is inhibited by the high costs associated with installation and maintenance. Moreover, different types of IoT devices from different manufacturers typically form distinct networks and data silos. There is a need to use a common backbone network that facilitates interoperability and seamless data exchange in a uniform way. In this paper, we present EMIoT, a novel solution for smart buildings that breaks these barriers by leveraging existing emergency lighting systems. In EMIoT, we embed a wireless LoRa module in each emergency light to turn them into wireless routers. EMIoT has been deployed in more than 50 buildings of different types in Sydney Australia and has been successfully running over two years. We present the design and implementation of EMIoT in this paper. Moreover, we use the deployment in a residential building as a use case to show the performance of EMIoT in real-world environments and share lessons learned. Finally, we discuss the advantages and disadvantages of EMIoT. This paper provides practical insights for IoT deployment in smart buildings for practitioners and solution providers.

preprint2020arXiv

Key Generation for Internet of Things: A Contemporary Survey

Key generation is a promising technique to bootstrap secure communications for the Internet of Things (IoT) devices that have no prior knowledge between each other. In the past few years, a variety of key generation protocols and systems have been proposed. In this survey, we review and categorise recent key generation systems based on a novel taxonomy. Then, we provide both quantitative and qualitative comparisons of existing approaches. We also discuss the security vulnerabilities of key generation schemes and possible countermeasures. Finally, we discuss the current challenges and point out several potential research directions.

preprint2020arXiv

Simultaneous Energy Harvesting and Gait Recognition using Piezoelectric Energy Harvester

Piezoelectric energy harvester, which generates electricity from stress or vibrations, is gaining increasing attention as a viable solution to extend battery life in wearables. Recent research further reveals that, besides generating energy, PEH can also serve as a passive sensor to detect human gait power-efficiently because its stress or vibration patterns are significantly influenced by the gait. However, as PEHs are not designed for precise measurement of motion, achievable gait recognition accuracy remains low with conventional classification algorithms. The accuracy deteriorates further when the generated electricity is stored simultaneously. To classify gait reliably while simultaneously storing generated energy, we make two distinct contributions. First, we propose a preprocessing algorithm to filter out the effect of energy storage on PEH electricity signal. Second, we propose a long short-term memory (LSTM) network-based classifier to accurately capture temporal information in gait-induced electricity generation. We prototype the proposed gait recognition architecture in the form factor of an insole and evaluate its gait recognition as well as energy harvesting performance with 20 subjects. Our results show that the proposed architecture detects human gait with 12% higher recall and harvests up to 127% more energy while consuming 38% less power compared to the state-of-the-art.