Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
16 works
0 followers
16 topics
4 close collaborators


Published work

16 published items

preprint · 2026 · arXiv

CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding

Vision-language models (VLMs) have demonstrated strong capabilities in multimodal perception and reasoning. However, deploying large VLMs on mobile devices remains challenging due to their substantial computational and memory demands. A practical alternative is device-edge co-inference, where a lightweight draft VLM on the mobile device collaborates with a larger target VLM on the edge server via speculative decoding. Nevertheless, directly extending speculative decoding to VLMs suffers from severe inefficiency due to excessive visual-token computation and high communication overhead. To address these challenges, we propose CoVSpec, an efficient collaborative speculative decoding framework for VLM inference. Specifically, we first develop a training-free visual token reduction framework that prunes redundant visual tokens on the mobile device by jointly considering query relevance, token activity, and low-rank dependency. Moreover, we design an adaptive drafting strategy that dynamically adjusts both the verification frequency and the draft length. In addition, we introduce a parallel branching mechanism with decoupled verification-correction to improve draft-side utilization during target-side verification and reduce correction-related transmission overhead. Experiments on multiple benchmarks show that CoVSpec achieves up to 2.21x higher throughput than target-only inference and reduces communication overhead by more than 96% compared with baselines, without compromising task accuracy.
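
The draft-then-verify loop behind speculative decoding can be illustrated with a toy example. This is a minimal sketch, not the CoVSpec method: it assumes greedy (exact-match) verification, and `draft_next`, `target_next` and `speculative_decode` are hypothetical names standing in for the device-side draft model and the edge-side target model. It omits visual-token pruning, adaptive draft length and parallel branching, and calls the target once per position where a real system would verify the whole draft in one batched pass.

```python
def speculative_decode(draft_next, target_next, prompt, draft_len, max_new):
    """Greedy speculative decoding with toy next-token functions.

    draft_next / target_next: hypothetical callables mapping a token list
    to the next token. Returns (tokens, number_of_verification_rounds).
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new:
        # Draft side: propose draft_len tokens autoregressively.
        proposal = []
        for _ in range(draft_len):
            proposal.append(draft_next(tokens + proposal))

        # Target side: accept the longest prefix that matches its own choice.
        rounds += 1
        accepted = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # correction replaces the bad token
                break
            accepted.append(tok)
        else:
            # Whole draft accepted: the target contributes one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new], rounds


# Toy models: the target counts upward mod 10; the draft errs after a 4.
target = lambda t: (t[-1] + 1) % 10
draft = lambda t: 0 if t[-1] == 4 else (t[-1] + 1) % 10

out, rounds = speculative_decode(draft, target, [0], draft_len=3, max_new=8)
```

With greedy verification the output matches what the target model alone would produce, while the number of verification rounds is smaller than the number of generated tokens whenever the draft is often right.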

preprint · 2026 · arXiv

Enabling Training-Free Semantic Communication Systems with Generative Diffusion Models

Semantic communication (SemCom) has recently emerged as a promising paradigm for next-generation wireless systems. Empowered by advanced artificial intelligence (AI) technologies, SemCom has achieved significant improvements in transmission quality and efficiency. However, existing SemCom systems either rely on training over large datasets and specific channel conditions or suffer from performance degradation under channel noise when operating in a training-free manner. To address these issues, we explore the use of generative diffusion models (GDMs) as training-free SemCom systems. Specifically, we design a semantic encoding and decoding method based on the inversion and sampling process of the denoising diffusion implicit model (DDIM), which introduces a two-stage forward diffusion process, split between the transmitter and receiver to enhance robustness against channel noise. Moreover, we optimize sampling steps to compensate for the increased noise level caused by channel noise. We also conduct a brief analysis to provide insights about this design. Simulations on the Kodak dataset validate that the proposed system outperforms the existing baseline SemCom systems across various metrics.

preprint · 2026 · arXiv

Evolving Token Communication with Parametric Memory Network

Token communication has emerged as a promising framework for efficient wireless transmission by representing source data as compact semantic tokens. However, transmitting full semantic tokens still incurs considerable communication overhead. In this paper, we propose an evolving semantic token communication system with a parametric memory network over MIMO fading channels. Specifically, only an equal-length prefix of each semantic token is transmitted, which reduces transmission cost while preserving a consistent token structure for receiver-side recovery. At the receiver, a parametric memory network is introduced to reconstruct the missing suffix information from the received token prefixes, where semantic memory is stored implicitly in the network parameters. To realize this design, full semantic tokens are first organized into a codebook, and truncated tokens are paired with the codeword labels of their corresponding full tokens. Based on these token-label pairs, kNN-based teacher distributions are constructed to fine-tune a pretrained GPT-2-based recovery module, which learns to infer the codeword distribution of each incomplete token and recover the corresponding complete semantic token. In addition, an online evolution strategy is developed to periodically update the parametric memory network and the entire system using newly observed test samples, thereby improving adaptability under distribution shifts. Experimental results demonstrate that the proposed method consistently outperforms the existing evolving memory benchmark under different channel conditions and channel bandwidth ratios, with up to 1.09 dB PSNR improvement.

preprint · 2026 · arXiv

Rethinking Secure Semantic Communications in the Age of Generative and Agentic AI: Threats and Opportunities

Semantic communication (SemCom) improves communication efficiency by transmitting task-relevant information instead of raw bits and is expected to be a key technology for 6G networks. Recent advances in generative AI (GenAI) further enhance SemCom by enabling robust semantic encoding and decoding under limited channel conditions. However, these efficiency gains also introduce new security and privacy vulnerabilities. Due to the broadcast nature of wireless channels, eavesdroppers can also use powerful GenAI-based semantic decoders to recover private information from intercepted signals. Moreover, rapid advances in agentic AI enable eavesdroppers to perform long-term and adaptive inference through the integration of memory, external knowledge, and reasoning capabilities. This allows eavesdroppers to further infer user private behavior and intent beyond the transmitted content. Motivated by these emerging challenges, this paper comprehensively rethinks the security and privacy of SemCom systems in the age of generative and agentic AI. We first present a systematic taxonomy of eavesdropping threat models in SemCom systems. Then, we provide insights into how GenAI and agentic AI can enhance eavesdropping threats. Meanwhile, we also highlight potential opportunities for leveraging GenAI and agentic AI to design privacy-preserving SemCom systems.

preprint · 2026 · arXiv

TCLNet: A Hybrid Transformer-CNN Framework Leveraging Language Models as Lossless Compressors for CSI Feedback

In frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, downlink channel state information (CSI) plays a crucial role in achieving high spectrum and energy efficiency. However, the CSI feedback overhead becomes a major bottleneck as the number of antennas increases. Although existing deep learning-based CSI compression methods have shown great potential, they still face limitations in capturing both local and global features of CSI, thereby limiting achievable compression efficiency. To address these issues, we propose TCLNet, a unified CSI compression framework that integrates a hybrid Transformer-CNN architecture for lossy compression with a hybrid language model (LM) and factorized model (FM) design for lossless compression. The lossy module jointly exploits local features and global context, while the lossless module adaptively switches between context-aware coding and parallel coding to optimize the rate-distortion-complexity (RDC) trade-off. Extensive experiments on both real-world and simulated datasets demonstrate that the proposed TCLNet outperforms existing approaches in terms of reconstruction accuracy and transmission efficiency, achieving up to a 5 dB performance gain across diverse scenarios. Moreover, we show that large language models (LLMs) can be leveraged as zero-shot CSI lossless compressors via carefully designed prompts.

preprint · 2026 · arXiv

When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

Context: Retrieval-augmented code generation relies on cross-file repository context, but retrieved snippets may come from obsolete project states. Objectives: We study whether temporally stale repository snippets act as harmless noise or actively induce current-state-incompatible code. Methods: We conduct a controlled diagnostic study on a curated 17-sample set of production-helper signature changes from five Python repositories. For each sample, we compare current-only, stale-only, no-retrieval, and mixed current/stale retrieval conditions under prompts that hide commit freshness and expected current signatures. Results: Under neutralized prompts, stale-only retrieval induces stale helper references on 15/17 Qwen2.5-Coder-7B-Instruct samples and 13/17 gpt-4.1-mini samples, corresponding to 88.2 and 76.5 percentage-point increases over current-only retrieval. No retrieval produces zero stale references but only 1/17 passing completions. The two models share 75.0% Jaccard overlap among stale-triggering samples, and mixed conditions show that adding valid current evidence largely rescues stale-only failures. Conclusion: Temporal validity of retrieved repository context is a distinct diagnostic variable for Code RAG robustness: stale context can actively bias models toward obsolete repository state rather than merely removing useful evidence.
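
The figures reported in this abstract can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, and the overlap of 12 shared samples is not stated in the abstract. It is inferred here as the only overlap size consistent with a 75.0% Jaccard index between sets of 15 and 13 samples.

```python
from fractions import Fraction

TOTAL = 17        # curated samples in the study
STALE_QWEN = 15   # stale-only samples with stale references (Qwen2.5-Coder-7B-Instruct)
STALE_GPT = 13    # stale-only samples with stale references (gpt-4.1-mini)


def rate_pct(count, total=TOTAL):
    """Share of samples as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


def jaccard(size_a, size_b, shared):
    """Jaccard index |A ∩ B| / |A ∪ B| computed from set sizes."""
    return Fraction(shared, size_a + size_b - shared)


def overlaps_matching(size_a, size_b, target):
    """All overlap sizes that reproduce the target Jaccard index exactly."""
    return [s for s in range(min(size_a, size_b) + 1)
            if jaccard(size_a, size_b, s) == target]


qwen_rate = rate_pct(STALE_QWEN)
gpt_rate = rate_pct(STALE_GPT)
shared = overlaps_matching(STALE_QWEN, STALE_GPT, Fraction(3, 4))
```

Because the stated percentage-point increases equal the stale-only rates themselves, the current-only condition appears to produce no stale references for either model.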

preprint · 2024 · arXiv

Point Cloud in the Air

Acquisition and processing of point clouds (PCs) is a crucial enabler for many emerging applications reliant on 3D spatial data, such as robot navigation, autonomous vehicles, and augmented reality. In most scenarios, PCs acquired by remote sensors must be transmitted to an edge server for fusion, segmentation, or inference. Wireless transmission of PCs not only puts an increased burden on the already congested wireless spectrum, but also confronts a unique set of challenges arising from the irregular and unstructured nature of PCs. In this paper, we meticulously delineate these challenges and offer a comprehensive examination of existing solutions while candidly acknowledging their inherent limitations. In response to these intricacies, we proffer four pragmatic solution frameworks, spanning advanced techniques, hybrid schemes, and distributed data aggregation approaches. In doing so, our goal is to chart a path toward efficient, reliable, and low-latency wireless PC transmission.

preprint · 2022 · arXiv

A resource-efficient deep learning framework for low-dose brain PET image reconstruction and analysis

18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing high-quality full-dose PET (F-PET) images from low-dose PET (L-PET) images is an effective way to reduce radiation exposure while retaining diagnostic accuracy. In this paper, we propose a resource-efficient deep learning framework for L-PET reconstruction and analysis, referred to as transGAN-SDAM, to generate F-PET from corresponding L-PET, and quantify the standard uptake value ratios (SUVRs) of these generated F-PET across the whole brain. The transGAN-SDAM consists of two modules: a transformer-encoded Generative Adversarial Network (transGAN) and a Spatial Deformable Aggregation Module (SDAM). The transGAN generates higher quality F-PET images, and then the SDAM integrates the spatial information of a sequence of generated F-PET slices to synthesize whole-brain F-PET images. Experimental results demonstrate the superiority and rationality of our approach.

preprint · 2022 · arXiv

AsyncFedED: Asynchronous Federated Learning with Euclidean Distance based Adaptive Weight Aggregation

In an asynchronous federated learning framework, the server updates the global model once it receives an update from a client instead of waiting for all the updates to arrive as in the synchronous setting. This allows heterogeneous devices with varied computing power to train the local models without pausing, thereby speeding up the training process. However, it introduces the stale model problem, where the newly arrived update was calculated based on a set of stale weights that are older than the current global model, which may hurt the convergence of the model. In this paper, we present an asynchronous federated learning framework with a proposed adaptive weight aggregation algorithm, referred to as AsyncFedED. To the best of our knowledge this aggregation method is the first to take the staleness of the arrived gradients, measured by the Euclidean distance between the stale model and the current global model, and the number of local epochs that have been performed, into account. Assuming general non-convex loss functions, we prove the convergence of the proposed method theoretically. Numerical results validate the effectiveness of the proposed AsyncFedED in terms of the convergence rate and model accuracy compared to the existing methods for three considered tasks.
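
The idea of measuring staleness as the Euclidean distance between the stale model and the current global model can be sketched as follows. This is an illustration under stated assumptions, not the AsyncFedED aggregation rule, which the abstract does not give: the damping formula `base_lr / (1 + staleness)` and the function names are hypothetical, and the dependence on the number of local epochs mentioned in the abstract is left out.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two flat weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def apply_async_update(global_w, stale_w, client_delta, base_lr=1.0):
    """Apply one client update as soon as it arrives, damped by staleness.

    global_w:     current global weights on the server
    stale_w:      global weights the client started training from
    client_delta: weight change computed by the client
    """
    staleness = euclidean_distance(global_w, stale_w)
    # Hypothetical rule: the further the global model has moved since the
    # client fetched it, the less the late update is trusted.
    weight = base_lr / (1.0 + staleness)
    new_w = [g + weight * d for g, d in zip(global_w, client_delta)]
    return new_w, weight


fresh_w, fresh_weight = apply_async_update([1.0, 1.0], [1.0, 1.0], [0.5, -0.5])
late_w, late_weight = apply_async_update([3.0, 4.0], [0.0, 0.0], [1.0, 1.0])
```

An update computed from the current global model is applied at full weight, while one computed from a model five units away is scaled down to one sixth under this assumed rule.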

preprint · 2022 · arXiv

Generalisation of continuous time random walk to anomalous diffusion MRI models with an age-related evaluation of human corpus callosum

Diffusion MRI measures of the human brain provide key insight into microstructural variations across individuals and into the impact of central nervous system diseases and disorders. One approach to extract information from diffusion signals has been to use biologically relevant analytical models to link millimetre scale diffusion MRI measures with microscale influences. The other approach has been to represent diffusion as an anomalous transport process and infer microstructural information from the different anomalous diffusion equation parameters. In this study, we investigated how parameters of various anomalous diffusion models vary with age in the human brain white matter, particularly focusing on the corpus callosum. We first unified several established anomalous diffusion models (the super-diffusion, sub-diffusion, quasi-diffusion and fractional Bloch-Torrey models) under the continuous time random walk modelling framework. This unification allows a consistent parameter fitting strategy to be applied from which meaningful model parameter comparisons can be made. We then provided a novel way to derive the diffusional kurtosis imaging (DKI) model, which is shown to be a degree two approximation of the sub-diffusion model. This link between the DKI and sub-diffusion models led to a new robust technique for generating maps of kurtosis and diffusivity using the sub-diffusion parameters β_SUB and D_SUB. Superior tissue contrast is achieved in kurtosis maps based on the sub-diffusion model. 7T diffusion weighted MRI data for 65 healthy participants in the age range 19-78 years was used in this study. Results revealed that anomalous diffusion model parameters α and β have shown consistent positive correlation with age in the corpus callosum, indicating α and β are sensitive to tissue microstructural changes in aging.

preprint · 2022 · arXiv

OTFPF: Optimal Transport-Based Feature Pyramid Fusion Network for Brain Age Estimation with 3D Overlapped ConvNeXt

The chronological age of a healthy brain can be predicted using deep neural networks from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as an effective biomarker for detecting aging-related diseases or disorders. In this paper, we propose an end-to-end neural network architecture, referred to as optimal transport based feature pyramid fusion (OTFPF) network, for brain age estimation with T1 MRIs. The OTFPF consists of three types of modules: Optimal Transport based Feature Pyramid Fusion (OTFPF) module, 3D overlapped ConvNeXt (3D OL-ConvNeXt) module and fusion module. These modules strengthen the OTFPF network's understanding of each brain's semi-multimodal and multi-level feature pyramid information, and significantly improve its estimation performance. Compared with recent state-of-the-art models, the proposed OTFPF converges faster and performs better. The experiments with 11,728 MRIs aged 3-97 years show that OTFPF network could provide accurate brain age estimation, yielding mean absolute error (MAE) of 2.097, Pearson's correlation coefficient (PCC) of 0.993 and Spearman's rank correlation coefficient (SRCC) of 0.989, between the estimated and chronological ages. Extensive quantitative experiments and ablation experiments demonstrate the superiority and rationality of OTFPF network. The code and implementation details will be released on GitHub: https://github.com/ZJU-Brain/OTFPF after final decision.
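
The three evaluation metrics named in this abstract (MAE, PCC and SRCC) can be computed as sketched below. This is a plain-Python illustration with made-up ages, not the authors' evaluation code; the function names and the sample values are assumptions for demonstration only.

```python
import math


def mae(pred, true):
    """Mean absolute error between predicted and chronological ages."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson(x, y):
    """Pearson's correlation coefficient (PCC)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only, not data from the paper.
true_age = [10.0, 20.0, 30.0, 40.0]
pred_age = [12.0, 19.0, 33.0, 41.0]
example_mae = mae(pred_age, true_age)
example_srcc = spearman(pred_age, true_age)
```

MAE is expressed in the same unit as the ages (years), whereas PCC and SRCC are unitless and measure linear and monotonic agreement respectively.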

preprint · 2022 · arXiv

Semantic-aware Speech to Text Transmission with Redundancy Removal

Deep learning (DL) based semantic communication methods have been explored for the efficient transmission of images, text, and speech in recent years. In contrast to traditional wireless communication methods that focus on the transmission of abstract symbols, semantic communication approaches attempt to achieve better transmission efficiency by only sending the semantic-related information of the source data. In this paper, we consider semantic-oriented speech to text transmission. We propose a novel end-to-end DL-based transceiver, which includes an attention-based soft alignment module and a redundancy removal module to compress the transmitted data. In particular, the former extracts only the text-related semantic features, and the latter further drops the semantically redundant content, greatly reducing the amount of semantic redundancy compared to existing methods. We also propose a two-stage training scheme, which speeds up the training of the proposed DL model. The simulation results indicate that our proposed method outperforms current methods in terms of the accuracy of the received text and transmission efficiency. Moreover, the proposed method also has a smaller model size and shorter end-to-end runtime.

preprint · 2022 · arXiv

Semantic-preserved Communication System for Highly Efficient Speech Transmission

Deep learning (DL) based semantic communication methods have been explored for the efficient transmission of images, text, and speech in recent years. In contrast to traditional wireless communication methods that focus on the transmission of abstract symbols, semantic communication approaches attempt to achieve better transmission efficiency by only sending the semantic-related information of the source data. In this paper, we consider semantic-oriented speech transmission which transmits only the semantic-relevant information over the channel for the speech recognition task, and a compact additional set of semantic-irrelevant information for the speech reconstruction task. We propose a novel end-to-end DL-based transceiver which extracts and encodes the semantic information from the input speech spectrums at the transmitter and outputs the corresponding transcriptions from the decoded semantic information at the receiver. For the speech to speech transmission, we further include a CTC alignment module that extracts a small amount of additional semantic-irrelevant but speech-related information for the better reconstruction of the original speech signals at the receiver. The simulation results confirm that our proposed method outperforms current methods in terms of the accuracy of the predicted text for the speech to text transmission and the quality of the recovered speech signals for the speech to speech transmission, and significantly improves transmission efficiency. More specifically, the proposed method sends only 16% of the transmitted symbols required by the existing methods while achieving about 10% reduction in WER for the speech to text transmission. For the speech to speech transmission, it results in an even more remarkable improvement in terms of transmission efficiency, with only 0.2% of the transmitted symbols required by the existing method.
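
The word error rate (WER) quoted in this abstract is conventionally the word-level edit distance between the reference and the recognized text, divided by the number of reference words. A minimal sketch of that standard definition follows; the function name and example sentences are illustrative, and whitespace tokenization is an assumption.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
example = wer("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.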

preprint · 2021 · arXiv

RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection

The novel 2019 Coronavirus (COVID-19) infection has spread worldwide and is currently a major healthcare challenge around the world. Chest Computed Tomography (CT) and X-ray images have been well recognized to be two effective techniques for clinical COVID-19 disease diagnoses. Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is preferred for efficient diagnosis, assessment and treatment. However, considering the similarity between COVID-19 and pneumonia, CXR samples with deep features distributed near category boundaries are easily misclassified by the hyper-planes learned from limited training data. Moreover, most existing approaches for COVID-19 detection focus on the accuracy of prediction and overlook the uncertainty estimation, which is particularly important when dealing with noisy datasets. To alleviate these concerns, we propose a novel deep network named RCoNet^k_s for robust COVID-19 detection which employs Deformable Mutual Information Maximization (DeIM), Mixed High-order Moment Feature (MHMF) and Multi-expert Uncertainty-aware Learning (MUL). With DeIM, the mutual information (MI) between input data and the corresponding latent representations can be well estimated and maximized to capture compact and disentangled representational characteristics. Meanwhile, MHMF can fully explore the benefits of using high-order statistics and extract discriminative features of complex distributions in medical imaging. Finally, MUL creates multiple parallel dropout networks for each CXR image to evaluate uncertainty and thus prevent performance degradation caused by the noise in the data.

preprint · 2020 · arXiv

Distributed Deep Convolutional Compression for Massive MIMO CSI Feedback

Massive multiple-input multiple-output (MIMO) systems require downlink channel state information (CSI) at the base station (BS) to achieve spatial diversity and multiplexing gains. In a frequency division duplex (FDD) multiuser massive MIMO network, each user needs to compress and feed back its downlink CSI to the BS. The CSI overhead scales with the number of antennas, users and subcarriers, and becomes a major bottleneck for the overall spectral efficiency. In this paper, we propose a deep learning (DL)-based CSI compression scheme, called DeepCMC, composed of convolutional layers followed by quantization and entropy coding blocks. In comparison with previous DL-based CSI reduction structures, DeepCMC proposes a novel fully-convolutional neural network (NN) architecture, with residual layers at the decoder, and incorporates quantization and entropy coding blocks into its design. DeepCMC is trained to minimize a weighted rate-distortion cost, which enables a trade-off between the CSI quality and its feedback overhead. Simulation results demonstrate that DeepCMC outperforms the state-of-the-art CSI compression schemes in terms of the reconstruction quality of CSI for the same compression rate. We also propose a distributed version of DeepCMC for a multi-user MIMO scenario to encode and reconstruct the CSI from multiple users in a distributed manner. Distributed DeepCMC not only utilizes the inherent CSI structures of a single MIMO user for compression, but also benefits from the correlations among the channel matrices of nearby users to further improve the performance in comparison with DeepCMC. We also propose a reduced-complexity training method for distributed DeepCMC, allowing it to scale to multiple users, and suggest a cluster-based distributed DeepCMC approach for practical implementation.

preprint · 2019 · arXiv

CNN-based Analog CSI Feedback in FDD MIMO-OFDM Systems

Massive multiple-input multiple-output (MIMO) systems require downlink channel state information (CSI) at the base station (BS) to better utilize the available spatial diversity and multiplexing gains. However, in a frequency division duplex (FDD) massive MIMO system, CSI feedback overhead degrades the overall spectral efficiency. Convolutional neural network (CNN)-based CSI feedback compression schemes have received a lot of attention recently due to significant improvements in compression efficiency; however, they still require reliable feedback links to convey the compressed CSI information to the BS. Instead, we propose here a CNN-based analog feedback scheme, called AnalogDeepCMC, which directly maps the downlink CSI to uplink channel input. Corresponding noisy channel outputs are used by another CNN to reconstruct the DL channel estimate. The proposed scheme not only outperforms existing digital CSI feedback schemes in terms of the achievable downlink rate, but also simplifies the operation, as it does not require explicit quantization, coding and modulation, and provides a low-latency alternative, particularly in rapidly changing MIMO channels, where the CSI needs to be estimated and fed back periodically.