Source author record

Yi Zhong

Yi Zhong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computer Vision eess.SP Machine Learning Sound

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

Low-bit post-training quantization (PTQ) is a pivotal technique for deploying Vision-Language Models (VLMs) on resource-constrained devices. However, existing PTQ methods often degrade VLMs' accuracy due to the heterogeneous activation distributions of text and vision modalities during quantization. We find that this cross-modal heterogeneity is distributed unevenly across channels: a small subset of channels contains most modality-specific outliers, and these outliers typically reside in different channels for each modality. Motivated by this, we propose SplitQ, a channel-Splitting-driven post-training Quantization framework. At its core, SplitQ introduces a novel Modality-specific Outlier Channel Decoupling (MOCD) module that effectively isolates salient modality-specific outlier channels with minimal overhead. To further address the remaining cross-modal distribution discrepancies, we design an Adaptive Cross-Modal Calibration (ACC) module that employs dual lightweight learnable branches to dynamically mitigate modality-induced quantization errors. Extensive experiments on popular VLMs demonstrate that SplitQ significantly outperforms existing approaches across 6 popular multi-modal datasets under all evaluated quantization settings, including W4A8, W4A4, W3A3, and W3A2. Notably, SplitQ preserves 93.5% of FP16 performance under the challenging W3A3 setting (69.5 vs. 74.3), pushing the efficiency frontier for deploying advanced VLMs. Our code is available at https://github.com/EMVision-NK/SplitQ

preprint2026arXiv

Energy-Based Cell Association in Nonuniform Renewable Energy-Powered Cellular Networks: Analysis and Optimization of Carbon Efficiency

The increasing global push for carbon reduction highlights the importance of integrating renewable energy into the supply chain of cellular networks. However, due to the stochastic nature of renewable energy generation and the uneven load distribution across base stations, the utilization rate of renewable energy remains low. To address these challenges, this paper investigates the trade-off between carbon emissions and downlink throughput in cellular networks, offering insights into optimizing both network performance and sustainability. The renewable energy state of base station batteries and the number of occupied channels are modeled as a quasi-birth-death process. We construct models for the probability of channel blocking, average successful transmission probability for users, downlink throughput, carbon emissions, and carbon efficiency based on stochastic geometry. Based on these analyses, an energy-based cell association scheme is proposed to optimize the carbon efficiency of cellular networks. The results show that, compared to the closest cell association scheme, the energy-based cell association scheme is capable of reducing the carbon emissions of the network by 13.0% and improving the carbon efficiency by 11.3%.

preprint2026arXiv

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

Spiking neural networks (SNNs) offer a promising path toward energy-efficient speech command recognition (SCR) by leveraging their event-driven processing paradigm. However, existing SNN-based SCR methods often struggle to capture rich temporal dependencies and contextual information from speech due to limited temporal modeling and binary spike-based representations. To address these challenges, we first introduce the multi-view spiking temporal-aware self-attention (MSTASA) module, which combines effective spiking temporal-aware attention with a multi-view learning framework to model complementary temporal dependencies in speech commands. Building on MSTASA, we further propose SpikCommander, a fully spike-driven transformer architecture that integrates MSTASA with a spiking contextual refinement channel MLP (SCR-MLP) to jointly enhance temporal context modeling and channel-wise feature integration. We evaluate our method on three benchmark datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC), and the Google Speech Commands V2 (GSC). Extensive experiments demonstrate that SpikCommander consistently outperforms state-of-the-art (SOTA) SNN approaches with fewer parameters under comparable time steps, highlighting its effectiveness and efficiency for robust speech command recognition.

Yi Zhong

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

Energy-Based Cell Association in Nonuniform Renewable Energy-Powered Cellular Networks: Analysis and Optimization of Carbon Efficiency

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition