Researcher profile

Guangxu Zhu

Guangxu Zhu contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
14 works
0 followers
8 topics
4 close collaborators

Published work

14 published items

preprint, 2026, arXiv

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

Fine-tuning LLMs is necessary for various dedicated downstream tasks, but classic backpropagation-based fine-tuning methods require substantial GPU memory. To this end, a recent work, MeZO, which relies solely on forward passes to fine-tune LLMs, significantly reduces GPU requirements at the cost of slower convergence due to its indifference to loss landscapes. Standard solutions, such as Adam, explore loss landscapes by estimating the first- and second-order moments and storing them in memory to guide the model's movement through dimensions with lower curvature and vice versa. However, directly applying Adam negates MeZO's advantage as it will triple the memory requirement. In light of this, we propose AdaMeZO, a zeroth-order optimizer that leverages Adam-style first- and second-moment estimates without maintaining them in memory. We present a theoretical analysis of AdaMeZO, corroborated by extensive experiments demonstrating AdaMeZO's performance, showing that AdaMeZO can outperform MeZO while requiring up to 70% fewer forward passes. Trajectory visualizations affirm AdaMeZO's ability to adapt to diverse loss landscapes.
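To make the forward-pass-only idea concrete, here is a minimal sketch of a MeZO-style two-point zeroth-order step, the baseline that the abstract builds on. It is not AdaMeZO itself: the Adam-style moment estimates are not reproduced, and the names `perturb`, `zo_sgd_step`, `quadratic` and `run_demo` are illustrative assumptions. The random direction is regenerated from a seed each time it is needed, so no gradient or noise vector is kept in memory.

```python
import random

def perturb(params, seed, scale):
    """Add scale * z to params in place, regenerating z from the seed."""
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)

def zo_sgd_step(params, loss_fn, lr, eps, seed):
    """One two-point zeroth-order step using only two loss evaluations."""
    perturb(params, seed, +eps)            # theta + eps * z
    loss_plus = loss_fn(params)
    perturb(params, seed, -2.0 * eps)      # theta - eps * z
    loss_minus = loss_fn(params)
    perturb(params, seed, +eps)            # restore theta
    g = (loss_plus - loss_minus) / (2.0 * eps)  # directional derivative along z
    perturb(params, seed, -lr * g)         # theta <- theta - lr * g * z
    return g

def quadratic(params):
    """Toy loss standing in for a model's forward pass."""
    return sum(x * x for x in params)

def run_demo(steps=500):
    params = [1.0, -2.0, 0.5]
    start = quadratic(params)
    for t in range(steps):
        zo_sgd_step(params, quadratic, lr=0.05, eps=1e-3, seed=t)
    return start, quadratic(params)
```

Calling `run_demo()` returns the toy loss before and after the updates. In this sketch every step moves all coordinates with the same scalar `g`, which is the landscape-agnostic behaviour the abstract describes as the motivation for Adam-style scaling.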

preprint, 2026, arXiv

FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding

Deploying large language models (LLMs) in mobile and edge computing environments is constrained by limited on-device resources, scarce wireless bandwidth, and frequent model evolution. Although edge-cloud collaborative inference with speculative decoding (SD) can reduce end-to-end latency by executing a lightweight draft model at the edge and verifying it with a cloud-side target model, existing frameworks fundamentally rely on tight coupling between the two models. Consequently, repeated model synchronization introduces excessive communication overhead, increasing end-to-end latency, and ultimately limiting the scalability of SD in edge environments. To address these limitations, we propose FlexSpec, a communication-efficient collaborative inference framework tailored for evolving edge-cloud systems. The core design of FlexSpec is a shared-backbone architecture that allows a single and static edge-side draft model to remain compatible with a large family of evolving cloud-side target models. By decoupling edge deployment from cloud-side model updates, FlexSpec eliminates the need for edge-side retraining or repeated model downloads, substantially reducing communication and maintenance costs. Furthermore, to accommodate time-varying wireless conditions and heterogeneous device constraints, we develop a channel-aware adaptive speculation mechanism that dynamically adjusts the speculative draft length based on real-time channel state information and device energy budgets. Extensive experiments demonstrate that FlexSpec achieves superior performance compared to conventional SD approaches in terms of inference efficiency.
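The draft-then-verify loop of speculative decoding can be sketched in a few lines. This is a simplified greedy variant under stated assumptions, not FlexSpec's shared-backbone design or its channel-aware draft-length rule: `target_next_token` is a hypothetical stand-in for one greedy step of the cloud-side target model, and a real system would verify all draft positions in one batched pass rather than one call per token.

```python
def verify_draft(draft_tokens, target_next_token, prefix):
    """Greedy verification: keep draft tokens while the target model agrees."""
    accepted = []
    context = list(prefix)
    for tok in draft_tokens:
        expected = target_next_token(context)
        if tok != expected:
            accepted.append(expected)  # the target's token replaces the miss
            return accepted
        accepted.append(tok)
        context.append(tok)
    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next_token(context))
    return accepted

def toy_target(context):
    """Hypothetical target model: next token is (last token + 1) mod 10."""
    return (context[-1] + 1) % 10
```

With the toy target and prefix `[1, 2]`, a draft of `[3, 4, 9]` is cut at the third token and corrected, while a draft of `[3, 4, 5]` is accepted in full and extended by one token. Longer drafts pay off only when most tokens are accepted, which is why the draft length is worth adapting.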

preprint, 2022, arXiv

Accelerating Edge Intelligence via Integrated Sensing and Communication

Realizing edge intelligence consists of sensing, communication, training, and inference stages. Conventionally, the sensing and communication stages are executed sequentially, which results in excessive dataset generation and uploading time. This paper proposes to accelerate edge intelligence via integrated sensing and communication (ISAC). As such, the sensing and communication stages are merged so as to make the best use of the wireless signals for the dual purpose of dataset generation and uploading. However, ISAC also introduces additional interference between sensing and communication functionalities. To address this challenge, this paper proposes a classification error minimization formulation to design the ISAC beamforming and time allocation. The globally optimal solution is derived via the rank-1 guaranteed semidefinite relaxation, and performance analysis is performed to quantify the ISAC gain over that of conventional edge intelligence. Simulation results are provided to verify the effectiveness of the proposed ISAC-assisted edge intelligence system. Interestingly, we find that ISAC is always beneficial when the duration of generating a sample is more than the duration of uploading a sample. Otherwise, the ISAC gain can vanish or even be negative. Nevertheless, we still derive a sufficient condition under which a positive ISAC gain is feasible.

preprint, 2022, arXiv

An Overview on Over-the-Air Federated Edge Learning

Over-the-air federated edge learning (Air-FEEL) has emerged as a promising solution to support edge artificial intelligence (AI) in future beyond 5G (B5G) and 6G networks. In Air-FEEL, distributed edge devices use their local data to collaboratively train AI models while preserving data privacy, and over-the-air model/gradient aggregation is exploited to enhance the learning efficiency. This article provides an overview of the state of the art of Air-FEEL. First, we present the basic principle of Air-FEEL, and introduce the technical challenges for Air-FEEL design due to the over-the-air aggregation errors, as well as the resource and data heterogeneities at edge devices. Next, we present the fundamental performance metrics for Air-FEEL, and review resource management solutions and design considerations for enhancing the Air-FEEL performance. Finally, several interesting research directions are pointed out to motivate future work.
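A minimal sketch of over-the-air aggregation may help fix the idea: all devices transmit at once, the multi-access channel adds their signals, and the server sees only the noisy sum. The sketch assumes ideal unit-gain, perfectly aligned channels and real-valued updates; the function name and parameters are illustrative, not taken from the article.

```python
import random

def over_the_air_average(local_updates, noise_std=0.0, seed=0):
    """Average device updates from their noisy superposition.

    Assumes every device's signal arrives with unit gain, so the channel
    output per coordinate is the plain sum of the updates plus receiver noise.
    """
    rng = random.Random(seed)
    num_devices = len(local_updates)
    dim = len(local_updates[0])
    averaged = []
    for j in range(dim):
        superposed = sum(update[j] for update in local_updates)
        received = superposed + rng.gauss(0.0, noise_std)
        averaged.append(received / num_devices)
    return averaged
```

For example, `over_the_air_average([[1.0, 2.0], [3.0, 4.0]])` recovers the exact average when `noise_std` is zero. With nonzero noise the result carries an aggregation error, which is one of the design challenges the overview lists.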

preprint, 2022, arXiv

Integrated Sensing, Communication, and Computation Over-the-Air: MIMO Beamforming Design

To support the unprecedented growth of Internet of Things (IoT) applications, a tremendous amount of data needs to be collected by the IoT devices and delivered to the server for further computation. By utilizing the same signals for both radar sensing and data communication, the integrated sensing and communication (ISAC) technique has broken the barriers between data collection and delivery in the physical layer. By exploiting the analog-wave addition in a multi-access channel, over-the-air computation (AirComp) enables function computation via transmissions in the physical layer. The promising performance of ISAC and AirComp motivates the current work on developing a framework called integrated sensing, communication, and computation over-the-air (ISCCO). The performance metrics of radar sensing and AirComp are evaluated by the mean squared errors of the estimated target response matrix and the received computation results, respectively. The design challenge of MIMO ISCCO lies in the joint optimization of beamformers for sensing, communication, and computation at both the IoT devices and the server, which results in a non-convex problem. To solve this problem, an algorithmic solution based on the technique of semidefinite relaxation is proposed. The use case of target location estimation based on ISCCO is demonstrated in simulation to show the performance superiority.

preprint, 2022, arXiv

Low-Latency Cooperative Spectrum Sensing via Truncated Vertical Federated Learning

In recent years, the exponential increase in demand for wireless data transmission has raised the urgency of accurate spectrum sensing approaches to improve spectrum efficiency. The unreliability of conventional spectrum sensing methods that use measurements from a single secondary user (SU) has motivated research on cooperative spectrum sensing (CSS). In this work, we propose a vertical federated learning (VFL) framework to exploit the distributed features across multiple SUs without compromising data privacy. However, the repetitive training process in VFL faces the issue of high communication latency. To accelerate the training process, we propose a truncated vertical federated learning (T-VFL) algorithm, where the training latency is greatly reduced by integrating the standard VFL algorithm with a channel-aware user scheduling policy. The convergence performance of T-VFL is provided via mathematical analysis and justified by simulation results. Moreover, to guarantee the convergence performance of the T-VFL algorithm, we derive three design rules on the neural architectures used under the VFL framework, whose effectiveness is demonstrated through simulations.

preprint, 2022, arXiv

Over-the-Air Computation with Imperfect Channel State Information

This paper investigates the effect of imperfect channel state information (CSI) on the over-the-air computation (AirComp) system, in which multiple wireless devices (WDs) send individual messages to one access point (AP) for distributed functional computation. By particularly considering the channel estimation errors, we jointly design the transmit coefficients at the WDs and the receive strategy at the AP, for minimizing the computation mean squared error (MSE). First, we consider the single-input single-output (SISO) case with each WD and the AP equipped with a single antenna, in which the globally optimal solution to the computation MSE minimization problem is obtained in closed form. Next, we consider the single-input multiple-output (SIMO) case with multiple receive antennas at the AP, in which a high-quality solution is obtained based on alternating optimization and convex optimization. For both cases, the optimized power control solution at the WDs follows a threshold-based regularized channel inversion structure; while for the SIMO case, the receive beamforming at the AP follows a sum-minimum MSE (MMSE) structure. It is shown that with finite receive antennas, a non-zero computation MSE is inevitable due to the channel estimation errors even when the WDs' transmit powers grow to infinity; while with massive receive antennas, a vanishing MSE is achievable when the channel vectors are independent and identically distributed. Finally, numerical results are provided to demonstrate the effectiveness of the proposed designs.

preprint, 2022, arXiv

Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI

This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC) to enable low-latency intelligent services at the network edge. In this system, multiple ISAC devices perform radar sensing to obtain multi-view data, and then offload the quantized version of extracted features to a centralized edge server, which conducts model inference based on the cascaded feature vectors. Under this setup and by considering classification tasks, we measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain, which is defined as the distance of two classes in the Euclidean feature space under normalized covariance. To maximize the discriminant gain, we first quantify the influence of the sensing, computation, and communication processes on it with a derived closed-form expression. Then, an end-to-end task-oriented resource management approach is developed by integrating the three processes into a joint design. This integrated sensing, computation, and communication (ISCC) design approach, however, leads to a challenging non-convex optimization problem, due to the complicated form of discriminant gain and the device heterogeneity in terms of channel gain, quantization level, and generated feature subsets. Remarkably, the considered non-convex problem can be optimally solved based on the sum-of-ratios method. This gives the optimal ISCC scheme that jointly determines the transmit power and time allocation at multiple devices for sensing and communication, as well as their quantization bit allocation for computation distortion control. By using human motion recognition as a concrete AI inference task, extensive experiments are conducted to verify the performance of our derived optimal ISCC scheme.
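One way to read the discriminant gain defined above is as a variance-normalized squared distance between two class centroids. The sketch below is an interpretation under a loudly stated assumption: the feature covariance is taken to be diagonal and shared by both classes, which the abstract does not specify, and the function name is illustrative.

```python
def discriminant_gain(mean_a, mean_b, variances):
    """Squared distance between two class means, normalized per feature.

    Assumes a diagonal covariance shared by both classes, so each feature
    contributes (mean difference)^2 / variance.
    """
    return sum(
        (a - b) ** 2 / v
        for a, b, v in zip(mean_a, mean_b, variances)
    )
```

For instance, `discriminant_gain([1.0, 0.0], [0.0, 2.0], [1.0, 4.0])` gives 2.0: both features contribute equally once the second one is scaled by its larger variance. Under this reading, sensing noise and quantization distortion would inflate the variances and shrink the gain.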

preprint, 2022, arXiv

Towards Tailored Models on Private AIoT Devices: Federated Direct Neural Architecture Search

Neural networks often encounter various stringent resource constraints when deployed on edge devices. To tackle these problems with less human effort, automated machine learning has become popular in finding various neural architectures that fit diverse Artificial Intelligence of Things (AIoT) scenarios. Recently, to prevent the leakage of private information while enabling automated machine intelligence, there is an emerging trend to integrate federated learning and neural architecture search (NAS). Although promising as it may seem, the coupling of difficulties from both tenets makes the algorithm development quite challenging. In particular, how to efficiently search the optimal neural architecture directly from massive non-independent and identically distributed (non-IID) data among AIoT devices in a federated manner is a hard nut to crack. In this paper, to tackle this challenge, by leveraging the advances in ProxylessNAS, we propose a Federated Direct Neural Architecture Search (FDNAS) framework that allows for hardware-friendly NAS from non-IID data across devices. To further adapt to both various data distributions and different types of devices with heterogeneous embedded hardware platforms, inspired by meta-learning, a Cluster Federated Direct Neural Architecture Search (CFDNAS) framework is proposed to achieve device-aware NAS, in the sense that each device can learn a tailored deep learning model for its particular data distribution and hardware constraint. Extensive experiments on non-IID datasets have shown the state-of-the-art accuracy-efficiency trade-offs achieved by the proposed solution in the presence of both data and device heterogeneity.

preprint, 2022, arXiv

Turning Channel Noise into an Accelerator for Over-the-Air Principal Component Analysis

In recent years, attempts to distill mobile data into useful knowledge have led to the deployment of machine learning algorithms at the network edge. Principal component analysis (PCA) is a classic technique for extracting the linear structure of a dataset, which is useful for feature extraction and data compression. In this work, we propose the deployment of distributed PCA over a multi-access channel based on the algorithm of stochastic gradient descent to learn the dominant feature space of a distributed dataset at multiple devices. Over-the-air aggregation is adopted to reduce the multi-access latency, giving the name over-the-air PCA. The novelty of this design lies in exploiting channel noise to accelerate the descent in the region around each saddle point encountered by gradient descent, thereby increasing the convergence speed of over-the-air PCA. The idea is materialized by proposing a power-control scheme that detects the type of descent region and controls the level of channel noise accordingly. The scheme is proven to achieve a faster convergence rate than in the case without power control.
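The gradient-based PCA update with additive noise can be sketched as follows. This is a simplified, single-component illustration under stated assumptions: Gaussian noise added to the aggregated gradient stands in for channel noise, the data are assumed zero-mean, and the paper's power-control scheme that adapts the noise level by descent region is not reproduced. The name `noisy_gradient_pca` is illustrative.

```python
import math
import random

def noisy_gradient_pca(samples, steps=200, lr=0.05, noise_std=0.01, seed=0):
    """Estimate the top principal direction by noisy gradient ascent.

    Each step adds lr * (C w + noise) to w, where C is the sample covariance
    (zero-mean data assumed), and then renormalizes w to unit length.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        grad = [0.0] * dim
        for x in samples:
            proj = sum(xi * wi for xi, wi in zip(x, w))
            for j in range(dim):
                grad[j] += proj * x[j] / len(samples)
        # Additive Gaussian noise models the perturbation from the channel.
        w = [wj + lr * (gj + rng.gauss(0.0, noise_std))
             for wj, gj in zip(w, grad)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        w = [wj / norm for wj in w]
    return w
```

On the toy dataset `[[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]]` the estimate aligns with the first axis, where the variance is largest. The small noise term also nudges the iterate off directions where the gradient vanishes, which is the intuition behind using channel noise near saddle points.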

preprint, 2022, arXiv

Vertical Federated Edge Learning with Distributed Integrated Sensing and Communication

This letter studies a vertical federated edge learning (FEEL) system for collaborative object/human motion recognition by exploiting distributed integrated sensing and communication (ISAC). In this system, distributed edge devices first send wireless signals to sense targeted objects/humans, and then exchange intermediate computed vectors (instead of raw sensing data) for collaborative recognition while preserving data privacy. To boost the spectrum and hardware utilization efficiency for FEEL, we exploit ISAC for both target sensing and data exchange, by employing dedicated frequency-modulated continuous-wave (FMCW) signals at each edge device. Under this setup, we propose a vertical FEEL framework for realizing the recognition based on the collected multi-view wireless sensing data. In this framework, each edge device owns an individual local L-model to transform its sensing data into an intermediate vector with relatively low dimensions, which is then transmitted to a coordinating edge device for final output via a common downstream S-model. By considering a human motion recognition task, experimental results show that our vertical FEEL-based approach achieves recognition accuracy up to 98% with an improvement up to 8% compared to the benchmarks, including on-device training and horizontal FEEL.

preprint, 2021, arXiv

Training Time Minimization for Federated Edge Learning with Optimized Gradient Quantization and Bandwidth Allocation

Training a machine learning model with federated edge learning (FEEL) is typically time-consuming due to the constrained computation power of edge devices and limited wireless resources in edge networks. In this paper, the training time minimization problem is investigated in a quantized FEEL system, where the heterogeneous edge devices send quantized gradients to the edge server via orthogonal channels. In particular, a stochastic quantization scheme is adopted for compression of uploaded gradients, which can reduce the burden of per-round communication but may come at the cost of an increased number of communication rounds. The training time is modeled by taking into account the communication time, computation time and the number of communication rounds. Based on the proposed training time model, the intrinsic trade-off between the number of communication rounds and per-round latency is characterized. Specifically, we analyze the convergence behavior of the quantized FEEL in terms of the optimality gap. Further, a joint data-and-model-driven fitting method is proposed to obtain the exact optimality gap, based on which the closed-form expressions for the number of communication rounds and the total training time are obtained. Constrained by total bandwidth, the training time minimization problem is formulated as a joint quantization level and bandwidth allocation optimization problem. To this end, an algorithm based on alternating optimization is proposed, which alternately solves the subproblem of quantization optimization via successive convex approximation and the subproblem of bandwidth allocation via bisection search. With different learning tasks and models, the validity of our analysis and the near-optimal performance of the proposed optimization algorithm are demonstrated by the experimental results.
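Stochastic quantization of a gradient can be illustrated with unbiased stochastic rounding onto a uniform grid. This is a generic sketch, assuming the grid spans the largest magnitude in the vector and that `levels` counts the intervals per sign; the paper's exact quantizer may differ, and the function name is illustrative.

```python
import math
import random

def stochastic_quantize(values, levels, rng):
    """Round each entry up or down at random so the expectation is unbiased.

    The grid has `levels` uniform intervals between 0 and max |v|; an entry
    is rounded up with probability equal to its fractional grid position.
    """
    scale = max(abs(v) for v in values)
    if scale == 0.0:
        return [0.0 for _ in values]
    quantized = []
    for v in values:
        position = abs(v) / scale * levels   # lies in [0, levels]
        lower = math.floor(position)
        round_up = rng.random() < (position - lower)
        level = lower + (1 if round_up else 0)
        quantized.append(math.copysign(level / levels * scale, v))
    return quantized
```

With `levels=4` and the vector `[1.0, 0.3]`, the second entry is sent as either 0.25 or 0.5, and its average over many draws approaches 0.3. Fewer levels mean fewer bits per round but more variance in the received gradient, which is the trade-off between per-round latency and the number of rounds that the abstract describes.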

preprint, 2020, arXiv

Cooperative Interference Management for Over-the-Air Computation Networks

This paper considers a multi-cell AirComp network and investigates the optimal power control policies over multiple cells to regulate the effect of inter-cell interference. First, we consider the scenario of centralized multi-cell power control, where we characterize the Pareto boundary of the multi-cell MSE region by minimizing the sum MSE subject to a set of constraints on individual MSEs. Though the sum-MSE minimization problem is non-convex and its direct solution intractable, we optimally solve this problem via equivalently solving a sequence of convex second-order cone program feasibility problems together with a bisection search. Next, we consider distributed power control in the other scenario without a centralized controller, for which an alternative IT-based method is proposed to characterize the same MSE Pareto boundary, and enable a decentralized power control algorithm. Accordingly, each AP only needs to individually control the power of its associated devices, but subject to a set of IT constraints on their interference to neighboring cells, while different APs can cooperate in iteratively updating the IT levels by pairwise information exchange, to achieve a Pareto-optimal MSE tuple. Last, simulation results demonstrate that cooperative power control using the proposed algorithms can substantially reduce the sum MSE of AirComp networks.

preprint, 2020, arXiv

Optimized Power Control for Over-the-Air Computation in Fading Channels

In this paper, we study the power control problem for over-the-air computation (AirComp) over fading channels. Our objective is to minimize the computation error by jointly optimizing the transmit power at the power-constrained devices and a signal scaling factor (called denoising factor) at the fusion center (FC). The problem is generally non-convex due to the coupling of the transmit power over devices and denoising factor at the FC. To tackle the challenge, we first consider the special case with static channels, for which we derive the optimal solution in closed form. The optimal power control exhibits a threshold-based structure. Specifically, for each device, if the product of the channel quality and power budget, called quality indicator, exceeds an optimized threshold, this device applies channel-inversion power control; otherwise, it performs full power transmission. Building on the results, we proceed to consider the general case with time-varying channels. To solve the more challenging non-convex power control problem, we use the Lagrange-duality method by exploiting its "time-sharing" property. The derived optimal power control exhibits a regularized channel inversion structure, where the regularization has the function of balancing the tradeoff between the signal-magnitude alignment and noise suppression. Moreover, for the special case with only one device being power limited, we show that the optimal power control for the power-limited device has an interesting channel-inversion water-filling structure, while those for other devices (with sufficient power budgets) reduce to channel-inversion power control over all fading states.
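The threshold-based structure for static channels can be sketched as below. Several details are assumptions made for illustration: channel gains are real positive amplitudes, the quality indicator is taken as the squared gain times the power budget, the threshold is a given denoising factor `eta` rather than the optimized one, and the error model is a simple misalignment-plus-noise expression. The function names are illustrative.

```python
import math

def threshold_power_control(channel_gains, power_budgets, eta):
    """Channel inversion above the threshold, full power below it."""
    powers = []
    for h, p_max in zip(channel_gains, power_budgets):
        inversion_power = eta / (h * h)  # makes h * sqrt(p) equal sqrt(eta)
        if h * h * p_max >= eta:         # quality indicator meets threshold
            powers.append(inversion_power)
        else:
            powers.append(p_max)         # budget too small: transmit at full power
    return powers

def computation_mse(channel_gains, powers, eta, noise_var):
    """Signal misalignment error plus scaled receiver noise (assumed model)."""
    misalignment = sum(
        (h * math.sqrt(p) / math.sqrt(eta) - 1.0) ** 2
        for h, p in zip(channel_gains, powers)
    )
    return misalignment + noise_var / eta
```

With gains `[2.0, 0.5]`, unit budgets and `eta = 1.0`, the strong device inverts its channel using a power of 0.25 while the weak device transmits at full power and leaves a residual misalignment. Raising `eta` suppresses noise but pushes more devices below the threshold, which is the trade-off the optimized threshold balances.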