Researcher profile

Kaibin Huang

Kaibin Huang contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
29 works
0 followers
9 topics
4 close collaborators

Published work

29 published item(s)

preprint, 2026, arXiv

Space Network of Experts: Architecture and Expert Placement

Leveraging continuous solar energy harvesting at high efficiency, space data centers are envisioned as a promising platform for executing energy-intensive large language models (LLMs). Recognizing this advantage, space and AI conglomerates (e.g., SpaceX, Google) are actively investing in this vision. One key challenge, however, is the efficient distributed deployment of a large-scale LLM in a satellite network due to the limited onboard computing and communication resources. This gives rise to a placement problem that involves partitioning and mapping model components to satellites such that the fundamentally different model architecture and network topology can be reconciled to ensure low-latency token generation. To address this problem, we present the Space Network of Experts (Space-XNet) framework targeting the distributed execution of a popular mixture-of-experts (MoE) model in space. The proposed placement strategies are two-level: (1) layer placement, which assigns MoE layers to satellite subnets; and (2) intra-layer expert placement, which assigns individual experts to satellites associated with the same layer/subnet. For layer placement, we exploit the ring-like communication pattern of autoregressive inference to partition the satellite constellation along the orbiting direction into subnets arranged on a ring, each hosting one MoE layer. Based on this architecture, we formulate and solve an optimization problem for intra-layer expert placement to map experts with heterogeneous activation probabilities onto satellites. The derived strategy reveals an intuitive principle: a frequently activated expert should be mapped to a satellite on a routing path with low expected latency. Experiments over a thousand-satellite constellation show that Space-XNet achieves at least a threefold latency reduction compared with conventional random and ablation-based placement strategies.
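The placement principle stated in the abstract above (a frequently activated expert goes to a satellite on a low-latency routing path) can be illustrated with a small sorting-based sketch. This is not the paper's optimization: it assumes each satellite has a single known expected latency and hosts at most one expert, and the names `place_experts`, `expected_latency`, `activation_prob` and `path_latency` are hypothetical.

```python
from typing import Dict, List


def place_experts(activation_prob: List[float], path_latency: List[float]) -> Dict[int, int]:
    """Map expert index -> satellite index, one expert per satellite.

    Assumption (not from the paper): each satellite has one known expected
    routing latency and each expert has one known activation probability.
    """
    if len(activation_prob) > len(path_latency):
        raise ValueError("need at least one satellite per expert")
    # Most frequently activated experts first.
    experts = sorted(range(len(activation_prob)), key=lambda e: -activation_prob[e])
    # Satellites with the lowest expected latency first.
    satellites = sorted(range(len(path_latency)), key=lambda s: path_latency[s])
    return {e: s for e, s in zip(experts, satellites)}


def expected_latency(placement: Dict[int, int],
                     activation_prob: List[float],
                     path_latency: List[float]) -> float:
    """Activation-weighted latency of a placement."""
    return sum(activation_prob[e] * path_latency[s] for e, s in placement.items())


probs = [0.1, 0.6, 0.3]      # illustrative activation probabilities
latency = [5.0, 1.0, 3.0]    # illustrative expected path latencies
placement = place_experts(probs, latency)
print(placement, expected_latency(placement, probs, latency))
```

Under these simplifying assumptions, pairing the two sorted lists minimizes the activation-weighted latency among one-to-one assignments (a consequence of the rearrangement inequality).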

preprint, 2025, arXiv

Digitalizing Over-the-Air Computation via The Novel Complement Coded Modulation

To overcome inherent limitations of analog signals in over-the-air computation (AirComp), this letter proposes a two's complement-based coding scheme for implementing AirComp with compatible digital modulations. Specifically, quantized discrete values are encoded into binary sequences using the two's complement and transmitted over multiple subcarriers. At the receiver, we design a decoder that constructs a functional mapping between the superimposed digital modulation signals and the target computational results, theoretically ensuring asymptotically error-free computation with the minimal codeword length. To further mitigate the adverse effects of channel fading, we adopt a truncated inversion strategy for pre-processing. Benefiting from the unified symbol distribution after the proposed encoding, we derive the optimal linear minimum mean squared error (LMMSE) detector in closed form and propose a low-complexity algorithm that seeks the optimal truncation selection. Furthermore, the inherent importance differences among the coded outputs motivate an uneven power allocation strategy across subcarriers to improve computational accuracy. Numerical results validate the superiority of the proposed scheme over existing digital AirComp approaches, especially in low signal-to-noise ratio (SNR) regimes.
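Why two's complement suits over-the-air summation can be seen in a noise-free toy model: if each bit position is sent on its own subcarrier and the channel simply adds the bits, the receiver can recover the sum of all values from the per-position counts, with the sign bit carrying a negative weight. The sketch below assumes an ideal additive channel and omits modulation, fading, truncated inversion and LMMSE detection; `encode`, `superimpose` and `decode_sum` are hypothetical names, not the letter's decoder.

```python
from typing import List


def encode(value: int, bits: int) -> List[int]:
    """Two's complement bits of `value`, most significant bit first."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range for the given word length")
    word = value & ((1 << bits) - 1)  # wrap negatives into [0, 2^bits)
    return [(word >> i) & 1 for i in reversed(range(bits))]


def superimpose(codewords: List[List[int]]) -> List[int]:
    """Idealized channel (assumption): per-subcarrier sum of transmitted bits."""
    return [sum(column) for column in zip(*codewords)]


def decode_sum(counts: List[int]) -> int:
    """Recover the sum of all encoded values from the per-bit counts."""
    bits = len(counts)
    total = -counts[0] * (1 << (bits - 1))  # sign bit has negative weight
    for i, count in enumerate(counts[1:], start=1):
        total += count * (1 << (bits - 1 - i))
    return total


values = [-3, 5, 2, -1]  # illustrative quantized values at four devices
counts = superimpose([encode(v, 4) for v in values])
print(counts, decode_sum(counts))
```

Because the decoding is linear in the bit counts, the receiver never needs the individual codewords, only their superposition.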

preprint, 2023, arXiv

Semantic Data Sourcing for 6G Edge Intelligence

As a new function of 6G networks, edge intelligence refers to the ubiquitous deployment of machine learning and artificial intelligence (AI) algorithms at the network edge to empower many emerging applications ranging from sensing to auto-pilot. The relevant use cases, including sensing, edge learning, and edge inference, all require transmission of high-dimensional data or AI models over the air, which creates a communication bottleneck. To overcome the bottleneck, we propose a novel framework of SEMantic DAta Sourcing (SEMDAS) for locating semantically matched data sources to efficiently enable edge-intelligence operations. The comprehensive framework comprises new architecture, protocol, semantic matching techniques, and design principles for task-oriented wireless techniques. As the key component of SEMDAS, we discuss a set of machine learning based semantic matching techniques targeting different edge-intelligence use cases. Moreover, for designing task-oriented wireless techniques, we discuss different tradeoffs in SEMDAS systems, propose the new concept of joint semantics-and-channel matching, and point to a number of research opportunities. The SEMDAS framework not only overcomes the said communication bottleneck but also addresses other networking issues including long-distance transmission, sparse connectivity, high-speed mobility, link disruptions, and security. In addition, experimental results using a real dataset are presented to demonstrate the performance gain of SEMDAS.

preprint, 2022, arXiv

A Perspective on Time towards Wireless 6G

With the advent of 5G technology, the notion of latency gained a prominent role in wireless connectivity, serving as a proxy term for addressing the requirements for real-time communication. As wireless systems evolve towards 6G, the ambition to immerse the digital into the physical reality will increase. Besides making the real-time requirements more stringent, this immersion will bring the notions of time, simultaneity, presence, and causality to a new level of complexity. A growing body of research points out that latency is insufficient to parameterize all real-time requirements. Notably, one such requirement that has received significant attention is information freshness, defined through the Age of Information (AoI) and its derivatives. The objective of this article is to investigate the general notion of timing in wireless communication systems and networks and its relation to effective information generation, processing, transmission, and reconstruction at the senders and receivers. We establish a general statistical framework of timing requirements in wireless communication systems, which subsumes both latency and AoI. The framework is made by associating a timing component with the two basic statistical operations, decision and estimation. We first use the framework to present a representative sample of the existing works that deal with timing in wireless communication. Next, it is shown how the framework can be used with different communication models of increasing complexity, starting from the basic Shannon one-way communication model and arriving at communication models for consensus, distributed learning, and inference. Overall, this paper fills an important gap in the literature by providing a systematic treatment of various timing measures in wireless communication and sets the basis for design and optimization for the next-generation real-time systems.
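The difference between latency and information freshness mentioned in the abstract above can be made concrete with the standard definition of AoI: at time t, the age is t minus the generation time of the freshest update delivered so far. The sketch below uses that textbook definition only; the function name `age_of_information` and the sample update times are hypothetical and not taken from the article.

```python
from typing import List, Optional, Tuple


def age_of_information(updates: List[Tuple[float, float]], t: float) -> Optional[float]:
    """AoI at time t for updates given as (generation_time, delivery_time).

    Returns None if no update has been delivered by time t.
    """
    delivered = [gen for gen, dlv in updates if dlv <= t]
    if not delivered:
        return None
    return t - max(delivered)  # the freshest delivered update defines the age


# Illustrative trace: the second update has a latency of only 0.5,
# yet the age keeps growing until the third update arrives at t = 6.
updates = [(0.0, 1.0), (2.0, 2.5), (4.0, 6.0)]
for t in (0.5, 3.0, 5.0, 6.0):
    print(t, age_of_information(updates, t))
```

The trace shows why per-packet latency alone does not capture freshness: a low-latency update can still leave the receiver with stale information if updates are generated rarely.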

preprint, 2022, arXiv

A Two-Timescale Approach to Mobility Management for Multi-Cell Mobile Edge Computing

Mobile edge computing (MEC) is a promising technology for enhancing the computation capacities and features of mobile users by offloading complex computation tasks to the edge servers. However, mobility poses great challenges to delivering the reliable MEC service required for latency-critical applications. First, mobility management has to tackle the dynamics of both users' location changes and task arrivals, which vary on different timescales. Second, user mobility could induce service migration, leading to reliability loss due to the migration delay. In this paper, we propose a two-timescale mobility management framework that jointly controls service migration and transmission power to address the above challenges. Specifically, the service migration operates at a large timescale to support user mobility in the multi-cell network, while the power control is performed at a small timescale for real-time task offloading. Their joint control is formulated as an optimization problem aiming at long-term mobile energy minimization subject to the reliability requirement of computation offloading. To solve the problem, we propose a Lyapunov-based framework to decompose the problem into different timescales, based on which a low-complexity two-timescale online algorithm is developed by exploiting the problem structure. The proposed online algorithm is shown to be asymptotically optimal via theoretical analysis, and is further developed to accommodate multiuser management. The simulation results demonstrate that our proposed algorithm can significantly improve the energy and reliability performance.

preprint, 2022, arXiv

Accelerating Federated Edge Learning via Topology Optimization

Federated edge learning (FEEL) is envisioned as a promising paradigm to achieve privacy-preserving distributed learning. However, it consumes excessive learning time due to the existence of straggler devices. In this paper, a novel topology-optimized federated edge learning (TOFEL) scheme is proposed to tackle the heterogeneity issue in federated learning and to improve the communication-and-computation efficiency. Specifically, a problem of jointly optimizing the aggregation topology and computing speed is formulated to minimize the weighted summation of energy consumption and latency. To solve the mixed-integer nonlinear problem, we propose a novel solution method of penalty-based successive convex approximation, which converges to a stationary point of the primal problem under mild conditions. To facilitate real-time decision making, an imitation-learning based method is developed, where deep neural networks (DNNs) are trained offline to mimic the penalty-based method, and the trained imitation DNNs are deployed at the edge devices for online inference. Thereby, an efficient imitation-learning based approach is seamlessly integrated into the TOFEL framework. Simulation results demonstrate that the proposed TOFEL scheme accelerates the federated learning process and achieves higher energy efficiency. Moreover, we apply the scheme to 3D object detection with multi-vehicle point cloud datasets in the CARLA simulator. The results confirm the superior learning performance of the TOFEL scheme over conventional designs with the same resource and deadline constraints.

preprint, 2022, arXiv

An Energy-efficient Aerial Backhaul System with Reconfigurable Intelligent Surface

In this paper, we propose a novel wireless architecture, mounted on a high-altitude aerial platform, which is enabled by a reconfigurable intelligent surface (RIS). By installing the RIS on the aerial platform, rich line-of-sight and full-area coverage can be achieved, thereby overcoming the limitations of the conventional terrestrial RIS. We consider a scenario where a sudden increase in traffic in an urban area triggers authorities to rapidly deploy unmanned aerial vehicle base stations (UAV-BSs) to serve the ground users. In this scenario, since the direct backhaul link from the ground source can be blocked by obstacles in the urban area, we propose reflecting the backhaul signal using the aerial-RIS so that it successfully reaches the UAV-BSs. We jointly optimize the placement and array-partition strategies of the aerial-RIS and the phases of the RIS elements, which leads to an increase in the energy efficiency of every UAV-BS. We show that the complexity of our algorithm can be bounded by the quadratic order, thus implying high computational efficiency. We verify the performance of the proposed algorithm via extensive numerical evaluations and show that our method achieves outstanding energy efficiency compared to benchmark schemes.

preprint, 2022, arXiv

Analog MIMO Communication for One-shot Distributed Principal Component Analysis

A fundamental algorithm for data analytics at the edge of wireless networks is distributed principal component analysis (DPCA), which finds the most important information embedded in a distributed high-dimensional dataset by distributed computation of a reduced-dimension data subspace, called principal components (PCs). In this paper, to support one-shot DPCA in wireless systems, we propose a framework of analog MIMO transmission featuring the uncoded analog transmission of local PCs for estimating the global PCs. To cope with channel distortion and noise, two maximum-likelihood (global) PC estimators are presented corresponding to the cases with and without receive channel state information (CSI). The first design, termed coherent PC estimator, is derived by solving a Procrustes problem and reveals the form of regularized channel inversion where the regulation attempts to alleviate the effects of both receiver noise and data noise. The second one, termed blind PC estimator, is designed based on the subspace channel-rotation-invariance property and computes a centroid of received local PCs on a Grassmann manifold. Using the manifold-perturbation theory, tight bounds on the mean square subspace distance (MSSD) of both estimators are derived for performance evaluation. The results reveal simple scaling laws of MSSD concerning device population, data and channel signal-to-noise ratios (SNRs), and array sizes. More importantly, both estimators are found to have identical scaling laws, suggesting the dispensability of CSI to accelerate DPCA. Simulation results validate the derived results and demonstrate the promising latency performance of the proposed analog MIMO

preprint, 2022, arXiv

Distributed Over-the-air Computing for Fast Distributed Optimization: Beamforming Design and Convergence Analysis

Distributed optimization concerns the optimization of a common function in a distributed network, which finds a wide range of applications ranging from machine learning to vehicle platooning. Its key operation is to aggregate all local state information (LSI) at devices to update their states. The required extensive message exchange and many iterations cause a communication bottleneck when the LSI is high dimensional or at high mobility. To overcome the bottleneck, we propose in this work the framework of distributed over-the-air computing (AirComp) to realize a one-step aggregation for distributed optimization by exploiting simultaneous multicast beamforming of all devices and the property of analog waveform superposition of a multi-access channel. We consider two design criteria. The first one is to minimize the sum AirComp error (i.e., sum mean-squared error (MSE)) with respect to the desired average-functional values. An efficient solution approach is proposed by transforming the non-convex beamforming problem into an equivalent concave-convex fractional program and solving it by nesting convex programming into a bisection search. The second criterion, called zero-forcing (ZF) multicast beamforming, is to force the received over-the-air aggregated signals at devices to be equal to the desired functional values. In this case, the optimal beamforming admits closed form. Both the MMSE and ZF beamforming exhibit a centroid structure resulting from averaging columns of conventional MMSE/ZF precoding. Last, the convergence of a classic distributed optimization algorithm is analyzed. The distributed AirComp is found to accelerate convergence by dramatically reducing communication latency. Another key finding is that the ZF beamforming outperforms the MMSE design as the latter is shown to cause bias in subgradient estimation.

preprint, 2022, arXiv

Federated Dropout -- A Simple Approach for Enabling Federated Learning on Resource Constrained Devices

Federated learning (FL) is a popular framework for training an AI model using distributed mobile data in a wireless network. It features data parallelism by distributing the learning task to multiple edge devices while attempting to preserve their local-data privacy. One main challenge confronting practical FL is that resource-constrained devices struggle with the computation-intensive task of updating a deep neural network model. To tackle the challenge, in this paper, a federated dropout (FedDrop) scheme is proposed, building on the classic dropout scheme for random model pruning. Specifically, in each iteration of the FL algorithm, several subnets are independently generated from the global model at the server using dropout but with heterogeneous dropout rates (i.e., parameter-pruning probabilities), each of which is adapted to the state of an assigned channel. The subnets are downloaded to associated devices for updating. Thereby, FedDrop reduces both the communication overhead and devices' computation loads compared with the conventional FL while outperforming the latter in the case of overfitting and also the FL scheme with uniform dropout (i.e., identical subnets).
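The subnet-generation step described in the abstract above can be sketched as independent random masks, one per device, each drawn with its own dropout rate. The sketch is a minimal illustration under stated assumptions, not the paper's algorithm: masks act on a flat parameter list, the per-device rates are given directly rather than derived from channel state, and the averaging rule in `aggregate` (average each parameter over the devices that kept it) is an assumed choice. All function names are hypothetical.

```python
import random
from typing import List


def make_subnet_mask(num_params: int, dropout_rate: float, rng: random.Random) -> List[int]:
    """Return a mask where 1 keeps a parameter in the subnet and 0 prunes it."""
    return [0 if rng.random() < dropout_rate else 1 for _ in range(num_params)]


def aggregate(updates: List[List[float]], masks: List[List[int]]) -> List[float]:
    """Average each parameter update over the devices whose subnet kept it.

    Assumed rule: a parameter pruned on every device gets a zero update.
    """
    result = []
    for j in range(len(masks[0])):
        kept = [u[j] for u, m in zip(updates, masks) if m[j] == 1]
        result.append(sum(kept) / len(kept) if kept else 0.0)
    return result


rng = random.Random(0)
rates = [0.2, 0.5, 0.8]  # illustrative heterogeneous dropout rates, one per device
masks = [make_subnet_mask(10, r, rng) for r in rates]
for rate, mask in zip(rates, masks):
    print(rate, sum(mask), "parameters kept")
```

A device with a higher dropout rate downloads and updates fewer parameters on average, which is the source of the communication and computation savings the abstract refers to.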

preprint, 2022, arXiv

Integrated Sensing, Communication, and Computation Over-the-Air: MIMO Beamforming Design

To support the unprecedented growth of Internet of Things (IoT) applications, tremendous amounts of data need to be collected by the IoT devices and delivered to the server for further computation. By utilizing the same signals for both radar sensing and data communication, the integrated sensing and communication (ISAC) technique has broken the barriers between data collection and delivery in the physical layer. By exploiting the analog-wave addition in a multi-access channel, over-the-air computation (AirComp) enables function computation via transmissions in the physical layer. The promising performance of ISAC and AirComp motivates the current work on developing a framework called integrated sensing, communication, and computation over-the-air (ISCCO). The performance metrics of radar sensing and AirComp are evaluated by the mean squared errors of the estimated target response matrix and the received computation results, respectively. The design challenge of MIMO ISCCO lies in the joint optimization of beamformers for sensing, communication, and computation at both the IoT devices and the server, which results in a non-convex problem. To solve this problem, an algorithmic solution based on the technique of semidefinite relaxation is proposed. The use case of target location estimation based on ISCCO is demonstrated in simulation to show the performance superiority.

preprint, 2022, arXiv

Realizing Ultra-Fast and Energy-Efficient Baseband Processing Using Analogue Resistive Switching Memory

To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient (UFEE) baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges: transistor scaling and the von Neumann bottleneck. To address these challenges, in-memory computing-based baseband processors using resistive random-access memory (RRAM) present an attractive solution. In this paper, we propose and demonstrate RRAM-based in-memory baseband processing for the widely adopted multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) air interface. Its key feature is to execute the key operations, including discrete Fourier transform (DFT) and MIMO detection using linear minimum mean square error (L-MMSE) and zero forcing (ZF), in one step. In addition, RRAM-based channel estimation as well as mapper/demapper modules are proposed. By prototyping and simulations, we demonstrate that the RRAM-based full-fledged communication system can significantly outperform its CMOS-based counterpart in terms of speed and energy efficiency by $10^3$ and $10^6$ times, respectively. The results pave a potential pathway for RRAM-based in-memory computing to be implemented in the era of the sixth generation (6G) mobile communications.

preprint, 2022, arXiv

Resource Allocation for Multiuser Edge Inference with Batching and Early Exiting (Extended Version)

The deployment of inference services at the network edge, called edge inference, offloads computation-intensive inference tasks from mobile devices to edge servers, thereby enhancing the former's capabilities and battery lives. In a multiuser system, the joint allocation of communication-and-computation ($\text{C}^\text{2}$) resources (i.e., scheduling and bandwidth allocation) is made challenging by adopting efficient inference techniques, batching and early exiting, and further complicated by the heterogeneity in users' requirements on accuracy and latency. Batching groups multiple tasks into one batch for parallel processing to reduce time-consuming memory access and thereby boosts the throughput (i.e., completed tasks per second). On the other hand, early exiting allows a task to exit from a deep neural network without traversing the whole network to support a tradeoff between accuracy and latency. In this work, we study optimal $\text{C}^\text{2}$ resource allocation with batching and early exiting, which is an NP-complete integer programming problem. To tackle the challenge, a set of efficient algorithms is designed under the criterion of maximum throughput. Experimental results demonstrate that both optimal and sub-optimal $\text{C}^\text{2}$ resource allocation algorithms can leverage integrated batching and early exiting to double the inference throughput compared with conventional schemes.
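The two inference techniques named in the abstract above can be illustrated in isolation. The sketch assumes a confidence-threshold exit rule, which is one common way to realize early exiting and is not necessarily how the paper assigns exit points, and it groups tasks into fixed-size batches without any scheduling or bandwidth allocation. The names `exit_point` and `make_batches` are hypothetical.

```python
from typing import List, Sequence


def exit_point(confidences: Sequence[float], threshold: float) -> int:
    """Index of the first exit whose confidence meets the threshold.

    Falls back to the final exit when no earlier exit is confident enough.
    """
    if not confidences:
        raise ValueError("the network needs at least one exit")
    for i, confidence in enumerate(confidences):
        if confidence >= threshold:
            return i
    return len(confidences) - 1


def make_batches(task_ids: List[int], batch_size: int) -> List[List[int]]:
    """Group queued tasks into batches of at most `batch_size` tasks."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [task_ids[i:i + batch_size] for i in range(0, len(task_ids), batch_size)]


print(exit_point([0.4, 0.7, 0.95], threshold=0.6))  # exits at the second exit
print(make_batches([1, 2, 3, 4, 5], batch_size=2))
```

A lower threshold lets tasks leave the network earlier, trading accuracy for latency, while a larger batch size amortizes memory access over more tasks; the abstract's resource-allocation problem arises from coupling these choices across users.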

preprint, 2022, arXiv

Turning Channel Noise into an Accelerator for Over-the-Air Principal Component Analysis

In recent years, attempts to distill mobile data into useful knowledge have led to the deployment of machine learning algorithms at the network edge. Principal component analysis (PCA) is a classic technique for extracting the linear structure of a dataset, which is useful for feature extraction and data compression. In this work, we propose the deployment of distributed PCA over a multi-access channel based on the algorithm of stochastic gradient descent to learn the dominant feature space of a distributed dataset at multiple devices. Over-the-air aggregation is adopted to reduce the multi-access latency, giving the name over-the-air PCA. The novelty of this design lies in exploiting channel noise to accelerate the descent in the region around each saddle point encountered by gradient descent, thereby increasing the convergence speed of over-the-air PCA. The idea is materialized by proposing a power-control scheme which detects the type of descent region and controls the level of channel noise accordingly. The scheme is proved to achieve a faster convergence rate than in the case without power control.

preprint, 2021, arXiv

Multi-Cell Mobile Edge Computing: Joint Service Migration and Resource Allocation

Mobile-edge computing (MEC) enhances the capacities and features of mobile devices by offloading computation-intensive tasks over wireless networks to edge servers. One challenge faced by the deployment of MEC in cellular networks is to support user mobility, so that offloaded tasks can be seamlessly migrated between base stations (BSs) without compromising the resource-utilization efficiency and link reliability. In this paper, we tackle the challenge by optimizing the policy for migration/handover between BSs by jointly managing computation-and-radio resources. The objectives are twofold: maximizing the sum offloading rate, which quantifies MEC throughput, and minimizing the migration cost. The policy design is formulated as a decision-optimization problem that accounts for virtualization, I/O interference between virtual machines (VMs), and wireless multi-access. To solve the complex combinatorial problem, we develop an efficient relaxation-and-rounding based solution approach. The approach relies on an optimal iterative algorithm for solving the integer-relaxed problem and a novel integer-recovery design. The latter outperforms the traditional rounding method by exploiting the derived problem properties and applying matching theory. In addition, we consider the design for a special case of "hotspot mitigation", referring to alleviating an overloaded server/BS by migrating its load to the nearby idle servers/BSs. From simulation results, we observe close-to-optimal performance of the proposed migration policies under various settings. This demonstrates their efficiency in computation-and-radio resource management for joint service migration and BS handover in multi-cell MEC networks.

preprint, 2021, arXiv

Reconfigurable Intelligent Surface Assisted Edge Machine Learning

The ever-growing popularity and rapid improvement of artificial intelligence (AI) have prompted a rethinking of the evolution of wireless networks. Mobile edge computing (MEC) provides a natural platform for AI applications since it provides rich computation resources to train AI models, as well as low-latency access to the data generated by mobile and Internet of Things devices. In this paper, we present an infrastructure to perform machine learning tasks at an MEC server with the assistance of a reconfigurable intelligent surface (RIS). In contrast to conventional communication systems where the principal criterion is to maximize the throughput, we aim at optimizing the learning performance. Specifically, we minimize the maximum learning error of all users by jointly optimizing the beamforming vectors of the base station and the phase-shift matrix of the RIS. An alternating optimization-based framework is proposed to optimize the two terms iteratively, where closed-form expressions of the beamforming vectors are derived, and an alternating direction method of multipliers (ADMM)-based algorithm is designed together with an error level searching framework to effectively solve the nonconvex optimization problem of the phase-shift matrix. Simulation results demonstrate significant gains of deploying an RIS and validate the advantages of our proposed algorithms over various benchmarks.

preprint, 2021, arXiv

Wireless Power Transfer for Future Networks: Signal Processing, Machine Learning, Computing, and Sensing

Wireless power transfer (WPT) is an emerging paradigm that will enable using wireless to its full potential in future networks, not only to convey information but also to deliver energy. Such networks will enable trillions of future low-power devices to sense, compute, connect, and energize anywhere, anytime, and on the move. The design of such future networks brings new challenges and opportunities for signal processing, machine learning, sensing, and computing so as to make the best use of the RF radiations, spectrum, and network infrastructure in providing cost-effective and real-time power supplies to wireless devices and enable wireless-powered applications. In this paper, we first review recent signal processing techniques to make WPT and wireless information and power transfer as efficient as possible. Topics include power amplifier and energy harvester nonlinearities, active and passive beamforming, intelligent reflecting surfaces, receive combining with multi-antenna harvester, modulation, coding, waveform, massive MIMO, channel acquisition, transmit diversity, multi-user power region characterization, coordinated multipoint, and distributed antenna systems. Then, we overview two different design methodologies: the model and optimize approach relying on analytical system models, modern convex optimization, and communication theory, and the learning approach based on data-driven end-to-end learning and physics-based learning. We discuss the pros and cons of each approach, especially when accounting for various nonlinearities in wireless-powered networks, and identify interesting emerging opportunities for the approaches to complement each other. Finally, we identify new emerging wireless technologies where WPT may play a key role -- wireless-powered mobile edge computing and wireless-powered sensing -- arguing WPT, communication, computation, and sensing must be jointly designed.

preprint, 2021, arXiv

Wirelessly Powered Federated Edge Learning: Optimal Tradeoffs Between Convergence and Power Transfer

Federated edge learning (FEEL) is a widely adopted framework for training an artificial intelligence (AI) model distributively at edge devices to leverage their data while preserving their data privacy. The execution of a power-hungry learning task at energy-constrained devices is a key challenge confronting the implementation of FEEL. To tackle the challenge, we propose the solution of powering devices using wireless power transfer (WPT). To derive guidelines on deploying the resultant wirelessly powered FEEL (WP-FEEL) system, this work aims at the derivation of the tradeoff between the model convergence and the settings of power sources in two scenarios: 1) the transmission power and density of power-beacons (dedicated charging stations) if they are deployed, or otherwise 2) the transmission power of a server (access-point). The development of the proposed analytical framework relates the accuracy of distributed stochastic gradient estimation to the WPT settings, the randomness in both communication and WPT links, and devices' computation capacities. Furthermore, the local computation at devices (i.e., mini-batch size and processor clock frequency) is optimized to efficiently use the harvested energy for gradient estimation. The resultant learning-WPT tradeoffs reveal the simple scaling laws of the model-convergence rate with respect to the transferred energy as well as the devices' computational energy efficiencies. The results provide useful guidelines on WPT provisioning to guarantee learning performance. They are corroborated by experimental results using a real dataset.

preprint, 2020, arXiv

Capacity of Remote Classification Over Wireless Channels

Wireless connectivity creates a computing paradigm that merges communication and inference. A basic operation in this paradigm is the one where a device offloads classification tasks to the edge servers. We term this remote classification, with a potential to enable intelligent applications. Remote classification is challenged by the finite and variable data rate of the wireless channel, which affects the capability to transfer high-dimensional features and thus limits the classification resolution. We introduce a set of metrics under the name of classification capacity that are defined as the maximum number of classes that can be discerned over a given communication channel while meeting a target classification error probability. The objective is to choose a subset of classes from a library that offers satisfactory performance over a given channel. We treat two cases of subset selection. First, a device can select the subset by pruning the class library until arriving at a subset that meets the targeted error probability while maximizing the classification capacity. Adopting a subspace data model, we prove the equivalence of classification capacity maximization to Grassmannian packing. The results show that the classification capacity grows exponentially with the instantaneous communication rate, and super-exponentially with the dimensions of each data cluster. This also holds for ergodic and outage capacities with fading if the instantaneous rate is replaced with an average rate and a fixed rate, respectively. In the second case, a device has a preference of class subset for every communication rate, which is modeled as an instance of uniformly sampling the library. Without class selection, the classification capacity and its ergodic and outage counterparts are proved to scale linearly with their corresponding communication rates instead of the exponential growth in the last case.

preprint2020arXiv

Cooperative Interference Management for Over-the-Air Computation Networks

This paper considers a multi-cell AirComp network and investigates the optimal power control policies over multiple cells to regulate the effect of inter-cell interference. First, we consider the scenario of centralized multi-cell power control, where we characterize the Pareto boundary of the multi-cell MSE region by minimizing the sum MSE subject to a set of constraints on individual MSEs. Though the sum-MSE minimization problem is non-convex and its direct solution intractable, we optimally solve this problem via equivalently solving a sequence of convex second-order cone program feasibility problems together with a bisection search. Next, we consider distributed power control in the other scenario without a centralized controller, for which an alternative IT-based method is proposed to characterize the same MSE Pareto boundary, and enable a decentralized power control algorithm. Accordingly, each AP only needs to individually control the power of its associated devices, but subject to a set of IT constraints on their interference to neighboring cells, while different APs can cooperate in iteratively updating the IT levels by pairwise information exchange, to achieve a Pareto-optimal MSE tuple. Last, simulation results demonstrate that cooperative power control using the proposed algorithms can substantially reduce the sum MSE of AirComp networks.

preprint2020arXiv

Cooperative Multi-Point Vehicular Positioning Using Millimeter-Wave Surface Reflection (Extended version)

Multi-point vehicular positioning is one essential operation for autonomous vehicles. However, the state-of-the-art positioning technologies, relying on reflected signals from a target (i.e., RADAR and LIDAR), cannot work without line-of-sight. Besides, it takes significant time for environment scanning and object recognition with potential detection inaccuracy, especially in complex urban situations. Some recent fatal accidents involving autonomous vehicles further expose such limitations. In this paper, we aim at overcoming these limitations by proposing a novel relative positioning approach, called Cooperative Multi-point Positioning (COMPOP). The COMPOP establishes cooperation between a target vehicle (TV) and a sensing vehicle (SV) if a LoS path exists, where a TV explicitly lets an SV know the TV's existence by transmitting positioning waveforms. This cooperation makes it possible to remove the time-consuming scanning and target-recognition processes, facilitating real-time positioning. One prerequisite for the cooperation is clock synchronization between a TV-SV pair. To this end, we use a phase-differential-of-arrival based approach to remove the TV-SV clock difference from the received signal. With clock difference correction, the TV's position can be obtained via peak detection over a 3D power spectrum constructed by a Fourier transform (FT) based algorithm. The COMPOP also incorporates nearby vehicles, without knowing their locations, into the above cooperation for the case without a LoS path. The effectiveness of the COMPOP is verified by several simulations concerning practical channel parameters.

preprint2020arXiv

Energy-Efficient Resource Management for Federated Edge Learning with CPU-GPU Heterogeneous Computing

Edge machine learning involves the deployment of learning algorithms at the network edge to leverage massive distributed data and computation resources to train artificial intelligence (AI) models. Among others, the framework of federated edge learning (FEEL) is popular for its data-privacy preservation. FEEL coordinates global model training at an edge server and local model training at edge devices that are connected by wireless links. This work contributes to the energy-efficient implementation of FEEL in wireless networks by designing joint computation-and-communication resource management ($\text{C}^2$RM). The design targets the state-of-the-art heterogeneous mobile architecture where parallel computing using both a CPU and a GPU, called heterogeneous computing, can significantly improve both the performance and energy efficiency. To minimize the sum energy consumption of devices, we propose a novel $\text{C}^2$RM framework featuring multi-dimensional control including bandwidth allocation, CPU-GPU workload partitioning and speed scaling at each device, and $\text{C}^2$ time division for each link. The key component of the framework is a set of equilibriums in energy rates with respect to different control variables that are proved to exist among devices or between processing units at each device. The results are applied to designing efficient algorithms for computing the optimal $\text{C}^2$RM policies faster than the standard optimization tools. Based on the equilibriums, we further design energy-efficient schemes for device scheduling and greedy spectrum sharing that scavenges "spectrum holes" resulting from heterogeneous $\text{C}^2$ time divisions among devices. Using a real dataset, experiments are conducted to demonstrate the effectiveness of $\text{C}^2$RM on improving the energy efficiency of a FEEL system.

preprint2020arXiv

Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning

To leverage data and computation capabilities of mobile devices, machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models, resulting in the new paradigm of edge learning. In this paper, we consider the framework of partitioned edge learning for iteratively training a large-scale model using many resource-constrained devices (called workers). To this end, in each iteration, the model is dynamically partitioned into parametric blocks, which are downloaded to worker groups for updating using data subsets. Then, the local updates are uploaded to and cascaded by the server for updating a global model. To reduce resource usage by minimizing the total learning-and-communication latency, this work focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation (for downloading and uploading). Two design approaches are adopted. First, a practical sequential approach, called partially integrated parameter-and-bandwidth allocation (PABA), yields two schemes, namely bandwidth-aware parameter allocation and parameter-aware bandwidth allocation. The former minimizes the load for the slowest (in computing) of the worker groups, each training the same parametric block. The latter allocates the largest bandwidth to the worker that is the latency bottleneck. Second, the parameter and bandwidth allocations are jointly optimized. Although the problem is nonconvex, an efficient and optimal solution algorithm is derived by intelligently nesting a bisection search and solving a convex problem. Experimental results using real data demonstrate that integrating PABA can substantially improve the performance of partitioned edge learning in terms of latency (e.g., by 46%) and accuracy (e.g., by 4%).

preprint2020arXiv

Optimized Power Control for Over-the-Air Computation in Fading Channels

In this paper, we study the power control problem for Over-the-air computation (AirComp) over fading channels. Our objective is to minimize the computation error by jointly optimizing the transmit power at the power-constrained devices and a signal scaling factor (called denoising factor) at the fusion center (FC). The problem is generally non-convex due to the coupling of the transmit power over devices and denoising factor at the FC. To tackle the challenge, we first consider the special case with static channels, for which we derive the optimal solution in closed form. The optimal power control exhibits a threshold-based structure. Specifically, for each device, if the product of the channel quality and power budget, called quality indicator, exceeds an optimized threshold, this device applies channel-inversion power control; otherwise, it performs full power transmission. Building on the results, we proceed to consider the general case with time-varying channels. To solve the more challenging non-convex power control problem, we use the Lagrange-duality method via exploiting its "time-sharing" property. The derived optimal power control exhibits a regularized channel inversion structure, where the regularization has the function of balancing the tradeoff between the signal-magnitude alignment and noise suppression. Moreover, for the special case with only one device being power limited, we show that the optimal power control for the power-limited device has an interesting channel-inversion water-filling structure, while those for other devices (with sufficient power budgets) reduce to channel-inversion power control over all fading states.
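The threshold structure described for the static-channel case can be illustrated with a minimal sketch. Several pieces are assumptions made for illustration and are not taken from the abstract: the channel quality is modeled as the squared gain, the threshold is taken to be a given denoising factor `eta` rather than the optimized one, and the error model (signal misalignment plus scaled noise) is a common AirComp formulation. The function names are hypothetical.

```python
import math

def threshold_power_control(gains, budgets, eta):
    """Threshold rule: invert the channel if the quality indicator reaches eta,
    otherwise transmit at full power. Assumes quality = gain^2 * budget."""
    powers = []
    for h, p_max in zip(gains, budgets):
        quality = h * h * p_max          # quality indicator (assumed form)
        if quality >= eta:
            powers.append(eta / (h * h))  # channel inversion: h * sqrt(p) = sqrt(eta)
        else:
            powers.append(p_max)          # full power transmission
    return powers

def aircomp_mse(gains, powers, eta, noise_var):
    """Assumed error model: per-device magnitude misalignment plus noise / eta."""
    misalignment = sum(
        (h * math.sqrt(p) / math.sqrt(eta) - 1.0) ** 2
        for h, p in zip(gains, powers)
    )
    return misalignment + noise_var / eta

gains = [1.0, 0.5]
budgets = [1.0, 1.0]
powers = threshold_power_control(gains, budgets, eta=0.5)
mse = aircomp_mse(gains, powers, eta=0.5, noise_var=0.1)
```

In this toy setting the strong device inverts its channel and contributes no misalignment, while the weak device transmits at full power and still falls short of the target magnitude. A larger `eta` suppresses noise but pushes more devices into the full-power regime, which is the tradeoff the abstract refers to.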

preprint2020arXiv

Scheduling for Cellular Federated Edge Learning with Importance and Channel Awareness

In cellular federated edge learning (FEEL), multiple edge devices holding local data jointly train a neural network by communicating learning updates with an access point without exchanging their data samples. With very limited communication resources, it is beneficial to schedule the most informative local learning updates. In this paper, a novel scheduling policy is proposed to exploit both diversity in multiuser channels and diversity in the "importance" of the edge devices' learning updates. First, a new probabilistic scheduling framework is developed to yield unbiased update aggregation in FEEL. The importance of a local learning update is measured by its gradient divergence. If one edge device is scheduled in each communication round, the scheduling policy is derived in closed form to achieve the optimal trade-off between channel quality and update importance. The probabilistic scheduling framework is then extended to allow scheduling multiple edge devices in each communication round. Numerical results obtained using popular models and learning datasets demonstrate that the proposed scheduling policy can achieve faster model convergence and higher learning accuracy than conventional scheduling policies that only exploit a single type of diversity.
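One standard way to obtain unbiased aggregation when a single device is scheduled at random is inverse-probability scaling of the scheduled update. The sketch below assumes that form; the abstract does not spell out the exact scaling the paper uses, and the function names, data weights, and probabilities are hypothetical.

```python
def scheduled_estimate(updates, weights, probs, k):
    """Estimate of the global update when only device k is scheduled.
    The update is scaled by weight / scheduling probability (assumed form)."""
    scale = weights[k] / probs[k]
    return [scale * x for x in updates[k]]

def expected_estimate(updates, weights, probs):
    """Exact expectation of the estimate over the random scheduling decision."""
    dim = len(updates[0])
    expected = [0.0] * dim
    for k, p in enumerate(probs):
        estimate = scheduled_estimate(updates, weights, probs, k)
        for i in range(dim):
            expected[i] += p * estimate[i]
    return expected

def weighted_aggregate(updates, weights):
    """Full-participation aggregate that the estimate should match on average."""
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]

updates = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]   # local updates, one per device
weights = [0.5, 0.3, 0.2]                          # data-size weights (sum to 1)
probs = [0.2, 0.5, 0.3]                            # scheduling probabilities (sum to 1)

unbiased = expected_estimate(updates, weights, probs)
target = weighted_aggregate(updates, weights)
```

Because the expectation matches the full aggregate for any valid probabilities, the scheduling distribution is free to favor devices with good channels or important updates without biasing the learning direction.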

preprint2020arXiv

Scheduling for Mobile Edge Computing with Random User Arrivals: An Approximate MDP and Reinforcement Learning Approach

In this paper, we investigate the scheduling design of a mobile edge computing (MEC) system, where active mobile devices with computation tasks randomly appear in a cell. Every task can be computed at either the mobile device or the MEC server. We jointly optimize the task offloading decision, uplink transmission device selection and power allocation by formulating the problem as an infinite-horizon Markov decision process (MDP). To the best of our knowledge, this is the first attempt to address transmission and computation optimization with random device arrivals over an infinite time horizon. Due to the uncertainty in the device number and location, the conventional approximate MDP approaches addressing the curse of dimensionality cannot be applied. An alternative and suitable low-complexity solution framework is proposed in this work. We first introduce a baseline scheduling policy, whose value function can be derived analytically with the statistics of random mobile device arrivals. Then, one-step policy iteration is adopted to obtain a sub-optimal scheduling policy whose performance can be bounded analytically. The complexity of deriving the sub-optimal policy is reduced dramatically compared with conventional solutions of MDP by eliminating the complicated value iteration. To address a more general scenario where the statistics of random mobile device arrivals are unknown, a novel and efficient algorithm integrating reinforcement learning and stochastic gradient descent (SGD) is proposed to improve the system performance in an online manner. Simulation results show that the gain of the sub-optimal policy over various benchmarks is significant.

preprint2020arXiv

Simultaneous Signal-and-Interference Alignment for Two-Cell Over-the-Air Computation

The next-generation wireless networks are envisioned to support large-scale sensing and distributed machine learning, thereby enabling new intelligent mobile applications. One common network operation will be the aggregation of distributed data (such as sensor observations or AI-model updates) for functional computation (e.g., averaging) so as to support large-scale sensing and distributed machine learning. An efficient solution for data aggregation, called "over-the-air computation" (AirComp), embeds functional computation into simultaneous access by many edge devices. Such schemes exploit the waveform superposition of a multi-access channel to allow an access point to receive a desired function of simultaneous signals. In this work, we aim at realizing AirComp in a two-cell multi-antenna system. To this end, a novel scheme of simultaneous signal-and-interference alignment (SIA) is proposed that builds on classic IA to manage interference for multi-cell AirComp. The principle of SIA is to divide the spatial channel space into two subspaces with equal dimensions: one for signal alignment required by AirComp and the other for inter-cell IA. As a result, the number of interference-free spatially multiplexed functional streams received by each AP is maximized (equal to half of the available spatial degrees-of-freedom). Furthermore, the number is independent of the population of devices in each cell. In addition, the extension to SIA for more than two cells is discussed.

preprint2020arXiv

V2X-Based Vehicular Positioning: Opportunities, Challenges, and Future Directions

Vehicle-to-Everything (V2X) will create many new opportunities in the area of wireless communications, while its feasibility on enabling vehicular positioning has not been explored yet. Vehicular positioning is a crucial operation for autonomous driving. Its complexity and stringent safety requirement render conventional technologies like RADAR and LIDAR inadequate. This article aims at investigating whether V2X can help vehicular positioning from different perspectives. We first explain V2X's critical advantages over other approaches and suggest new scenarios of V2X-based vehicular positioning. Then we review the state-of-the-art positioning techniques discussed in the ongoing 3GPP standardization and point out their limitations. Lastly, some promising research directions for V2X-based vehicular positioning are presented, which shed light on realizing fully autonomous driving by overcoming the current barriers.

preprint2019arXiv

High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning

Edge machine learning involves the deployment of learning algorithms at the wireless network edge so as to leverage massive mobile data for enabling intelligent applications. The mainstream edge learning approach, federated learning, has been developed based on distributed gradient descent. Based on the approach, stochastic gradients are computed at edge devices and then transmitted to an edge server for updating a global AI model. Since each stochastic gradient is typically high-dimensional (with millions to billions of coefficients), communication overhead becomes a bottleneck for edge learning. To address this issue, we propose in this work a novel framework of hierarchical stochastic gradient quantization and study its effect on the learning performance. First, the framework features a practical hierarchical architecture for decomposing the stochastic gradient into its norm and normalized block gradients, and efficiently quantizes them using a uniform quantizer and a low-dimensional codebook on a Grassmann manifold, respectively. Subsequently, the quantized normalized block gradients are scaled and cascaded to yield the quantized normalized stochastic gradient using a so-called hinge vector designed under the criterion of minimum distortion. The hinge vector is also efficiently compressed using another low-dimensional Grassmannian quantizer. The other feature of the framework is a bit-allocation scheme for reducing the quantization error. The scheme determines the resolutions of the low-dimensional quantizers in the proposed framework. The framework is proved to guarantee model convergence by analyzing the convergence rate as a function of the quantization bits. Furthermore, by simulation, our design is shown to substantially reduce the communication overhead compared with the state-of-the-art signSGD scheme, while both achieve similar learning accuracies.
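The hierarchical decomposition can be sketched without any quantization, to show how a gradient splits into a norm, unit-norm block gradients, and a vector of scaling factors, and how these are cascaded back together. This is an illustrative assumption: the hinge vector is modeled here as the block norms divided by the full gradient norm, which may differ from the minimum-distortion design in the paper, and the uniform and Grassmannian quantizers are omitted entirely. The function names are hypothetical.

```python
import math

def decompose(gradient, block_size):
    """Split a gradient into (norm, hinge vector, unit-norm blocks).
    Assumes the gradient and every block are nonzero."""
    norm = math.sqrt(sum(x * x for x in gradient))
    blocks = [gradient[i:i + block_size]
              for i in range(0, len(gradient), block_size)]
    hinge, unit_blocks = [], []
    for block in blocks:
        block_norm = math.sqrt(sum(x * x for x in block))
        hinge.append(block_norm / norm)                      # relative block magnitude
        unit_blocks.append([x / block_norm for x in block])  # direction of the block
    return norm, hinge, unit_blocks

def reconstruct(norm, hinge, unit_blocks):
    """Scale each unit block by its hinge entry, cascade, then apply the norm."""
    return [norm * h * x
            for h, block in zip(hinge, unit_blocks)
            for x in block]

gradient = [3.0, 4.0, 0.0, 12.0]
norm, hinge, unit_blocks = decompose(gradient, block_size=2)
restored = reconstruct(norm, hinge, unit_blocks)
```

Under this modeling choice the hinge vector itself has unit norm, so each component handed to a quantizer is either a scalar or a unit vector of low dimension, which is what makes low-dimensional codebooks applicable.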