Researcher profile

Yunlong Cai

Yunlong Cai contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
18 works
0 followers
4 topics
4 close collaborators

Published work

18 published items

preprint, 2026, arXiv

Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding

In edge-cloud speculative decoding (SD), edge devices equipped with small language models (SLMs) generate draft tokens that are verified by large language models (LLMs) in the cloud. A key bottleneck in such systems is the limited communication bandwidth between edge and cloud, which necessitates quantization of the information transmitted about generated tokens. In this work, we introduce a novel quantize-sample (Q-S) strategy that provably preserves the output distribution of the cloud-based model, ensuring that the verified tokens match the distribution of those that would have been generated directly by the LLM. We develop a throughput model for edge-cloud SD that explicitly accounts for communication latency. Leveraging this model, we propose an adaptive mechanism that optimizes token throughput by dynamically adjusting the draft length and quantization precision in response to both semantic uncertainty and channel conditions. Simulations demonstrate that the proposed Q-S approach significantly improves decoding efficiency in realistic edge-cloud deployment scenarios.
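The following is a minimal sketch of why drawing the draft token from the quantized distribution can keep cloud-side verification exact. It assumes the standard speculative-sampling accept/reject rule and a simple largest-remainder rounding onto a grid of step 1/2^bits; both are illustrative assumptions, not the paper's Q-S scheme, and the function names `quantize_probs`, `draft_and_verify`, and `output_distribution` are hypothetical.

```python
import random

def quantize_probs(q, bits):
    """Round a probability vector (assumed to sum to 1) onto a grid of step 1 / 2**bits."""
    n = 2 ** bits
    scaled = [x * n for x in q]
    counts = [int(s) for s in scaled]  # floor, valid because s >= 0
    leftover = n - sum(counts)
    # Hand the remaining grid units to the entries with the largest fractional parts.
    order = sorted(range(len(q)), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return [c / n for c in counts]

def draft_and_verify(p, q_hat, rng):
    """Sample one draft token from q_hat (edge), then accept or resample against p (cloud)."""
    tokens = range(len(p))
    x = rng.choices(tokens, weights=q_hat)[0]
    if rng.random() < min(1.0, p[x] / q_hat[x]):
        return x  # accepted draft token
    # Rejected: resample from the normalized positive part of p - q_hat.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q_hat)]
    return rng.choices(tokens, weights=residual)[0]

def output_distribution(p, q_hat):
    """Exact distribution of the token returned by draft_and_verify."""
    accept = [min(pi, qi) for pi, qi in zip(p, q_hat)]
    reject_mass = 1.0 - sum(accept)
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q_hat)]
    total = sum(residual)
    if total == 0.0:  # p equals q_hat, so nothing is ever rejected
        return accept
    return [a + reject_mass * r / total for a, r in zip(accept, residual)]

p = [0.6, 0.3, 0.1]                          # hypothetical LLM next-token distribution
q_hat = quantize_probs([0.5, 0.3, 0.2], 3)   # hypothetical SLM distribution, 3-bit grid
token = draft_and_verify(p, q_hat, random.Random(0))
out = output_distribution(p, q_hat)
```

In this sketch `out` matches `p` up to floating-point error, because the accept/reject rule is exact for whatever distribution the draft token was actually sampled from. The key design point is that the edge samples from the same quantized distribution that it transmits.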

preprint, 2024, arXiv

Fluid Antennas-Enabled Multiuser Uplink: A Low-Complexity Gradient Descent for Total Transmit Power Minimization

We investigate multiuser uplink communication from multiple single-antenna users to a base station (BS), which is equipped with a movable-antenna (MA) array and adopts zero-forcing receivers to decode multiple signals. We aim to optimize the MAs' positions at the BS, to minimize the total transmit power of all users subject to the minimum rate requirement. After applying transformations, we show that the problem is equivalent to minimizing the sum of each eigenvalue's reciprocal of a matrix, which is a function of all MAs' positions. Subsequently, the projected gradient descent (PGD) method is utilized to find a locally optimal solution. In particular, different from the latest related work, we exploit the eigenvalue decomposition to successfully derive a closed-form gradient for the PGD, which facilitates the practical implementation greatly. We demonstrate by simulations that via careful optimization for all MAs' positions in our proposed design, the total transmit power of all users can be decreased significantly as compared to competitive benchmarks.
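As a small illustration of the objective described above, the sketch below uses the textbook zero-forcing result that the power each user needs to meet a rate target is proportional to the corresponding diagonal entry of the inverse Gram matrix of the channel. Summing these gives a trace, which equals the sum of reciprocal eigenvalues. The channel here is a random placeholder rather than a function of antenna positions, the function names are hypothetical, and the projected gradient descent step itself is not shown.

```python
import numpy as np

def zf_per_user_power(H, rate, noise_power=1.0):
    """Minimum transmit power per user under a zero-forcing receiver and a common rate target."""
    gamma = 2.0 ** rate - 1.0                  # SINR threshold implied by the rate target
    gram_inv = np.linalg.inv(H.conj().T @ H)   # inverse Gram matrix, shape (K, K)
    return gamma * noise_power * np.real(np.diag(gram_inv))

def zf_total_power(H, rate, noise_power=1.0):
    """Same total power, written as a sum of reciprocal eigenvalues of the Gram matrix."""
    gamma = 2.0 ** rate - 1.0
    eigvals = np.linalg.eigvalsh(H.conj().T @ H)  # real and positive for full column rank H
    return gamma * noise_power * float(np.sum(1.0 / eigvals))

rng = np.random.default_rng(0)
# Placeholder channel: 8 receive antennas, 3 single-antenna users.
H = (rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))) / np.sqrt(2)

per_user = zf_per_user_power(H, rate=2.0)
total = zf_total_power(H, rate=2.0)
```

Here `per_user.sum()` and `total` agree up to numerical precision, which is the equivalence that makes an eigenvalue-based gradient a natural route for a descent method.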

preprint, 2023, arXiv

Simultaneously Transmitting and Reflecting (STAR) RIS Assisted Over-the-Air Computation Systems

The performance of over-the-air computation (AirComp) systems degrades due to the hostile channel conditions of wireless devices (WDs), which can be significantly improved by the employment of reconfigurable intelligent surfaces (RISs). However, the conventional RISs require that the WDs have to be located in the half-plane of the reflection space, which restricts their potential benefits. To address this issue, the novel family of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) is considered in AirComp systems to improve the computation accuracy across a wide coverage area. To minimize the computation mean-squared-error (MSE) in STAR-RIS assisted AirComp systems, we propose a joint beamforming design for optimizing both the transmit power at the WDs, as well as the passive reflect and transmit beamforming matrices at the STAR-RIS, and the receive beamforming vector at the fusion center (FC). Specifically, in the updates of the passive reflect and transmit beamforming matrices, closed-form solutions are derived by introducing an auxiliary variable and exploiting the coupled binary phase-shift conditions. Moreover, by assuming that the number of antennas at the FC and that of elements at the STAR-RIS/RIS are sufficiently high, we theoretically prove that the STAR-RIS assisted AirComp systems provide higher computation accuracy than the conventional RIS assisted systems. Our numerical results show that the proposed beamforming design outperforms the benchmark schemes relying on random phase-shift constraints and the deployment of conventional RIS. Moreover, its performance is close to the lower bound achieved by the beamforming design based on the STAR-RIS dispensing with coupled phase-shift constraints.

preprint, 2022, arXiv

A Unified Multi-Task Semantic Communication System with Domain Adaptation

Task-oriented semantic communication systems have achieved significant performance gains. However, the paradigm that employs a model for a specific task might be limited, since the system has to be updated once the task is changed or multiple models have to be stored for serving various tasks. To address this issue, we first propose a unified deep learning enabled semantic communication system (U-DeepSC), where a unified model is developed to serve various transmission tasks. To jointly serve these tasks in one model with fixed parameters, we employ domain adaptation in the training procedure to specify the task-specific features for each task. Thus, the system only needs to transmit the task-specific features, rather than all the features, to reduce the transmission overhead. Moreover, since each task is of different difficulty and requires a different number of layers to achieve satisfactory performance, we develop the multi-exit architecture to provide early-exit results for relatively simple tasks. In the experiments, we employ the proposed U-DeepSC to serve five tasks with multi-modalities. Simulation results demonstrate that our proposed U-DeepSC achieves comparable performance to the task-oriented semantic communication system designed for a specific task with significant transmission overhead reduction and far fewer model parameters.

preprint, 2022, arXiv

Joint Pilot Optimization, Target Detection and Channel Estimation for Integrated Sensing and Communication Systems

Radar sensing will be integrated into the 6G communication system to support various applications. In this integrated sensing and communication system, a radar target may also be a communication channel scatterer. In this case, the radar and communication channels exhibit certain joint burst sparsity. We propose a two-stage joint pilot optimization, target detection and channel estimation scheme to exploit such joint burst sparsity and pilot beamforming gain to enhance detection/estimation performance. In Stage 1, the base station (BS) sends downlink pilots (DP) for initial target search, and the user sends uplink pilots (UP) for channel estimation. Then the BS performs joint target detection and channel estimation based on the reflected DP and received UP signals. In Stage 2, the BS exploits the prior information obtained in Stage 1 to optimize the DP signal to achieve beamforming gain and further refine the performance. A Turbo Sparse Bayesian inference algorithm is proposed for joint target detection and channel estimation in both stages. The pilot optimization problem in Stage 2 is a semi-definite programming with rank-1 constraints. By replacing the rank-1 constraint with a tight and smooth approximation, we propose an efficient pilot optimization algorithm based on the majorization-minimization method. Simulations verify the advantages of the proposed scheme.

preprint, 2022, arXiv

Latency Minimization for mmWave D2D Mobile Edge Computing Systems: Joint Task Allocation and Hybrid Beamforming Design

Mobile edge computing (MEC) and millimeter wave (mmWave) communications are capable of significantly reducing the network's delay and enhancing its capacity. In this paper we investigate a mmWave and device-to-device (D2D) assisted MEC system, in which user A carries out some computational tasks and shares the results with user B with the aid of a base station (BS). We propose a novel two-timescale joint hybrid beamforming and task allocation algorithm to reduce the system latency whilst cutting down the required signaling overhead. Specifically, the high-dimensional analog beamforming matrices are updated in a frame-based manner based on the channel state information (CSI) samples, where each frame consists of a number of time slots, while the low-dimensional digital beamforming matrices and the offloading ratio are optimized more frequently, relying on the low-dimensional effective channel matrices in each time slot. A stochastic successive convex approximation (SSCA) based algorithm is developed to design the long-term analog beamforming matrices. As for the short-term variables, the digital beamforming matrices are optimized relying on the innovative penalty-concave convex procedure (penalty-CCCP) for handling the mmWave non-linear transmit power constraint, and the offloading ratio can be obtained via the derived closed-form solution. Simulation results verify the effectiveness of the proposed algorithm by comparison with the benchmarks.

preprint, 2022, arXiv

Mixed-Timescale Deep-Unfolding for Joint Channel Estimation and Hybrid Beamforming

In massive multiple-input multiple-output (MIMO) systems, hybrid analog-digital beamforming is an essential technique for exploiting the potential array gain without using a dedicated radio frequency chain for each antenna. However, due to the large number of antennas, the conventional channel estimation and hybrid beamforming algorithms generally require high computational complexity and signaling overhead. In this work, we propose an end-to-end deep-unfolding neural network (NN) joint channel estimation and hybrid beamforming (JCEHB) algorithm to maximize the system sum rate in time-division duplex (TDD) massive MIMO. Specifically, the recursive least-squares (RLS) algorithm and stochastic successive convex approximation (SSCA) algorithm are unfolded for channel estimation and hybrid beamforming, respectively. In order to reduce the signaling overhead, we consider a mixed-timescale hybrid beamforming scheme, where the analog beamforming matrices are optimized based on the channel state information (CSI) statistics offline, while the digital beamforming matrices are designed at each time slot based on the estimated low-dimensional equivalent CSI matrices. We jointly train the analog beamformers together with the trainable parameters of the RLS and SSCA induced deep-unfolding NNs based on the CSI statistics offline. During data transmission, we estimate the low-dimensional equivalent CSI by the RLS induced deep-unfolding NN and update the digital beamformers. In addition, we propose a mixed-timescale deep-unfolding NN where the analog beamformers are optimized online, and extend the framework to frequency-division duplex (FDD) systems where channel feedback is considered. Simulation results show that the proposed algorithm can significantly outperform conventional algorithms with reduced computational complexity and signaling overhead.

preprint, 2022, arXiv

Multiband Delay Estimation for Localization Using a Two-Stage Global Estimation Scheme

Time of arrival (TOA)-based localization techniques, which need to estimate the delay of the line-of-sight (LoS) path, have been widely employed in location-aware networks. To achieve high-accuracy delay estimation, a number of multiband-based algorithms have been proposed recently, which exploit the channel state information (CSI) measurements over multiple non-contiguous frequency bands. However, to the best of our knowledge, an efficient scheme that fully exploits the multiband gains when the phase distortion factors caused by hardware imperfections are considered is still lacking, because the associated multi-parameter estimation problem contains many local optima and the existing algorithms can easily get stuck in a "bad" local optimum. To address these issues, we propose a novel two-stage global estimation (TSGE) scheme for multiband delay estimation. In the coarse stage, we exploit the group sparsity structure of the multiband channel and propose a Turbo Bayesian inference (Turbo-BI) algorithm to achieve a good initial delay estimation based on a coarse signal model, which is transformed from the original multiband signal model by absorbing the carrier frequency terms. The estimation problem derived from the coarse signal model contains fewer local optima and thus a more stable estimate can be achieved than by directly using the original signal model. Then in the refined stage, with the help of coarse estimation results to narrow down the search range, we perform a global delay estimation using a particle swarm optimization-least square (PSO-LS) algorithm based on a refined multiband signal model to exploit the multiband gains to further improve the estimation accuracy. Simulation results show that the proposed TSGE significantly outperforms the benchmarks with comparable computational complexity.

preprint, 2022, arXiv

RIS-Assisted Communication Radar Coexistence: Joint Beamforming Design and Analysis

Integrated sensing and communication (ISAC) has been regarded as one of the most promising technologies for future wireless communications. However, the mutual interference in the communication radar coexistence system cannot be ignored. Inspired by the studies of reconfigurable intelligent surface (RIS), we propose a double-RIS-assisted coexistence system where two RISs are deployed for enhancing communication signals and suppressing mutual interference. We aim to jointly optimize the beamforming of RISs and radar to maximize communication performance while maintaining radar detection performance. The investigated problem is challenging, and thus we transform it into an equivalent but more tractable form by introducing auxiliary variables. Then, we propose a penalty dual decomposition (PDD)-based algorithm to solve the resultant problem. Moreover, we consider two special cases: the large radar transmit power scenario and the low radar transmit power scenario. For the former, we prove that the beamforming design is only determined by the communication channel and the corresponding optimal joint beamforming strategy can be obtained in closed-form. For the latter, we minimize the mutual interference via the block coordinate descent (BCD) method. By combining the solutions of these two cases, a low-complexity algorithm is also developed. Finally, simulation results show that both the PDD-based and low-complexity algorithms outperform benchmark algorithms.

preprint, 2022, arXiv

Robust Semantic Communications Against Semantic Noise

Although semantic communications have exhibited satisfactory performance in a large number of tasks, the impact of semantic noise and the robustness of the systems have not been well investigated. Semantic noise is a particular kind of noise in semantic communication systems, which refers to the mismatch between the intended semantic symbols and the received ones. In this paper, we first propose a framework for robust end-to-end semantic communication systems to combat semantic noise. In particular, we analyze the causes of semantic noise and propose a practical method to generate it. To remove the effect of semantic noise, adversarial training is proposed to incorporate the samples with semantic noise in the training dataset. Then, the masked autoencoder (MAE) is designed as the architecture of a robust semantic communication system, where a portion of the input is masked. To further improve the robustness of semantic communication systems, we first employ the vector quantization-variational autoencoder (VQ-VAE) to design a discrete codebook shared by the transmitter and the receiver for encoded feature representation. Thus, the transmitter simply needs to transmit the indices of these features in the codebook. Simulation results show that our proposed method significantly improves the robustness of semantic communication systems against semantic noise with a significant reduction in the transmission overhead.
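The index-only transmission mentioned above can be pictured with a generic nearest-codeword lookup, as in the sketch below. This is not the paper's trained VQ-VAE: the codebook is random, the features are synthetic, and the names `encode_indices` and `decode_indices` are hypothetical.

```python
import numpy as np

def encode_indices(features, codebook):
    """Transmitter side: map each feature vector to the index of its nearest codeword."""
    # Squared Euclidean distance between every feature (n, d) and every codeword (k, d).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def decode_indices(indices, codebook):
    """Receiver side: look the codewords back up from the shared codebook."""
    return codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))   # placeholder shared codebook: 16 codewords, dim 4
# Synthetic features: slightly perturbed copies of a few codewords.
features = codebook[[3, 7, 7, 0]] + 0.01 * rng.standard_normal((4, 4))

indices = encode_indices(features, codebook)    # only these integers are transmitted
recovered = decode_indices(indices, codebook)   # quantized features at the receiver
```

With 16 codewords, each feature vector costs 4 bits on the link instead of four floating-point values, which is where the overhead reduction in such a scheme comes from.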

preprint, 2021, arXiv

Channel Estimation for Hybrid Massive MIMO Systems with Adaptive-Resolution ADCs

Achieving high channel estimation accuracy and reducing hardware cost as well as power dissipation constitute substantial challenges in the design of massive multiple-input multiple-output (MIMO) systems. To resolve these difficulties, sophisticated pilot designs have been conceived for the family of energy-efficient hybrid analog-digital (HAD) beamforming architecture relying on adaptive-resolution analog-to-digital converters (RADCs). In this paper, we jointly optimize the pilot sequences, the number of RADC quantization bits and the hybrid receiver combiner in the uplink of multiuser massive MIMO systems. We solve the associated mean square error (MSE) minimization problem of channel estimation in the context of correlated Rayleigh fading channels subject to practical constraints. The associated mixed-integer problem is quite challenging due to the nonconvex nature of the objective function and of the constraints. By relying on advanced fractional programming (FP) techniques, we first recast the original problem into a more tractable yet equivalent form, which allows the decoupling of the fractional objective function. We then conceive a pair of novel algorithms for solving the resultant problems for codebook-based and codebook-free pilot schemes, respectively. To reduce the design complexity, we also propose a simplified algorithm for the codebook-based pilot scheme. Our simulation results confirm the superiority of the proposed algorithms over the relevant state-of-the-art benchmark schemes.

preprint, 2021, arXiv

Channel Estimation for IRS-aided Multiuser Communications with Reduced Error Propagation

Intelligent reflecting surface (IRS) has emerged as a promising paradigm to improve the capacity and reliability of a wireless communication system by smartly reconfiguring the wireless propagation environment. To achieve the promising gains of IRS, the acquisition of the channel state information (CSI) is essential, which however is practically difficult since the IRS does not employ any transmit/receive radio frequency (RF) chains in general and it has limited signal processing capability. In this paper, we study the uplink channel estimation problem for an IRS-aided multiuser single-input multi-output (SIMO) system, and propose a novel two-phase channel estimation (2PCE) strategy which can alleviate the negative effects caused by error propagation in the existing three-phase channel estimation approach, i.e., the channel estimation errors in previous phases will deteriorate the estimation performance in later phases, and enhance the channel estimation performance with the same amount of channel training overhead as in the existing approach. Moreover, the asymptotic mean squared error (MSE) of the 2PCE strategy is analyzed when the least-square (LS) channel estimation method is employed, and we show that the 2PCE strategy can outperform the existing approach. Finally, extensive simulation results are presented to validate the effectiveness of the 2PCE strategy.

preprint, 2021, arXiv

Joint Deep Reinforcement Learning and Unfolding: Beam Selection and Precoding for mmWave Multiuser MIMO with Lens Arrays

The millimeter wave (mmWave) multiuser multiple-input multiple-output (MU-MIMO) systems with discrete lens arrays (DLA) have received great attention due to their simple hardware implementation and excellent performance. In this work, we investigate the joint design of beam selection and digital precoding matrices for mmWave MU-MIMO systems with DLA to maximize the sum-rate subject to the transmit power constraint and the constraints of the selection matrix structure. The investigated non-convex problem with discrete variables and coupled constraints is challenging to solve and an efficient framework of joint neural network (NN) design is proposed to tackle it. Specifically, the proposed framework consists of a deep reinforcement learning (DRL)-based NN and a deep-unfolding NN, which are employed to optimize the beam selection and digital precoding matrices, respectively. As for the DRL-based NN, we formulate the beam selection problem as a Markov decision process and a double deep Q-network algorithm is developed to solve it. The base station is considered to be an agent, where the state, action, and reward function are carefully designed. Regarding the design of the digital precoding matrix, we develop an iterative weighted minimum mean-square error algorithm induced deep-unfolding NN, which unfolds this algorithm into a layerwise structure with introduced trainable parameters. Simulation results verify that this jointly trained NN remarkably outperforms the existing iterative algorithms with reduced complexity and stronger robustness.

preprint, 2021, arXiv

Latency Minimization in Intelligent Reflecting Surface Assisted D2D Offloading Systems

In this letter, we investigate an intelligent reflecting surface (IRS) aided device-to-device (D2D) offloading system, where an IRS is employed to assist in computation offloading from a group of users with intensive tasks to another group of idle users. We propose a new two-timescale joint passive beamforming and resource allocation algorithm based on stochastic successive convex approximation to minimize the system latency while cutting down the heavy overhead in the exchange of channel state information (CSI). Specifically, the high-dimensional passive beamforming vector at the IRS is updated in a frame-based manner based on the channel statistics, where each frame consists of a number of time slots, while the offloading ratio and user matching strategy are optimized relying on the low-dimensional real-time effective channel coefficients in each time slot. The convergence property and the computational complexity of the proposed algorithm are also examined. Simulation results show that our proposed algorithm significantly outperforms the conventional benchmarks.

preprint, 2021, arXiv

Non-Orthogonal Multiple Access for UAV-Aided Heterogeneous Networks: A Stochastic Geometry Model

In this work, we explore the potential benefits of deploying unmanned aerial vehicles (UAVs) as aerial base stations (ABSs) operating in the sub-6GHz band and small-cell terrestrial base stations (TBSs) operating in the millimeter wave (mmWave) band in hybrid heterogeneous networks (HetNets). A flexible non-orthogonal multiple access (NOMA) based user association policy is proposed. By using the tools from stochastic geometry, new analytical expressions for association probability, coverage probability and spectrum efficiency are derived for characterizing the performance of UAV-aided HetNets under the realistic Air-to-Ground (A2G) channels and the Ground-to-Ground (G2G) channels with a LoS ball blockage model. Finally, we provide insights on the proposed hybrid HetNets by numerical results. We confirm that i) the proposed NOMA enabled HetNets are capable of achieving superior performance compared with the OMA enabled ABSs by setting power allocation factors and targeted signal-to-interference-plus-noise ratio (SINR) threshold properly; ii) there is a tradeoff between the association probabilities and the spectrum efficiency in the NOMA enabled ABSs tier; iii) the coverage probability and spectrum efficiency of the NOMA enabled ABSs tier are largely affected by the imperfect successive interference cancellation (ipSIC) coefficient, power allocation factors and SINR threshold; iv) compared with only sub-6GHz ABSs, mmWave enabled TBSs are capable of enhancing the spectrum efficiency of the HetNets when the mmWave line-of-sight (LoS) link is available.

preprint, 2020, arXiv

Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems

Optimization theory assisted algorithms have received great attention for precoding design in multiuser multiple-input multiple-output (MU-MIMO) systems. Although the resultant optimization algorithms are able to provide excellent performance, they generally require considerable computational complexity, which gets in the way of their practical application in real-time systems. In this work, in order to address this issue, we first propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed in matrix form to better solve the problems in communication systems. Then, we implement the proposed deep-unfolding framework to solve the sum-rate maximization problem for precoding design in MU-MIMO systems. An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed. Specifically, the iterative WMMSE algorithm is unfolded into a layer-wise structure, where a number of trainable parameters are introduced to replace the high-complexity operations in the forward propagation. To train the network, a generalized chain rule of the IAIDNN is proposed to depict the recurrence relation of gradients between two adjacent layers in the back propagation. Moreover, we discuss the computational complexity and generalization ability of the proposed scheme. Simulation results show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.

preprint, 2020, arXiv

Low-Complexity Joint Power Allocation and Trajectory Design for UAV-Enabled Secure Communications with Power Splitting

An unmanned aerial vehicle (UAV)-aided secure communication system is conceived and investigated, where the UAV transmits legitimate information to a ground user in the presence of an eavesdropper (Eve). To guarantee the security, the UAV employs a power splitting approach, where its transmit power can be divided into two parts for transmitting confidential messages and artificial noise (AN), respectively. We aim to maximize the average secrecy rate by jointly optimizing the UAV's trajectory, the transmit power levels and the corresponding power splitting ratios allocated to different time slots during the whole flight time, subject to the maximum UAV speed constraint, the total mobility energy constraint, the total transmit power constraint, and other related constraints. To efficiently tackle this non-convex optimization problem, we propose an iterative algorithm by blending the benefits of the block coordinate descent (BCD) method, the concave-convex procedure (CCCP) and the alternating direction method of multipliers (ADMM). Specifically, we show that the proposed algorithm exhibits very low computational complexity and each of its updating steps can be formulated in a nearly closed form. Our simulation results validate the efficiency of the proposed algorithm.

preprint, 2020, arXiv

MIMO-Aided Nonlinear Hybrid Transceiver Design for Multiuser mmWave Systems Relying on Tomlinson-Harashima Precoding

Hybrid analog-digital (A/D) transceivers designed for millimeter wave (mmWave) systems have received substantial research attention, as a benefit of their lower cost and modest energy consumption compared to their fully-digital counterparts. We further improve their performance by conceiving a Tomlinson-Harashima precoding (THP) based nonlinear joint design for the downlink of multiuser multiple-input multiple-output (MIMO) mmWave systems. Our optimization criterion is that of minimizing the mean square error (MSE) of the system under channel uncertainties subject both to realistic transmit power constraint and to the unit modulus constraint imposed on the elements of the analog beamforming (BF) matrices governing the BF operation in the radio frequency domain. We transform this optimization problem into a more tractable form and develop an efficient block coordinate descent (BCD) based algorithm for solving it. Then, a novel two-timescale nonlinear joint hybrid transceiver design algorithm is developed, which can be viewed as an extension of the BCD-based joint design algorithm for reducing both the channel state information (CSI) signalling overhead and the effects of outdated CSI. Moreover, we determine the near-optimal cancellation order for the THP structure based on the lower bound of the MSE. The proposed algorithms can be guaranteed to converge to a Karush-Kuhn-Tucker (KKT) solution of the original problem. The simulation results demonstrate that our proposed nonlinear joint hybrid transceiver design algorithms significantly outperform the existing linear hybrid transceiver algorithms and approach the performance of the fully-digital transceiver, despite its lower cost and power dissipation.