Researcher profile

Yik-Chung Wu

Yik-Chung Wu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
15 works
0 followers
8 topics
4 close collaborators

Published work

15 published items

preprint · 2026 · arXiv

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

The rapid advancement toward long-context reasoning and multi-modal intelligence has made the memory footprint of the Key-Value (KV) cache a dominant bottleneck for efficient deployment. While the established per-channel quantization effectively accommodates intrinsic channel-wise outliers in Key tensors, its efficacy diminishes under extreme compression. In this work, we revisit the inherent limitations of the per-channel quantization paradigm from both empirical and theoretical perspectives. Our analysis identifies Token Norm Imbalance (TNI) as the primary bottleneck to quantization fidelity. We demonstrate that TNI systematically amplifies errors when shared quantization parameters are required to span token groups exhibiting substantial norm disparities. Instead of relying on intricate quantization pipelines (e.g., TurboQuant), we propose OScaR (Omni-Scaled Canalized Rotation), an accurate and lightweight KV cache compression framework for X-LLMs (i.e., text-only, multi-modal, and omni-modal LLMs). Advancing the per-channel paradigm, OScaR employs Canalized Rotation followed by Omni-Token Scaling to mitigate TNI-induced sequence-dimensional variance both effectively and efficiently, further supported by our optimized system design and CUDA kernels. Extensive evaluations across X-LLMs show that OScaR consistently outperforms existing methods and achieves near-lossless performance under INT2 quantization, establishing it as a robust, low-complexity, and universal framework that defines a new Pareto front. Compared with the BF16 FlashDecoding-v2 baseline, our OScaR implementation achieves up to a 3.0x speedup in decoding, reduces memory footprint by 5.3x, and increases throughput by 4.1x. The code for OScaR is publicly available at https://github.com/ZunhaiSu/OScaR-KV-Quant.
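
To make the Token Norm Imbalance argument concrete, the following is a minimal pure-Python sketch and not the OScaR implementation. It applies 2-bit per-channel quantization to a toy Key matrix containing one high-norm token, then repeats the quantization after dividing each token by its L2 norm, which is only a simplified stand-in for per-token scaling. The rotation step is omitted, and all function names and toy values are assumptions made for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def quantize_per_channel(rows, bits=2):
    """Uniform quantize-dequantize with one (min, scale) pair per channel (column)."""
    levels = 2 ** bits - 1
    n_tokens, n_channels = len(rows), len(rows[0])
    out = [[0.0] * n_channels for _ in range(n_tokens)]
    for c in range(n_channels):
        col = [row[c] for row in rows]
        lo, hi = min(col), max(col)
        scale = (hi - lo) / levels or 1.0  # constant channel: avoid division by zero
        for t in range(n_tokens):
            q = round((rows[t][c] - lo) / scale)  # integer code in [0, levels]
            out[t][c] = q * scale + lo
    return out


def quantize_with_token_scaling(rows, bits=2):
    """Assumed simplification: normalize each token, quantize, then restore the norms."""
    norms = [l2_norm(row) or 1.0 for row in rows]
    scaled = [[x / n for x in row] for row, n in zip(rows, norms)]
    dequantized = quantize_per_channel(scaled, bits)
    return [[x * n for x in row] for row, n in zip(dequantized, norms)]


def mean_relative_error(original, approx):
    """Average over tokens of ||k - k_hat|| / ||k||."""
    total = 0.0
    for o, a in zip(original, approx):
        diff = [x - y for x, y in zip(o, a)]
        total += l2_norm(diff) / l2_norm(o)
    return total / len(original)


# Toy Key matrix (tokens x channels); the last token has a much larger norm.
K = [[1.0, -2.0], [2.0, 1.0], [-1.0, 2.0], [30.0, -60.0]]

err_plain = mean_relative_error(K, quantize_per_channel(K))
err_scaled = mean_relative_error(K, quantize_with_token_scaling(K))
print("per-channel only:", err_plain)
print("with token scaling:", err_scaled)
```

On this toy input the single large token stretches the shared per-channel range, so the small tokens collapse onto the same quantization levels. Normalizing first keeps all tokens within a comparable range, at the cost of storing one norm per token.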

preprint · 2022 · arXiv

Bayesian Low-rank Matrix Completion with Dual-graph Embedding: Prior Analysis and Tuning-free Inference

Recently, there has been a revival of interest in low-rank matrix completion-based unsupervised learning through the lens of dual-graph regularization, which has significantly improved the performance of multidisciplinary machine learning tasks such as recommendation systems, genotype imputation and image inpainting. While the dual-graph regularization contributes a major part of the success, computationally costly hyper-parameter tuning is usually involved. To circumvent such a drawback and improve the completion performance, we propose a novel Bayesian learning algorithm that automatically learns the hyper-parameters associated with dual-graph regularization and, at the same time, guarantees the low-rankness of matrix completion. Notably, a novel prior is devised to promote the low-rankness of the matrix and encode the dual-graph information simultaneously, which is more challenging than the single-graph counterpart. A nontrivial conditional conjugacy between the proposed priors and likelihood function is then explored such that an efficient algorithm is derived under the variational inference framework. Extensive experiments using synthetic and real-world datasets demonstrate the state-of-the-art performance of the proposed learning algorithm for various data analysis tasks.

preprint · 2022 · arXiv

Edge Federated Learning Via Unit-Modulus Over-The-Air Computation

Edge federated learning (FL) is an emerging paradigm that trains a global parametric model from distributed datasets based on wireless communications. This paper proposes a unit-modulus over-the-air computation (UMAirComp) framework to facilitate efficient edge federated learning, which simultaneously uploads local model parameters and updates global model parameters via analog beamforming. The proposed framework avoids sophisticated baseband signal processing, leading to low communication delays and implementation costs. Training loss bounds of UMAirComp FL systems are derived and two low-complexity large-scale optimization algorithms, termed penalty alternating minimization (PAM) and accelerated gradient projection (AGP), are proposed to minimize the nonconvex nonsmooth loss bound. Simulation results show that the proposed UMAirComp framework with the PAM algorithm achieves a smaller mean square error of model parameters' estimation, training loss, and test error compared with other benchmark schemes. Moreover, the proposed UMAirComp framework with the AGP algorithm achieves satisfactory performance while reducing the computational complexity by orders of magnitude compared with existing optimization algorithms. Finally, we demonstrate the implementation of UMAirComp in a vehicle-to-everything autonomous driving simulation platform. It is found that autonomous driving tasks are more sensitive to model parameter errors than other tasks since the neural networks for autonomous driving contain sparser model parameters.

preprint · 2022 · arXiv

ENGNN: A General Edge-Update Empowered GNN Architecture for Radio Resource Management in Wireless Networks

In order to achieve high data rate and ubiquitous connectivity in future wireless networks, a key task is to efficiently manage the radio resource by judicious beamforming and power allocation. Unfortunately, the iterative nature of the commonly applied optimization-based algorithms cannot meet the low latency requirements due to the high computational complexity. For real-time implementations, deep learning-based approaches, especially graph neural networks (GNNs), have been shown to offer good scalability and generalization performance due to the permutation equivariance (PE) property. However, the current architectures are only equipped with the node-update mechanism, which prevents their application to a more general setup, where the unknown variables are also defined on the graph edges. To fill this gap, we propose an edge-update mechanism, which enables GNNs to handle both node and edge variables, and we prove its PE property with respect to both transmitters and receivers. Simulation results on typical radio resource management problems demonstrate that the proposed method achieves a higher sum rate with much shorter computation time than state-of-the-art methods and generalizes well across different numbers of base stations and users, different noise variances, interference levels, and transmit power budgets.
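
The permutation equivariance property mentioned above can be checked numerically on a toy edge update. The sketch below is not the ENGNN architecture: it assumes scalar edge features on a transmitter-by-receiver grid and a hypothetical update that mixes each edge with the mean of its row and the mean of its column. It then verifies that permuting transmitters and receivers before the update gives the same result as permuting after it.

```python
import math


def edge_update(E, w_self=0.5, w_row=0.3, w_col=0.2):
    """Hypothetical edge update: mix each edge with its row mean and column mean."""
    m, n = len(E), len(E[0])
    row_mean = [sum(E[i]) / n for i in range(m)]
    col_mean = [sum(E[i][j] for i in range(m)) / m for j in range(n)]
    return [
        [math.tanh(w_self * E[i][j] + w_row * row_mean[i] + w_col * col_mean[j])
         for j in range(n)]
        for i in range(m)
    ]


def permute(E, row_perm, col_perm):
    """Reorder transmitters (rows) and receivers (columns)."""
    return [[E[i][j] for j in col_perm] for i in row_perm]


def is_equivariant(E, row_perm, col_perm, tol=1e-12):
    """Check update(permute(E)) == permute(update(E)) up to floating-point tolerance."""
    a = edge_update(permute(E, row_perm, col_perm))
    b = permute(edge_update(E), row_perm, col_perm)
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Two transmitters, three receivers; values are arbitrary.
E = [[0.1, 0.7, -0.3], [1.2, -0.4, 0.5]]
equivariant = is_equivariant(E, row_perm=[1, 0], col_perm=[2, 0, 1])
print(equivariant)
```

The check passes because the update uses only aggregations (means) that ignore ordering and weights that do not depend on the index of the transmitter or receiver.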

preprint · 2022 · arXiv

Intelligent Reflecting Surface Aided Mobile Edge Computing With Binary Offloading: Energy Minimization for IoT Devices

Mobile edge computing (MEC) is envisioned as a promising technique to support computation-intensive and time-critical applications in the future Internet of Things (IoT) era. However, the uplink transmission performance will be highly impacted by the hostile wireless channel, the low bandwidth, and the low transmission power of IoT devices. Recently, intelligent reflecting surface (IRS) has drawn much attention because of its capability to control the wireless environments so as to enhance the spectrum and energy efficiencies of wireless communications. In this paper, we consider an IRS-aided multi-device MEC system where each IoT device follows the binary offloading policy, i.e., a task has to be computed as a whole either locally or remotely at the edge server. We aim to minimize the total energy consumption of devices by jointly optimizing the binary offloading modes, the CPU frequencies, the offloading powers, the offloading times and the IRS phase shifts for all devices. Two algorithms, which are greedy-based and penalty-based, are proposed to solve the challenging nonconvex and discontinuous problem. It is found that the penalty-based method has only linear complexity with respect to the number of devices, but it performs close to the greedy-based method, which has cubic complexity with respect to the number of devices. Furthermore, binary offloading via IRS indeed saves more energy compared to the case without IRS.

preprint · 2022 · arXiv

Phase Shift Design in RIS Empowered Wireless Networks: From Optimization to AI-Based Methods

Reconfigurable intelligent surfaces (RISs) have a revolutionary capability to customize the radio propagation environment for wireless networks. To fully exploit the advantages of RISs in wireless systems, the phases of the reflecting elements must be jointly designed with conventional communication resources, such as beamformers, transmit power, and computation time. However, due to the unique constraints on the phase shift, and massive numbers of reflecting units and users in large-scale networks, the resulting optimization problems are challenging to solve. This paper provides a review of current optimization methods and artificial intelligence-based methods for handling the constraints imposed by RIS and compares them in terms of solution quality and computational complexity. Future challenges in phase shift optimization involving RISs are also described and potential solutions are discussed.

preprint · 2022 · arXiv

Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning Using The Generalized Hyperbolic Prior

Tensor rank learning for canonical polyadic decomposition (CPD) has long been deemed an essential yet challenging problem. In particular, since the tensor rank controls the complexity of the CPD model, its inaccurate learning would cause overfitting to noise or underfitting to the signal sources, and even destroy the interpretability of model parameters. However, the optimal determination of a tensor rank is known to be a non-deterministic polynomial-time hard (NP-hard) task. Rather than exhaustively searching for the best tensor rank via trial-and-error experiments, Bayesian inference under the Gaussian-gamma prior was introduced in the context of probabilistic CPD modeling, and it was shown to be an effective strategy for automatic tensor rank determination. This triggered flourishing research on other structured tensor CPDs with automatic tensor rank learning. On the other side of the coin, these research works also reveal that the Gaussian-gamma model does not perform well for high-rank tensors and/or low signal-to-noise ratios (SNRs). To overcome these drawbacks, in this paper, we introduce a more advanced generalized hyperbolic (GH) prior to the probabilistic CPD model, which not only includes the Gaussian-gamma model as a special case, but also is more flexible to adapt to different levels of sparsity. Based on this novel probabilistic model, an algorithm is developed under the framework of variational inference, where each update is obtained in closed form. Extensive numerical results, using synthetic data and real-world datasets, demonstrate the significantly improved performance of the proposed method in learning both low and high tensor ranks even for low SNR cases.

preprint · 2021 · arXiv

Space Shift Keying with Reconfigurable Intelligent Surfaces: Phase Configuration Designs and Performance Analysis

Reconfigurable intelligent surface (RIS)-assisted transmission and space shift keying (SSK) appear as promising candidates for future energy-efficient wireless systems. In this paper, two RIS-based SSK schemes are proposed to efficiently improve the error and throughput performance of conventional SSK systems, respectively. The first one, termed RIS-SSK with passive beamforming (RIS-SSK-PB), employs an RIS for beamforming and targets the maximization of the minimum squared Euclidean distance between any two decision points. The second one, termed RIS-SSK with Alamouti space-time block coding (RIS-SSK-ASTBC), employs an RIS for ASTBC and enables the RIS to transmit its own Alamouti-coded information while reflecting the incident SSK signals to the destination. A low-complexity beamformer and an efficient maximum-likelihood (ML) detector are designed for RIS-SSK-PB and RIS-SSK-ASTBC, respectively. Approximate expressions for the average bit error probabilities of the source and/or the RIS are derived in closed-form assuming ML detection. Extensive computer simulations are conducted to verify the performance analysis. Results show that RIS-SSK-PB significantly outperforms the existing RIS-free and RIS-based SSK schemes, and RIS-SSK-ASTBC enables highly reliable transmission with throughput improvement.
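
The design criterion named above, the minimum squared Euclidean distance between decision points, can be illustrated with a small sketch. The signal model here is an assumption and not taken from the paper: each transmit antenna i produces a noiseless received point equal to the sum over RIS elements n of g[n] * exp(j * theta[n]) * H[n][i], where H, g, and the helper names are hypothetical. The example only shows how the choice of phases changes the minimum distance.

```python
import cmath
import math
from itertools import combinations


def decision_points(H, g, thetas):
    """Noiseless received point for each active antenna under the assumed model."""
    n_elements = len(g)
    n_antennas = len(H[0])
    return [
        sum(g[n] * cmath.exp(1j * thetas[n]) * H[n][i] for n in range(n_elements))
        for i in range(n_antennas)
    ]


def min_squared_distance(points):
    """Smallest squared Euclidean distance over all pairs of decision points."""
    return min(abs(a - b) ** 2 for a, b in combinations(points, 2))


# Toy channels: 2 RIS elements (rows) and 2 transmit antennas (columns).
H = [[1, 1j], [1, -1j]]
g = [1, 1]

d_good = min_squared_distance(decision_points(H, g, [0.0, 0.0]))
d_bad = min_squared_distance(decision_points(H, g, [0.0, math.pi / 2]))
print("phases (0, 0):", d_good)
print("phases (0, pi/2):", d_bad)
```

With the second phase set the two decision points coincide, so the antennas become indistinguishable at the receiver. A passive beamformer aims for phase sets that keep this minimum distance large.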

preprint · 2020 · arXiv

Angle Aware User Cooperation for Secure Massive MIMO in Rician Fading Channel

Massive multiple-input multiple-output communications can achieve high-level security by concentrating radio frequency signals towards the legitimate users. However, this system is vulnerable in a Rician fading environment if the eavesdropper positions itself such that its channel is highly "similar" to the channel of a legitimate user. To address this problem, this paper proposes an angle aware user cooperation (AAUC) scheme, which avoids direct transmission to the attacked user and relies on other users for cooperative relaying. The proposed scheme only requires the eavesdropper's angle information, and adopts an angular secrecy model to represent the average secrecy rate of the attacked system. With this angular model, the AAUC problem turns out to be nonconvex, and a successive convex optimization algorithm, which converges to a Karush-Kuhn-Tucker solution, is proposed. Furthermore, a closed-form solution and a Bregman first-order method are derived for the cases of large-scale antennas and large-scale users, respectively. Extension to the intelligent reflecting surfaces based scheme is also discussed. Simulation results demonstrate the effectiveness of the proposed successive convex optimization based AAUC scheme, and also validate the low-complexity nature of the proposed large-scale optimization algorithms.

preprint · 2020 · arXiv

Caching at Base Stations with Multi-Cluster Multicast Wireless Backhaul via Accelerated First-Order Algorithm

Cloud radio access network (C-RAN) has been recognized as a promising architecture for next-generation wireless systems to support the rapidly increasing demand for higher data rates. However, the performance of C-RAN is limited by the backhaul capacities, especially for the wireless deployment. While C-RAN with fixed BS caching has been demonstrated to reduce backhaul consumption, it is more challenging to further optimize the cache allocation at BSs with multi-cluster multicast backhaul, where the inter-cluster interference induces additional non-convexity to the cache optimization problem. Despite the challenges, we propose an accelerated first-order algorithm, which achieves much higher content downloading sum-rate than a second-order algorithm running for the same amount of time. Simulation results demonstrate that, by simultaneously delivering the required contents to different multicast clusters, the proposed algorithm achieves significantly higher downloading sum-rate than those of time-division single-cluster transmission schemes. Moreover, it is found that the proposed algorithm allocates larger cache sizes to the farther BSs within the nearer clusters, which provides insight into the superiority of the proposed cache allocation.

preprint · 2020 · arXiv

Edge Learning with Unmanned Ground Vehicle: Joint Path, Energy and Sample Size Planning

Edge learning (EL), which uses edge computing as a platform to execute machine learning algorithms, is able to fully exploit the massive sensing data generated by the Internet of Things (IoT). However, due to the limited transmit power at IoT devices, collecting the sensing data in EL systems is a challenging task. To address this challenge, this paper proposes to integrate an unmanned ground vehicle (UGV) with EL. With such a scheme, the UGV could improve the communication quality by approaching various IoT devices. However, different devices may transmit different data for different machine learning jobs, and a fundamental question is how to jointly plan the UGV path, the devices' energy consumption, and the number of samples for different jobs. This paper further proposes a graph-based path planning model, a network energy consumption model and a sample size planning model that characterizes F-measure as a function of the minority class sample size. With these models, the joint path, energy and sample size planning (JPESP) problem is formulated as a large-scale mixed integer nonlinear programming (MINLP) problem, which is nontrivial to solve due to the high-dimensional discontinuous variables related to UGV movement. To this end, it is proved that each IoT device should be served only once along the path, thus the problem dimension is significantly reduced. Furthermore, to handle the discontinuous variables, a tabu search (TS) based algorithm is derived, which converges in expectation to the optimal solution to the JPESP problem. Simulation results under different task scenarios show that our optimization schemes outperform the fixed EL and the full path EL schemes.

preprint · 2020 · arXiv

Energy-Efficient Non-Orthogonal Multicast and Unicast Transmission of Cell-Free Massive MIMO Systems with SWIPT

This work investigates the energy-efficient resource allocation for layered-division multiplexing (LDM) based non-orthogonal multicast and unicast transmission in cell-free massive multiple-input multiple-output (MIMO) systems, where each user equipment (UE) performs wireless information and power transfer simultaneously. To begin with, the achievable data rates for multicast and unicast services are derived in closed form, as well as the received radio frequency (RF) power at each UE. Based on the analytical results, a nonsmooth and nonconvex optimization problem for energy efficiency (EE) maximization is formulated, which is, however, a challenging fractional programming problem with complex constraints. To suit the massive access setting, a first-order algorithm is developed to find both an initial feasible point and a nearly optimal solution. Moreover, an accelerated algorithm is designed to improve the convergence speed. Numerical results demonstrate that the proposed first-order algorithms can achieve almost the same EE as that of second-order approaches yet with much lower computational complexity, which provides insight into the superiority of the proposed algorithms for massive access in cell-free massive MIMO systems.

preprint · 2020 · arXiv

Learning Centric Power Allocation for Edge Intelligence

While machine-type communication (MTC) devices generate massive data, they often cannot process this data due to limited energy and computation power. To this end, edge intelligence has been proposed, which collects distributed data and performs machine learning at the edge. However, this paradigm needs to maximize the learning performance instead of the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient since they allocate resources merely according to the quality of wireless channels. This paper proposes a learning centric power allocation (LCPA) method, which allocates radio resources based on an empirical classification error model. To get insights into LCPA, an asymptotic optimal solution is derived. The solution shows that the transmit powers are inversely proportional to the channel gain, and scale exponentially with the learning parameters. Experimental results show that the proposed LCPA algorithm significantly outperforms other power allocation algorithms.
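
For reference, the classical water-filling baseline that the abstract contrasts with LCPA can be sketched as follows. This is the textbook throughput-maximizing allocation, not the LCPA method: it maximizes the sum of log2(1 + g_k * p_k) under a total power budget, giving p_k = max(0, mu - 1/g_k) with the water level mu found by bisection. The gains are assumed to be already normalized by noise power, and the function name is hypothetical.

```python
def water_filling(gains, total_power, iters=100):
    """Classical water-filling: p_k = max(0, mu - 1/g_k), with sum(p_k) = total_power."""
    inv = [1.0 / g for g in gains]
    lo, hi = 0.0, total_power + max(inv)  # at mu = hi the budget is always met or exceeded
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - x) for x in inv)
        if used > total_power:
            hi = mu  # water level too high
        else:
            lo = mu  # water level too low (or exact)
    mu = (lo + hi) / 2
    return [max(0.0, mu - x) for x in inv]


# Three users with strong, medium, and weak channels; unit power budget.
powers = water_filling([2.0, 1.0, 0.1], total_power=1.0)
print(powers)
```

The allocation depends only on the channel gains: the strongest channel receives the most power and the weakest receives none. The abstract's point is that a learning-centric objective can call for a very different allocation.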

preprint · 2020 · arXiv

Machine Intelligence at the Edge with Learning Centric Power Allocation

While machine-type communication (MTC) devices generate considerable amounts of data, they often cannot process the data due to limited energy and computational power. To empower MTC with intelligence, edge machine learning has been proposed. However, power allocation in this paradigm requires maximizing the learning performance instead of the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient. To this end, this paper proposes learning centric power allocation (LCPA), which provides a new perspective on radio resource allocation in learning driven scenarios. By employing 1) an empirical classification error model that is supported by learning theory and 2) an uncertainty sampling method that accounts for different distributions at users, LCPA is formulated as a nonconvex nonsmooth optimization problem, and is solved using a majorization minimization (MM) framework. To get deeper insights into LCPA, asymptotic analysis shows that the transmit powers are inversely proportional to the channel gains, and scale exponentially with the learning parameters. This is in contrast to traditional power allocations where quality of wireless channels is the only consideration. Last but not least, a large-scale optimization algorithm termed mirror-prox LCPA is further proposed to enable LCPA in large-scale settings. Extensive numerical results demonstrate that the proposed LCPA algorithms outperform traditional power allocation algorithms, and the large-scale optimization algorithm reduces the computation time by orders of magnitude compared with MM-based LCPA but still achieves competitive learning performance.

preprint · 2020 · arXiv

Massive Access in Secure NOMA under Imperfect CSI: Security Guaranteed Sum-Rate Maximization with First-Order Algorithm

Non-orthogonal multiple access (NOMA) is a promising solution for secure transmission under massive access. However, in addition to the uncertain channel state information (CSI) of the eavesdroppers due to their passive nature, the CSI of the legitimate users may also be imperfect at the base station due to the limited feedback. Under both channel uncertainties, the optimal power allocation and transmission rate design for a secure NOMA scheme is currently not known due to the difficulty of handling the probabilistic constraints. This paper fills this gap by proposing a novel transformation of the probabilistic constraints and variable decoupling so that the security guaranteed sum-rate maximization problem can be solved by alternately executing the branch-and-bound method and difference of convex programming. To scale the solution to a truly massive access scenario, a first-order algorithm with very low complexity is further proposed. Simulation results show that the proposed first-order algorithm achieves identical performance to the conventional method but saves at least two orders of magnitude in computation time. Moreover, the resultant transmission scheme significantly improves the security guaranteed sum-rate compared to the orthogonal multiple access transmission and NOMA ignoring CSI uncertainty.