Researcher profile

Kin K. Leung

Kin K. Leung contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2025arXiv

Adaptive Resource Orchestration for Distributed Quantum Computing Systems

Scaling quantum computing beyond a single device requires networking many quantum processing units (QPUs) into a coherent quantum-HPC system. We propose the Modular Entanglement Hub (ModEn-Hub) architecture: a hub-and-spoke photonic interconnect paired with a real-time quantum network orchestrator. ModEn-Hub centralizes entanglement sources and shared quantum memory to deliver on-demand, high-fidelity Bell pairs across heterogeneous QPUs, while the control plane schedules teleportation-based non-local gates, launches parallel entanglement attempts, and maintains a small ebit cache. To quantify benefits, we implement a lightweight, reproducible Monte Carlo study under realistic loss and tight round budgets, comparing a naive sequential baseline to an orchestrated policy with logarithmically scaled parallelism and opportunistic caching. Across 1-128 QPUs and 2,500 trials per point, ModEn-Hub-style orchestration sustains about 90% teleportation success while the baseline degrades toward about 30%, at the cost of higher average entanglement attempts (about 10-12 versus about 3). These results provide clear, high-level evidence that adaptive resource orchestration in the ModEn-Hub enables scalable and efficient quantum-HPC operation on near-term hardware.

preprint2022arXiv

Model Pruning Enables Efficient Federated Learning on Edge Devices

Federated learning (FL) allows model training from local data collected by edge/mobile devices while preserving data privacy, which has wide applicability to image and vision applications. A challenge is that client devices in FL usually have much more limited computation and communication resources compared to servers in a datacenter. To overcome this challenge, we propose PruneFL -- a novel FL approach with adaptive and distributed parameter pruning, which adapts the model size during FL to reduce both communication and computation overhead and minimize the overall training time, while maintaining a similar accuracy as the original model. PruneFL includes initial pruning at a selected client and further pruning as part of the FL process. The model size is adapted during this process, which includes maximizing the approximate empirical risk reduction divided by the time of one FL round. Our experiments with various datasets on edge devices (e.g., Raspberry Pi) show that: (i) we significantly reduce the training time compared to conventional FL and various other pruning-based methods; (ii) the pruned model with automatically determined size converges to an accuracy that is very similar to the original model, and it is also a lottery ticket of the original model.

preprint2020arXiv

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data collected by local entities. It includes local computation and synchronization steps. To reduce the communication overhead and improve the overall efficiency of FL, gradient sparsification (GS) can be applied, where instead of the full gradient, only a small subset of important elements of the gradient is communicated. Existing work on GS uses a fixed degree of gradient sparsity for i.i.d.-distributed data within a datacenter. In this paper, we consider adaptive degree of sparsity and non-i.i.d. local datasets. We first present a fairness-aware GS method which ensures that different clients provide a similar amount of updates. Then, with the goal of minimizing the overall training time, we propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication and computation trade-off that is controlled by the degree of gradient sparsity. The online learning algorithm uses an estimated sign of the derivative of the objective function, which gives a regret bound that is asymptotically equal to the case where exact derivative is available. Experiments with real datasets confirm the benefits of our proposed approaches, showing up to $40\%$ improvement in model accuracy for a finite training time.

preprint2020arXiv

Energy-Efficient Resource Management for Federated Edge Learning with CPU-GPU Heterogeneous Computing

Edge machine learning involves the deployment of learning algorithms at the network edge to leverage massive distributed data and computation resources to train artificial intelligence (AI) models. Among others, the framework of federated edge learning (FEEL) is popular for its data-privacy preservation. FEEL coordinates global model training at an edge server and local model training at edge devices that are connected by wireless links. This work contributes to the energy-efficient implementation of FEEL in wireless networks by designing joint computation-and-communication resource management ($\text{C}^2$RM). The design targets the state-of-the-art heterogeneous mobile architecture where parallel computing using both a CPU and a GPU, called heterogeneous computing, can significantly improve both the performance and energy efficiency. To minimize the sum energy consumption of devices, we propose a novel $\text{C}^2$RM framework featuring multi-dimensional control including bandwidth allocation, CPU-GPU workload partitioning and speed scaling at each device, and $\text{C}^2$ time division for each link. The key component of the framework is a set of equilibriums in energy rates with respect to different control variables that are proved to exist among devices or between processing units at each device. The results are applied to designing efficient algorithms for computing the optimal $\text{C}^2$RM policies faster than the standard optimization tools. Based on the equilibriums, we further design energy-efficient schemes for device scheduling and greedy spectrum sharing that scavenges "spectrum holes" resulting from heterogeneous $\text{C}^2$ time divisions among devices. Using a real dataset, experiments are conducted to demonstrate the effectiveness of $\text{C}^2$RM on improving the energy efficiency of a FEEL system.

preprint2020arXiv

Fast-Fourier-Forecasting Resource Utilisation in Distributed Systems

Distributed computing systems often consist of hundreds of nodes, executing tasks with different resource requirements. Efficient resource provisioning and task scheduling in such systems are non-trivial and require close monitoring and accurate forecasting of the state of the system, specifically resource utilisation at its constituent machines. Two challenges present themselves towards these objectives. First, collecting monitoring data entails substantial communication overhead. This overhead can be prohibitively high, especially in networks where bandwidth is limited. Second, forecasting models to predict resource utilisation should be accurate and need to exhibit high inference speed. Mission critical scheduling and resource allocation algorithms use these predictions and rely on their immediate availability. To address the first challenge, we present a communication-efficient data collection mechanism. Resource utilisation data is collected at the individual machines in the system and transmitted to a central controller in batches. Each batch is processed by an adaptive data-reduction algorithm based on Fourier transforms and truncation in the frequency domain. We show that the proposed mechanism leads to a significant reduction in communication overhead while incurring only minimal error and adhering to accuracy guarantees. To address the second challenge, we propose a deep learning architecture using complex Gated Recurrent Units to forecast resource utilisation. This architecture is directly integrated with the above data collection mechanism to improve inference speed of our forecasting model. Using two real-world datasets, we demonstrate the effectiveness of our approach, both in terms of forecasting accuracy and inference speed. Our approach resolves challenges encountered in resource provisioning frameworks and can be applied to other forecasting problems.

preprint2020arXiv

Let's Share: A Game-Theoretic Framework for Resource Sharing in Mobile Edge Clouds

Mobile edge computing seeks to provide resources to different delay-sensitive applications. This is a challenging problem as an edge cloud-service provider may not have sufficient resources to satisfy all resource requests. Furthermore, allocating available resources optimally to different applications is also challenging. Resource sharing among different edge cloud-service providers can address the aforementioned limitation as certain service providers may have resources available that can be ``rented'' by other service providers. However, edge cloud service providers can have different objectives or \emph{utilities}. Therefore, there is a need for an efficient and effective mechanism to share resources among service providers, while considering the different objectives of various providers. We model resource sharing as a multi-objective optimization problem and present a solution framework based on \emph{Cooperative Game Theory} (CGT). We consider the strategy where each service provider allocates resources to its native applications first and shares the remaining resources with applications from other service providers. We prove that for a monotonic, non-decreasing utility function, the game is canonical and convex. Hence, the \emph{core} is not empty and the grand coalition is stable. We propose two algorithms \emph{Game-theoretic Pareto optimal allocation} (GPOA) and \emph{Polyandrous-Polygamous Matching based Pareto Optimal Allocation} (PPMPOA) that provide allocations from the core. Hence the obtained allocations are \emph{Pareto} optimal and the grand coalition of all the service providers is stable. Experimental results confirm that our proposed resource sharing framework improves utilities of edge cloud-service providers and application request satisfaction.

preprint2020arXiv

Overcoming Noisy and Irrelevant Data in Federated Learning

Many image and vision applications require a large amount of data for model training. Collecting all such data at a central location can be challenging due to data privacy and communication bandwidth restrictions. Federated learning is an effective way of training a machine learning model in a distributed manner from local data collected by client devices, which does not require exchanging the raw data among clients. A challenge is that among the large variety of data collected at each client, it is likely that only a subset is relevant for a learning task while the rest of data has a negative impact on model training. Therefore, before starting the learning process, it is important to select the subset of data that is relevant to the given federated learning task. In this paper, we propose a method for distributedly selecting relevant data, where we use a benchmark model trained on a small benchmark dataset that is task-specific, to evaluate the relevance of individual data samples at each client and select the data with sufficiently high relevance. Then, each client only uses the selected subset of its data in the federated learning process. The effectiveness of our proposed approach is evaluated on multiple real-world image datasets in a simulated system with a large number of clients, showing up to $25\%$ improvement in model accuracy compared to training with all data.

preprint2020arXiv

Resource Allocation in One-dimensional Distributed Service Networks

We consider assignment policies that allocate resources to users, where both resources and users are located on a one-dimensional line. First, we consider unidirectional assignment policies that allocate resources only to users located to their left. We propose the Move to Right (MTR) policy, which scans from left to right assigning nearest rightmost available resource to a user, and contrast it to the Unidirectional Gale-Shapley (UGS) matching policy. While both these policies are optimal among all unidirectional policies, we show that they are equivalent with respect to the expected distance traveled by a request (request distance), although MTR is fairer. Moreover, we show that when user and resource locations are modeled by statistical point processes, and resources are allowed to satisfy more than one user, the spatial system under unidirectional policies can be mapped into bulk service queuing systems, thus allowing the application of a plethora of queuing theory results that yield closed form expressions. As we consider a case where different resources can satisfy different numbers of users, we also generate new results for bulk service queues. We also consider bidirectional policies where there are no directional restrictions on resource allocation and develop an algorithm for computing the optimal assignment which is more efficient than known algorithms in the literature when there are more resources than users. Finally, numerical evaluation of performance of unidirectional and bidirectional allocation schemes yields design guidelines beneficial for resource placement.

preprint2020arXiv

State Action Separable Reinforcement Learning

Reinforcement Learning (RL) based methods have seen their paramount successes in solving serial decision-making and control problems in recent years. For conventional RL formulations, Markov Decision Process (MDP) and state-action-value function are the basis for the problem modeling and policy evaluation. However, several challenging issues still remain. Among most cited issues, the enormity of state/action space is an important factor that causes inefficiency in accurately approximating the state-action-value function. We observe that although actions directly define the agents' behaviors, for many problems the next state after a state transition matters more than the action taken, in determining the return of such a state transition. In this regard, we propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), wherein the action space is decoupled from the value function learning process for higher efficiency. Then, a light-weight transition model is learned to assist the agent to determine the action that triggers the associated state transition. In addition, our convergence analysis reveals that under certain conditions, the convergence time of sasRL is $O(T^{1/k})$, where $T$ is the convergence time for updating the value function in the MDP-based formulation and $k$ is a weighting factor. Experiments on several gaming scenarios show that sasRL outperforms state-of-the-art MDP-based RL algorithms by up to $75\%$.

preprint2010arXiv

On the Universality of Sequential Slotted Amplify and Forward Strategy in Cooperative Communications

While cooperative communication has many benefits and is expected to play an important role in future wireless networks, many challenges are still unsolved. Previous research has developed different relaying strategies for cooperative multiple access channels (CMA), cooperative multiple relay channels (CMR) and cooperative broadcast channels (CBC). However, there lacks a unifying strategy that is universally optimal for these three classical channel models. Sequential slotted amplify and forward (SSAF) strategy was previously proposed to achieve the optimal diversity and multiplexing tradeoff (DMT) for CMR. In this paper, the use of SSAF strategy is extended to CBC and CMA, and its optimality for both of them is shown. For CBC, a CBC-SSAF strategy is proposed which can asymptotically achieve the DMT upper bound when the number of cooperative users is large. For CMA, a CMA-SSAF strategy is proposed which even can exactly achieve the DMT upper bound with any number of cooperative users. In this way, SSAF strategy is shown to be universally optimal for all these three classical channel models and has great potential to provide universal optimality for wireless cooperative networks.

preprint2010arXiv

Wireless Network Coding with Imperfect Overhearing

Not only is network coding essential to achieve the capacity of a single-session multicast network, it can also help to improve the throughput of wireless networks with multiple unicast sessions when overheard information is available. Most previous research aimed at realizing such improvement by using perfectly overheard information, while in practice, especially for wireless networks, overheard information is often imperfect. To date, it is unclear whether network coding should still be used in such situations with imperfect overhearing. In this paper, a simple but ubiquitous wireless network model with two unicast sessions is used to investigate this problem. From the diversity and multiplexing tradeoff perspective, it is proved that even when overheard information is imperfect, network coding can still help to improve the overall system performance. This result implies that network coding should be used actively regardless of the reception quality of overheard information.