Source author record

Kin K. Leung

Kin K. Leung appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Machine Learning; Information Theory; math.IT; math.OC; Performance; Artificial Intelligence; eess.SP; Multiagent Systems; quant-ph

Catalog footprint

What is connected

16 works

11 topics

4 close collaborators

preprint 2025 arXiv

Adaptive Resource Orchestration for Distributed Quantum Computing Systems

Scaling quantum computing beyond a single device requires networking many quantum processing units (QPUs) into a coherent quantum-HPC system. We propose the Modular Entanglement Hub (ModEn-Hub) architecture: a hub-and-spoke photonic interconnect paired with a real-time quantum network orchestrator. ModEn-Hub centralizes entanglement sources and shared quantum memory to deliver on-demand, high-fidelity Bell pairs across heterogeneous QPUs, while the control plane schedules teleportation-based non-local gates, launches parallel entanglement attempts, and maintains a small ebit cache. To quantify benefits, we implement a lightweight, reproducible Monte Carlo study under realistic loss and tight round budgets, comparing a naive sequential baseline to an orchestrated policy with logarithmically scaled parallelism and opportunistic caching. Across 1-128 QPUs and 2,500 trials per point, ModEn-Hub-style orchestration sustains about 90% teleportation success while the baseline degrades toward about 30%, at the cost of higher average entanglement attempts (about 10-12 versus about 3). These results provide clear, high-level evidence that adaptive resource orchestration in the ModEn-Hub enables scalable and efficient quantum-HPC operation on near-term hardware.
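The comparison described in the abstract above can be illustrated with a toy Monte Carlo sketch. It is not the authors' simulator: the per-attempt success probability `p_link`, the round budget `rounds`, and the rule `1 + ceil(log2(n_qpus))` for the number of parallel attempts are all assumptions chosen for illustration, and the ebit cache is left out.

```python
import math
import random


def teleport_success_rate(n_qpus, parallel, p_link=0.2, rounds=3,
                          trials=2500, seed=0):
    """Estimate how often a Bell pair is produced within the round budget.

    Returns (success rate, average entanglement attempts per trial).
    All default values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Assumed policy: parallel attempts per round grow logarithmically with
    # the number of QPUs; the sequential baseline makes one attempt per round.
    width = 1 + math.ceil(math.log2(n_qpus)) if parallel else 1
    successes = 0
    attempts = 0
    for _ in range(trials):
        for _ in range(rounds):
            attempts += width
            # The round succeeds if any of the parallel attempts succeeds.
            if any(rng.random() < p_link for _ in range(width)):
                successes += 1
                break
    return successes / trials, attempts / trials


if __name__ == "__main__":
    for n in (1, 8, 64, 128):
        print(n, teleport_success_rate(n, False), teleport_success_rate(n, True))
```

Under these assumptions the parallel policy trades more attempts per trial for a higher success rate, which is the qualitative trade-off the abstract reports; the specific percentages in the abstract are not reproduced here.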

preprint 2022 arXiv

Model Pruning Enables Efficient Federated Learning on Edge Devices

Federated learning (FL) allows model training from local data collected by edge/mobile devices while preserving data privacy, which has wide applicability to image and vision applications. A challenge is that client devices in FL usually have much more limited computation and communication resources compared to servers in a datacenter. To overcome this challenge, we propose PruneFL -- a novel FL approach with adaptive and distributed parameter pruning, which adapts the model size during FL to reduce both communication and computation overhead and minimize the overall training time, while maintaining accuracy similar to that of the original model. PruneFL includes initial pruning at a selected client and further pruning as part of the FL process. The model size is adapted during this process, which includes maximizing the approximate empirical risk reduction divided by the time of one FL round. Our experiments with various datasets on edge devices (e.g., Raspberry Pi) show that: (i) we significantly reduce the training time compared to conventional FL and various other pruning-based methods; (ii) the pruned model with automatically determined size converges to an accuracy that is very similar to that of the original model, and it is also a lottery ticket of the original model.

preprint 2020 arXiv

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data collected by local entities. It includes local computation and synchronization steps. To reduce the communication overhead and improve the overall efficiency of FL, gradient sparsification (GS) can be applied, where instead of the full gradient, only a small subset of important elements of the gradient is communicated. Existing work on GS uses a fixed degree of gradient sparsity for i.i.d.-distributed data within a datacenter. In this paper, we consider adaptive degree of sparsity and non-i.i.d. local datasets. We first present a fairness-aware GS method which ensures that different clients provide a similar amount of updates. Then, with the goal of minimizing the overall training time, we propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication and computation trade-off that is controlled by the degree of gradient sparsity. The online learning algorithm uses an estimated sign of the derivative of the objective function, which gives a regret bound that is asymptotically equal to the case where exact derivative is available. Experiments with real datasets confirm the benefits of our proposed approaches, showing up to $40\%$ improvement in model accuracy for a finite training time.
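The basic idea of gradient sparsification mentioned above, sending only a small subset of important gradient elements, can be sketched as follows. Selecting the `k` largest-magnitude entries is a common choice and is an assumption here; the paper's fairness-aware selection and its online adaptation of the sparsity degree are not reproduced. The function names are hypothetical.

```python
def sparsify_top_k(gradient, k):
    """Keep the k largest-magnitude entries of a gradient.

    Returns a dict {index: value}, which is what a client would transmit
    instead of the full gradient vector.
    """
    if k <= 0:
        return {}
    ranked = sorted(range(len(gradient)),
                    key=lambda i: abs(gradient[i]), reverse=True)
    return {i: gradient[i] for i in sorted(ranked[:k])}


def densify(sparse, length):
    """Rebuild a full-length vector on the receiver, filling gaps with zeros."""
    dense = [0.0] * length
    for index, value in sparse.items():
        dense[index] = value
    return dense


if __name__ == "__main__":
    grad = [0.1, -2.0, 0.5, 3.0]
    sent = sparsify_top_k(grad, k=2)
    print(sent, densify(sent, len(grad)))
```

A smaller `k` lowers the communication cost per round but discards more of the gradient, which is the trade-off the abstract says is controlled by the degree of sparsity.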

preprint 2020 arXiv

Additive Link Metrics Identification: Proof of Selected Lemmas and Propositions

This is a technical report containing all the lemma and proposition proofs for the paper "Topological Constraints on Identifying Additive Link Metrics via End-to-end Paths Measurements" by Liang Ma, Ting He, Kin K. Leung, Don Towsley, and Ananthram Swami, published at the Annual Conference of the International Technology Alliance (ACITA), 2012.

preprint 2020 arXiv

Energy-Efficient Resource Management for Federated Edge Learning with CPU-GPU Heterogeneous Computing

Edge machine learning involves the deployment of learning algorithms at the network edge to leverage massive distributed data and computation resources to train artificial intelligence (AI) models. Among others, the framework of federated edge learning (FEEL) is popular for its data-privacy preservation. FEEL coordinates global model training at an edge server and local model training at edge devices that are connected by wireless links. This work contributes to the energy-efficient implementation of FEEL in wireless networks by designing joint computation-and-communication resource management ($\text{C}^2$RM). The design targets the state-of-the-art heterogeneous mobile architecture where parallel computing using both a CPU and a GPU, called heterogeneous computing, can significantly improve both the performance and energy efficiency. To minimize the sum energy consumption of devices, we propose a novel $\text{C}^2$RM framework featuring multi-dimensional control including bandwidth allocation, CPU-GPU workload partitioning and speed scaling at each device, and $\text{C}^2$ time division for each link. The key component of the framework is a set of equilibriums in energy rates with respect to different control variables that are proved to exist among devices or between processing units at each device. The results are applied to designing efficient algorithms for computing the optimal $\text{C}^2$RM policies faster than the standard optimization tools. Based on the equilibriums, we further design energy-efficient schemes for device scheduling and greedy spectrum sharing that scavenges "spectrum holes" resulting from heterogeneous $\text{C}^2$ time divisions among devices. Using a real dataset, experiments are conducted to demonstrate the effectiveness of $\text{C}^2$RM on improving the energy efficiency of a FEEL system.

preprint 2020 arXiv

Fast-Fourier-Forecasting Resource Utilisation in Distributed Systems

Distributed computing systems often consist of hundreds of nodes, executing tasks with different resource requirements. Efficient resource provisioning and task scheduling in such systems are non-trivial and require close monitoring and accurate forecasting of the state of the system, specifically resource utilisation at its constituent machines. Two challenges present themselves towards these objectives. First, collecting monitoring data entails substantial communication overhead. This overhead can be prohibitively high, especially in networks where bandwidth is limited. Second, forecasting models to predict resource utilisation should be accurate and need to exhibit high inference speed. Mission critical scheduling and resource allocation algorithms use these predictions and rely on their immediate availability. To address the first challenge, we present a communication-efficient data collection mechanism. Resource utilisation data is collected at the individual machines in the system and transmitted to a central controller in batches. Each batch is processed by an adaptive data-reduction algorithm based on Fourier transforms and truncation in the frequency domain. We show that the proposed mechanism leads to a significant reduction in communication overhead while incurring only minimal error and adhering to accuracy guarantees. To address the second challenge, we propose a deep learning architecture using complex Gated Recurrent Units to forecast resource utilisation. This architecture is directly integrated with the above data collection mechanism to improve inference speed of our forecasting model. Using two real-world datasets, we demonstrate the effectiveness of our approach, both in terms of forecasting accuracy and inference speed. Our approach resolves challenges encountered in resource provisioning frameworks and can be applied to other forecasting problems.
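The data-reduction step described above (Fourier transform of a batch, then truncation in the frequency domain) can be sketched with a plain discrete Fourier transform. This is a minimal illustration under stated assumptions: a fixed cut-off `keep` stands in for the paper's adaptive choice, the accuracy guarantees are not modeled, and the function names are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of a real-valued batch."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n))
            for f in range(n)]


def truncate(coeffs, keep):
    """Zero all coefficients above frequency index `keep`.

    Mirrored (negative-frequency) coefficients are kept together with their
    positive counterparts so that the reconstruction stays real-valued.
    """
    n = len(coeffs)
    return [c if min(f, n - f) <= keep else 0j for f, c in enumerate(coeffs)]


def idft_real(coeffs):
    """Inverse transform, returning the real part of each sample."""
    n = len(coeffs)
    return [sum(coeffs[f] * cmath.exp(2j * cmath.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


if __name__ == "__main__":
    batch = [0.5 + math.sin(2 * math.pi * t / 16) for t in range(16)]
    approx = idft_real(truncate(dft(batch), keep=1))
    print(max(abs(a - b) for a, b in zip(batch, approx)))
```

Only the retained coefficients would need to be transmitted to the central controller; a smooth utilisation trace is represented well by a few low frequencies, while a bursty trace needs a larger `keep` for the same error.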

preprint 2020 arXiv

Let's Share: A Game-Theoretic Framework for Resource Sharing in Mobile Edge Clouds

Mobile edge computing seeks to provide resources to different delay-sensitive applications. This is a challenging problem as an edge cloud-service provider may not have sufficient resources to satisfy all resource requests. Furthermore, allocating available resources optimally to different applications is also challenging. Resource sharing among different edge cloud-service providers can address the aforementioned limitation as certain service providers may have resources available that can be "rented" by other service providers. However, edge cloud-service providers can have different objectives or utilities. Therefore, there is a need for an efficient and effective mechanism to share resources among service providers, while considering the different objectives of various providers. We model resource sharing as a multi-objective optimization problem and present a solution framework based on Cooperative Game Theory (CGT). We consider the strategy where each service provider allocates resources to its native applications first and shares the remaining resources with applications from other service providers. We prove that for a monotonic, non-decreasing utility function, the game is canonical and convex. Hence, the core is not empty and the grand coalition is stable. We propose two algorithms, Game-theoretic Pareto optimal allocation (GPOA) and Polyandrous-Polygamous Matching based Pareto Optimal Allocation (PPMPOA), that provide allocations from the core. Hence the obtained allocations are Pareto optimal and the grand coalition of all the service providers is stable. Experimental results confirm that our proposed resource sharing framework improves utilities of edge cloud-service providers and application request satisfaction.

preprint 2020 arXiv

Overcoming Noisy and Irrelevant Data in Federated Learning

Many image and vision applications require a large amount of data for model training. Collecting all such data at a central location can be challenging due to data privacy and communication bandwidth restrictions. Federated learning is an effective way of training a machine learning model in a distributed manner from local data collected by client devices, which does not require exchanging the raw data among clients. A challenge is that among the large variety of data collected at each client, it is likely that only a subset is relevant for a learning task while the rest of data has a negative impact on model training. Therefore, before starting the learning process, it is important to select the subset of data that is relevant to the given federated learning task. In this paper, we propose a method for distributedly selecting relevant data, where we use a benchmark model trained on a small benchmark dataset that is task-specific, to evaluate the relevance of individual data samples at each client and select the data with sufficiently high relevance. Then, each client only uses the selected subset of its data in the federated learning process. The effectiveness of our proposed approach is evaluated on multiple real-world image datasets in a simulated system with a large number of clients, showing up to $25\%$ improvement in model accuracy compared to training with all data.

preprint 2020 arXiv

Resource Allocation in One-dimensional Distributed Service Networks

We consider assignment policies that allocate resources to users, where both resources and users are located on a one-dimensional line. First, we consider unidirectional assignment policies that allocate resources only to users located to their left. We propose the Move to Right (MTR) policy, which scans from left to right assigning nearest rightmost available resource to a user, and contrast it to the Unidirectional Gale-Shapley (UGS) matching policy. While both these policies are optimal among all unidirectional policies, we show that they are equivalent with respect to the expected distance traveled by a request (request distance), although MTR is fairer. Moreover, we show that when user and resource locations are modeled by statistical point processes, and resources are allowed to satisfy more than one user, the spatial system under unidirectional policies can be mapped into bulk service queuing systems, thus allowing the application of a plethora of queuing theory results that yield closed form expressions. As we consider a case where different resources can satisfy different numbers of users, we also generate new results for bulk service queues. We also consider bidirectional policies where there are no directional restrictions on resource allocation and develop an algorithm for computing the optimal assignment which is more efficient than known algorithms in the literature when there are more resources than users. Finally, numerical evaluation of performance of unidirectional and bidirectional allocation schemes yields design guidelines beneficial for resource placement.

preprint 2020 arXiv

State Action Separable Reinforcement Learning

Reinforcement Learning (RL) based methods have achieved notable success in solving serial decision-making and control problems in recent years. For conventional RL formulations, Markov Decision Process (MDP) and state-action-value function are the basis for the problem modeling and policy evaluation. However, several challenging issues still remain. Among the most cited issues, the enormity of the state/action space is an important factor that causes inefficiency in accurately approximating the state-action-value function. We observe that although actions directly define the agents' behaviors, for many problems the next state after a state transition matters more than the action taken in determining the return of such a state transition. In this regard, we propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), wherein the action space is decoupled from the value function learning process for higher efficiency. Then, a light-weight transition model is learned to assist the agent to determine the action that triggers the associated state transition. In addition, our convergence analysis reveals that under certain conditions, the convergence time of sasRL is $O(T^{1/k})$, where $T$ is the convergence time for updating the value function in the MDP-based formulation and $k$ is a weighting factor. Experiments on several gaming scenarios show that sasRL outperforms state-of-the-art MDP-based RL algorithms by up to $75\%$.

preprint 2016 arXiv

Dynamic Service Placement for Mobile Micro-Clouds with Predicted Future Costs

Mobile micro-clouds are promising for enabling performance-critical cloud applications. However, one challenge therein is the dynamics at the network edge. In this paper, we study how to place service instances to cope with these dynamics, where multiple users and service instances coexist in the system. Our goal is to find the optimal placement (configuration) of instances to minimize the average cost over time, leveraging the ability of predicting future cost parameters with known accuracy. We first propose an offline algorithm that solves for the optimal configuration in a specific look-ahead time-window. Then, we propose an online approximation algorithm with polynomial time-complexity to find the placement in real-time whenever an instance arrives. We analytically show that the online algorithm is $O(1)$-competitive for a broad family of cost functions. Afterwards, the impact of prediction errors is considered and a method for finding the optimal look-ahead window size is proposed, which minimizes an upper bound of the average actual cost. The effectiveness of the proposed approach is evaluated by simulations with both synthetic and real-world (San Francisco taxi) user-mobility traces. The theoretical methodology used in this paper can potentially be applied to a larger class of dynamic resource allocation problems.

preprint 2015 arXiv

Mobility-Induced Service Migration in Mobile Micro-Clouds

Mobile micro-cloud is an emerging technology in distributed computing, which is aimed at providing seamless computing/data access to the edge of the network when a centralized service may suffer from poor connectivity and long latency. Different from the traditional cloud, a mobile micro-cloud is smaller and deployed closer to users, typically attached to a cellular basestation or wireless network access point. Due to the relatively small coverage area of each basestation or access point, when a user moves across areas covered by different basestations or access points which are attached to different micro-clouds, issues of service performance and service migration become important. In this paper, we consider such migration issues. We model the general problem as a Markov decision process (MDP), and show that, in the special case where the mobile user follows a one-dimensional asymmetric random walk mobility model, the optimal policy for service migration is a threshold policy. We obtain the analytical solution for the cost resulting from arbitrary thresholds, and then propose an algorithm for finding the optimal thresholds. The proposed algorithm is more efficient than standard mechanisms for solving MDPs.
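A threshold policy of the kind described above can be illustrated by simulation. This sketch is not the paper's analytical solution or its threshold-finding algorithm: it assumes a single symmetric threshold on the user-to-service distance, a hypothetical cost model (a fixed `migrate_cost` per migration plus `unit_cost` per unit of distance per step), and a brute-force search over candidate thresholds.

```python
import random


def average_cost(threshold, p_right=0.5, p_left=0.2, migrate_cost=5.0,
                 unit_cost=1.0, steps=10000, seed=0):
    """Average per-step cost of a threshold migration policy.

    The user performs a 1-D asymmetric random walk; the service migrates to
    the user's location whenever their distance exceeds `threshold`.
    All cost parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    offset = 0  # user position minus service position
    total = 0.0
    for _ in range(steps):
        u = rng.random()
        if u < p_right:
            offset += 1
        elif u < p_right + p_left:
            offset -= 1
        if abs(offset) > threshold:
            offset = 0  # migrate the service to the user
            total += migrate_cost
        total += unit_cost * abs(offset)  # cost of serving at a distance
    return total / steps


def best_threshold(candidates, **kwargs):
    """Pick the candidate threshold with the lowest simulated average cost."""
    return min(candidates, key=lambda t: average_cost(t, **kwargs))


if __name__ == "__main__":
    print(best_threshold(range(0, 10)))
```

When migration is cheap the best threshold is small (migrate eagerly); when it is expensive, tolerating a larger distance before migrating pays off.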

preprint 2015 arXiv

Network Capability in Localizing Node Failures via End-to-end Path Measurements

We investigate the capability of localizing node failures in communication networks from binary states (normal/failed) of end-to-end paths. Given a set of nodes of interest, uniquely localizing failures within this set requires that different observable path states associate with different node failure events. However, this condition is difficult to test on large networks due to the need to enumerate all possible node failures. Our first contribution is a set of sufficient/necessary conditions for identifying a bounded number of failures within an arbitrary node set that can be tested in polynomial time. In addition to network topology and locations of monitors, our conditions also incorporate constraints imposed by the probing mechanism used. We consider three probing mechanisms that differ according to whether measurement paths are (i) arbitrarily controllable, (ii) controllable but cycle-free, or (iii) uncontrollable (determined by the default routing protocol). Our second contribution is to quantify the capability of failure localization through (1) the maximum number of failures (anywhere in the network) such that failures within a given node set can be uniquely localized, and (2) the largest node set within which failures can be uniquely localized under a given bound on the total number of failures. Both measures in (1-2) can be converted into functions of a per-node property, which can be computed efficiently based on the above sufficient/necessary conditions. We demonstrate how measures (1-2) proposed for quantifying failure localization capability can be used to evaluate the impact of various parameters, including topology, number of monitors, and probing mechanisms.
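The unique-localization condition stated above (different failure events must produce different observable path states) can be checked by brute force on a small example. This sketch enumerates every failure set, which is exactly the step the abstract says does not scale; it is shown only to make the condition concrete and does not implement the paper's polynomial-time conditions. It also assumes all failures occur within the node set of interest. The function names are hypothetical.

```python
from itertools import combinations


def path_states(paths, failed):
    """A path is observed as failed (True) if it traverses any failed node."""
    return tuple(bool(set(path) & failed) for path in paths)


def uniquely_localizable(paths, nodes, max_failures):
    """Check that every failure set of size <= max_failures yields a
    distinct vector of path states, by exhaustive enumeration."""
    seen = set()
    for size in range(max_failures + 1):
        for combo in combinations(sorted(nodes), size):
            state = path_states(paths, set(combo))
            if state in seen:
                return False  # two failure events look identical
            seen.add(state)
    return True


if __name__ == "__main__":
    paths = [("a", "b"), ("b", "c"), ("a", "c")]
    nodes = {"a", "b", "c"}
    print(uniquely_localizable(paths, nodes, 1))
    print(uniquely_localizable(paths, nodes, 2))
```

In the example, any single failure leaves exactly one path working, so it can be pinpointed; any two simultaneous failures break all three paths, so pairs of failures cannot be told apart.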

preprint 2012 arXiv

A Methodology for Studying VANET Performance with Practical Vehicle Distribution in Urban Environment

In a Vehicular Ad-hoc Network (VANET), the amount of interference from neighboring nodes to a communication link is governed by the vehicle density dynamics in vicinity and transmission probabilities of terminals. It is obvious that vehicles are distributed non-homogeneously along a road segment due to traffic controls and speed limits at different portions of the road. The common assumption of homogeneous node distribution in the network in most of the previous work in mobile ad-hoc networks thus appears to be inappropriate in VANETs. In light of the inadequacy, we present in this paper an original methodology to study the performance of VANETs with practical vehicle distribution in urban environment. Specifically, we introduce the stochastic traffic model to characterize the general vehicular traffic flow as well as the randomness of individual vehicles, from which we can acquire the mean dynamics and the probability distribution of vehicular density. As illustrative examples, we demonstrate how the density knowledge from the stochastic traffic model can be utilized to derive the throughput and progress performance of three routing strategies in different channel access protocols. We confirm the accuracy of the analytical results through extensive simulations. Our results demonstrate the applicability of the proposed methodology on modeling protocol performance, and shed insight into the performance analysis of other transmission protocols and network configurations in vehicular networks. Furthermore, we illustrate that the optimal transmission probability for optimized network performance can be obtained as a function of the location space from our results. Such information can be computed by road-side nodes and then broadcasted to road users for optimized multi-hop packet transmission in the communication network.

preprint 2010 arXiv

On the Universality of Sequential Slotted Amplify and Forward Strategy in Cooperative Communications

While cooperative communication has many benefits and is expected to play an important role in future wireless networks, many challenges are still unsolved. Previous research has developed different relaying strategies for cooperative multiple access channels (CMA), cooperative multiple relay channels (CMR) and cooperative broadcast channels (CBC). However, a unifying strategy that is universally optimal for these three classical channel models is lacking. The sequential slotted amplify and forward (SSAF) strategy was previously proposed to achieve the optimal diversity and multiplexing tradeoff (DMT) for CMR. In this paper, the use of the SSAF strategy is extended to CBC and CMA, and its optimality for both of them is shown. For CBC, a CBC-SSAF strategy is proposed which can asymptotically achieve the DMT upper bound when the number of cooperative users is large. For CMA, a CMA-SSAF strategy is proposed which can even exactly achieve the DMT upper bound with any number of cooperative users. In this way, the SSAF strategy is shown to be universally optimal for all three classical channel models and has great potential to provide universal optimality for wireless cooperative networks.

preprint 2010 arXiv

Wireless Network Coding with Imperfect Overhearing

Not only is network coding essential to achieve the capacity of a single-session multicast network, it can also help to improve the throughput of wireless networks with multiple unicast sessions when overheard information is available. Most previous research aimed at realizing such improvement by using perfectly overheard information, while in practice, especially for wireless networks, overheard information is often imperfect. To date, it is unclear whether network coding should still be used in such situations with imperfect overhearing. In this paper, a simple but ubiquitous wireless network model with two unicast sessions is used to investigate this problem. From the diversity and multiplexing tradeoff perspective, it is proved that even when overheard information is imperfect, network coding can still help to improve the overall system performance. This result implies that network coding should be used actively regardless of the reception quality of overheard information.
