Researcher profile

Blesson Varghese

Blesson Varghese contributes to research discovery and scholarly infrastructure.

Published work

12 published items

Preprint, 2024, arXiv

EcoFed: Efficient Communication for DNN Partitioning-based Federated Learning

Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant communication overheads since the intermediate activation and gradient need to be transferred between the device and the server during training. While current research reduces the communication introduced by DNN partitioning using local loss-based methods, we demonstrate that these methods are ineffective in improving the overall efficiency (communication overhead and training speed) of a DPFL system. This is because they suffer from accuracy degradation and ignore the communication costs incurred when transferring the activation from the device to the server. This article proposes EcoFed - a communication efficient framework for DPFL systems. EcoFed eliminates the transmission of the gradient by developing pre-trained initialization of the DNN model on the device for the first time. This reduces the accuracy degradation seen in local loss-based methods. In addition, EcoFed proposes a novel replay buffer mechanism and implements a quantization-based compression technique to reduce the transmission of the activation. It is experimentally demonstrated that EcoFed can reduce the communication cost by up to 133x and accelerate training by up to 21x when compared to classic FL. Compared to vanilla DPFL, EcoFed achieves a 16x communication reduction and 2.86x training time speed-up. EcoFed is available from https://github.com/blessonvar/EcoFed.
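The quantization-based compression of activations mentioned above can be illustrated with a minimal sketch. It assumes plain uniform 8-bit quantization, which may differ from the scheme EcoFed actually implements; the function names and sample values are hypothetical.

```python
from typing import List, Tuple


def quantize(values: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map floats onto integer codes in [0, 2**bits - 1] (uniform quantization)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant input, where hi == lo would divide by zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats on the receiving side."""
    return [c * scale + lo for c in codes]


# Illustrative activation values only; not taken from the paper.
activations = [0.0, 0.25, 0.5, 1.0, 2.0]
codes, scale, lo = quantize(activations)
restored = dequantize(codes, scale, lo)

# Largest reconstruction error; about half a quantization step at most.
max_error = max(abs(a - r) for a, r in zip(activations, restored))

# Sending 8-bit codes instead of 32-bit floats shrinks the payload 4x,
# ignoring the two floats of metadata (scale and lo).
compression_ratio = 32 / 8
```

The device would transmit `codes`, `scale` and `lo` instead of the raw activations, trading a bounded reconstruction error for a smaller transfer.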

Preprint, 2022, arXiv

FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning

Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns of data privacy. However, there are three challenges that need to be addressed to make FL efficient: (i) execution on devices with limited computational capabilities, (ii) accounting for stragglers due to computational heterogeneity of devices, and (iii) adaptation to the changing network bandwidths. This paper presents FedAdapt, an adaptive offloading FL framework to mitigate the aforementioned challenges. FedAdapt accelerates local training in computationally constrained devices by leveraging layer offloading of deep neural networks (DNNs) to servers. Further, FedAdapt adopts reinforcement learning based optimization and clustering to adaptively identify which layers of the DNN should be offloaded for each individual device on to a server to tackle the challenges of computational heterogeneity and changing network bandwidth. Experimental studies are carried out on a lab-based testbed and it is demonstrated that by offloading a DNN from the device to the server FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, with changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% when compared to classic FL, without sacrificing accuracy.

Preprint, 2022, arXiv

FedComm: Understanding Communication Protocols for Edge-based Federated Learning

Federated learning (FL) trains machine learning (ML) models on devices using locally generated data and exchanges models without transferring raw data to a distant server. This exchange incurs a communication overhead and impacts the performance of FL training. There is limited understanding of how communication protocols specifically contribute to the performance of FL. Such an understanding is essential for selecting the right communication protocol when designing an FL system. This paper presents FedComm, a benchmarking methodology to quantify the impact of optimized application layer protocols, namely Message Queue Telemetry Transport (MQTT), Advanced Message Queuing Protocol (AMQP), and ZeroMQ Message Transport Protocol (ZMTP), and non-optimized application layer protocols, namely TCP and UDP, on the performance of FL. FedComm measures the overall performance of FL in terms of communication time and accuracy under varying computational and network stress and packet loss rates. Experiments on a lab-based testbed demonstrate that TCP outperforms UDP as a non-optimized application layer protocol with higher accuracy and shorter communication times for 4G and Wi-Fi networks. Optimized application layer protocols such as AMQP, MQTT, and ZMTP outperformed non-optimized application layer protocols in most network conditions, resulting in a 2.5x reduction in communication time compared to TCP while maintaining accuracy. The experimental results enable us to highlight a number of open research issues for further investigation. FedComm is available for download from https://github.com/qub-blesson/FedComm.

Preprint, 2022, arXiv

FedFly: Towards Migration in Edge-based Distributed Federated Learning

Federated learning (FL) is a privacy-preserving distributed machine learning technique that trains models while keeping all the original data generated on devices locally. Since devices may be resource constrained, offloading can be used to improve FL performance by transferring computational workload from devices to edge servers. However, due to mobility, devices participating in FL may leave the network during training and need to connect to a different edge server. This is challenging because the computations offloaded to the edge server need to be migrated. In line with this assertion, we present FedFly, which is, to the best of our knowledge, the first work to migrate a deep neural network (DNN) when devices move between edge servers during FL training. Our empirical results on the CIFAR10 dataset, with both balanced and imbalanced data distribution, support our claims that FedFly can reduce training time by up to 33% when a device moves after 50% of the training is completed, and by up to 45% when 90% of the training is completed, when compared to the state-of-the-art offloading approach in FL. FedFly has negligible overhead of up to two seconds and does not compromise accuracy. Finally, we highlight a number of open research issues for further investigation. FedFly can be downloaded from https://github.com/qub-blesson/FedFly.

Preprint, 2021, arXiv

AVEC: Accelerator Virtualization in Cloud-Edge Computing for Deep Learning Libraries

Edge computing offers the distinct advantage of harnessing compute capabilities on resources located at the edge of the network to run workloads of relatively weak user devices. This is achieved by offloading computationally intensive workloads, such as deep learning, from user devices to the edge. Using the edge reduces the overall communication latency of applications as workloads can be processed closer to where data is generated on user devices rather than sending them to geographically distant clouds. Specialised hardware accelerators, such as Graphics Processing Units (GPUs) available in the cloud-edge network, can enhance the performance of computationally intensive workloads that are offloaded from devices on to the edge. The underlying approach required to facilitate this is virtualization of GPUs. This paper therefore sets out to investigate the potential of GPU accelerator virtualization to improve the performance of deep learning workloads in a cloud-edge environment. The AVEC accelerator virtualization framework is proposed, which incurs minimum overheads and requires no source-code modification of the workload. AVEC intercepts local calls to a GPU on a device and forwards them to an edge resource seamlessly. The feasibility of AVEC is demonstrated on a real-world application, namely OpenPose using the Caffe deep learning library. It is observed that on a lab-based experimental test-bed AVEC delivers up to 7.48x speedup despite communication overheads incurred due to data transfers.

Preprint, 2020, arXiv

Context-aware Distribution of Fog Applications Using Deep Reinforcement Learning

Fog computing is an emerging paradigm that aims to meet the increasing computation demands arising from the billions of devices connected to the Internet. Offloading services of an application from the Cloud to the edge of the network can improve the overall Quality-of-Service (QoS) of the application since it can process data closer to user devices. Diverse Fog nodes, ranging from Wi-Fi routers to mini-clouds with varying resource capabilities, make it challenging to determine which services of an application need to be offloaded. In this paper, a context-aware mechanism for distributing applications across the Cloud and the Fog is proposed. The mechanism dynamically generates (re)deployment plans for the application to maximise the performance efficiency of the application by taking the QoS and running costs into account. The mechanism relies on deep Q-networks to generate a distribution plan without prior knowledge of the available resources on the Fog node, the network condition and the application. The feasibility of the proposed context-aware distribution mechanism is demonstrated on two use-cases, namely a face detection application and a location-based mobile game. The benefits are increased utility of dynamic distribution in both use cases, when compared to a static distribution approach used in existing research.

Preprint, 2020, arXiv

Cross Architectural Power Modelling

Existing power modelling research focuses on the model rather than the process for developing models. An automated power modelling process that can be deployed on different processors for developing power models with high accuracy is developed. For this, (i) an automated hardware performance counter selection method that selects counters best correlated to power on both ARM and Intel processors, (ii) a noise filter based on clustering that can reduce the mean error in power models, and (iii) a two stage power model that surmounts challenges in using existing power models across multiple architectures are proposed and developed. The key results are: (i) the automated hardware performance counter selection method achieves comparable selection to the manual method reported in the literature, (ii) the noise filter reduces the mean error in power models by up to 55%, and (iii) the two stage power model can predict dynamic power with less than 8% error on both ARM and Intel processors, which is an improvement over classic models.
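Selecting the counters "best correlated to power" can be sketched as a ranking by correlation strength. The abstract does not say which correlation measure is used, so the Pearson coefficient below is an assumption, and the counter names and readings are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant series has no defined correlation; treat it as uninformative.
    return cov / (sx * sy) if sx and sy else 0.0


def select_counters(samples: Dict[str, List[float]],
                    power: List[float], k: int) -> List[str]:
    """Return the k counter names with the largest |correlation| to power."""
    ranked = sorted(samples,
                    key=lambda name: abs(pearson(samples[name], power)),
                    reverse=True)
    return ranked[:k]


# Hypothetical measurements: power in watts and three made-up counters.
power_watts = [10.0, 12.0, 15.0, 19.0, 24.0]
counters = {
    "cycles": [1.0, 1.4, 2.0, 2.8, 3.8],        # rises with power
    "cache_misses": [5.0, 3.0, 6.0, 2.0, 4.0],  # weakly related
    "constant": [7.0] * 5,                      # carries no signal
}
selected = select_counters(counters, power_watts, k=1)
```

Ranking by absolute value keeps counters that are strongly negatively correlated with power, which can be as informative as positively correlated ones.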

Preprint, 2020, arXiv

DYVERSE: DYnamic VERtical Scaling in Multi-tenant Edge Environments

Multi-tenancy in resource-constrained environments is a key challenge in Edge computing. In this paper, we develop 'DYVERSE: DYnamic VERtical Scaling in Edge' environments, which is the first light-weight and dynamic vertical scaling mechanism for managing resources allocated to applications for facilitating multi-tenancy in Edge environments. To enable dynamic vertical scaling, one static and three dynamic priority management approaches that are workload-aware, community-aware and system-aware, respectively, are proposed. This research advocates that dynamic vertical scaling and priority management approaches reduce Service Level Objective (SLO) violation rates. An online-game and a face detection workload in a Cloud-Edge test-bed are used to validate the research. The merit of DYVERSE is that there is only a sub-second overhead per Edge server when 32 Edge servers are deployed on a single Edge node. When compared to executing applications on the Edge servers without dynamic vertical scaling, static priorities and dynamic priorities reduce SLO violation rates of requests by up to 4% and 12% for the online game, respectively, and in both cases 6% for the face detection workload. Moreover, for both workloads, the system-aware dynamic vertical scaling method effectively reduces the latency of non-violated requests, when compared to other methods.

Preprint, 2020, arXiv

Modelling Fog Offloading Performance

Fog computing has emerged as a computing paradigm aimed at addressing the issues of latency, bandwidth and privacy when mobile devices are communicating with remote cloud services. The concept is to offload compute services closer to the data. However, many challenges exist in the realisation of this approach. During offloading, (part of) the application underpinned by the services may be unavailable, which the user will experience as down time. This paper describes work aimed at building models to allow prediction of such down time based on metrics (operational data) of the underlying and surrounding infrastructure. Such prediction would be invaluable in the context of automated Fog offloading and adaptive decision making in Fog orchestration. Models that cater for four container-based stateless and stateful offload techniques, namely Save and Load, Export and Import, Push and Pull and Live Migration, are built using four (linear and non-linear) regression techniques. Experimental results comprising over 42 million data points from multiple lab-based Fog infrastructure are presented. The results highlight that reasonably accurate predictions (measured by the coefficient of determination for regression models, mean absolute percentage error, and mean absolute error) may be obtained when considering 25 metrics relevant to the infrastructure.
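The three accuracy measures named above have standard definitions, sketched here in plain Python. The down-time values are invented for illustration and are not results from the paper.

```python
from typing import List


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error, in the unit of the measurements."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error; requires every actual value to be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical down times in seconds: measured versus model prediction.
measured = [2.0, 4.0, 5.0, 8.0]
predicted = [2.5, 3.5, 5.0, 7.0]

error_mae = mae(measured, predicted)        # 0.5 seconds
error_mape = mape(measured, predicted)      # 12.5 percent
fit_r2 = r_squared(measured, predicted)     # approximately 0.92
```

MAE reports the error in seconds, MAPE normalises it by the size of each measurement, and the coefficient of determination expresses how much of the variance in down time the model explains.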

Preprint, 2020, arXiv

Priority-based Fair Scheduling in Edge Computing

Scheduling is important in Edge computing. In contrast to the Cloud, Edge resources are hardware limited and cannot support workload-driven infrastructure scaling. Hence, resource allocation and scheduling for the Edge requires a fresh perspective. Existing Edge scheduling research assumes availability of all needed resources whenever a job request is made. This paper challenges that assumption, since not all job requests from a Cloud server can be scheduled on an Edge node. Thus, guaranteeing fairness among the clients (Cloud servers offloading jobs) while accounting for priorities of the jobs becomes a critical task. This paper presents four scheduling techniques: a naive first-come, first-served strategy and three proposed strategies, namely client fair, priority fair, and a hybrid that accounts for the fairness of both clients and job priorities. An evaluation on a target platform under three different scenarios, namely equal, random, and Gaussian job distributions, is presented. The experimental studies highlight the low overheads and the distribution of scheduled jobs on the Edge node when compared to the naive strategy. The results confirm the superior performance of the hybrid strategy and showcase the feasibility of fair schedulers for Edge computing.
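One way to picture a scheduler that balances client fairness against job priority is the sketch below. It is not the paper's algorithm: the ordering rule (least-served client first, then highest priority, then earliest arrival), the `Job` fields and the sample jobs are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    client: str    # Cloud server that offloads the job
    priority: int  # larger means more urgent (assumed convention)
    arrival: int   # arrival order, used only to break ties


def hybrid_schedule(jobs: List[Job], capacity: int) -> List[Job]:
    """Pick up to `capacity` jobs, balancing clients before priorities."""
    scheduled: List[Job] = []
    served: Counter = Counter()  # jobs already granted per client
    pending = list(jobs)
    while pending and len(scheduled) < capacity:
        # Least-served client first, then highest priority, then earliest arrival.
        best = min(pending, key=lambda j: (served[j.client], -j.priority, j.arrival))
        pending.remove(best)
        scheduled.append(best)
        served[best.client] += 1
    return scheduled


# Client A submits three jobs before client B submits one; the node fits three.
jobs = [Job("A", 3, 0), Job("A", 2, 1), Job("A", 1, 2), Job("B", 1, 3)]
fcfs = jobs[:3]                            # naive first-come, first-served
hybrid = hybrid_schedule(jobs, capacity=3)
```

With these inputs the naive order gives every slot to client A, while the sketch admits one job from client B, showing why fairness matters once an Edge node cannot accept every request.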

Preprint, 2020, arXiv

WattsApp: Power-Aware Container Scheduling

Containers are becoming a popular workload deployment mechanism in modern distributed systems. However, there are limited software-based methods (hardware-based methods are expensive, requiring hardware-level changes) for obtaining the power consumed by containers for facilitating power-aware container scheduling, an essential activity for efficient management of distributed systems. This paper presents WattsApp, a tool underpinned by a six-step software-based method for power-aware container scheduling to minimize power cap violations on a server. The proposed method relies on a neural network-based power estimation model and a power capped container scheduling technique. Experimental studies are pursued in a lab-based environment on 10 benchmarks deployed on Intel and ARM processors. The results highlight that the power estimation model has negligible overheads for data collection: nearly 90% of all data samples can be estimated with less than a 10% error, and the Mean Absolute Percentage Error (MAPE) is less than 6%. The power-aware scheduling of WattsApp is more effective than Intel's Running Average Power Limit (RAPL) based power capping for both single and multiple containers as it does not degrade the performance of all containers running on the server. The results confirm the feasibility of WattsApp.

Preprint, 2018, arXiv

Resource Management in Fog/Edge Computing: A Survey

Contrary to using distant and centralized cloud data center resources, employing decentralized resources at the edge of a network for processing data closer to user devices, such as smartphones and tablets, is an upcoming computing paradigm, referred to as fog/edge computing. Fog/edge resources are typically resource-constrained, heterogeneous, and dynamic compared to the cloud, thereby making resource management an important challenge that needs to be addressed. This article reviews publications as early as 1991, with 85% of the publications from 2013 to 2018, to identify and classify the architectures, infrastructure, and underlying algorithms for managing resources in fog/edge computing.