Researcher profile

Adam Barker

Adam Barker contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
18 works
0 followers
8 topics
4 close collaborators

Published work

18 published items

Preprint, 2020, arXiv

Benchmarking and Performance Modelling of MapReduce Communication Pattern

Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities in efforts to identify performance bottlenecks. The complexity of the low-level internals of big data frameworks and the ubiquity of application and workload configuration parameters makes it challenging and expensive to come up with comprehensive performance modelling solutions. In this paper, instead of focusing on a wide range of configurable parameters, we studied the low-level internals of the MapReduce communication pattern and used a minimal set of performance drivers to develop a set of phase level parametric models for approximating the execution time of a given application on a given cluster. Models can be used to infer the performance of unseen applications and approximate their performance when an arbitrary dataset is used as input. Our approach is validated by running empirical experiments in two setups. On average the error rate in both setups is plus or minus 10% from the measured values.

Preprint, 2016, arXiv

A Linked Data Scalability Challenge: Concept Reuse Leads to Semantic Decay

The increasing amount of available Linked Data resources is laying the foundations for more advanced Semantic Web applications. One of their main limitations, however, remains the general low level of data quality. In this paper we focus on a measure of quality which is negatively affected by the increase of the available resources. We propose a measure of semantic richness of Linked Data concepts and we demonstrate our hypothesis that the more a concept is reused, the less semantically rich it becomes. This is a significant scalability issue, as one of the core aspects of Linked Data is the propagation of semantic information on the Web by reusing common terms. We prove our hypothesis with respect to our measure of semantic richness and we validate our model empirically. Finally, we suggest possible future directions to address this scalability problem.

Preprint, 2016, arXiv

Integrating Know-How into the Linked Data Cloud

This paper presents the first framework for integrating procedural knowledge, or "know-how", into the Linked Data Cloud. Know-how available on the Web, such as step-by-step instructions, is largely unstructured and isolated from other sources of online knowledge. To overcome these limitations, we propose extending to procedural knowledge the benefits that Linked Data has already brought to representing, retrieving and reusing declarative knowledge. We describe a framework for representing generic know-how as Linked Data and for automatically acquiring this representation from existing resources on the Web. This system also allows the automatic generation of links between different know-how resources, and between those resources and other online knowledge bases, such as DBpedia. We discuss the results of applying this framework to a real-world scenario and we show how it outperforms existing manual community-driven integration efforts.

Preprint, 2015, arXiv

Autonomous Fault Detection in Self-Healing Systems using Restricted Boltzmann Machines

Autonomously detecting and recovering from faults is one approach for reducing the operational complexity and costs associated with managing computing environments. We present a novel methodology for autonomously generating investigation leads that help identify system faults, and extend our previous work in this area by leveraging Restricted Boltzmann Machines (RBMs) and contrastive divergence learning to analyse changes in historical feature data. This allows us to heuristically identify the root cause of a fault, and demonstrate an improvement to the state of the art by showing feature data can be predicted heuristically beyond a single instance to include entire sequences of information.

Preprint, 2015, arXiv

Executing Bag of Distributed Tasks on Virtually Unlimited Cloud Resources

A Bag-of-Distributed-Tasks (BoDT) application is a collection of identical and independent tasks, each of which requires a piece of input data located around the world. As a result, Cloud computing offers an effective way to execute BoDT applications, as it not only consists of multiple geographically distributed data centres but also allows a user to pay only for what she actually uses. This paper considers executing BoDT on the Cloud using virtually unlimited cloud resources. A heuristic algorithm is proposed to find an execution plan that takes budget constraints into account. Compared with other approaches, with the same given budget, our algorithm is able to reduce the overall execution time by up to 50%.

Preprint, 2014, arXiv

A Semantic Web of Know-How: Linked Data for Community-Centric Tasks

This paper proposes a novel framework for representing community know-how on the Semantic Web. Procedural knowledge generated by web communities typically takes the form of natural language instructions or videos and is largely unstructured. The absence of semantic structure impedes the deployment of many useful applications, in particular the ability to discover and integrate know-how automatically. We discuss the characteristics of community know-how and argue that existing knowledge representation frameworks fail to represent it adequately. We present a novel framework for representing the semantic structure of community know-how and demonstrate the feasibility of our approach by providing a concrete implementation which includes a method for automatically acquiring procedural knowledge for real-world tasks.

Preprint, 2014, arXiv

Academic Cloud Computing Research: Five Pitfalls and Five Opportunities

This discussion paper argues that there are five fundamental pitfalls, which can restrict academics from conducting cloud computing research at the infrastructure level, which is currently where the vast majority of academic research lies. Instead academics should be conducting higher risk research, in order to gain understanding and open up entirely new areas. We call for a renewed mindset and argue that academic research should focus less upon physical infrastructure and embrace the abstractions provided by clouds through five opportunities: user driven research, new programming models, PaaS environments, and improved tools to support elasticity and large-scale debugging. The objective of this paper is to foster discussion, and to define a roadmap forward, which will allow academia to make longer-term impacts to the cloud computing community.

Preprint, 2014, arXiv

Are Clouds Ready to Accelerate Ad hoc Financial Simulations?

Applications employed in the financial services industry to capture and estimate a variety of risk metrics are underpinned by stochastic simulations which are data, memory and computationally intensive. Many of these simulations are routinely performed on production-based computing systems. Ad hoc simulations in addition to routine simulations are required to obtain up-to-date views of risk metrics. Such simulations are currently not performed as they cannot be accommodated on production clusters, which are typically over committed resources. Scalable, on-demand and pay-as-you go Virtual Machines (VMs) offered by the cloud are a potential platform to satisfy the data, memory and computational constraints of the simulation. However, "Are clouds ready to accelerate ad hoc financial simulations?" The research reported in this paper aims to experimentally verify this question by developing and deploying an important financial simulation, referred to as 'Aggregate Risk Analysis' on the cloud. Parallel techniques to improve efficiency and performance of the simulations are explored. Challenges such as accommodating large input data on limited memory VMs and rapidly processing data for real-time use are surmounted. The key result of this investigation is that Aggregate Risk Analysis can be accommodated on cloud VMs. Acceleration of up to 24x using multiple hardware accelerators over the implementation on a single accelerator, 6x over a multiple core implementation and approximately 60x over a baseline implementation was achieved on the cloud. However, computational time is wasted for every dollar spent on the cloud due to poor acceleration over multiple virtual cores. Interestingly, private VMs can offer better performance than public VMs on comparable underlying hardware.

Preprint, 2014, arXiv

Executing Bag of Distributed Tasks on the Cloud: Investigating the Trade-offs Between Performance and Cost

Bag of Distributed Tasks (BoDT) can benefit from decentralised execution on the Cloud. However, there is a trade-off between the performance that can be achieved by employing a large number of Cloud VMs for the tasks and the monetary constraints that are often placed by a user. The research reported in this paper is motivated towards investigating this trade-off so that an optimal plan for deploying BoDT applications on the cloud can be generated. A heuristic algorithm, which considers the user's preference of performance and cost, is proposed and implemented. The feasibility of the algorithm is demonstrated by generating execution plans for a sample application. The key result is that the algorithm generates optimal execution plans for the application over 91% of the time.

Preprint, 2014, arXiv

Location, Location, Location: Data-Intensive Distributed Computing in the Cloud

When orchestrating highly distributed and data-intensive Web service workflows, the geographical placement of the orchestration engine can greatly affect the overall performance of a workflow. Orchestration engines are typically run from within an organisation's network, and may have to transfer data across long geographical distances, which in turn increases execution time and degrades the overall performance of a workflow. In this paper we present CloudForecast: a Web service framework and analysis tool which, given a workflow specification, computes the optimal Amazon EC2 Cloud region to automatically deploy the orchestration engine and execute the workflow. We use geographical distance of the workflow, network latency and HTTP round-trip time between Amazon Cloud regions and the workflow nodes to find a ranking of Cloud regions. This combined set of simple metrics effectively predicts where the workflow orchestration engine should be deployed in order to reduce overall execution time. We evaluate our approach by executing randomly generated data-intensive workflows deployed on the PlanetLab platform in order to rank Amazon EC2 Cloud regions. Our experimental results show that our proposed optimisation strategy, depending on the particular workflow, can speed up execution time on average by 82.25% compared to local execution. We also show that the standard deviation of execution time is reduced by an average of almost 65% using the optimisation strategy.

Preprint, 2014, arXiv

Optimal Deployment of Geographically Distributed Workflow Engines on the Cloud

When orchestrating Web service workflows, the geographical placement of the orchestration engine(s) can greatly affect workflow performance. Data may have to be transferred across long geographical distances, which in turn increases execution time and degrades the overall performance of a workflow. In this paper, we present a framework that, given a DAG-based workflow specification, computes the optimal Amazon EC2 cloud regions to deploy the orchestration engines and execute a workflow. The framework incorporates a constraint model that solves the workflow deployment problem, which is generated using an automated constraint modelling system. The feasibility of the framework is evaluated by executing different sample workflows representative of scientific workloads. The experimental results indicate that the framework reduces the workflow execution time and provides a speed up of 1.3x-2.5x over centralised approaches.

Preprint, 2014, arXiv

Uncovering the Perfect Place: Optimising Workflow Engine Deployment in the Cloud

When orchestrating highly distributed and data-intensive Web service workflows, the geographical placement of the orchestration engine can greatly affect the overall performance of a workflow. We present CloudForecast: a Web service framework and analysis tool which, given a workflow specification, computes the optimal Amazon EC2 Cloud region to automatically deploy the orchestration engine and execute the workflow. We use geographical distance of the workflow, network latency and HTTP round-trip time between Amazon Cloud regions and the workflow nodes to find a ranking of Cloud regions. This overall ranking predicts where the workflow orchestration engine should be deployed in order to reduce overall execution time. Our experimental results show that our proposed optimisation strategy, depending on the particular workflow, can speed up execution time on average by 82.25% compared to local execution.

Preprint, 2013, arXiv

A Cloud Computing Survey: Developments and Future Trends in Infrastructure as a Service Computing

Cloud computing is a recent paradigm based around the notion of delivery of resources via a service model over the Internet. Despite being a new paradigm of computation, cloud computing owes its origins to a number of previous paradigms. The term cloud computing is well defined and no longer merits rigorous taxonomies to furnish a definition. Instead this survey paper considers the past, present and future of cloud computing. As an evolution of previous paradigms, we consider the predecessors to cloud computing and what significance they still hold to cloud services. Additionally we examine the technologies which comprise cloud computing and how the challenges and future developments of these technologies will influence the field. Finally we examine the challenges that limit the growth, application and development of cloud computing and suggest directions required to overcome these challenges in order to further the success of cloud computing.

Preprint, 2013, arXiv

A Dataflow Language for Decentralised Orchestration of Web Service Workflows

Orchestrating centralised service-oriented workflows presents significant scalability challenges that include: the consumption of network bandwidth, degradation of performance, and single points of failure. This paper presents a high-level dataflow specification language that attempts to address these scalability challenges. This language provides simple abstractions for orchestrating large-scale web service workflows, and separates between the workflow logic and its execution. It is based on a data-driven model that permits parallelism to improve the workflow performance. We provide a decentralised architecture that allows the computation logic to be moved "closer" to services involved in the workflow. This is achieved through partitioning the workflow specification into smaller fragments that may be sent to remote orchestration services for execution. The orchestration services rely on proxies that exploit connectivity to services in the workflow. These proxies perform service invocations and compositions on behalf of the orchestration services, and carry out data collection, retrieval, and mediation tasks. The evaluation of our architecture implementation concludes that our decentralised approach reduces the execution time of workflows, and scales accordingly with the increasing size of data sets.

Preprint, 2013, arXiv

An Architecture for Decentralised Orchestration of Web Service Workflows

Service-oriented workflows are typically executed using a centralised orchestration approach that presents significant scalability challenges. These challenges include the consumption of network bandwidth, degradation of performance, and single-points of failure. We provide a decentralised orchestration architecture that attempts to address these challenges. Our architecture adopts a design model that permits the computation to be moved "closer" to services in a workflow. This is achieved by partitioning workflows specified using our simple dataflow language into smaller fragments, which may be sent to remote locations for execution.

Preprint, 2013, arXiv

Monitoring Large-Scale Cloud Systems with Layered Gossip Protocols

Monitoring is an essential aspect of maintaining and developing computer systems that increases in difficulty in proportion to the size of the system. The need for robust monitoring tools has become more evident with the advent of cloud computing. Infrastructure as a Service (IaaS) clouds allow end users to deploy vast numbers of virtual machines as part of dynamic and transient architectures. Current monitoring solutions, including many of those in the open-source domain, rely on outdated concepts including manual deployment and configuration and centralised data collection, and adapt poorly to membership churn. In this paper we propose the development of a cloud monitoring suite to provide scalable and robust lookup, data collection and analysis services for large-scale cloud systems. In lieu of centrally managed monitoring we propose a multi-tier architecture using a layered gossip protocol to aggregate monitoring information and facilitate lookup, information collection and the identification of redundant capacity. This allows for a resource aware data collection and storage architecture that operates over the system being monitored. This in turn enables monitoring to be done in-situ without the need for significant additional infrastructure to facilitate monitoring services. We evaluate this approach against alternative monitoring paradigms and demonstrate how our solution is well adapted to usage in a cloud-computing context.

Preprint, 2013, arXiv

RBioCloud: A Light-weight Framework for Bioconductor and R-based Jobs on the Cloud

Large-scale ad hoc analytics of genomic data is popular using the R programming language, supported by 671 software packages provided by Bioconductor. More recently, analytical jobs are benefitting from on-demand computing and storage, their scalability and their low maintenance cost, all of which are offered by the cloud. While Biologists and Bioinformaticists can take an analytical job and execute it on their personal workstations, it remains challenging to seamlessly execute the job on the cloud infrastructure without extensive knowledge of the cloud dashboard. This paper explores how analytical jobs can be executed on the cloud with minimum effort, and how both the resources and data required by the job can be managed. An open-source light-weight framework for executing R-scripts using Bioconductor packages, referred to as 'RBioCloud', is designed and developed. RBioCloud offers a set of simple command-line tools for managing the cloud resources, the data and the execution of the job. Three biological test cases validate the feasibility of RBioCloud. The framework is publicly available from http://www.rbiocloud.com.

Preprint, 2013, arXiv

Undefined By Data: A Survey of Big Data Definitions

The term big data has become ubiquitous. Owing to a shared origin between academia, industry and the media there is no single unified definition, and various stakeholders provide diverse and often contradictory definitions. The lack of a consistent definition introduces ambiguity and hampers discourse relating to big data. This short paper attempts to collate the various definitions which have gained some degree of traction and to furnish a clear and concise definition of an otherwise ambiguous term.