Source author record

Richard McClatchey

Richard McClatchey appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Databases; Software Engineering; Distributed, Parallel, and Cluster Computing; cs.CY; physics.comp-ph; Computational Engineering, Finance, and Science; physics.data-an; physics.ins-det

Catalog footprint

What is connected

26 works

8 topics

4 close collaborators


preprint, 2015, arXiv

Analysis Traceability and Provenance for HEP

This paper presents the use of the CRISTAL software in the N4U project. CRISTAL was used to create a set of provenance-aware analysis tools for the neuroscience domain. This paper argues that the approach taken in N4U to build the analysis suite is sufficiently generic to be applied to the HEP domain. A mapping to the PROV model for provenance interoperability is also presented, together with how it can be applied to the HEP domain for the interoperability of HEP analyses.

preprint, 2015, arXiv

Designing Traceability into Big Data Systems

Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach which enriches models with meta-data and description and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage.

preprint, 2015, arXiv

Development of a Large-scale Neuroimages and Clinical Variables Data Atlas in the neuGRID4You (N4U) project

Exceptional growth in the availability of large-scale clinical imaging datasets has led to the development of computational infrastructures offering scientists access to image repositories and associated clinical variables data. The EU FP7 neuGRID project and its follow-on, neuGRID4You (N4U), form a leading e-Infrastructure where neuroscientists can find core services and resources for brain image analysis. The core component of this e-Infrastructure is the N4U Virtual Laboratory, which offers neuroscientists easy access to a wide range of datasets, algorithms, pipelines, computational resources, services, and associated support services. The foundation of this virtual laboratory is a massive data store plus information services, called the Data Atlas, that stores datasets, clinical study data, data dictionaries and algorithm/pipeline definitions, and provides interfaces for parameterised querying so that neuroscientists can perform analyses on the required datasets. This paper presents the overall design and development of the Data Atlas, its associated datasets and indexing, and a set of retrieval services that originated from the development of the N4U Virtual Laboratory in the EU FP7 N4U project in the light of user requirements.

preprint, 2015, arXiv

Position Paper: Provenance Data Visualisation for Neuroimaging Analysis

Visualisation facilitates the understanding of scientific data through both exploration and explanation of visualised data. Provenance contributes to the understanding of data by recording the contributing factors behind a result. With the significant increase in data volumes and algorithm complexity, clinical researchers are struggling with information tracking, analysis reproducibility and the verification of scientific output. Data coming from heterogeneous sources (multiple sources with varying levels of trust) in a collaborative environment adds to the uncertainty of the scientific output. Systems are required that offer provenance data capture and visualisation support for analyses. We present an account of the need to visualise provenance information in order to aid the verification of scientific outputs, the comparison of analyses, and the tracking of the progression and evolution of results in neuroimaging analysis.

preprint, 2015, arXiv

Scientific Workflow Repeatability through Cloud-Aware Provenance

The transformations, analyses and interpretations of data in scientific workflows are vital for the repeatability and reliability of scientific workflows. This provenance of scientific workflows has been effectively captured in Grid-based scientific workflow systems. However, the recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches, or to propose new approaches, to collect provenance information from the Cloud and to utilize it for workflow repeatability in the Cloud infrastructure. The dynamic nature of the Cloud makes this difficult because, unlike in the Grid, resources are provisioned on demand. This paper presents a novel approach that can assist in mitigating this challenge. This approach can collect Cloud infrastructure information along with workflow provenance and can establish a mapping between them. This mapping is later used to re-provision resources on the Cloud. Repeatability of the workflow execution is achieved by: (a) capturing the Cloud infrastructure information (virtual machine configuration) along with the workflow provenance, and (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them. The evaluation of an initial prototype suggests that the proposed approach is feasible and can be investigated further.
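The mapping between workflow provenance and Cloud infrastructure information described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: the `VMConfig` and `JobProvenance` record types, their fields, and the host and flavor names are all hypothetical, chosen only to show how job records could be linked to virtual machine configurations and turned into a re-provisioning plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMConfig:
    """Hypothetical virtual machine configuration captured from the Cloud."""
    flavor: str
    vcpus: int
    ram_mb: int
    image: str


@dataclass(frozen=True)
class JobProvenance:
    """Hypothetical workflow provenance record for one job."""
    job_id: str
    host: str  # name of the machine the job ran on


def map_jobs_to_vms(jobs, vms_by_host):
    """Link each job's provenance record to the VM configuration it ran on."""
    return {job.job_id: vms_by_host[job.host]
            for job in jobs if job.host in vms_by_host}


def reprovision_plan(mapping):
    """Return the distinct VM configurations needed to re-run the workflow."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(mapping.values()))


# Illustrative data only; none of these values come from the paper.
jobs = [JobProvenance("j1", "vm-a"),
        JobProvenance("j2", "vm-b"),
        JobProvenance("j3", "vm-a")]
vms = {"vm-a": VMConfig("m1.small", 1, 2048, "img-1"),
       "vm-b": VMConfig("m1.large", 4, 8192, "img-1")}

mapping = map_jobs_to_vms(jobs, vms)
plan = reprovision_plan(mapping)
```

In this sketch the plan lists each required configuration once, so a re-execution would request one small and one large machine before replaying the three jobs on them.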

preprint, 2015, arXiv

Traceability and Provenance in Big Data Medical Systems

Providing an appropriate level of accessibility to and tracking of data or process elements in large volumes of medical data is an essential requirement in the Big Data era. Researchers require systems that provide traceability of information through provenance data capture and management to support their clinical analyses. We present an approach that has been adopted in the neuGRID and N4U projects, which aimed to provide detailed traceability to support research analysis processes in the study of biomarkers for Alzheimer's disease, but which is generically applicable across medical systems. To facilitate the orchestration of complex, large-scale analyses in these projects we have adapted CRISTAL, a workflow and provenance tracking solution. The use of CRISTAL has provided a rich environment for neuroscientists to track and manage the evolution of data and workflow usage over time in neuGRID and N4U.

preprint, 2015, arXiv

Using Cloud-Aware Provenance to Reproduce Scientific Workflow Execution on Cloud

Provenance has been thought of as a mechanism to verify a workflow and to provide workflow reproducibility. This provenance of scientific workflows has been effectively captured in Grid-based scientific workflow systems. However, the recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches, or to propose new approaches, to collect provenance information from the Cloud and to utilize it for workflow repeatability in the Cloud infrastructure. This paper presents a novel approach that can assist in mitigating this challenge. This approach can collect Cloud infrastructure information from an outside Cloud client along with workflow provenance and can establish a mapping between them. This mapping is later used to re-provision resources on the Cloud for workflow execution. Reproducibility of the workflow execution is achieved by: (a) capturing the Cloud infrastructure information (virtual machine configuration) along with the workflow provenance, (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them, and (c) comparing the outputs of the workflows. The evaluation of the prototype suggests that the proposed approach is feasible and can be investigated further. Moreover, no reference reproducibility model exists in the literature that can provide guidelines to achieve this goal in the Cloud. This paper also attempts to present a model that is used in the proposed design to achieve workflow reproducibility in the Cloud environment.

preprint, 2014, arXiv

A Description Driven Approach for Flexible Metadata Tracking

Evolving user requirements present a considerable software engineering challenge, all the more so in an environment where data will be stored for a very long time, and must remain usable as the system specification evolves around it. Capturing the description of the system addresses this issue since a description-driven approach enables new versions of data structures and processes to be created alongside the old, thereby providing a history of changes to the underlying data models and enabling the capture of provenance data. This description-driven approach is advocated in this paper, in which a system called CRISTAL is presented. CRISTAL is based on description-driven principles; it can use previous versions of stored descriptions to define various versions of data which can be stored in various forms. To demonstrate the efficacy of this approach, the history of the project at CERN is presented: how CRISTAL was used to track data and process definitions and their associated provenance data in the construction of the CMS ECAL detector, how it was applied to handle analysis tracking and data index provenance in the neuGRID and N4U projects, and how it will be matured further in the CRISTAL-ISE project. We believe that the CRISTAL approach could be invaluable in handling the evolution, indexing and tracking of large datasets, and are keen to apply it further in this direction.
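The idea that new versions of a description are created alongside the old, so that existing data stays usable, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CRISTAL's actual data model or API: `DescriptionStore`, `Item`, `create_item` and the "Crystal" field names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptionStore:
    """Keeps every version of every description; nothing is overwritten."""
    versions: dict = field(default_factory=dict)  # name -> list of field tuples

    def add_version(self, name, fields):
        """Store a new description version and return its 1-based number."""
        self.versions.setdefault(name, []).append(tuple(fields))
        return len(self.versions[name])

    def fields(self, name, version):
        """Look up the field names of one stored description version."""
        return self.versions[name][version - 1]


@dataclass
class Item:
    """An instance that records which description version defines it."""
    description: str
    version: int
    data: dict


def create_item(store, name, version, data):
    """Create an Item, checking its data against the chosen description version."""
    expected = set(store.fields(name, version))
    if set(data) != expected:
        raise ValueError(f"data does not match {name} v{version}")
    return Item(name, version, dict(data))


# Illustrative data only; the field names are invented for this sketch.
store = DescriptionStore()
v1 = store.add_version("Crystal", ["serial", "length_mm"])
old_item = create_item(store, "Crystal", v1, {"serial": "A1", "length_mm": 230})

# The description evolves; version 1 remains available for old_item.
v2 = store.add_version("Crystal", ["serial", "length_mm", "light_yield"])
new_item = create_item(store, "Crystal", v2,
                       {"serial": "B7", "length_mm": 230, "light_yield": 9.5})
```

Because each item carries its description version, the list of stored versions doubles as a history of how the data model changed, which is the property the abstract ties to provenance capture.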

preprint, 2014, arXiv

An Integrated e-science Analysis Base for Computation Neuroscience Experiments and Analysis

Recent developments in data management and imaging technologies have significantly affected diagnostic and extrapolative research in the understanding of neurodegenerative diseases. However, the impact of these new technologies is largely dependent on the speed and reliability with which the medical data can be visualised, analysed and interpreted. The EU's neuGRID for Users (N4U) is a follow-on project to neuGRID, which aims to provide an integrated environment to carry out computational neuroscience experiments. This paper reports on the design and development of the N4U Analysis Base and related Information Services, which address existing research and practical challenges by offering an integrated medical data analysis environment with the necessary building blocks for neuroscientists to optimally exploit neuroscience workflows, large image datasets and algorithms in order to conduct analyses. The N4U Analysis Base enables such analyses by indexing and interlinking the neuroimaging and clinical study datasets stored on the N4U Grid infrastructure, algorithms and scientific workflow definitions along with their associated provenance information.

preprint, 2014, arXiv

CRISTAL : A Practical Study in Designing Systems to Cope with Change

Software engineers frequently face the challenge of developing systems whose requirements are likely to change in order to adapt to organizational reconfigurations or other external pressures. Evolving requirements present difficulties, especially in environments in which business agility demands shorter development times and responsive prototyping. This paper uses a study from CERN in Geneva to address these research questions by employing a description-driven approach that is responsive to changes in user requirements and that facilitates dynamic system reconfiguration. The study describes how handling descriptions of objects in practice alongside their instances (making the objects self-describing) can mediate the effects of evolving user requirements on system development. This paper reports on and draws lessons from the practical use of a description-driven system over time. It also identifies lessons that can be learned from adopting such a self-describing description-driven approach in future software development.

preprint, 2014, arXiv

CRISTAL-ISE : Provenance Applied in Industry

This paper presents the CRISTAL-iSE project as a framework for the management of provenance information in industry. The project itself is a research collaboration between academia and industry. A key factor in the project is the use of a system known as CRISTAL, which is a mature system based on proven description-driven principles. A crucial element in the description-driven approach is the fact that objects (Items) are described at runtime, enabling managed systems to be both dynamic and flexible. Another factor is the notion that all Items in CRISTAL are stored and versioned, therefore enabling a provenance collection system. In this paper a concrete application, called Agilium, is briefly described and a future application, CIMAG-RA, is presented which will harness the power of both CRISTAL and Agilium.

preprint, 2014, arXiv

Data Management Challenges in Paediatric Information Systems

There is a compelling demand for the data integration and exploitation of heterogeneous biomedical information for improved clinical practice, medical research, and personalised healthcare across the EU. The area of paediatric information integration is particularly challenging since the patient's physiology changes with growth and different aspects of health are regularly monitored over extended periods of time. Paediatricians require access to heterogeneous data sets, often collected in different locations with different apparatus and over extended timescales. Using a Grid platform originally developed for physics at CERN and a novel integrated semantic data model, the Health-e-Child project has developed an integrated healthcare platform for European paediatrics, providing seamless integration of traditional and emerging sources of biomedical data. The long-term goal of the project was to provide uninhibited access to universal biomedical knowledge repositories for personalised and preventive healthcare, large-scale information-based biomedical research and training, and informed policy making. The project built a Grid-enabled European network of leading clinical centres that can share and annotate paediatric data, can validate systems clinically, and diffuse clinical excellence across Europe by setting up new technologies, clinical workflows, and standards. The Health-e-Child project highlights data management challenges for the future of European paediatric healthcare and is the subject of this chapter.

preprint, 2014, arXiv

Designing Reusable Systems that Can Handle Change - Description-Driven Systems : Revisiting Object-Oriented Principles

In the age of the Cloud and so-called Big Data, systems must be increasingly flexible, reconfigurable and adaptable to change in addition to being developed rapidly. As a consequence, designing systems to cater for evolution is becoming critical to their success. To be able to cope with change, systems must have the capability of reuse and the ability to adapt as and when necessary to changes in requirements. Allowing systems to be self-describing is one way to facilitate this. To address the issues of reuse in designing evolvable systems, this paper proposes a so-called description-driven approach to systems design. This approach enables new versions of data structures and processes to be created alongside the old, thereby providing a history of changes to the underlying data models and enabling the capture of provenance data. The efficacy of the description-driven approach is exemplified by the CRISTAL project. CRISTAL is based on description-driven design principles; it uses versions of stored descriptions to define various versions of data which can be stored in diverse forms. This paper discusses the need for capturing holistic system description when modelling large-scale distributed systems.

preprint, 2014, arXiv

Model Driven Engineering for Science Gateways

From n-Tier client/server applications, to more complex academic Grids, or even the most recent and promising industrial Clouds, the last decade has witnessed significant developments in distributed computing. In spite of this conceptual heterogeneity, Service-Oriented Architectures (SOA) seem to have emerged as the common underlying abstraction paradigm. Suitable access to data and applications resident in SOAs via so-called Science Gateways has thus become a pressing need in various fields of science, in order to realize the benefits of Grid and Cloud infrastructures. In this context, the authors have consolidated work from three complementary experiences in European projects, which have developed and deployed large-scale production-quality infrastructures as Science Gateways to support research in breast cancer, paediatric diseases and neurodegenerative pathologies respectively. In analysing the requirements from these biomedical applications the authors were able to elaborate on commonly faced Grid development issues, while proposing an adaptable and extensible engineering framework for Science Gateways. This paper thus proposes the application of an architecture-centric Model-Driven Engineering (MDE) approach to service-oriented developments, making it possible to define Science Gateways that satisfy quality of service requirements, execution platform and distribution criteria at design time. A novel investigation is presented on the applicability of the resulting grid MDE (gMDE) to specific examples, and conclusions are drawn on the benefits of this approach and its possible application to other areas, in particular that of Distributed Computing Infrastructures (DCI) interoperability.

preprint, 2014, arXiv

Research Traceability using Provenance Services for Biomedical Analysis

We outline the approach being developed in the neuGRID project to use provenance management techniques for the purposes of capturing and preserving the provenance data that emerges in the specification and execution of workflows in biomedical analyses. In the neuGRID project a provenance service has been designed and implemented that is intended to capture, store, retrieve and reconstruct the workflow information needed to facilitate users in conducting user analyses. We describe the architecture of the neuGRID provenance service and discuss how the CRISTAL system from CERN is being adapted to address the requirements of the project and then consider how a generalised approach for provenance management could emerge for more generic application to the (Health)Grid community.

preprint, 2014, arXiv

The Case for Cloud Service Trustmarks and Assurance-as-a-Service

Cloud computing represents a significant economic opportunity for Europe. However, this growth is threatened by adoption barriers largely related to trust. This position paper examines trust and confidence issues in cloud computing and advances a case for addressing them through the implementation of a novel trustmark scheme for cloud service providers. The proposed trustmark would be both active and dynamic featuring multi-modal information about the performance of the underlying cloud service. The trustmarks would be informed by live performance data from the cloud service provider, or ideally an independent third-party accountability and assurance service that would communicate up-to-date information relating to service performance and dependability. By combining assurance measures with a remediation scheme, cloud service providers could both signal dependability to customers and the wider marketplace and provide customers, auditors and regulators with a mechanism for determining accountability in the event of failure or non-compliance. As a result, the trustmarks would convey to consumers of cloud services and other stakeholders that strong assurance and accountability measures are in place for the service in question and thereby address trust and confidence issues in cloud computing.

preprint, 2014, arXiv

Towards Provenance and Traceability in CRISTAL for HEP

This paper discusses the CRISTAL object lifecycle management system and its use in provenance data management and the traceability of system events. This software was initially used to capture the construction and calibration of the CMS ECAL detector at CERN for later use by physicists in their data analysis. Some further uses of CRISTAL in different projects (CMS, neuGRID and N4U) are presented as examples of its flexible data model. From these examples, applications are drawn for the High Energy Physics domain and some initial ideas for its use in HEP data preservation are outlined in detail in this paper. Currently, investigations are underway to gauge the feasibility of using the N4U Analysis Service or a derivative of it to address the requirements of data and analysis logging and provenance capture within the HEP long-term data analysis environment.

preprint, 2012, arXiv

A Fault Tolerant, Dynamic and Low Latency BDII Architecture for Grids

The current BDII model relies on information gathering from agents that run on each core node of a Grid. This information is then published into a Grid-wide information resource known as the Top BDII. The top-level BDIIs are typically updated in cycles of a few minutes each. A new BDII architecture is proposed and described in this paper based on the hypothesis that only a few attribute values change in each BDII information cycle and consequently it may not be necessary to update each parameter in a cycle. It has been demonstrated that significant performance gains can be achieved by exchanging only the information about records that changed during a cycle. Our investigations have led us to implement a low-latency and fault-tolerant BDII system that involves only minimal data transfer and facilitates secure transactions in a Grid environment.
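The hypothesis that only a few attribute values change per cycle suggests a delta exchange, which the following Python sketch illustrates. It is a generic snapshot-diff technique, not the paper's implementation or the BDII wire format; the function names and the attribute keys are hypothetical.

```python
_MISSING = object()  # sentinel for "attribute was not present before"


def compute_delta(previous, current):
    """Return only the attributes that changed or disappeared since the last cycle."""
    changed = {key: value for key, value in current.items()
               if previous.get(key, _MISSING) != value}
    removed = sorted(key for key in previous if key not in current)
    return {"changed": changed, "removed": removed}


def apply_delta(snapshot, delta):
    """Bring a top-level copy up to date without re-sending every attribute."""
    updated = dict(snapshot)
    updated.update(delta["changed"])
    for key in delta["removed"]:
        updated.pop(key, None)
    return updated


# Illustrative site snapshots from two consecutive cycles (invented values).
previous_cycle = {"ce01.FreeCPUs": 12, "ce01.RunningJobs": 40, "se01.FreeSpaceGB": 900}
current_cycle = {"ce01.FreeCPUs": 9, "ce01.RunningJobs": 40, "se01.FreeSpaceGB": 900}

delta = compute_delta(previous_cycle, current_cycle)
top_level_copy = apply_delta(previous_cycle, delta)
```

Here one of three attributes changed, so the delta carries a single key-value pair, and applying it to the previous snapshot reproduces the current one.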

preprint, 2012, arXiv

An Architecture for Integrated Intelligence in Urban Management using Cloud Computing

With the emergence of new methodologies and technologies it has now become possible to manage large amounts of environmental sensing data and apply new integrated computing models to acquire information intelligence. This paper advocates the application of cloud capacity to support the information, communication and decision making needs of a wide variety of stakeholders in the complex business of the management of urban and regional development. The complexity lies in the interactions and impacts embodied in the concept of the urban-ecosystem at various governance levels. This highlights the need for more effective integrated environmental management systems. This paper offers a user-orientated approach based on requirements for an effective management of the urban-ecosystem and the potential contributions that can be supported by the cloud computing community. Furthermore, the commonality of the influence of the drivers of change at the urban level offers the opportunity for the cloud computing community to develop generic solutions that can serve the needs of hundreds of cities from Europe and indeed globally.

preprint, 2012, arXiv

CMS Workflow Execution using Intelligent Job Scheduling and Data Access Strategies

Complex scientific workflows can process large amounts of data using thousands of tasks. The turnaround times of these workflows are often affected by various latencies such as the resource discovery, scheduling and data access latencies for the individual workflow processes or actors. Minimizing these latencies will improve the overall execution time of a workflow and thus lead to a more efficient and robust processing environment. In this paper, we propose a pilot job based infrastructure that has intelligent data reuse and job execution strategies to minimize the scheduling, queuing, execution and data access latencies. The results have shown that significant improvements in the overall turnaround time of a workflow can be achieved with this approach. The proposed approach has been evaluated, first using the CMS Tier0 data processing workflow, and then simulating the workflows to evaluate its effectiveness in a controlled environment.

preprint, 2012, arXiv

Context-Aware Service Utilisation in the Clouds and Energy Conservation

Ubiquitous computing environments are characterised by smart, interconnected artefacts embedded in our physical world that are projected to provide useful services to human inhabitants unobtrusively. Mobile devices are becoming the primary tools of human interaction with these embedded artefacts and utilisation of services available in smart computing environments such as clouds. Advancements in capabilities of mobile devices allow a number of user and environment related context consumers to be hosted on these devices. Without a coordinating component, these context consumers and providers are a potential burden on device resources; specifically the effect of uncoordinated computation and communication with cloud-enabled services can negatively impact the battery life. Therefore energy conservation is a major concern in realising the collaboration and utilisation of mobile device based context-aware applications and cloud based services. This paper presents the concept of a context-brokering component to aid in coordination and communication of context information between mobile devices and services deployed in a cloud infrastructure. A prototype context broker is experimentally analysed for effects on energy conservation when accessing and coordinating with cloud services on a smart device, with results signifying reduction in energy consumption.

preprint, 2012, arXiv

Research Traceability using Provenance Services for Biomedical Analysis

preprint, 2012, arXiv

Reusable Services from the neuGRID Project for Grid-Based Health Applications

By abstracting Grid middleware specific considerations from clinical research applications, re-usable services should be developed that will provide generic functionality aimed specifically at medical applications. In the scope of the neuGRID project, generic services are being designed and developed which will be applied to satisfy the requirements of neuroscientists. These services will bring together sources of data and computing elements into a single view as far as applications are concerned, making it possible to cope with centralised, distributed or hybrid data and provide native support for common medical file formats. Services will include querying, provenance, portal, anonymization and pipeline services together with a 'glueing' service for connection to Grid services. Thus lower-level services will hide the peculiarities of any specific Grid technology from upper layers, provide application independence and will enable the selection of 'fit-for-purpose' infrastructures. This paper outlines the design strategy being followed in neuGRID using the glueing and pipeline services as examples.

preprint, 2012, arXiv

Risk-Driven Compliant Access Controls for Clouds

There is widespread agreement that cloud computing has proven cost-cutting and agility benefits. However, security and regulatory compliance issues continue to challenge the wide acceptance of such technology by both social and commercial stakeholders. An important factor behind this is the fact that clouds, and in particular public clouds, are usually deployed and used within broad geographical or even international domains. This implies that the exchange of private and other protected data within the cloud environment would be governed by multiple jurisdictions. These jurisdictions have a great degree of harmonisation; however, they present possible conflicts that are hard to negotiate at run time. So far, important efforts have been made to deal with regulatory compliance management for large distributed systems. However, measurable solutions are required for the context of the cloud. In this position paper, we suggest an approach that starts with a conceptual model of explicit regulatory requirements for exchanging private data in a multijurisdictional environment and builds on it in order to define metrics for non-compliance or, in other terms, risks to compliance. These metrics will be integrated within usual data access-control policies and will be checked at policy analysis time before a decision to allow or deny the data access is made.

preprint, 2006, arXiv

From Grid Middleware to a Grid Operating System

Grid computing has made substantial advances during the last decade. Grid middleware such as Globus has contributed greatly in making this possible. There are, however, significant barriers to the adoption of Grid computing in other fields, most notably day-to-day user computing environments. We will demonstrate in this paper that this is primarily due to the limitations of the existing Grid middleware which does not take into account the needs of everyday scientific and business users. In this paper we will formally advocate a Grid Operating System and propose an architecture to migrate Grid computing into a Grid operating system which we believe would help remove most of the technical barriers to the adoption of Grid computing and make it relevant to the day-to-day user. We believe this proposed transition to a Grid operating system will drive more pervasive Grid computing research and application development and deployment in future.

preprint, 2001, arXiv

Querying Large Physics Data Sets Over an Information Grid

Optimising use of the Web (WWW) for LHC data analysis is a complex problem and illustrates the challenges arising from the integration of and computation across massive amounts of information distributed worldwide. Finding the right piece of information can, at times, be extremely time-consuming, if not impossible. So-called Grids have been proposed to facilitate LHC computing and many groups have embarked on studies of data replication, data migration and networking philosophies. Other aspects such as the role of 'middleware' for Grids are emerging as requiring research. This paper positions the need for appropriate middleware that enables users to resolve physics queries across massive data sets. It identifies the role of meta-data for query resolution and the importance of Information Grids for high-energy physics analysis rather than just Computational or Data Grids. This paper identifies software that is being implemented at CERN to enable the querying of very large collaborating HEP data-sets, initially being employed for the construction of CMS detectors.
