Researcher profile

Ewa Deelman

Ewa Deelman contributes to research discovery and scholarly infrastructure.

Researcher; Affiliation not imported; Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
13 works
0 followers
9 topics
4 close collaborators

Published work

13 published items

preprint, 2026, arXiv

(POSTER) From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

Scientists increasingly rely on sensor-based data; however, transforming raw streams into insights across the edge-to-cloud continuum remains difficult due to the breadth of expertise required to coordinate the necessary data and computation flow. This paper introduces a pattern-based, AI-assisted methodology for rapid development of sensor-driven applications. Using Pegasus workflows executing on the FABRIC testbed, we demonstrate a 5-step development loop that shifts workflow construction and deployment from code-first to intent-first design. Starting from an existing Orcasound hydrophone workflow as a reusable template, we generate and refine workflows for air quality, earthquake, and soil moisture monitoring applications. We further show how these workflows extend to edge resources, including BlueField-3 DPUs and Raspberry Pis, through configuration and placement rather than workflow redesign. Our evaluation, from the perspective of a novice Pegasus user, shows that AI-assisted pattern reuse compresses multi-stage workflow development to 1-1.5 days per workflow while preserving the rigor and portability of workflow-based execution.

preprint, 2026, arXiv

From Sensors to Insight: Rapid, Edge-to-Core Application Development for Sensor-Driven Applications

Scientists increasingly rely on sensor-based data, yet transforming raw streams into insights across the edge-to-cloud continuum remains difficult. Provisioning heterogeneous infrastructure and managing execution on emerging platforms like Data Processing Units typically requires cross-domain expertise, creating significant barriers to rapid prototyping. This paper introduces an experience-driven methodology for the rapid development of sensor-driven applications. Combining pattern-based workflow engineering with AI-assisted development, implemented via Pegasus on the FABRIC testbed, we use an existing Orcasound hydrophone workflow as a reusable template. We introduce a pattern-based engineering methodology to generate and refine workflows for air quality, earthquake, and soil moisture monitoring. Furthermore, we show how these abstract structures are extended to edge resources through modular configuration and placement. Our evaluation focuses on user productivity and practical lessons rather than peak performance. Through these case studies, we illustrate how AI-assisted, pattern-based development lowers the entry barrier for non-experts and enables iterative exploration of sensor-driven applications across distributed infrastructures.

preprint, 2022, arXiv

Co-scheduling Ensembles of In Situ Workflows

Molecular dynamics (MD) simulations are widely used to study large-scale molecular systems. HPC systems are ideal platforms to run these studies; however, reaching the necessary simulation timescale to detect rare processes is challenging, even with modern supercomputers. To overcome the timescale limitation, the simulation of a long MD trajectory is replaced by multiple short-range simulations that are executed simultaneously in an ensemble of simulations. Analyses are usually co-scheduled with these simulations to efficiently process large volumes of data generated by the simulations at runtime, thanks to in situ techniques. Executing a workflow ensemble of simulations and their in situ analyses requires efficient co-scheduling strategies and sophisticated management of computational resources so that they do not slow each other down. In this paper, we propose an efficient method to co-schedule simulations and in situ analyses such that the makespan of the workflow ensemble is minimized. We present a novel approach to allocate resources for a workflow ensemble under resource constraints by using a theoretical framework modeling the workflow ensemble's execution. We evaluate the proposed approach using an accurate simulator based on the WRENCH simulation framework on various workflow ensemble configurations. Results demonstrate the significance of co-scheduling simulations and in situ analyses that couple data together to benefit from data locality: inefficient scheduling decisions can lead to a slowdown in makespan of up to a factor of 30.

preprint, 2022, arXiv

Data Integrity Error Localization in Networked Systems with Missing Data

Most recent network failure diagnosis systems have focused on data center networks, where complex measurement systems can be deployed to derive routing information and ensure network coverage in order to achieve accurate and fast fault localization. In this paper, we target wide-area networks that support data-intensive distributed applications. We first present a new multi-output prediction model that directly maps application-level observations to the localization of system component failures. In practice, this application-centric approach may face a missing data challenge, as some input (feature) data to the inference models may be missing due to incomplete or lost measurements in wide-area networks. We show that the presented prediction model naturally allows multivariate imputation to recover the missing data. We evaluate multiple imputation algorithms and show that the prediction performance can be improved significantly in a large-scale network. As far as we know, this is the first study of the missing data issue and of applying imputation techniques in network failure localization.

preprint, 2022, arXiv

Reproducibility of the First Image of a Black Hole in the Galaxy M87 from the Event Horizon Telescope (EHT) Collaboration

This paper presents an interdisciplinary effort aiming to develop and share sustainable knowledge necessary to analyze, understand, and use published scientific results to advance reproducibility in multi-messenger astrophysics. Specifically, we target the breakthrough work associated with the generation of the first image of a black hole, located in the galaxy M87. The image was computed by the Event Horizon Telescope Collaboration. Based on the artifacts made available by EHT, we deliver documentation, code, and a computational environment to reproduce the first image of a black hole. Our deliverables support new discovery in multi-messenger astrophysics by providing all the necessary tools for generalizing methods and findings from the EHT use case. Challenges encountered while reproducing the EHT results are reported. The result of our effort is an open-source, containerized software package that enables the public to reproduce the first image of a black hole in the galaxy M87.

preprint, 2021, arXiv

Blueprint: Cyberinfrastructure Center of Excellence

In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) effort was to develop a model and a blueprint for such a CoE by engaging with the MFs, understanding their CI needs, understanding the contributions the MFs are making to the CI community, and exploring opportunities for building a broader CI community. This document summarizes the results of community engagements conducted during the first two years of the project and describes the identified CI needs of the MFs. To better understand MFs' CI, the Pilot has developed and validated a model of the MF data lifecycle that follows the data generation and management within a facility and gained an understanding of how this model captures the fundamental stages that the facilities' data passes through from the scientific instruments to the principal investigators and their teams, to the broader collaborations and the public. The Pilot also aimed to understand what CI workforce development challenges the MFs face while designing, constructing, and operating their CI and what solutions they are exploring and adopting within their projects. Based on the needs of the MFs in the data lifecycle and workforce development areas, this document outlines a blueprint for a CI CoE that will learn about and share the CI solutions designed, developed, and/or adopted by the MFs, provide expertise to the largest NSF projects with advanced and complex CI architectures, and foster a community of CI practitioners and researchers.

preprint, 2021, arXiv

Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger

In 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish confidence in this detection, large-scale scientific workflows were used to measure the event's statistical significance. These workflows used code written by LIGO/Virgo and were executed on the LIGO Data Grid. The codes are publicly available, but there has not yet been an attempt to directly reproduce the results, although several analyses have replicated the analysis, confirming the detection. We attempt to reproduce the result presented in the GW150914 discovery paper using publicly available code on the Open Science Grid. We show that we can reproduce the main result, but we cannot exactly reproduce the LIGO analysis because the original data set used is not public. We discuss the challenges we encountered and make recommendations for scientists who wish to make their work reproducible.

preprint, 2021, arXiv

SIM-SITU: A Framework for the Faithful Simulation of in-situ Workflows

The amount of data generated by numerical simulations in various scientific domains such as molecular dynamics, climate modeling, biology, or astrophysics has led to a fundamental redesign of application workflows. The throughput and the capacity of storage subsystems have not evolved as fast as the computing power in extreme-scale supercomputers. As a result, the classical post-hoc analysis of simulation outputs has become highly inefficient. In-situ workflows have then emerged as a solution in which simulation and data analytics are intertwined through shared computing resources, thus lowering latencies. Determining the best allocation, i.e., how many resources to allocate to each component of an in-situ workflow, and mapping, i.e., where and at which frequency to run the data analytics component, is a complex task whose performance assessment is crucial to the efficient execution of in-situ workflows. However, such a performance evaluation of different allocation and mapping strategies usually relies either on directly running them on the targeted execution environments, which can rapidly become extremely time- and resource-consuming, or on resorting to the simulation of simplified models of the components of an in-situ workflow, which can lack realism. In both cases, the validity of the performance evaluation is limited. To address this issue, we introduce SIM-SITU, a framework for the faithful simulation of in-situ workflows. This framework builds on the SimGrid toolkit and benefits from several important features of this versatile simulation tool. We designed SIM-SITU to reflect the typical structure of in-situ workflows and, thanks to its modular design, SIM-SITU has the necessary flexibility to easily and faithfully evaluate the behavior and performance of various allocation and mapping strategies for in-situ workflows. We illustrate the simulation capabilities of SIM-SITU on a Molecular Dynamics use case. We study the impact of different allocation and mapping strategies on performance and show how users can leverage SIM-SITU to determine interesting tradeoffs when designing their in-situ workflow.

preprint, 2020, arXiv

WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development -- Technical Report

Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed at heterogeneous, distributed resources. The workflow research and development community has employed a number of methods for the quantitative evaluation of existing and novel workflow algorithms and systems. In particular, a common approach is to simulate workflow executions. In previous work, we have presented a collection of tools that have been used for aiding research and development activities in the Pegasus project, and that have been adopted by others for conducting workflow research. Despite their popularity, there are several shortcomings that prevent easy adoption, maintenance, and consistency with the evolving structures and computational requirements of production workflows. In this work, we present WorkflowHub, a community framework that provides a collection of tools for analyzing workflow execution traces, producing realistic synthetic workflow traces, and simulating workflow executions. We demonstrate the realism of the generated synthetic traces by comparing simulated executions of these traces with actual workflow executions. We also contrast these results with those obtained when using the previously available collection of tools. We find that our framework not only can be used to generate representative synthetic workflow traces (i.e., with workflow structures and task characteristics distributions that resemble those in traces obtained from real-world workflow executions), but can also generate representative workflow traces at larger scales than that of available workflow traces.

preprint, 2010, arXiv

Metadata and provenance management

Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared and further processed and analyzed among collaborators. In order to facilitate sharing and data interpretation, data need to carry metadata about how they were collected or generated, and provenance information about how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle. It also gives an overview of the approaches to metadata and provenance management, followed by examples of how applications use metadata and provenance in their scientific processes.

preprint, 2010, arXiv

Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking

Montage is a portable software toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to run Montage in the case where the data or computers used to construct a mosaic are located remotely on the Internet. This paper describes the architecture, algorithms, and usage of Montage as both a software toolkit and as a grid portal. Timing results are provided to show how Montage performance scales with number of processors on a cluster computer. In addition, we compare the performance of two methods of running Montage in parallel on a grid.

preprint, 2010, arXiv

Pipeline-Centric Provenance Model

In this paper we propose a new provenance model which is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We generalize the class of applications the approach is relevant to and propose a pipeline-centric provenance model. Finally, we evaluate the benefits in terms of storage needed by the approach when applied to an astronomy application.

preprint, 2010, arXiv

The Role of Provenance Management in Accelerating the Rate of Astronomical Research

The availability of vast quantities of data through electronic archives has transformed astronomical research. It has also enabled the creation of new products, models and simulations, often from distributed input data and models, that are themselves made electronically available. These products will only provide maximal long-term value to astronomers when accompanied by records of their provenance; that is, records of the data and processes used in the creation of such products. We use the creation of image mosaics with the Montage grid-enabled mosaic engine to emphasize the necessity of provenance management and to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with one technology, the "Provenance Aware Service Oriented Architecture" (PASOA), that stores provenance information at each step in the computation of a mosaic. The results inform the technical specifications of provenance management systems, including the need for extensible systems built on common standards. Finally, we describe examples of provenance management technology emerging from the fields of geophysics and oceanography that have applicability to astronomy applications.