Researcher profile

Henri Casanova

Henri Casanova contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2023arXiv

Report on 2023 CyberTraining PI Meeting, 26-27 September 2023

This document describes a two-day meeting held for the Principal Investigators (PIs) of NSF CyberTraining grants. The report covers invited talks, panels, and six breakout sessions. The meeting involved over 80 PIs and NSF program managers (PMs). The lessons recorded in detail in the report are a wealth of information that could help current and future PIs, as well as NSF PMs, understand the future directions suggested by the PI community. The meeting was held simultaneously with that of the PIs of the NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) program. This co-location led to two joint sessions: one with NSF speakers and the other on broader impact. Further, the joint poster and refreshment sessions benefited from the interactions between CSSI and CyberTraining PIs.

preprint2022arXiv

Workflows Community Summit: Tightening the Integration between Computing Facilities and Scientific Workflows

The importance of workflows is highlighted by the fact that they have underpinned some of the most significant discoveries of the past decades. Many of these workflows have significant computational, storage, and communication demands, and thus must execute on a range of large-scale computer systems, from local clusters to public clouds and upcoming exascale HPC platforms. Historically, infrastructures for workflow execution consisted of complex, integrated systems, developed in-house by workflow practitioners with strong dependencies on a range of legacy technologies. Due to the increasing need to support workflows, dedicated workflow systems were developed to provide abstractions for creating, executing, and adapting workflows conveniently and efficiently while ensuring portability. While these efforts are all worthwhile individually, there are now hundreds of independent workflow systems. The resulting workflow system technology landscape is fragmented, which may present significant barriers for future workflow users due to many seemingly comparable, yet usually mutually incompatible, systems that exist. In order to tackle some of these challenges, the DOE-funded ExaWorks and NSF-funded WorkflowsRI projects have organized in 2021 a series of events entitled the "Workflows Community Summit". The third edition of the ``Workflows Community Summit" explored workflows challenges and opportunities from the perspective of computing centers and facilities. The third summit brought together a small group of facilities representatives with the aim to understand how workflows are currently being used at each facility, how facilities would like to interact with workflow developers and users, how workflows fit with facility roadmaps, and what opportunities there are for tighter integration between facilities and workflows. More information at: https://workflowsri.org/summits/facilities/

preprint2021arXiv

A Community Roadmap for Scientific Workflows Research and Development

The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual "Workflows Community Summits" (January and April, 2021). The overarching goals of these workshops were to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision. To this end, participants identified six broad themes: FAIR computational workflows; AI workflows; exascale challenges; APIs, interoperability, reuse, and standards; training and education; and building a workflows community. We summarize discussions and recommendations for each of these themes.

preprint2021arXiv

Modeling the Linux page cache for accurate simulation of data-intensive applications

The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reduce I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to limitations of real-world experiments. Simulation is a popular approach to address these issues, however, existing simulation frameworks do not simulate page caching fully, or even at all. As a result, simulation-based performance studies of data-intensive applications lead to inaccurate results. In this paper, we propose an I/O simulation model that includes the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multithreaded applications, and of both writeback and writethrough caches for local or network-based filesystems. We evaluate the accuracy of our model in different conditions, including sequential and concurrent applications, as well as local and remote I/Os. We find that our page cache model reduces the simulation error by up to an order of magnitude when compared to state-of-the-art, cacheless simulations.

preprint2020arXiv

WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development -- Technical Report

Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored/processed at heterogeneous, distributed resources. The workflow research and development community has employed a number of methods for the quantitative evaluation of existing and novel workflow algorithms and systems. In particular, a common approach is to simulate workflow executions. In previous work, we have presented a collection of tools that have been used for aiding research and development activities in the Pegasus project, and that have been adopted by others for conducting workflow research. Despite their popularity, there are several shortcomings that prevent easy adoption, maintenance, and consistency with the evolving structures and computational requirements of production workflows. In this work, we present WorkflowHub, a community framework that provides a collection of tools for analyzing workflow execution traces, producing realistic synthetic workflow traces, and simulating workflow executions. We demonstrate the realism of the generated synthetic traces by comparing simulated executions of these traces with actual workflow executions. We also contrast these results with those obtained when using the previously available collection of tools. We find that our framework not only can be used to generate representative synthetic workflow traces (i.e., with workflow structures and task characteristics distributions that resembles those in traces obtained from real-world workflow executions), but can also generate representative workflow traces at larger scales than that of available workflow traces.