Researcher profile

Daniel S. Katz

Daniel S. Katz contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2024arXiv

Software Citation in HEP: Current State and Recommendations for the Future

In November 2022, the HEP Software Foundation and the Institute for Research and Innovation for Software in High-Energy Physics organized a workshop on the topic of Software Citation and Recognition in HEP. The goal of the workshop was to bring together different types of stakeholders whose roles relate to software citation, and the associated credit it provides, in order to engage the community in a discussion on: the ways HEP experiments handle citation of software, recognition for software efforts that enable physics results disseminated to the public, and how the scholarly publishing ecosystem supports these activities. Reports were given from the publication board leadership of the ATLAS, CMS, and LHCb experiments and HEP open source software community organizations (ROOT, Scikit-HEP, MCnet), and perspectives were given from publishers (Elsevier, JOSS) and related tool providers (INSPIRE, Zenodo). This paper summarizes key findings and recommendations from the workshop as presented at the 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023).

preprint2023arXiv

FAIR AI Models in High Energy Physics

The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.

preprint2022arXiv

A FAIR and AI-ready Higgs boson decay dataset

To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.

preprint2022arXiv

Extended Abstract: Productive Parallel Programming with Parsl

Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python functions-wrapping either Python or external applications-to indicate that these functions may be executed concurrently. Developers can then link together functions via the exchange of data. Parsl establishes a dynamic dependency graph and sends tasks for execution on connected resources when dependencies are resolved. Parsl's runtime system enables different compute resources to be used, from laptops to supercomputers, without modification to the Parsl program.

preprint2021arXiv

A Community Roadmap for Scientific Workflows Research and Development

The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual "Workflows Community Summits" (January and April, 2021). The overarching goals of these workshops were to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision. To this end, participants identified six broad themes: FAIR computational workflows; AI workflows; exascale challenges; APIs, interoperability, reuse, and standards; training and education; and building a workflows community. We summarize discussions and recommendations for each of these themes.

preprint2021arXiv

A Fresh Look at FAIR for Research Software

This document captures the discussion and deliberation of the FAIR for Research Software (FAIR4RS) subgroup that took a fresh look at the applicability of the FAIR Guiding Principles for scientific data management and stewardship for research software. We discuss the vision of research software as ideally reproducible, open, usable, recognized, sustained and robust, and then review both the characteristic and practiced differences of research software and data. This vision and understanding of initial conditions serves as a backdrop for an attempt at translating and interpreting the guiding principles to more fully align with research software. We have found that many of the principles remained relatively intact as written, as long as considerable interpretation was provided. This was particularly the case for the "Findable" and "Accessible" foundational principles. We found that "Interoperability" and "Reusability" are particularly prone to a broad and sometimes opposing set of interpretations as written. We propose two new principles modeled on existing ones, and provide modified guiding text for these principles to help clarify our final interpretation. A series of gaps in translation were captured during this process, and these remain to be addressed. We finish with a consideration of where these translated principles fall short of the vision laid out in the opening.

preprint2018arXiv

A Roadmap for HEP Software and Computing R&D for the 2020s

Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the shear amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.