Researcher profile

Ryan Chard

Ryan Chard contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust: 21 (Emerging) | Verification: L1 | Unclaimed author
7 works
0 followers
10 topics
4 close collaborators




Published work

7 published items

Preprint | 2022 | arXiv

Linking Scientific Instruments and HPC: Patterns, Technologies, Experiences

Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Such online analyses require methods for configuring and running high-performance distributed computing pipelines (what we call flows) linking instruments, HPC (e.g., for analysis, simulation, AI model training), edge computing (for analysis), data stores, metadata catalogs, and high-speed networks. In this article, we review common patterns associated with such flows and describe methods for instantiating those patterns. We also present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages HPC resources for data inversion, machine learning model training, or other purposes. We also discuss implications of these new methods for operators and users of scientific facilities.

Preprint | 2022 | arXiv

Real-Time Streaming and Event-driven Control of Scientific Experiments

Advancements in scientific instrument sensors and connected devices provide unprecedented insight into ongoing experiments and present new opportunities for control, optimization, and steering. However, the diversity of sensors and the heterogeneity of their data make it challenging to fully realize these new opportunities. Organizing and synthesizing diverse data streams in near-real-time requires both rich automation and Machine Learning (ML). To efficiently utilize ML during an experiment, the entire ML lifecycle must be addressed, including refining experiment configurations, retraining models, and applying decisions: tasks that require an equally diverse array of computational resources, ranging from centralized HPC to accelerators at the edge. Here we present the Manufacturing Data and Machine Learning platform (MDML). The MDML is designed to standardize the research and operational environment for advanced data analytics and ML-enabled automated process optimization by providing the cyberinfrastructure to integrate sensor data streams and AI in cyber-physical systems for in-situ analysis. To achieve this, the MDML provides a fabric to receive and aggregate IoT data and simultaneously orchestrate remote computation across the computing continuum. In this paper we describe the MDML and show how it is used in advanced manufacturing to act on IoT data and orchestrate distributed ML to guide experiments.

Preprint | 2022 | arXiv

Ultrafast Focus Detection for Automated Microscopy

Technological advancements in modern scientific instruments, such as scanning electron microscopes (SEMs), have significantly increased data acquisition rates and image resolutions, enabling new questions to be explored. However, the resulting data volumes and velocities, combined with automated experiments, are quickly overwhelming scientists, because some crucial steps, such as reviewing image focus, still require human intervention. We present a fast out-of-focus detection algorithm for electron microscopy images collected serially and demonstrate that it can be used to provide near-real-time quality control for neuroscience workflows. Our technique, Multi-scale Histologic Feature Detection, adapts classical computer vision techniques and is based on detecting various fine-grained histologic features. We exploit the inherent parallelism in the technique to employ GPU primitives in order to accelerate characterization. We show that our method can detect out-of-focus conditions within just 20 ms. To make these capabilities generally available, we deploy our feature detector as an on-demand service and show that it can be used to determine the degree of focus in approximately 230 ms, enabling near-real-time use.

Preprint | 2020 | arXiv

funcX: A Federated Function Serving Fabric for Science

Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX, a distributed function as a service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. funcX's endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX's cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130,000 concurrent workers.

Preprint | 2020 | arXiv

Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to compute diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources representing over 4.2 billion molecules, enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules to enable exploration and application of image-based deep learning methods, and 3) 2D and 3D molecular descriptors to speed development of machine learning models. This data release encompasses structural information on the 4.2 billion molecules and 60 TB of pre-computed data. Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.

Preprint | 2020 | arXiv

The Manufacturing Data and Machine Learning Platform: Enabling Real-time Monitoring and Control of Scientific Experiments via IoT

IoT devices and sensor networks present new opportunities for measuring, monitoring, and guiding scientific experiments. Sensors, cameras, and instruments can be combined to provide previously unachievable insights into the state of ongoing experiments. However, IoT devices can vary greatly in the type, volume, and velocity of data they generate, making it challenging to fully realize this potential. Indeed, synergizing diverse IoT data streams in near-real time can require the use of machine learning (ML). In addition, new tools and technologies are required to facilitate the collection, aggregation, and manipulation of sensor data in order to simplify the application of ML models and, in turn, fully realize the utility of IoT devices in laboratories. Here we demonstrate how the Argonne-developed Manufacturing Data and Machine Learning (MDML) platform can be used to analyze and make use of data from IoT devices in a manufacturing experiment. MDML is designed to standardize the research and operational environment for advanced data analytics and AI-enabled automated process optimization by providing the infrastructure to integrate AI in cyber-physical systems for in situ analysis. We show that MDML is capable of processing diverse IoT data streams, using multiple computing resources, and integrating ML models to guide an experiment.

Preprint | 2019 | arXiv

A Data Ecosystem to Support Machine Learning in Materials Science

Facilitating the application of machine learning to materials science problems will require enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connection of data with materials-specific machine learning models. Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs. We use examples to show how MDF and DLHub capabilities can be leveraged to link data with machine learning models and how users can access those capabilities through web and programmatic interfaces.