Source author record

Yanfei Guo

Yanfei Guo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

Distributed, Parallel, and Cluster Computing; Performance

Catalog footprint

What is connected

2 works

2 topics

4 close collaborators

Preprint, 2022, arXiv

MPIX Stream: An Explicit Solution to Hybrid MPI+X Programming

The hybrid MPI+X programming paradigm, where X refers to threads or GPUs, has gained prominence in the high-performance computing arena. This corresponds to a trend of system architectures growing more heterogeneous. The current MPI standard only specifies the compatibility levels between MPI and threading runtimes. No MPI concept or interface exists for applications to pass thread context or GPU stream context to MPI implementations explicitly. This lack has made performance optimization complicated in some cases and impossible in other cases. We propose a new concept in MPI, called MPIX stream, to represent the general serial execution context that exists in X runtimes. MPIX streams can be directly mapped to threads or GPU execution streams. Passing thread context into MPI allows implementations to precisely map the execution contexts to network endpoints. Passing GPU execution context into MPI allows implementations to directly operate on GPU streams, lowering the CPU/GPU synchronization cost.

Preprint, 2020, arXiv

In-situ Workflow Auto-tuning via Combining Performance Models of Component Applications

In-situ parallel workflows couple multiple component applications, such as simulation and analysis, via streaming data transfer in order to avoid data exchange via shared file systems. Such workflows are challenging to configure for optimal performance due to the large space of possible configurations. Expert experience is rarely sufficient to identify optimal configurations, and existing empirical auto-tuning approaches are inefficient due to the high cost of obtaining training data for machine learning models. It is also infeasible to optimize individual components independently, due to component interactions. We propose here a new auto-tuning method, Component-based Ensemble Active Learning (CEAL), that combines machine learning techniques with knowledge of in-situ workflow structure to enable automated workflow configuration with a limited number of performance measurements.
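
The idea of combining per-component performance models to limit the number of full workflow measurements can be illustrated with a small sketch. This is not the CEAL algorithm from the paper. The component models, their coefficients, the "slowest component sets the pace" combination rule, and the names `predict_simulation`, `predict_analysis`, `predict_workflow`, `rank_configurations`, and `tune` are all assumptions made up for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Tuple[int, int]  # (simulation processes, analysis processes)


def predict_simulation(procs: int) -> float:
    """Hypothetical model: seconds per step for the simulation component."""
    return 120.0 / procs + 0.5


def predict_analysis(procs: int) -> float:
    """Hypothetical model: seconds per step for the analysis component."""
    return 60.0 / procs + 0.2


def predict_workflow(sim_procs: int, ana_procs: int) -> float:
    # Assumption: components coupled by streaming run concurrently,
    # so the slower component determines the workflow's step time.
    return max(predict_simulation(sim_procs), predict_analysis(ana_procs))


def rank_configurations(total_procs: int, choices: List[int]) -> List[Config]:
    """Order feasible configurations by predicted workflow step time."""
    candidates = [
        (s, a) for s, a in product(choices, repeat=2) if s + a <= total_procs
    ]
    return sorted(candidates, key=lambda cfg: predict_workflow(*cfg))


def tune(
    total_procs: int,
    choices: List[int],
    measure: Callable[[Config], float],
    budget: int,
) -> Tuple[Config, Dict[Config, float]]:
    """Measure only the `budget` most promising configurations."""
    ranked = rank_configurations(total_procs, choices)
    measured = {cfg: measure(cfg) for cfg in ranked[:budget]}
    best = min(measured, key=measured.get)
    return best, measured
```

Here `measure` stands in for an actual workflow run. The combined model is used only to decide which few configurations are worth measuring, and the final choice rests on the measurements. A real auto-tuner would also refine its models from those measurements, which this sketch omits.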