Source author record

Allen D. Malony

Allen D. Malony appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

2 works

1 topic

2 close collaborators

Preprint, 2022, arXiv

SKaMPI-OpenSHMEM: Measuring OpenSHMEM Communication Routines

Benchmarking is an important challenge in HPC, particularly for tuning the basic building blocks of the software environment used by applications. The communication library and distributed run-time environment are among the most critical of these. Many of the routines provided by communication libraries can be adjusted using parameters such as buffer sizes and the choice of communication algorithm. Accurately measuring the time taken by these routines is therefore crucial for optimizing them and achieving the best performance. For instance, the SKaMPI library was designed to measure the time taken by MPI routines, relying on MPI's two-sided communication model to measure one-sided and two-sided peer-to-peer communication and collective routines. In this paper, we discuss the benchmarking challenges specific to OpenSHMEM's communication model, mainly how to avoid inter-call pipelining and overlapping when measuring the time taken by its routines. We extend SKaMPI for OpenSHMEM for this purpose and demonstrate measurement algorithms that address OpenSHMEM's communication model in practice. Scaling experiments are run on the Summit platform to compare different benchmarking approaches on the SKaMPI benchmark operations. These show the advantages of our techniques for more accurate performance characterization.
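
The measurement concern raised in this abstract, keeping successive calls from pipelining or overlapping inside the timed region, can be illustrated with a minimal sketch. This is not SKaMPI's algorithm or API: it is a plain Python stand-in, and the names `measure`, `summarize`, `op`, and `complete` are hypothetical. The `complete` callback is assumed to play the role of a completion or synchronization call (in OpenSHMEM, something like `shmem_quiet` or a barrier).

```python
import statistics
import time
from typing import Callable, Dict, List


def measure(op: Callable[[], None],
            complete: Callable[[], None],
            warmup: int = 2,
            reps: int = 10) -> List[float]:
    """Time `op` one call at a time, forcing completion between calls.

    `op` and `complete` are hypothetical callbacks supplied by the caller.
    """
    samples: List[float] = []
    for i in range(warmup + reps):
        complete()                       # drain earlier work so calls cannot overlap
        start = time.perf_counter()
        op()                             # routine under test
        complete()                       # count completion inside the timed region
        elapsed = time.perf_counter() - start
        if i >= warmup:                  # discard warm-up iterations
            samples.append(elapsed)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Reduce raw samples to a few robust statistics."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Timing each call in isolation costs more wall-clock time than timing a tight loop of calls, but it avoids crediting one call with work that was actually overlapped with its neighbors.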

Preprint, 2020, arXiv

On-the-fly Optimization of Parallel Computation of Symbolic Symplectic Invariants

Group invariants are used in high energy physics to define quantum field theory interactions. In this paper, we present the parallel algebraic computation of special invariants called symplectic invariants, focusing on one particular invariant that has attracted recent interest in physics. Our results will carry over to other invariants. The cost of performing basic computations on the multivariate polynomials involved evolves during the computation, as the polynomials grow larger or gain more terms. However, in some cases, they stay small. Traditionally, high-performance software is optimized by running it on a smaller data set and using the resulting profiling information to set tuning parameters. Since the (communication and computation) costs evolve during the computation, the first iterations might not be representative of the rest, so this approach cannot be applied here. To cope with this evolution, we present an approach that collects performance data and tunes the algorithm during execution.
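
The general idea of tuning during execution rather than from an up-front profiling run can be sketched as follows. The abstract does not say which parameters the authors tune or how, so everything here is an assumption for illustration: the `OnlineTuner` class, the chunk-size parameter, the cost threshold, and the sliding window are all hypothetical.

```python
from collections import deque
from typing import Deque


class OnlineTuner:
    """Pick a work-chunk size from recently observed per-step costs.

    Hypothetical illustration: the tuned parameter (chunk size) and the
    decision rule (windowed mean against a threshold) are assumptions.
    """

    def __init__(self, threshold: float, window: int = 3,
                 small_chunk: int = 1, large_chunk: int = 8) -> None:
        self.threshold = threshold
        self.recent: Deque[float] = deque(maxlen=window)  # sliding window of costs
        self.small_chunk = small_chunk
        self.large_chunk = large_chunk

    def record(self, cost: float) -> None:
        """Store the measured cost of the step that just finished."""
        self.recent.append(cost)

    def chunk_size(self) -> int:
        """Return the chunk size to use for the next step."""
        if not self.recent:
            return self.large_chunk      # no data yet: assume steps are cheap
        avg = sum(self.recent) / len(self.recent)
        # Cheap steps are batched together; expensive steps are split finely.
        return self.large_chunk if avg < self.threshold else self.small_chunk


tuner = OnlineTuner(threshold=1.0)
for cost in [0.1, 0.2, 5.0, 5.0, 5.0]:   # simulated costs that grow over time
    tuner.record(cost)
```

Because the window only holds recent measurements, the decision follows the current cost of the computation instead of the cost seen in its first iterations.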