Source author record

Yunming Zhang

Yunming Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Programming Languages physics.plasm-ph

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Compilation Techniques for Graph Algorithms on GPUs

The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all settings. To achieve high performance, the programmer must carefully select which set of optimizations and hardware platforms to use. The GraphIt programming language makes it easy for the programmer to write the algorithm once and optimize it for different inputs using a scheduling language. However, GraphIt currently has no support for generating high performance code for GPUs. Programmers must resort to re-implementing the entire algorithm from scratch in a low-level language with an entirely different set of abstractions and optimizations in order to achieve high performance on GPUs. We propose GG, an extension to the GraphIt compiler framework, that achieves high performance on both CPUs and GPUs using the same algorithm specification. GG significantly expands the optimization space of GPU graph processing frameworks with a novel GPU scheduling language and compiler that enables combining graph optimizations for GPUs. GG also introduces two performance optimizations, Edge-based Thread Warps CTAs load balancing (ETWC) and EdgeBlocking, to expand the optimization space for GPUs. ETWC improves load balancing by dynamically partitioning the edges of each vertex into blocks that are assigned to threads, warps, and CTAs for execution. EdgeBlocking improves the locality of the program by reordering the edges and restricting random memory accesses to fit within the L2 cache. We evaluate GG on 5 algorithms and 9 input graphs on both Pascal and Volta generation NVIDIA GPUs, and show that it achieves up to 5.11x speedup over state-of-the-art GPU graph processing frameworks, and is the fastest on 66 out of the 90 experiments.

preprint2020arXiv

Optimizing Ordered Graph Algorithms with GraphIt

Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedup over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific language for writing graph applications, to simplify writing high-performance parallel ordered graph algorithms. The extension enables vertices to be processed in a dynamic order while hiding low-level implementation details from the user. We extend the compiler with new program analyses, transformations, and code generation to produce fast implementations of ordered parallel graph algorithms. We also introduce bucket fusion, a new performance optimization that fuses together different rounds of ordered algorithms to reduce synchronization overhead, resulting in $1.2\times$--3$\times$ speedup over the fastest existing ordered algorithm implementations on road networks with large diameters. With the extension, GraphIt achieves up to 3$\times$ speedup on six ordered graph algorithms over state-of-the-art frameworks and hand-optimized implementations (Julienne, Galois, and GAPBS) that support ordered algorithms.

preprint2015arXiv

The energy distribution structure and dynamic characteristics of energy release in electrostatic discharge process

The detail structure of energy output and the dynamic characteristics of electric spark discharge process have been studied to calculate the energy of electric spark induced plasma under different discharge condition accurately. A series of electric spark discharge experiments were conducted with the capacitor stored energy in the range of 10J 100J and 1000J respectively. And the resistance of wire, switch and plasma between electrodes were evaluated by different methods. An optimized method for electric resistance evaluation of the full discharge circuit, three poles switch and electric spark induced plasma during the discharge process was put forward. The electric energy consumed by wire, electric switch and electric spark induced plasma between electrodes were obtained by Joules law. The structure of energy distribution and the dynamic process of energy release during the capacitor discharge process have been studied. Experiments results showed that, with the increase of capacitor released energy, the duration of discharge process becomes longer, and the energy of plasma accounts for more in the capacitor released energy. The dynamic resistance of plasma and three poles switch obtained by energy conversation law is more precise than that obtained by the parameters of electric current oscillation during the discharge process.