Source author record

Yuchen Zhong

Yuchen Zhong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Distributed, Parallel, and Cluster Computing; Graphics; Machine Learning; Performance

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators

Preprint · 2022 · arXiv

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training tends to be far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and exercise effective performance optimizations when unexpected low training speed occurs. To date, there exists no software tool which diagnoses performance issues and helps expedite distributed DNN training, while the training can be run using different deep learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (from computation, communication, and memory aspects) for training acceleration. We implement dPRO on multiple deep learning frameworks (TensorFlow, MXNet) and representative communication schemes (AllReduce and Parameter Server). Extensive experiments show that dPRO predicts the performance of distributed training in various settings with < 5% errors in most cases and finds optimization strategies with up to 3.48x speed-up over the baselines.

Preprint · 2022 · arXiv

Morphological Anti-Aliasing Method for Boundary Slope Prediction

Image pixel aliasing caused by insufficient sampling is a long-standing problem in the field of computer graphics. Researchers have long sought anti-aliasing algorithms that are both fast and effective. To address deficiencies in the local detection and reconstruction of sloping line boundaries, a morphological anti-aliasing method based on boundary slope prediction is proposed. This method uses the slope of the local line boundary to predict and test the end positions of the line boundary in the global scope, thereby reconstructing boundary information that is more consistent with the actual boundary and obtaining a more accurate linear boundary shape with only a small increase in the amount of calculation. Compared with previous morphological anti-aliasing algorithms, the proposed method is based on the global morphological boundary: it can reconstruct the straight-line boundary more accurately and apply it to the anti-aliasing calculation, which further improves the color transition along the straight-line boundary, gives inclined straight-line boundaries higher continuity, and yields a better anti-aliasing effect.
