Researcher profile

Yuchen Zhong

Yuchen Zhong contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training tends to be far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and exercise effective performance optimizations when unexpected low training speed occurs. To date, there exists no software tool which diagnoses performance issues and helps expedite distributed DNN training, while the training can be run using different deep learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (from computation, communication, and memory aspects) for training acceleration. We implement dPRO on multiple deep learning frameworks (TensorFlow, MXNet) and representative communication schemes (AllReduce and Parameter Server). Extensive experiments show that dPRO predicts the performance of distributed training in various settings with < 5% errors in most cases and finds optimization strategies with up to 3.48x speed-up over the baselines.

preprint2022arXiv

Morphological Anti-Aliasing Method for Boundary Slope Prediction

Image pixel aliasing caused by insufficient sampling is a long-standing problem in the field of computer graphics. It has always been the goal of researchers to seek anti-aliasing algorithms with high speed and good effect. Due to the deficiencies in local detection and reconstruction of sloping line boundaries, a morphological anti-aliasing method for boundary slope prediction is proposed. This method uses the information of the local line boundary slope to predict and test the end positions of the line boundary in the global scope, thereby reconstructing The boundary information more consistent with the actual boundary is obtained, and a more accurate linear boundary shape is obtained with only a small increase in the amount of calculation. Compared with the previous morphological anti-aliasing algorithm, the proposed method is based on the global morphological boundary. , can reconstruct the straight line boundary more accurately, and apply it to the anti-aliasing calculation, which can further improve the color transition of the straight line boundary, make the inclined straight line boundary have higher continuity, and obtain a better anti-aliasing effect.