Researcher profile

Zhengda Bian

Zhengda Bian contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

A Frequency-aware Software Cache for Large Recommendation System Embeddings

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit on GPU memory entirely. We propose a GPU-based software cache approaches to dynamically manage the embedding table in the CPU and GPU memory space by leveraging the id's frequency statistics of the target dataset. Our proposed software cache is efficient in training entire DLRMs on GPU in a synchronized update manner. It is also scaled to multiple GPUs in combination with the widely used hybrid parallel training approaches. Evaluating our prototype system shows that we can keep only 1.5% of the embedding parameters in the GPU to obtain a decent end-to-end training speed.

preprint2022arXiv

Tesseract: Parallelize the Tensor Parallelism Efficiently

Together with the improvements in state-of-the-art accuracies of various tasks, deep learning models are getting significantly larger. However, it is extremely difficult to implement these large models because limited GPU memory makes it impossible to fit large models into a single GPU or even a GPU server. Besides, it is highly necessary to reduce the training time for large models. Previous methods like Megatron-LM implemented a 1-Dimensional distributed method to use GPUs to speed up the training. However, these methods have a high communication overhead and a low scaling efficiency on large-scale clusters. To solve these problems, we propose Tesseract, a highly scalable tensor parallelism with a novel design. It increases efficiency by reducing communication overhead and lowers the memory required for each GPU. By introducing the novel dimension into tensor parallelism, Tesseract greatly increases the memory capacity of tensor parallelism. Concretely, this new dimension furthermore increases the degree of tensor parallelism. Compared to previous 1-D and 2-D methods, Tesseract manages to reduce the communication cost on each layer, resulting in speedups of 1.38x and 1.53x respectively with strong scaling. In weak scaling experiments, Tesseract achieves a maximum of 4.0/1.7 times inference speedup and 3.4/1.7 times throughput improvement compared to 1-D/2-D methods, respectively. By introducing Tesseract, we offer a more efficient and scalable way to implement large deep learning models with limited GPU resources.