Source author record

Xuechao Wei

Xuechao Wei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

1works
2topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

1 published item(s)

preprint2022arXiv

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing

Recently, Graph Neural Networks (GNNs) have become state-of-the-art algorithms for analyzing non-euclidean graph data. However, to realize efficient GNN training is challenging, especially on large graphs. The reasons are many-folded: 1) GNN training incurs a substantial memory footprint. Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory. 2) GNN training involves both memory-intensive and computation-intensive operations, challenging current CPU/GPU platforms. 3) The irregularity of graphs can result in severe resource under-utilization and load-imbalance problems. This paper presents a GNNear accelerator to tackle these challenges. GNNear adopts a DIMM-based memory system to provide sufficient memory capacity. To match the heterogeneous nature of GNN training, we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs), making full use of the high aggregated local bandwidth. We adopt a Centralized-Acceleration-Engine (CAE) to process the computation-intensive Update operations. We further propose several optimization strategies to deal with the irregularity of input graphs and improve GNNear's performance. Comprehensive evaluations on 16 GNN training tasks demonstrate that GNNear achieves 30.8$\times$/2.5$\times$ geomean speedup and 79.6$\times$/7.3$\times$(geomean) higher energy efficiency compared to Xeon E5-2698-v4 CPU and NVIDIA V100 GPU.