Source author record

Duo Wang

Duo Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Computer Vision; Hardware Architecture; Artificial Intelligence; cond-mat.mtrl-sci; cond-mat.soft; Distributed, Parallel, and Cluster Computing; Neural and Evolutionary Computing; physics.flu-dyn

Catalog footprint

What is connected

10 works

9 topics

4 close collaborators

Preprint, 2022, arXiv

Alleviating Datapath Conflicts and Design Centralization in Graph Analytics Acceleration

Previous graph analytics accelerators have achieved great improvements in throughput by alleviating irregular off-chip memory accesses. However, on-chip side datapath conflicts and design centralization have become the critical issues hindering further throughput improvement. In this paper, a general solution, Multiple-stage Decentralized Propagation network (MDP-network), is proposed to address these issues, inspired by the key idea of trading latency for throughput. Besides, a novel High throughput Graph analytics accelerator, HiGraph, is proposed by deploying MDP-network to address each issue in practice. The experiments show that, compared with a state-of-the-art accelerator, HiGraph achieves up to 2.2x speedup (1.5x on average) as well as better scalability.

Preprint, 2022, arXiv

Model Architecture Adaption for Bayesian Neural Networks

Bayesian Neural Networks (BNNs) offer a mathematically grounded framework to quantify the uncertainty of model predictions but come with a prohibitive computation cost for both training and inference. In this work, we show a novel network architecture search (NAS) that optimizes BNNs for both accuracy and uncertainty while having a reduced inference latency. Different from canonical NAS that optimizes solely for in-distribution likelihood, the proposed scheme searches for the uncertainty performance using both in- and out-of-distribution data. Our method is able to search for the correct placement of Bayesian layer(s) in a network. In our experiments, the searched models show comparable uncertainty quantification ability and accuracy compared to the state-of-the-art (deep ensemble). In addition, the searched models use only a fraction of the runtime compared to many popular BNN baselines, reducing the inference runtime cost by $2.98 \times$ and $2.92 \times$ respectively on the CIFAR10 dataset when compared to MCDropout and deep ensemble.

Preprint, 2022, arXiv

Multi-node Acceleration for Large-scale GCNs

Limited by memory capacity and compute power, single-node graph convolutional neural network (GCN) accelerators cannot complete the execution of GCNs within a reasonable amount of time, due to the explosive size of graphs nowadays. Thus, large-scale GCNs call for a multi-node acceleration system (MultiAccSys) like TPU-Pod for large-scale neural networks. In this work, we aim to scale up single-node GCN accelerators to accelerate GCNs on large-scale graphs. We first identify the communication pattern and challenges of multi-node acceleration for GCNs on large-scale graphs. We observe that (1) coarse-grained communication patterns exist in the execution of GCNs in MultiAccSys, which introduce a massive amount of redundant network transmissions and off-chip memory accesses; (2) overall, the acceleration of GCNs in MultiAccSys is bandwidth-bound and latency-tolerant. Guided by these two observations, we then propose MultiGCN, the first MultiAccSys for large-scale GCNs that trades network latency for network bandwidth. Specifically, by leveraging the network latency tolerance, we first propose a topology-aware multicast mechanism with a one put per multicast message-passing model to reduce transmissions and alleviate network bandwidth requirements. Second, we introduce a scatter-based round execution mechanism which cooperates with the multicast mechanism and reduces redundant off-chip memory accesses. Compared to the baseline MultiAccSys, MultiGCN achieves 4~12x speedup using only 28%~68% energy, while reducing 32% transmissions and 73% off-chip memory accesses on average. It not only achieves 2.5~8x speedup over the state-of-the-art multi-GPU solution, but also scales to large-scale graphs as opposed to single-node GCN accelerators.

Preprint, 2022, arXiv

Nonlinear variation of bedload thickness with fluid flow rate in laminar shearing flow

The movement of subaqueous sediment in laminar shearing flow is numerically investigated by the coupled lattice Boltzmann and discrete element methods. First, the numerical method is validated against the phase diagram proposed by Ouriemi et al. (J. Fluid Mech., vol. 636, 2009, pp. 321-336). Second, a detailed study on sediment movement is performed for sediment with varying solid volume fractions, and a nonlinear relationship between the normalised thickness of the mobile layer and the normalised fluid flow rate is observed for a densely-packed sediment. Third, an independent investigation on the effective viscosity and friction coefficient of the sediment under different fluid flow rates is conducted in a shear cell; and substitution of these two critical parameters into a theoretical expression proposed by Aussillous et al. (J. Fluid Mech., vol. 736, 2013, pp. 594-615) provides predictions of bedload thickness consistent with the simulation results of sediment movement. Therefore, we conclude that the non-Newtonian behaviour of densely-packed sediment leads to the nonlinear relationship between the normalised thickness of the mobile layer and the normalised fluid flow rate.

Preprint, 2020, arXiv

A Survey of Model Compression and Acceleration for Deep Neural Networks

Deep neural networks (DNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past five years, tremendous progress has been made in this area. In this paper, we review the recent techniques for compacting and accelerating DNN models. In general, these techniques are divided into four categories: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and quantization are described first; after that, the other techniques are introduced. For each category, we also provide insightful analysis about the performance, related applications, advantages, and drawbacks. Then we go through some very recent successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating the model performance, and recent benchmark efforts. Finally, we conclude this paper and discuss the remaining challenges and possible directions for future work.
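
As a rough illustration of the first category above (parameter pruning), the sketch below zeroes out the smallest-magnitude weights of a flat weight list. It is a generic magnitude-pruning example, not a method taken from the survey; the function name `magnitude_prune` and the `sparsity` parameter are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    `sparsity` is the fraction of weights to remove (an illustrative parameter,
    not one defined in the survey).
    """
    n_prune = int(len(weights) * sparsity)
    # Indices of the weights ordered by absolute value, smallest first.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    print(magnitude_prune([0.5, -0.1, 0.03, -0.8], sparsity=0.5))
```

In practice pruning is usually followed by fine-tuning to recover accuracy; that step is omitted here.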

Preprint, 2020, arXiv

Abstract Diagrammatic Reasoning with Multiplex Graph Networks

Abstract reasoning, particularly in the visual domain, is a complex human ability, but it remains a challenging problem for artificial neural learning systems. In this work we propose MXGNet, a multilayer graph neural network for multi-panel diagrammatic reasoning tasks. MXGNet combines three powerful concepts, namely, object-level representation, graph neural networks and multiplex graphs, for solving visual reasoning tasks. MXGNet first extracts object-level representations for each element in all panels of the diagrams, and then forms a multi-layer multiplex graph capturing multiple relations between objects across different diagram panels. MXGNet summarises the multiple graphs extracted from the diagrams of the task, and uses this summarisation to pick the most probable answer from the given candidates. We have tested MXGNet on two types of diagrammatic reasoning tasks, namely Diagram Syllogisms and Raven Progressive Matrices (RPM). For an Euler Diagram Syllogism task MXGNet achieves state-of-the-art accuracy of 99.8%. For PGM and RAVEN, two comprehensive datasets for RPM reasoning, MXGNet outperforms the state-of-the-art models by a considerable margin.

Preprint, 2020, arXiv

Are Registration Uncertainty and Error Monotonically Associated

In image-guided neurosurgery, current commercial systems usually provide only rigid registration, partly because it is harder to predict, validate and understand non-rigid registration error. For instance, when surgeons see a discrepancy in aligned image features, they may not be able to distinguish between registration error and actual tissue deformation caused by tumor resection. In this case, the spatial distribution of registration error could help them make more informed decisions, e.g., ignoring the registration where the estimated error is high. However, error estimates are difficult to acquire. Probabilistic image registration (PIR) methods provide measures of registration uncertainty, which could be a surrogate for assessing the registration error. It is intuitive and believed by many clinicians that high uncertainty indicates a large error. However, the monotonic association between uncertainty and error has not been examined in image registration literature. In this pilot study, we attempt to address this fundamental problem by looking at one PIR method, the Gaussian process (GP) registration. We systematically investigate the relation between GP uncertainty and error based on clinical data and show empirically that there is a weak-to-moderate positive monotonic correlation between point-wise GP registration uncertainty and non-rigid registration error.
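
A monotonic association of this kind is commonly measured with a rank correlation. The sketch below computes Spearman's coefficient in plain Python; whether the study used exactly this statistic is an assumption, and the uncertainty and error values are made-up illustrative numbers, not data from the paper.

```python
import math


def rank(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(xs), rank(ys))


if __name__ == "__main__":
    # Hypothetical point-wise uncertainty and registration error values.
    uncertainty = [0.1, 0.4, 0.2, 0.9, 0.5]
    error = [1.0, 1.5, 2.5, 3.0, 2.0]
    print(spearman(uncertainty, error))
```

A coefficient near 1 would indicate that higher uncertainty reliably accompanies larger error; a weak-to-moderate value means uncertainty is only a partial surrogate.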

Preprint, 2020, arXiv

Electrical and thermal transport properties of medium-entropy SiyGeySnx alloys

Electrical and thermal transport properties of disordered materials have long been of both theoretical interest and engineering importance. As a new class of materials with an intrinsic compositional disorder, high/medium-entropy alloys (HEAs/MEAs) are being intensively studied, mainly for their excellent mechanical properties. By contrast, electrical and thermal transport properties of HEAs/MEAs are less well studied. Here we investigate these two properties of silicon (Si)-germanium (Ge)-tin (Sn) MEAs, where we keep the same content of Si and Ge while increasing the content of Sn from 0 to 1/3 to tune the configurational entropy and thus the degree of compositional disorder. We predict all SiyGeySnx MEAs to be semiconductors with a wide range of bandgaps from near-infrared (0.28 eV) to visible (1.11 eV) in the light spectrum. We find that the bandgaps and effective carrier masses decrease with increasing Sn content. As a result, increasing the compositional disorder in SiyGeySnx MEAs enhances their electrical conductivity. For the thermal transport properties of SiyGeySnx MEAs, our molecular dynamics simulations show an opposite trend in the thermal conductivity of these MEAs at room temperature, which decreases with increasing compositional disorder, owing to enhanced Anderson localization and strong phonon-phonon anharmonic interactions. The enhanced electrical conductivity and weakened thermal conductivity make SiyGeySnx MEAs with high Sn content promising functional materials for thermoelectric applications. Our work demonstrates that HEAs/MEAs not only represent a new class of structural alloys but also a novel category of functional alloys with unique electrical and thermal transport properties.
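
To make the entropy tuning concrete, the sketch below evaluates the ideal-mixing configurational entropy per atom, S / k_B = -Σ c_i ln c_i, for a composition with Sn fraction x and equal Si and Ge fractions y = (1 - x) / 2. Using the ideal-mixing formula is an assumption; the abstract does not say how the entropy was computed, and the function name is hypothetical.

```python
import math


def config_entropy_per_kB(x_sn):
    """Ideal configurational entropy per atom, in units of k_B.

    Assumes random mixing on a single lattice with fractions
    Si = Ge = (1 - x_sn) / 2 and Sn = x_sn.
    """
    y = (1.0 - x_sn) / 2.0
    fractions = [y, y, x_sn]
    # Terms with zero fraction contribute nothing (c * ln c -> 0 as c -> 0).
    return -sum(c * math.log(c) for c in fractions if c > 0)


if __name__ == "__main__":
    for x in (0.0, 0.1, 0.2, 1.0 / 3.0):
        print(f"x = {x:.3f}  S/k_B = {config_entropy_per_kB(x):.4f}")
```

Under this assumption the entropy rises from ln 2 at x = 0 (an equimolar binary) to ln 3 at x = 1/3 (an equimolar ternary), which is the maximum for three components.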

Preprint, 2020, arXiv

Learned Low Precision Graph Neural Networks

Deep Graph Neural Networks (GNNs) show promising performance on a range of graph tasks, yet at present are costly to run and lack many of the optimisations applied to DNNs. We show, for the first time, how to systematically quantise GNNs with minimal or no loss in performance using Network Architecture Search (NAS). We define the possible quantisation search space of GNNs. The proposed novel NAS mechanism, named Low Precision Graph NAS (LPGNAS), constrains both architecture and quantisation choices to be differentiable. LPGNAS learns the optimal architecture coupled with the best quantisation strategy for different components in the GNN automatically using back-propagation in a single search round. On eight different datasets, solving the task of classifying unseen nodes in a graph, LPGNAS generates quantised models with significant reductions in both model and buffer sizes but with similar accuracy to manually designed networks and other NAS results. In particular, on the Pubmed dataset, LPGNAS shows a better size-accuracy Pareto frontier compared to seven other manual and searched baselines, offering a 2.3 times reduction in model size but a 0.4% increase in accuracy when compared to the best NAS competitor. Finally, from our collected quantisation statistics on a wide range of datasets, we suggest a W4A8 (4-bit weights, 8-bit activations) quantisation strategy might be the bottleneck for naive GNN quantisations.
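
For readers unfamiliar with the W4A8 notation, the sketch below shows a simple symmetric uniform quantiser applied with 4 bits to weights and 8 bits to activations. It is a generic illustration only; the quantisers searched by LPGNAS are not described in the abstract, and `quantise` and `dequantise` are hypothetical helper names.

```python
def quantise(values, bits):
    """Symmetric uniform quantisation of floats to signed `bits`-bit integers.

    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantise(codes, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in codes]


if __name__ == "__main__":
    weights = [0.82, -0.31, 0.05, -0.77]      # illustrative values
    activations = [1.9, 0.0, 0.4, 3.2]        # illustrative values
    w_codes, w_scale = quantise(weights, bits=4)      # "W4"
    a_codes, a_scale = quantise(activations, bits=8)  # "A8"
    print(w_codes, dequantise(w_codes, w_scale))
    print(a_codes, dequantise(a_codes, a_scale))
```

With only 15 usable levels at 4 bits, small weights collapse toward zero, which gives some intuition for why low weight precision can become limiting.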

Preprint, 2020, arXiv

Probabilistic Dual Network Architecture Search on Graphs

We present the first differentiable Network Architecture Search (NAS) for Graph Neural Networks (GNNs). GNNs show promising performance on a wide range of tasks, but require a large amount of architecture engineering. First, graphs are inherently a non-Euclidean and sophisticated data structure, leading to poor adaptivity of GNN architectures across different datasets. Second, a typical graph block contains numerous different components, such as aggregation and attention, generating a large combinatorial search space. To counter these problems, we propose a Probabilistic Dual Network Architecture Search (PDNAS) framework for GNNs. PDNAS not only optimises the operations within a single graph block (micro-architecture), but also considers how these blocks should be connected to each other (macro-architecture). The dual architecture (micro- and macro-architectures) optimisation allows PDNAS to find deeper GNNs on diverse datasets with better performance compared to other graph NAS methods. Moreover, we use a fully gradient-based search approach to update architectural parameters, making it the first differentiable graph NAS method. PDNAS outperforms existing hand-designed GNNs and NAS results; for example, on the PPI dataset, PDNAS beats its best competitors by 1.67 and 0.17 in F1 scores.
