Source author record

Haoran You

Haoran You appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Hardware Architecture Artificial Intelligence Computer Vision

Catalog footprint

What is connected

5works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI

The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, have significantly impacted various aspects of our lives. However, the current challenges surrounding unsustainable computational trajectories, limited robustness, and a lack of explainability call for the development of next-generation AI systems. Neuro-symbolic AI (NSAI) emerges as a promising paradigm, fusing neural, symbolic, and probabilistic approaches to enhance interpretability, robustness, and trustworthiness while facilitating learning from much less data. Recent NSAI systems have demonstrated great potential in collaborative human-AI scenarios with reasoning and cognitive capabilities. In this paper, we provide a systematic review of recent progress in NSAI and analyze the performance characteristics and computational operators of NSAI models. Furthermore, we discuss the challenges and potential future directions of NSAI from both system and architectural perspectives.

preprint2022arXiv

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is as critical but even more challenging. The hurdles arise from the poor data locality and redundant computation due to the large size, high sparsity, and irregular non-zero distribution of real-world graphs. In this paper we propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation. The mechanism is a new online graph restructuring algorithm we refer to as islandization. The proposed algorithm finds clusters of nodes with strong internal but weak external connections. The islandization process yields two major benefits. First, by processing islands rather than individual nodes, there is better on-chip data reuse and fewer off-chip memory accesses. Second, there is less redundant computation as aggregation for common/shared neighbors in an island can be reused. The parallel search, identification, and leverage of graph islands are all handled purely in hardware at runtime working in an incremental pipeline. This is done without any preprocessing of the graph data or adjustment of the GCN model structure. Experimental results show that I-GCN can significantly reduce off-chip accesses and prune 38% of aggregation operations, leading to performance speedups over CPUs, GPUs, the prior art GCN accelerators of 5549x, 403x, and 5.7x on average, respectively.

preprint2022arXiv

LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference

Low precision deep neural network (DNN) training is one of the most effective techniques for boosting DNNs' training efficiency, as it trims down the training cost from the finest bit level. While existing works mostly fix the model precision during the whole training process, a few pioneering works have shown that dynamic precision schedules help DNNs converge to a better accuracy while leading to a lower training cost than their static precision training counterparts. However, existing dynamic low precision training methods rely on manually designed precision schedules to achieve advantageous efficiency and accuracy trade-offs, limiting their more comprehensive practical applications and achievable performance. To this end, we propose LDP, a Learnable Dynamic Precision DNN training framework that can automatically learn a temporally and spatially dynamic precision schedule during training towards optimal accuracy and efficiency trade-offs. It is worth noting that LDP-trained DNNs are by nature efficient during inference. Furthermore, we visualize the resulting temporal and spatial precision schedule and distribution of LDP trained DNNs on different tasks to better understand the corresponding DNNs' characteristics at different training stages and DNN layers both during and after training, drawing insights for promoting further innovations. Extensive experiments and ablation studies (seven networks, five datasets, and three tasks) show that the proposed LDP consistently outperforms state-of-the-art (SOTA) low precision DNN training techniques in terms of training efficiency and achieved accuracy trade-offs. For example, in addition to having the advantage of being automated, our LDP achieves a 0.31\% higher accuracy with a 39.1\% lower computational cost when training ResNet-20 on CIFAR-10 as compared with the best SOTA method.

preprint2020arXiv

Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling

Recent techniques built on Generative Adversarial Networks (GANs), such as Cycle-Consistent GANs, are able to learn mappings among different domains built from unpaired datasets, through min-max optimization games between generators and discriminators. However, it remains challenging to stabilize the training process and thus cyclic models fall into mode collapse accompanied by the success of discriminator. To address this problem, we propose an novel Bayesian cyclic model and an integrated cyclic framework for inter-domain mappings. The proposed method motivated by Bayesian GAN explores the full posteriors of cyclic model via sampling latent variables and optimizes the model with maximum a posteriori (MAP) estimation. Hence, we name it Bayesian CycleGAN. In addition, original CycleGAN cannot generate diversified results. But it is feasible for Bayesian framework to diversify generated images by replacing restricted latent variables in inference process. We evaluate the proposed Bayesian CycleGAN on multiple benchmark datasets, including Cityscapes, Maps, and Monet2photo. The proposed method improve the per-pixel accuracy by 15% for the Cityscapes semantic segmentation task within origin framework and improve 20% within the proposed integrated framework, showing better resilience to imbalance confrontation. The diversified results of Monet2Photo style transfer also demonstrate its superiority over original cyclic model. We provide codes for all of our experiments in https://github.com/ranery/Bayesian-CycleGAN.

preprint2020arXiv

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2. To our best knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas: sparsification or pruning, decomposition, and quantization, into one unified framework. The resulting sparse and readily-quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including merely sparsification or pruning, decomposition, and quantization, in various ablation studies based on nine DNN models and four datasets; and 2) on the hardware level, the proposed SmartExchange based accelerator can improve the energy efficiency by up to 6.7$\times$ and the speedup by up to 19.2$\times$ over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (including four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.

Haoran You

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference

Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation