Researcher profile

Haibin Shen

Haibin Shen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2021arXiv

A Low Power In-Memory Multiplication andAccumulation Array with Modified Radix-4 Inputand Canonical Signed Digit Weights

A mass of data transfer between the processing and storage units has been the leading bottleneck in modern Von-Neuman computing systems, especially when used for Artificial Intelligence (AI) tasks. Computing-in-Memory (CIM) has shown great potential to reduce both latency and power consumption. However, the conventional analog CIM schemes are suffering from reliability issues, which may significantly degenerate the accuracy of the computation. Recently, CIM schemes with digitized input data and weights have been proposed for high reliable computing. However, the properties of the digital memory and input data are not fully utilized. This paper presents a novel low power CIM scheme to further reduce the power consumption by using a Modified Radix-4 (M-RD4) booth algorithm at the input and a Modified Canonical Signed Digit (M-CSD) for the network weights. The simulation results show that M-Rd4 and M-CSD reduce the ratio of $1\times1$ by 78.5\% on LeNet and 80.2\% on AlexNet, and improve the computing efficiency by 41.6\% in average. The computing-power rate at the fixed-point 8-bit is 60.68 TOPS/s/W.

preprint2020arXiv

A GAN-based Tunable Image Compression System

The method of importance map has been widely adopted in DNN-based lossy image compression to achieve bit allocation according to the importance of image contents. However, insufficient allocation of bits in non-important regions often leads to severe distortion at low bpp (bits per pixel), which hampers the development of efficient content-weighted image compression systems. This paper rethinks content-based compression by using Generative Adversarial Network (GAN) to reconstruct the non-important regions. Moreover, multiscale pyramid decomposition is applied to both the encoder and the discriminator to achieve global compression of high-resolution images. A tunable compression scheme is also proposed in this paper to compress an image to any specific compression ratio without retraining the model. The experimental results show that our proposed method improves MS-SSIM by more than 10.3% compared to the recently reported GAN-based method to achieve the same low bpp (0.05) on the Kodak dataset.

preprint2020arXiv

An 8-bit In Resistive Memory Computing Core with Regulated Passive Neuron and Bit Line Weight Mapping

The rapid development of Artificial Intelligence (AI) and Internet of Things (IoT) increases the requirement for edge computing with low power and relatively high processing speed devices. The Computing-In-Memory(CIM) schemes based on emerging resistive Non-Volatile Memory(NVM) show great potential in reducing the power consumption for AI computing. However, the device inconsistency of the non-volatile memory may significantly degenerate the performance of the neural network. In this paper, we propose a low power Resistive RAM (RRAM) based CIM core to not only achieve high computing efficiency but also greatly enhance the robustness by bit line regulator and bit line weight mapping algorithm. The simulation results show that the power consumption of our proposed 8-bit CIM core is only 3.61mW (256*256). The SFDR and SNDR of the CIM core achieve 59.13 dB and 46.13 dB, respectively. The proposed bit line weight mapping scheme improves the top-1 accuracy by 2.46% and 3.47% for AlexNet and VGG16 on ImageNet Large Scale Visual Recognition Competition 2012 (ILSVRC 2012) in 8-bit mode, respectively.

preprint2020arXiv

An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs

Deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in a wide range of applications. However, deeper CNN models, which are usually computation consuming, are widely required for complex Artificial Intelligence (AI) tasks. Though recent research progress on network compression such as pruning has emerged as a promising direction to mitigate computational burden, existing accelerators are still prevented from completely utilizing the benefits of leveraging sparsity owing to the irregularity caused by pruning. On the other hand, Field-Programmable Gate Arrays (FPGAs) have been regarded as a promising hardware platform for CNN inference acceleration. However, most existing FPGA accelerators focus on dense CNN and cannot address the irregularity problem. In this paper, we propose a sparse wise dataflow to skip the cycles of processing Multiply-and-Accumulates (MACs) with zero weights and exploit data statistics to minimize energy through zeros gating to avoid unnecessary computations. The proposed sparse wise dataflow leads to a low bandwidth requirement and a high data sharing. Then we design an FPGA accelerator containing a Vector Generator Module (VGM) which can match the index between sparse weights and input activations according to the proposed dataflow. Experimental results demonstrate that our implementation can achieve 987 imag/s and 48 imag/s performance for AlexNet and VGG-16 on Xilinx ZCU102, respectively, which provides 1.5x to 6.7x speedup and 2.0x to 6.2x energy-efficiency over previous CNN FPGA accelerators.

preprint2020arXiv

GAC-GAN: A General Method for Appearance-Controllable Human Video Motion Transfer

Human video motion transfer has a wide range of applications in multimedia, computer vision and graphics. Recently, due to the rapid development of Generative Adversarial Networks (GANs), there has been significant progress in the field. However, almost all existing GAN-based works are prone to address the mapping from human motions to video scenes, with scene appearances are encoded individually in the trained models. Therefore, each trained model can only generate videos with a specific scene appearance, new models are required to be trained to generate new appearances. Besides, existing works lack the capability of appearance control. For example, users have to provide video records of wearing new clothes or performing in new backgrounds to enable clothes or background changing in their synthetic videos, which greatly limits the application flexibility. In this paper, we propose GAC-GAN, a general method for appearance-controllable human video motion transfer. To enable general-purpose appearance synthesis, we propose to include appearance information in the conditioning inputs. Thus, once trained, our model can generate new appearances by altering the input appearance information. To achieve appearance control, we first obtain the appearance-controllable conditioning inputs and then utilize a two-stage GAC-GAN to generate the corresponding appearance-controllable outputs, where we utilize an ACGAN loss and a shadow extraction module for output foreground and background appearance control respectively. We further build a solo dance dataset containing a large number of dance videos for training and evaluation. Experimental results show that, our proposed GAC-GAN can not only support appearance-controllable human video motion transfer but also achieve higher video quality than state-of-art methods.