Source author record

Fengbo Ren

Fengbo Ren appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Information Theory math.IT Distributed, Parallel, and Cluster Computing Machine Learning Computation and Language Hardware Architecture Performance

Catalog footprint

What is connected

8works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

This paper presents HALO 1.0, an open-ended extensible multi-agent software framework that implements a set of proposed hardware-agnostic accelerator orchestration (HALO) principles. HALO implements a novel compute-centric message passing interface (C^2MPI) specification for enabling the performance portable execution of a hardware-agnostic host application across heterogeneous accelerators. The experiment results of evaluating eight widely used HPC subroutines based on Intel Xeon E5-2620 CPUs, Intel Arria 10 GX FPGAs, and NVIDIA GeForce RTX 2080 Ti GPUs show that HALO 1.0 allows for a unified control flow for host programs to run across all the computing devices with a consistently top performance portability score, which is up to five orders of magnitude higher than the OpenCL-based solution.

preprint2021arXiv

A Survey of System Architectures and Techniques for FPGA Virtualization

FPGA accelerators are gaining increasing attention in both cloud and edge computing because of their hardware flexibility, high computational throughput, and low power consumption. However, the design flow of FPGAs often requires specific knowledge of the underlying hardware, which hinders the wide adoption of FPGAs by application developers. Therefore, the virtualization of FPGAs becomes extremely important to create a useful abstraction of the hardware suitable for application developers. Such abstraction also enables the sharing of FPGA resources among multiple users and accelerator applications, which is important because, traditionally, FPGAs have been mostly used in single-user, single-embedded-application scenarios. There are many works in the field of FPGA virtualization covering different aspects and targeting different application areas. In this survey, we review the system architectures used in the literature for FPGA virtualization. In addition, we identify the primary objectives of FPGA virtualization, based on which we summarize the techniques for realizing FPGA virtualization. This survey helps researchers to efficiently learn about FPGA virtualization research by providing a comprehensive review of the existing literature.

preprint2020arXiv

Learning in the Frequency Domain

Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.

preprint2020arXiv

MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time

Monocular multi-object detection and localization in 3D space has been proven to be a challenging task. The MoNet3D algorithm is a novel and effective framework that can predict the 3D position of each object in a monocular image and draw a 3D bounding box for each object. The MoNet3D method incorporates prior knowledge of the spatial geometric correlation of neighbouring objects into the deep neural network training process to improve the accuracy of 3D object localization. Experiments on the KITTI dataset show that the accuracy for predicting the depth and horizontal coordinates of objects in 3D space can reach 96.25\% and 94.74\%, respectively. Moreover, the method can realize the real-time image processing at 27.85 FPS, showing promising potential for embedded advanced driving-assistance system applications. Our code is publicly available at https://github.com/CQUlearningsystemgroup/YicongPeng.

preprint2016arXiv

A Binary Convolutional Encoder-decoder Network for Real-time Natural Scene Text Processing

In this paper, we develop a binary convolutional encoder-decoder network (B-CEDNet) for natural scene text processing (NSTP). It converts a text image to a class-distinguished salience map that reveals the categorical, spatial and morphological information of characters. The existing solutions are either memory consuming or run-time consuming that cannot be applied to real-time applications on resource-constrained devices such as advanced driver assistance systems. The developed network can process multiple regions containing characters by one-off forward operation, and is trained to have binary weights and binary feature maps, which lead to both remarkable inference run-time speedup and memory usage reduction. By training with over 200, 000 synthesis scene text images (size of $32\times128$), it can achieve $90\%$ and $91\%$ pixel-wise accuracy on ICDAR-03 and ICDAR-13 datasets. It only consumes $4.59\ ms$ inference run-time realized on GPU with a small network size of 2.14 MB, which is up to $8\times$ faster and $96\%$ smaller than it full-precision version.

preprint2016arXiv

A Data-Driven Compressive Sensing Framework for Long-Term Health Monitoring

Compressive sensing (CS) is a promising technology for realizing energy-efficient wireless sensors for long-term health monitoring. In this paper, we propose a data-driven CS framework that learns signal characteristics and individual variability from patients' data to significantly enhance CS performance and noise resilience. This is accomplished by a co-training approach that optimizes both the sensing matrix and dictionary towards improved restricted isometry property (RIP) and signal sparsity, respectively. Experimental results upon ECG signals show that our framework is able to achieve better reconstruction quality with up to 80% higher compression ratio (CP) than conventional frameworks based on random sensing matrices and overcomplete bases. In addition, our framework shows great noise resilience capability, which tolerates up to 40dB higher noise energy at a CP of 9 times.

preprint2016arXiv

A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

Compressive sensing (CS) is a promising technology for realizing energy-efficient wireless sensors for long-term health monitoring. However, conventional model-driven CS frameworks suffer from limited compression ratio and reconstruction quality when dealing with physiological signals due to inaccurate models and the overlook of individual variability. In this paper, we propose a data-driven CS framework that can learn signal characteristics and personalized features from any individual recording of physiologic signals to enhance CS performance with a minimized number of measurements. Such improvements are accomplished by a co-training approach that optimizes the sensing matrix and the dictionary towards improved restricted isometry property and signal sparsity, respectively. Experimental results upon ECG signals show that the proposed method, at a compression ratio of 10x, successfully reduces the isometry constant of the trained sensing matrices by 86% against random matrices and improves the overall reconstructed signal-to-noise ratio by 15dB over conventional model-driven approaches.

preprint2016arXiv

An Energy-Efficient Compressive Sensing Framework Incorporating Online Dictionary Learning for Long-term Wireless Health Monitoring

Wireless body area network (WBAN) is emerging in the mobile healthcare area to replace the traditional wire-connected monitoring devices. As wireless data transmission dominates power cost of sensor nodes, it is beneficial to reduce the data size without much information loss. Compressive sensing (CS) is a perfect candidate to achieve this goal compared to existing compression techniques. In this paper, we proposed a general framework that utilize CS and online dictionary learning (ODL) together. The learned dictionary carries individual characteristics of the original signal, under which the signal has an even sparser representation compared to pre-determined dictionaries. As a consequence, the compression ratio is effectively improved by 2-4x comparing to prior works. Besides, the proposed framework offloads pre-processing from sensor nodes to the server node prior to dictionary learning, providing further reduction in hardware costs. As it is data driven, the proposed framework has the potential to be used with a wide range of physiological signals.

Fengbo Ren

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

A Survey of System Architectures and Techniques for FPGA Virtualization

Learning in the Frequency Domain

MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time

A Binary Convolutional Encoder-decoder Network for Real-time Natural Scene Text Processing

A Data-Driven Compressive Sensing Framework for Long-Term Health Monitoring

A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

An Energy-Efficient Compressive Sensing Framework Incorporating Online Dictionary Learning for Long-term Wireless Health Monitoring