Researcher profile

Shawki Areibi

Shawki Areibi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - Baseline
2works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2016arXiv

Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs to significantly improve their classification and training times. With these improvements, many frameworks have become available for implementing CNNs on both CPUs and GPUs, with no support for FPGA implementations. In this work we present a modified version of the popular CNN framework Caffe, with FPGA support. This allows for classification using CNN models and specialized FPGA implementations with the flexibility of reprogramming the device when necessary, seamless memory transactions between host and device, simple-to-use test benches, and the ability to create pipelined layer implementations. To validate the framework, we use the Xilinx SDAccel environment to implement an FPGA-based Winograd convolution engine and show that the FPGA layer can be used alongside other layers running on a host processor to run several popular CNNs (AlexNet, GoogleNet, VGG A, Overfeat). The results show that our framework achieves 50 GFLOPS across 3x3 convolutions in the benchmarks. This is achieved within a practical framework, which will aid in future development of FPGA-based CNNs.

preprint2016arXiv

Deep Learning on FPGAs: Past, Present, and Future

The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Instead of engineering algorithms by hand, the ability to learn composable systems automatically from massive amounts of data has led to ground-breaking performance in important domains such as computer vision, speech recognition, and natural language processing. The most popular class of techniques used in these domains is called deep learning, and is seeing significant attention from industry. However, these models require incredible amounts of data and compute power to train, and are limited by the need for better hardware acceleration to accommodate scaling beyond current data and model sizes. While the current solution has been to use clusters of graphics processing units (GPU) as general purpose processors (GPGPU), the use of field programmable gate arrays (FPGA) provide an interesting alternative. Current trends in design tools for FPGAs have made them more compatible with the high-level software practices typically practiced in the deep learning community, making FPGAs more accessible to those who build and deploy models. Since FPGA architectures are flexible, this could also allow researchers the ability to explore model-level optimizations beyond what is possible on fixed architectures such as GPUs. As well, FPGAs tend to provide high performance per watt of power consumption, which is of particular importance for application scientists interested in large scale server-based deployment or resource-limited embedded applications. This review takes a look at deep learning and FPGAs from a hardware acceleration perspective, identifying trends and innovations that make these technologies a natural fit, and motivates a discussion on how FPGAs may best serve the needs of the deep learning community moving forward.