Researcher profile

Arash Fayyazi

Arash Fayyazi contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 19 (Unverified), Verification L1, Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

Preprint, 2022, arXiv

Efficient Compilation and Mapping of Fixed Function Combinational Logic onto Digital Signal Processors Targeting Neural Network Inference and Utilizing High-level Synthesis

Recent efforts for improving the performance of neural network (NN) accelerators that meet today's application requirements have given rise to a new trend of logic-based NN inference relying on fixed function combinational logic. Mapping such large Boolean functions with many input variables and product terms to digital signal processors (DSPs) on field-programmable gate arrays (FPGAs) needs a novel framework considering the structure and the reconfigurability of DSP blocks during this process. The proposed methodology in this paper maps the fixed function combinational logic blocks to a set of Boolean functions where Boolean operations corresponding to each function are mapped to DSP devices rather than look-up tables (LUTs) on the FPGAs to take advantage of the high performance, low latency, and parallelism of DSP blocks. This paper also presents an innovative design and optimization methodology for compilation and mapping of NNs, utilizing fixed function combinational logic to DSPs on FPGAs employing high-level synthesis flow. Our experimental evaluations across several datasets and selected NNs demonstrate the comparable performance of our framework in terms of the inference latency and output accuracy compared to prior art FPGA-based NN accelerators employing DSPs.

Preprint, 2022, arXiv

Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state-of-the-art hardware accelerator for supporting lightweight neural networks. Specifically, the SPS dataflow enables a novel hardware design approach unlocked by an emergent pruning scheme, periodic pattern-based sparsity (PPS). By exploiting the regularity of PPS, our sparsity-aware compiler optimally reorders the weights and uses a simple indexing unit in hardware to create matches between the weights and activations. Through the compiler-hardware codesign, SPS dataflow enjoys higher degrees of parallelism while being free of the high indexing overhead and without model accuracy loss. Evaluated on popular benchmarks such as VGG and ResNet, the SPS dataflow and accompanying neural network compiler outperform prior work in convolutional neural network (CNN) accelerator designs targeting FPGA devices. Against other sparsity-supporting weight storage formats, SPS results in 4.49x energy efficiency gain while lowering storage requirements by 3.67x for total weight storage (non-pruned weights plus indexing) and 22,044x for indexing memory.
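The abstract does not define periodic pattern-based sparsity in detail, so the following is only a minimal sketch under a stated assumption: kernel i is pruned with mask number i modulo the period, so the hardware needs one index list per pattern rather than one per kernel. The function names `compress` and `sparse_dot` are hypothetical and are not taken from the paper.

```python
def compress(kernels, masks):
    """Keep only the weights each kernel's mask retains.

    Assumption (not from the paper): kernel i uses masks[i % len(masks)],
    so the sparsity pattern repeats with period len(masks).
    """
    period = len(masks)
    return [
        [w for w, keep in zip(kernel, masks[i % period]) if keep]
        for i, kernel in enumerate(kernels)
    ]


def sparse_dot(compressed, masks, activations):
    """Dot each compressed kernel with the activations it was paired with."""
    period = len(masks)
    # One position list per pattern; this is the only index storage needed.
    positions = [[j for j, keep in enumerate(m) if keep] for m in masks]
    outputs = []
    for i, weights in enumerate(compressed):
        idx = positions[i % period]
        outputs.append(sum(w * activations[j] for w, j in zip(weights, idx)))
    return outputs
```

Under this assumption the index memory grows with the number of patterns, not with the number of kernels, which is one way a regular pattern could keep indexing overhead small.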

Preprint, 2020, arXiv

HIPE-MAGIC: A Technology-Aware Synthesis and Mapping Flow for HIghly Parallel Execution of Memristor-Aided LoGIC

Recent efforts for finding novel computing paradigms that meet today's design requirements have given rise to a new trend of processing-in-memory relying on non-volatile memories. In this paper, we present HIPE-MAGIC, a technology-aware synthesis and mapping flow for highly parallel execution of the memristor-based logic. Our framework is built upon two fundamental contributions: balancing techniques during the logic synthesis, mainly targeting benefits of the parallelism offered by memristive crossbar arrays (MCAs), and an efficient technology mapping framework to maximize the performance and area-efficiency of the memristor-based logic. Our experimental evaluations across several benchmark suites demonstrate the superior performance of HIPE-MAGIC in terms of throughput and energy efficiency compared to recently developed synthesis and mapping flows targeting MCAs, as well as the conventional CPU computing.
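As a rough illustration of why logic depth matters for parallel execution, the sketch below groups the gates of a NOR-only netlist by logic level, on the simplifying assumption that all gates in one level can be evaluated in the same step. This is not the HIPE-MAGIC flow: real crossbar mapping has placement and alignment constraints that are ignored here, and the names `levelize`, `evaluate`, and `XOR_NETLIST` are hypothetical.

```python
def nor(*inputs):
    """NOR of one or more bits; with a single input it acts as NOT."""
    return int(not any(inputs))


# XOR built from NOR gates only; keys are gate names, values are fan-ins.
# Gates are listed in topological order.
XOR_NETLIST = {
    "n1": ("a", "b"),
    "n2": ("a", "n1"),
    "n3": ("b", "n1"),
    "n4": ("n2", "n3"),   # XNOR of a and b
    "xor": ("n4",),       # single-input NOR inverts XNOR into XOR
}


def levelize(netlist, primary_inputs):
    """Group gates by logic level (1 + the deepest fan-in level)."""
    level = {name: 0 for name in primary_inputs}
    for gate, fanins in netlist.items():
        level[gate] = 1 + max(level[f] for f in fanins)
    groups = {}
    for gate in netlist:
        groups.setdefault(level[gate], []).append(gate)
    return [groups[lvl] for lvl in sorted(groups)]


def evaluate(netlist, inputs):
    """Evaluate level by level; gates within a level are independent."""
    values = dict(inputs)
    for group in levelize(netlist, inputs):
        step = {g: nor(*(values[f] for f in netlist[g])) for g in group}
        values.update(step)
    return values
```

In this toy netlist, five gates collapse into four steps because `n2` and `n3` share a level; reducing the number of levels is the kind of effect that balancing during synthesis aims for.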

Preprint, 2020, arXiv

Logic Verification of Ultra-Deep Pipelined Beyond-CMOS Technologies

Traditional logical equivalence checking (LEC), which plays a major role in the entire chip design process, faces challenges in meeting the requirements of the many emerging technologies that are based on logic models different from standard complementary metal oxide semiconductor (CMOS). In this paper, we propose an LEC framework to be employed in the verification process of beyond-CMOS circuits. Our LEC framework is compatible with existing CMOS technologies but is also able to check features and capabilities that are unique to beyond-CMOS technologies. For instance, the performance of some emerging technologies benefits from ultra-deep pipelining, and verification of such circuits requires new models and algorithms. We therefore present the Multi-Cycle Input Dependency (MCID) circuit model, a novel representation of a design that explicitly captures the dependency of the primary outputs of the circuit on sequences of internal signals and inputs. By embedding the proposed circuit model and several structural checking modules, the verification process can be made independent of the underlying technology and signaling. We benchmark the proposed framework on post-synthesis rapid single-flux-quantum (RSFQ) netlists. Results show a comparative verification time of RSFQ circuit benchmarks, including a 32-bit Kogge-Stone adder, a 16-bit integer divider, and ISCAS'85 circuits, with respect to the ABC tool for similar CMOS circuits.

Preprint, 2020, arXiv

SynergicLearning: Neural Network-Based Feature Extraction for Highly-Accurate Hyperdimensional Learning

Machine learning models differ in terms of accuracy, computational/memory complexity, training time, and adaptability among other characteristics. For example, neural networks (NNs) are well-known for their high accuracy due to the quality of their automatic feature extraction while brain-inspired hyperdimensional (HD) learning models are famous for their quick training, computational efficiency, and adaptability. This work presents a hybrid, synergic machine learning model that excels at all the said characteristics and is suitable for incremental, on-line learning on a chip. The proposed model comprises an NN and a classifier. The NN acts as a feature extractor and is specifically trained to work well with the classifier that employs the HD computing framework. This work also presents a parameterized hardware implementation of the said feature extraction and classification components while introducing a compiler that maps any arbitrary NN and/or classifier to the aforementioned hardware. The proposed hybrid machine learning model has the same level of accuracy (i.e., within ±1%) as NNs while achieving at least 10% improvement in accuracy compared to HD learning models. Additionally, the end-to-end hardware realization of the hybrid model improves power efficiency by 1.60x compared to state-of-the-art, high-performance HD learning implementations while improving latency by 2.13x. These results have profound implications for the application of such synergic models in challenging cognitive tasks.
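To make the HD classification stage more concrete, here is a minimal sketch of a generic hyperdimensional classifier. It is an illustration under assumptions, not the SynergicLearning implementation: features are encoded with a random bipolar projection, class prototypes are running sums of encoded samples, and the NN feature extractor is omitted, so inputs are assumed to be already-extracted feature vectors. All function names are hypothetical.

```python
import random


def make_projection(num_features, dim, seed=0):
    """Random bipolar (+1/-1) projection matrix with `dim` rows."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(num_features)]
            for _ in range(dim)]


def encode(features, projection):
    """Map a feature vector to a bipolar hypervector (sign of projection)."""
    return [1 if sum(w * x for w, x in zip(row, features)) >= 0 else -1
            for row in projection]


def train(samples, labels, projection):
    """Accumulate encoded samples into one prototype per class.

    Training is a single pass of additions, so new samples can be
    folded in incrementally.
    """
    prototypes = {}
    for features, label in zip(samples, labels):
        hv = encode(features, projection)
        acc = prototypes.setdefault(label, [0] * len(hv))
        for i, v in enumerate(hv):
            acc[i] += v
    return prototypes


def classify(features, prototypes, projection):
    """Return the label whose prototype is most similar to the query."""
    hv = encode(features, projection)

    def similarity(proto):
        dot = sum(a * b for a, b in zip(hv, proto))
        norm = sum(b * b for b in proto) ** 0.5
        return dot / norm if norm else 0.0

    return max(prototypes, key=lambda label: similarity(prototypes[label]))
```

In the hybrid model described above, the vectors passed to `encode` would come from the trained NN feature extractor rather than from raw inputs.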