Source author record

Mostafa E. Salehi

Mostafa E. Salehi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.SP Hardware Architecture Machine Learning Distributed, Parallel, and Cluster Computing Neural and Evolutionary Computing

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

MuBiNN: Multi-Level Binarized Recurrent Neural Network for EEG signal Classification

Recurrent Neural Networks (RNN) are widely used for learning sequences in applications such as EEG classification. Complex RNNs could be hardly deployed on wearable devices due to their computation and memory-intensive processing patterns. Generally, reduction in precision leads much more efficiency and binarized RNNs are introduced as energy-efficient solutions. However, naive binarization methods lead to significant accuracy loss in EEG classification. In this paper, we propose a multi-level binarized LSTM, which significantly reduces computations whereas ensuring an accuracy pretty close to the full precision LSTM. Our method reduces the delay of the 3-bit LSTM cell operation 47* with less than 0.01% accuracy loss.

preprint2020arXiv

Multi-level Binarized LSTM in EEG Classification for Wearable Devices

Long Short-Term Memory (LSTM) is widely used in various sequential applications. Complex LSTMs could be hardly deployed on wearable and resourced-limited devices due to the huge amount of computations and memory requirements. Binary LSTMs are introduced to cope with this problem, however, they lead to significant accuracy loss in some application such as EEG classification which is essential to be deployed in wearable devices. In this paper, we propose an efficient multi-level binarized LSTM which has significantly reduced computations whereas ensuring an accuracy pretty close to full precision LSTM. By deploying 5-level binarized weights and inputs, our method reduces area and delay of MAC operation about 31* and 27* in 65nm technology, respectively with less than 0.01% accuracy loss. In contrast to many compute-intensive deep-learning approaches, the proposed algorithm is lightweight, and therefore, brings performance efficiency with accurate LSTM-based EEG classification to real-time wearable devices.

preprint2012arXiv

Design Space Exploration to Find the Optimum Cache and Register File Size for Embedded Applications

In the future, embedded processors must process more computation-intensive network applications and internet traffic and packet-processing tasks become heavier and sophisticated. Since the processor performance is severely related to the average memory access delay and also the number of processor registers affects the performance, cache and register file are two major parts in designing embedded processor architecture. Although increasing cache and register file size leads to performance improvement in embedded applications and packet-processing tasks in high traffic networks with too much packets, the increased area, power consumption and memory hierarchy delay are the overheads of these techniques. Therefore, implementing these components in the optimum size is of significant interest in the design of embedded processors. This paper explores the effect of cache and register file size on the processor performance to calculate the optimum size of these components for embedded applications. Experimental results show that although having bigger cache and register file is one of the performance improvement approaches in embedded processors, however, by increasing the size of these parameters over a threshold level, performance improvement is saturated and then, decreased.

preprint2012arXiv

Performance-Optimum Superscalar Architecture for Embedded Applications

Embedded applications are widely used in portable devices such as wireless phones, personal digital assistants, laptops, etc. High throughput and real time requirements are especially important in such data-intensive tasks. Therefore, architectures that provide the required performance are the most desirable. On the other hand, processor performance is severely related to the average memory access delay, number of processor registers and also size of the instruction window and superscalar parameters. Therefore, cache, register file and superscalar parameters are the major architectural concerns in designing a superscalar architecture for embedded processors. Although increasing cache and register file size leads to performance improvements in high performance embedded processors, the increased area, power consumption and memory delay are the overheads of these techniques. This paper explores the effect of cache, register file and superscalar parameters on the processor performance to specify the optimum size of these parameters for embedded applications. Experimental results show that although having bigger size of these parameters is one of the performance improvement approaches in embedded processors, however, by increasing the size of some parameters over a threshold value, performance improvement is saturated and especially in cache size, increments over this threshold value decrease the performance.

Mostafa E. Salehi

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

MuBiNN: Multi-Level Binarized Recurrent Neural Network for EEG signal Classification

Multi-level Binarized LSTM in EEG Classification for Wearable Devices

Design Space Exploration to Find the Optimum Cache and Register File Size for Embedded Applications

Performance-Optimum Superscalar Architecture for Embedded Applications