Source author record

Ehsan Saboori

Ehsan Saboori appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning; Artificial Intelligence; Computer Vision; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture

Catalog footprint

What is connected

11 works

6 topics

4 close collaborators


Preprint, 2022, arXiv

Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime

Deep Learning has been one of the most disruptive technological advancements in recent times. The high performance of deep learning models comes at the expense of high computational, storage and power requirements. Sensing the immediate need for accelerating and compressing these models to improve on-device performance, we introduce Deeplite Neutrino for production-ready optimization of the models and Deeplite Runtime for deployment of ultra-low bit quantized models on Arm-based platforms. We implement low-level quantization kernels for Armv7 and Armv8 architectures enabling deployment on the vast array of 32-bit and 64-bit Arm-based devices. With efficient implementations using vectorization, parallelization, and tiling, we realize speedups of up to 2x and 2.2x compared to TensorFlow Lite with XNNPACK backend on classification and detection models, respectively. We also achieve significant speedups of up to 5x and 3.2x compared to ONNX Runtime for classification and detection models, respectively.

Preprint, 2022, arXiv

QReg: On Regularization Effects of Quantization

In this paper we study the effects of quantization in DNN training. We hypothesize that weight quantization is a form of regularization and the amount of regularization is correlated with the quantization level (precision). We confirm our hypothesis by providing analytical study and empirical results. By modeling weight quantization as a form of additive noise to weights, we explore how this noise propagates through the network at training time. We then show that the magnitude of this noise is correlated with the level of quantization. To confirm our analytical study, we performed an extensive list of experiments summarized in this paper in which we show that the regularization effects of quantization can be seen in various vision tasks and models, over various datasets. Based on our study, we propose that 8-bit quantization provides a reliable form of regularization in different vision tasks and models.
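The link between precision and noise magnitude described in this abstract can be illustrated with a minimal sketch. It assumes a uniform symmetric quantizer over [-1, 1] and uniformly distributed weights; the function names `quantize` and `noise_rms` are hypothetical and are not taken from the paper, whose own quantizer and analysis may differ.

```python
import random

def quantize(w, bits, w_max=1.0):
    """Uniform symmetric quantization of w to 2**bits levels on [-w_max, w_max].

    Assumption: this simple quantizer stands in for whatever scheme the paper uses.
    """
    levels = 2 ** bits - 1
    step = 2 * w_max / levels
    clipped = max(-w_max, min(w_max, w))
    return round((clipped + w_max) / step) * step - w_max

def noise_rms(bits, n=10000, seed=0):
    """Root-mean-square of the additive noise (quantized weight minus weight)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    squared = [(quantize(w, bits) - w) ** 2 for w in weights]
    return (sum(squared) / n) ** 0.5

# Lower precision gives a coarser grid and therefore larger noise.
rms_by_bits = {b: noise_rms(b) for b in (2, 4, 8)}
```

Under these assumptions the noise magnitude shrinks as the bit width grows, which is the correlation the abstract refers to.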

Preprint, 2022, arXiv

Rapid design space exploration of multi-clock domain MPSoCs with hybrid prototyping

This paper presents novel techniques of using hybrid prototyping for early power-performance analysis of MPSoC designs with multiple clock domains. The fundamental idea of hybrid prototyping is to simulate a design with multiple cores by creating an emulation kernel in software on top of a single physical instance of the core. However, so far hybrid prototyping has been limited to homogeneous multicores running at the same clock frequency. Moreover, hybrid prototyping has not yet been demonstrated for efficient design space exploration. Our work focuses on enhancing the capabilities of hybrid prototyping so that it can be applied to realistic multi-clock MPSoC designs as well as to early power-performance evaluation of MPSoC designs. Our experiments using industrial-strength applications such as JPEG, MP3 and Packet Processing demonstrate the high accuracy of our hybrid prototypes, and over two orders of magnitude improvement over software simulation speed. We also demonstrate that exploring over 150 design options using hybrid prototyping can be done with high reliability in the order of minutes, compared to multiple days using conventional FPGA prototyping.

Preprint, 2021, arXiv

Deeplite Neutrino: An End-to-End Framework for Constrained Deep Learning Model Optimization

Designing deep learning-based solutions is becoming a race for training deeper models with a greater number of layers. While a large, deeper model could provide competitive accuracy, it creates a lot of logistical challenges and unreasonable resource requirements during development and deployment. This has been one of the key reasons for deep learning models not being extensively used in various production environments, especially on edge devices. There is an immediate requirement for optimizing and compressing these deep learning models to enable on-device intelligence. In this research, we introduce a black-box framework, Deeplite Neutrino, for production-ready optimization of deep learning models. The framework provides an easy mechanism for end-users to provide constraints, such as a tolerable drop in accuracy or a target size of the optimized models, to guide the whole optimization process. The framework is easy to include in an existing production pipeline and is available as a Python package, supporting the PyTorch and TensorFlow libraries. The optimization performance of the framework is shown across multiple benchmark datasets and popular deep learning models. Further, the framework is currently used in production, and the results and testimonials from several clients are summarized.

Preprint, 2013, arXiv

Fast Feature Reduction in intrusion detection datasets

In most intrusion detection systems (IDS), the system tries to learn the characteristics of different types of attacks by analyzing packets sent or received over the network. These packets have many features, but not all of them need to be analyzed to detect a specific type of attack. Detection speed and computational cost are also vital concerns, because the datasets in these problems are typically very large. In this paper we propose a very simple and fast feature selection method that eliminates features carrying no helpful information. Omitting redundant features results in faster learning. We compare the proposed method with three of the most successful similarity-based feature selection algorithms: Correlation Coefficient, Least Square Regression Error and Maximal Information Compression Index. We then use the features recommended by each of these algorithms in two popular classifiers, Bayes and KNN, to measure the quality of the recommendations. Experimental results show that although the proposed method does not outperform the evaluated algorithms by a large margin in accuracy, it is far superior to them in computational cost.

Preprint, 2012, arXiv

A new scheduling algorithm for server farms load balancing

This paper describes a new scheduling algorithm to distribute jobs in server farm systems. The proposed algorithm overcomes the starvation caused by SRPT (Shortest Remaining Processing Time). It applies an approach used in operating-system process scheduling and was developed for use in dispatcher scheduling. The algorithm is a non-preemptive discipline, similar to SRPT, in which the priority of each job depends on its estimated run time and on the amount of time it has spent waiting. Tasks in the servers are served in order of priority to optimize the system response time. The experiments show that the mean turnaround time is reduced in the server farm system.
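The abstract does not state the priority formula, so the following is only a sketch of the general idea: a non-preemptive queue where priority grows with waiting time. It assumes a response-ratio rule, (wait + run) / run, and compares it with a non-preemptive shortest-job-first baseline (a stand-in, not SRPT itself). All names here (`schedule`, `shortest_first`, `aged`) are hypothetical.

```python
def schedule(jobs, priority):
    """Non-preemptive single-server simulation.

    jobs: list of (name, arrival_time, estimated_run_time).
    priority: function (wait, run) -> score; the highest score runs next.
    Returns the order in which jobs are served.
    """
    pending = sorted(jobs, key=lambda j: j[1])
    clock, order = 0.0, []
    while pending:
        ready = [j for j in pending if j[1] <= clock]
        if not ready:
            clock = pending[0][1]  # idle until the next arrival
            continue
        job = max(ready, key=lambda j: priority(clock - j[1], j[2]))
        pending.remove(job)
        clock += job[2]  # run the chosen job to completion
        order.append(job[0])
    return order

def shortest_first(wait, run):
    """Baseline: ignores waiting time, so long jobs can starve."""
    return -run

def aged(wait, run):
    """Assumed rule: priority rises the longer a job has waited."""
    return (wait + run) / run

# One long job competing with a steady stream of short jobs.
jobs = [("s0", 0, 2), ("long", 0, 10)] + [(f"s{i}", 2 * i, 2) for i in range(1, 6)]
order_sjf = schedule(jobs, shortest_first)
order_aged = schedule(jobs, aged)
```

With the baseline, the long job is served only after every short job; with the waiting-time term, it is served early, which is the starvation effect the abstract says the proposed algorithm avoids.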

Preprint, 2012, arXiv

Analyzing the Dual-Path Peer-to-Peer Anonymous Approach

Dual-Path is an anonymous peer-to-peer approach which provides requester anonymity. This approach provides anonymity between a requester and a provider in peer-to-peer networks with trusted servers called supernodes, so that the provider will not be able to identify the requester and no other peers can identify the two communicating parties with certainty. Dual-Path establishes two paths for transmitting data, called the Request path and the Response path. The first is used for requesting data and the second is used for sending the requested data to the requester. As the Dual-Path approach is similar to the Crowds approach, this article compares the reliability and performance of Dual-Path and Crowds. For this purpose a simulator is developed and several scenarios are defined to compare Dual-Path and Crowds in different situations. Chapters 2 and 3 briefly describe the Dual-Path and Crowds approaches. Chapter 4 describes the simulator. Chapter 5 explains the scenarios for the performance comparison. Chapter 6 covers the reliability comparison and chapter 7 is the conclusion.

Preprint, 2012, arXiv

Anonymous Communication in Peer-to-Peer Networks for providing more Privacy and Security

One of the most important issues in peer-to-peer networks is anonymity. The main anonymity concern for peer-to-peer users is that their identities and actions can be revealed by other members. Many approaches have been proposed to provide anonymous peer-to-peer communications. An intruder can obtain information about the content of the data and the sender's and receiver's identities. Anonymous approaches are designed with the following three goals: to protect the identity of the provider, to protect the identity of the requester, and to protect the contents of the data transferred between them. This article presents a new peer-to-peer approach to achieve anonymity between a requester and a provider in peer-to-peer networks with trusted servers called supernodes, so that the provider will not be able to identify the requester and no other peers can identify the two communicating parties with certainty. This article shows that the proposed algorithm improves reliability and provides more security. The algorithm, based on onion routing and randomization, protects transferred data against traffic analysis attacks. The ultimate goal of this anonymous communication algorithm is to allow a requester to communicate with a provider in such a manner that nobody can determine the requester's identity or the content of the transferred data.

Preprint, 2012, arXiv

Automatic firewall rules generator for anomaly detection systems with Apriori algorithm

Network intrusion detection systems have become a crucial issue for computer systems security infrastructures. Different methods and algorithms have been developed and proposed in recent years to improve intrusion detection systems. The most important issue in current systems is that they are poor at detecting novel anomaly attacks. These kinds of attacks refer to any action that significantly deviates from normal behaviour and is therefore considered an intrusion. This paper proposes a model to address this problem based on data mining techniques. The Apriori algorithm is used to predict novel attacks and generate real-time rules for the firewall. The Apriori algorithm extracts interesting correlation relationships among large sets of data items. This paper illustrates how to use the Apriori algorithm in intrusion detection systems to create an automatic firewall rules generator that detects novel anomaly attacks. Apriori is the best-known algorithm for mining association rules. This is an innovative way to find association rules on a large scale.
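As a rough illustration of the frequent-itemset mining that Apriori performs, here is a minimal sketch over a handful of made-up connection records. The attribute strings (`proto=tcp`, `port=23`, `flag=S0`) and the support threshold are hypothetical examples, not data or settings from the paper, and the paper's rule-generation step is not reproduced.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical connection records, one set of attributes per connection.
records = [
    {"proto=tcp", "port=23", "flag=S0"},
    {"proto=tcp", "port=23", "flag=S0"},
    {"proto=tcp", "port=80", "flag=SF"},
    {"proto=tcp", "port=23", "flag=S0"},
    {"proto=udp", "port=53", "flag=SF"},
]
frequent = apriori(records, min_support=3)

# Confidence of the association rule {port=23} -> {proto=tcp, flag=S0}.
full = frozenset({"proto=tcp", "port=23", "flag=S0"})
confidence = frequent[full] / frequent[frozenset({"port=23"})]
```

A rule with high support and confidence, such as the one computed here, is the kind of pattern that could be turned into a firewall rule; how the paper performs that translation is not specified in the abstract.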

Preprint, 2012, arXiv

Data Selection for Semi-Supervised Learning

The real challenge in pattern recognition and machine learning is to train a discriminator using labeled data and use it to classify future data as accurately as possible. However, most real-world problems involve large amounts of data, and labeling them is cumbersome or even impossible. Semi-supervised learning is one approach to overcoming such problems. It uses only a small set of labeled data, together with the large remainder of unlabeled data, to train the discriminator. In semi-supervised learning, which data points are labeled matters greatly, and effectiveness changes depending on the position of the labeled data. In this paper, we propose an evolutionary approach called Artificial Immune System (AIS) to determine which data points are best to label in order to obtain high-quality data. The experimental results demonstrate the effectiveness of this algorithm in finding these data points.

Preprint, 2012, arXiv

Improving the K-means algorithm using improved downhill simplex search

The k-means algorithm is one of the best-known and most popular clustering algorithms. K-means seeks an optimal partition of the data by minimizing the sum of squared error with an iterative optimization procedure, which belongs to the category of hill climbing algorithms. Hill climbing searches are known for converging to local optima. Since k-means can converge to a local optimum, different initial points generally lead to different converged centroids, which makes it important to start with a reasonable initial partition in order to achieve high-quality clustering solutions. However, in theory, there exist no efficient and universal methods for determining such initial partitions. In this paper we try to find an optimal initial partitioning for the k-means algorithm. To achieve this goal we propose a new, improved version of downhill simplex search, use it to find an optimal clustering result, and then compare this algorithm with the Genetic Algorithm based (GA), Genetic K-Means (GKM), Improved Genetic K-Means (IGKM) and k-means algorithms.
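The sensitivity to initialization described in this abstract can be seen in a small sketch. It runs standard Lloyd-style k-means on six made-up one-dimensional points from two different starting centroid sets; the data and the function name `kmeans` are hypothetical, and the paper's improved downhill simplex search is not implemented here.

```python
def kmeans(points, centroids, iters=100):
    """Plain k-means on 1-D points; returns (centroids, sum of squared error)."""
    centroids = list(centroids)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

points = [0, 1, 10, 11, 20, 21]  # three well-separated pairs

# One start per natural group versus two starts crowded into the first group.
good_centroids, good_sse = kmeans(points, [0, 10, 20])
bad_centroids, bad_sse = kmeans(points, [0, 1, 15])
```

Both runs converge, but the second stops at a partition with a much larger sum of squared error, which is why a search over initial partitions can pay off.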
