Source author record

Shervin Minaee

Shervin Minaee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Machine Learning, eess.IV, Computation and Language, Artificial Intelligence

Catalog footprint

What is connected

22 works

5 topics

4 close collaborators

preprint, 2022, arXiv

Modern Augmented Reality: Applications, Trends, and Future Directions

Augmented reality (AR) is one of the relatively old, yet trending areas in the intersection of computer vision and computer graphics with numerous applications in several areas, from gaming and entertainment, to education and healthcare. Although it has been around for nearly fifty years, it has seen a lot of interest by the research community in recent years, mainly because of the huge success of deep learning models for various computer vision and AR applications, which made creating new generations of AR technologies possible. This work tries to provide an overview of modern augmented reality, from both application-level and technical perspectives. We first give an overview of main AR applications, grouped into more than ten categories. We then give an overview of around 100 recent promising machine learning based works developed for AR systems, such as deep learning works for AR shopping (clothing, makeup), AR based image filters (such as Snapchat's lenses), AR animations, and more. Finally, we discuss some of the current challenges in the AR domain and the future directions in this area.

preprint, 2022, arXiv

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

Most methods for conditional video synthesis use a single modality as the condition. This comes with major limitations. For example, it is problematic for a model conditioned on an image to generate a specific motion trajectory desired by the user since there is no means to provide motion information. Conversely, language information can describe the desired motion, while not precisely defining the content of the video. This work presents a multimodal video generation framework that benefits from text and images provided jointly or separately. We leverage the recent progress in quantized representations for videos and apply a bidirectional transformer with multiple modalities as inputs to predict a discrete video representation. To improve video quality and consistency, we propose a new video token trained with self-learning and an improved mask-prediction algorithm for sampling video tokens. We introduce text augmentation to improve the robustness of the textual representation and diversity of generated videos. Our framework can incorporate various visual modalities, such as segmentation masks, drawings, and partially occluded images. It can generate much longer sequences than the one used for training. In addition, our model can extract visual information as suggested by the text prompt, e.g., "an object in image one is moving northeast", and generate corresponding videos. We run evaluations on three public datasets and a newly collected dataset labeled with facial attributes, achieving state-of-the-art generation results on all four.

preprint, 2021, arXiv

Biometrics Recognition Using Deep Learning: A Survey

Deep learning-based models have been very successful in achieving state-of-the-art results in many of the computer vision, speech recognition, and natural language processing tasks in the last few years. These models seem a natural fit for handling the ever-increasing scale of biometric recognition problems, from cellphone authentication to airport security systems. Deep learning-based models have increasingly been leveraged to improve the accuracy of different biometric recognition systems in recent years. In this work, we provide a comprehensive survey of more than 120 promising works on biometric recognition (including face, fingerprint, iris, palmprint, ear, voice, signature, and gait recognition), which deploy deep learning models, and show their strengths and potentials in different applications. For each biometric, we first introduce the available datasets that are widely used in the literature and their characteristics. We will then talk about several promising deep learning works developed for that biometric, and show their performance on popular public benchmarks. We will also discuss some of the main challenges while using these models for biometric recognition, and possible future directions to which research in this area is headed.

preprint, 2021, arXiv

Deep Learning Based Text Classification: A Comprehensive Review

Deep learning based models have surpassed classical machine learning based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this paper, we provide a comprehensive review of more than 150 deep learning based models for text classification developed in recent years, and discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and discuss future research directions.

preprint, 2020, arXiv

COVID CT-Net: Predicting Covid-19 From Chest CT Images Using Attentional Convolutional Network

The novel coronavirus disease (COVID-19) pandemic has caused a major outbreak in more than 200 countries around the world, leading to a severe impact on the health and life of many people globally. As of August 25, 2020, more than 20 million people had been infected and more than 800,000 deaths had been reported. Computed Tomography (CT) images can be used as an alternative to the time-consuming "reverse transcription polymerase chain reaction (RT-PCR)" test to detect COVID-19. In this work we developed a deep learning framework to predict COVID-19 from CT images. We propose to use an attentional convolution network, which can focus on the infected areas of the chest, enabling it to perform a more accurate prediction. We trained our model on a dataset of more than 2000 CT images, and report its performance in terms of various popular metrics, such as sensitivity, specificity, area under the curve, and the precision-recall curve, and achieve very promising results. We also provide a visualization of the attention maps of the model for several test images, and show that our model is attending to the infected regions as intended. In addition to developing a machine learning modeling framework, we also provide manual annotations of the potentially infected regions of the chest, made with the help of a board-certified radiologist, and make them publicly available for other researchers.

preprint, 2020, arXiv

COVID TV-UNet: Segmenting COVID-19 Chest CT Images Using Connectivity Imposed U-Net

The novel coronavirus disease (COVID-19) pandemic has caused a major outbreak in more than 200 countries around the world, leading to a severe impact on the health and life of many people globally. As of mid-July 2020, more than 12 million people were infected, and more than 570,000 deaths were reported. Computed Tomography (CT) images can be used as an alternative to the time-consuming RT-PCR test to detect COVID-19. In this work we propose a segmentation framework to detect chest regions in CT images that are infected by COVID-19. We use an architecture similar to the U-Net model, and train it to detect ground glass regions at the pixel level. As the infected regions tend to form a connected component (rather than randomly distributed pixels), we add a suitable regularization term to the loss function to promote connectivity of the segmentation map for COVID-19 pixels. 2D anisotropic total variation is used for this purpose, and therefore the proposed model is called "TV-UNet". Through experimental results on a relatively large-scale CT segmentation dataset of around 900 images, we show that adding this new regularization term leads to a 2% gain in overall segmentation performance compared to the U-Net model. Our experimental analysis, ranging from visual evaluation of the predicted segmentation results to quantitative assessment of segmentation performance (precision, recall, Dice score, and mIoU), demonstrated great ability to identify COVID-19 associated regions of the lungs, achieving a mIoU rate of over 99% and a Dice score of around 86%.
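To make the connectivity-promoting regularizer concrete, here is a minimal NumPy sketch of 2D anisotropic total variation evaluated on two toy masks. It is an illustration only, not the paper's training code: the function name, the 6x6 masks, and the idea of adding the value to a segmentation loss with some weight are assumptions made for this example.

```python
import numpy as np


def anisotropic_tv(mask: np.ndarray) -> float:
    """2D anisotropic total variation: sum of absolute differences
    between horizontally and vertically adjacent pixels."""
    horizontal = np.abs(np.diff(mask, axis=1)).sum()
    vertical = np.abs(np.diff(mask, axis=0)).sum()
    return float(horizontal + vertical)


# Hypothetical toy masks with the same number of positive pixels.
blob = np.zeros((6, 6))
blob[2:4, 2:4] = 1.0                         # one connected 2x2 region
scattered = np.zeros((6, 6))
scattered[[0, 2, 4, 5], [1, 4, 0, 3]] = 1.0  # four isolated pixels

tv_blob = anisotropic_tv(blob)
tv_scattered = anisotropic_tv(scattered)
print(tv_blob, tv_scattered)
```

The connected region has the lower total variation, so a penalty of this form, added to a segmentation loss with a positive weight, would favor connected predictions over scattered ones.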

preprint, 2020, arXiv

Covid-Transformer: Detecting COVID-19 Trending Topics on Twitter Using Universal Sentence Encoder

The novel coronavirus disease (also known as COVID-19) has led to a pandemic, impacting more than 200 countries across the globe. With its global impact, COVID-19 has become a major concern of people almost everywhere, and therefore there are a large number of tweets coming out from every corner of the world about COVID-19 related topics. In this work, we try to analyze the tweets and detect the trending topics and major concerns of people on Twitter, which can enable us to better understand the situation and devise better planning. More specifically, we propose a model based on the universal sentence encoder to detect the main topics of tweets in recent months. We used the universal sentence encoder to derive the semantic representation and the similarity of tweets. We then used the sentence similarities and embeddings, and fed them to the K-means clustering algorithm to group similar tweets (in the semantic sense). After that, the cluster summary is obtained using a text summarization algorithm based on deep learning, which can uncover the underlying topics of each cluster. Through experimental results, we show that our model can detect very informative topics by processing a large number of tweets at the sentence level (which can preserve the overall meaning of the tweets). Since this framework has no restriction on specific data distribution, it can be used to detect trending topics from any other social media and any context other than COVID-19. Experimental results show the superiority of our proposed approach over other baselines, including TF-IDF and latent Dirichlet allocation (LDA).
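The clustering step can be sketched as follows. This is a minimal K-means in NumPy run on hand-made 2-D vectors that stand in for sentence embeddings; the vectors, the farthest-first initialization, and all function names are assumptions for illustration, and no universal sentence encoder is called here.

```python
import numpy as np


def farthest_first_init(x: np.ndarray, k: int) -> np.ndarray:
    """Deterministic initialization: start at x[0], then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [x[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    return np.array(centers, dtype=float)


def kmeans(x: np.ndarray, k: int, iters: int = 20):
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    centers = farthest_first_init(x, k)
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers


# Hypothetical stand-ins for tweet embeddings: two semantic groups.
embeddings = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],   # group A
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],   # group B
])
labels, centers = kmeans(embeddings, k=2)
print(labels)
```

In a real pipeline each row would be a high-dimensional sentence embedding, and each resulting cluster would then be passed to a summarizer to name its topic.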

preprint, 2020, arXiv

Palm-GAN: Generating Realistic Palmprint Images Using Total-Variation Regularized GAN

Generating realistic palmprint (more generally biometric) images has always been an interesting and, at the same time, challenging problem. Classical statistical models fail to generate realistic-looking palmprint images, as they are not powerful enough to capture the complicated texture representation of palmprint images. In this work, we present a deep learning framework based on generative adversarial networks (GAN), which is able to generate realistic palmprint images. To help the model learn more realistic images, we propose to add a suitable regularization to the loss function, which imposes the line connectivity of generated palmprint images. This is very desirable for palmprints, as the principal lines in the palm are usually connected. We apply this framework to a popular palmprint database, and generate images which look very realistic and similar to the samples in this database. Through experimental results, we show that the generated palmprint images look very realistic, have good diversity, and are able to capture different parts of the prior distribution. We also report the Frechet Inception distance (FID) of the proposed model, and show that our model is able to achieve really good quantitative performance in terms of FID score.

preprint, 2020, arXiv

Regularized Submodular Maximization at Scale

In this paper, we propose scalable methods for maximizing a regularized submodular function $f = g - \ell$ expressed as the difference between a monotone submodular function $g$ and a modular function $\ell$. Indeed, submodularity is inherently related to the notions of diversity, coverage, and representativeness. In particular, finding the mode of many popular probabilistic models of diversity, such as determinantal point processes, submodular probabilistic models, and strongly log-concave distributions, involves maximization of (regularized) submodular functions. Since a regularized function $f$ can potentially take on negative values, the classic theory of submodular maximization, which heavily relies on the non-negativity assumption of submodular functions, may not be applicable. To circumvent this challenge, we develop the first one-pass streaming algorithm for maximizing a regularized submodular function subject to a $k$-cardinality constraint. It returns a solution $S$ with the guarantee that $f(S) \geq (\phi^{-2} - \epsilon) \cdot g(OPT) - \ell(OPT)$, where $\phi$ is the golden ratio. Furthermore, we develop the first distributed algorithm that returns a solution $S$ with the guarantee that $\mathbb{E}[f(S)] \geq (1 - \epsilon) [(1 - e^{-1}) \cdot g(OPT) - \ell(OPT)]$ in $O(1/\epsilon)$ rounds of MapReduce computation, without keeping multiple copies of the entire dataset in each round (as is usually done). We should highlight that our result, even for the unregularized case where the modular term $\ell$ is zero, improves the memory and communication complexity of the existing work by a factor of $O(1/\epsilon)$ while arguably providing a simpler distributed algorithm and a unifying analysis. We also empirically study the performance of our scalable methods on a set of real-life applications, including finding the mode of distributions, data summarization, and product recommendation.
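A small sketch may help show why the regularized objective can go negative. It builds a toy coverage function for $g$ and a per-item cost for $\ell$, then evaluates $f = g - \ell$ on a few sets. The items, covered elements, and costs are hypothetical, and this is not the streaming or distributed algorithm from the paper; it only illustrates the objective and the numeric value of the $\phi^{-2}$ factor.

```python
import math

PHI = (1 + math.sqrt(5)) / 2      # golden ratio
STREAMING_FACTOR = PHI ** -2      # the phi^{-2} factor, roughly 0.382

# Hypothetical toy instance: each item covers some elements and has a cost.
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
COST = {"a": 1.0, "b": 0.5, "c": 2.0}


def g(selected):
    """Coverage function (monotone submodular): distinct elements covered."""
    return len(set().union(*(COVERS[i] for i in selected)))


def ell(selected):
    """Modular function: sum of per-item costs."""
    return sum(COST[i] for i in selected)


def f(selected):
    """Regularized objective f = g - ell."""
    return g(selected) - ell(selected)


print(f(["a", "b"]), f(["c"]), round(STREAMING_FACTOR, 6))
```

Item "c" covers one element at a cost of 2, so choosing it alone gives a negative objective value, which is the situation where non-negativity-based guarantees no longer apply directly.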

preprint, 2016, arXiv

Image Segmentation Using Overlapping Group Sparsity

Sparse decomposition has been widely used for different applications, such as source separation, image classification and image denoising. This paper presents a new algorithm for segmentation of an image into background and foreground text and graphics using sparse decomposition. First, the background is represented using a suitable smooth model, which is a linear combination of a few smoothly varying basis functions, and the foreground text and graphics are modeled as a sparse component overlaid on the smooth background. Then the background and foreground are separated using a sparse decomposition framework and imposing prior information that promotes the smoothness of the background and the sparsity and connectivity of foreground pixels. This algorithm has been tested on a dataset of images extracted from HEVC standard test sequences for screen content coding, and is shown to outperform prior methods, including least absolute deviation fitting, k-means clustering based segmentation in DjVu, and the shape primitive extraction and coding algorithm.

preprint, 2016, arXiv

Palmprint Recognition Using Deep Scattering Convolutional Network

Palmprint recognition has drawn a lot of attention in recent years. Many algorithms have been proposed for palmprint recognition in the past, the majority of them being based on features extracted from the transform domain. Many of these transform domain features are not translation or rotation invariant, and therefore a great deal of preprocessing is needed to align the images. In this paper, a powerful image representation, called the scattering network/transform, is used for palmprint recognition. The scattering network is a convolutional network whose architecture and filters are predefined wavelet transforms. The first layer of the scattering network captures features similar to SIFT descriptors, and the higher-layer features capture higher-frequency content of the signal, which is lost in SIFT and other similar descriptors. After extraction of the scattering features, their dimensionality is reduced by applying principal component analysis (PCA), which reduces the computational complexity of the recognition task. Two different classifiers are used for recognition: multi-class SVM and minimum-distance classifier. The proposed scheme has been tested on a well-known palmprint database and achieved accuracy rates of 99.95% and 100% using the minimum distance classifier and SVM respectively.
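The last two stages of this pipeline (PCA followed by a minimum-distance classifier) can be sketched in NumPy as below. The scattering features are replaced by synthetic 5-D vectors, and the minimum-distance rule is assumed here to mean "nearest class centroid"; both are assumptions for illustration rather than details taken from the paper.

```python
import numpy as np


def pca_fit(x: np.ndarray, n_components: int):
    """Return the feature mean and the top principal directions (rows)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(x: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project centered features onto the principal directions."""
    return (x - mean) @ components.T


def min_distance_predict(train_z, train_y, test_z):
    """Assign each test vector to the class with the nearest centroid."""
    classes = np.unique(train_y)
    centroids = np.stack([train_z[train_y == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(test_z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]


# Synthetic stand-ins for extracted features: two well-separated classes.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(3.0, 0.1, (20, 5))])
train_y = np.array([0] * 20 + [1] * 20)
test_x = np.vstack([np.zeros(5), np.full(5, 3.0)])

mean, components = pca_fit(train_x, n_components=2)
predictions = min_distance_predict(
    pca_transform(train_x, mean, components),
    train_y,
    pca_transform(test_x, mean, components),
)
print(predictions)
```

Reducing the dimension before matching keeps the distance computation cheap, which is the computational benefit the abstract attributes to PCA.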

preprint, 2016, arXiv

Screen Content Image Segmentation Using Robust Regression and Sparse Decomposition

This paper considers how to separate text and/or graphics from smooth background in screen content and mixed document images and proposes two approaches to perform this segmentation task. The proposed methods make use of the fact that the background in each block is usually smoothly varying and can be modeled well by a linear combination of a few smoothly varying basis functions, while the foreground text and graphics create sharp discontinuity. The algorithms separate the background and foreground pixels by trying to fit background pixel values in the block into a smooth function using two different schemes. One is based on robust regression, where the inlier pixels will be considered as background, while remaining outlier pixels will be considered foreground. The second approach uses a sparse decomposition framework where the background and foreground layers are modeled with smooth and sparse components respectively. These algorithms have been tested on images extracted from HEVC standard test sequences for screen content coding, and are shown to have superior performance over previous approaches. The proposed methods can be used in different applications such as text extraction, separate coding of background and foreground for compression of screen content, and medical image segmentation.

preprint, 2016, arXiv

Screen Content Image Segmentation Using Sparse Decomposition and Total Variation Minimization

Sparse decomposition has been widely used for different applications, such as source separation, image classification, image denoising and more. This paper presents a new algorithm for segmentation of an image into background and foreground text and graphics using sparse decomposition and total variation minimization. The proposed method is designed based on the assumption that the background part of the image is smoothly varying and can be represented by a linear combination of a few smoothly varying basis functions, while the foreground text and graphics can be modeled with a sparse component overlaid on the smooth background. The background and foreground are separated using a sparse decomposition framework regularized with a few suitable regularization terms which promote the sparsity and connectivity of foreground pixels. This algorithm has been tested on a dataset of images extracted from HEVC standard test sequences for screen content coding, and is shown to have superior performance over some prior methods, including least absolute deviation fitting, k-means clustering based segmentation in DjVu and the shape primitive extraction and coding (SPEC) algorithm.

preprint, 2015, arXiv

A Robust Regression Approach for Background/Foreground Segmentation

Background/foreground segmentation has a lot of applications in image and video processing. In this paper, a segmentation algorithm is proposed which is mainly designed for text and line extraction in screen content. The proposed method makes use of the fact that the background in each block is usually smoothly varying and can be modeled well by a linear combination of a few smoothly varying basis functions, while the foreground text and graphics create sharp discontinuity. The algorithm separates the background and foreground pixels by trying to fit pixel values in the block into a smooth function using a robust regression method. The inlier pixels that can fit well will be considered as background, while remaining outlier pixels will be considered foreground. This algorithm has been extensively tested on several images from HEVC standard test sequences for screen content coding, and is shown to have superior performance over other methods, such as the k-means clustering based segmentation algorithm in DjVu. This background/foreground segmentation can be used in different applications such as text extraction, separate coding of background and foreground for compression of screen content and mixed content documents, principal line extraction from palmprints and crease detection in fingerprint images.
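The inlier/outlier idea can be sketched on a single row of pixels. The abstract does not say which robust estimator is used, so this example swaps in a least absolute deviation fit computed by iteratively reweighted least squares as one simple robust choice; the linear basis, the residual threshold of 10 gray levels, and the toy pixel values are all assumptions.

```python
import numpy as np


def lad_fit(basis: np.ndarray, y: np.ndarray, iters: int = 30, eps: float = 1e-6):
    """Least absolute deviation fit via iteratively reweighted least squares.
    Each pass down-weights pixels with large residuals."""
    coef = np.linalg.lstsq(basis, y, rcond=None)[0]
    for _ in range(iters):
        residual = np.abs(y - basis @ coef)
        w = np.sqrt(1.0 / np.maximum(residual, eps))
        coef = np.linalg.lstsq(basis * w[:, None], y * w, rcond=None)[0]
    return coef


# Hypothetical row of 32 pixels: smooth ramp background plus 3 text pixels.
x = np.linspace(0.0, 1.0, 32)
pixels = 100.0 + 20.0 * x
pixels[10:13] = 255.0

basis = np.column_stack([np.ones_like(x), x])   # smooth model: constant + slope
coef = lad_fit(basis, pixels)
residual = np.abs(pixels - basis @ coef)
foreground = residual > 10.0                    # outliers are labeled foreground
print(np.flatnonzero(foreground))
```

Pixels that the smooth model explains well are kept as background, and the three pixels it cannot explain are flagged as foreground, mirroring the inlier/outlier split described above.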

preprint, 2015, arXiv

Fingerprint Recognition Using Translation Invariant Scattering Network

Fingerprint recognition has drawn a lot of attention during the last decades. Different features and algorithms have been used for fingerprint recognition in the past. In this paper, a powerful image representation, called the scattering transform/network, is used for recognition. The scattering network is a convolutional network whose architecture and filters are predefined wavelet transforms. The first layer of the scattering representation is similar to SIFT descriptors, and the higher layers capture higher frequency content of the signal. After extraction of scattering features, their dimensionality is reduced by applying principal component analysis (PCA). At the end, multi-class SVM is used to perform template matching for the recognition task. The proposed scheme is tested on a well-known fingerprint database and has shown promising results with the best accuracy rate of 98%.

preprint, 2015, arXiv

Highly Accurate Multispectral Palmprint Recognition Using Statistical and Wavelet Features

Palmprint is one of the most useful physiological biometrics that can be used as a powerful means in personal recognition systems. The major features of palmprints are palm lines, wrinkles and ridges, and many approaches use them in different ways towards solving the palmprint recognition problem. Here we propose to use a set of statistical and wavelet-based features: the statistical features capture the general characteristics of palmprints, and the wavelet-based features capture information not evident in the spatial domain. We also use two different classification approaches, a minimum distance classifier scheme and a weighted majority voting algorithm, to perform palmprint matching. The proposed method is tested on a well-known palmprint dataset of 6000 samples and has shown an impressive accuracy rate of 99.65%-100% for most scenarios.

preprint, 2015, arXiv

Iris Recognition Using Scattering Transform and Textural Features

Iris recognition has drawn a lot of attention since the mid-twentieth century. Among all biometric features, iris is known to possess a rich set of features. Different features have been used to perform iris recognition in the past. In this paper, two powerful sets of features are introduced to be used for iris recognition: scattering transform-based features and textural features. PCA is also applied on the extracted features to reduce the dimensionality of the feature vector while preserving most of the information of its initial value. Minimum distance classifier is used to perform template matching for each new test sample. The proposed scheme is tested on a well-known iris database, and showed promising results with the best accuracy rate of 99.2%.

preprint, 2015, arXiv

Multispectral Palmprint Recognition Using a Hybrid Feature

The personal identification problem has been a major field of research in recent years. Biometrics-based technologies that exploit fingerprints, iris, face, voice and palmprints have been at the center of attention to solve this problem. Palmprints can be used instead of fingerprints, which were among the earliest of these biometric technologies. A palm is covered with the same skin as the fingertips but has a larger surface, giving us more information than the fingertips. The major features of the palm are palm-lines, including principal lines, wrinkles and ridges. Using these lines is one of the most popular approaches towards solving the palmprint recognition problem. Another robust feature is the wavelet energy of palms. In this paper we used a hybrid feature which combines both of these features. Moreover, multispectral analysis is applied to improve the performance of the system. At the end, a minimum distance classifier is used to match test images with one of the training samples. The proposed algorithm has been tested on a well-known multispectral palmprint dataset and achieved an average accuracy of 98.8%.

preprint, 2015, arXiv

Multispectral Palmprint Recognition Using Textural Features

In order to utilize identification to the best extent, we need robust and fast algorithms and systems to process the data. Having palmprint as a reliable and unique characteristic of every person, we extract and use its features based on its geometry, lines and angles. There are countless ways to define measures for the recognition task. To analyze a new point of view, we extracted textural features and used them for palmprint recognition. Co-occurrence matrix can be used for textural feature extraction. As classifiers, we have used the minimum distance classifier (MDC) and the weighted majority voting system (WMV). The proposed method is tested on a well-known multispectral palmprint dataset of 6000 samples and an accuracy rate of 99.96-100% is obtained for most scenarios which outperforms all previous works in multispectral palmprint recognition.
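As a rough illustration of co-occurrence-based textural features, the sketch below counts gray-level pairs at a fixed pixel offset and derives two common summary statistics. The 3x3 image, the three gray levels, the horizontal offset, and the choice of contrast and energy as features are assumptions for this example; the abstract does not list which statistics the paper uses.

```python
import numpy as np


def cooccurrence(img: np.ndarray, levels: int, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Count how often gray level i has gray level j at offset (dy, dx).
    Assumes non-negative offsets and integer pixel values in [0, levels)."""
    counts = np.zeros((levels, levels), dtype=np.int64)
    height, width = img.shape
    for r in range(height - dy):
        for c in range(width - dx):
            counts[img[r, c], img[r + dy, c + dx]] += 1
    return counts


# Hypothetical 3x3 patch quantized to 3 gray levels.
patch = np.array([[0, 0, 1],
                  [1, 2, 2],
                  [0, 1, 2]])
glcm = cooccurrence(patch, levels=3)

# Normalize to joint probabilities, then compute two texture statistics.
p = glcm / glcm.sum()
i, j = np.indices(p.shape)
contrast = float((p * (i - j) ** 2).sum())
energy = float((p ** 2).sum())
print(glcm, contrast, energy)
```

Statistics like these form a short feature vector per image or per block, which can then be compared with a minimum distance classifier or combined by voting.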

preprint, 2015, arXiv

On The Power of Joint Wavelet-DCT Features for Multispectral Palmprint Recognition

Biometric-based identification has drawn a lot of attention in recent years. Among all biometrics, palmprint is known to possess a rich set of features. In this paper we propose to use DCT-based features in parallel with wavelet-based ones for palmprint identification. PCA is applied to the features to reduce their dimensionality and the majority voting algorithm is used to perform classification. The features introduced here result in near-perfect identification accuracy. This method is tested on a well-known multispectral palmprint database and an accuracy rate of 99.97-100% is achieved, outperforming all previous methods in similar conditions.

preprint, 2015, arXiv

Screen Content Image Segmentation Using Sparse-Smooth Decomposition

Sparse decomposition has been extensively used for different applications including signal compression and denoising and document analysis. In this paper, sparse decomposition is used for image segmentation. The proposed algorithm separates the background and foreground using a sparse-smooth decomposition technique such that the smooth and sparse components correspond to the background and foreground respectively. This algorithm is tested on several test images from HEVC test sequences and is shown to have superior performance over other methods, such as the hierarchical k-means clustering in DjVu. This segmentation algorithm can also be used for text extraction, video compression and medical image segmentation.

preprint, 2014, arXiv

A Geometric Approach For Fully Automatic Chromosome Segmentation

A fundamental task in human chromosome analysis is chromosome segmentation. Segmentation plays an important role in chromosome karyotyping. The first step in segmentation is to remove intrusive objects such as stain debris and other noise. The next step is detection of touching and overlapping chromosomes, and the final step is separation of such chromosomes. Common methods for separating touching chromosomes are interactive and require human intervention for correct separation of touching and overlapping chromosomes. In this paper, a geometry-based method is used for automatic detection of touching and overlapping chromosomes and for separating them. The proposed scheme performs segmentation in two phases. In the first phase, chromosome clusters are detected using three geometric criteria, and in the second phase, chromosome clusters are separated using a cut-line. Most earlier methods did not work properly for chromosome clusters that contained more than two chromosomes. Our method, on the other hand, is quite efficient in separating such chromosome clusters. At each step, one separation is performed, and the algorithm is repeated until all individual chromosomes are separated. Another important point about the proposed method is that it uses the geometric features of chromosomes, which are independent of the type of image, so it can easily be applied to any type of image, such as binary images, and does not require multispectral images. We have applied our method to a database containing 62 touching and partially overlapping chromosomes and a success rate of 91.9% is achieved.
