Source author record

Ata Mahjoubfar

Ata Mahjoubfar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, physics.optics, Artificial Intelligence, Computation and Language, Emerging Technologies, Neural and Evolutionary Computing, physics.data-an

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators


Preprint, 2022, arXiv

Designing an Efficient End-to-end Machine Learning Pipeline for Real-time Empty-shelf Detection

On-Shelf Availability (OSA) of products in retail stores is a critical business criterion in the fast-moving consumer goods and retail sector. When a product is out of stock (OOS) and a customer cannot find it on its designated shelf, the customer may switch stores or buy nothing, which reduces future sales and demand. Retailers employ several approaches to detect empty shelves and ensure high OSA; however, these methods are generally ineffective or infeasible because they are manual, expensive, or inaccurate. Machine learning based solutions have recently been proposed, but they suffer from high computational cost and low accuracy due to the lack of large annotated datasets of on-shelf products. Here, we present an elegant approach for designing an end-to-end machine learning (ML) pipeline for real-time empty-shelf detection. Considering the strong dependency between the quality of ML models and the quality of data, we focus on the importance of proper data collection, cleaning, and correct data annotation before delving into modeling. Since an empty-shelf detection solution should be computationally efficient for real-time predictions, we explore different run-time optimizations to improve model performance. Our dataset contains 1000 images, collected and annotated by following well-defined guidelines. Our low-latency model achieves a mean average F1-score of 68.5% and can process up to 67 images/s on an Intel Xeon Gold CPU and up to 860 images/s on an A100 GPU.
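To make the reported figures concrete, the sketch below converts the two throughput numbers into an average time per image and a GPU-to-CPU ratio, and shows the standard F1 definition (harmonic mean of precision and recall). It is a minimal illustration under stated assumptions: the abstract does not define how its "mean average F1-score" is averaged, so `f1_score` here is only the single-class formula, and the time per image is an amortized figure derived from throughput, not a measured single-request latency (the abstract does not say whether batching was used). The function and constant names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (single class).

    Assumption: the paper's 'mean average F1' averaging scheme is not
    described in the abstract, so only the base formula is shown here.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def latency_ms(images_per_second: float) -> float:
    """Average (amortized) time per image implied by a throughput figure."""
    return 1000.0 / images_per_second


# Throughput figures reported in the abstract.
CPU_THROUGHPUT = 67    # images/s on Intel Xeon Gold
GPU_THROUGHPUT = 860   # images/s on an A100 GPU

if __name__ == "__main__":
    print(f"CPU: {latency_ms(CPU_THROUGHPUT):.1f} ms/image")
    print(f"GPU: {latency_ms(GPU_THROUGHPUT):.2f} ms/image")
    print(f"GPU/CPU throughput ratio: {GPU_THROUGHPUT / CPU_THROUGHPUT:.1f}x")
```

Under these assumptions the CPU figure corresponds to roughly 15 ms per image and the GPU figure to a little over 1 ms, a throughput ratio of about 12.8x.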

Preprint, 2022, arXiv

FashionVQA: A Domain-Specific Visual Question Answering System

Humans apprehend the world through various sensory modalities, yet language is their predominant communication channel. Machine learning systems need to draw on the same multimodal richness to hold informed discourse with humans in natural language; this is particularly true for systems specialized in visually dense information, such as dialogue, recommendation, and search engines for clothing. To this end, we train a visual question answering (VQA) system to answer complex natural language questions about apparel in fashion photoshoot images. The key to the successful training of our VQA model is the automatic creation of a visual question answering dataset with 168 million samples from the item attributes of 207 thousand images, using diverse templates. The sample generation employs a strategy that considers the difficulty of the question-answer pairs to emphasize challenging concepts. Contrary to the recent trend of using several datasets to pretrain visual question answering models, we keep the dataset fixed while training various models from scratch, to isolate the improvements due to model architecture changes. We find that using the same transformer to encode the question and decode the answer, as in language models, achieves the highest accuracy, showing that visual language models (VLMs) make the best visual question answering systems for our dataset. The accuracy of the best model surpasses the human expert level, even when answering human-generated questions that are not confined to the template formats. Our approach for generating a large-scale multimodal domain-specific dataset provides a path for training specialized models capable of communicating in natural language. The training of such domain-expert models, e.g., our fashion VLM, cannot rely solely on large-scale general-purpose datasets collected from the web.
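The abstract describes generating question-answer pairs automatically from item attributes via templates, with sampling that emphasizes harder concepts. The sketch below illustrates that general idea only; it is not the authors' generator. The attribute names (`color`, `sleeve_length`, `pattern`), the template strings, and the per-attribute difficulty weights are all hypothetical, and weighting by attribute is just one plausible way to "emphasize challenging concepts".

```python
import random
from typing import Dict, List, Tuple

# Hypothetical templates keyed by attribute name; the paper's actual
# templates and attribute vocabulary are not given in the abstract.
TEMPLATES: Dict[str, List[str]] = {
    "color": ["What color is the {item}?", "Which color does the {item} have?"],
    "sleeve_length": ["How long are the sleeves of the {item}?"],
    "pattern": ["What pattern is on the {item}?"],
}

# Hypothetical difficulty weights: a higher weight means the attribute's
# questions are drawn more often during sampling.
DIFFICULTY: Dict[str, float] = {"color": 1.0, "sleeve_length": 2.0, "pattern": 3.0}

Sample = Tuple[str, str, str]  # (attribute, question, answer)


def generate_qa(item: str, attributes: Dict[str, str]) -> List[Sample]:
    """Expand every template of every known attribute into a QA sample."""
    samples: List[Sample] = []
    for name, value in attributes.items():
        for template in TEMPLATES.get(name, []):  # unknown attributes are skipped
            samples.append((name, template.format(item=item), value))
    return samples


def sample_by_difficulty(samples: List[Sample], k: int, seed: int = 0) -> List[Sample]:
    """Draw k samples with replacement, weighted by attribute difficulty."""
    rng = random.Random(seed)
    weights = [DIFFICULTY.get(attr, 1.0) for attr, _, _ in samples]
    return rng.choices(samples, weights=weights, k=k)


if __name__ == "__main__":
    attrs = {"color": "navy", "sleeve_length": "long", "pattern": "striped"}
    samples = generate_qa("shirt", attrs)
    for attr, question, answer in samples:
        print(f"[{attr}] {question} -> {answer}")
    print(sample_by_difficulty(samples, k=3))
```

Scaling this pattern to many templates per attribute and many attributes per image is what lets a modest image collection yield a much larger number of QA samples.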

Preprint, 2021, arXiv

Neuromorphic Computing for Content-based Image Retrieval

Neuromorphic computing mimics the neural activity of the brain by emulating spiking neural networks. In numerous machine learning tasks, neuromorphic chips are expected to provide superior cost and power efficiency. Here, we explore the application of Loihi, a neuromorphic computing chip developed by Intel, to the computer vision task of image retrieval. We evaluated the functionalities and performance metrics that are critical in content-based visual search and recommender systems using deep-learning embeddings. Our results show that the neuromorphic solution is about 2.5 times more energy-efficient than an ARM Cortex-A72 CPU and 12.5 times more energy-efficient than an NVIDIA T4 GPU for inference with a lightweight convolutional neural network without batching, while maintaining the same level of matching accuracy. The study validates the potential of neuromorphic computing for low-power image retrieval, as a complementary paradigm to existing von Neumann architectures.
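Content-based visual search with deep-learning embeddings typically reduces to ranking gallery items by their similarity to a query embedding. The sketch below shows that matching step with cosine similarity on a conventional CPU in plain Python; it does not model spiking neurons or Loihi, and the abstract does not say how matching was mapped onto the chip. The toy three-dimensional embeddings and item names are hypothetical stand-ins for real CNN outputs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def top_k(query: Sequence[float],
          gallery: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Rank gallery items by cosine similarity to the query embedding."""
    q = l2_normalize(query)
    scores = []
    for item_id, emb in gallery.items():
        e = l2_normalize(emb)
        # Dot product of unit vectors equals cosine similarity.
        scores.append((item_id, sum(a * b for a, b in zip(q, e))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical embeddings; a real system would use CNN feature vectors.
    gallery = {
        "red_dress": [0.9, 0.1, 0.0],
        "blue_jeans": [0.0, 0.2, 0.95],
        "red_skirt": [0.8, 0.3, 0.1],
    }
    for item_id, score in top_k([1.0, 0.2, 0.0], gallery):
        print(f"{item_id}: {score:.3f}")
```

The reported energy figures also imply, by simple division, that in this benchmark the T4 GPU was about 12.5 / 2.5 = 5 times less energy-efficient than the Cortex-A72 CPU for unbatched inference.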

Preprint, 2015, arXiv

High-throughput Biological Cell Classification Featuring Real-time Optical Data Compression

High-throughput real-time instruments are needed to acquire large data sets for the detection and classification of rare events. Enabled by the photonic time stretch digitizer, a new class of instruments with record throughputs has led to the discovery of optical rogue waves [1], the detection of rare cancer cells [2], and the highest analog-to-digital conversion performance ever achieved [3]. Featuring continuous operation at 100 million frames per second and a shutter speed of less than a nanosecond, the time stretch camera is ideally suited for screening blood and other biological samples. It has enabled the detection of breast cancer cells in blood with record, one-in-a-million, sensitivity [2]. Owing to their high real-time throughput, these instruments produce a torrent of data, equivalent to several 4K movies per second, that overwhelms data acquisition, storage, and processing operations. This predicament calls for technologies that compress images in the optical domain and in real time. An example of this, based on the warped stretch transformation and non-uniform Fourier domain sampling, will be reported.

Preprint, 2015, arXiv

Optical Hardware Accelerators using Nonlinear Dispersion Modes for Energy Efficient Computing

This paper proposes a new class of hardware accelerators to alleviate bottlenecks in the acquisition, analytics, storage and computation of information carried by wideband streaming signals.

Preprint, 2014, arXiv

Reconstruction in Time-Bandwidth Compression Systems

Recently it has been shown that the intensity time-bandwidth product of optical signals can be engineered to match that of the data acquisition instrument. In particular, it is possible to slow down an ultrafast signal, resulting in compressed RF bandwidth (a benefit similar to that offered by the Time-Stretch Dispersive Fourier Transform, TS-DFT) but with a reduced temporal record length, leading to time-bandwidth compression. The compression is implemented using warped group delay dispersion, which leads to non-uniform time stretching of the signal's intensity envelope. Decoding requires optical phase retrieval and reconstruction of the input temporal profile when the information of interest resides in the complex field. In this paper, we present results on the general behavior of the reconstruction process and its dependence on the signal-to-noise ratio. We also discuss the role of chirp in the input signal.