Source author record

Prerana Mukherjee

Prerana Mukherjee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Machine Learning, Multimedia, eess.AS, eess.IV, Graphics, Information Retrieval, Sound

Catalog footprint

What is connected

8 works

8 topics

4 close collaborators


preprint 2022 arXiv

ASOC: Adaptive Self-aware Object Co-localization

The primary goal of this paper is to jointly localize objects in a group of semantically similar images, a task known as object co-localization. Most existing related works are essentially weakly supervised, relying mainly on the weak supervision provided by neighboring images. Although weak supervision is beneficial, it is not entirely reliable, because the results are quite sensitive to which neighboring images are considered. In this paper, we combine it with self-awareness to mitigate this issue. By self-awareness, we mean the solution derived from the image itself in the form of a saliency cue, which can also be unreliable if applied alone. Nevertheless, combining the two paradigms can lead to better co-localization. Specifically, we introduce a dynamic mediator that adaptively balances the two static solutions to provide an optimal solution. We therefore call this method ASOC: Adaptive Self-aware Object Co-localization. We perform exhaustive experiments on several benchmark datasets and validate that weak supervision supplemented with self-awareness outperforms several competing methods.

preprint 2022 arXiv

Generating Out of Distribution Adversarial Attack using Latent Space Poisoning

Traditional adversarial attacks rely on perturbations generated from the network's gradients, which are generally safeguarded by gradient-guided search to provide an adversarial counterpart to the network. In this paper, we propose a novel mechanism for generating adversarial examples in which the actual image is not corrupted; rather, its latent space representation is used to tamper with the inherent structure of the image while keeping its perceptual quality intact, so that the result acts as a legitimate data sample. As opposed to gradient-based attacks, latent space poisoning exploits the inclination of classifiers to model the independent and identically distributed training data, and tricks them by producing out-of-distribution samples. We train a disentangled variational autoencoder (beta-VAE) to model the data in latent space, and then add noise perturbations to the latent space using a class-conditioned distribution function, under the constraint that the result is misclassified as the target label. Our empirical results on the MNIST, SVHN, and CelebA datasets validate that the generated adversarial examples can easily fool robust l_0, l_2, and l_inf norm classifiers designed using provably robust defense mechanisms.

preprint 2022 arXiv

MaskMTL: Attribute prediction in masked facial images with deep multitask learning

Predicting attributes in landmark-free facial images is itself a challenging task, and it becomes more complicated when the face is occluded by a mask. Smart access control gates that rely on identity verification, or secure login to personal electronic gadgets, may use the face as a biometric trait. In particular, the Covid-19 pandemic has increasingly underscored the need for hygienic and contactless identity verification. In such cases, the use of masks becomes inevitable, and attribute prediction helps in segregating vulnerable target groups from community spread or ensuring social distancing for them in a collaborative environment. We create a masked face dataset by efficiently overlaying masks of different shapes, sizes and textures to model the variability introduced by wearing a mask. This paper presents a deep Multi-Task Learning (MTL) approach to jointly estimate various heterogeneous attributes from a single masked facial image. Experimental results on the benchmark face attribute dataset UTKFace demonstrate that the proposed approach outperforms other competing techniques. The source code is available at https://github.com/ritikajha/Attribute-prediction-in-masked-facial-images-with-deep-multitask-learning

preprint 2022 arXiv

OCFormer: One-Class Transformer Network for Image Classification

We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for latent space representation and then train the network using the optimal loss function. Prior works have made considerable efforts to learn a good representation using a variety of loss functions that ensure both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is evaluated extensively on the CIFAR-10, CIFAR-100, Fashion-MNIST and CelebA eyeglasses datasets. Our method shows significant improvements over competing CNN-based one-class classifier approaches.
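The pseudo-negative idea in the abstract above can be illustrated with a minimal sketch. It assumes that latent vectors for the single known class are already available (here, hand-written toy values standing in for encoder outputs) and shows only how zero-centered Gaussian noise could be paired with them as a second class. The function name `make_pseudo_negative_batch`, the `sigma` parameter, and the label convention are hypothetical and are not taken from the paper.

```python
import random


def make_pseudo_negative_batch(latents, sigma=1.0, seed=0):
    """Pair each real latent vector (label 1) with a zero-centered
    Gaussian noise vector of the same dimension (label 0)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    dim = len(latents[0])
    features, labels = [], []
    for z in latents:
        features.append(list(z))
        labels.append(1)
        # Pseudo-negative: every coordinate drawn from N(0, sigma^2).
        features.append([rng.gauss(0.0, sigma) for _ in range(dim)])
        labels.append(0)
    return features, labels


# Toy latents standing in for encoder outputs of the one known class.
latents = [[2.0, 2.1, 1.9], [2.2, 1.8, 2.0]]
features, labels = make_pseudo_negative_batch(latents, sigma=0.5)
```

A binary classifier trained on `features` and `labels` would then learn to separate the known class from noise; the loss function and network the paper uses are not reproduced here.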

preprint 2020 arXiv

AnimePose: Multi-person 3D pose estimation and animation

3D animation of humans in action is quite challenging, as it involves a large setup with several motion trackers placed all over the person's body to track the movement of every limb. This is time-consuming, and wearing exoskeleton body suits with motion sensors may cause the person discomfort. In this work, we present a simple yet effective solution to generate 3D animation of multiple persons from a 2D video using deep learning. Although significant improvement has been achieved recently in 3D human pose estimation, most prior works perform well only for single-person pose estimation, and multi-person pose estimation remains a challenging problem. In this work, we first propose a supervised multi-person 3D pose estimation and animation framework, AnimePose, for a given input RGB video sequence. The pipeline of the proposed system consists of several modules: i) person detection and segmentation, ii) depth map estimation, iii) lifting 2D to 3D information for person localization, and iv) person trajectory prediction and human pose tracking. Our proposed system produces results comparable to previous state-of-the-art 3D multi-person pose estimation methods on the publicly available MuCo-3DHP and MuPoTS-3D datasets, and it outperforms previous state-of-the-art human pose tracking methods by a significant margin of 11.7% in MOTA score on the PoseTrack 2018 dataset.
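Step iii) of the pipeline above, lifting 2D information to 3D, can be sketched with the standard pinhole back-projection. This is an assumption made for illustration: the abstract does not say how AnimePose performs the lifting. The intrinsics `fx`, `fy`, `cx`, `cy`, the function names, and the toy depth map are all hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a known depth to camera-frame (X, Y, Z)
    using the pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_keypoints(keypoints_2d, depth_map, fx, fy, cx, cy):
    """Lift integer pixel keypoints (u, v) using a per-pixel depth map
    indexed as depth_map[row][col], i.e. depth_map[v][u]."""
    return [
        backproject(u, v, depth_map[v][u], fx, fy, cx, cy)
        for (u, v) in keypoints_2d
    ]


# Toy 3x3 depth map and two keypoints (hypothetical values).
depth_map = [
    [2.0, 2.0, 2.0],
    [2.0, 4.0, 2.0],
    [2.0, 2.0, 2.0],
]
points = lift_keypoints([(1, 1), (2, 0)], depth_map,
                        fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

With these toy intrinsics, the keypoint at the principal point maps onto the optical axis at its depth, and the corner keypoint is displaced in proportion to its depth.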

preprint 2020 arXiv

Attentional networks for music generation

Realistic music generation has always remained a challenging problem, as the generated music may lack structure or rationality. In this work, we propose a deep learning based music generation method to produce old-style music, particularly jazz, with rehashed melodic structures, using a Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with attention. Owing to their success in modelling long-term temporal dependencies in sequential data, including videos, Bi-LSTMs with attention serve as a natural choice and an early application in music generation. We validate in our experiments that Bi-LSTMs with attention are able to preserve the richness and technical nuances of the music performed.
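As a rough illustration of what attention over recurrent hidden states does, the sketch below applies dot-product attention to a list of per-time-step vectors. It assumes the Bi-LSTM outputs are already computed and uses a plain dot-product score; the attention variant used in the paper is not stated in the abstract, so the scoring function, the `query` vector, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Score each time step by its dot product with the query, normalize
    the scores with softmax, and return (weights, weighted-sum context)."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Toy hidden states for two time steps (hypothetical values).
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
uniform_weights, uniform_context = attention_pool(hidden_states, [0.0, 0.0])
focused_weights, focused_context = attention_pool(hidden_states, [10.0, 0.0])
```

A zero query scores every step equally, so the context is the plain average of the hidden states; a query aligned with the first step shifts almost all of the weight onto it.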

preprint 2020 arXiv

Semantics Preserving Hierarchy based Retrieval of Indian heritage monuments

Monuments can be classified on the basis of their appearance and shape, from coarse to fine categories. However, monuments also carry much semantic information, reflected in the era in which they were built, their type or purpose, the dynasty that established them, etc. In particular, the Indian subcontinent exhibits a great deal of variation in architectural styles owing to its rich cultural heritage. In this paper, we propose a framework that uses hierarchy to preserve semantic information while performing image classification or image retrieval. We encode the learnt deep embeddings to construct a dictionary of images and then apply a re-ranking framework to the retrieved results using DeLF features. The semantic information preserved in these embeddings helps to classify unknown monuments at a higher level of granularity in the hierarchy. We have curated a large, novel Indian heritage monuments dataset comprising images of historical, cultural and religious importance, with subtypes of eras, dynasties and architectural styles. We demonstrate the performance of the proposed framework in image classification and retrieval tasks and compare it with other competing methods on this dataset.
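The retrieve-then-re-rank flow described above can be sketched in two stages: a first pass by cosine similarity over global embeddings, then a re-ordering of the shortlist by a local-feature score. The sketch assumes that the local score is a count of matched local features per candidate; the counts, monument names, and embeddings below are made-up stand-ins, and no DeLF extraction is performed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def retrieve(query_emb, index, k):
    """First stage: top-k names by cosine similarity of global embeddings."""
    ranked = sorted(index, key=lambda name: cosine(query_emb, index[name]),
                    reverse=True)
    return ranked[:k]


def rerank(candidates, local_score):
    """Second stage: reorder the shortlist by a local-feature score."""
    return sorted(candidates, key=local_score, reverse=True)


# Hypothetical image dictionary: name -> global embedding.
index = {"fort_a": [1.0, 0.0], "temple_b": [0.9, 0.1], "tomb_c": [0.0, 1.0]}
shortlist = retrieve([1.0, 0.05], index, k=2)

# Hypothetical local-feature match counts for the shortlisted images.
match_counts = {"fort_a": 12, "temple_b": 30}
final = rerank(shortlist, lambda name: match_counts[name])
```

Here the global embeddings rank `fort_a` first, while the assumed local match counts promote `temple_b` in the final ordering.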

preprint 2015 arXiv

Benchmarking KAZE and MCM for Multiclass Classification

In this paper, we propose a novel approach to feature generation that fuses KAZE and SIFT features. We then use this feature set with a Minimal Complexity Machine (MCM) for object classification. We show that KAZE and SIFT features are complementary. Experimental results indicate that an elementary integration of these techniques can outperform state-of-the-art approaches.
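One elementary way to integrate two descriptor types is to normalize each and concatenate them. The sketch below shows that option only as an assumption: the abstract does not specify the fusion rule, and the vectors here are toy image-level descriptors rather than real KAZE or SIFT output. The function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(kaze_vec, sift_vec):
    """Normalize each descriptor separately so that neither dominates by
    scale, then concatenate them into a single feature vector."""
    return l2_normalize(kaze_vec) + l2_normalize(sift_vec)


# Toy stand-ins for pooled KAZE and SIFT descriptors of one image.
fused = fuse([3.0, 4.0], [0.0, 2.0])
```

The fused vector would then be passed to the classifier; the MCM itself is not sketched here.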
