Researcher profile

Mansi Sharma

Mansi Sharma contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging); Verification L1; Unclaimed author
7 works
0 followers
7 topics
4 close collaborators

Published work

7 published items

Preprint, 2022, arXiv

A High Resolution Multi-exposure Stereoscopic Image & Video Database of Natural Scenes

Immersive displays such as VR headsets, AR glasses, multiview displays, and free point televisions have emerged as a new class of display technologies in recent years, offering a better visual experience and greater viewer engagement than conventional displays. With the evolution of 3D video and display technologies, the consumer market for High Dynamic Range (HDR) cameras and displays is growing quickly. The lack of appropriate experimental data is a critical hindrance to primary research in 3D HDR video technology. The unavailability of a sufficient real-world multi-exposure experimental dataset is also a major bottleneck for HDR imaging research, thereby limiting the quality of experience (QoE) for viewers. In this paper, we introduce a diversified stereoscopic multi-exposure dataset captured within the campus of the Indian Institute of Technology Madras, which is home to diverse flora and fauna. The dataset is captured using a ZED stereoscopic camera and provides intricate scenes of outdoor locations such as gardens, roadside views, festival venues, and buildings, and of indoor locations such as academic and residential areas. The proposed dataset accommodates a wide depth range, complex depth structure, complicated object movement, illumination variations, rich color dynamics, and texture discrepancy, in addition to significant randomness introduced by a moving camera and background motion. The proposed dataset is made publicly available to the research community. Furthermore, the procedure for capturing, aligning, and calibrating multi-exposure stereo videos and images is described in detail. Finally, we discuss the progress, challenges, potential use cases, and future research opportunities with respect to HDR imaging, depth estimation, consistent tone mapping, and 3D HDR coding.

Preprint, 2022, arXiv

A Robust and Scalable Attention Guided Deep Learning Framework for Movement Quality Assessment

Physical rehabilitation programs frequently begin with a brief stay in the hospital and continue with home-based rehabilitation. Lack of feedback on exercise correctness is a significant issue in home-based rehabilitation. Automated movement quality assessment (MQA) using skeletal movement data (hereafter referred to as skeletal data) collected via depth imaging devices can assist with home-based rehabilitation by providing the necessary quantitative feedback. This paper aims to use recent advances in deep learning to address the problem of MQA. Movement quality score generation is an essential component of MQA. We propose three novel skeletal data augmentation schemes. We show that using the proposed augmentations for generating movement quality scores results in significant performance boosts over existing methods. Finally, we propose a novel transformer-based architecture for MQA. Four novel feature extractors are proposed and studied that allow the transformer network to operate on skeletal data. We show that adding the attention mechanism to the design of the proposed feature extractor allows the transformer network to pay attention to specific body parts that make a significant contribution towards executing a movement. We report an improvement in movement quality score prediction of 12% on the UI-PRMD dataset and 21% on the KIMORE dataset compared to existing methods.

Preprint, 2022, arXiv

An Integrated Representation & Compression Scheme Based on Convolutional Autoencoders with 4D DCT Perceptual Encoding for High Dynamic Range Light Fields

Emerging and existing light field displays are highly capable of realistic presentation of 3D scenes on auto-stereoscopic, glasses-free platforms. The size of a light field is a major drawback for 3D display and streaming purposes. When a light field is of high dynamic range, the size increases drastically. In this paper, we propose a novel compression algorithm for a high dynamic range light field which yields perceptually lossless compression. The algorithm exploits the inter-view and intra-view correlations of the HDR light field by treating it as a four-dimensional volume. The HDR light field compression is based on a novel 4DDCT-UCS (4D-DCT Uniform Colour Space) algorithm. Additional encoding of 4DDCT-UCS acquired images by HEVC eliminates intra-frame, inter-frame, and intrinsic redundancies in HDR light field data. Comparison with state-of-the-art coders such as JPEG-XL and an HDR video coding algorithm shows superior compression performance of the proposed scheme for real-world light fields.

Preprint, 2022, arXiv

MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth Estimates from Multi-exposure Stereo Images for HDR 3D Applications

Display technologies have evolved over the years. It is critical to develop practical HDR capturing, processing, and display solutions to bring 3D technologies to the next level. Depth estimation of multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a novel deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped. For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed. The proposed formulation circumvents the cost volume construction requirement, which is replaced by a ResNet-based dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, we combine disparity maps obtained from the stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps calculated for different quality measures. The final predicted disparity map is more robust and retains the best features that preserve the depth discontinuities. The proposed CNN offers the flexibility to train using standard dynamic range stereo data or multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene flow and differently exposed Middlebury stereo datasets. The architecture performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
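
The abstract states that disparity maps from different exposures are merged using per-pixel weight maps. The sketch below illustrates one simple way such a merge could work: a normalized per-pixel weighted average. It is not the authors' method. The function name, the fallback to a plain mean, and the toy weights are assumptions made for illustration, and the quality measures that would produce the weight maps are not modelled here.

```python
from typing import List

Map = List[List[float]]  # a 2D map stored as rows of floats


def fuse_disparity_maps(disparities: List[Map], weights: List[Map],
                        eps: float = 1e-8) -> Map:
    """Per-pixel weighted average of disparity maps from several exposures.

    Hypothetical helper: weights[k][r][c] says how much exposure k is
    trusted at pixel (r, c). Weights are normalized per pixel.
    """
    if not disparities or len(disparities) != len(weights):
        raise ValueError("need one weight map per disparity map")
    rows, cols = len(disparities[0]), len(disparities[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total_w = sum(w[r][c] for w in weights)
            if total_w < eps:
                # No exposure is trusted here: fall back to a plain mean.
                fused[r][c] = sum(d[r][c] for d in disparities) / len(disparities)
            else:
                fused[r][c] = sum(
                    d[r][c] * w[r][c] for d, w in zip(disparities, weights)
                ) / total_w
    return fused


# Toy example: two exposures of a 1x2 image.
low_exposure = [[10.0, 20.0]]
high_exposure = [[14.0, 40.0]]
w_low = [[1.0, 0.0]]   # low exposure is not trusted at the second pixel
w_high = [[1.0, 1.0]]
fused = fuse_disparity_maps([low_exposure, high_exposure], [w_low, w_high])
print(fused)
```

In this toy case the first pixel is an equal blend of both exposures, while the second pixel takes its value entirely from the high exposure because the low-exposure weight is zero there.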

Preprint, 2022, arXiv

Tactile-ViewGCN: Learning Shape Descriptor from Tactile Data using Graph Convolutional Network

For humans, our "senses of touch" have always been necessary for our ability to precisely and efficiently manipulate objects of all shapes in any environment, but until recently, little work has been done to fully understand haptic feedback. This work proposes a novel method for obtaining a better shape descriptor than existing methods for classifying an object from multiple tactile data collected from a tactile glove. It focuses on improving previous work on object classification using tactile data. The major problem for object classification from multiple tactile data is finding a good way to aggregate features extracted from multiple tactile images. We propose a novel method, dubbed Tactile-ViewGCN, that hierarchically aggregates tactile features, considering relations among different features, using a Graph Convolutional Network. Our model outperforms previous methods on the STAG dataset with an accuracy of 81.82%.

Preprint, 2021, arXiv

Latent Factor Modeling of Users Subjective Perception for Stereoscopic 3D Video Recommendation

Numerous stereoscopic 3D movies are released to theaters every year and generate large revenues. Despite improvements in stereo capturing and 3D video post-production technology, stereoscopic artifacts which cause viewer discomfort continue to appear even in high-budget films. Existing automatic 3D video quality measurement tools can detect distortions in stereoscopic images or videos, but they fail to consider the viewer's subjective perception of those artifacts and how these distortions affect their choices. In this paper, we introduce a novel recommendation system for stereoscopic 3D movies based on a latent factor model that analyses the viewer's subjective ratings and the influence of 3D video distortions on their preferences. To the best of our knowledge, this is a first-of-its-kind model that recommends 3D movies based on stereo-film quality ratings, accounting for the correlation between the viewer's visual discomfort and stereoscopic-artifact perception. The proposed model is trained and tested on the benchmark Nama3ds1-cospad1 and LFOVIAS3DPh2 S3D video quality assessment datasets. The experiments revealed that the resulting matrix-factorization based recommendation system is able to generalize considerably better for the viewer's subjective ratings.
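
The abstract describes a matrix-factorization based latent factor model over viewer ratings. As a rough illustration of that family of models, and not of the paper's specific formulation, the sketch below fits viewer and movie factor vectors to a handful of made-up ratings with plain stochastic gradient descent. All names, hyperparameters, and ratings are assumptions, and the distortion-related terms mentioned in the abstract are not modelled.

```python
import math
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (viewer index, movie index, rating)
Factors = List[List[float]]


def train_latent_factors(ratings: List[Rating], n_viewers: int, n_movies: int,
                         k: int = 2, lr: float = 0.05, reg: float = 0.02,
                         epochs: int = 500, seed: int = 0
                         ) -> Tuple[Factors, Factors]:
    """Fit rating ~ dot(P[viewer], Q[movie]) by SGD with L2 regularization."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_viewers)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_movies)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the regularized squared error.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P: Factors, Q: Factors, u: int, i: int) -> float:
    """Predicted rating: dot product of viewer and movie factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(ratings: List[Rating], P: Factors, Q: Factors) -> float:
    """Root-mean-square error over the given ratings."""
    sq = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


# Made-up ratings for 3 viewers and 3 movies (illustration only).
toy_ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 1, 5.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P0, Q0 = train_latent_factors(toy_ratings, 3, 3, epochs=0)  # untrained
P, Q = train_latent_factors(toy_ratings, 3, 3)
print("RMSE before:", rmse(toy_ratings, P0, Q0))
print("RMSE after: ", rmse(toy_ratings, P, Q))
print("Predicted rating, viewer 0 on movie 2:", predict(P, Q, 0, 2))
```

The last line shows the point of the factorization: once the factors are learned, a rating can be predicted for a viewer and movie pair that never appeared in the training data.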

Preprint, 2010, arXiv

Bio-Authentication based Secure Transmission System using Steganography

Biometrics deals with identity verification of an individual using certain physiological or behavioral features associated with that person. Biometric identification systems using fingerprint patterns are called AFIS (Automatic Fingerprint Identification System). In this paper, a composite method for fingerprint recognition is considered, using a combination of Fast Fourier Transform (FFT) and Sobel filters to improve a poor-quality fingerprint image. Steganography hides messages inside other messages in such a way that an "adversary" would not even know a secret message was present. The objective of our paper is to make a bio-secure system. In this paper, bio-authentication has been implemented in terms of fingerprint recognition, and the second part of the paper is an interactive steganographic system that hides the user's data using one of two options: creating a song list or hiding the data in an image.
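
The abstract mentions Sobel filters as one part of the fingerprint enhancement step. The sketch below shows only what a Sobel filter computes on a small grayscale patch: the gradient magnitude from the standard 3x3 horizontal and vertical kernels. It is a generic illustration, not the paper's pipeline. The function name and toy patch are assumptions, the FFT stage is not shown, and border pixels are simply left at zero.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude at each interior pixel; borders stay 0.0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            # Correlate the 3x3 neighbourhood with both kernels.
            for dr in range(3):
                for dc in range(3):
                    px = img[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * px
                    gy += SOBEL_Y[dr][dc] * px
            out[r][c] = math.hypot(gx, gy)
    return out


# Toy 4x4 patch with a vertical edge between columns 1 and 2.
patch = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edges = sobel_magnitude(patch)
for row in edges:
    print(row)
```

Interior pixels next to the vertical edge get a strong response, which is the ridge-highlighting behaviour that makes this kind of filter useful on fingerprint images.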