Source author record

Shanmuganathan Raman

Shanmuganathan Raman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, eess.IV, Machine Learning, Graphics

Catalog footprint

12 works, 4 topics, 4 close collaborators


Preprint, 2026, arXiv

FlowIID: Single-Step Intrinsic Image Decomposition via Latent Flow Matching

Intrinsic Image Decomposition (IID) separates an image into albedo and shading components. It is a core step in many real-world applications, such as relighting and material editing. Existing IID models achieve good results, but often use a large number of parameters. This makes them costly to combine with other models in real-world settings. To address this problem, we propose a flow matching-based solution. For this, we design a novel architecture, FlowIID, based on latent flow matching. FlowIID combines a VAE-guided latent space with a flow matching module, enabling a stable decomposition of albedo and shading. FlowIID is not only parameter-efficient, but also produces results in a single inference step. Despite its compact design, FlowIID delivers competitive and superior results compared to existing models across various benchmarks. This makes it well-suited for deployment in resource-constrained and real-time vision applications.
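As a rough illustration of what the decomposition targets, the sketch below uses the common multiplicative model, image = albedo * shading, on a made-up 2x2 grayscale image. The model choice, the helper names `compose` and `shading_from`, and all pixel values are assumptions for illustration only. This is not the FlowIID architecture, which the abstract describes only at a high level.

```python
# Toy intrinsic image model (assumed): image = albedo * shading, per pixel.
# Values and function names are hypothetical; this is not FlowIID itself.

def compose(albedo, shading):
    """Recombine albedo (reflectance) and shading into an image, pixel by pixel."""
    return [[a * s for a, s in zip(a_row, s_row)]
            for a_row, s_row in zip(albedo, shading)]


def shading_from(image, albedo, eps=1e-8):
    """Recover shading when the albedo is known (division guarded by eps)."""
    return [[i / max(a, eps) for i, a in zip(i_row, a_row)]
            for i_row, a_row in zip(image, albedo)]


albedo = [[0.8, 0.8], [0.2, 0.2]]    # material: bright top row, dark bottom row
shading = [[1.0, 0.5], [1.0, 0.5]]   # lighting: lit left column, shadowed right column

image = compose(albedo, shading)
recovered = shading_from(image, albedo)
```

The example only shows the forward model and its trivial inverse when one factor is known. The hard part, which a learned model addresses, is that a single image gives one equation per pixel for two unknowns.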

Preprint, 2022, arXiv

DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images

Photometric stereo, a problem of recovering 3D surface normals using images of an object captured under different lightings, has been of great interest and importance in computer vision research. Despite the success of existing traditional and deep learning-based methods, it is still challenging due to: (i) the requirement of three or more differently illuminated images, (ii) the inability to model unknown general reflectance, and (iii) the requirement of accurate 3D ground truth surface normals and known lighting information for training. In this work, we attempt to address an under-explored problem of photometric stereo using just two differently illuminated images, referred to as the PS2 problem. It is an intermediate case between a single image-based reconstruction method like Shape from Shading (SfS) and the traditional Photometric Stereo (PS), which requires three or more images. We propose an inverse rendering-based deep learning framework, called DeepPS2, that jointly performs surface normal, albedo, lighting estimation, and image relighting in a completely self-supervised manner with no requirement of ground truth data. We demonstrate how image relighting in conjunction with image reconstruction enhances the lighting estimation in a self-supervised setting.
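For context, the classical three-image case that the abstract contrasts with can be sketched for a single pixel under a Lambertian assumption, where intensity = albedo * (normal . light). Three non-coplanar light directions give three linear equations in the scaled normal g = albedo * normal. With only two images there are two equations for three unknowns, which is why the PS2 setting needs additional priors. The function names, light directions and values below are hypothetical, and this is the textbook formulation, not the DeepPS2 network.

```python
import math

# Classical Lambertian photometric stereo for one pixel (assumed model):
#   intensity_k = albedo * dot(normal, light_k), with known light directions.


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(lights, b):
    """Solve lights * g = b by Cramer's rule; fails if the lights are coplanar."""
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions are coplanar; normal is not determined")
    g = []
    for col in range(3):
        swapped = [[b[r] if c == col else lights[r][c] for c in range(3)]
                   for r in range(3)]
        g.append(det3(swapped) / d)
    return g


def normal_and_albedo(lights, intensities):
    """Return (unit normal, albedo) from three intensities of one pixel."""
    g = solve3(lights, intensities)           # g = albedo * normal
    rho = math.sqrt(sum(x * x for x in g))    # albedo is the length of g
    if rho == 0.0:
        raise ValueError("pixel is black under all lights")
    return [x / rho for x in g], rho


# Synthetic pixel: made-up ground truth used to render the three intensities.
true_normal = [0.6, 0.0, 0.8]
true_albedo = 0.5
lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
intensities = [true_albedo * sum(n * l for n, l in zip(true_normal, light))
               for light in lights]

normal, albedo = normal_and_albedo(lights, intensities)
```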

Preprint, 2021, arXiv

APEX-Net: Automatic Plot Extractor Network

Automatic extraction of raw data from 2D line plot images is a problem of great importance having many real-world applications. Several algorithms have been proposed for solving this problem. However, these algorithms involve a significant amount of human intervention. To minimize this intervention, we propose APEX-Net, a deep learning based framework with novel loss functions for solving the plot extraction problem. We introduce APEX-1M, a new large scale dataset which contains both the plot images and the raw data. We demonstrate the performance of APEX-Net on the APEX-1M test set and show that it obtains impressive accuracy. We also show visual results of our network on unseen plot images and demonstrate that it extracts the shape of the plots to a great extent. Finally, we develop a GUI based software for plot extraction that can benefit the community at large. For dataset and more information visit https://sites.google.com/view/apexnetpaper/.

Preprint, 2020, arXiv

Deep No-reference Tone Mapped Image Quality Assessment

The process of rendering high dynamic range (HDR) images to be viewed on conventional displays is called tone mapping. However, tone mapping introduces distortions in the final image which may lead to visual displeasure. To quantify these distortions, we introduce a novel no-reference quality assessment technique for these tone mapped images. This technique is composed of two stages. In the first stage, we employ a convolutional neural network (CNN) to generate quality aware maps (also known as distortion maps) from tone mapped images by training it with the ground truth distortion maps. In the second stage, we model the normalized image and distortion maps using an Asymmetric Generalized Gaussian Distribution (AGGD). The parameters of the AGGD model are then used to estimate the quality score using support vector regression (SVR). We show that the proposed technique delivers competitive performance relative to the state-of-the-art techniques. The novelty of this work is its ability to visualize various distortions as quality maps (distortion maps), especially in the no-reference setting, and to use these maps as features to estimate the quality score of tone mapped images.

Preprint, 2020, arXiv

Depthwise-STFT based separable Convolutional Neural Networks

In this paper, we propose a new convolutional layer called Depthwise-STFT Separable layer that can serve as an alternative to the standard depthwise separable convolutional layer. The construction of the proposed layer is inspired by the fact that the Fourier coefficients can accurately represent important features such as edges in an image. It utilizes the Fourier coefficients computed (channelwise) in the 2D local neighborhood (e.g., 3x3) of each position of the input map to obtain the feature maps. The Fourier coefficients are computed using 2D Short Term Fourier Transform (STFT) at multiple fixed low frequency points in the 2D local neighborhood at each position. These feature maps at different frequency points are then linearly combined using trainable pointwise (1x1) convolutions. We show that the proposed layer outperforms the standard depthwise separable layer-based models on the CIFAR-10 and CIFAR-100 image classification datasets with reduced space-time complexity.
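The two stages described above (fixed local Fourier coefficients per channel, then a trainable pointwise mix) can be sketched for a single channel in plain Python. The frequency points in `FREQS`, the zero padding, the sign convention of the exponent and the fixed mixing weights are all assumptions made for illustration. In an actual layer the pointwise weights would be learned and the operation would run over many channels.

```python
import cmath

# Assumed low-frequency points (cycles per pixel) for a 3x3 neighbourhood.
FREQS = [(1 / 3, 0.0), (0.0, 1 / 3)]


def local_fourier(channel, freq):
    """Complex Fourier coefficient of every 3x3 neighbourhood at one fixed frequency."""
    h, w = len(channel), len(channel[0])
    u, v = freq
    out = [[0j] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0j
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:   # zero padding outside the map
                        phase = -2j * cmath.pi * (u * dr + v * dc)
                        acc += channel[rr][cc] * cmath.exp(phase)
            out[r][c] = acc
    return out


def stft_features(channel):
    """Stack real and imaginary parts at each frequency as separate feature maps."""
    maps = []
    for freq in FREQS:
        coef = local_fourier(channel, freq)
        maps.append([[z.real for z in row] for row in coef])
        maps.append([[z.imag for z in row] for row in coef])
    return maps


def pointwise_combine(maps, weights):
    """1x1 'convolution': a weighted sum of the feature maps at each position."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(wt * m[r][c] for wt, m in zip(weights, maps)) for c in range(w)]
            for r in range(h)]


flat = [[1.0] * 3 for _ in range(3)]            # constant patch, no edges
edge = [[0.0, 0.0, 1.0] for _ in range(3)]      # vertical edge on the right

flat_coef = local_fourier(flat, (0.0, 1 / 3))[1][1]
edge_coef = local_fourier(edge, (0.0, 1 / 3))[1][1]

features = stft_features(edge)
mixed = pointwise_combine(features, [0.5, -0.5, 1.0, 1.0])  # weights would be trained
```

At the centre pixel the constant patch gives a coefficient of zero magnitude at the non-zero frequency, while the edge patch gives a clearly non-zero one, which matches the abstract's motivation that Fourier coefficients respond to edges.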

Preprint, 2020, arXiv

Yoga-82: A New Dataset for Fine-grained Classification of Human Poses

Human pose estimation is a well-known problem in computer vision to locate joint positions. Existing datasets for the learning of poses are observed to be not challenging enough in terms of pose diversity, object occlusion, and viewpoints. This makes the pose annotation process relatively simple and restricts the application of the models that have been trained on them. To handle more variety in human poses, we propose the concept of fine-grained hierarchical pose classification, in which we formulate the pose estimation as a classification task, and propose a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes. Yoga-82 consists of complex poses where fine annotations may not be possible. To resolve this, we provide hierarchical labels for yoga poses based on the body configuration of the pose. The dataset contains a three-level hierarchy including body positions, variations in body positions, and the actual pose names. We present the classification accuracy of the state-of-the-art convolutional neural network architectures on Yoga-82. We also present several hierarchical variants of DenseNet in order to utilize the hierarchical labels.

Preprint, 2016, arXiv

Automatic Content-aware Non-Photorealistic Rendering of Images

Non-photorealistic rendering techniques work on image features and often manipulate a set of characteristics such as edges and texture to achieve a desired depiction of the scene. Most computational photography methods decompose an image using edge preserving filters and work on the resulting base and detail layers independently to achieve desired visual effects. We propose a new approach for content-aware non-photorealistic rendering of images where we manipulate the visually salient and the non-salient regions separately. We propose a novel content-aware framework in order to render an image for applications such as detail exaggeration, artificial blurring and image abstraction. The processed regions of the image are blended seamlessly for all these applications. We demonstrate that content awareness of the proposed method leads to automatic generation of non-photorealistic rendering of the same image for the different applications mentioned above.

Preprint, 2016, arXiv

Automatic Segmentation of Dynamic Objects from an Image Pair

Automatic segmentation of objects from a single image is a challenging problem which generally requires training on a large number of images. We consider the problem of automatically segmenting only the dynamic objects from a given pair of images of a scene captured from different positions. We exploit dense correspondences along with saliency measures in order to first localize the interest points on the dynamic objects from the two images. We propose a novel approach based on techniques from computational geometry in order to automatically segment the dynamic objects from both images using a top-down segmentation strategy. We discuss how the proposed approach is novel compared to other state-of-the-art segmentation algorithms. We show that the proposed approach for segmentation is efficient in handling large motions and is able to achieve very good segmentation of the objects for different scenes. We analyse the results with respect to the manually marked ground truth segmentation masks created using our own dataset and provide key observations in order to improve the work in the future.

Preprint, 2016, arXiv

Zero Shot Hashing

This paper provides a framework to hash images containing instances of unknown object classes. In many object recognition problems, we might have access to a huge amount of data. It may so happen that even this huge amount of data doesn't cover the objects belonging to classes that we see in our day-to-day life. Zero shot learning exploits auxiliary information (also called signatures) in order to predict the labels corresponding to unknown classes. In this work, we attempt to generate the hash codes for images belonging to unseen classes, information about which is available only through the textual corpus. We formulate this as an unsupervised hashing problem, as the exact labels are not available for the instances of unseen classes. We show that the proposed solution is able to generate hash codes which can predict labels corresponding to unseen classes with appreciably good precision.

Preprint, 2015, arXiv

Effective Object Tracking in Unstructured Crowd Scenes

In this paper, we present a rotation-variant Oriented Texture Curve (OTC) descriptor based mean shift algorithm for tracking an object in an unstructured crowd scene. The proposed algorithm works by first obtaining the OTC features for a manually selected object target; a visual vocabulary is then created using all the OTC features of the target. The target histogram is obtained using a codebook encoding method and is then used in a mean shift framework to perform similarity search. Results are obtained on different videos of challenging scenes, and a comparison of the proposed approach with several state-of-the-art approaches is provided. The analysis shows the advantages and limitations of the proposed approach for tracking an object in unstructured crowd scenes.

Preprint, 2015, arXiv

SA-CNN: Dynamic Scene Classification using Convolutional Neural Networks

The task of classifying videos of natural dynamic scenes into appropriate classes has gained a lot of attention in recent years. The problem becomes especially challenging when the camera used to capture the video is dynamic. In this paper, we analyse the performance of statistical aggregation (SA) techniques on various pre-trained convolutional neural network (CNN) models to address this problem. The proposed approach works by extracting CNN activation features for a number of frames in a video and then uses an aggregation scheme in order to obtain a robust feature descriptor for the video. We show through results that the proposed approach performs better than the state of the art on the Maryland and YUPenn datasets. The final descriptor obtained is powerful enough to distinguish among dynamic scenes and is even capable of addressing the scenario where the camera motion is dominant and the scene dynamics are complex. Further, this paper shows an extensive study on the performance of various aggregation methods and their combinations. We compare the proposed approach with other dynamic scene classification algorithms on two publicly available datasets, Maryland and YUPenn, to demonstrate the superior performance of the proposed approach.
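The aggregation step can be sketched as follows: each frame yields one feature vector, and per-dimension statistics over the frames are concatenated into a single video descriptor. The particular statistics offered here (mean, max, min, standard deviation), the function name `aggregate` and the toy feature values are assumptions, since the abstract does not list which aggregation schemes were studied. Real inputs would be CNN activations, not hand-written numbers.

```python
from statistics import mean, pstdev

# Hypothetical set of per-dimension statistics; the paper's exact choices are not stated here.
STATS = {"mean": mean, "max": max, "min": min, "std": pstdev}


def aggregate(frame_features, ops=("mean", "max")):
    """Concatenate per-dimension statistics over frames into one video descriptor."""
    if not frame_features:
        raise ValueError("need at least one frame")
    columns = list(zip(*frame_features))   # one tuple per feature dimension
    descriptor = []
    for name in ops:
        descriptor.extend(STATS[name](col) for col in columns)
    return descriptor


# Two frames with three-dimensional features (stand-ins for CNN activations).
frames = [[1.0, 0.0, 2.0],
          [3.0, 4.0, 2.0]]

video_descriptor = aggregate(frames)                       # mean then max
with_spread = aggregate(frames, ops=("mean", "std"))       # mean then standard deviation
```

The descriptor length is the feature dimension times the number of statistics, independent of how many frames were sampled, which is what makes it usable as a fixed-size input to a classifier.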

Preprint, 2013, arXiv

Efficient Image Retargeting for High Dynamic Range Scenes

Most real-world scenes have a very high dynamic range (HDR). Mobile phone cameras and digital cameras available on the market are limited in both dynamic range and spatial resolution. The same argument applies to limited dynamic range display devices, which also differ in spatial resolution and aspect ratio. In this paper, we address the problem of displaying a high-contrast low dynamic range (LDR) image of an HDR scene on a display device whose spatial resolution differs from that of the capturing digital camera. The optimal solution proposed in this work can be employed with any camera which has the ability to shoot multiple differently exposed images of a scene. Further, the proposed solution provides flexibility in depicting the entire contrast of the HDR scene as an LDR image with a user-specified spatial resolution. This task is achieved through an optimized content-aware retargeting framework which preserves salient features, along with an algorithm to combine multi-exposure images. We show that the proposed approach performs exceedingly well in generating high-contrast LDR images of varying spatial resolution compared to an alternative approach.
