Source author record

Saurabh Gupta

Saurabh Gupta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, hep-th, Machine Learning, Robotics, Artificial Intelligence, Computation and Language, hep-ex, hep-ph, Genomics, physics.flu-dyn, Information Retrieval, math-ph, math.MP, nlin.SI, quant-ph, Quantitative Methods, Social and Information Networks

Catalog footprint

What is connected

43 works

17 topics

4 close collaborators

preprint, 2022, arXiv

Detection of Distracted Driver using Convolution Neural Network

With over 50 million car sales annually and over 1.3 million deaths every year due to motor accidents, we have chosen this space. India accounts for 11 per cent of global deaths in road accidents. Drivers are held responsible for 78% of accidents. Road safety problems in developing countries are a major concern, and human behavior is ascribed as one of the main causes and accelerators of road safety problems. Driver distraction has been identified as the main reason for accidents. Distractions can be caused by activities such as mobile usage, drinking, operating instruments, facial makeup, and social interaction. For the scope of this project, we will focus on building a highly efficient ML model to classify different driver distractions at runtime using computer vision. We would also analyze the overall speed and scalability of the model in order to be able to set it up on an edge device. We use CNN, VGG-16, ResNet50 and an ensemble of CNNs to predict the classes.

preprint, 2022, arXiv

Human Hands as Probes for Interactive Object Understanding

Interactive object understanding, or what we can do to objects and how, is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observation of what human hands interact with and how can provide both the relevant data and the necessary supervision. Attending to hands readily localizes and stabilizes active objects for learning and reveals places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles on the EPIC-KITCHENS dataset, and successfully learn state-sensitive features and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.

preprint, 2022, arXiv

Learning Value Functions from Undirected State-only Experience

This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels i.e. (s,s',r) tuples). We first theoretically characterize the applicability of Q-learning in this setting. We show that tabular Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any arbitrary refinement of the action space. This theoretical result motivates the design of Latent Action Q-learning or LAQ, an offline RL method that can learn effective value functions from state-only experience. Latent Action Q-learning (LAQ) learns value functions using Q-learning on discrete latent actions obtained through a latent-variable future prediction model. We show that LAQ can recover value functions that have high correlation with value functions learned using ground truth actions. Value functions learned using LAQ lead to sample efficient acquisition of goal-directed behavior, can be used with domain-specific low-level controllers, and facilitate transfer across embodiments. Our experiments in 5 environments ranging from 2D grid world to 3D visual navigation in realistic environments demonstrate the benefits of LAQ over simpler alternatives, imitation learning oracles, and competing methods.
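The core idea of running Q-learning over discrete latent actions can be illustrated with a minimal tabular sketch. This is not the authors' implementation: it assumes each (s, s', r) tuple has already been assigned a latent action id `z` (in the paper this comes from a latent-variable future prediction model, which is not reproduced here), and the function names, the `done` flag, and the hyperparameters are all hypothetical.

```python
from collections import defaultdict


def latent_action_q_learning(transitions, num_latent_actions,
                             gamma=0.9, lr=0.5, sweeps=200):
    """Offline tabular Q-learning over (s, z, s_next, r, done) tuples.

    z is a discrete latent action id, assumed to be given (hypothetical
    stand-in for the output of a future prediction model).
    """
    q = defaultdict(lambda: [0.0] * num_latent_actions)
    for _ in range(sweeps):
        for s, z, s_next, r, done in transitions:
            # Standard Q-learning target; terminal transitions do not bootstrap.
            target = r if done else r + gamma * max(q[s_next])
            q[s][z] += lr * (target - q[s][z])
    return q


def state_values(q):
    """Value of a state is the best Q-value over latent actions."""
    return {s: max(vals) for s, vals in q.items()}


# Toy chain 0 -> 1 -> 2 (goal), plus a transition from 1 back to 0.
toy_transitions = [
    (0, 0, 1, 0.0, False),
    (1, 0, 2, 1.0, True),
    (1, 1, 0, 0.0, False),
]
values = state_values(latent_action_q_learning(toy_transitions, num_latent_actions=2))
```

On this toy chain the learned values increase toward the goal, which is the kind of signal the abstract describes using to guide goal-directed behavior.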

preprint, 2022, arXiv

On-Device CPU Scheduling for Sense-React Systems

Sense-react systems (e.g. robotics and AR/VR) have to take highly responsive real-time actions, driven by complex decisions involving a pipeline of sensing, perception, planning, and reaction tasks. These tasks must be scheduled on resource-constrained devices such that the performance goals and the requirements of the application are met. This is a difficult scheduling problem that requires handling multiple scheduling dimensions, and variations in resource usage and availability. In practice, system designers manually tune parameters for their specific hardware and application, which results in poor generalization and increases the development burden. In this work, we highlight the emerging need for scheduling CPU resources at runtime in sense-react systems. We study three canonical applications (face tracking, robot navigation, and VR) to first understand the key scheduling requirements for such systems. Armed with this understanding, we develop a scheduling framework, Catan, that dynamically schedules compute resources across different components of an app so as to meet the specified application requirements. Through experiments with a prototype implemented on a widely-used robotics framework (ROS) and an open-source AR/VR platform, we show the impact of system scheduling on meeting the performance goals for the three applications, how Catan is able to achieve better application performance than hand-tuned configurations, and how it dynamically adapts to runtime variations.

preprint, 2022, arXiv

Particle on a torus knot: Symplectic analysis

We quantize a particle confined to move on a torus knot satisfying the constraint condition $(pθ + qϕ) \approx 0$, within the context of a geometrically motivated approach: the Faddeev-Jackiw formalism. We also deduce the constraint spectrum and discern the basic brackets of the theory. We further reformulate the original gauge non-invariant theory into a physically equivalent gauge theory, which is free from any additional Wess-Zumino variables, by employing symplectic gauge invariant formalism. In addition, we analyze the reformulated gauge invariant theory within the framework of BRST formalism to establish the off-shell nilpotent and absolutely anti-commuting (anti-)BRST symmetries. Finally, we construct the conserved (anti-)BRST charges which satisfy the physicality criteria and turn out to be the generators of corresponding symmetries.

preprint, 2022, arXiv

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors. TIDEE explores a home environment, detects objects that are out of their natural place, infers plausible object contexts for them, localizes such contexts in the current scene, and repositions the objects. Commonsense priors are encoded in three modules: i) visuo-semantic detectors that detect out-of-place objects, ii) an associative neural graph memory of objects and spatial relations that proposes plausible semantic receptacles and surfaces for object repositions, and iii) a visual search network that guides the agent's exploration for efficiently localizing the receptacle-of-interest in the current scene to reposition the object. We test TIDEE on tidying up disorganized scenes in the AI2THOR simulation environment. TIDEE carries out the task directly from pixel and raw depth input without ever having observed the same room beforehand, relying only on priors learned from a separate set of training houses. Human evaluations on the resulting room reorganizations show TIDEE outperforms ablative versions of the model that do not use one or more of the commonsense priors. On a related room rearrangement benchmark that allows the agent to view the goal state prior to rearrangement, a simplified version of our model significantly outperforms a top-performing method by a large margin. Code and data are available at the project website: https://tidee-agent.github.io/.

preprint, 2020, arXiv

Aligning Videos in Space and Time

In this paper, we focus on the task of extracting visual correspondences across videos. Given a query video clip from an action class, we aim to align it with training videos in space and time. Obtaining training data for such a fine-grained alignment task is challenging and often ambiguous. Hence, we propose a novel alignment procedure that learns such correspondence in space and time via cross video cycle-consistency. During training, given a pair of videos, we compute cycles that connect patches in a given frame in the first video by matching through frames in the second video. Cycles that connect overlapping patches together are encouraged to score higher than cycles that connect non-overlapping patches. Our experiments on the Penn Action and Pouring datasets demonstrate that the proposed method can successfully learn to correspond semantically similar patches across videos, and learns representations that are sensitive to object and action states.

preprint, 2020, arXiv

Efficient Bimanual Manipulation Using Learned Task Schemas

We address the problem of effectively composing skills to solve sparse-reward tasks in the real world. Given a set of parameterized skills (such as exerting a force or doing a top grasp at a location), our goal is to learn policies that invoke these skills to efficiently solve such tasks. Our insight is that for many tasks, the learning process can be decomposed into learning a state-independent task schema (a sequence of skills to execute) and a policy to choose the parameterizations of the skills in a state-dependent manner. For such tasks, we show that explicitly modeling the schema's state-independence can yield significant improvements in sample efficiency for model-free reinforcement learning algorithms. Furthermore, these schemas can be transferred to solve related tasks, by simply re-learning the parameterizations with which the skills are invoked. We find that doing so enables learning to solve sparse-reward tasks on real-world robotic systems very efficiently. We validate our approach experimentally over a suite of robotic bimanual manipulation tasks, both in simulation and on real hardware. See videos at http://tinyurl.com/chitnis-schema.

preprint, 2020, arXiv

Hashtags are (not) judgemental: The untold story of Lok Sabha elections 2019

Hashtags in online social media have become a way for users to build communities around topics, promote opinions, and categorize messages. In the political context, hashtags on Twitter are used by users to campaign for their parties, spread news, or to get followers and get a general idea by following a discussion built around a hashtag. In the past, researchers have studied certain types and specific properties of hashtags by utilizing a lot of data collected around hashtags. In this paper, we perform a large-scale empirical analysis of elections using only the hashtags shared on Twitter during the 2019 Lok Sabha elections in India. We study the trends and events that unfolded on the ground, the latent topics to uncover representative hashtags, and semantic similarity to relate hashtags with the election outcomes. We collect over 24 million hashtags to perform extensive experiments. First, we find the trending hashtags and cross-reference them with the tweets in our dataset to list notable events. Second, we use Latent Dirichlet Allocation to find topic patterns in the dataset. Finally, we use a skip-gram word embedding model to find semantically similar hashtags. We propose a popularity metric and an influence metric to predict election outcomes using just the hashtags. Empirical results show that influence is a good measure to predict the election outcome.

preprint, 2020, arXiv

Intrinsic Motivation for Encouraging Synergistic Behavior

We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks, which are tasks where multiple agents must work together to achieve a goal they could not individually. Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own. Thus, we propose to incentivize agents to take (joint) actions whose effects cannot be predicted via a composition of the predicted effect for each individual agent. We study two instantiations of this idea, one based on the true states encountered, and another based on a dynamics model trained concurrently with the policy. While the former is simpler, the latter has the benefit of being analytically differentiable with respect to the action taken. We validate our approach in robotic bimanual manipulation and multi-agent locomotion tasks with sparse rewards; we find that our approach yields more efficient learning than both 1) training with only the sparse reward and 2) using the typical surprise-based formulation of intrinsic motivation, which does not bias toward synergistic behavior. Videos are available on the project webpage: https://sites.google.com/view/iclr2020-synergistic.
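The first instantiation described above (based on the true states encountered) can be sketched as a bonus equal to the gap between the observed next state and a composition of single-agent predictions. This is a rough illustration only: composing the two models sequentially, using Euclidean distance as the error, and the toy `lone_push` model are all assumptions made here, not details confirmed by the abstract.

```python
import math


def compose(model_a, model_b, state, action_a, action_b):
    """Predict the joint effect by chaining two single-agent models.

    Sequential composition is an assumption made for this sketch.
    """
    return model_b(model_a(state, action_a), action_b)


def synergy_bonus(state, action_a, action_b, next_state, model_a, model_b):
    """Intrinsic reward: how badly the composed prediction misses the true outcome."""
    predicted = compose(model_a, model_b, state, action_a, action_b)
    return math.dist(predicted, next_state)


def lone_push(state, action):
    """Hypothetical single-agent model: one agent alone cannot move a heavy object."""
    return state


# Both agents push together and the object actually moves: large bonus.
joint = synergy_bonus((0.0,), 1, 1, (1.0,), lone_push, lone_push)
# The object stays put, exactly as the single-agent models predict: no bonus.
solo = synergy_bonus((0.0,), 1, 0, (0.0,), lone_push, lone_push)
```

The bonus is positive only when acting together produces an effect the individual models cannot account for, which is the bias toward synergistic behavior the abstract describes.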

preprint, 2020, arXiv

Learning to Explore using Active Neural SLAM

This work presents a modular and hierarchical approach to learn policies for exploring 3D environments, called 'Active Neural SLAM'. Our approach leverages the strengths of both classical and learning-based methods, by using analytical path planners with a learned SLAM module, and global and local policies. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). Such use of learning within each module retains its benefits, while at the same time, hierarchical decomposition and modular training allow us to sidestep the high sample complexities associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our approach over past learning and geometry-based approaches. The proposed model can also be easily transferred to the PointGoal task and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.

preprint, 2020, arXiv

Learning to Move with Affordance Maps

The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent, from household robotic vacuums to autonomous vehicles. Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry, but fail to model dynamic objects (such as other agents) or semantic constraints (such as wet floors or doorways). Learning-based RL agents are an attractive alternative because they can incorporate both semantic and geometric information, but are notoriously sample inefficient, difficult to generalize to novel settings, and are difficult to interpret. In this paper, we combine the best of both worlds with a modular approach that learns a spatial representation of a scene that is trained to be effective when coupled with traditional geometric planners. Specifically, we design an agent that learns to predict a spatial affordance map that elucidates what parts of a scene are navigable through active self-supervised experience gathering. In contrast to most simulation environments that assume a static world, we evaluate our approach in the VizDoom simulator, using large-scale randomly-generated maps containing a variety of dynamic actors and hazards. We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.

preprint, 2020, arXiv

Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

Rapid advancements in the Internet of Things (IoT) have facilitated more efficient deployment of smart environment solutions for specific user requirements. With the increase in the number of IoT devices, it has become difficult for the user to control or operate every individual smart device to achieve a desired goal such as optimized power consumption or scheduled appliance running time. Furthermore, existing solutions to automatically adapt the IoT devices are not capable enough to incorporate the user behavior. This paper presents a novel approach to accurately configure IoT devices while achieving the twin objectives of energy optimization and conformance to user preferences. Our work comprises unsupervised clustering of devices' data to find the states of operation for each device, followed by probabilistically analyzing user behavior to determine their preferred states. Finally, we deploy an online reinforcement learning (RL) agent to find the best device settings automatically. Results for three different smart homes' datasets show the effectiveness of our methodology. To the best of our knowledge, this is the first time that a practical approach has been adopted to achieve the above-mentioned objectives without any human interaction within the system.

preprint, 2020, arXiv

Neural Topological SLAM for Visual Navigation

This paper studies the problem of image-goal navigation, which involves navigating to the location indicated by a goal image in a novel, previously unseen environment. To tackle this problem, we design topological representations for space that effectively leverage semantics and afford approximate geometric reasoning. At the heart of our representations are nodes with associated semantic features that are interconnected using coarse geometric information. We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation. Experimental study in visually and physically realistic simulation suggests that our method builds effective representations that capture structural regularities and efficiently solve long-horizon navigation problems. We observe a relative improvement of more than 50% over existing methods that study this task.

preprint, 2020, arXiv

Semantic Curiosity for Active Visual Learning

In this paper, we study the task of embodied interactive learning for object detection. Given a set of environments (and some labeling budget), our goal is to learn an object detector by having an agent select what data to obtain labels for. How should an exploration policy decide which trajectory should be labeled? One possibility is to use a trained object detector's failure cases as an external reward. However, this will require labeling millions of frames required for training RL policies, which is infeasible. Instead, we explore a self-supervised approach for training our exploration policy by introducing a notion of semantic curiosity. Our semantic curiosity policy is based on a simple observation -- the detection outputs should be consistent. Therefore, our semantic curiosity rewards trajectories with inconsistent labeling behavior and encourages the exploration policy to explore such areas. The exploration policy trained via semantic curiosity generalizes to novel scenes and helps train an object detector that outperforms baselines trained with other possible alternatives such as random exploration, prediction-error curiosity, and coverage-maximizing exploration.

preprint, 2020, arXiv

Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

When we humans look at a video of human-object interaction, we can not only infer what is happening but we can even extract actionable information and imitate those interactions. On the other hand, current recognition or geometric approaches lack the physicality of action representation. In this paper, we take a step towards a more physical understanding of actions. We address the problem of inferring contact points and the physical forces from videos of humans interacting with objects. One of the main challenges in tackling this problem is obtaining ground-truth labels for forces. We sidestep this problem by instead using a physics simulator for supervision. Specifically, we use a simulator to predict effects and enforce that estimated forces must lead to the same effect as depicted in the video. Our quantitative and qualitative results show that (a) we can predict meaningful forces from videos whose effects lead to accurate imitation of the motions observed, (b) by jointly optimizing for contact point and force prediction, we can improve the performance on both tasks in comparison to independent training, and (c) we can learn a representation from this model that generalizes to novel objects using few shot examples.

preprint, 2019, arXiv

Faddeev-Jackiw Quantization of Christ-Lee Model

We analyze the constraints of the Christ-Lee model by means of the modified Faddeev-Jackiw formalism in Cartesian as well as polar coordinates. Further, we accomplish quantization à la Faddeev-Jackiw by choosing appropriate gauge conditions in both coordinate systems. Finally, we establish gauge symmetries of the Christ-Lee model with the help of zero modes of the symplectic matrix.

preprint, 2016, arXiv

Anti Self-Dual Yang-Mills, Modified Faddeev-Jackiw Formalism and Hidden BRS Invariance

We analyze the constraints for a system of anti self-dual Yang-Mills (ASDYM) equations by means of the modified Faddeev-Jackiw method in K and J gauges à la Yang. We also establish the Hamiltonian flow for the ASDYM system through the hidden BRS invariance in both gauges. Finally, we remark on the bi-Hamiltonian nature of ASDYM and the compatibility of the symplectic structures therein.

preprint, 2016, arXiv

High scale mixing relations as a natural explanation for large neutrino mixing

The origin of small mixing among the quarks and a large mixing among the neutrinos has been an open question in particle physics. In order to answer this question, we postulate general relations among the quarks and the leptonic mixing angles at a high scale, which could be the scale of Grand Unified Theories. The central idea of these relations is that the quark and the leptonic mixing angles can be unified at some high scale either due to some quark-lepton symmetry or some other underlying mechanism and as a consequence, the mixing angles of the leptonic sector are proportional to that of the quark sector. We investigate the phenomenology of the possible relations where the leptonic mixing angles are proportional to the quark mixing angles at the unification scale by taking into account the latest experimental constraints from the neutrino sector. These relations are able to explain the pattern of leptonic mixing at the low scale and thereby hint that these relations could be possible signatures of a quark-lepton symmetry or some other underlying quark-lepton mixing unification mechanism at some high scale linked to Grand Unified Theories.

preprint, 2016, arXiv

On augmented superfield approach to vector Schwinger model

We exploit the techniques of the Bonora-Tonin superfield formalism to derive the off-shell nilpotent and absolutely anticommuting (anti-)BRST as well as (anti-)co-BRST symmetry transformations for the (1+1)-dimensional (2D) bosonized vector Schwinger model. In the derivation of the above symmetries, we invoke the (dual-)horizontality conditions as well as gauge and (anti-)co-BRST invariant restrictions on the superfields that are defined on the (2, 2)-dimensional supermanifold. We provide a geometrical interpretation of the above nilpotent symmetries (and their corresponding charges). We also express the nilpotency and absolute anticommutativity of the (anti-)BRST and (anti-)co-BRST charges within the framework of the augmented superfield formalism.

preprint, 2015, arXiv

Cross Modal Distillation for Supervision Transfer

In this work we propose a technique that transfers supervision between images from different modalities. We use learned representations from a large labeled modality as a supervisory signal for training representations for a new unlabeled paired modality. Our method enables learning of rich representations for unlabeled modalities and can be used as a pre-training procedure for new modalities with limited labeled data. We show experimental results where we transfer supervision from labeled RGB images to unlabeled depth and optical flow images and demonstrate large improvements for both these cross modal supervision transfers. Code, data and pre-trained models are available at https://github.com/s-gupta/fast-rcnn/tree/distillation

preprint, 2015, arXiv

Exploring Nearest Neighbor Approaches for Image Captioning

We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the "consensus" of the set of candidate captions gathered from the nearest neighbor images. When measured by automatic evaluation metrics on the MS COCO caption evaluation server, these approaches perform as well as many recent approaches that generate novel captions. However, human studies show that a method that generates novel captions is still preferred over the nearest neighbor approach.
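The consensus selection step described above can be sketched as picking the candidate caption with the highest average similarity to the other candidates. The abstract does not specify the similarity function, so the word-level Jaccard overlap used here is a simple stand-in, and the function names and example captions are hypothetical.

```python
def jaccard(caption_a, caption_b):
    """Word-overlap similarity between two captions (a stand-in, not the paper's metric)."""
    words_a = set(caption_a.lower().split())
    words_b = set(caption_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def consensus_caption(candidates):
    """Return the caption with the highest mean similarity to all other candidates."""
    def score(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(jaccard(candidates[i], c) for c in others) / max(len(others), 1)

    best_index = max(range(len(candidates)), key=score)
    return candidates[best_index]


# Captions borrowed from (hypothetical) nearest-neighbor training images.
neighbor_captions = [
    "a dog runs on the grass",
    "a dog runs in the park",
    "a cat sleeps on a sofa",
]
chosen = consensus_caption(neighbor_captions)
```

Here the first caption is chosen because it shares the most vocabulary with the rest of the candidate set; an outlier caption from a poorly matched neighbor scores low and is not selected.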

preprint, 2015, arXiv

Exploring Person Context and Local Scene Context for Object Detection

In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object detection and uses the spatial relationships between objects and between objects and scenes. Our models are able to capture precise spatial relationships between the context and the object of interest, and make effective use of the appearance of the contextual region. On the newly released COCO dataset, our models provide relative improvements of up to 5% over CNN-based state-of-the-art detectors, with the gains concentrated on hard cases such as small objects (10% relative improvement).

preprint, 2015, arXiv

From Captions to Visual Concepts and Back

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.

preprint, 2015, arXiv

High Scale Mixing Unification for Dirac Neutrinos

Starting with the high scale mixing unification hypothesis, we investigate the renormalization group evolution of mixing parameters and masses for Dirac type neutrinos. Following this hypothesis, the PMNS mixing angles and phase are taken to be identical to the CKM ones at a unifying high scale. Then, they are evolved to a low scale using renormalization-group equations. The notable feature of this hypothesis is that renormalization group evolution with a quasi-degenerate mass pattern can explain the largeness of leptonic mixing angles even for Dirac neutrinos. The renormalization group evolution "naturally" results in a non-zero and small value of the leptonic mixing angle $θ_{13}$. One of the important predictions of this work is that the mixing angle $θ_{23}$ is non-maximal and lies only in the second octant. We also derive constraints on the allowed parameter range for the SUSY breaking and unification scales for which this hypothesis works. The results are novel and can be tested by present and future experiments.

preprint, 2015, arXiv

Inferring 3D Object Pose in RGB-D Images

The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library. We approach this problem by first detecting and segmenting object instances in the scene using the approach from Gupta et al. [13]. We use a convolutional neural network (CNN) to predict the pose of the object. This CNN is trained using pixel normals in images containing rendered synthetic objects. When tested on real data, it outperforms alternative algorithms trained on real data. We then use this coarse pose estimate along with the inferred pixel support to align a small number of prototypical models to the data, and place the model that fits the best into the scene. We observe a 48% relative improvement in performance at the task of 3D detection over the current state-of-the-art [33], while being an order of magnitude faster at the same time.

preprint, 2015, arXiv

Language Models for Image Captioning: The Quirks and What Works

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.

preprint, 2015, arXiv

Microsoft COCO Captions: Data Collection and Evaluation Server

In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.

preprint, 2015, arXiv

Nilpotent Symmetries in Jackiw-Pi Model: Augmented Superfield Approach

We derive the complete set of off-shell nilpotent ($s^2_{(a)b} = 0$) and absolutely anticommuting ($s_b s_{ab} + s_{ab} s_b = 0$) Becchi-Rouet-Stora-Tyutin (BRST) ($s_b$) as well as anti-BRST symmetry transformations ($s_{ab}$) corresponding to the combined Yang-Mills and non-Yang-Mills symmetries of the (2 + 1)-dimensional Jackiw-Pi model within the framework of augmented superfield formalism. The absolute anticommutativity of the (anti-)BRST symmetries is ensured by the existence of two sets of Curci-Ferrari (CF) type of conditions which emerge naturally in this formalism. The presence of CF conditions enables us to derive the coupled but equivalent Lagrangian densities. We also capture the (anti-)BRST invariance of the coupled Lagrangian densities in the superfield formalism. The derivation of the (anti-)BRST transformations of the auxiliary field $ρ$ is one of the key findings which can neither be generated by the nilpotent (anti-)BRST charges nor by the requirements of the nilpotency and/or absolute anticommutativity of the (anti-)BRST transformations. Finally, we provide a bird's-eye view on the role of the auxiliary field for various massive models and point out a few striking similarities and some glaring differences among them.

preprint2015arXiv

Visual Semantic Role Labeling

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction. Classical approaches to action recognition either study the task of action classification at the image or video clip level or at best produce a bounding box around the person doing the action. We believe such an output is inadequate and a complete understanding can only come when we are able to associate objects in the scene to the different semantic roles of the action. To enable progress towards this goal, we annotate a dataset of 16K people instances in 10K images with actions they are doing and associate objects in the scene with different semantic roles for each action. Finally, we provide a set of baseline algorithms for this task and analyze error modes providing directions for future work.

preprint2014arXiv

Flow-pattern switching in a Motored Spark Ignition Engine

Cycle-to-cycle variability, CCV, of intake-jet flow in an optical engine was measured using particle image velocimetry (PIV), revealing the possibility of two different flow patterns. A phase-dependent proper orthogonal decomposition (POD) analysis showed that one or the other flow pattern would appear in the average flow, sampled from test to test or sub-sampled within a single test; each data set contained individual cycles showing one flow pattern or the other. Three-dimensional velocity data from a large-eddy simulation (LES) of the engine showed that the PIV plane was cutting through a region of high shear between the intake jet and another large flow structure. Rotating the measurement plane 10° revealed one or the other flow structure observed in the PIV measurements. Thus, it was hypothesized that cycle-to-cycle variations in the swirl ratio result in the two different flow patterns in the PIV plane. Having an unambiguous metric to reveal large-scale flow CCV, causes for this variability were examined within the possible sources present in the available testing. In particular, variations in intake-port and cylinder pressure, lateral valve oscillations, and engine RPM were examined as potential causes for the cycle-to-cycle flow variations using the phase-dependent POD coefficients. No direct correlation was seen between the intake port pressure, or the pressure drop across the intake valve, and the in-cylinder flow pattern. A correlation was observed between dominant flow pattern and cycle-to-cycle variations in intake valve horizontal position. RPM values and in-cylinder flow patterns did not correlate directly. However, a shift in flow pattern was observed between early and late cycles in a 2900-cycle test after an approximately 5 rpm engine speed perturbation.

preprint2014arXiv

Jackiw-Pi Model: A Superfield Approach

We derive the off-shell nilpotent and absolutely anticommuting Becchi-Rouet-Stora-Tyutin (BRST) as well as anti-BRST transformations s_{(a)b} corresponding to the Yang-Mills gauge transformations of the 3D Jackiw-Pi model by exploiting the "augmented" superfield formalism. We also show that the Curci-Ferrari restriction, which is a hallmark of any non-Abelian 1-form gauge theory, emerges naturally within this formalism and plays an instrumental role in providing the proof of absolute anticommutativity of s_{(a)b}.

preprint2014arXiv

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.

preprint2014arXiv

Modeling of scalar dissipation rates in flamelet models for low temperature combustion engine simulations

The flamelet approach offers a viable framework for combustion modeling of homogeneous charge compression ignition (HCCI) engines under stratified mixture conditions. Scalar dissipation rate acts as a key parameter in flamelet-based combustion models which connects the physical mixing space to the reactive space. The aim of this paper is to gain fundamental insights into turbulent mixing in low temperature combustion (LTC) engines and investigate the modeling of scalar dissipation rate. Three direct numerical simulation (DNS) test cases of two-dimensional turbulent auto-ignition of a hydrogen-air mixture with different correlations of temperature and mixture fraction are considered, which are representative of different ignition regimes. The existing models of mean and conditional scalar dissipation rates, and probability density functions (PDFs) of mixture fraction and total enthalpy are a priori validated against the DNS data.

preprint2014arXiv

MUSIC: A Hybrid Computing Environment for Burrows-Wheeler Alignment for Massive Amount of Short Read Sequence Data

High-throughput DNA sequencers are becoming indispensable in our understanding of diseases at the molecular level, in marker-assisted selection in agriculture and in microbial genetics research. These sequencing instruments produce enormous amounts of data (often terabytes of raw data in a month) that require efficient analysis, management and interpretation. The commonly used sequencing instrument today produces billions of short reads (up to 150 bases) from each run. The first step in the data analysis is alignment of these short reads to the reference genome of choice. There are different open source algorithms available for sequence alignment to the reference genome. These tools normally have a high computational overhead, both in terms of number of processors and memory. Here, we propose a hybrid-computing environment called MUSIC (Mapping USIng hybrid Computing) for one of the most popular open source sequence alignment algorithms, BWA, using accelerators that show significant improvement in speed over the serial code.

preprint2014arXiv

Novel Symmetries in Vector Schwinger Model

We derive nilpotent and absolutely anticommuting (anti-)co-BRST symmetry transformations for the bosonized version of the (1+1)-dimensional (2D) vector Schwinger model. These symmetry transformations turn out to be the analogue of the co-exterior derivative of differential geometry, as the total gauge-fixing term remains invariant under them. The exterior derivative is realized in terms of the (anti-)BRST symmetry transformations of the theory whereas the bosonic symmetries find their analogue in the Laplacian operator. The algebra obeyed by these symmetry transformations turns out to be exactly the same as the algebra obeyed by the de Rham cohomological operators of differential geometry.

preprint2014arXiv

Predictions from High Scale Mixing Unification Hypothesis

We investigate the renormalization group evolution of masses and mixing angles of Majorana neutrinos under the 'High Scale Mixing Unification' hypothesis. Assuming the unification of quark-lepton mixing angles at a high scale, we show that all the experimentally observed neutrino oscillation parameters can be obtained, within 3-$σ$ range, through the running of corresponding renormalization group equations provided neutrinos have the same CP parity and are quasi-degenerate. One of the novel results of our analysis is that $θ_{23}$ turns out to be non-maximal and lies in the second octant. Furthermore, we derive new constraints on the allowed parameter space for the unification scale, SUSY breaking scale and $\tan β$, for which the 'High Scale Mixing Unification' hypothesis works.

preprint2013arXiv

Augmented Superfield Approach to Non-Yang-Mills Symmetries of Jackiw-Pi Model: Novel Observations

We derive the off-shell nilpotent and absolutely anticommuting Becchi-Rouet-Stora-Tyutin (BRST) as well as anti-BRST symmetry transformations corresponding to the non-Yang-Mills symmetry transformations of the (2 + 1)-dimensional Jackiw-Pi (JP) model within the framework of "augmented" superfield formalism. The Curci-Ferrari restriction, which is a hallmark of non-Abelian 1-form gauge theories, does not appear in this case. One of the novel features of our present investigation is the derivation of proper (anti-)BRST symmetry transformations corresponding to the auxiliary field ρ that cannot be derived by any conventional means.

preprint2013arXiv

SInC: An accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data

We report SInC (SNV, Indel and CNV) simulator and read generator, an open-source tool capable of simulating biological variants taking into account a platform-specific error model. SInC is capable of simulating and generating single- and paired-end reads with user-defined insert size with high efficiency compared to the other existing tools. SInC, due to its multi-threaded capability during read generation, has a low time footprint. SInC is currently optimised to work in a limited infrastructure setup and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes. SInC can be downloaded from https://sourceforge.net/projects/sincsimulator/.

preprint2011arXiv

Canonical brackets from continuous symmetries: Abelian 2-form gauge theory

We derive the canonical (anti-)commutation relations amongst the creation and annihilation operators of the various basic fields, present in the four (3 + 1)-dimensional (4D) free Abelian 2-form gauge theory, with the help of continuous symmetry transformations within the framework of Becchi-Rouet-Stora-Tyutin (BRST) formalism. We show that all the six continuous symmetries of the theory lead to exactly the same non-vanishing (anti-)commutator amongst the creation and annihilation operators of the normal mode expansion of the basic fields of the theory.

preprint2010arXiv

A note on the (anti-)BRST invariant Lagrangian densities for the free abelian 2-form gauge theory

We show that the previously known off-shell nilpotent (s_{(a)b}^2 = 0) and absolutely anticommuting (s_b s_{ab} + s_{ab} s_b = 0) Becchi-Rouet-Stora-Tyutin (BRST) transformations (s_b) and anti-BRST transformations (s_{ab}) are the symmetry transformations of the appropriate Lagrangian densities of a four (3 + 1)-dimensional (4D) free Abelian 2-form gauge theory which do not explicitly incorporate a very specific constrained field condition through a Lagrange multiplier 4D vector field. The above condition, which is the analogue of the Curci-Ferrari restriction of the non-Abelian 1-form gauge theory, emerges from the Euler-Lagrange equations of motion of our present theory and ensures the absolute anticommutativity of the transformations s_{(a)b}. Thus, the coupled Lagrangian densities, proposed in our present investigation, are aesthetically more appealing and more economical.

preprint2010arXiv

Pseudo-Hermitian Interactions in Dirac Theory: Examples

We consider a couple of examples to study the pseudo-Hermitian interaction in relativistic quantum mechanics. Rashba interaction, commonly used to study the spin Hall effect, is considered with imaginary coupling. The corresponding Dirac Hamiltonian is shown to be parity pseudo-Hermitian. In the other example we consider parity pseudo-Hermitian scalar interaction with arbitrary parameter in Dirac theory. In both cases we show that the energy spectrum is real and all the other features of non-relativistic pseudo-Hermitian formulation are present. Using the spectral method the positive definite metric operator ($η$) has been calculated explicitly for both the models to ensure positive definite norms for the state vectors.

preprint2010arXiv

Rigid Rotor as a Toy Model for Hodge Theory

We apply the superfield approach to the toy model of a rigid rotor and show the existence of the nilpotent and absolutely anticommuting Becchi-Rouet-Stora-Tyutin (BRST) and anti-BRST symmetry transformations, under which the kinetic term and action remain invariant. Furthermore, we also derive the off-shell nilpotent and absolutely anticommuting (anti-)co-BRST symmetry transformations, under which the gauge-fixing term and Lagrangian remain invariant. The anticommutator of the above nilpotent symmetry transformations leads to the derivation of a bosonic symmetry transformation, under which the ghost terms and action remain invariant. Together, the above transformations (and their corresponding generators) respect an algebra that turns out to be a physical realization of the algebra obeyed by the de Rham cohomological operators of differential geometry. Thus, our present model is a toy model for the Hodge theory.

43 published item(s)

Detection of Distracted Driver using Convolution Neural Network

Human Hands as Probes for Interactive Object Understanding

Learning Value Functions from Undirected State-only Experience

On-Device CPU Scheduling for Sense-React Systems

Particle on a torus knot: Symplectic analysis

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

Aligning Videos in Space and Time

Efficient Bimanual Manipulation Using Learned Task Schemas

Hashtags are (not) judgemental: The untold story of Lok Sabha elections 2019

Intrinsic Motivation for Encouraging Synergistic Behavior

Learning to Explore using Active Neural SLAM

Learning to Move with Affordance Maps

Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

Neural Topological SLAM for Visual Navigation

Semantic Curiosity for Active Visual Learning

Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

Faddeev-Jackiw Quantization of Christ-Lee Model

Anti Self-Dual Yang-Mills, Modified Faddeev-Jackiw Formalism and Hidden BRS Invariance

High scale mixing relations as a natural explanation for large neutrino mixing

On augmented superfield approach to vector Schwinger model

Cross Modal Distillation for Supervision Transfer

Exploring Nearest Neighbor Approaches for Image Captioning

Exploring Person Context and Local Scene Context for Object Detection

From Captions to Visual Concepts and Back

High Scale Mixing Unification for Dirac Neutrinos

Inferring 3D Object Pose in RGB-D Images

Language Models for Image Captioning: The Quirks and What Works

Microsoft COCO Captions: Data Collection and Evaluation Server

Nilpotent Symmetries in Jackiw-Pi Model: Augmented Superfield Approach

Visual Semantic Role Labeling

Flow-pattern switching in a Motored Spark Ignition Engine

Jackiw-Pi Model: A Superfield Approach

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

Modeling of scalar dissipation rates in flamelet models for low temperature combustion engine simulations

MUSIC: A Hybrid Computing Environment for Burrows-Wheeler Alignment for Massive Amount of Short Read Sequence Data

Novel Symmetries in Vector Schwinger Model

Predictions from High Scale Mixing Unification Hypothesis

Augmented Superfield Approach to Non-Yang-Mills Symmetries of Jackiw-Pi Model: Novel Observations

SInC: An accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data

Canonical brackets from continuous symmetries: Abelian 2-form gauge theory

A note on the (anti-)BRST invariant Lagrangian densities for the free abelian 2-form gauge theory

Pseudo-Hermitian Interactions in Dirac Theory: Examples

Rigid Rotor as a Toy Model for Hodge Theory