Source author record

Rohit Gupta

Rohit Gupta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision, math.OC, Computation and Language, Cryptography and Security, cs.CY, eess.IV, hep-ph, Machine Learning, math-ph, math.DG, math.MP, math.SG, physics.optics, Systems and Control

Catalog footprint

What is connected

10 works

14 topics

4 close collaborators


preprint, 2026, arXiv

BBQ-V: Benchmarking Visual Stereotype Bias in Large Multimodal Models

Stereotype biases in Large Multimodal Models (LMMs) perpetuate harmful societal prejudices, undermining the fairness and equity of AI applications. As LMMs grow increasingly influential, addressing and mitigating inherent biases related to stereotypes, harmful generations, and ambiguous assumptions in real-world scenarios has become essential. However, existing datasets for evaluating stereotype biases in LMMs often lack diversity, rely on synthetic images, and often contain only single-actor images, leaving a gap in bias evaluation for real-world visual contexts. To address this gap, we introduce BBQ-Vision (BBQ-V), the most comprehensive framework for assessing stereotype biases across nine diverse categories and 50 sub-categories with real, multi-actor images. The BBQ-V benchmark contains 14,144 image-question pairs and rigorously evaluates LMMs through carefully curated, visually grounded scenarios, challenging them to reason accurately about visual stereotypes. It offers a robust evaluation framework featuring real-world visual samples, image variations, and open-ended question formats. BBQ-V enables a precise and nuanced assessment of a model's reasoning capabilities across varying levels of difficulty. Through rigorous testing of 19 state-of-the-art open-source (general-purpose and reasoning) and closed-source LMMs, we show that these top-performing models are often biased on several social stereotypes, and that thinking models introduce more bias in their reasoning chains. This benchmark represents a significant step toward fostering fairness in AI systems and reducing harmful biases, laying the groundwork for more equitable and socially responsible LMMs. Our dataset and evaluation code are publicly available.

preprint, 2022, arXiv

Model comparison of the transverse momentum spectra of charged hadrons produced in $PbPb$ collision at $\sqrt{s_{NN}} = 5.02$ TeV

Transverse momentum ($p_T$) spectra are of prime importance for extracting crucial information about the evolution dynamics of the system of particles produced in collider experiments. In this work, the transverse momentum spectra of charged hadrons produced in $PbPb$ collisions at $5.02$ TeV have been analyzed using different distribution functions in order to gain insight into the information that can be extracted from the spectra. We also discuss the applicability of the Pearson distribution to the spectra of charged hadrons at $5.02$ TeV.

preprint, 2022, arXiv

TCLR: Temporal Contrastive Learning for Video Representation

Contrastive learning has nearly closed the gap between supervised and self-supervised learning of image representations, and has also been explored for videos. However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension. We develop a new temporal contrastive learning framework consisting of two novel losses to improve upon existing contrastive self-supervised video representation learning methods. The local-local temporal contrastive loss adds the task of discriminating between non-overlapping clips from the same video, whereas the global-local temporal contrastive loss aims to discriminate between timesteps of the feature map of an input clip in order to increase the temporal diversity of the learned features. Our proposed temporal contrastive learning framework achieves significant improvement over the state-of-the-art results in various downstream video understanding tasks such as action recognition, limited-label action classification, and nearest-neighbor video retrieval on multiple video datasets and backbones. We also demonstrate significant improvement in fine-grained action classification for visually similar classes. With the commonly used 3D ResNet-18 architecture with UCF101 pretraining, we achieve 82.4% (+5.1% increase over the previous best) top-1 accuracy on UCF101 and 52.9% (+5.4% increase) on HMDB51 action classification, and 56.2% (+11.7% increase) Top-1 Recall on UCF101 nearest-neighbor video retrieval. Code released at github.com/DAVEISHAN/TCLR.
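
As a rough illustration of the local-local idea described in this abstract, the sketch below computes an InfoNCE-style loss in which each clip must be told apart from the other non-overlapping clips of the same video. It is not the released TCLR code: the function name `local_local_loss`, the use of two augmented views per clip as the positive pair, the cosine similarity, and the `temperature` value are all assumptions made for the example.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def local_local_loss(views_a, views_b, temperature=0.1):
    """InfoNCE-style loss over clips of ONE video (illustrative sketch).

    views_a[i] and views_b[i] are embeddings of two augmented views of
    clip i (assumed positive pair); clips i != j are non-overlapping
    clips of the same video and act as negatives.
    """
    a = [_normalize(v) for v in views_a]
    b = [_normalize(v) for v in views_b]
    total = 0.0
    for i in range(len(a)):
        # Cosine similarity of anchor clip i to every clip in the other view.
        logits = [_dot(a[i], b[j]) / temperature for j in range(len(b))]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching clip.
        total += log_denom - logits[i]
    return total / len(a)


# Temporally distinct features give a low loss; collapsed features do not.
distinct = local_local_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
collapsed = local_local_loss([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
print(distinct < collapsed)
```

When every clip of a video maps to the same embedding, the loss equals the logarithm of the number of clips, so minimizing it pushes the features of different timesteps apart.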

preprint, 2020, arXiv

Block the blocker: Studying the effects of Anti Ad-blocking

Advertisements generate a large share of revenue for websites and online businesses. Ad-blocking and tracker-blocking programs have gained momentum in the last few years, with massive debates raging over privacy concerns and improving the online user experience. The Acceptable Ads programme and Anti Ad-blockers are the primary elements that have emerged in recent years to combat ad-blockers. In this paper, we discuss at length our data collection from top websites worldwide, in Germany, in the DACH region, and in the news category. We generate feature-based A/B testing metrics, evaluate classifiers on them, and analyse the results. Our paper also discusses the economic, legal and ethical impact of Anti Ad-blockers in Germany, along with the recent changes in GDPR, while taking a look at the Acceptable Ads programme and whitelisting.

preprint, 2020, arXiv

Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

Deep neural networks are being widely deployed for many critical tasks due to their high classification accuracy. In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models. These malicious behaviors can be triggered at the adversary's will and hence pose a serious threat to the widespread deployment of deep models. We propose a method to verify whether a pre-trained model is Trojaned or benign. Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients. Inserting backdoors into a network alters its decision boundaries, which are effectively encoded in its adversarial perturbations. We train a two-stream network for Trojan detection from a network's global ($L_\infty$ and $L_2$ bounded) perturbations and the localized region of high energy within each perturbation. The former encodes the decision boundaries of the network and the latter encodes the unknown trigger shape. We also propose an anomaly detection method to identify the target class in a Trojaned network. Our methods are invariant to the trigger type, trigger size, training data and network architecture. We evaluate our methods on MNIST, NIST-Round0 and NIST-Round1 datasets, with up to 1,000 pre-trained models, making this the largest study to date on Trojaned network detection, and achieve over 92% detection accuracy to set the new state-of-the-art.
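
The abstract does not say how the $L_\infty$ and $L_2$ bounds on the perturbations are enforced. One common way to keep a gradient-learned perturbation within such a budget is to project it back onto the norm ball after each update, which the sketch below shows. The function names and the budget `eps` are hypothetical and are not taken from the paper.

```python
import math


def project_linf(delta, eps):
    """Clip every coordinate to [-eps, eps] (projection onto the L-inf ball)."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Rescale delta so that its Euclidean norm is at most eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    return [d * eps / norm for d in delta]


# Hypothetical perturbations and budgets, for illustration only.
linf_result = project_linf([0.5, -0.2, 0.05], eps=0.1)
l2_result = project_l2([3.0, 4.0], eps=1.0)
print(linf_result, l2_result)
```

The $L_\infty$ projection limits each coordinate independently, while the $L_2$ projection preserves the direction of the perturbation and shrinks only its length.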

preprint, 2020, arXiv

Observation of geometric phase for unpolarized and partially polarized light fields

Geometric phase, owing to its topological nature and fault-tolerance properties, plays an important role in devising real-world applications in both the classical and quantum domains. For classical systems, geometric phase has so far been observed and studied for fully polarized light only. Using an interferometric experiment, we demonstrate, for the first time, the existence of the Pancharatnam-Berry phase for states covering the entire interior of the Poincaré sphere, namely the unpolarized and partially polarized light fields. The observed geometric phase is found to be identical to its fully polarized counterpart, in excellent agreement with the theoretical predictions.

preprint, 2020, arXiv

RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery

Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity. In recent years, satellite and UAV (drone) imagery has been used for this purpose, sometimes aided by computer vision algorithms. Existing computer vision approaches for building damage assessment typically rely on a two-stage approach, consisting of building detection using an object detection model, followed by damage assessment through classification of the detected building tiles. These multi-stage methods are not end-to-end trainable, and suffer from poor overall results. We propose RescueNet, a unified model that can simultaneously segment buildings and assess the damage levels of individual buildings, and that can be trained end-to-end. To model the composite nature of this problem, we propose a novel localization-aware loss function, which consists of a Binary Cross-Entropy loss for building segmentation and a foreground-only selective Categorical Cross-Entropy loss for damage classification, and show significant improvement over the widely used Cross-Entropy loss. RescueNet is tested on the large-scale and diverse xBD dataset, achieves significantly better building segmentation and damage classification performance than previous methods, and generalizes across varied geographical regions and disaster types.
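
The two-part loss described in this abstract can be sketched as follows, under stated assumptions. The abstract gives no formula, so the equal weighting of the two terms, the per-pixel averaging, and every name in the code (`localization_aware_loss`, `seg_prob`, `dmg_prob`, and so on) are illustrative choices rather than the authors' implementation.

```python
import math


def localization_aware_loss(seg_prob, dmg_prob, building_mask, dmg_label, eps=1e-7):
    """Binary CE on all pixels plus categorical CE on building pixels only.

    seg_prob[i]      predicted probability that pixel i is a building
    dmg_prob[i]      predicted per-class damage probabilities for pixel i
    building_mask[i] 1 if pixel i is a building in the ground truth, else 0
    dmg_label[i]     ground-truth damage class index (unused on background)
    """
    n = len(seg_prob)

    # Segmentation term: binary cross-entropy averaged over every pixel.
    bce = 0.0
    for p, y in zip(seg_prob, building_mask):
        p = min(max(p, eps), 1.0 - eps)
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    bce /= n

    # Damage term: categorical cross-entropy over foreground pixels only,
    # so background pixels contribute nothing to the damage classifier.
    foreground = [i for i in range(n) if building_mask[i] == 1]
    cce = 0.0
    for i in foreground:
        cce -= math.log(max(dmg_prob[i][dmg_label[i]], eps))
    if foreground:
        cce /= len(foreground)

    # Assumed equal weighting of the two terms.
    return bce + cce


# Two pixels: the first is a building with damage class 0, the second is background.
loss = localization_aware_loss(
    seg_prob=[0.9, 0.1],
    dmg_prob=[[0.5, 0.5], [0.25, 0.75]],
    building_mask=[1, 0],
    dmg_label=[0, 1],
)
print(loss)
```

Because the damage term is masked by the ground-truth building pixels, changing the damage prediction on a background pixel leaves the loss unchanged.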

preprint, 2016, arXiv

MPC on manifolds with an application to the control of spacecraft attitude on SO(3)

We develop a model predictive control (MPC) design for systems with discrete-time dynamics evolving on smooth manifolds. We show that the properties of conventional MPC for dynamics evolving on $\mathbb R^n$ are preserved and we develop a design procedure for achieving similar properties. We also demonstrate that for discrete-time dynamics on manifolds with Euler characteristic not equal to 1, there do not exist globally stabilizing, continuous control laws. The MPC law is able to achieve global asymptotic stability on these manifolds, because the MPC law may be discontinuous. We apply the method to spacecraft attitude control, where the spacecraft attitude evolves on the Lie group SO(3) and for which a continuous globally stabilizing control law does not exist. In this case, the MPC law is discontinuous and achieves global stability.

preprint, 2016, arXiv

Reordering rules for English-Hindi SMT

Reordering is a preprocessing stage for a Statistical Machine Translation (SMT) system in which the words of the source sentence are reordered as per the syntax of the target language. We propose a rich set of rules for better reordering. The idea is to facilitate the training process through better alignments and parallel phrase extraction for a phrase-based SMT system. Reordering also helps the decoding process and hence improves the machine translation quality. We have observed significant improvements in translation quality using our approach over the baseline SMT. We used BLEU, NIST, multi-reference word error rate, and multi-reference position-independent error rate to judge the improvements. We used the open-source SMT toolkit MOSES to develop the system.

preprint, 2014, arXiv

A geometric approach to the optimal control of nonholonomic mechanical systems

In this paper, we describe a constrained Lagrangian and Hamiltonian formalism for the optimal control of nonholonomic mechanical systems. In particular, we aim to minimize a cost functional, given initial and final conditions, where the controlled dynamics are given by a nonholonomic mechanical system. The controlled equations are derived using a basis of vector fields adapted to the nonholonomic distribution and the Riemannian metric determined by the kinetic energy. Given a cost function, the optimal control problem is understood as a constrained problem or equivalently, under some mild regularity conditions, as a Hamiltonian problem on the cotangent bundle of the nonholonomic distribution. A suitable Lagrangian submanifold is also shown to lead to the correct dynamics. We demonstrate our techniques in several examples, including a continuously variable transmission problem and motion planning for obstacle avoidance problems.
