Source author record

Gaurav Sharma

Gaurav Sharma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision; Computation and Language; Machine Learning; physics.atom-ph; cond-mat.mes-hall; cond-mat.mtrl-sci; cond-mat.str-el; Distributed, Parallel, and Cluster Computing; Graphics; physics.ins-det

Catalog footprint

What is connected

19 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification

We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a single coherent framework. To tackle UMS-ReID, we introduce image-text knowledge modeling (ITKM) -- a three-stage framework that effectively exploits the representational power of vision-language models. We start with a pre-trained CLIP model with an image encoder and a text encoder. In Stage I, we introduce a scenario embedding in the image encoder and fine-tune the encoder to adaptively leverage knowledge from multiple scenarios. In Stage II, we optimize a set of learned text embeddings to associate with pseudo-labels from Stage I and introduce a multi-scenario separation loss to increase the divergence between inter-scenario text representations. In Stage III, we first introduce cluster-level and instance-level heterogeneous matching modules to obtain reliable heterogeneous positive pairs (e.g., a visible image and an infrared image of the same person) within each scenario. Next, we propose a dynamic text representation update strategy to maintain consistency between text and image supervision signals. Experimental results across multiple scenarios demonstrate the superiority and generalizability of ITKM; it not only outperforms existing scenario-specific methods but also enhances overall performance by integrating knowledge from multiple scenarios.

preprint · 2022 · arXiv

Discriminative Semantic Transitive Consistency for Cross-Modal Learning

Cross-modal retrieval is generally performed by projecting and aligning the data from two different modalities onto a shared representation space. This shared space often also acts as a bridge for translating the modalities. We address the problem of learning such a representation space by proposing and exploiting the property of Discriminative Semantic Transitive Consistency -- ensuring that the data points are correctly classified even after being transferred to the other modality. Along with semantic transitive consistency, we also enforce the traditional distance-minimizing constraint, which brings the projections of corresponding data points from both modalities closer in the representation space. We analyze and compare the contribution of both loss terms, and their interaction, for the task. In addition, we incorporate semantic cycle-consistency for each modality. We empirically demonstrate better performance owing to the different components with clear ablation studies. We also provide qualitative results to support the proposals.
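The two loss terms named in this abstract can be pictured with a minimal sketch. It assumes a classification term (cross-entropy on the logits of a sample after it has been translated to the other modality) added to a weighted squared distance between the paired embeddings. The function names, the additive combination, and the `weight` parameter are illustrative assumptions, not the authors' implementation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def transitive_consistency_loss(translated_logits, label, emb_a, emb_b, weight=1.0):
    """Hypothetical combined loss: the translated sample should still be
    classified correctly, and paired embeddings should lie close together."""
    return cross_entropy(translated_logits, label) + weight * sq_dist(emb_a, emb_b)

# Toy usage: uninformative logits over two classes, embeddings one unit apart.
loss = transitive_consistency_loss([0.0, 0.0], 0, [1.0, 0.0], [0.0, 0.0], weight=0.5)
```

Setting `weight` to zero isolates the classification term, which is one simple way to study each term's contribution separately.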

preprint · 2022 · arXiv

Distantly-Supervised Long-Tailed Relation Extraction Using Constraint Graphs

Label noise and long-tailed distributions are two major challenges in distantly supervised relation extraction. Recent studies have shown great progress on denoising, but paid little attention to the problem of long-tailed relations. In this paper, we introduce a constraint graph to model the dependencies between relation labels. On top of that, we further propose a novel constraint graph-based relation extraction framework (CGRE) to handle the two challenges simultaneously. CGRE employs graph convolution networks to propagate information from data-rich relation nodes to data-poor relation nodes, and thus boosts the representation learning of long-tailed relations. To further improve the noise immunity, a constraint-aware attention module is designed in CGRE to integrate the constraint information. Extensive experimental results indicate that CGRE achieves significant improvements over the previous methods for both denoising and long-tailed relation extraction. The pre-processed datasets and source code are publicly available at https://github.com/tmliang/CGRE.
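To illustrate how a graph convolution can pass information from a data-rich node to a data-poor neighbour, here is a minimal sketch of one generic propagation step. It assumes row-normalised adjacency with self-loops and a ReLU activation; the toy graph, the normalisation choice, and all names are assumptions for illustration and are not taken from the CGRE code.

```python
def normalize_with_self_loops(adj):
    """Add self-loops, then divide each row by its sum (mean aggregation)."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in a]

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(A_hat @ features @ weight)."""
    a_hat = normalize_with_self_loops(adj)
    h = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in h]

# Toy graph: node 0 has an informative feature, node 1 has none.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[2.0], [0.0]]
weight = [[1.0]]
out = gcn_layer(adj, features, weight)
```

After one step both nodes hold the averaged value, so the node that started with an empty feature now carries part of its neighbour's signal.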

preprint · 2022 · arXiv

K-12BERT: BERT for K-12 education

Online education platforms are powered by various NLP pipelines, which utilize models like BERT to aid in content curation. Since the inception of the pre-trained language models like BERT, there have also been many efforts toward adapting these pre-trained models to specific domains. However, there has not been a model specifically adapted for the education domain (particularly K-12) across subjects to the best of our knowledge. In this work, we propose to train a language model on a corpus of data curated by us across multiple subjects from various sources for K-12 education. We also evaluate our model, K12-BERT, on downstream tasks like hierarchical taxonomy tagging.

preprint · 2020 · arXiv

A Novel Deep Learning Pipeline for Retinal Vessel Detection in Fluorescein Angiography

While recent advances in deep learning have significantly advanced the state of the art for vessel detection in color fundus (CF) images, the success for detecting vessels in fluorescein angiography (FA) has been stymied due to the lack of labeled ground truth datasets. We propose a novel pipeline to detect retinal vessels in FA images using deep neural networks that reduces the effort required for generating labeled ground truth data by combining two key components: cross-modality transfer and human-in-the-loop learning. The cross-modality transfer exploits concurrently captured CF and fundus FA images. Binary vessel maps are first detected from CF images with a pre-trained neural network and then are geometrically registered with and transferred to FA images via robust parametric chamfer alignment to a preliminary FA vessel detection obtained with an unsupervised technique. Using the transferred vessels as initial ground truth labels for deep learning, the human-in-the-loop approach progressively improves the quality of the ground truth labeling by iterating between deep learning and labeling. The approach significantly reduces manual labeling effort while increasing engagement. We highlight several important considerations for the proposed methodology and validate the performance on three datasets. Experimental results demonstrate that the proposed pipeline significantly reduces the annotation effort and the resulting deep learning methods outperform existing FA vessel detection methods by a significant margin. A new public dataset, RECOVERY-FA19, is introduced that includes high-resolution ultra-widefield images and accurately labeled ground truth binary vessel maps.

preprint · 2020 · arXiv

Imaging of Strain Driven Magnetic Domains and Strong Spin-Phonon Coupling in Epitaxial Thin Films of SrRuO3

Epitaxial thin films of SrRuO3 with large strain disorder were grown by pulsed laser deposition and showed two distinct transition temperatures in magnetic measurements. For the first time, we present the visual evolution of magnetic domains across the two transitions using magnetic force microscopy on these films. The study clearly showed that the magnetic anisotropy corresponding to the two transitions is different. Perpendicular magnetic anisotropy is observed to dominate in the films, which results in domain spins oriented preferably in the out-of-plane direction. The Raman studies showed that the lattice is highly influenced by the magnetic order. The analysis of the phonon spectra around the magnetic transition reveals the existence of strong spin-phonon coupling, and the calculations yielded spin-phonon coupling strength (λ) values of λ ~ 5 cm⁻¹ and λ ~ 8.5 cm⁻¹ for SrRuO3 films grown on LSAT and SrTiO3 single-crystal substrates, respectively.

preprint · 2020 · arXiv

Object Detection with a Unified Label Space from Multiple Datasets

Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object detector are obvious and significant: application-relevant categories can be picked and merged from arbitrary existing datasets. However, naive merging of datasets is not possible in this case, due to inconsistent object annotations. Consider an object category like faces that is annotated in one dataset, but is not annotated in another dataset, although the object itself appears in the latter images. Some categories, like face here, would thus be considered foreground in one dataset, but background in another. To address this challenge, we design a framework which works with such partial annotations, and we exploit a pseudo labeling approach that we adapt for our specific case. We propose loss functions that carefully integrate partial but correct annotations with complementary but noisy pseudo labels. Evaluation in the proposed novel setting requires full annotation on the test set. We collect the required annotations and define a new challenging experimental setup for this task based on existing public datasets. We show improved performance compared to competitive baselines and appropriate adaptations of existing work.

preprint · 2019 · arXiv

Strain Healing of Spin-Orbit Coupling: A Cause for Enhanced Magnetic Moment in Epitaxial SrRuO3 Thin Films

Enhanced magnetic moment and coercivity in SrRuO3 thin films are significant for advanced technological applications and have therefore been researched extensively in recent times. Most previous reports on thin films with enhanced magnetic moment attributed the enhancement to a high spin state. Our magnetization results show a high magnetic moment of 3.3 Bohr magnetons per Ru ion in epitaxial thin films grown on an LSAT substrate, against 1.2 Bohr magnetons per Ru ion observed in the bulk compound. Contrary to expectation, the Ru ions are found to be in a low spin state, and the orbital moment is shown to contribute significantly to the enhancement of the magnetic moment. We employed x-ray absorption spectroscopy and resonant valence band spectroscopy to probe the spin state and orbital contributions in these films. The existence of strong spin-orbit coupling responsible for the de-quenching of the 4d orbitals is confirmed by the observation of a large, non-statistical branching ratio at the Ru M2,3 absorption edges. The relaxation of orbital quenching by strain engineering provides a new tool for enhancing the magnetic moment. Strain disorder is shown to be an efficient means of controlling the spin-orbit coupling.

preprint · 2016 · arXiv

CP-mtML: Coupled Projection multi-task Metric Learning for Large Scale Face Retrieval

We propose a novel Coupled Projection multi-task Metric Learning (CP-mtML) method for large scale face retrieval. In contrast to previous works which were limited to low dimensional features and small datasets, the proposed method scales to large datasets with high dimensional face descriptors. It utilises pairwise (dis-)similarity constraints as supervision and hence does not require exhaustive class annotation for every training image. While, traditionally, multi-task learning methods have been validated on the same dataset but different tasks, we work on the more challenging setting with heterogeneous datasets and different tasks. We show empirical validation on multiple face image datasets of different facial traits, e.g. identity, age and expression. We use classic Local Binary Pattern (LBP) descriptors along with the recent Deep Convolutional Neural Network (CNN) features. The experiments clearly demonstrate the scalability and improved performance of the proposed method on the tasks of identity and age based face image retrieval compared to competitive existing methods, on the standard datasets and with the presence of a million distractor face images.
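One way to read "coupled projection" is that each task's distance combines a projection shared by all tasks with a projection specific to that task. The sketch below assumes the two squared projected distances are simply added; that combination, the toy matrices, and the function names are assumptions made for illustration rather than details stated in the abstract.

```python
def project(L, v):
    """Apply a projection matrix L (list of rows) to vector v."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in L]

def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(a * a for a in v)

def coupled_distance(x, y, L_common, L_task):
    """Hypothetical task distance: shared part plus task-specific part."""
    diff = [a - b for a, b in zip(x, y)]
    return sq_norm(project(L_common, diff)) + sq_norm(project(L_task, diff))

# Toy usage: the shared projection keeps dimension 0, the task one keeps dimension 1.
d = coupled_distance([1.0, 2.0], [0.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
```

With pairwise supervision, such a distance would be pushed below a threshold for similar pairs and above it for dissimilar pairs, so no per-image class label is needed.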

preprint · 2016 · arXiv

Deep fusion of visual signatures for client-server facial analysis

Facial analysis is a key technology for enabling human-machine interaction. In this context, we present a client-server framework, where a client transmits the signature of a face to be analyzed to the server, and, in return, the server sends back various information describing the face, e.g. is the person male or female, is she/he bald, does he have a mustache, etc. We assume that a client can compute one (or a combination) of visual features, from very simple and efficient features, like Local Binary Patterns, to more complex and computationally heavy ones, like Fisher Vectors and CNN-based features, depending on the computing resources available. The challenge addressed in this paper is to design a common universal representation such that a single merged signature is transmitted to the server, whatever the type and number of features computed by the client, while nonetheless ensuring optimal performance. Our solution is based on learning a common optimal subspace for aligning the different face features and merging them into a universal signature. We have validated the proposed method on the challenging CelebA dataset, on which our method outperforms existing state-of-the-art methods when a rich representation is available at test time, while giving competitive performance when only simple signatures (like LBP) are available at test time due to resource constraints on the client.

preprint · 2016 · arXiv

Expanded Parts Model for Semantic Description of Humans in Still Images

We introduce an Expanded Parts Model (EPM) for recognizing human attributes (e.g. young, short hair, wearing suit) and actions (e.g. running, jumping) in still images. An EPM is a collection of part templates which are learnt discriminatively to explain specific scale-space regions in the images (in human centric coordinates). This is in contrast to current models which consist of relatively few (i.e. a mixture of) 'average' templates. EPM uses only a subset of the parts to score an image and scores the image sparsely in space, i.e. it ignores redundant and random background in an image. To learn our model, we propose an algorithm which automatically mines parts and learns corresponding discriminative templates together with their respective locations from a large number of candidate parts. We validate our method on three recent challenging datasets of human attributes and actions. We obtain convincing qualitative and state-of-the-art quantitative results on the three datasets.

preprint · 2016 · arXiv

Latent Embeddings for Zero-shot Classification

We present a novel latent embedding model for learning a compatibility function between image and class embeddings, in the context of zero-shot classification. The proposed method augments the state-of-the-art bilinear compatibility model by incorporating latent variables. Instead of learning a single bilinear map, it learns a collection of maps, with the selection of which map to use being a latent variable for the current image-class pair. We train the model with a ranking based objective function which penalizes incorrect rankings of the true class for a given image. We empirically demonstrate that our model improves the state-of-the-art for various class embeddings consistently on three challenging publicly available datasets for the zero-shot setting. Moreover, our method leads to visually highly interpretable results with clear clusters of different fine-grained object properties that correspond to different latent variable maps.
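The scoring idea in this abstract can be sketched as follows: each image-class pair is scored by several bilinear maps, and the map that gives the highest score is selected, so the choice of map acts as the latent variable. Taking the maximum over maps, the toy matrices, and the function names are assumptions for illustration, not the authors' code.

```python
def bilinear(x, W, y):
    """Compute x^T W y for an image embedding x and a class embedding y."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def latent_score(x, maps, y):
    """Return (best score, index of the selected map) over all bilinear maps."""
    scores = [bilinear(x, W, y) for W in maps]
    best = max(range(len(maps)), key=lambda k: scores[k])
    return scores[best], best

def predict(x, maps, class_embeddings):
    """Pick the class whose embedding is most compatible with the image."""
    return max(class_embeddings,
               key=lambda c: latent_score(x, maps, class_embeddings[c])[0])

# Toy usage: two maps, two classes described by 2-d class embeddings.
maps = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
classes = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
score, chosen = latent_score([1.0, 0.0], maps, classes["b"])
label = predict([1.0, 0.0], maps, classes)
```

Because unseen classes only need a class embedding, the same scoring function applies unchanged at test time in the zero-shot setting.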

preprint · 2016 · arXiv

LOMo: Latent Ordinal Model for Facial Analysis in Videos

We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain etc.) as a sequence of automatically mined, discriminative sub-events (e.g. onset and offset phase for smile, brow lower and cheek raise for pain). The proposed model is inspired by recent works on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal or temporal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets.

preprint · 2016 · arXiv

X-ray spectroscopy technique for the pile-up region

We report a pile-up rejection technique, based on the X-ray absorption concept of the Beer-Lambert law, for measuring true events in the pile-up region. We have detected a 10^4 times weaker peak in the pile-up region. This technique also enables one to resolve weak peaks adjacent to an intense peak, provided the latter lies on the lower-energy side and the peaks are at least theoretically resolvable by the detector used. We have resolved such peaks by reducing the intensity ratios in our experiment. The technique allows us to obtain the actual intensities that the observed peaks would have had if measured without any attenuator. Possible applications of this technique include studying the physics of two-electron one-photon transitions as well as the properties of projectile-like or target-like ions.
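The Beer-Lambert relation behind this technique is I = I0 · exp(−μ t), where μ is the attenuation coefficient of the absorber at the photon energy and t is its thickness. The sketch below shows how an absorber that attenuates low-energy photons more strongly shrinks the ratio between an intense low-energy peak and a weak higher-energy one, and how the unattenuated intensity is recovered afterwards. The coefficient values and thickness are made-up illustrative numbers, not data from the paper.

```python
import math

def transmitted(i0, mu, thickness):
    """Beer-Lambert law: intensity after passing through the absorber."""
    return i0 * math.exp(-mu * thickness)

def unattenuated(i_measured, mu, thickness):
    """Invert the law to recover the intensity without the absorber."""
    return i_measured * math.exp(mu * thickness)

# Hypothetical numbers: strong low-energy peak, weak higher-energy peak.
strong, weak = 1.0e4, 1.0
mu_low, mu_high = 5.0, 1.0   # assumed: absorption is stronger at lower energy
t = 2.0                      # assumed absorber thickness (same units as 1/mu)

ratio_before = strong / weak
ratio_after = transmitted(strong, mu_low, t) / transmitted(weak, mu_high, t)
recovered = unattenuated(transmitted(weak, mu_high, t), mu_high, t)
```

In this toy case the ratio falls from 10^4 to 10^4 · exp(−8), a little over 3, which is the kind of reduction that keeps the weak peak from being swamped.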

preprint · 2015 · arXiv

Local Higher-Order Statistics (LHS) describing images with statistics of local non-binarized pixel patterns

We propose a new image representation for texture categorization and facial analysis, relying on the use of higher-order local differential statistics as features. It has been recently shown that small local pixel pattern distributions can be highly discriminative while being extremely efficient to compute, which is in contrast to the models based on the global structure of images. Motivated by such works, we propose to use higher-order statistics of local non-binarized pixel patterns for the image description. The proposed model does not require either (i) user specified quantization of the space (of pixel patterns) or (ii) any heuristics for discarding low occupancy volumes of the space. We propose to use a data driven soft quantization of the space, with parametric mixture models, combined with higher-order statistics, based on Fisher scores. We demonstrate that this leads to a more expressive representation which, when combined with discriminatively learned classifiers and metrics, achieves state-of-the-art performance on challenging texture and facial analysis datasets, in a low-complexity setup. Further, it is complementary to higher complexity features and when combined with them improves performance.

preprint · 2015 · arXiv

Scalable Nonlinear Embeddings for Semantic Category-based Image Retrieval

We propose a novel algorithm for the task of supervised discriminative distance learning by nonlinearly embedding vectors into a low dimensional Euclidean space. We work in the challenging setting where supervision is with constraints on similar and dissimilar pairs while training. The proposed method is derived by an approximate kernelization of a linear Mahalanobis-like distance metric learning algorithm and can also be seen as a kernel neural network. The number of model parameters and test time evaluation complexity of the proposed method are O(dD) where D is the dimensionality of the input features and d is the dimension of the projection space - this is in contrast to the usual kernelization methods as, unlike them, the complexity does not scale linearly with the number of training examples. We propose a stochastic gradient based learning algorithm which makes the method scalable (w.r.t. the number of training examples), while being nonlinear. We train the method with up to half a million training pairs of 4096 dimensional CNN features. We give empirical comparisons with relevant baselines on seven challenging datasets for the task of low dimensional semantic category based image retrieval.

preprint · 2015 · arXiv

Surface wake field model of beam-foil circular Rydberg states

Production of projectile Rydberg states in fast ion-solid collisions in H-like ions exhibits a pronounced target thickness dependence in spite of these states forming at the last layers. This occurs due to the important role of the surface wake field, which varies with the target foil thickness. Further, according to the proposed model, Rydberg states with low angular momentum are transformed into circular Rydberg states while passing through the field. The transfer occurs by a single multiphoton process with high probability, depending upon the projectile ion velocity with respect to the Fermi velocity of the target electrons.

preprint · 2012 · arXiv

Reliable Resource Selection in Grid Environment

The primary concerns in the area of computational grids are security and resources. Most existing grids address this problem by authenticating the users, hosts and their interactions in an appropriate manner. A secured system is compulsory for the efficient utilization of grid services. A high degree of strangeness has been identified as a problem factor in the secured selection of grid resources. Without the assurance of a higher degree of trust relationship, competent resource selection and utilization cannot be achieved. In this paper we propose an approach that provides reliability- and reputation-aware security for resource selection in a grid environment. In this approach, the self-protection capability and reputation weightage are used to obtain the Reliability Factor (RF) value. Jobs are therefore allocated to the resources that possess higher RF values. Extensive experimental evaluation shows that as more trusted and reliable nodes are selected, the chances of failure decrease drastically.
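The abstract does not give the formula for the Reliability Factor, so the following is only a hypothetical sketch of the selection step. It assumes RF is a weighted average of a self-protection score and a reputation score, both in the range 0 to 1; the weighting parameter `alpha`, the node names, and the scores are invented for illustration.

```python
def reliability_factor(self_protection, reputation, alpha=0.5):
    """Hypothetical RF: weighted average of the two scores (not the paper's formula)."""
    return alpha * self_protection + (1 - alpha) * reputation

def select_resource(resources, alpha=0.5):
    """Return the name of the resource with the highest RF value."""
    return max(resources,
               key=lambda name: reliability_factor(*resources[name], alpha=alpha))

# Toy usage: each entry maps a node name to (self-protection, reputation).
nodes = {"n1": (0.9, 0.5), "n2": (0.6, 0.6)}
chosen = select_resource(nodes)
```

A job scheduler built on this idea would call `select_resource` before each allocation, so nodes with low trust scores receive work only when nothing better is available.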

preprint · 2001 · arXiv

Digital Color Imaging

This paper surveys current technology and research in the area of digital color imaging. In order to establish the background and lay down terminology, fundamental concepts of color perception and measurement are first presented using vector-space notation and terminology. Present-day color recording and reproduction systems are reviewed along with the common mathematical models used for representing these devices. Algorithms for processing color images for display and communication are surveyed, and a forecast of research trends is attempted. An extensive bibliography is provided.
