Source author record

Salman Khan

Salman Khan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, quant-ph, Machine Learning, Artificial Intelligence, eess.IV, Computation and Language, Cryptography and Security, Networking and Internet Architecture, physics.gen-ph, q-fin.GN, Robotics

Catalog footprint

What is connected

59 works

11 topics

4 close collaborators


preprint · 2026 · arXiv

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

When a model produces a correct solution under reinforcement learning with verifiable rewards (RLVR), every token receives the same reward signal regardless of whether it was a decisive reasoning step or a grammatical filler. A natural fix is to condition the model on the correct answer as a teacher, identifying tokens it would have generated differently had it known the answer. Prior work shows this either corrupts training by leaking the answer into the gradient, or produces a weak signal that cannot distinguish decisive steps from filler, since both look equally surprising relative to the model's baseline. We propose Contrastive Evidence Policy Optimization (CEPO), which asks a sharper question at every token: not just "does the correct answer favor this token?" but "does the correct answer favor it while the wrong answer disfavors it?" A token satisfying both is a genuine reasoning step; one satisfying neither is filler. The wrong-answer teacher is constructed from rejected rollouts already in the training batch, incurring no additional sampling cost. We prove CEPO inherits all structural safety guarantees of the prior state of the art while strictly sharpening credit at decisive tokens, with the improvement vanishing exactly at filler positions. Empirically, CEPO achieves 43.43% and 60.56% average accuracy across five multimodal mathematical reasoning benchmarks at 2B and 4B scale, respectively, versus 41.17% and 57.43% for GRPO under identical training budgets. Distribution-matching self-distillation methods (OPSD, SDPO) fall below the untrained baseline, empirically confirming the information leakage our theory predicts. Our code is available at https://github.com/ahmedheakl/CEPO.
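The two-sided question in the abstract can be illustrated with a small sketch. It is not the CEPO objective, which the abstract does not spell out. It assumes that "favor" means a token's log-probability under a teacher is above the model's own baseline, and that "disfavor" means it is below. The function name, the inputs and the "mixed" label for tokens that satisfy only one condition are all hypothetical.

```python
def classify_tokens(base_logp, correct_logp, wrong_logp):
    """Label each token by the two-sided test described in the abstract.

    base_logp:    per-token log-probs under the unconditioned model
    correct_logp: per-token log-probs when conditioned on the correct answer
    wrong_logp:   per-token log-probs when conditioned on a wrong answer
    All three inputs and the comparison rule are illustrative assumptions.
    """
    labels = []
    for b, c, w in zip(base_logp, correct_logp, wrong_logp):
        favored_by_correct = c > b      # correct-answer teacher raises the token
        disfavored_by_wrong = w < b     # wrong-answer teacher lowers the token
        if favored_by_correct and disfavored_by_wrong:
            labels.append("decisive")   # satisfies both conditions
        elif not favored_by_correct and not disfavored_by_wrong:
            labels.append("filler")     # satisfies neither condition
        else:
            labels.append("mixed")      # hypothetical label for the remaining cases
    return labels


example = classify_tokens(
    base_logp=[-2.0, -1.0, -3.0],
    correct_logp=[-0.5, -1.0, -2.0],
    wrong_logp=[-4.0, -1.0, -2.5],
)
```

In this toy input the first token is raised by the correct-answer teacher and lowered by the wrong-answer teacher, the second is unchanged by both, and the third is raised by both.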

preprint · 2026 · arXiv

DocAtlas: Multilingual Document Understanding Across 80+ Languages

Multilingual document understanding remains limited for low-resource languages due to scarce training data and model-based annotation pipelines that perpetuate existing biases. We introduce DocAtlas, a framework that constructs high-fidelity OCR datasets and benchmarks covering 82 languages and 9 evaluation tasks. Our dual pipelines, differential rendering of native DOCX documents and synthetic LaTeX-based generation for right-to-left scripts, produce precise structural annotations in a unified DocTag format encoding layout, text, and component types, without learned models for core annotation. Evaluating 16 state-of-the-art models reveals persistent gaps in low-resource scripts. We show that Direct Preference Optimization (DPO) using rendering-derived ground truth as positive signal achieves stable multilingual adaptation, improving both in-domain (+1.9%) and out-of-domain (+1.8%) accuracy without measurable base-language degradation, whereas supervised fine-tuning degrades out-of-domain performance by up to 21%. Our best variant, DocAtlas-DeepSeek, improves +1.7% over the strongest baseline.
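For readers unfamiliar with DPO, the standard per-pair loss can be sketched as below. This is the generic DPO formulation, not DocAtlas's training code. The abstract says the chosen side is the rendering-derived ground truth but does not say how rejected outputs are obtained, so the rejected log-probability here is simply an input. The function name and the `beta` default are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one preference pair of sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))))
    beta=0.1 is an assumed value, not one reported in the abstract.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# The loss is lower when the policy prefers the chosen output more than the reference does.
good = dpo_loss(-5.0, -12.0, -8.0, -8.0)
bad = dpo_loss(-12.0, -5.0, -8.0, -8.0)
```

When the policy and the reference agree exactly, the loss equals log 2, which is a convenient sanity check.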

preprint · 2023 · arXiv

Vision in adverse weather: Augmentation using CycleGANs with various object detectors for robust perception in autonomous racing

In an autonomous driving system, perception - identification of features and objects from the environment - is crucial. In autonomous racing, high speeds and small margins demand rapid and accurate detection systems. During the race, the weather can change abruptly, causing significant degradation in perception, resulting in ineffective manoeuvres. In order to improve detection in adverse weather, deep-learning-based models typically require extensive datasets captured in such conditions - the collection of which is a tedious, laborious, and costly process. However, recent developments in CycleGAN architectures allow the synthesis of highly realistic scenes in multiple weather conditions. To this end, we introduce an approach of using synthesised adverse condition datasets in autonomous racing (generated using CycleGAN) to improve the performance of four out of five state-of-the-art detectors by an average of 42.7 and 4.4 mAP percentage points in the presence of night-time conditions and droplets, respectively. Furthermore, we present a comparative analysis of five object detectors - identifying the optimal pairing of detector and training data for use during autonomous racing in challenging conditions.

preprint · 2022 · arXiv

3D Vision with Transformers: A Survey

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its ability to learn long-range dependencies. This replacement has proven successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. In computer vision, the 3D field has also witnessed an increase in employing the transformer for 3D convolutional neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D vision requires special attention due to the difference in data representation and processing when compared to 2D vision. In this work, we present a systematic and thorough review of more than 100 transformer methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss transformer design in 3D vision, which allows it to process data with various 3D representations. For each application, we highlight key properties and contributions of proposed transformer-based methods. To assess the competitiveness of these methods, we compare their performance to common non-transformer methods on 12 3D benchmarks. We conclude the survey by discussing different open directions and challenges for transformers in 3D vision. In addition to the presented papers, we aim to frequently update the latest relevant papers along with their corresponding implementations at: https://github.com/lahoud/3d-vision-transformers.

preprint · 2022 · arXiv

A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items

Screening cluttered and occluded contraband items from baggage X-ray scans is a cumbersome task even for the expert security staff. This paper presents a novel strategy that extends a conventional encoder-decoder architecture to perform instance-aware segmentation and extract merged instances of contraband items without using any additional sub-network or an object detector. The encoder-decoder network first performs conventional semantic segmentation and retrieves cluttered baggage items. The model then incrementally evolves during training to recognize individual instances using significantly reduced training batches. To avoid catastrophic forgetting, a novel objective function minimizes the network loss in each iteration by retaining the previously acquired knowledge while learning new class representations and resolving their complex structural inter-dependencies through Bayesian inference. A thorough evaluation of our framework on two publicly available X-ray datasets shows that it outperforms state-of-the-art methods, especially within the challenging cluttered scenarios, while achieving an optimal trade-off between detection accuracy and efficiency.

preprint · 2022 · arXiv

AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility

One of the key factors behind the recent success in visual tracking is the availability of dedicated benchmarks. While greatly benefiting tracking research, existing benchmarks no longer pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scenarios with adverse visibility, such as severe weather conditions, camouflage and imaging effects. We introduce AVisT, a dedicated benchmark for visual tracking in diverse scenarios with adverse visibility. AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios broadly grouped into five attributes with 42 object categories. The key contribution of AVisT is its diverse and challenging scenarios covering severe weather conditions such as dense fog, heavy rain and sandstorm; obstruction effects including fire, sun glare and splashing water; adverse imaging effects such as low-light; and target effects including small targets and distractor objects along with camouflage. We further benchmark 17 popular and recent trackers on AVisT with detailed analysis of their tracking performance across attributes, demonstrating substantial room for improvement in performance. We believe that AVisT can greatly benefit the tracking community by complementing the existing benchmarks and supporting the development of new creative tracking solutions that continue pushing the boundaries of the state-of-the-art. Our dataset along with the complete tracking performance evaluation is available at: https://github.com/visionml/pytracking

preprint · 2022 · arXiv

Burst Image Restoration and Enhancement

Modern handheld devices can acquire burst image sequences in quick succession. However, the individual acquired frames suffer from multiple degradations and are misaligned due to camera shake and object motions. The goal of Burst Image Restoration is to effectively combine complementary cues across multiple burst frames to generate high-quality outputs. Towards this goal, we develop a novel approach by solely focusing on the effective information exchange between burst frames, such that the degradations get filtered out while the actual scene details are preserved and enhanced. Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information. However, the pseudo-burst cannot be successfully created unless the individual burst frames are properly aligned to discount inter-frame movements. Therefore, our approach initially extracts pre-processed features from each burst frame and matches them using an edge-boosting burst alignment module. The pseudo-burst features are then created and enriched using multi-scale contextual information. Our final step is to adaptively aggregate information from the pseudo-burst features to progressively increase resolution in multiple stages while merging the pseudo-burst features. In comparison to existing works that usually follow a late fusion scheme with single-stage upsampling, our approach performs favorably, delivering state-of-the-art performance on burst super-resolution, burst low-light image enhancement, and burst denoising tasks. The source code and pre-trained models are available at https://github.com/akshaydudhane16/BIPNet.

preprint · 2022 · arXiv

Class-agnostic Object Detection with Multi-modal Transformer

What constitutes an object? This has been a long-standing question in computer vision. Towards this goal, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well across new domains and novel objects. In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. For the first time in literature, we demonstrate that Multi-modal Vision Transformers (MViT) trained with aligned image-text pairs can effectively bridge this gap. Our extensive experiments across various domains and novel objects show the state-of-the-art performance of MViTs to localize generic objects in images. Based on the observation that existing MViTs do not include multi-scale feature processing and usually require longer training schedules, we develop an efficient MViT architecture using multi-scale deformable attention and late vision-language fusion. We show the significance of MViT proposals in a diverse range of applications including open-world object detection, salient and camouflage object detection, supervised and self-supervised detection tasks. Further, MViTs can adaptively generate proposals given a specific language query and thus offer enhanced interactability. Code: https://git.io/J1HPY.

preprint · 2022 · arXiv

Energy-based Latent Aligner for Incremental Learning

Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks. This behavior emerges because the parameter updates optimized for the new tasks may not align well with the updates suitable for older tasks. The resulting latent representation mismatch causes forgetting. In this work, we propose ELI: Energy-based Latent Aligner for Incremental Learning, which first learns an energy manifold for the latent representations such that previous task latents will have low energy and the current task latents have high energy values. This learned manifold is used to counter the representational shift that happens during incremental learning. The implicit regularization that is offered by our proposed methodology can be used as a plug-and-play module in existing incremental learning methodologies. We validate this through extensive evaluation on CIFAR-100, ImageNet subset, ImageNet 1k and Pascal VOC datasets. We observe consistent improvement when ELI is added to three prominent methodologies in class-incremental learning, across multiple incremental settings. Further, when added to the state-of-the-art incremental object detector, ELI provides over 5% improvement in detection accuracy, corroborating its effectiveness and complementary advantage to existing art.

preprint · 2022 · arXiv

Guidance Through Surrogate: Towards a Generic Diagnostic Attack

Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, different adversarial training defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD. High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as 'gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack approach is based on a 'match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts, a large number of attack iterations or a search for an optimal step-size. Furthermore, our proposed G-PGA is generic, thus it can be combined with an ensemble attack strategy as we demonstrate for the case of Auto-Attack, leading to efficiency and convergence speed improvements. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
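As background for the label smoothing mentioned in this abstract, the sketch below shows the common uniform variant, in which a fraction epsilon of the probability mass is spread evenly over all classes. The abstract does not say which variant or epsilon the paper analyses, so the formula choice, the function name and the default value are assumptions.

```python
def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes.

    epsilon=0.1 is an assumed default, not a value taken from the abstract.
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


# Four classes, true class 0: the target becomes roughly [0.925, 0.025, 0.025, 0.025].
target = smooth_labels(true_class=0, num_classes=4, epsilon=0.1)
```

The smoothed target still sums to one, but the true class no longer receives all of the mass.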

preprint · 2022 · arXiv

Learning Enriched Features for Fast Image Restoration and Enhancement

Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely-used CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatial details are preserved but the contextual information cannot be precisely encoded. In the latter case, generated outputs are semantically reliable but spatially less accurate. This paper presents a new architecture with a holistic goal of maintaining spatially-precise high-resolution representations through the entire network, and receiving complementary contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing the following key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) non-local attention mechanism for capturing contextual information, and (d) attention-based multi-scale feature aggregation. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on six real image benchmark datasets demonstrate that our method, named MIRNet-v2, achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNetv2

preprint · 2022 · arXiv

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that improved efficiency, measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks: the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all five metrics mentioned in the description of the challenge (runtime, parameter count, FLOPs, activations, and memory consumption) were considered. Similar to sub-track one, the rankings of the five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. The submissions gauge the state-of-the-art in efficient single image super-resolution.
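The rank-sum rule used in the two sub-tracks can be sketched as follows. Each team is ranked separately on every metric, and the per-metric ranks are added to give the final ordering. The team names and numbers are invented for illustration, every metric is assumed to be lower-is-better, and tie handling is not described in the abstract and is therefore left out.

```python
def rank_sum(scores, metrics):
    """Order teams by the sum of their per-metric ranks (rank 1 = lowest value).

    scores:  {team_name: {metric_name: value}}
    metrics: metric names to include, all assumed lower-is-better
    Ties within a metric are not treated specially in this sketch.
    """
    totals = {team: 0 for team in scores}
    for metric in metrics:
        ordered = sorted(scores, key=lambda team: scores[team][metric])
        for rank, team in enumerate(ordered, start=1):
            totals[team] += rank
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical sub-track one: parameters (thousands) and FLOPs (billions).
teams = {
    "team_a": {"params": 450, "flops": 30},
    "team_b": {"params": 400, "flops": 45},
    "team_c": {"params": 600, "flops": 40},
}
final = rank_sum(teams, ["params", "flops"])
```

Here team_a ranks second on parameters and first on FLOPs for a total of 3, ahead of team_b (1 + 3 = 4) and team_c (3 + 2 = 5).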

preprint · 2022 · arXiv

On Improving Adversarial Transferability of Vision Transformers

Vision transformers (ViTs) process input images as sequences of patches via self-attention, a radically different architecture from convolutional neural networks (CNNs). This makes it interesting to study the adversarial feature space of ViT models and their transferability. In particular, we observe that adversarial patterns found via conventional adversarial attacks show very low black-box transferability even for large ViT models. We show that this phenomenon is only due to the sub-optimal attack procedures that do not leverage the true representation potential of ViTs. A deep ViT is composed of multiple blocks, with a consistent architecture comprising self-attention and feed-forward layers, where each block is capable of independently producing a class token. Formulating an attack using only the last class token (conventional approach) does not directly leverage the discriminative information stored in the earlier tokens, leading to poor adversarial transferability of ViTs. Using the compositional nature of ViT models, we enhance transferability of existing attacks by introducing two novel strategies specific to the architecture of ViT models. (i) Self-Ensemble: We propose a method to find multiple discriminative pathways by dissecting a single ViT model into an ensemble of networks. This allows explicitly utilizing class-specific information at each ViT block. (ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT. Our token refinement systematically combines the class tokens with structural information preserved within the patch tokens.

preprint · 2022 · arXiv

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning. Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data. One common assumption in most SSL methods is that the labeled and unlabeled data are from the same data distribution. However, this is hardly the case in many real-world scenarios, which limits their applicability. In this work, instead, we attempt to solve the challenging open-world SSL problem that does not make such an assumption. In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data. This work introduces OpenLDN, which utilizes a pairwise similarity loss to discover novel classes. Using a bi-level optimization rule, this pairwise similarity loss exploits the information available in the labeled set to implicitly cluster novel class samples, while simultaneously recognizing samples from known classes. After discovering novel classes, OpenLDN transforms the open-world SSL problem into a standard SSL problem to achieve additional performance gains using existing SSL methods. Our extensive experiments demonstrate that OpenLDN outperforms the current state-of-the-art methods on multiple popular classification benchmarks while providing a better accuracy/training time trade-off.

preprint · 2022 · arXiv

OW-DETR: Open-world Detection Transformer

Open-world object detection (OWOD) is a challenging computer vision problem, where the task is to detect a known set of object categories while simultaneously identifying unknown objects. Additionally, the model must incrementally learn new classes that become known in the next training episodes. Distinct from standard object detection, the OWOD setting poses significant challenges for generating quality candidate proposals on potentially unknown objects, separating the unknown objects from the background and detecting diverse unknown objects. Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection. The proposed OW-DETR comprises three dedicated components, namely attention-driven pseudo-labeling, novelty classification and objectness scoring, to explicitly address the aforementioned OWOD challenges. Our OW-DETR explicitly encodes multi-scale contextual information, possesses less inductive bias, enables knowledge transfer from known classes to the unknown class and can better discriminate between unknown objects and background. Comprehensive experiments are performed on two benchmarks: MS-COCO and PASCAL VOC. The extensive ablations reveal the merits of our proposed contributions. Further, our model outperforms the recently introduced OWOD approach, ORE, with absolute gains ranging from 1.8% to 3.3% in terms of unknown recall on MS-COCO. In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC. Our code is available at https://github.com/akshitac8/OW-DETR.

preprint · 2022 · arXiv

Restormer: Efficient Transformer for High-Resolution Image Restoration

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising). The source code and pre-trained models are available at https://github.com/swz30/Restormer.
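The quadratic growth mentioned in this abstract can be made concrete with a little arithmetic. The sketch assumes standard spatial self-attention with one token per pixel, so an H x W image produces an attention map with (H*W)^2 entries. It illustrates the cost that motivates the paper and is not a description of Restormer's own attention design. The function name is hypothetical.

```python
def attention_map_entries(height, width):
    """Number of pairwise entries in a full spatial self-attention map.

    Assumes one token per pixel, which is an illustrative simplification.
    """
    tokens = height * width
    return tokens * tokens


small = attention_map_entries(64, 64)      # 4,096 tokens
large = attention_map_entries(128, 128)    # 16,384 tokens
growth = large // small                    # doubling each side multiplies the cost by 16
```

Doubling the side length quadruples the number of tokens and multiplies the attention map size by sixteen, which is why full spatial attention becomes impractical at high resolution.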

preprint · 2022 · arXiv

ROAD: The ROad event Awareness Dataset for Autonomous Driving

Humans drive in a holistic fashion which entails, in particular, understanding dynamic road events and their evolution. Injecting these capabilities into autonomous vehicles can thus take situational awareness and decision making closer to human-level performance. To this purpose, we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. ROAD is designed to test an autonomous vehicle's ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs and the corresponding scene locations. ROAD comprises videos originally from the Oxford RobotCar Dataset annotated with bounding boxes showing the location in the image plane of each road event. We benchmark various detection tasks, proposing as a baseline a new incremental algorithm for online road event awareness termed 3D-RetinaNet. We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving. ROAD is designed to allow scholars to investigate exciting tasks such as complex (road) activity detection, future event anticipation and continual learning. The dataset is available at https://github.com/gurkirt/road-dataset; the baseline can be found at https://github.com/gurkirt/3D-RetinaNet.

preprint · 2022 · arXiv

Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging

We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation. Distinct from previous self-supervised VOS methods, our approach is based on a discriminative learning loss formulation that takes into account both object and background information to ensure object-background discriminability, rather than using only object appearance. The discriminative learning loss comprises cutout-based reconstruction (cutout region represents part of a frame, whose pixels are replaced with some constant values) and tag prediction loss terms. The cutout-based reconstruction term utilizes a simple cutout scheme to learn the pixel-wise correspondence between the current and previous frames in order to reconstruct the original current frame with added cutout region in it. The introduced cutout patch guides the model to focus as much on the significant features of the object of interest as the less significant ones, thereby implicitly equipping the model to address occlusion-based scenarios. Next, the tag prediction term encourages object-background separability by grouping tags of all pixels in the cutout region that are similar, while separating them from the tags of the rest of the reconstructed frame pixels. Additionally, we introduce a zoom-in scheme that addresses the problem of small object segmentation by capturing fine structural information at multiple scales. Our proposed approach, termed CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS. A detailed ablation showcases the importance of the proposed loss formulation to effectively capture object-background discriminability and the impact of our zoom-in scheme to accurately segment small-sized objects.
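The cutout operation defined in this abstract, a part of a frame whose pixels are replaced with constant values, can be sketched in a few lines. This shows only the masking step and not the reconstruction or tag prediction losses. The rectangular shape, the argument names and the fill value of 0 are assumptions made for illustration.

```python
def apply_cutout(frame, top, left, height, width, fill=0):
    """Return a copy of `frame` with a rectangular region set to `fill`.

    frame: 2D list of pixel values (rows of equal length)
    The region is clipped to the frame boundaries; the input is not modified.
    """
    out = [row[:] for row in frame]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out


# A 3x4 frame of ones with a 2x2 cutout starting at row 1, column 1.
frame = [[1] * 4 for _ in range(3)]
cut = apply_cutout(frame, top=1, left=1, height=2, width=2)
```

A model trained in this setting would receive `cut` as input and be asked to reconstruct the original `frame`.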

preprint · 2022 · arXiv

Self-supervised Video Transformer

In this paper, we propose self-supervised training for video transformers using unlabeled video data. From a given video, we create local and global spatiotemporal views with varying spatial sizes and frame rates. Our self-supervised objective seeks to match the features of these different views representing the same video, to be invariant to spatiotemporal variations in actions. To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT). Further, owing to the flexibility of Transformer models, SVT supports slow-fast video processing within a single architecture using dynamically adjusted positional encoding and supports long-term relationship modeling along spatiotemporal dimensions. Our approach performs well on four action recognition benchmarks (Kinetics-400, UCF-101, HMDB-51, and SSv2) and converges faster with small batch sizes. Code: https://git.io/J1juJ

preprint · 2022 · arXiv

Spatio-temporal Relation Modeling for Few-shot Action Recognition

We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. The focus of our approach is a novel spatio-temporal enrichment module that aggregates spatial and temporal contexts with dedicated local patch-level and global frame-level feature enrichment sub-modules. Local patch-level enrichment captures the appearance-based characteristics of actions. On the other hand, global frame-level enrichment explicitly encodes the broad temporal context, thereby capturing the relevant object features over time. The resulting spatio-temporally enriched representations are then utilized to learn the relational matching between query and support action sub-sequences. We further introduce a query-class similarity classifier on the patch-level enriched features to enhance class-specific feature discriminability by reinforcing the feature learning at different stages in the proposed framework. Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101. Our extensive ablation study reveals the benefits of the proposed contributions. Furthermore, our approach sets a new state-of-the-art on all four benchmarks. On the challenging SSv2 benchmark, our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature. Our code and models are available at https://github.com/Anirudh257/strm.

preprint2022arXiv

Transformers in Medical Imaging: A Survey

Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest in Transformers that can capture global context compared to CNNs with local receptive fields. Inspired by this transition, in this survey, we attempt to provide a comprehensive review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks. In particular, for each of these applications, we develop a taxonomy, identify application-specific challenges, provide insights to solve them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges and open problems, and an outline of promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at https://github.com/fahadshamshad/awesome-transformers-in-medical-imaging.

preprint2022arXiv

Transformers in Remote Sensing: A Survey

Deep learning-based algorithms have seen massive popularity in different areas of remote sensing image analysis over the past decade. Recently, transformers-based architectures, originally introduced in natural language processing, have pervaded the computer vision field, where the self-attention mechanism has been utilized as a replacement for the popular convolution operator for capturing long-range dependencies. Inspired by recent advances in computer vision, the remote sensing community has also witnessed an increased exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision in general, to the best of our knowledge we are the first to present a systematic review of recent advances based on transformers in remote sensing. Our survey covers more than 60 recent transformers-based methods for different remote sensing problems in sub-areas of remote sensing: very high-resolution (VHR), hyperspectral (HSI) and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. Additionally, we intend to frequently update and maintain the latest transformers in remote sensing papers with their respective code at: https://github.com/VIROBO-15/Transformer-in-Remote-Sensing

preprint2022arXiv

Transformers in Vision: A Survey

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequence as compared to recurrent networks e.g., Long short-term memory (LSTM). Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline. We start with an introduction to fundamental concepts behind the success of Transformers i.e., self-attention, large-scale pre-training, and bidirectional encoding. We then cover extensive applications of transformers in vision including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual-question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization) and 3D analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques both in terms of architectural design and their experimental value. Finally, we provide an analysis on open research directions and possible future works.

preprint2022arXiv

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computations. We argue that such an attention computation ignores the multi-scale spatio-temporal feature relationships that are crucial to tackle target appearance deformations in videos. To address this issue, we propose a transformer-based VIS framework, named MS-STS VIS, that comprises a novel multi-scale spatio-temporal split (MS-STS) attention module in the encoder. The proposed MS-STS module effectively captures spatio-temporal feature relationships at multiple scales across frames in a video. We further introduce an attention block in the decoder to enhance the temporal consistency of the detected instances in different frames of a video. Moreover, an auxiliary discriminator is introduced during training to ensure better foreground-background separability within the multi-scale spatio-temporal feature space. We conduct extensive experiments on two benchmarks: Youtube-VIS (2019 and 2021). Our MS-STS VIS achieves state-of-the-art performance on both benchmarks. When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in literature by 2.7% and by 4.8% at higher overlap threshold of AP_75, while being comparable in model size and speed on the Youtube-VIS 2019 val. set. When using the Swin Transformer backbone, MS-STS VIS achieves a mask AP of 61.0% on the Youtube-VIS 2019 val. set. Our code and models are available at https://github.com/OmkarThawakar/MSSTS-VIS.

preprint2020arXiv

A Deep Journey into Super-resolution: A survey

Deep convolutional network-based super-resolution is a fast-growing field with numerous practical applications. In this exposition, we extensively compare 30+ state-of-the-art super-resolution Convolutional Neural Networks (CNNs) over three classical and three recently introduced challenging datasets to benchmark single image super-resolution. We introduce a taxonomy for deep-learning based super-resolution networks that groups existing methods into nine categories including linear, residual, multi-branch, recursive, progressive, attention-based and adversarial designs. We also provide comparisons between the models in terms of network complexity, memory footprint, model input and output, learning details, the type of network losses and important architectural differences (e.g., depth, skip-connections, filters). The extensive evaluation shows consistent and rapid growth in accuracy over the past few years, along with a corresponding boost in model complexity and the availability of large-scale datasets. It is also observed that the pioneering methods identified as the benchmark have been significantly outperformed by the current contenders. Despite the progress in recent years, we identify several shortcomings of existing techniques and provide future research directions towards the solution of these open problems.

preprint2020arXiv

A Self-supervised Approach for Adversarial Robustness

Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems, e.g., for classification, segmentation and object detection. The vulnerability of DNNs against such attacks can prove a major roadblock towards their real-world deployment. Transferability of adversarial examples demands generalizable defenses that can provide cross-task protection. Adversarial training that enhances robustness by modifying the target model's parameters lacks such generalizability. On the other hand, different input processing based defenses fall short in the face of continuously evolving attacks. In this paper, we take the first step to combine the benefits of both approaches and propose a self-supervised adversarial training mechanism in the input space. By design, our defense is a generalizable approach and provides significant robustness against unseen adversarial attacks (e.g., by reducing the success rate of the translation-invariant ensemble attack from 82.6% to 31.9% in comparison to the previous state-of-the-art). It can be deployed as a plug-and-play solution to protect a variety of vision systems, as we demonstrate for the case of classification, segmentation and detection. Code is available at: https://github.com/Muzammal-Naseer/NRP.

preprint2020arXiv

An Adaptive Random Path Selection Approach for Incremental Learning

In a conventional supervised learning setting, a machine learning model has access to examples of all object classes that are desired to be recognized during the inference stage. This results in a fixed model that lacks the flexibility to adapt to new learning tasks. In practical settings, learning tasks often arrive in a sequence and the models must continually learn to increment their previously acquired knowledge. Existing incremental learning approaches fall well below the state-of-the-art cumulative models that use all training classes at once. In this paper, we propose a random path selection algorithm, called Adaptive RPS-Net, that progressively chooses optimal paths for the new tasks while encouraging parameter sharing between tasks. We introduce a new network capacity measure that enables us to automatically switch paths if the already used resources are saturated. Since the proposed path-reuse strategy ensures forward knowledge transfer, our approach is efficient and has considerably less computation overhead. As an added novelty, the proposed model integrates knowledge distillation and retrospection along with the path selection strategy to overcome catastrophic forgetting. In order to maintain an equilibrium between previous and newly acquired knowledge, we propose a simple controller to dynamically balance the model plasticity. Through extensive experiments, we demonstrate that the Adaptive RPS-Net method surpasses the state-of-the-art performance for incremental learning and by utilizing parallel computation this method can run in constant time with nearly the same efficiency as a conventional deep convolutional neural network.

preprint2020arXiv

Any-Shot Object Detection

Previous work on novel object detection considers zero or few-shot settings where none or few examples of each category are available for training. In real world scenarios, it is less practical to expect that 'all' the novel classes are either unseen or have few examples. Here, we propose a more realistic setting termed 'Any-shot detection', where totally unseen and few-shot categories can simultaneously co-occur during inference. Any-shot detection offers unique challenges compared to conventional novel object detection, such as a high imbalance between unseen, few-shot and seen object classes, susceptibility to forgetting base-training while learning novel classes, and distinguishing novel classes from the background. To address these challenges, we propose a unified any-shot detection model that can concurrently learn to detect both zero-shot and few-shot object classes. Our core idea is to use class semantics as prototypes for object detection, a formulation that naturally minimizes knowledge forgetting and mitigates the class-imbalance in the label space. Besides, we propose a rebalanced loss function that emphasizes difficult few-shot cases but avoids overfitting on the novel classes to allow detection of totally unseen classes. Without bells and whistles, our framework can also be used solely for Zero-shot detection and Few-shot detection tasks. We report extensive experiments on Pascal VOC and MS-COCO datasets where our approach is shown to provide significant improvements.

preprint2020arXiv

Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes

Existing networks directly learn feature representations on 3D point clouds for shape analysis. We argue that 3D point clouds are highly redundant and hold irregular (permutation-invariant) structure, which makes it difficult to achieve inter-class discrimination efficiently. In this paper, we propose a two-faceted solution to this problem that is seamlessly integrated in a single 'Blended Convolution and Synthesis' layer. This fully differentiable layer performs two critical tasks in succession. In the first step, it projects the input 3D point clouds into a latent 3D space to synthesize a highly compact and more inter-class discriminative point cloud representation. Since 3D point clouds do not follow a Euclidean topology, standard 2/3D Convolutional Neural Networks offer limited representation capability. Therefore, in the second step, it uses a novel 3D convolution operator functioning inside the unit ball ($\mathbb{B}^3$) to extract useful volumetric features. We extensively derive formulae to achieve both translation and rotation of our novel convolution kernels. Finally, using the proposed techniques we present an extremely light-weight, end-to-end architecture that achieves compelling results on 3D shape recognition and retrieval.

preprint2020arXiv

Cascaded Structure Tensor Framework for Robust Identification of Heavily Occluded Baggage Items from X-ray Scans

In the last two decades, baggage scanning has globally become one of the prime aviation security concerns. Manual screening of baggage items is tedious, error-prone, and compromises privacy. Hence, many researchers have developed X-ray imagery-based autonomous systems to address these shortcomings. This paper presents a cascaded structure tensor framework that can automatically extract and recognize suspicious items in heavily occluded and cluttered baggage. The proposed framework is unique, as it intelligently extracts each object by iteratively picking contour-based transitional information from different orientations and uses only a single feed-forward convolutional neural network for the recognition. The proposed framework has been rigorously evaluated using a total of 1,067,381 X-ray scans from the publicly available GDXray and SIXray datasets, where it outperformed the state-of-the-art solutions by achieving a mean average precision score of 0.9343 on GDXray and 0.9595 on SIXray for recognizing the highly cluttered and overlapping suspicious items. Furthermore, the proposed framework computationally achieves 4.76% superior run-time performance as compared to the existing solutions based on publicly available object detectors.

preprint2020arXiv

CycleISP: Real Image Restoration via Improved Data Synthesis

The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumption of additive white Gaussian noise (AWGN). While the CNNs achieve impressive results on these synthetic datasets, they do not perform well when applied on real camera images, as reported in recent benchmark datasets. This is mainly because the AWGN is not adequate for modeling the real camera noise, which is signal-dependent and heavily transformed by the camera imaging pipeline. In this paper, we present a framework that models the camera imaging pipeline in forward and reverse directions. It allows us to produce any number of realistic image pairs for denoising both in RAW and sRGB spaces. By training a new image denoising network on realistic synthetic data, we achieve the state-of-the-art performance on real camera benchmark datasets. Our model has about 5 times fewer parameters than the previous best method for RAW denoising. Furthermore, we demonstrate that the proposed framework generalizes beyond the image denoising problem, e.g., for color matching in stereoscopic cinema. The source code and pre-trained models are available at https://github.com/swz30/CycleISP.

preprint2020arXiv

Durocmien: A deep framework for duroc skeleton extraction in constraint environment

Farm animal behavior analysis is a crucial task for industrial farming. In an indoor farm setting, extracting the key joints of an animal is essential for tracking the animal over a longer period of time. In this paper, we propose a deep network named DUROCMIEN that exploits transfer learning to train the network for the Duroc, a domestic breed of pig, in an end-to-end fashion. The backbone of the architecture is based on an hourglass stacked dense-net. In order to train the network, key frames are selected from the test data using a K-means sampler. In total, 9 keypoints are annotated, which give a brief but detailed behavior analysis in the farm setting. Extensive experiments are conducted, and the quantitative results show that the network has the potential to increase the tracking performance by a substantial margin.

preprint2020arXiv

iTAML: An Incremental Task-Agnostic Meta-learning Approach

Humans can continuously learn new knowledge as their experience grows. In contrast, previous learning in deep neural networks can quickly fade out when they are trained on a new task. In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters, that are neither specific to old nor new tasks. In this pursuit, we introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks. This is ensured by a new meta-update rule which avoids catastrophic forgetting. In comparison to previous meta-learning techniques, our approach is task-agnostic. When presented with a continuum of data, our model automatically identifies the task and quickly adapts to it with just a single update. We perform extensive experiments on five datasets in a class-incremental setting, leading to significant improvements over the state of the art methods (e.g., a 21.3% boost on CIFAR100 with 10 incremental tasks). Specifically, on large-scale datasets that generally prove difficult cases for incremental learning, our approach delivers absolute gains as high as 19.1% and 7.4% on ImageNet and MS-Celeb datasets, respectively.

preprint2020arXiv

Learning Enriched Features for Real Image Restoration and Enhancement

With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing. Recently, convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task. Existing CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatially precise but contextually less robust results are achieved, while in the latter case, semantically reliable but spatially less accurate outputs are generated. In this paper, we present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network and receiving strong contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing several key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) spatial and channel attention mechanisms for capturing contextual information, and (d) attention based multi-scale feature aggregation. In a nutshell, our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on five real image benchmark datasets demonstrate that our method, named as MIRNet, achieves state-of-the-art results for a variety of image processing tasks, including image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNet.

preprint2020arXiv

Polarity Loss for Zero-shot Object Detection

Conventional object detection models require large amounts of training data. In comparison, humans can recognize previously unseen objects by merely knowing their semantic description. To mimic similar behaviour, zero-shot object detection aims to recognize and localize 'unseen' object instances by using only their semantic information. The model is first trained to learn the relationships between visual and semantic domains for seen objects, later transferring the acquired knowledge to totally unseen objects. This setting gives rise to the need for correct alignment between visual and semantic concepts, so that the unseen objects can be identified using only their semantic attributes. In this paper, we propose a novel loss function called 'Polarity loss', that promotes correct visual-semantic alignment for an improved zero-shot object detection. On one hand, it refines the noisy semantic embeddings via metric learning on a 'Semantic vocabulary' of related concepts to establish a better synergy between visual and semantic domains. On the other hand, it explicitly maximizes the gap between positive and negative predictions to achieve better discrimination between seen, unseen and background objects. Our approach is inspired by embodiment theories in cognitive science, that claim human semantic understanding to be grounded in past experiences (seen objects), related linguistic concepts (word vocabulary) and visual perception (seen/unseen object images). We conduct extensive evaluations on MS-COCO and Pascal VOC datasets, showing significant improvements over state of the art.

preprint2020arXiv

Self-supervised Knowledge Distillation for Few-shot Learning

The real world contains an overwhelmingly large number of object classes, learning all of which at once is infeasible. Few-shot learning is a promising learning paradigm due to its ability to learn out of order distributions quickly with only a few samples. Recent works [7, 41] show that simply learning a good feature embedding can outperform more sophisticated meta-learning and metric learning algorithms for few-shot learning. In this paper, we propose a simple approach to improve the representation capacity of deep neural networks for few-shot learning tasks. We follow a two-stage learning process: First, we train a neural network to maximize the entropy of the feature embedding, thus creating an optimal output manifold using a self-supervised auxiliary loss. In the second stage, we minimize the entropy on feature embedding by bringing self-supervised twins together, while constraining the manifold with student-teacher distillation. Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process. Our codes are available at: https://github.com/brjathu/SKD.

preprint2020arXiv

Semi-supervised Learning for Few-shot Image-to-Image Translation

In the last few years, unpaired image-to-image translation has witnessed remarkable progress. Although the latest methods are able to generate realistic images, they crucially rely on a large number of labeled images. Recently, some methods have tackled the challenging setting of few-shot image-to-image translation, reducing the labeled data requirements for the target domain during inference. In this work, we go one step further and reduce the amount of required labeled data also from the source domain during training. To do so, we propose applying semi-supervised learning via a noise-tolerant pseudo-labeling procedure. We also apply a cycle consistency constraint to further exploit the information from unlabeled images, either from the same dataset or external. Additionally, we propose several structural modifications to facilitate the image translation task under these circumstances. Our semi-supervised method for few-shot image translation, called SEMIT, achieves excellent results on four different datasets using as little as 10% of the source labels, and matches the performance of the main fully-supervised competitor using only 20% labeled data. Our code and models are made public at: https://github.com/yaxingwang/SEMIT.

preprint2020arXiv

Spectral-GANs for High-Resolution 3D Point-cloud Generation

Point-clouds are a popular choice for vision and graphics tasks due to their accurate shape description and direct acquisition from range-scanners. This demands the ability to synthesize and reconstruct high-quality point-clouds. Current deep generative models for 3D data generally work on simplified representations (e.g., voxelized objects) and cannot deal with the inherent redundancy and irregularity in point-clouds. A few recent efforts on 3D point-cloud generation offer limited resolution and their complexity grows with the increase in output resolution. In this paper, we develop a principled approach to synthesize 3D point-clouds using a spectral-domain Generative Adversarial Network (GAN). Our spectral representation is highly structured and allows us to disentangle various frequency bands such that the learning task is simplified for a GAN model. As compared to spatial-domain generative approaches, our formulation allows us to generate high-resolution point-clouds with an arbitrary number of points at minimal computational overhead. Furthermore, we propose a fully differentiable block to transform from the spectral to the spatial domain and back, thereby allowing us to integrate knowledge from well-established spatial models. We demonstrate that Spectral-GAN performs well for the point-cloud generation task. Additionally, it can learn a highly discriminative representation in an unsupervised fashion and can be used to accurately reconstruct 3D objects.

preprint2020arXiv

Towards Partial Supervision for Generic Object Counting in Natural Scenes

Generic object counting in natural scenes is a challenging computer vision problem. Existing approaches either rely on instance-level supervision or absolute count information to train a generic object counter. We introduce a partially supervised setting that significantly reduces the supervision level required for generic object counting. We propose two novel frameworks, named lower-count (LC) and reduced lower-count (RLC), to enable object counting under this setting. Our frameworks are built on a novel dual-branch architecture that has an image classification and a density branch. Our LC framework reduces the annotation cost due to multiple instances in an image by using only lower-count supervision for all object categories. Our RLC framework further reduces the annotation cost arising from large numbers of object categories in a dataset by only using lower-count supervision for a subset of categories and class-labels for the remaining ones. The RLC framework extends our dual-branch LC framework with a novel weight modulation layer and a category-independent density map prediction. Experiments are performed on COCO, Visual Genome and PASCAL 2007 datasets. Our frameworks perform on par with state-of-the-art approaches using higher levels of supervision. Additionally, we demonstrate the applicability of our LC supervised density map for image-level supervised instance segmentation.

preprint2020arXiv

Understanding More about Human and Machine Attention in Deep Neural Networks

The human visual system can selectively attend to parts of a scene for quick perception, a biological mechanism known as human attention. Inspired by this, recent deep learning models encode attention mechanisms to focus on the most task-relevant parts of the input signal for further processing, which is called machine/neural/artificial attention. Understanding the relation between human and machine attention is important for interpreting and designing neural networks. Many works claim that the attention mechanism offers an extra dimension of interpretability by explaining where the neural networks look. However, recent studies demonstrate that artificial attention maps do not always coincide with common intuition. In view of this conflicting evidence, here we make a systematic study on using artificial attention and human attention in neural network design. With three example computer vision tasks, diverse representative backbones, famous architectures, corresponding real human gaze data, and systematically conducted large-scale quantitative studies, we quantify the consistency between artificial attention and human visual attention and offer novel insights into existing artificial attention mechanisms by giving preliminary answers to several key questions related to human and artificial attention mechanisms. Overall results demonstrate that human attention can benchmark the meaningful 'ground-truth' in attention-driven tasks, where the closer the artificial attention is to human attention, the better the performance; for higher-level vision tasks, it is case-by-case. It would be advisable for attention-driven tasks to explicitly force a better alignment between artificial and human attention to boost the performance; such alignment would also improve the network explainability for higher-level computer vision tasks.

preprint2016arXiv

Environment generated quantum correlations in bipartite qubit-qutrit systems

The dynamics of entanglement and quantum discord for qubit-qutrit systems are studied in the presence of phase damping and amplitude damping noises. Both one-way and two-way couplings of the marginal systems with the environments are considered. Entanglement sudden death is unavoidable under any setup; however, the required time span depends on the way of coupling. On the other hand, the dynamics of quantum discord strongly depends both on the nature of the environment and on the number of dimensions of the Hilbert space of the coupled marginal system. We show that freezing and invariance of quantum discord, as previously reported in the literature, are limited to some special cases. Most importantly, it is noted that under some particular couplings the existence of the environment can guarantee the generation of nonclassical correlations.

preprint2016arXiv

Relativistic quantum speed limit time in dephasing noise

The behavior of quantum speed limit time (QSLT) for a single free spin-1/2 particle described by Gaussian wavepackets in the framework of relativity under dephasing noise is investigated. The dephasing noise acts only on the spin degrees of freedom of the spin-1/2 particle. In particular, the effects of initial time parameter, rapidity, average momentum and the size of the wavepackets in the presence of the dephasing noise on the dynamics of evolution process are studied. In general, the effects of relativity monotonically decrease the QSLT in time. In the range of large values of average momentum, critical values of both the rapidity and the size of the wavepackets exist at which the QSLT has its minimum value. In the range of small values of the average momentum, the QSLT monotonically decreases with both rapidity and the size of the wavepackets. The decrease of QSLT in a particular range of rapidity and with other relative parameters may be of great interest in employing fast quantum communication and quantum computation.

preprint2016arXiv

Zitterbewegung, internal momentum and spin of the circular travelling wave electromagnetic electron

This paper demonstrates that the electron has a Dirac-delta-like internal momentum $(u, p_{\theta})$ that goes around a circle of radius equal to half the reduced Compton wavelength of the electron with tangential velocity $c$. The circular momentum $p_{\theta}$ and energy $u$ emanate from a circular, Dirac-delta-type, rotating monochromatic electromagnetic (EM) wave that itself travels in another circle with radius equal to the reduced Compton wavelength of the electron. The phenomenon of Zitterbewegung and the spin of the electron are natural consequences of the model. The spin is associated with the internal circulating momentum of the electron in terms of a four-component spinor, which leads to the Dirac equation, linking the EM electron model with quantum mechanical theory. Our model accurately explains the experimental results of the electron channelling experiment [P. Catillon et al., Found. Phys. 38, 659 (2008)], in which the momentum resonance is observed at 161.784 MeV/c, corresponding to the Zitterbewegung frequency of an 80.874 MeV/c electron beam.

preprint2014arXiv

Non-maximal Tripartite Entanglement Degradation of Dirac and Scalar fields in Non-inertial frames

The π-tangle is used to study the behavior of entanglement of a nonmaximal tripartite state of both Dirac and scalar fields in an accelerated frame. For Dirac fields, the degree of degradation with acceleration of both the one-tangle of the accelerated observer and the π-tangle, for the same initial entanglement, changes when the values of the probability amplitudes are simply interchanged. A fraction of both the one-tangles and the π-tangle always survives for any choice of acceleration and degree of initial entanglement. For scalar fields, the one-tangle of the accelerated observer depends on the choice of probability amplitudes and vanishes in the limit of infinite acceleration, whereas for the π-tangle this is not always true. The dependence of the π-tangle on the probability amplitudes varies with acceleration: in the lower range of acceleration, its behavior changes when the values of the probability amplitudes are switched, and for larger values of acceleration this dependence vanishes. Interestingly, unlike bipartite entanglement, the degradation of the π-tangle with acceleration is slower for scalar fields than for Dirac fields.

preprint2013arXiv

Entanglement of Open Quantum Systems in Noninertial Frames

We study the effects of decoherence on the entanglement generated by the Unruh effect in accelerated frames by using various combinations of an amplitude damping channel, a phase damping channel and a depolarizing channel in the form of multilocal and collective environments. Using concurrence as the entanglement quantifier, we show that the occurrence of entanglement sudden death (ESD) depends on the combination of channels. ESD can be avoided under a particular configuration of the channels. We show that the channels can be used to distinguish between a moving and a stationary frame.
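For readers unfamiliar with the three noise models named in this abstract, the following is a minimal single-qubit sketch in the operator-sum (Kraus) form. The parametrizations are common textbook conventions and are an assumption here: the abstract does not state which form the paper uses, and the depolarizing channel in particular is written in several equivalent ways in the literature. All function names are illustrative.

```python
from math import sqrt

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def amplitude_damping(p):
    # Assumed convention: p is the decay probability of the excited state.
    return [[[1.0, 0.0], [0.0, sqrt(1 - p)]],
            [[0.0, sqrt(p)], [0.0, 0.0]]]

def phase_damping(p):
    # Assumed convention: populations are untouched, coherences shrink.
    return [[[1.0, 0.0], [0.0, sqrt(1 - p)]],
            [[0.0, 0.0], [0.0, sqrt(p)]]]

def depolarizing(p):
    # Assumed convention: each Pauli error occurs with probability p/3.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    x = [[0.0, 1.0], [1.0, 0.0]]
    y = [[0.0, -1j], [1j, 0.0]]
    z = [[1.0, 0.0], [0.0, -1.0]]
    return [scale(sqrt(1 - p), identity)] + [scale(sqrt(p / 3), s) for s in (x, y, z)]

def apply_channel(kraus, rho):
    """Return sum_k E_k rho E_k^dagger."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for e in kraus:
        term = matmul(matmul(e, rho), dagger(e))
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j]
    return out

def is_trace_preserving(kraus, tol=1e-12):
    """Check the completeness relation sum_k E_k^dagger E_k = I."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for e in kraus:
        term = matmul(dagger(e), e)
        for i in range(2):
            for j in range(2):
                total[i][j] += term[i][j]
    return all(abs(total[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(2) for j in range(2))

p = 0.3
excited = [[0.0, 0.0], [0.0, 1.0]]   # |1><1|
plus = [[0.5, 0.5], [0.5, 0.5]]      # |+><+|
damped = apply_channel(amplitude_damping(p), excited)
dephased = apply_channel(phase_damping(p), plus)
```

In this sketch, amplitude damping moves population from the excited state to the ground state, while phase damping leaves the populations alone and only shrinks the off-diagonal coherence. The multilocal and collective couplings studied in the paper act on multi-party states and are not modeled here.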

preprint2013arXiv

Entanglement of Tripartite States with Decoherence in Noninertial frames

The one-tangle and π-tangle are used to quantify the entanglement of a tripartite GHZ state in noninertial frames when the system interacts with a noisy environment in the form of a phase damping, phase flip or bit flip channel. It is shown that the two-tangles behave as in a closed system. The one-tangle and π-tangle behave differently in the three channels. In the case of the phase damping channel, depending on the kind of coupling, the sudden death of both the one-tangle and the π-tangle may or may not happen, whereas in the case of the phase flip channel the sudden death cannot be avoided. The effect of decoherence may be ignored in the limit of infinite acceleration when the system interacts with a bit flip channel. Furthermore, a sudden rebirth of the one-tangle and π-tangle occurs in the case of the phase flip channel, and it may be delayed when collective coupling is switched on.

preprint2013arXiv

Generation and sudden death of entanglement in qubit-qutrit systems in depolarizing noise

The dynamics of entanglement in some hybrid qubit-qutrit systems under the influence of global, collective, local and multilocal depolarizing noise are studied. It is shown that depolarizing noise can be used to induce entanglement. A critical point exists under every coupling of the system with the environment at which all the states are equally entangled. Furthermore, no ESD occurs when either only the qubit is coupled to its local environment or the system is coupled to a multilocal or global environment. This is an important result for various quantum information processing tasks and hence needs further investigation.

preprint2013arXiv

Noisy Relativistic Quantum Games in Noninertial Frames

The influence of noise and of the Unruh effect on the quantum Prisoners' Dilemma is investigated for both entangled and unentangled initial states. The noise is incorporated through an amplitude damping channel. For an unentangled initial state, the decoherence compensates for the adverse effect of the acceleration of the frame, and the effect of acceleration becomes irrelevant provided the game is fully decohered. It is shown that the inertial player always outscores the noninertial player by choosing defection. For a maximally entangled initial state, we show that in the fully decohered case every strategy profile results in one of two possible equilibrium outcomes. Two of the four possible strategy profiles become Pareto optimal and Nash equilibria, and no dilemma is left. It is shown that other equilibrium points emerge for different ranges of the decoherence parameter that are either Pareto optimal or Pareto inefficient in the quantum strategic spaces. It is shown that the miracle move of Eisert et al. is a special move that always leads to distinguishable results compared with other moves. We show that the dilemma-like situation is resolved in favor of one player or the other.

preprint2012arXiv

Environmental influences on Quantum Monty Hall problem

We reformulate the quantum Monty Hall problem in the presence of decoherence. The decoherence destroys the fairness of the game. A new Nash equilibrium emerges for a particular strategy profile in the presence of decoherence. It is shown that under the action of the amplitude damping channel, Bob's winning probability is always higher than three-fourths, irrespective of Alice's strategy, if he does not switch to the other door, and he always wins when the channel is fully decohered. The depolarizing channel damps Bob's winning probability, and he is better off if he sticks to his current selection. The phase damping channel leaves the winning probability unaffected. Unlike in the classical and the quantum forms of the game, Bob's dominant strategy in the presence of decoherence is not to switch.
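The abstract contrasts the decohered game with the classical one, in which switching is the better choice. As a baseline only, the sketch below enumerates the classical three-door game exactly; it does not model the quantum or decohered versions. It assumes a uniformly placed prize, a uniform first pick, and a host who opens uniformly at random when two doors are available. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def classical_win_probability(switch: bool) -> Fraction:
    """Exact win probability for the player (Bob) in the classical game."""
    doors = range(3)
    wins = Fraction(0)
    # Prize location and Bob's first pick are uniform and independent.
    for prize, pick in product(doors, doors):
        # The host opens a door that is neither Bob's pick nor the prize.
        openable = [d for d in doors if d != pick and d != prize]
        for opened in openable:
            weight = Fraction(1, 9) / len(openable)
            if switch:
                # Bob moves to the one remaining closed door.
                final = next(d for d in doors if d != pick and d != opened)
            else:
                final = pick
            if final == prize:
                wins += weight
    return wins

stay = classical_win_probability(switch=False)
swap = classical_win_probability(switch=True)
print(stay, swap)  # prints "1/3 2/3"
```

Against this classical baseline of 1/3 for staying, the abstract's claim that a non-switching Bob wins with probability above three-fourths under amplitude damping is a reversal of the usual advice.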

preprint2011arXiv

Nondistillability of distillable qutrit-qutrit states under depolarizing noise

We study the effects of decoherence on some particular bipartite qutrit states under the influence of global, collective, local and multilocal depolarizing noise. We show that certain free-entangled, distillable qutrit density matrices become bound entangled or separable, and hence nondistillable, under global noise. The collective noise increases the degree of entanglement of the bipartite qutrit states. Furthermore, we show that certain local operations cannot prevent the distillable states from becoming nondistillable.

preprint2011arXiv

Quantum Stackelberg duopoly in noninertial frame

We study the influence of the Unruh effect on the quantum Stackelberg duopoly. We show that the acceleration of the noninertial frame strongly affects the payoffs of the firms. The subgame perfect Nash equilibrium is valid only within a particular range of acceleration of the noninertial frame. The benefit of initial-state entanglement in the quantum form of the duopoly is adversely affected by the acceleration. The duopoly can become a follower-advantage game only in a small range of acceleration.

preprint2011arXiv

Relativistic Quantum Games in Noninertial Frames

We study the influence of the Unruh effect on quantum non-zero-sum games. In particular, we investigate the quantum Prisoners' Dilemma for both entangled and unentangled initial states and show that the acceleration of the noninertial frames disturbs the symmetry of the game. It is shown that for a maximally entangled initial state, the classical strategy C (cooperation) becomes the dominant strategy. Our investigation shows that no quantum strategy does better for any player against the classical strategies. The miracle move of Eisert et al. (1999 Phys. Rev. Lett. 83 3077) is no longer a superior move. We show that the dilemma-like situation is resolved in favor of one player or the other.

preprint2010arXiv

A Trustworthy and well-organized data disseminating scheme for ad-hoc wsns

Wireless Sensor Networks (WSNs) generate massive amounts of live data and events sensed through tiny, widely dispersed sensors. This data needs to be disseminated to the sink while consuming few network resources. One way to transmit this bulk data efficiently is gossiping. An important consideration in gossip-based dissemination protocols is keeping the routing table up to date. Considering the inherently resource-constrained nature of ad hoc wireless sensor networks, we propose a gossip-based protocol that consumes few resources. Our proposed scheme aims to keep the routing table size R as low as possible while ensuring that the diameter is small too. We evaluated the performance of our proposed protocol through simulations. Results show that it attains a major improvement in network reachability and connectivity.

preprint2010arXiv

Open Quantum Systems in Noninertial Frames

We study the effects of decoherence on the entanglement generated by the Unruh effect in noninertial frames by using bit flip, phase damping and depolarizing channels. It is shown that decoherence strongly influences the initial-state entanglement. Entanglement sudden death can happen irrespective of the acceleration of the noninertial frame under the action of phase flip and phase damping channels. It is found that an early sudden death happens for large acceleration under the depolarizing environment. Moreover, the entanglement increases for a highly decohered phase flip channel.

preprint2010arXiv

Quantum Model of Bertrand Duopoly

We present a quantum model of the Bertrand duopoly and study the effect of entanglement on the profit functions of the firms. Using the concept of the optimal response of each firm to the price of its opponent, we find only one Nash equilibrium point for a maximally entangled initial state. The very presence of quantum entanglement in the initial state gives the firms payoffs higher than the classical payoffs at the Nash equilibrium. As a result, the dilemma-like situation in the classical game is resolved.

preprint2010arXiv

Quantum Parrondo's games under decoherence

We study the effect of quantum noise on history-dependent quantum Parrondo's games by taking into account different noise channels. Our calculations show that entanglement can play a crucial role in quantum Parrondo's games. For the maximally entangled initial state in the presence of decoherence, the quantum phases strongly influence the payoffs for various sequences of the game. The amplitude damping channel leads to winning payoffs, whereas the depolarizing and phase damping channels lead to losing payoffs. For the amplitude damping channel, the payoffs are enhanced in the presence of decoherence for the sequence AAB. This is because the quantum phases interfere constructively, which leads to a quantum enhancement of the payoffs in comparison to the undecohered case. It is also seen that the quantum phase angles damp the payoffs significantly in the presence of decoherence. Furthermore, for multiple games of sequence AAB under the influence of the amplitude damping channel, the game still remains a winning game. However, the quantum enhancement is reduced in comparison to the single game of sequence AAB because of the destructive interference of phase-dependent terms. For the depolarizing channel, the game becomes a losing game. For the game sequence B the game is a losing one, and the behavior of sequences B and BB is similar for the amplitude damping and depolarizing channels. In addition, repeated games of A are influenced only by the amplitude damping channel, and the game remains a losing game. Furthermore, for any sequence played in series, the phase damping channel does not influence the game.

preprint2010arXiv

Quantum Stackelberg duopoly in the presence of correlated noise

We study the influence of entanglement and correlated noise on the quantum Stackelberg duopoly using correlated amplitude damping, depolarizing and phase damping channels. Our investigations show that under the action of the amplitude damping channel a critical point exists for an unentangled initial state as well, at which the firms get equal payoffs. The game becomes a follower-advantage game when the channel is highly decohered. Two critical points corresponding to two values of the entanglement angle are found in the presence of correlated noise. Within the range bounded by these values of the entanglement angle, the game is a follower-advantage game. In the case of the depolarizing channel, the payoffs of the two firms are strongly influenced by the memory parameter. The presence of quantum memory ensures the existence of a Nash equilibrium for the entire range of decoherence and entanglement parameters for both channels. A local maximum in the payoffs is observed, which vanishes as the channel correlation increases. Moreover, under the influence of the depolarizing channel, the game is always a leader-advantage game. Furthermore, the phase damping channel does not affect the outcome of the game.

preprint2009arXiv

Noisy non-transitive quantum games

We study the effect of quantum noise in 3 by 3 entangled quantum games. By considering different noisy quantum channels, we analyze how a two-player, three-strategy Rock-Scissor-Paper game is influenced by quantum noise. We consider the winning non-transitive strategies R, S and P, such that R beats S, S beats P, and P beats R. The game behaves as a noiseless game for the maximum value of the quantum noise parameter. Alice's payoff is influenced more heavily by depolarizing noise than by amplitude damping noise. The depolarizing channel causes a monotonic decrease in the players' payoffs as the amount of quantum noise increases. In the case of the amplitude damping channel, Alice's payoff function reaches its minimum at alpha=0.5 and is symmetric about that point. This means that larger values of quantum noise influence the game weakly. On the other hand, the phase damping channel does not influence the game's payoff. Furthermore, the game's Nash equilibrium and non-transitive character are not affected by quantum noise.
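The non-transitive structure described in this abstract (R beats S, S beats P, P beats R) can be illustrated with the classical game alone. The sketch below is not the paper's quantum model: the payoff values +1, 0 and -1 are an assumed zero-sum convention, and the function names are illustrative.

```python
from fractions import Fraction

STRATEGIES = ("R", "S", "P")
BEATS = {"R": "S", "S": "P", "P": "R"}  # R beats S, S beats P, P beats R

def payoff(alice: str, bob: str) -> int:
    """Alice's payoff: +1 for a win, 0 for a tie, -1 for a loss (assumed values)."""
    if alice == bob:
        return 0
    return 1 if BEATS[alice] == bob else -1

def is_non_transitive() -> bool:
    """True if no chain 'a beats b, b beats c' also has a beating c."""
    for a in STRATEGIES:
        b = BEATS[a]
        c = BEATS[b]
        if payoff(a, c) == 1:
            return False
    return True

def uniform_mixed_payoff(bob: str) -> Fraction:
    """Alice's expected payoff when she plays R, S, P with probability 1/3 each."""
    return Fraction(sum(payoff(a, bob) for a in STRATEGIES), 3)

print(is_non_transitive())                             # prints "True"
print([uniform_mixed_payoff(b) for b in STRATEGIES])   # all zero
```

Because every strategy is beaten by exactly one other, uniform mixing yields zero expected payoff against any reply in the classical game. This cyclic structure is what the abstract reports as unaffected by quantum noise.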

preprint2009arXiv

Quantum Monty Hall problem under decoherence

We study the effect of decoherence on the quantum Monty Hall problem under the influence of amplitude damping, depolarizing and dephasing channels. It is shown that under the effect of decoherence there is a Nash equilibrium of the game in the case of the depolarizing channel for Alice's quantum strategy, whereas in the case of dephasing noise the game is not influenced by the quantum channel. For the amplitude damping channel, Bob's payoffs are found to be symmetric, with a maximum at p=0.5, against his classical strategy. However, it is worth mentioning that in the case of the depolarizing channel, Bob's classical strategy always remains dominant against any choice of Alice's strategy.
