Source author record

Arnav Bhavsar

Arnav Bhavsar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, eess.IV, Machine Learning, Neurons and Cognition

Catalog footprint

What is connected

9 works

4 topics

4 close collaborators

preprint, 2026, arXiv

YOLOBirDrone: Dataset for Bird vs Drone Detection and Classification and a YOLO based enhanced learning architecture

Aerial drones are used across many commercial and defense application domains. However, they are also increasingly used for targeted attacks, which poses a significant safety challenge and necessitates the development of drone detection systems. Vision-based drone detection systems currently have limited accuracy and struggle to distinguish between drones and birds, particularly when the birds are small. This work proposes a novel YOLOBirDrone architecture that improves the detection and classification accuracy of birds and drones. YOLOBirDrone comprises several components, including an adaptive and extended layer aggregation (AELAN), a multi-scale progressive dual attention module (MPDA), and a reverse MPDA (RMPDA), to preserve shape information and enrich features with local and global spatial and channel information. A large-scale dataset, BirDrone, is also introduced; it includes small and challenging objects for robust aerial object identification. Experimental results show that the proposed YOLOBirDrone architecture improves performance metrics over other state-of-the-art algorithms, with detection accuracy reaching approximately 85% across various scenarios.

preprint, 2022, arXiv

Image Forgery Detection with Interpretability

In this work, we present a learning-based method, built on a convolutional neural network (CNN) architecture, to detect image forgeries. We consider the detection of both copy-move forgeries and inpainting-based forgeries, for which we synthesize our own large dataset. In addition to classification, we focus on the interpretability of the forgery detection. As the CNN classification yields only an image-level label, it is important to understand whether the forged region has indeed contributed to the classification. For this purpose, we use Grad-CAM heatmaps to demonstrate that, in various correctly classified examples, the forged region is indeed the region contributing to the classification. Interestingly, this also holds for small forged regions, as depicted in our results. Such an analysis can also help in establishing the reliability of the classification.
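As a rough illustration of the Grad-CAM heatmap mentioned in the abstract above, the sketch below weights each feature map by the spatial mean of its gradient, sums the weighted maps, and applies a ReLU. It assumes the activations and gradients of the last convolutional layer have already been extracted from some framework; the function name `grad_cam` and the nested-list data layout are hypothetical and not taken from the paper.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from K feature maps and their gradients.

    activations, gradients: lists of K maps, each a list of H rows of W floats.
    The gradients are those of the class score with respect to each activation
    (assumed to be provided by a deep learning framework).
    """
    height = len(activations[0])
    width = len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all positions.
        weight = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * fmap[i][j]
    # ReLU keeps only the regions that push the class score upward.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Toy example with two 2x2 feature maps.
toy_activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(toy_activations, toy_gradients))
```

In practice the heatmap would be upsampled to the input resolution and compared against the known forged region.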

preprint, 2022, arXiv

Stain Normalized Breast Histopathology Image Recognition using Convolutional Neural Networks for Cancer Detection

Computer-assisted diagnosis in digital pathology is becoming ubiquitous, as it can provide more efficient and objective healthcare diagnostics. Recent advances have shown that convolutional neural network (CNN) architectures, a well-established deep learning paradigm, can be used to design a Computer Aided Diagnostic (CAD) system for breast cancer detection. However, the challenges due to stain variability, and the effect of stain normalization with such deep learning frameworks, are yet to be well explored. Moreover, performance analysis with arguably more efficient network models, which may be important for high-throughput screening, is also not well explored. To address this challenge, we consider some contemporary CNN models for binary classification of breast histopathology images, which involves (1) preprocessing the data with an adaptive colour deconvolution (ACD) based color normalization algorithm to handle stain variability; and (2) transfer-learning-based training of some arguably more efficient CNN models, namely Visual Geometry Group Network (VGG16), MobileNet and EfficientNet. We have validated the trained CNN networks on the publicly available BreaKHis dataset, for 200x and 400x magnified histopathology images. The experimental analysis shows that pretrained networks in most cases yield better results on data-augmented breast histopathology images with stain normalization than without it. Further, we evaluated the performance and efficiency of popular lightweight networks using stain-normalized images and found that EfficientNet outperforms VGG16 and MobileNet in terms of test accuracy and F1 score. We also observed that EfficientNet has a better test time than the other networks (VGG16 and MobileNet), without much drop in classification accuracy.
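The two metrics named in the abstract above, test accuracy and F1 score, can be sketched for a binary classifier as follows. The choice of label 1 as the positive class and the function name `accuracy_and_f1` are assumptions for illustration; the source does not state which class is treated as positive.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = (2 * tp / denominator) if denominator else 0.0
    return accuracy, f1


# Toy labels: 1 = positive class (assumed), 0 = negative class.
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(acc, f1)
```

F1 is often reported alongside accuracy when the two classes are imbalanced, because accuracy alone can hide poor recall on the minority class.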

preprint, 2021, arXiv

MHATC: Autism Spectrum Disorder identification utilizing multi-head attention encoder along with temporal consolidation modules

Resting-state fMRI is commonly used for diagnosing Autism Spectrum Disorder (ASD) by using network-based functional connectivity. It has been shown that ASD is associated with brain regions and their inter-connections. However, discriminating between imaging data of the control population and that of ASD patients based on connectivity patterns is a non-trivial task. To tackle this classification task, we propose a novel deep learning architecture (MHATC) consisting of multi-head attention and temporal consolidation modules for classifying an individual as an ASD patient. The devised architecture results from an in-depth analysis of the limitations of current deep neural network solutions for similar applications. Our approach is not only robust but also computationally efficient, which can allow its adoption in a variety of other research and clinical settings.

preprint, 2020, arXiv

Detecting Deepfakes with Metric Learning

With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging by a very loose thread. On social media platforms, videos are widely circulated, often at a high compression factor. In this work, we analyze several deep learning approaches for deepfake classification in a high-compression scenario and demonstrate that a proposed approach based on metric learning can be very effective in performing such a classification. Using fewer frames per video to assess its realism, the metric learning approach with a triplet network architecture proves to be fruitful. It learns to increase the feature-space distance between the clusters of real and fake video embedding vectors. We validated our approaches on two datasets to analyze their behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and an accuracy of 90.71% on a highly compressed Neural Texture dataset. Our approach is especially helpful on social media platforms, where data compression is inevitable.
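The triplet-network idea in the abstract above can be illustrated with a standard margin-based triplet loss. This is a minimal sketch, assuming Euclidean distance and a margin of 1.0; the paper's exact loss, distance and margin are not given here, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`, which pushes the two clusters apart.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: anchor and positive are both "real", negative is "fake".
real_anchor = [0.0, 0.0]
real_positive = [0.0, 1.0]
fake_negative = [3.0, 4.0]
print(triplet_loss(real_anchor, real_positive, fake_negative))
```

Swapping the positive and negative in the toy example gives a large loss, which is the signal that drives same-class embeddings together and different-class embeddings apart during training.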

preprint, 2020, arXiv

Semantic Features Aided Multi-Scale Reconstruction of Inter-Modality Magnetic Resonance Images

Long acquisition time (AQT) due to series acquisition of multi-modality MR images (especially T2-weighted images (T2WI), which have longer AQT), though beneficial for disease diagnosis, is practically undesirable. We propose a novel deep-network-based solution to reconstruct T2W images from T1W images (T1WI) using an encoder-decoder architecture. The proposed learning is aided by semantic features, using a multi-channel input consisting of intensity values and the image gradient in two orthogonal directions. A reconstruction module (RM) augments the network, along with a domain adaptation module (DAM), which is an encoder-decoder model with a built-in sharp bottleneck module (SBM); the network is trained via modular training. The proposed network significantly reduces the total AQT with negligible qualitative artifacts and quantitative loss (it reconstructs one volume in approximately 1 second). Testing is done on a publicly available dataset with real MR images, and the proposed network shows an increase of approximately 1 dB in PSNR over the state of the art (SOTA).
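The multi-channel input described in the abstract above (intensity plus gradients in two orthogonal directions) could be assembled roughly as follows. The gradient operator is an assumption: this sketch uses central differences in the interior and one-sided differences at the borders, since the source does not say which operator is used. The function names are hypothetical.

```python
def gradient_1d(values):
    """Finite-difference gradient: central inside, one-sided at the two ends."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    grads = [0.0] * n
    grads[0] = values[1] - values[0]
    grads[-1] = values[-1] - values[-2]
    for i in range(1, n - 1):
        grads[i] = (values[i + 1] - values[i - 1]) / 2.0
    return grads


def build_multichannel_input(image):
    """Stack intensity, horizontal gradient and vertical gradient as 3 channels."""
    grad_x = [gradient_1d(row) for row in image]
    # Transpose, differentiate along columns, then transpose back.
    columns = [list(col) for col in zip(*image)]
    grad_y = [list(row) for row in zip(*[gradient_1d(col) for col in columns])]
    return [image, grad_x, grad_y]


# Toy slice: a horizontal intensity ramp, so only the x-gradient is non-zero.
toy_slice = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
intensity, grad_x, grad_y = build_multichannel_input(toy_slice)
print(grad_x[0], grad_y[0])
```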

preprint, 2016, arXiv

Shape Estimation from Defocus Cue for Microscopy Images via Belief Propagation

In recent years, the usefulness of 3D shape estimation has been recognized in microscopic or close-range imaging, as the 3D information can further be used in various applications. Due to the limited depth of field at such small distances, the defocus blur induced in images can provide information about the 3D shape of the object. The task of 'shape from defocus' (SFD) involves estimating good-quality 3D shape from images with depth-dependent defocus blur. While the research area of SFD is quite well established, the approaches have largely demonstrated results on objects with bulk/coarse shape variation. However, objects studied under microscopes often involve fine/detailed structures, which have not been explicitly considered in most methods. In addition, given that large data volumes are now typically associated with microscopy-related applications, it is also important for such SFD methods to be efficient. In this work, we provide an indication of the usefulness of the Belief Propagation (BP) approach in addressing these concerns for SFD. BP is known to be an efficient combinatorial optimization approach, and has been empirically demonstrated to yield good-quality solutions in low-level vision problems such as image restoration and stereo disparity estimation. To exploit the efficiency of BP in SFD, we assume local space-invariance of the defocus blur, which enables the application of BP in a straightforward manner. Even with such an assumption, the ability of BP to provide good-quality solutions while using non-convex priors is reflected in plausible shape estimates in the presence of fine structures on objects under microscopy imaging.

preprint, 2012, arXiv

Analysis of Magnification in Depth from Defocus

In depth from defocus (DFD), when images are captured with different camera parameters, a relative magnification is induced between them. Image warping is a simpler solution to account for magnification than seemingly more accurate optical approaches. This work investigates the effects of magnification on the accuracy of DFD. We comment on issues regarding the effect of scaling on relative blur computation. We statistically analyze the effect of accounting for the scale factor, commenting on the bias and efficiency of an estimator that does not consider scale. We also discuss the effect of interpolation errors on blur estimation in a warping-based solution for handling magnification, and carry out experimental analysis to comment on the blur estimation accuracy.

preprint, 2012, arXiv

Resolution Enhancement of Range Images via Color-Image Segmentation

We report a method for super-resolution of range images. Our approach leverages the interpretation of the low-resolution (LR) image as sparse samples on the high-resolution (HR) grid. Based on this interpretation, we demonstrate that our recently reported approach, which reconstructs dense range images from sparse range data by exploiting a registered colour image, can be applied to the task of resolution enhancement of range images. Our method uses only a single colour image in addition to the range observation in the super-resolution process. Using the proposed approach, we demonstrate super-resolution results for large factors (e.g. 4) with good localization accuracy.
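The interpretation of the LR range image as sparse samples on the HR grid can be sketched as below. The sampling alignment is an assumption: each LR pixel is placed at every `factor`-th HR position starting from the origin, and unknown HR positions are marked with `None`. The function name is hypothetical, and the colour-guided reconstruction that fills the missing values is not shown.

```python
def lr_to_sparse_hr(lr_range, factor):
    """Scatter LR range samples onto an HR grid; unknown positions stay None."""
    lr_height = len(lr_range)
    lr_width = len(lr_range[0])
    hr = [[None] * (lr_width * factor) for _ in range(lr_height * factor)]
    for i, row in enumerate(lr_range):
        for j, depth in enumerate(row):
            # Assumed alignment: LR pixel (i, j) maps to HR pixel (i*factor, j*factor).
            hr[i * factor][j * factor] = depth
    return hr


# Toy 2x2 range image upsampled by a factor of 4 gives an 8x8 grid with 4 known samples.
sparse_hr = lr_to_sparse_hr([[1.0, 2.0], [3.0, 4.0]], 4)
known = sum(1 for row in sparse_hr for value in row if value is not None)
print(len(sparse_hr), len(sparse_hr[0]), known)
```

With a factor of 4, only 1 in 16 HR positions carries a measured range value, which is why a registered colour image is useful as a guide for filling in the rest.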
