Source author record

Arif Mahmood

Arif Mahmood appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Cryptography and Security, eess.IV, eess.SP, Machine Learning

Catalog footprint

What is connected

13 works

5 topics

4 close collaborators

Preprint · 2022 · arXiv

Data Augmentation for Graph Data: Recent Advancements

Graph Neural Network (GNN) based methods have recently become a popular tool for dealing with graph data because of their ability to incorporate structural information. The only hurdle in the performance of GNNs is the lack of labeled data. Data augmentation techniques for image and text data cannot be used for graph data because of its complex, non-Euclidean structure. This gap has forced researchers to shift their focus towards the development of data augmentation techniques for graph data. Most of the proposed Graph Data Augmentation (GDA) techniques are task-specific. In this paper, we survey the existing GDA techniques based on different graph tasks. This survey not only provides a reference for the GDA research community but also provides the necessary information to researchers in other domains.

Preprint · 2022 · arXiv

Generative Cooperative Learning for Unsupervised Video Anomaly Detection

Video anomaly detection is well investigated in weakly-supervised and one-class classification (OCC) settings. However, unsupervised video anomaly detection methods are quite sparse, likely because anomalies occur infrequently and are usually not well-defined, which, when coupled with the absence of ground truth supervision, could adversely affect the performance of the learning algorithms. This problem is challenging yet rewarding, as solving it can completely eradicate the cost of obtaining laborious annotations and enable such systems to be deployed without human intervention. To this end, we propose a novel unsupervised Generative Cooperative Learning (GCL) approach for video anomaly detection that exploits the low frequency of anomalies to build cross-supervision between a generator and a discriminator. In essence, both networks are trained in a cooperative fashion, thereby allowing unsupervised learning. We conduct extensive experiments on two large-scale video anomaly detection datasets, UCF-Crime and ShanghaiTech. Consistent improvement over the existing state-of-the-art unsupervised and OCC methods corroborates the effectiveness of our approach.

Preprint · 2022 · arXiv

Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation

Continuous monitoring of foot ulcer healing is needed to ensure the efficacy of a given treatment and to avoid any possibility of deterioration. Foot ulcer segmentation is an essential step in wound diagnosis. We developed a model that is similar in spirit to the well-established encoder-decoder and residual convolutional neural networks. Our model includes a residual connection along with channel and spatial attention integrated within each convolution block. A simple patch-based approach to model training, combined with test-time augmentations and majority voting on the obtained predictions, resulted in superior performance. Our model did not leverage any readily available backbone architecture, pre-training on a similar external dataset, or any transfer learning technique. With around 5 million network parameters in total, it is a significantly more lightweight model than the available state-of-the-art models used for the foot ulcer segmentation task. Our experiments present results at the patch level and the image level. Applied to the publicly available Foot Ulcer Segmentation (FUSeg) Challenge dataset from MICCAI 2021, our model achieved state-of-the-art image-level performance of 88.22% in terms of Dice similarity score and ranked second on the official challenge leaderboard. We also showed an extremely simple solution that could be compared against the more advanced architectures.
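For reference, the Dice similarity score quoted in this abstract is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. The following is a minimal sketch only, assuming binary masks given as flat lists of 0/1; the function name `dice_score` and the convention for two empty masks are illustrative assumptions and are not taken from the paper.

```python
def dice_score(pred, target):
    """Dice similarity between two equal-length binary masks (lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # One overlapping pixel, 2 predicted and 1 true foreground pixel: 2*1/(2+1).
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.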

Preprint · 2022 · arXiv

Quantification of Occlusion Handling Capability of a 3D Human Pose Estimation Framework

3D human pose estimation using monocular images is an important yet challenging task. Existing 3D pose detection methods exhibit excellent performance under normal conditions; however, their performance may degrade due to occlusion. Some occlusion-aware methods have recently been proposed, but the occlusion handling capability of these networks has not yet been thoroughly investigated. In the current work, we propose an occlusion-guided 3D human pose estimation framework and quantify its occlusion handling capability using different protocols. The proposed method estimates more accurate 3D human poses using 2D skeletons with missing joints as input. Missing joints are handled by introducing occlusion guidance that provides extra information about the absence or presence of a joint. Temporal information is also exploited to better estimate the missing joints. A large number of experiments are performed to quantify the occlusion handling capability of the proposed method on three publicly available datasets in various settings, including random missing joints, fixed body parts missing, and complete frames missing, using the mean per joint position error criterion. In addition, the quality of the predicted 3D poses is evaluated using action classification performance as a criterion. 3D poses estimated by the proposed method achieved significantly improved action recognition performance in the presence of missing joints. Our experiments demonstrate the effectiveness of the proposed framework for handling missing joints as well as for quantifying the occlusion handling capability of deep neural networks.
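The mean per joint position error criterion named in this abstract is commonly computed as the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below assumes a single pose given as a list of (x, y, z) tuples; the function name `mpjpe` is an illustrative choice, and no root alignment or other protocol-specific preprocessing from the paper is applied here.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between matching predicted and true 3D joints."""
    if len(pred_joints) != len(gt_joints) or not pred_joints:
        raise ValueError("need two non-empty joint lists of equal length")
    # Per-joint Euclidean error, averaged over all joints of the pose.
    errors = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
    # Joint errors are 0 and 5, so the mean is 2.5.
    print(mpjpe(pred, gt))
```

The result is in the same unit as the input coordinates, so lower values indicate more accurate poses.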

Preprint · 2022 · arXiv

Reconstruction of Time-varying Graph Signals via Sobolev Smoothness

Graph Signal Processing (GSP) is an emerging research field that extends the concepts of digital signal processing to graphs. GSP has numerous applications in different areas such as sensor networks, machine learning, and image processing. The sampling and reconstruction of static graph signals have played a central role in GSP. However, many real-world graph signals are inherently time-varying and the smoothness of the temporal differences of such graph signals may be used as a prior assumption. In the current work, we assume that the temporal differences of graph signals are smooth, and we introduce a novel algorithm based on the extension of a Sobolev smoothness function for the reconstruction of time-varying graph signals from discrete samples. We explore some theoretical aspects of the convergence rate of our Time-varying Graph signal Reconstruction via Sobolev Smoothness (GraphTRSS) algorithm by studying the condition number of the Hessian associated with our optimization problem. Our algorithm has the advantage of converging faster than other methods that are based on Laplacian operators without requiring expensive eigenvalue decomposition or matrix inversions. The proposed GraphTRSS is evaluated on several datasets including two COVID-19 datasets and it has outperformed many existing state-of-the-art methods for time-varying graph signal reconstruction. GraphTRSS has also shown excellent performance on two environmental datasets for the recovery of particulate matter and sea surface temperature signals.

Preprint · 2021 · arXiv

Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks

Rapid progress in adversarial learning has enabled the generation of realistic-looking fake visual content. To distinguish between fake and real visual content, several detection techniques have been proposed. The performance of most of these techniques, however, drops off significantly if the test and training data are sampled from different distributions. This motivates efforts towards improving the generalization of fake detectors. Since current fake content generation techniques do not accurately model the frequency spectrum of natural images, we observe that the frequency spectrum of fake visual data contains discriminative characteristics that can be used to detect fake content. We also observe that the information captured in the frequency spectrum is different from that of the spatial domain. Using these insights, we propose to combine complementary frequency and spatial domain features using a two-stream convolutional neural network architecture called TwoStreamNet. We demonstrate the improved generalization of the proposed two-stream network to several unseen generation architectures, datasets, and techniques. The proposed detector has demonstrated significant performance improvement compared to the current state-of-the-art fake content detectors, and fusing the frequency and spatial domain streams has also improved the generalization of the detector.

Preprint · 2021 · arXiv

Leveraging Orientation for Weakly Supervised Object Detection with Application to Firearm Localization

Automatic detection of firearms is important for enhancing the security and safety of people; however, it is a challenging task owing to the wide variations in the shape, size, and appearance of firearms. Also, most generic object detectors process axis-aligned rectangular areas, though a long, thin rifle may actually cover only a small percentage of that area, and the rest may contain irrelevant details that suppress the required object signatures. To handle these challenges, we propose a weakly supervised Orientation Aware Object Detection (OAOD) algorithm which learns to detect oriented object bounding boxes (OBB) while using Axis-Aligned Bounding Boxes (AABB) for training. The proposed OAOD differs from existing oriented object detectors, which strictly require OBB during training, and OBB may not always be available. The goal of training on AABB and detecting OBB is achieved by employing a multistage scheme, with Stage-1 predicting the AABB and Stage-2 predicting the OBB. Between the two stages, the oriented proposal generation module, along with object-aligned RoI pooling, is designed to extract features based on the predicted orientation and to make these features orientation invariant. A diverse and challenging dataset of eleven thousand images, manually annotated for firearm classification and localization, is also proposed for firearm detection. The proposed ITU Firearm dataset (ITUF) contains a wide range of guns and rifles. The OAOD algorithm is evaluated on the ITUF dataset and compared with current state-of-the-art object detectors, including fully supervised oriented object detectors. OAOD has outperformed both types of object detectors by a significant margin. The experimental results (mAP: 88.3 on AABB & mAP: 77.5 on OBB) demonstrate the effectiveness of the proposed algorithm for firearm detection.

Preprint · 2020 · arXiv

Localizing Firearm Carriers by Identifying Human-Object Pairs

Visual identification of gunmen in a crowd is a challenging problem that requires resolving the association of a person with an object (firearm). We present a novel approach to address this problem by defining human-object interaction (and non-interaction) bounding boxes. In a given image, humans and firearms are detected separately. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both the firearm and the human. A network is trained to classify these paired bounding boxes according to whether or not the human is carrying the identified firearm. Extensive experiments were performed to evaluate the effectiveness of the algorithm, including exploiting the full pose of the human, hand key-points, and their association with the firearm. Knowledge of spatially localized features, obtained by using multi-size proposals with adaptive average pooling, is key to the success of our method. We have also extended a previous firearm detection dataset by adding more images and tagging the human-firearm pairs in the extended dataset (including bounding boxes for firearms and gunmen). The experimental results ($AP_{hold} = 78.5$) demonstrate the effectiveness of the proposed method.
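The pairing step described in this abstract can be illustrated with a short sketch. It assumes that boxes are (x1, y1, x2, y2) tuples and that the paired box is the smallest axis-aligned box enclosing both detections; the abstract does not spell out the exact construction, so this is an assumption, and the function name `paired_boxes` is hypothetical. The classifier that labels each pair as carrying or not carrying is omitted.

```python
from itertools import product


def paired_boxes(humans, firearms):
    """Pair every human box with every firearm box.

    Boxes are (x1, y1, x2, y2). Returns a list of
    (human_index, firearm_index, enclosing_box) tuples.
    """
    pairs = []
    for (i, h), (j, f) in product(enumerate(humans), enumerate(firearms)):
        # Assumed construction: smallest axis-aligned box covering both boxes.
        box = (min(h[0], f[0]), min(h[1], f[1]),
               max(h[2], f[2]), max(h[3], f[3]))
        pairs.append((i, j, box))
    return pairs


if __name__ == "__main__":
    humans = [(10, 10, 50, 100)]
    firearms = [(40, 50, 80, 60), (0, 0, 5, 5)]
    for pair in paired_boxes(humans, firearms):
        print(pair)
```

With H detected humans and F detected firearms this produces H × F candidate pairs, each of which would then be passed to the classification network.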

Preprint · 2015 · arXiv

Histogram of Oriented Principal Components for Cross-View Action Recognition

Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images, which are viewpoint dependent. In contrast, we directly process pointclouds for cross-view action recognition from unknown and unseen views. We propose the Histogram of Oriented Principal Components (HOPC) descriptor, which is robust to noise, viewpoint, scale and action speed variations. At a 3D point, HOPC is computed by projecting the three scaled eigenvectors of the pointcloud within its local spatio-temporal support volume onto the vertices of a regular dodecahedron. HOPC is also used for the detection of Spatio-Temporal Keypoints (STK) in 3D pointcloud sequences so that only the view-invariant STK descriptors (or Local HOPC descriptors) at these key locations are used for action recognition. We also propose a global descriptor computed from the normalized spatio-temporal distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the performance of our proposed descriptors against nine existing techniques on two cross-view and three single-view human action recognition datasets. Experimental results show that our techniques provide significant improvement over state-of-the-art methods.

Preprint · 2014 · arXiv

Action Classification with Locality-constrained Linear Coding

We propose an action classification algorithm which uses Locality-constrained Linear Coding (LLC) to capture discriminative information of human body variations in each spatiotemporal subsequence of a video sequence. Our proposed method divides the input video into equally spaced overlapping spatiotemporal subsequences, each of which is decomposed into blocks and then cells. We use the Histogram of Oriented Gradient (HOG3D) feature to encode the information in each cell. We justify the use of LLC for encoding the block descriptor by demonstrating its superiority over Sparse Coding (SC). Our sequence descriptor is obtained via a logistic regression classifier with L2 regularization. We evaluate and compare our algorithm with ten state-of-the-art algorithms on five benchmark datasets. Experimental results show that, on average, our algorithm gives better accuracy than these ten algorithms.

Preprint · 2014 · arXiv

HOPC: Histogram of Oriented Principal Components of 3D Pointclouds for Action Recognition

Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images which change significantly with viewpoint. In contrast, we directly process the pointclouds and propose a new technique for action recognition which is more robust to noise, action speed and viewpoint variations. Our technique consists of a novel descriptor and keypoint detection algorithm. The proposed descriptor is extracted at a point by encoding the Histogram of Oriented Principal Components (HOPC) within an adaptive spatio-temporal support volume around that point. Based on this descriptor, we present a novel method to detect Spatio-Temporal Key-Points (STKPs) in 3D pointcloud sequences. Experimental results show that the proposed descriptor and STKP detector outperform state-of-the-art algorithms on three benchmark human activity datasets. We also introduce a new multiview public dataset and show the robustness of our proposed method to viewpoint variations.

Preprint · 2014 · arXiv

Optimizing Auto-correlation for Fast Target Search in Large Search Space

In remote sensing, image blurring is induced by many sources, such as atmospheric scatter, optical aberration, and spatial and temporal sensor integration. This natural blurring can be exploited to speed up target search by fast template matching. In this paper, we synthetically induce additional non-uniform blurring to further increase the speed of the matching process. To avoid loss of accuracy, the amount of synthetic blurring is varied spatially over the image according to the underlying content. We extend the transitive algorithm for fast template matching by incorporating controlled image blur. To this end, we propose an Efficient Group Size (EGS) algorithm which minimizes the number of similarity computations for a particular search image. A larger efficient group size guarantees fewer computations and more speedup. The EGS algorithm is used as a component in our proposed Optimizing auto-correlation (OptA) algorithm. In OptA, a search image is iteratively and non-uniformly blurred while ensuring no accuracy degradation at any image location. In each iteration, the efficient group size and overall computations are estimated using the proposed EGS algorithm. The OptA algorithm stops when the number of computations cannot be further decreased without accuracy degradation. The proposed algorithm is compared with six existing state-of-the-art exhaustive accuracy techniques using the correlation coefficient as the similarity measure. Experiments on satellite and aerial image datasets demonstrate the effectiveness of the proposed algorithm.

Preprint · 2014 · arXiv

Semi-supervised Spectral Clustering for Classification

We propose a Classification Via Clustering (CVC) algorithm which enables existing clustering methods to be efficiently employed in classification problems. In CVC, training and test data are co-clustered, and class-cluster distributions are used to find the label of the test data. To determine an efficient number of clusters, a Semi-supervised Hierarchical Clustering (SHC) algorithm is proposed. Clusters are obtained by hierarchically applying two-way NCut using the signs of the Fiedler vector of the normalized graph Laplacian. To this end, a Direct Fiedler Vector Computation algorithm is proposed. The graph cut is based on the data structure and does not consider labels. Labels are used only to define the stopping criterion for the graph cut. We propose that clustering be performed on Grassmannian manifolds, facilitating the formation of spectral ensembles. The proposed algorithm outperformed state-of-the-art image-set classification algorithms on five standard datasets.
