Source author record

Arun Ross

Arun Ross appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision, Machine Learning, eess.AS, Artificial Intelligence, Cryptography and Security, Sound, eess.IV, Human-Computer Interaction

Catalog footprint

What is connected

15 works

8 topics

4 close collaborators

preprint, 2022, arXiv

Can GAN-induced Attribute Manipulations Impact Face Recognition?

Impact due to demographic factors such as age, sex, race, etc., has been studied extensively in automated face recognition systems. However, the impact of digitally modified demographic and facial attributes on face recognition is relatively under-explored. In this work, we study the effect of attribute manipulations induced via generative adversarial networks (GANs) on face recognition performance. We conduct experiments on the CelebA dataset by intentionally modifying thirteen attributes using AttGAN and STGAN and evaluating their impact on two deep learning-based face verification methods, ArcFace and VGGFace. Our findings indicate that some attribute manipulations involving eyeglasses and digital alteration of sex cues can significantly impair face recognition by up to 73% and need further analysis.

preprint, 2022, arXiv

Complex-valued Iris Recognition Network

In this work, we design a fully complex-valued neural network for the task of iris recognition. Unlike the problem of general object recognition, where real-valued neural networks can be used to extract pertinent features, iris recognition depends on the extraction of both phase and magnitude information from the input iris texture in order to better represent its biometric content. This necessitates the extraction and processing of phase information that cannot be effectively handled by a real-valued neural network. In this regard, we design a fully complex-valued neural network that can better capture the multi-scale, multi-resolution, and multi-orientation phase and amplitude features of the iris texture. We show a strong correspondence of the proposed complex-valued iris recognition network with Gabor wavelets that are used to generate the classical IrisCode; however, the proposed method enables a new capability of automatic complex-valued feature learning that is tailored for iris recognition. We conduct experiments on three benchmark datasets - ND-CrossSensor-2013, CASIA-Iris-Thousand and UBIRIS.v2 - and show the benefit of the proposed network for the task of iris recognition. We exploit visualization schemes to convey how the complex-valued network, when compared to standard real-valued networks, extracts fundamentally different features from the iris texture.
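The classical IrisCode referred to in this abstract encodes the phase of complex Gabor filter responses as two bits per response (the signs of the real and imaginary parts) and compares codes by fractional Hamming distance. The sketch below illustrates that idea on a 1D signal in pure Python. It is not the complex-valued network proposed in the paper; the function names, kernel length, wavelength and sigma are hypothetical choices made only for illustration.

```python
import cmath
import math

def gabor_kernel(length, wavelength, sigma):
    """Complex 1D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = length // 2
    return [
        math.exp(-(t * t) / (2.0 * sigma * sigma))
        * cmath.exp(2j * math.pi * t / wavelength)
        for t in range(-half, half + 1)
    ]

def filter_response(signal, kernel):
    """Circular filtering, since an unwrapped iris row wraps around in angle."""
    n = len(signal)
    half = len(kernel) // 2
    return [
        sum(signal[(i + k - half) % n] * kernel[k] for k in range(len(kernel)))
        for i in range(n)
    ]

def phase_bits(responses):
    """Quantize each complex response to two bits: the phase quadrant."""
    bits = []
    for z in responses:
        bits.append(1 if z.real >= 0 else 0)
        bits.append(1 if z.imag >= 0 else 0)
    return bits

def iris_code(signal, kernel):
    """Binary code built from the phase of the Gabor responses."""
    return phase_bits(filter_response(signal, kernel))

def hamming_distance(code_a, code_b):
    """Fraction of bit positions in which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)

# Toy texture row (hypothetical data) and an illustrative kernel.
signal = [math.sin(2.0 * math.pi * i / 8.0) for i in range(32)]
kernel = gabor_kernel(length=9, wavelength=8.0, sigma=2.0)
code = iris_code(signal, kernel)
```

Because only the signs of the real and imaginary parts are kept, the code depends on phase and discards magnitude, which is the property the abstract contrasts with real-valued feature extraction.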

preprint, 2022, arXiv

DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Non-ideal Audio Signals

Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-Frequency and Gammatone filterbanks, for characterizing speech audio. However, it has been observed that the features extracted using these filterbanks are not resilient to diverse audio degradations. In this work, we propose a deep learning-based technique to deduce the filterbank design from vast amounts of speech audio. The purpose of such a filterbank is to extract features robust to non-ideal audio conditions, such as degraded, short duration, and multi-lingual speech. To this effect, a 1D convolutional neural network is designed to learn a time-domain filterbank called DeepVOX directly from raw speech audio. Secondly, an adaptive triplet mining technique is developed to efficiently mine the data samples best suited to train the filterbank. Thirdly, a detailed ablation study of the DeepVOX filterbanks reveals the presence of both vocal source and vocal tract characteristics in the extracted features. Experimental results on VOXCeleb2, NIST SRE 2008, 2010 and 2018, and Fisher speech datasets demonstrate the efficacy of the DeepVOX features across a variety of degraded, short duration, and multi-lingual speech. The DeepVOX features are also shown to improve the performance of existing speaker recognition algorithms, such as the xVector-PLDA and the iVector-PLDA.

preprint, 2022, arXiv

Facial De-morphing: Extracting Component Faces from a Single Morph

A face morph is created by strategically combining two or more face images corresponding to multiple identities. The intention is for the morphed image to match with multiple identities. Current morph attack detection strategies can detect morphs but cannot recover the images or identities used in creating them. The task of deducing the individual face images from a morphed face image is known as de-morphing. Existing work in de-morphing assumes the availability of a reference image pertaining to one identity in order to recover the image of the accomplice - i.e., the other identity. In this work, we propose a novel de-morphing method that can recover images of both identities simultaneously from a single morphed face image without needing a reference image or prior information about the morphing process. We propose a generative adversarial network that achieves single image-based de-morphing with a surprisingly high degree of visual realism and biometric similarity with the original face images. We demonstrate the performance of our method on landmark-based morphs and generative model-based morphs with promising results.

preprint, 2022, arXiv

HEFT: Homomorphically Encrypted Fusion of Biometric Templates

This paper proposes a non-interactive end-to-end solution for secure fusion and matching of biometric templates using fully homomorphic encryption (FHE). Given a pair of encrypted feature vectors, we perform the following ciphertext operations, i) feature concatenation, ii) fusion and dimensionality reduction through a learned linear projection, iii) scale normalization to unit $\ell_2$-norm, and iv) match score computation. Our method, dubbed HEFT (Homomorphically Encrypted Fusion of biometric Templates), is custom-designed to overcome the unique constraint imposed by FHE, namely the lack of support for non-arithmetic operations. From an inference perspective, we systematically explore different data packing schemes for computationally efficient linear projection and introduce a polynomial approximation for scale normalization. From a training perspective, we introduce an FHE-aware algorithm for learning the linear projection matrix to mitigate errors induced by approximate normalization. Experimental evaluation for template fusion and matching of face and voice biometrics shows that HEFT (i) improves biometric verification performance by 11.07% and 9.58% AUROC compared to the respective unibiometric representations while compressing the feature vectors by a factor of 16 (512D to 32D), and (ii) fuses a pair of encrypted feature vectors and computes its match score against a gallery of size 1024 in 884 ms. Code and data are available at https://github.com/human-analysis/encrypted-biometric-fusion
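The four operations listed in this abstract can be sketched on plaintext vectors as follows. This is a rough illustration only: nothing here is encrypted, the projection matrix is a hand-written toy rather than a learned one, and the function names are hypothetical. The abstract does not specify the polynomial used for scale normalization, so the sketch substitutes a Newton iteration for the inverse square root, which uses only additions and multiplications and is therefore the kind of arithmetic-only computation an FHE scheme can evaluate.

```python
def concatenate(face, voice):
    """Step (i): feature concatenation."""
    return list(face) + list(voice)

def project(matrix, vector):
    """Step (ii): fusion and dimensionality reduction by a linear projection."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def approx_inv_sqrt(s, iterations=6):
    """Approximate 1/sqrt(s) with additions and multiplications only.

    Assumption: Newton iteration y <- y * (3 - s * y^2) / 2 stands in for the
    paper's polynomial approximation. Starting from y = 1, it converges only
    when s lies roughly in (0, 3), so inputs must be scaled accordingly.
    """
    y = 1.0
    for _ in range(iterations):
        y = 0.5 * y * (3.0 - s * y * y)
    return y

def normalize(vector, iterations=6):
    """Step (iii): scale the vector to approximately unit l2-norm."""
    squared_norm = sum(x * x for x in vector)
    scale = approx_inv_sqrt(squared_norm, iterations)
    return [x * scale for x in vector]

def match_score(a, b):
    """Step (iv): inner product of two normalized templates."""
    return sum(x * y for x, y in zip(a, b))

def fuse(face, voice, matrix):
    """Run steps (i) to (iii) on one pair of feature vectors."""
    return normalize(project(matrix, concatenate(face, voice)))

# Hypothetical toy projection (4D to 2D) and feature vectors.
PROJECTION = [[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]]
probe = fuse([0.2, 0.4], [0.6, 0.2], PROJECTION)
gallery = fuse([0.8, 0.0], [0.0, 0.0], PROJECTION)
score = match_score(probe, gallery)
```

In the encrypted setting the same steps would run on ciphertexts, and the data packing choices explored in the paper would determine how the matrix-vector product is laid out; neither aspect is modeled here.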

preprint, 2022, arXiv

Periocular Biometrics and its Relevance to Partially Masked Faces: A Survey

The performance of face recognition systems can be negatively impacted in the presence of masks and other types of facial coverings that have become prevalent due to the COVID-19 pandemic. In such cases, the periocular region of the human face becomes an important biometric cue. In this article, we present a detailed review of periocular biometrics. We first examine the various face and periocular techniques specially designed to recognize humans wearing a face mask. Then, we review different aspects of periocular biometrics: (a) the anatomical cues present in the periocular region useful for recognition, (b) the various feature extraction and matching techniques developed, (c) recognition across different spectra, (d) fusion with other biometric modalities (face or iris), (e) recognition on mobile devices, (f) its usefulness in other applications, (g) periocular datasets, and (h) competitions organized for evaluating the efficacy of this biometric modality. Finally, we discuss various challenges and future directions in the field of periocular biometrics.

preprint, 2022, arXiv

The State of Aerial Surveillance: A Survey

The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs and other airborne platforms. The main object of interest is humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss unique challenges in performing these tasks in an aerial setting compared to a ground-based setting. We then review and analyze the aerial datasets publicly available for each task, and delve deep into the approaches in the aerial literature and investigate how they presently address the aerial challenges. We conclude the paper with a discussion of the missing gaps and open research questions to inform future research avenues.

preprint, 2022, arXiv

Trust in AI and Its Role in the Acceptance of AI Technologies

As AI-enhanced technologies become common in a variety of domains, there is an increasing need to define and examine the trust that users have in such technologies. Given the progress in the development of AI, a correspondingly sophisticated understanding of trust in the technology is required. This paper addresses this need by explaining the role of trust in the intention to use AI technologies. Study 1 examined the role of trust in the use of AI voice assistants based on survey responses from college students. A path analysis confirmed that trust had a significant effect on the intention to use AI, which operated through perceived usefulness and participants' attitude toward voice assistants. In Study 2, using data from a representative sample of the U.S. population, different dimensions of trust were examined using exploratory factor analysis, which yielded two dimensions: human-like trust and functionality trust. The results of the path analyses from Study 1 were replicated in Study 2, confirming the indirect effect of trust and the effects of perceived usefulness, ease of use, and attitude on intention to use. Further, both dimensions of trust shared a similar pattern of effects within the model, with functionality-related trust exhibiting a greater total impact on usage intention than human-like trust. Overall, the role of trust in the acceptance of AI technologies was significant across both studies. This research contributes to the advancement and application of the TAM in AI-related applications and offers a multidimensional measure of trust that can be utilized in the future study of trustworthy AI.

preprint, 2021, arXiv

DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis

Automatic speaker recognition algorithms typically characterize speech audio using short-term spectral features that encode the physiological and anatomical aspects of speech production. Such algorithms do not fully capitalize on speaker-dependent characteristics present in behavioral speech features. In this work, we propose a prosody encoding network called DeepTalk for extracting vocal style features directly from raw audio data. The DeepTalk method outperforms several state-of-the-art speaker recognition systems across multiple challenging datasets. The speaker recognition performance is further improved by combining DeepTalk with a state-of-the-art physiological speech feature-based speaker recognition system. We also integrate DeepTalk into a current state-of-the-art speech synthesizer to generate synthetic speech. A detailed analysis of the synthetic speech shows that DeepTalk captures F0 contours essential for vocal style modeling. Furthermore, DeepTalk-based synthetic speech is shown to be almost indistinguishable from real speech in the context of speaker recognition.

preprint, 2021, arXiv

One-shot Representational Learning for Joint Biometric and Device Authentication

In this work, we propose a method to simultaneously perform (i) biometric recognition (i.e., identify the individual), and (ii) device recognition (i.e., identify the device) from a single biometric image, say, a face image, using a one-shot schema. Such a joint recognition scheme can be useful in devices such as smartphones for enhancing security as well as privacy. We propose to automatically learn a joint representation that encapsulates both biometric-specific and sensor-specific features. We evaluate the proposed approach using iris, face and periocular images acquired using near-infrared iris sensors and smartphone cameras. Experiments conducted using 14,451 images from 15 sensors resulted in a rank-1 identification accuracy of up to 99.81% and a verification accuracy of up to 100% at a false match rate of 1%.

preprint, 2020, arXiv

D-NetPAD: An Explainable and Interpretable Iris Presentation Attack Detector

An iris recognition system is vulnerable to presentation attacks, or PAs, where an adversary presents artifacts such as printed eyes, plastic eyes, or cosmetic contact lenses to circumvent the system. In this work, we propose an effective and robust iris PA detector called D-NetPAD based on the DenseNet convolutional neural network architecture. It demonstrates generalizability across PA artifacts, sensors and datasets. Experiments conducted on a proprietary dataset and a publicly available dataset (LivDet-2017) substantiate the effectiveness of the proposed method for iris PA detection. The proposed method results in a true detection rate of 98.58% at a false detection rate of 0.2% on the proprietary dataset and outperforms state-of-the-art methods on the LivDet-2017 dataset. We visualize intermediate feature distributions and fixation heatmaps using t-SNE plots and Grad-CAM, respectively, in order to explain the performance of D-NetPAD. Further, we conduct a frequency analysis to explain the nature of features being extracted by the network. The source code and trained model are available at https://github.com/iPRoBe-lab/D-NetPAD.

preprint, 2020, arXiv

Face Phylogeny Tree Using Basis Functions

Photometric transformations, such as brightness and contrast adjustment, can be applied to a face image repeatedly creating a set of near-duplicate images. Identifying the original image from a set of such near-duplicates and deducing the relationship between them are important in the context of digital image forensics. This is commonly done by generating an image phylogeny tree (IPT), a hierarchical structure depicting the relationship between a set of near-duplicate images. In this work, we utilize three different families of basis functions to model pairwise relationships between near-duplicate images. The basis functions used in this work are orthogonal polynomials, wavelet basis functions and radial basis functions. We perform extensive experiments to assess the performance of the proposed method across three different modalities, namely, face, fingerprint and iris images; across different image phylogeny tree configurations; and across different types of photometric transformations. We also utilize the same basis functions to model geometric transformations and deep learning-based transformations. We also perform extensive analysis of each basis function with respect to its ability to model arbitrary transformations and to distinguish between the original and the transformed images. Finally, we utilize the concept of approximate von Neumann graph entropy to explain the success and failure cases of the proposed IPT generation algorithm. Experiments indicate that the proposed algorithm generalizes well across different scenarios thereby suggesting the merits of using basis functions to model the relationship between photometrically and geometrically modified images.

preprint, 2020, arXiv

Iris Liveness Detection Competition (LivDet-Iris) -- The 2020 Edition

Launched in 2013, LivDet-Iris is an international competition series open to academia and industry with the aim to assess and report advances in iris Presentation Attack Detection (PAD). This paper presents results from the fourth competition of the series: LivDet-Iris 2020. This year's competition introduced several novel elements: (a) incorporated new types of attacks (samples displayed on a screen, cadaver eyes and prosthetic eyes), (b) initiated LivDet-Iris as an on-going effort, with a testing protocol available now to everyone via the Biometrics Evaluation and Testing (BEAT, https://www.idiap.ch/software/beat/) open-source platform to facilitate reproducibility and benchmarking of new algorithms continuously, and (c) performance comparison of the submitted entries with three baseline methods (offered by the University of Notre Dame and Michigan State University), and three open-source iris PAD methods available in the public domain. The best performing entry to the competition reported a weighted average APCER of 59.10% and a BPCER of 0.46% over all five attack types. This paper serves as the latest evaluation of iris PAD on a large spectrum of presentation attack instruments.
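For readers unfamiliar with the two error rates quoted in this abstract: APCER (attack presentation classification error rate) is the fraction of attack presentations wrongly accepted as bona fide, and BPCER (bona fide presentation classification error rate) is the fraction of bona fide presentations wrongly flagged as attacks. The sketch below computes both from lists of decisions. The weighting by number of samples per attack type is an assumption and may differ from the competition's actual protocol; all names and data are hypothetical.

```python
ATTACK = "attack"
BONA_FIDE = "bona_fide"

def error_rate(decisions, correct_label):
    """Fraction of decisions that differ from the correct label."""
    if not decisions:
        raise ValueError("at least one decision is required")
    return sum(d != correct_label for d in decisions) / len(decisions)

def apcer(attack_decisions):
    """Share of attack presentations classified as bona fide."""
    return error_rate(attack_decisions, ATTACK)

def bpcer(bona_fide_decisions):
    """Share of bona fide presentations classified as attacks."""
    return error_rate(bona_fide_decisions, BONA_FIDE)

def weighted_apcer(decisions_by_attack_type):
    """Average of per-type APCER values, weighted by sample count.

    Assumption: weights are proportional to the number of samples of each
    attack type; the competition may define its weights differently.
    """
    total = sum(len(d) for d in decisions_by_attack_type.values())
    return sum(
        len(d) * apcer(d) for d in decisions_by_attack_type.values()
    ) / total

# Hypothetical detector decisions, grouped by attack type.
attack_results = {
    "printed_eye": [ATTACK, ATTACK, BONA_FIDE, ATTACK],
    "contact_lens": [BONA_FIDE, BONA_FIDE],
}
bona_fide_results = [BONA_FIDE] * 9 + [ATTACK]

overall_apcer = weighted_apcer(attack_results)
overall_bpcer = bpcer(bona_fide_results)
```

A detector can reach a very low BPCER while still missing many attacks, which is why the two rates are reported together.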

preprint, 2020, arXiv

JukeBox: A Multilingual Singer Recognition Dataset

A text-independent speaker recognition system relies on successfully encoding speech factors such as vocal pitch, intensity, and timbre to achieve good performance. A majority of such systems are trained and evaluated using spoken voice or everyday conversational voice data. Spoken voice, however, exhibits a limited range of possible speaker dynamics, thus constraining the utility of the derived speaker recognition models. Singing voice, on the other hand, covers a broader range of vocal and ambient factors and can, therefore, be used to evaluate the robustness of a speaker recognition system. However, a majority of existing speaker recognition datasets only focus on the spoken voice. In comparison, there is a significant shortage of labeled singing voice data suitable for speaker recognition research. To address this issue, we assemble JukeBox - a speaker recognition dataset with multilingual singing voice audio annotated with singer identity, gender, and language labels. We use the current state-of-the-art methods to demonstrate the difficulty of performing speaker recognition on singing voice using models trained on spoken voice alone. We also evaluate the effect of gender and language on speaker recognition performance, both in spoken and singing voice data. The complete JukeBox dataset can be accessed at http://iprobe.cse.msu.edu/datasets/jukebox.html.

preprint, 2020, arXiv

Smartphone Camera De-identification while Preserving Biometric Utility

The principle of Photo Response Non-Uniformity (PRNU) is often exploited to deduce the identity of the smartphone device whose camera or sensor was used to acquire a certain image. In this work, we design an algorithm that perturbs a face image acquired using a smartphone camera such that (a) sensor-specific details pertaining to the smartphone camera are suppressed (sensor anonymization); (b) the sensor pattern of a different device is incorporated (sensor spoofing); and (c) biometric matching using the perturbed image is not affected (biometric utility). We employ a simple approach utilizing the Discrete Cosine Transform to achieve the aforementioned objectives. Experiments conducted on the MICHE-I and OULU-NPU datasets, which contain periocular and facial data acquired using 12 smartphone cameras, demonstrate the efficacy of the proposed de-identification algorithm on three different PRNU-based sensor identification schemes. This work has application in sensor forensics and personal privacy.
