Researcher profile

Alex C. Kot

Alex C. Kot contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2024arXiv

Benchmarking Joint Face Spoofing and Forgery Detection with Visual and Physiological Cues

Face anti-spoofing (FAS) and face forgery detection play vital roles in securing face biometric systems from presentation attacks (PAs) and vicious digital manipulation (e.g., deepfakes). Despite promising performance upon large-scale data and powerful deep models, the generalization problem of existing approaches is still an open issue. Most of recent approaches focus on 1) unimodal visual appearance or physiological (i.e., remote photoplethysmography (rPPG)) cues; and 2) separated feature representation for FAS or face forgery detection. On one side, unimodal appearance and rPPG features are respectively vulnerable to high-fidelity face 3D mask and video replay attacks, inspiring us to design reliable multi-modal fusion mechanisms for generalized face attack detection. On the other side, there are rich common features across FAS and face forgery detection tasks (e.g., periodic rPPG rhythms and vanilla appearance for bonafides), providing solid evidence to design a joint FAS and face forgery detection system in a multi-task learning fashion. In this paper, we establish the first joint face spoofing and forgery detection benchmark using both visual appearance and physiological rPPG cues. To enhance the rPPG periodicity discrimination, we design a two-branch physiological network using both facial spatio-temporal rPPG signal map and its continuous wavelet transformed counterpart as inputs. To mitigate the modality bias and improve the fusion efficacy, we conduct a weighted batch and layer normalization for both appearance and rPPG features before multi-modal fusion. We find that the generalization capacities of both unimodal (appearance or rPPG) and multi-modal (appearance+rPPG) models can be obviously improved via joint training on these two tasks. We hope this new benchmark will facilitate the future research of both FAS and deepfake detection communities.

preprint2022arXiv

Adversarial Pairwise Reverse Attention for Camera Performance Imbalance in Person Re-identification: New Dataset and Metrics

Existing evaluation metrics for Person Re-Identification (Person ReID) models focus on system-wide performance. However, our studies reveal weaknesses due to the uneven data distributions among cameras and different camera properties that expose the ReID system to exploitation. In this work, we raise the long-ignored ReID problem of camera performance imbalance and collect a real-world privacy-aware dataset from 38 cameras to assist the study of the imbalance issue. We propose new metrics to quantify camera performance imbalance and further propose the Adversarial Pairwise Reverse Attention (APRA) Module to guide the model learning the camera invariant feature with a novel pairwise attention inversion mechanism.

preprint2022arXiv

One-Class Knowledge Distillation for Face Presentation Attack Detection

Face presentation attack detection (PAD) has been extensively studied by research communities to enhance the security of face recognition systems. Although existing methods have achieved good performance on testing data with similar distribution as the training data, their performance degrades severely in application scenarios with data of unseen distributions. In situations where the training and testing data are drawn from different domains, a typical approach is to apply domain adaptation techniques to improve face PAD performance with the help of target domain data. However, it has always been a non-trivial challenge to collect sufficient data samples in the target domain, especially for attack samples. This paper introduces a teacher-student framework to improve the cross-domain performance of face PAD with one-class domain adaptation. In addition to the source domain data, the framework utilizes only a few genuine face samples of the target domain. Under this framework, a teacher network is trained with source domain samples to provide discriminative feature representations for face PAD. Student networks are trained to mimic the teacher network and learn similar representations for genuine face samples of the target domain. In the test phase, the similarity score between the representations of the teacher and student networks is used to distinguish attacks from genuine ones. To evaluate the proposed framework under one-class domain adaptation settings, we devised two new protocols and conducted extensive experiments. The experimental results show that our method outperforms baselines under one-class domain adaptation settings and even state-of-the-art methods with unsupervised domain adaptation.

preprint2022arXiv

Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond

Rain removal aims to remove rain streaks from images/videos and reduce the disruptive effects caused by rain. It not only enhances image/video visibility but also allows many computer vision algorithms to function properly. This paper makes the first attempt to conduct a comprehensive study on the robustness of deep learning-based rain removal methods against adversarial attacks. Our study shows that, when the image/video is highly degraded, rain removal methods are more vulnerable to the adversarial attacks as small distortions/perturbations become less noticeable or detectable. In this paper, we first present a comprehensive empirical evaluation of various methods at different levels of attacks and with various losses/targets to generate the perturbations from the perspective of human perception and machine analysis tasks. A systematic evaluation of key modules in existing methods is performed in terms of their robustness against adversarial attacks. From the insights of our analysis, we construct a more robust deraining method by integrating these effective modules. Finally, we examine various types of adversarial attacks that are specific to deraining problems and their effects on both human and machine vision tasks, including 1) rain region attacks, adding perturbations only in the rain regions to make the perturbations in the attacked rain images less visible; 2) object-sensitive attacks, adding perturbations only in regions near the given objects. Code is available at https://github.com/yuyi-sd/Robust_Rain_Removal.

preprint2021arXiv

Interaction Relational Network for Mutual Action Recognition

Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Current solutions in the field -- mainly dominated by CNNs, GCNs and LSTMs -- often consist of complicated architectures and mechanisms to embed the relationships between the two persons on the architecture itself, to ensure the interaction patterns can be properly learned. Our main contribution with this work is by proposing a simpler yet very powerful architecture, named Interaction Relational Network, which utilizes minimal prior knowledge about the structure of the human body. We drive the network to identify by itself how to relate the body parts from the individuals interacting. In order to better represent the interaction, we define two different relationships, leading to specialized architectures and models for each. These multiple relationship models will then be fused into a single and special architecture, in order to leverage both streams of information for further enhancing the relational reasoning capability. Furthermore we define important structured pair-wise operations to extract meaningful extra information from each pair of joints -- distance and motion. Ultimately, with the coupling of an LSTM, our IRN is capable of paramount sequential relational reasoning. These important extensions we made to our network can also be valuable to other problems that require sophisticated relational reasoning. Our solution is able to achieve state-of-the-art performance on the traditional interaction recognition datasets SBU and UT, and also on the mutual actions from the large-scale dataset NTU RGB+D. Furthermore, it obtains competitive performance in the NTU RGB+D 120 dataset interactions subset.

preprint2020arXiv

Heterogeneous Domain Generalization via Domain Mixup

One of the main drawbacks of deep Convolutional Neural Networks (DCNN) is that they lack generalization capability. In this work, we focus on the problem of heterogeneous domain generalization which aims to improve the generalization capability across different tasks, which is, how to learn a DCNN model with multiple domain data such that the trained feature extractor can be generalized to supporting recognition of novel categories in a novel target domain. To solve this problem, we propose a novel heterogeneous domain generalization method by mixing up samples across multiple source domains with two different sampling strategies. Our experimental results based on the Visual Decathlon benchmark demonstrates the effectiveness of our proposed method. The code is released in \url{https://github.com/wyf0912/MIXALL}

preprint2020arXiv

Light Can Hack Your Face! Black-box Backdoor Attack on Face Recognition Systems

Deep neural networks (DNN) have shown great success in many computer vision applications. However, they are also known to be susceptible to backdoor attacks. When conducting backdoor attacks, most of the existing approaches assume that the targeted DNN is always available, and an attacker can always inject a specific pattern to the training data to further fine-tune the DNN model. However, in practice, such attack may not be feasible as the DNN model is encrypted and only available to the secure enclave. In this paper, we propose a novel black-box backdoor attack technique on face recognition systems, which can be conducted without the knowledge of the targeted DNN model. To be specific, we propose a backdoor attack with a novel color stripe pattern trigger, which can be generated by modulating LED in a specialized waveform. We also use an evolutionary computing strategy to optimize the waveform for backdoor attack. Our backdoor attack can be conducted in a very mild condition: 1) the adversary cannot manipulate the input in an unnatural way (e.g., injecting adversarial noise); 2) the adversary cannot access the training database; 3) the adversary has no knowledge of the training model as well as the training set used by the victim party. We show that the backdoor trigger can be quite effective, where the attack success rate can be up to $88\%$ based on our simulation study and up to $40\%$ based on our physical-domain study by considering the task of face recognition and verification based on at most three-time attempts during authentication. Finally, we evaluate several state-of-the-art potential defenses towards backdoor attacks, and find that our attack can still be effective. We highlight that our study revealed a new physical backdoor attack, which calls for the attention of the security issue of the existing face recognition/verification techniques.

preprint2020arXiv

Multi-Domain Adversarial Feature Generalization for Person Re-Identification

With the assistance of sophisticated training methods applied to single labeled datasets, the performance of fully-supervised person re-identification (Person Re-ID) has been improved significantly in recent years. However, these models trained on a single dataset usually suffer from considerable performance degradation when applied to videos of a different camera network. To make Person Re-ID systems more practical and scalable, several cross-dataset domain adaptation methods have been proposed, which achieve high performance without the labeled data from the target domain. However, these approaches still require the unlabeled data of the target domain during the training process, making them impractical. A practical Person Re-ID system pre-trained on other datasets should start running immediately after deployment on a new site without having to wait until sufficient images or videos are collected and the pre-trained model is tuned. To serve this purpose, in this paper, we reformulate person re-identification as a multi-dataset domain generalization problem. We propose a multi-dataset feature generalization network (MMFA-AAE), which is capable of learning a universal domain-invariant feature representation from multiple labeled datasets and generalizing it to `unseen' camera systems. The network is based on an adversarial auto-encoder to learn a generalized domain-invariant latent feature representation with the Maximum Mean Discrepancy (MMD) measure to align the distributions across multiple domains. Extensive experiments demonstrate the effectiveness of the proposed method. Our MMFA-AAE approach not only outperforms most of the domain generalization Person Re-ID methods, but also surpasses many state-of-the-art supervised methods and unsupervised domain adaptation methods by a large margin.