Source author record

Haichang Gao

Haichang Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Artificial Intelligence cs.CY Human-Computer Interaction Machine Learning Computer Vision

Catalog footprint

What is connected

13works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models

Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by generating structured, step-by-step reasoning content. However, exposing a model's internal reasoning process introduces additional safety risks; for example, recent studies show that LRMs are more vulnerable to jailbreak attacks than standard LLMs. In this paper, we investigate jailbreak attacks on LRMs and reveal that the attack success rate (ASR) is closely correlated with LRMs' attention patterns. Specifically, successful jailbreaks tend to assign lower attention to harmful tokens in the input prompt, while allocating higher attention to those tokens in the reasoning content. Motivated by this finding, we propose a novel jailbreak method for LRMs that leverages reinforcement learning (RL) to enhance attack effectiveness, explicitly incorporating attention signals into the reward function design. In addition, we introduce diverse persuasion strategies to enrich the RL action space, which consistently improves the ASR. Extensive experiments on five open-source and closed-source LRMs across three benchmarks demonstrate that our method achieves substantially higher ASR, outperforming existing approaches in terms of effectiveness, efficiency, and transferability.

preprint2026arXiv

Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing

This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel smoothing-based defense method, termed Disrupt-and-Rectify Smoothing (DR-Smoothing). Specifically, we integrate a two-stage prompt processing scheme-first disrupting the input prompt, then rectifying it-into the conventional smoothing defense framework. This disrupt-and-rectify approach improves upon previous disrupt-only approaches by restoring out-of-distribution disrupted prompts to an in-distribution form, thereby reducing the risk of unpredictable LLM behavior. In addition, this two-stage scheme offers a distinct advantage in striking a balance between harmlessness and helpfulness in jailbreaking defense. Notably, we present a theoretical analysis for generic smoothing framework, offering a tight bound for the defense success probability and the requirements on the disruption strength. Our approach can defend against both token-level and prompt-level jailbreaking attacks, under both established and adaptive attacking scenarios. Extensive experiments demonstrate that our approach surpasses current state-of-the-art defense methods in terms of both harmlessness and helpfulness.

preprint2026arXiv

ICU-Bench:Benchmarking Continual Unlearning in Multimodal Large Language Models

Although Multimodal Large Language Models (MLLMs) have achieved remarkable progress across many domains, their training on large-scale multimodal datasets raises serious privacy concerns, making effective machine unlearning increasingly necessary. However, existing benchmarks mainly focus on static or short-sequence settings, offering limited support for evaluating continual privacy deletion requests in realistic deployments. To bridge this gap, we introduce ICU-Bench, a continual multimodal unlearning benchmark built on privacy-critical document data. ICU-Bench contains 1,000 privacy-sensitive profiles from two document domains, medical reports and labor contracts, with 9,500 images, 16,000 question-answer pairs, and 100 forget tasks. Additionally, new continual unlearning metrics are introduced, facilitating a comprehensive analysis of forgetting effectiveness, historical forgetting preservation, retained utility, and stability throughout the continual unlearning process. Through extensive experiments with representative unlearning methods on ICU-Bench, we show that existing methods generally struggle in continual settings and exhibit clear limitations in balancing forgetting quality, utility preservation, and scalability over long task sequences. These findings highlight the need for multimodal unlearning methods explicitly designed for continual privacy deletion.

preprint2026arXiv

Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning

The core challenge of machine unlearning is to strike a balance between target knowledge removal and non-target knowledge retention. In the context of Multimodal Large Language Models (MLLMs), this challenge becomes even more pronounced, as knowledge is further divided into visual and textual modalities that are tightly intertwined. In this paper, we introduce an MLLM unlearning approach that aims to forget target visual knowledge while preserving non-target visual knowledge and all textual knowledge. Specifically, we freeze the LLM backbone and achieve unlearning by fine-tuning the visual module. First, we propose a Contrastive Visual Forgetting (CVF) mechanism to separate target visual knowledge from retained visual knowledge, guiding the representations of target visual concepts toward appropriate regions in the feature space. Second, we identify the null space associated with retained knowledge and constrain the unlearning process within this space, thereby significantly mitigating degradation in knowledge retention. Third, beyond static unlearning scenarios, we extend our approach to continual unlearning, where forgetting requests arrive sequentially. Extensive experiments across diverse benchmarks demonstrate that our approach achieves a strong balance between effective forgetting and robust knowledge retention.

preprint2026arXiv

Re-Triggering Safeguards within LLMs for Jailbreak Detection

This paper proposes a jailbreaking prompt detection method for large language models (LLMs) to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts are inherently fragile, and thus introduce an embedding disruption method to re-activate the safeguards within LLMs. Unlike previous defense methods that aim to serve as standalone solutions, our approach instead cooperates with the LLM's internal defense mechanisms by re-triggering them. Moreover, through extensive analysis, we gain a comprehensive understanding of the disruption effects and develop an efficient search algorithm to identify appropriate disruptions for effective jailbreak detection. Extensive experiments demonstrate that our approach effectively defends against state-of-the-art jailbreak attacks in white-box and black-box settings, and remains robust even against adaptive attacks.

preprint2024arXiv

Lower Difficulty and Better Robustness: A Bregman Divergence Perspective for Adversarial Training

In this paper, we investigate on improving the adversarial robustness obtained in adversarial training (AT) via reducing the difficulty of optimization. To better study this problem, we build a novel Bregman divergence perspective for AT, in which AT can be viewed as the sliding process of the training data points on the negative entropy curve. Based on this perspective, we analyze the learning objectives of two typical AT methods, i.e., PGD-AT and TRADES, and we find that the optimization process of TRADES is easier than PGD-AT for that TRADES separates PGD-AT. In addition, we discuss the function of entropy in TRADES, and we find that models with high entropy can be better robustness learners. Inspired by the above findings, we propose two methods, i.e., FAIT and MER, which can both not only reduce the difficulty of optimization under the 10-step PGD adversaries, but also provide better robustness. Our work suggests that reducing the difficulty of optimization under the 10-step PGD adversaries is a promising approach for enhancing the adversarial robustness in AT.

preprint2022arXiv

Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization

Adversarial training (AT) has proven to be one of the most effective ways to defend Deep Neural Networks (DNNs) against adversarial attacks. However, the phenomenon of robust overfitting, i.e., the robustness will drop sharply at a certain stage, always exists during AT. It is of great importance to decrease this robust generalization gap in order to obtain a robust model. In this paper, we present an in-depth study towards the robust overfitting from a new angle. We observe that consistency regularization, a popular technique in semi-supervised learning, has a similar goal as AT and can be used to alleviate robust overfitting. We empirically validate this observation, and find a majority of prior solutions have implicit connections to consistency regularization. Motivated by this, we introduce a new AT solution, which integrates the consistency regularization and Mean Teacher (MT) strategy into AT. Specifically, we introduce a teacher model, coming from the average weights of the student models over the training steps. Then we design a consistency loss function to make the prediction distribution of the student models over adversarial examples consistent with that of the teacher model over clean samples. Experiments show that our proposed method can effectively alleviate robust overfitting and improve the robustness of DNN models against common adversarial attacks.

preprint2013arXiv

A New Graphical Password Scheme Resistant to Shoulder-Surfing

Shoulder-surfing is a known risk where an attacker can capture a password by direct observation or by recording the authentication session. Due to the visual interface, this problem has become exacerbated in graphical passwords. There have been some graphical schemes resistant or immune to shoulder-surfing, but they have significant usability drawbacks, usually in the time and effort to log in. In this paper, we propose and evaluate a new shoulder-surfing resistant scheme which has a desirable usability for PDAs. Our inspiration comes from the drawing input method in DAS and the association mnemonics in Story for sequence retrieval. The new scheme requires users to draw a curve across their password images orderly rather than click directly on them. The drawing input trick along with the complementary measures, such as erasing the drawing trace, displaying degraded images, and starting and ending with randomly designated images provide a good resistance to shouldersurfing. A preliminary user study showed that users were able to enter their passwords accurately and to remember them over time.

preprint2013arXiv

Against Spyware Using CAPTCHA in Graphical Password Scheme

Text-based password schemes have inherent security and usability problems, leading to the development of graphical password schemes. However, most of these alternate schemes are vulnerable to spyware attacks. We propose a new scheme, using CAPTCHA (Completely Automated Public Turing tests to tell Computers and Humans Apart) that retaining the advantages of graphical password schemes, while simultaneously raising the cost of adversaries by orders of magnitude. Furthermore, some primary experiments are conducted and the results indicate that the usability should be improved in the future work.

preprint2013arXiv

An audio CAPTCHA to distinguish humans from computers

CAPTCHAs are employed as a security measure to differentiate human users from bots. A new sound-based CAPTCHA is proposed in this paper, which exploits the gaps between human voice and synthetic voice rather than relays on the auditory perception of human. The user is required to read out a given sentence, which is selected randomly from a specified book. The generated audio file will be analyzed automatically to judge whether the user is a human or not. In this paper, the design of the new CAPTCHA, the analysis of the audio files, and the choice of the audio frame window function are described in detail. And also, some experiments are conducted to fix the critical threshold and the coefficients of three indicators to ensure the security. The proposed audio CAPTCHA is proved accessible to users. The user study has shown that the human success rate reaches approximately 97% and the pass rate of attack software using Microsoft SDK 5.1 is only 4%. The experiments also indicated that it could be solved by most human users in less than 14 seconds and the average time is only 7.8 seconds.

preprint2013arXiv

Can background baroque music help to improve the memorability of graphical passwords?

Graphical passwords have been proposed as an alternative to alphanumeric passwords with their advantages in usability and security. However, they still tend to follow predictable patterns that are easier for attackers to exploit, probably due to users' memory limitations. Various literatures show that baroque music has positive effects on human learning and memorizing. To alleviate users' memory burden, we investigate the novel idea of introducing baroque music to graphical password schemes (specifically DAS, PassPoints and Story) and conduct a laboratory study to see whether it is helpful. In a ten minutes short-term recall, we found that participants in all conditions had high recall success rates that were not statistically different from each other. After one week, the music group coped PassPoints passwords significantly better than the group without music. But there was no statistical difference between two groups in recalling DAS passwords or Story passwords. Further more, we found that the music group tended to set significantly more complicated PassPoints passwords but less complicated DAS passwords.

preprint2013arXiv

Draw a line on your PDA to authenticate

The trend toward a highly mobile workforce and the ubiquity of graphical interfaces (such as the stylus and touch-screen) has enabled the emergence of graphical authentications in Personal Digital Assistants (PDAs) [1]. However, most of the current graphical password schemes are vulnerable to shoulder-surfing [2,3], a known risk where an attacker can capture a password by direct observation or by recording the authentication session. Several approaches have been developed to deal with this problem, but they have significant usability drawbacks, usually in the time and effort to log in, making them less suitable for authentication [4, 8]. For example, it is time-consuming for users to log in CHC [4] and there are complex text memory requirements in scheme proposed by Hong [5]. With respect to the scheme proposed by Weinshall [6], not only is it intricate to log in, but also the main claim of resisting shoulder-surfing is proven false [7]. In this paper, we introduce a new graphical password scheme which provides a good resistance to shouldersurfing and preserves a desirable usability.

preprint2013arXiv

The effect of baroque music on the PassPoints graphical password

Graphical passwords have been demonstrated to be the possible alternatives to traditional alphanumeric passwords. However, they still tend to follow predictable patterns that are easier to attack. The crux of the problem is users' memory limitations. Users are the weakest link in password authentication mechanism. It shows that baroque music has positive effects on human memorizing and learning. We introduce baroque music to the PassPoints graphical password scheme and conduct a laboratory study in this paper. Results shown that there is no statistic difference between the music group and the control group without music in short-term recall experiments, both had high recall success rates. But in long-term recall, the music group performed significantly better. We also found that the music group tended to set significantly more complicated passwords, which are usually more resistant to dictionary and other guess attacks. But compared with the control group, the music group took more time to log in both in short-term and long-term tests. Besides, it appears that background music does not work in terms of hotspots.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint