Source author record

Leo Yu Zhang

Leo Yu Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Computer Vision Artificial Intelligence Information Theory math.IT Computation and Language Machine Learning Software Engineering

Catalog footprint

What is connected

20works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Image-to-Video Diffusion: From Foundations to Open Frontiers

Diffusion-based \textit{image-to-video} (I2V) generation has become a central direction in generative models by turning a reference image, with optional conditions, into a temporally coherent video. Compared with broader video generation settings, this task places stricter demands on content consistency, identity preservation, and motion coherence. Although the literature grows rapidly, existing works mostly discuss I2V generation within broader topics and still lack a dedicated taxonomy together with a systematic analysis centered on this field. This work addresses that gap by treating diffusion I2V generation as a standalone subject. It first reviews the task formulation, model architectures, datasets, and evaluation metrics, and then organizes existing methods through a taxonomy based on architecture and training paradigm. It further distills four core designs, namely condition encoding, temporal modeling, noise prior design, and spatial-temporal upsampling, and discusses representative application scenarios together with major open challenges.

preprint2026arXiv

Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

Visual token compression is widely adopted to improve the inference efficiency of Large Vision-Language Models (LVLMs), enabling their deployment in latency-sensitive and resource-constrained scenarios. However, existing work has mainly focused on efficiency and performance, while the security implications of visual token compression remain largely unexplored. In this work, we first reveal that visual token compression substantially degrades the robustness of LVLMs: models that are robust under uncompressed inference become highly vulnerable once compression is enabled. These vulnerabilities are state-specific; failure modes emerge only in the compressed setting and completely disappear when compression is disabled, making them particularly hidden and difficult to diagnose. By analyzing the key stages of the compression process, we identify instability in token importance ranking as the primary cause of this robustness degradation. Small and imperceptible perturbations can significantly alter token rankings, leading the compression mechanism to mistakenly discard task-critical information and ultimately causing model failure. Motivated by this observation, we propose a Compression-Aware Attack to systematically study and exploit this vulnerability. CAA directly targets the token selection mechanism and induces failures exclusively under compressed inference. We further extend this approach to more realistic black-box settings and introduce Transfer CAA, where neither the target model nor the compression configuration is accessible. We further evaluate potential defenses and find that they provide only limited protection. Extensive experiments across models, datasets, and compression methods show that visual token compression significantly undermines robustness, revealing a previously overlooked efficiency-security trade-off.

preprint2026arXiv

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than asked: it deletes unrelated files, wipes a stale credentials backup, or rewrites configuration the user never mentioned. We call these scope expansions overeager actions, an authorization problem distinct from capability failures, prompt injection, or sandbox escapes. We present OverEager-Gen, a benchmark dedicated to overeager behavior on benign tasks. Building it surfaces a measurement-validity issue: if a benchmark spells out the authorized scope inside the prompt, the agent stops inferring boundaries and starts pattern-matching declaration text. On Claude Code, stripping the consent declaration alone raises the overeager rate from 0.0% to 17.1% on paired scenarios (McNemar exact p = 2.4 x 10^-4). OverEager-Gen therefore certifies each scenario's discriminative power before admission via a behavioral-gradient validator, audits internal tool calls through a dual-channel stack (PATH-injected shim plus per-agent event streams), and ships byte-identical consent_kept and consent_stripped variants. OverEager-Bench contains 500 validated scenarios and ~7,500 runs across four agent products (Claude Code, OpenHands, Codex CLI, Gemini CLI) and six base models; a 50-sample re-annotation gives Cohen's kappa = 0.73 and rule-judge recall = 1.00. Stripping consent multiplies the overeager rate on every shared base model (Delta in [11.9, 17.2] pp). The framework axis dominates effect size: a permissive cluster (Claude Code, Codex CLI, Gemini CLI) runs at 5.4-27.7% while the ask-to-continue framework (OpenHands) sits at 0.2-4.5% (Fisher p <= 10^-5). Within-framework base-model variance reaches 15.9 pp, indicating that model-layer alignment does not fully propagate through permissive permission gating.

preprint2022arXiv

Attention Distraction: Watermark Removal Through Continual Learning with Selective Forgetting

Fine-tuning attacks are effective in removing the embedded watermarks in deep learning models. However, when the source data is unavailable, it is challenging to just erase the watermark without jeopardizing the model performance. In this context, we introduce Attention Distraction (AD), a novel source data-free watermark removal attack, to make the model selectively forget the embedded watermarks by customizing continual learning. In particular, AD first anchors the model's attention on the main task using some unlabeled data. Then, through continual learning, a small number of \textit{lures} (randomly selected natural images) that are assigned a new label distract the model's attention away from the watermarks. Experimental results from different datasets and networks corroborate that AD can thoroughly remove the watermark with a small resource budget without compromising the model's performance on the main task, which outperforms the state-of-the-art works.

preprint2022arXiv

BadHash: Invisible Backdoor Attacks against Deep Hashing with Clean Label

Due to its powerful feature learning capability and high efficiency, deep hashing has achieved great success in large-scale image retrieval. Meanwhile, extensive works have demonstrated that deep neural networks (DNNs) are susceptible to adversarial examples, and exploring adversarial attack against deep hashing has attracted many research efforts. Nevertheless, backdoor attack, another famous threat to DNNs, has not been studied for deep hashing yet. Although various backdoor attacks have been proposed in the field of image classification, existing approaches failed to realize a truly imperceptive backdoor attack that enjoys invisible triggers and clean label setting simultaneously, and they also cannot meet the intrinsic demand of image retrieval backdoor. In this paper, we propose BadHash, the first generative-based imperceptible backdoor attack against deep hashing, which can effectively generate invisible and input-specific poisoned images with clean label. Specifically, we first propose a new conditional generative adversarial network (cGAN) pipeline to effectively generate poisoned samples. For any given benign image, it seeks to generate a natural-looking poisoned counterpart with a unique invisible trigger. In order to improve the attack effectiveness, we introduce a label-based contrastive learning network LabCLN to exploit the semantic characteristics of different labels, which are subsequently used for confusing and misleading the target model to learn the embedded trigger. We finally explore the mechanism of backdoor attacks on image retrieval in the hash space. Extensive experiments on multiple benchmark datasets verify that BadHash can generate imperceptible poisoned samples with strong attack ability and transferability over state-of-the-art deep hashing schemes.

preprint2022arXiv

Evaluating Membership Inference Through Adversarial Robustness

The usage of deep learning is being escalated in many applications. Due to its outstanding performance, it is being used in a variety of security and privacy-sensitive areas in addition to conventional applications. One of the key aspects of deep learning efficacy is to have abundant data. This trait leads to the usage of data which can be highly sensitive and private, which in turn causes wariness with regard to deep learning in the general public. Membership inference attacks are considered lethal as they can be used to figure out whether a piece of data belongs to the training dataset or not. This can be problematic with regards to leakage of training data information and its characteristics. To highlight the significance of these types of attacks, we propose an enhanced methodology for membership inference attacks based on adversarial robustness, by adjusting the directions of adversarial perturbations through label smoothing under a white-box setting. We evaluate our proposed method on three datasets: Fashion-MNIST, CIFAR-10, and CIFAR-100. Our experimental results reveal that the performance of our method surpasses that of the existing adversarial robustness-based method when attacking normally trained models. Additionally, through comparing our technique with the state-of-the-art metric-based membership inference methods, our proposed method also shows better performance when attacking adversarially trained models. The code for reproducing the results of this work is available at \url{https://github.com/plll4zzx/Evaluating-Membership-Inference-Through-Adversarial-Robustness}.

preprint2022arXiv

Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer

While deep face recognition (FR) systems have shown amazing performance in identification and verification, they also arouse privacy concerns for their excessive surveillance on users, especially for public face images widely spread on social networks. Recently, some studies adopt adversarial examples to protect photos from being identified by unauthorized face recognition systems. However, existing methods of generating adversarial face images suffer from many limitations, such as awkward visual, white-box setting, weak transferability, making them difficult to be applied to protect face privacy in reality. In this paper, we propose adversarial makeup transfer GAN (AMT-GAN), a novel face protection method aiming at constructing adversarial face images that preserve stronger black-box transferability and better visual quality simultaneously. AMT-GAN leverages generative adversarial networks (GAN) to synthesize adversarial face images with makeup transferred from reference images. In particular, we introduce a new regularization module along with a joint training strategy to reconcile the conflicts between the adversarial noises and the cycle consistence loss in makeup transfer, achieving a desirable balance between the attack strength and visual changes. Extensive experiments verify that compared with state of the arts, AMT-GAN can not only preserve a comfortable visual quality, but also achieve a higher attack success rate over commercial FR APIs, including Face++, Aliyun, and Microsoft.

preprint2022arXiv

Self-Supervised Adversarial Example Detection by Disentangled Representation

Deep learning models are known to be vulnerable to adversarial examples that are elaborately designed for malicious purposes and are imperceptible to the human perceptual system. Autoencoder, when trained solely over benign examples, has been widely used for (self-supervised) adversarial detection based on the assumption that adversarial examples yield larger reconstruction errors. However, because lacking adversarial examples in its training and the too strong generalization ability of autoencoder, this assumption does not always hold true in practice. To alleviate this problem, we explore how to detect adversarial examples with disentangled label/semantic features under the autoencoder structure. Specifically, we propose Disentangled Representation-based Reconstruction (DRR). In DRR, we train an autoencoder over both correctly paired label/semantic features and incorrectly paired label/semantic features to reconstruct benign and counterexamples. This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of autoencoder. We compare our method with the state-of-the-art self-supervised detection methods under different adversarial attacks and different victim models, and it exhibits better performance in various metrics (area under the ROC curve, true positive rate, and true negative rate) for most attack settings. Though DRR is initially designed for visual tasks only, we demonstrate that it can be easily extended for natural language tasks as well. Notably, different from other autoencoder-based detectors, our method can provide resistance to the adaptive adversary.

preprint2022arXiv

Towards Privacy-Preserving Neural Architecture Search

Machine learning promotes the continuous development of signal processing in various fields, including network traffic monitoring, EEG classification, face identification, and many more. However, massive user data collected for training deep learning models raises privacy concerns and increases the difficulty of manually adjusting the network structure. To address these issues, we propose a privacy-preserving neural architecture search (PP-NAS) framework based on secure multi-party computation to protect users' data and the model's parameters/hyper-parameters. PP-NAS outsources the NAS task to two non-colluding cloud servers for making full advantage of mixed protocols design. Complement to the existing PP machine learning frameworks, we redesign the secure ReLU and Max-pooling garbled circuits for significantly better efficiency ($3 \sim 436$ times speed-up). We develop a new alternative to approximate the Softmax function over secret shares, which bypasses the limitation of approximating exponential operations in Softmax while improving accuracy. Extensive analyses and experiments demonstrate PP-NAS's superiority in security, efficiency, and accuracy.

preprint2021arXiv

From Chaos to Pseudo-Randomness: A Case Study on the 2D Coupled Map Lattice

Applying chaos theory for secure digital communications is promising and it is well acknowledged that in such applications the underlying chaotic systems should be carefully chosen. However, the requirements imposed on the chaotic systems are usually heuristic, without theoretic guarantee for the resultant communication scheme. Among all the primitives for secure communications, it is well-accepted that (pseudo) random numbers are most essential. Taking the well-studied two-dimensional coupled map lattice (2D CML) as an example, this paper performs a theoretical study towards pseudo-random number generation with the 2D CML. In so doing, an analytical expression of the Lyapunov exponent (LE) spectrum of the 2D CML is first derived. Using the LEs, one can configure system parameters to ensure the 2D CML only exhibits complex dynamic behavior, and then collect pseudo-random numbers from the system orbits. Moreover, based on the observation that least significant bit distributes more evenly in the (pseudo) random distribution, an extraction algorithm E is developed with the property that, when applied to the orbits of the 2D CML, it can squeeze uniform bits. In implementation, if fixed-point arithmetic is used in binary format with a precision of $z$ bits after the radix point, E can ensure that the deviation of the squeezed bits is bounded by $2^{-z}$ . Further simulation results demonstrate that the new method not only guide the 2D CML model to exhibit complex dynamic behavior, but also generate uniformly distributed independent bits. In particular, the squeezed pseudo random bits can pass both NIST 800-22 and TestU01 test suites in various settings. This study thereby provides a theoretical basis for effectively applying the 2D CML to secure communications.

preprint2016arXiv

Breaking a chaotic image encryption algorithm based on modulo addition and XOR operation

This paper re-evaluates the security of a chaotic image encryption algorithm called MCKBA/HCKBA and finds that it can be broken efficiently with two known plain-images and the corresponding cipher-images. In addition, it is reported that a previously proposed breaking on MCKBA/HCKBA can be further improved by reducing the number of chosen plain-images from four to two. The two attacks are both based on the properties of solving a composite function involving the carry bit, which is composed of the modulo addition and the bitwise OR operations. Both rigorous theoretical analysis and detailed experimental results are provided.

preprint2016arXiv

Cryptanalyzing a class of image encryption schemes based on Chinese Remainder Theorem

As a fundamental theorem in number theory, the Chinese Reminder Theorem (CRT) is widely used to construct cryptographic primitives. This paper investigates the security of a class of image encryption schemes based on CRT, referred to as CECRT. Making use of some properties of CRT, the equivalent secret key of CECRT can be recovered efficiently. The required number of pairs of chosen plaintext and the corresponding ciphertext is only $(1+\lceil (\log_2L)/l \rceil)$. The attack complexity is only $O(L)$, where $L$ is the plaintext length and $l$ is the number of bits representing a plaintext symbol. In addition, other defects of CECRT such as invalid compression function and low sensitivity to plaintext, are reported. The work in this paper will help clarify positive role of CRT in cryptology.

preprint2015arXiv

Bi-level Protected Compressive Sampling

Some pioneering works have investigated embedding cryptographic properties in compressive sampling (CS) in a way similar to one-time pad symmetric cipher. This paper tackles the problem of constructing a CS-based symmetric cipher under the key reuse circumstance, i.e., the cipher is resistant to common attacks even a fixed measurement matrix is used multiple times. To this end, we suggest a bi-level protected CS (BLP-CS) model which makes use of the advantage of the non-RIP measurement matrix construction. Specifically, two kinds of artificial basis mismatch techniques are investigated to construct key-related sparsifying bases. It is demonstrated that the encoding process of BLP-CS is simply a random linear projection, which is the same as the basic CS model. However, decoding the linear measurements requires knowledge of both the key-dependent sensing matrix and its sparsifying basis. The proposed model is exemplified by sampling images as a joint data acquisition and protection layer for resource-limited wireless sensors. Simulation results and numerical analyses have justified that the new model can be applied in circumstances where the measurement matrix can be re-used.

preprint2015arXiv

Chosen-plaintext attack of an image encryption scheme based on modified permutation-diffusion structure

Since the first appearance in Fridrich's design, the usage of permutation-diffusion structure for designing digital image cryptosystem has been receiving increasing research attention in the field of chaos-based cryptography. Recently, a novel chaotic Image Cipher using one round Modified Permutation-Diffusion pattern (ICMPD) was proposed. Unlike traditional permutation-diffusion structure, the permutation is operated on bit level instead of pixel level and the diffusion is operated on masked pixels, which are obtained by carrying out the classical affine cipher, instead of plain pixels in ICMPD. Following a \textit{divide-and-conquer strategy}, this paper reports that ICMPD can be compromised by a chosen-plaintext attack efficiently and the involved data complexity is linear to the size of the plain-image. Moreover, the relationship between the cryptographic kernel at the diffusion stage of ICMPD and modulo addition then XORing is explored thoroughly.

preprint2015arXiv

On the security of a class of diffusion mechanisms for image encryption

The need for fast and strong image cryptosystems motivates researchers to develop new techniques to apply traditional cryptographic primitives in order to exploit the intrinsic features of digital images. One of the most popular and mature technique is the use of complex ynamic phenomena, including chaotic orbits and quantum walks, to generate the required key stream. In this paper, under the assumption of plaintext attacks we investigate the security of a classic diffusion mechanism (and of its variants) used as the core cryptographic rimitive in some image cryptosystems based on the aforementioned complex dynamic phenomena. We have theoretically found that regardless of the key schedule process, the data complexity for recovering each element of the equivalent secret key from these diffusion mechanisms is only O(1). The proposed analysis is validated by means of numerical examples. Some additional cryptographic applications of our work are also discussed.

preprint2014arXiv

A chaotic image encryption scheme owning temp-value feedback

This paper presents a novel efficient chaotic image encryption scheme, in which the temp-value feedback mechanism is introduced to the permutation and diffusion procedures. Firstly, a simple trick is played to map the plain-image pixels to the initial condition of the Logistic map. Then, a pseudorandom number sequence (PRNS) is obtained from iterating the map. The permutation procedure is carried out by a permutation sequence which is generated by comparing the PRNS and its sorted version. The diffusion procedure is composed of two reversely executed rounds. During each round, the current plain-image pixel and the last cipher-image pixel are used to produce the current cipher-image pixel with the help of the Logistic map and a pseudorandom number generated by the Chen system. To enhance the efficiency, only expanded XOR operation and modulo 256 addition are employed during diffusion. Experimental results show that the new scheme owns a large key space and can resist the differential attack. It is also efficient.

preprint2014arXiv

Cryptanalyzing an image encryption algorithm based on scrambling and Veginere cipher

Recently, an image encryption algorithm based on scrambling and Vegin`ere cipher has been proposed. However, it was soon cryptanalyzed by Zhang et al. using a combination of chosen-plaintext attack and differential attack. This paper briefly reviews the two attack methods proposed by Zhang et al. and outlines the mathematical interpretations of them. Based on their work, we present an improved chosen-plaintext attack to further reduce the number of chosen-plaintexts required, which is proved to be optimal. Moreover, it is found that an elaborately designed known-plaintex attack can efficiently compromise the image cipher under study. This finding is verified by both mathematical analysis and numerical simulations. The cryptanalyzing techniques described in this paper may provide some insights for designing secure and efficient multimedia ciphers.

preprint2014arXiv

Embedding Cryptographic Features in Compressive Sensing

Compressive sensing (CS) has been widely studied and applied in many fields. Recently, the way to perform secure compressive sensing (SCS) has become a topic of growing interest. The existing works on SCS usually take the sensing matrix as a key and the resultant security level is not evaluated in depth. They can only be considered as a preliminary exploration on SCS, but a concrete and operable encipher model is not given yet. In this paper, we are going to investigate SCS in a systematic way. The relationship between CS and symmetric-key cipher indicates some possible encryption models. To this end, we propose the two-level protection models (TLPM) for SCS which are developed from measurements taking and something else, respectively. It is believed that these models will provide a new point of view and stimulate further research in both CS and cryptography. Specifically, an efficient and secure encryption scheme for parallel compressive sensing (PCS) is designed by embedding a two-layer protection in PCS using chaos. The first layer is undertaken by random permutation on a two-dimensional signal, which is proved to be an acceptable permutation with overwhelming probability. The other layer is to sample the permuted signal column by column with the same chaotic measurement matrix, which satisfies the restricted isometry property of PCS with overwhelming probability. Both the random permutation and the measurement matrix are constructed under the control of a chaotic system. Simulation results show that unlike the general joint compression and encryption schemes in which encryption always leads to the same or a lower compression ratio, the proposed approach of embedding encryption in PCS actually improves the compression performance. Besides, the proposed approach possesses high transmission robustness against additive Gaussian white noise and cropping attack.

preprint2014arXiv

Joint Quantization and Diffusion for Compressed Sensing Measurements of Natural Images

Recent research advances have revealed the computational secrecy of the compressed sensing (CS) paradigm. Perfect secrecy can also be achieved by normalizing the CS measurement vector. However, these findings are established on real measurements while digital devices can only store measurements at a finite precision. Based on the distribution of measurements of natural images sensed by structurally random ensemble, a joint quantization and diffusion approach is proposed for these real-valued measurements. In this way, a nonlinear cryptographic diffusion is intrinsically imposed on the CS process and the overall security level is thus enhanced. Security analyses show that the proposed scheme is able to resist known-plaintext attack while the original CS scheme without quantization cannot. Experimental results demonstrate that the reconstruction quality of our scheme is comparable to that of the original one.

preprint2014arXiv

Robust Coding of Encrypted Images via Structural Matrix

The robust coding of natural images and the effective compression of encrypted images have been studied individually in recent years. However, little work has been done in the robust coding of encrypted images. The existing results in these two individual research areas cannot be combined directly for the robust coding of encrypted images. This is because the robust coding of natural images relies on the elimination of spatial correlations using sparse transforms such as discrete wavelet transform (DWT), which is ineffective to encrypted images due to the weak correlation between encrypted pixels. Moreover, the compression of encrypted images always generates code streams with different significance. If one or more such streams are lost, the quality of the reconstructed images may drop substantially or decoding error may exist, which violates the goal of robust coding of encrypted images. In this work, we intend to design a robust coder, based on compressive sensing with structurally random matrix, for encrypted images over packet transmission networks. The proposed coder can be applied in the scenario that Alice needs a semi-trusted channel provider Charlie to encode and transmit the encrypted image to Bob. In particular, Alice first encrypts an image using globally random permutation and then sends the encrypted image to Charlie who samples the encrypted image using a structural matrix. Through an imperfect channel with packet loss, Bob receives the compressive measurements and reconstructs the original image by joint decryption and decoding. Experimental results show that the proposed coder can be considered as an efficient multiple description coder with a number of descriptions against packet loss.

Leo Yu Zhang

What is connected

Connect this record

See the researcher in context

Building this map preview

20 published item(s)

Image-to-Video Diffusion: From Foundations to Open Frontiers

Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Attention Distraction: Watermark Removal Through Continual Learning with Selective Forgetting

BadHash: Invisible Backdoor Attacks against Deep Hashing with Clean Label

Evaluating Membership Inference Through Adversarial Robustness

Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer

Self-Supervised Adversarial Example Detection by Disentangled Representation

Towards Privacy-Preserving Neural Architecture Search

From Chaos to Pseudo-Randomness: A Case Study on the 2D Coupled Map Lattice

Breaking a chaotic image encryption algorithm based on modulo addition and XOR operation

Cryptanalyzing a class of image encryption schemes based on Chinese Remainder Theorem

Bi-level Protected Compressive Sampling

Chosen-plaintext attack of an image encryption scheme based on modified permutation-diffusion structure

On the security of a class of diffusion mechanisms for image encryption

A chaotic image encryption scheme owning temp-value feedback

Cryptanalyzing an image encryption algorithm based on scrambling and Veginere cipher

Embedding Cryptographic Features in Compressive Sensing

Joint Quantization and Diffusion for Compressed Sensing Measurements of Natural Images

Robust Coding of Encrypted Images via Structural Matrix