Source author record

Haodong Li

Haodong Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Multimedia Artificial Intelligence Cryptography and Security Computation and Language Computer Vision eess.SY Machine Learning Systems and Control

Catalog footprint

What is connected

7works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing nuanced preferences into opaque parametric proxies and exposing vulnerabilities to reward hacking. While recent Rubrics-as-Reward (RaR) methods attempt to recover this structure through explicit criteria, generating rubrics that are simultaneously reliable, scalable, and data-efficient remains an open problem. We introduce Auto-Rubric as Reward (ARR), a framework that reframes reward modeling from implicit weight optimization to explicit, criteria-based decomposition. Before any pairwise comparison, ARR externalizes a VLM's internalized preference knowledge as prompt-specific rubrics, translating holistic intent into independently verifiable quality dimensions. This conversion of implicit preference structure into inspectable, interpretable constraints substantially suppresses evaluation biases including positional bias, enabling both zero-shot deployment and few-shot conditioning on minimal supervision. To extend these gains into generative training, we propose Rubric Policy Optimization (RPO), which distills ARR's structured multi-dimensional evaluation into a robust binary reward, replacing opaque scalar regression with rubric-conditioned preference decisions that stabilize policy gradients. On text-to-image generation and image editing benchmarks, ARR-RPO outperforms pairwise reward models and VLM judges, demonstrating that explicitly externalizing implicit preference knowledge into structured rubrics achieves more reliable, data-efficient multimodal alignment, revealing that the bottleneck is the absence of a factorized interface, not a deficit of knowledge.

preprint2026arXiv

Repurposing and Evaluating the (In)Feasibility of Dataset Poisoning enabled Watermarking for Contrastive Learning

Contrastive learning (CL) reduces annotation cost via auto-derived supervisory signals. Since large-scale in-house CL datasets are infeasible, reliance on third-party or internet data is common. Recent studies show CL models are vulnerable to data-poisoning backdoor attacks, but their generalization and robustness are underexplored. We systematically evaluate existing data-poisoning backdoor attacks on CL, revealing limitations: poor dataset adaptability, low success rates, limited portability, and restrictive assumptions (e.g., downstream task knowledge). Interestingly, trigger samples exhibit distinguishable statistical divergence from clean samples, which inspires repurposing it as a watermark for dataset IP protection. Direct repurposing is challenging due to low success rates; we overcome this by statistical verification using a unified density metric. We further propose a multi-level watermarking scheme adapting to feature-level, soft-label, or hard-label outputs in CL. Experiments show some backdoor attacks can be repurposed as effective watermarks with trade-offs among fidelity, verifiability, and robustness. This work demonstrates weak backdoor effects become reliable signals for dataset IP protection in challenging CL settings.

preprint2024arXiv

Digger: Detecting Copyright Content Mis-usage in Large Language Model Training

Pre-training, which utilizes extensive and varied datasets, is a critical factor in the success of Large Language Models (LLMs) across numerous applications. However, the detailed makeup of these datasets is often not disclosed, leading to concerns about data security and potential misuse. This is particularly relevant when copyrighted material, still under legal protection, is used inappropriately, either intentionally or unintentionally, infringing on the rights of the authors. In this paper, we introduce a detailed framework designed to detect and assess the presence of content from potentially copyrighted books within the training datasets of LLMs. This framework also provides a confidence estimation for the likelihood of each content sample's inclusion. To validate our approach, we conduct a series of simulated experiments, the results of which affirm the framework's effectiveness in identifying and addressing instances of content misuse in LLM training processes. Furthermore, we investigate the presence of recognizable quotes from famous literary works within these datasets. The outcomes of our study have significant implications for ensuring the ethical use of copyrighted materials in the development of LLMs, highlighting the need for more transparent and responsible data management practices in this field.

preprint2022arXiv

Robust Coordinated Longitudinal Control of MAV Based on Energy State

Fixed-wing Miniature Air Vehicle (MAV) is not only coupled with longitudinal motion, but also more susceptible to wind disturbance due to its lighter weight, which brings more challenges to its altitude and airspeed controller design. Therefore, in this paper, an improved longitudinal control strategy based on energy state, is proposed to address the above-mentioned issues. The control strategy utilizes the Linear Extended State Observer (LESO) to observe the energy states and the disturbance of the MAV, and then designs a Multiple-Input Multiple-Output (MIMO) controller based on a more coordinated Total Energy Control (TEC) strategy to control the airspeed and altitude of the MAV. The performance of this control strategy has been successfully verified in a Model-in-the-Loop (MIL) simulation with Simulink, and a comparative test with the classical TEC algorithm is carried out.

preprint2020arXiv

Identification of Deep Network Generated Images Using Disparities in Color Components

With the powerful deep network architectures, such as generative adversarial networks, one can easily generate photorealistic images. Although the generated images are not dedicated for fooling human or deceiving biometric authentication systems, research communities and public media have shown great concerns on the security issues caused by these images. This paper addresses the problem of identifying deep network generated (DNG) images. Taking the differences between camera imaging and DNG image generation into considerations, we analyze the disparities between DNG images and real images in different color components. We observe that the DNG images are more distinguishable from real ones in the chrominance components, especially in the residual domain. Based on these observations, we propose a feature set to capture color image statistics for identifying DNG images. Additionally, we evaluate several detection situations, including the training-testing data are matched or mismatched in image sources or generative models and detection with only real images. Extensive experimental results show that the proposed method can accurately identify DNG images and outperforms existing methods when the training and testing data are mismatched. Moreover, when the GAN model is unknown, our methods also achieves good performance with one-class classification by using only real images for training.

preprint2017arXiv

Image Processing Operations Identification via Convolutional Neural Network

In recent years, image forensics has attracted more and more attention, and many forensic methods have been proposed for identifying image processing operations. Up to now, most existing methods are based on hand crafted features, and just one specific operation is considered in their methods. In many forensic scenarios, however, multiple classification for various image processing operations is more practical. Besides, it is difficult to obtain effective features by hand for some image processing operations. In this paper, therefore, we propose a new convolutional neural network (CNN) based method to adaptively learn discriminative features for identifying typical image processing operations. We carefully design the high pass filter bank to get the image residuals of the input image, the channel expansion layer to mix up the resulting residuals, the pooling layers, and the activation functions employed in our method. The extensive results show that the proposed method can outperform the currently best method based on hand crafted features and three related methods based on CNN for image steganalysis and/or forensics, achieving the state-of-the-art results. Furthermore, we provide more supplementary results to show the rationality and robustness of the proposed model.

preprint2015arXiv

Identification of Image Operations Based on Steganalytic Features

Image forensics have attracted wide attention during the past decade. Though many forensic methods have been proposed to identify image forgeries, most of them are targeted ones, since their proposed features are highly dependent on the image operation under investigation. The performance of the well-designed features for detecting the targeted operation usually degrades significantly for other operations. On the other hand, a wise attacker can perform anti-forensics to fool the existing forensic methods, making countering anti-forensics become an urgent need. In this paper, we try to find a universal feature to detect various image processing and anti-forensic operations. Based on our extensive experiments and analysis, we find that any image processing/anti-forensic operations would inevitably modify many image pixels. This would change some inherent statistics within original images, which is similar to the case of steganography. Therefore, we model image processing/anti-forensic operations as steganography problems, and propose a detection strategy by applying steganalytic features. With some advanced steganalytic features, we are able to detect various image operations and further identify their types. In our experiments, we have tested several steganalytic features on 11 different kinds of typical image processing operations and 4 kinds of anti-forensic operations. The experimental results have shown that the proposed strategy significantly outperforms the existing forensic methods in both effectiveness and universality.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint