SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models
The protection of Intellectual Property (IP) for Large Language Models (LLMs) has become a critical concern as model theft and unauthorized commercialization escalate. While adversarial fingerprinting offers a promising black-box solution for ownership verification, existing methods suffer from significant limitations: they are fragile against model modifications, sensitive to system prompt variations, and easily detectable due to high-perplexity input patterns. In this paper, we propose SRAF, which employs a multi-task adversarial optimization strategy that jointly optimizes fingerprints across homologous model variants and diverse chat templates, allowing the fingerprint to anchor onto invariant decision boundary features. Furthermore, we introduce a Perplexity Hiding technique that embeds adversarial perturbations within Markdown tables, effectively aligning the prompt's statistics with natural language to evade perplexity-based detection. Experiments on Llama-2 variants demonstrate SRAF's superior robustness and stealthiness compared to state-of-the-art baselines, offering a practical black-box solution for ownership verification.