Source author record

Yixu Wang

Yixu Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Cryptography and Security, gr-qc, hep-th, Machine Learning, quant-ph, Computation and Language, cond-mat.str-el, math-ph, math.MP

Catalog footprint

What is connected

8 works

11 topics

4 close collaborators


Preprint, 2026, arXiv

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has driven major gains in reasoning, perception, and generation across language and vision, yet whether these advances translate into comparable improvements in safety remains unclear, partly due to fragmented evaluations that focus on isolated modalities or threat models. In this report, we present an integrated safety evaluation of six frontier models--GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5--assessing each across language, vision-language, and image generation using a unified protocol that combines benchmark, adversarial, multilingual, and compliance evaluations. By aggregating results into safety leaderboards and model profiles, we reveal a highly uneven safety landscape: while GPT-5.2 demonstrates consistently strong and balanced performance, other models exhibit clear trade-offs across benchmark safety, adversarial robustness, multilingual generalization, and regulatory compliance. Despite strong results under standard benchmarks, all models remain highly vulnerable under adversarial testing, with worst-case safety rates dropping below 6%. Text-to-image models show slightly stronger alignment in regulated visual risk categories, yet remain fragile when faced with adversarial or semantically ambiguous prompts. Overall, these findings highlight that safety in frontier models is inherently multidimensional--shaped by modality, language, and evaluation design--underscoring the need for standardized, holistic safety assessments to better reflect real-world risk and guide responsible deployment.

Preprint, 2026, arXiv

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

While vision and multimodal foundation models underpin critical tasks from perception to complex reasoning, they remain highly vulnerable to adversarial attacks. However, traditional adversarial attacks are typically limited to single, predefined objectives, tightly coupling each attack to a specific model or task, which restricts their scalability and flexibility in real-world scenarios. In this work, we present DarkLLM, a novel attack framework that trains an LLM to translate natural-language attack instructions into latent attack vectors, which are then decoded into visual adversarial perturbations. By leveraging natural-language instruction tuning, DarkLLM not only unifies targeted, untargeted, segmentation, and multi-model attacks within a single framework, but also achieves flexible and controllable adversarial generation, enabling each instruction to produce a perturbation that induces desired behaviors across heterogeneous models. Through extensive experiments across 4 tasks, 13 datasets, and 15 models, we demonstrate that DarkLLM with only 1B parameters can follow attacker instructions and generate highly effective attacks against CLIP, SAM, and frontier LLMs, revealing a systemic vulnerability in modern foundation models.

Preprint, 2026, arXiv

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT architects a paradigm shift in automated red-teaming by introducing an adversarial kernel that enables modular separation across five critical dimensions: model integration, dataset management, attack strategies, judging methods, and evaluation metrics. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.

Preprint, 2026, arXiv

TAME: Test-Time Adversarial Prompt Tuning via Mixture-of-Experts for Vision-Language Models

Large-scale pre-trained Vision-Language models (VLMs), such as CLIP, exhibit strong zero-shot generalization, yet remain highly vulnerable to imperceptible adversarial perturbations, raising serious safety concerns for open-world deployment. To enhance robustness without requiring downstream task-specific retraining, we propose TAME, a novel test-time defense. Building upon our prior Test-Time Adversarial Prompt Tuning (TAPT), TAME introduces an architectural reformulation by replacing TAPT's single adaptive prompt with an input-conditioned Mixture-of-Experts (MoE) framework, enabling more expressive and adaptive defense. Specifically, TAME maintains a bank of learnable expert prompts and employs an input-dependent routing mechanism to aggregate a customized prompt mixture for each unlabeled test sample at inference time. This test-time defense mechanism is driven by three unsupervised objectives: (1) multi-view prediction entropy minimization, (2) layer-wise alignment of visual token statistics to precomputed clean and adversarial reference distributions, and (3) MoE regularization for balanced expert utilization and prompt diversity. We evaluated TAME on 11 benchmark datasets, including ImageNet and 10 additional zero-shot datasets. The results show that TAME improves the zero-shot adversarial robustness of the original CLIP by at least 49.1% under AutoAttack while largely preserving generalization on clean samples. TAME also consistently outperforms existing adversarial prompt tuning methods across multiple prompt designs, yielding an average robustness gain of at least 30.2%.
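As a rough illustration of two ingredients named in the abstract above, the sketch below shows input-dependent routing over a bank of expert prompts and a multi-view prediction entropy. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the linear gate, the softmax mixing, and the names `route_prompt`, `gate_weights`, and `multiview_entropy` are all hypothetical, and the abstract does not specify how TAME computes its routing or its objectives.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_prompt(feature, gate_weights, expert_prompts):
    """Mix expert prompts with weights that depend on the input feature.

    Assumption: a linear gate (one weight row per expert) followed by a
    softmax. The paper's actual routing mechanism may differ.
    """
    scores = [sum(f * w for f, w in zip(feature, row)) for row in gate_weights]
    weights = softmax(scores)
    dim = len(expert_prompts[0])
    mixed = [
        sum(w * prompt[i] for w, prompt in zip(weights, expert_prompts))
        for i in range(dim)
    ]
    return weights, mixed


def multiview_entropy(view_probs):
    """Shannon entropy of the class distribution averaged over views.

    Assumption: each entry of view_probs is the predicted class
    distribution for one augmented view of the same test sample.
    """
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    mean = [sum(v[c] for v in view_probs) / n_views for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0)


# Hypothetical toy inputs: 2-d feature, 2 experts, 2-d prompt vectors.
weights, mixed = route_prompt(
    feature=[1.0, 2.0],
    gate_weights=[[0.5, -0.5], [0.1, 0.3]],
    expert_prompts=[[1.0, 3.0], [3.0, 5.0]],
)
```

In this sketch a lower `multiview_entropy` means the views agree on a confident prediction, which is the quantity a test-time method would try to reduce by tuning the prompt parameters.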

Preprint, 2021, arXiv

Hyper-Invariant MERA: Approximate Holographic Error Correction Codes with Power-Law Correlations

We consider a class of holographic tensor networks that are efficiently contractible variational ansatze, manifestly (approximate) quantum error correction codes, and can support power-law correlation functions. In the case when the network consists of a single type of tensor that also acts as an erasure correction code, we show that it cannot be both locally contractible and sustain power-law correlation functions. Motivated by this no-go theorem, and the desirability of local contractibility for an efficient variational ansatz, we provide guidelines for constructing networks consisting of multiple types of tensors that can support power-law correlation. We also provide an explicit construction of one such network, which approximates the holographic HaPPY pentagon code in the limit where variational parameters are taken to be small.

Preprint, 2020, arXiv

Approximate recovery and relative entropy I. general von Neumann subalgebras

We prove the existence of a universal recovery channel that approximately recovers states on a von Neumann subalgebra when the change in relative entropy, with respect to a fixed reference state, is small. Our result is a generalization of previous results that applied to type-I von Neumann algebras by Junge et al. [arXiv:1509.07127]. We broadly follow their proof strategy but consider here arbitrary von Neumann algebras, where qualitatively new issues arise. Our results hinge on the construction of certain analytic vectors and computations/estimations of their Araki-Masuda $L_p$ norms. We comment on applications to the quantum null energy condition.

Preprint, 2016, arXiv

Black hole solutions in functional extensions of Born-Infeld gravity

We consider electrovacuum black hole spacetimes in classical extensions of Eddington-inspired Born-Infeld gravity. By rewriting the Born-Infeld action as the square root of the determinant of a matrix $\hat\Omega$, we consider the family of models $f(|\hat\Omega|)$, and study black hole solutions for a power-law family of models labelled by a simple parameter. We show how the innermost structure of the corresponding black holes is modified as compared to their General Relativity counterparts, discussing in which cases a wormhole structure replaces the point-like singularity. We go forward to argue that in such cases a geodesically complete and thus non-singular spacetime is present, despite the existence of curvature divergences at the wormhole throat.

Preprint, 2016, arXiv

Lee-Wick Black Holes

We derive and study an approximate static vacuum solution generated by a point-like source in a higher derivative gravitational theory with a pair of complex conjugate ghosts. The gravitational theory is local and characterized by a high derivative operator compatible with Lee-Wick unitarity. In particular, the tree-level two-point function only shows a pair of complex conjugate poles besides the massless spin two graviton. We show that singularity-free black holes exist when the mass of the source $M$ exceeds a critical value $M_{\rm crit}$. For $M > M_{\rm crit}$ the spacetime structure is characterized by an outer event horizon and an inner Cauchy horizon, while for $M = M_{\rm crit}$ we have an extremal black hole with vanishing Hawking temperature. The evaporation process leads to a remnant that approaches the zero-temperature extremal black hole state in an infinite amount of time.
