Researcher profile

Yuan Xin

Yuan Xin contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published item(s)

Preprint | 2026 | arXiv

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts

As Large Language Models (LLMs) are increasingly integrated into academic peer review, their vulnerability to adversarial prompts -- adversarial instructions embedded in submissions to manipulate outcomes -- emerges as a critical threat to scholarly integrity. To counter this, we propose a novel adversarial framework where a Generator model, trained to create sophisticated attack prompts, is jointly optimized with a Defender model tasked with their detection. This system is trained using a loss function inspired by Information Retrieval Generative Adversarial Networks, which fosters a dynamic co-evolution between the two models, forcing the Defender to develop robust capabilities against continuously improving attack strategies. The resulting framework demonstrates significantly enhanced resilience to novel and evolving threats compared to static defenses, thereby establishing a critical foundation for securing the integrity of peer review.

Preprint | 2025 | arXiv

Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?

As large language models (LLMs) are increasingly deployed, ensuring their safe use is paramount. Jailbreaks, adversarial prompts that bypass model alignment to trigger harmful outputs, pose significant risks, with existing studies reporting high success rates against common LLMs. However, previous evaluations have focused solely on the models, neglecting the full deployment pipeline, which typically incorporates additional safety mechanisms like content moderation filters. To address this gap, we present the first systematic evaluation of jailbreak attacks targeting LLM safety alignment, assessing their success across the full inference pipeline, including both input and output filtering stages. Our findings yield two key insights: first, nearly all evaluated jailbreak techniques can be detected by at least one safety filter, suggesting that prior assessments may have overestimated the practical success of these attacks; second, while safety filters are effective in detection, there remains room to better balance recall and precision to further optimize protection and user experience. We highlight critical gaps and call for further refinement of detection accuracy and usability in LLM safety systems.

Preprint | 2022 | arXiv

Bootstrapping $N_f=4$ conformal QED$_3$

We present the results of a conformal bootstrap study of the presumed unitary IR fixed point of quantum electrodynamics in three dimensions (QED$_3$) coupled to $N_f=4$ two-component Dirac fermions. Specifically, we study the four-point correlators of the $SU(4)$ adjoint fermion bilinear $r$ and the monopole of lowest topological charge $\mathcal{M}_{1/2}$. Most notably, the scaling dimensions of the fermion bilinear $r$ and the monopole $\mathcal{M}_{1/2}$ are found to be constrained into a closed island with a combination of spectrum assumptions inspired by the $1/N_f$ perturbative results as well as a novel interval positivity constraint on the next-lowest-charge monopole $\mathcal{M}_1$. Bounds in this island on the $SU(4)$ and topological $U(1)_t$ conserved current central charges $c_J$, $c_J^t$, as well as on the stress tensor central charge $c_T$, are comfortably consistent with the perturbative results. Together with the scaling dimensions, this suggests that some of the estimates from the $1/N_f$ expansion -- even at $N_f=4$ -- provide a self-consistent solution to the bootstrap crossing relations, despite some of our assumptions not being strictly justified.

Preprint | 2020 | arXiv

Introduction to Lightcone Conformal Truncation: QFT Dynamics from CFT Data

We both review and augment the lightcone conformal truncation (LCT) method. LCT is a Hamiltonian truncation method for calculating dynamical quantities in QFT in infinite volume. This document is a self-contained, pedagogical introduction and "how-to" manual for LCT. We focus on 2D QFTs which have UV descriptions as free CFTs containing scalars, fermions, and gauge fields, providing a rich starting arena for LCT applications. Along our way, we develop several new techniques and innovations that greatly enhance the efficiency and applicability of LCT. These include the development of CFT radial quantization methods for computing Hamiltonian matrix elements and a new SUSY-inspired way of avoiding state-dependent counterterms and maintaining chiral symmetry. We walk readers through the construction of their own basic LCT code, sufficient for small truncation cutoffs. We also provide a more sophisticated and comprehensive set of Mathematica packages and demonstrations that can be used to study a variety of 2D models. We guide the reader through these packages with several examples and illustrate how to obtain QFT observables, such as spectral densities and the Zamolodchikov $C$-function. Specific models considered are finite $N_c$ QCD, scalar $\phi^4$ theory, and Yukawa theory.

Preprint | 2020 | arXiv

Supersymmetric SYK model and random matrix theory

In this paper, we investigate the effect of supersymmetry on the symmetry classification of random matrix theory ensembles. We mainly consider the random matrix behaviors in the $\mathcal{N}=1$ supersymmetric generalization of the Sachdev-Ye-Kitaev (SYK) model, a toy model for the two-dimensional quantum black hole with a supersymmetric constraint. Some analytical arguments and numerical results are given to show that the statistics of the supersymmetric SYK model could be interpreted as random matrix theory ensembles, with a different eight-fold classification from the original SYK model and some new features. The time-dependent evolution of the spectral form factor is also investigated, where predictions from random matrix theory govern the late-time behavior of the chaotic Hamiltonian with supersymmetry.