Source author record

An-an Liu

An-an Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Cryptography and Security, physics.chem-ph, physics.optics

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators


Preprint, 2026, arXiv

What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

The rise of text-to-image (T2I) models has increasingly raised concerns regarding the generation of risky content, such as sexual, violent, and copyright-protected images, highlighting the need for effective safeguards within the models themselves. Although existing methods have been proposed to eliminate risky concepts from T2I models, they are primarily developed for earlier U-Net architectures, leaving the state-of-the-art Diffusion-Transformer-based T2I models inadequately protected. This gap stems from a fundamental architectural shift: Diffusion Transformers (DiTs) entangle semantic injection and visual synthesis via joint attention, which makes it difficult to isolate and erase risky content within the generation. To bridge this gap, we investigate how semantic concepts are represented in DiTs and discover that attention heads exhibit concept-specific sensitivity. This property enables both the detection and suppression of risky content. Building on this discovery, we propose AHV-D&S, a training-free inference-time safeguard for image generation in DiTs. Specifically, AHV-D&S quantifies each textual token's sensitivity across all attention heads as an Attention Head Vector (AHV), which serves as a discriminative signature for detecting risky generation tendencies. In the inference stage, we propose a momentum-based strategy to dynamically track token-wise AHVs across denoising steps, and a sensitivity-guided adaptive suppression strategy that suppresses the attention weights of identified risky tokens based on head-specific risk scores. Extensive experiments demonstrate that AHV-D&S effectively suppresses sexual, copyrighted-style, and various harmful content while preserving visual quality, and further exhibits strong robustness against adversarial prompts and transferability across different DiT-based T2I models.
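The abstract names two inference-time mechanisms: momentum-based tracking of token-wise AHVs across denoising steps, and suppression of a risky token's attention weights scaled by head-specific risk scores. The pure-Python sketch below shows one way such mechanisms could fit together; it is not the authors' implementation. Per-head sensitivities are taken as given inputs, the momentum update is assumed to be an exponential moving average, detection is assumed to be a cosine-similarity threshold against a reference concept AHV, and the names `momentum_update`, `head_risk_scores`, `suppress`, `beta`, `alpha` and the 0.8 threshold are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def momentum_update(prev: Vector, current: Vector, beta: float = 0.9) -> Vector:
    """Assumed momentum rule: exponential moving average of a token's AHV across steps."""
    return [beta * p + (1.0 - beta) * c for p, c in zip(prev, current)]


def head_risk_scores(concept_ahv: Vector) -> Vector:
    """Hypothetical head-specific risk: sensitivity normalised by the most sensitive head."""
    peak = max(concept_ahv)
    return [s / peak for s in concept_ahv] if peak > 0 else [0.0] * len(concept_ahv)


def suppress(attn_row: Vector, token_idx: int, risk: float, alpha: float = 1.0) -> Vector:
    """Down-weight one token's attention weight in a single head, then renormalise the row."""
    scaled = list(attn_row)
    scaled[token_idx] *= max(0.0, 1.0 - alpha * risk)
    total = sum(scaled)
    return [w / total for w in scaled] if total > 0 else scaled


if __name__ == "__main__":
    # Toy, made-up numbers: 4 heads, one prompt token observed over 2 denoising steps.
    concept_ahv = [0.9, 0.1, 0.8, 0.2]  # hypothetical reference AHV of a risky concept
    steps = [[0.7, 0.2, 0.9, 0.1], [0.8, 0.1, 0.7, 0.3]]
    tracked = steps[0]
    for v in steps[1:]:
        tracked = momentum_update(tracked, v)
    if cosine(tracked, concept_ahv) > 0.8:  # hypothetical detection threshold
        risks = head_risk_scores(concept_ahv)
        # Attention row of one query in head 0 over three text tokens; token 0 is flagged.
        row = suppress([0.5, 0.3, 0.2], token_idx=0, risk=risks[0])
        print([round(w, 3) for w in row])
```

With these toy numbers the tracked vector is close to the reference concept vector, so the token is flagged; in head 0, which has the highest risk score, its weight drops to zero and the remaining weights are renormalised to 0.6 and 0.4.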

Preprint, 2022, arXiv

Intrinsic Bias Identification on Medical Image Datasets

Machine-learning-based medical image analysis depends heavily on datasets. Biases in a dataset can be learned by the model and degrade the generalizability of the resulting applications. There are studies on debiased models; however, it is difficult for scientists and practitioners to identify implicit biases in datasets, which leads to a lack of reliable unbiased test datasets for validating models. To tackle this issue, we first define the data intrinsic bias attribute and then propose a novel bias identification framework for medical image datasets. The framework contains two major components, KlotskiNet and Bias Discriminant Direction Analysis (bdda), where KlotskiNet builds a mapping under which the backgrounds distinguish positive and negative samples, and bdda provides a theoretical solution for determining bias attributes. Experimental results on three datasets show the effectiveness of the bias attributes discovered by the framework.

Preprint, 2022, arXiv

Temporal Action Localization with Multi-temporal Scales

Temporal action localization, which aims to localize and classify actions in untrimmed videos, plays an important role in video analysis. Previous methods often predict actions on a feature space of a single temporal scale. However, the temporal features of a low-level scale lack sufficient semantics for action classification, while a high-level scale cannot provide rich details of the action boundaries. To address this issue, we propose to predict actions on a feature space of multiple temporal scales. Specifically, we use refined feature pyramids of different scales to pass semantics from high-level scales to low-level scales. In addition, to establish the long temporal scale of the entire video, we use a spatial-temporal transformer encoder to capture the long-range dependencies of video frames. The refined features with long-range dependencies are then fed into a classifier for coarse action prediction. Finally, to further improve prediction accuracy, we propose a frame-level self-attention module to refine the classification and boundaries of each action instance. Extensive experiments show that the proposed method can outperform state-of-the-art approaches on the THUMOS14 dataset and achieves comparable performance on the ActivityNet1.3 dataset. Compared with A2Net (TIP20, Avg{0.3:0.7}), Sub-Action (CSVT2022, Avg{0.1:0.5}), and AFSD (CVPR21, Avg{0.3:0.7}) on the THUMOS14 dataset, the proposed method achieves improvements of 12.6%, 17.4%, and 2.2%, respectively.
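The abstract does not specify how semantics are passed from high-level to low-level scales. The sketch below assumes a feature-pyramid-style top-down pass (nearest-neighbour upsampling in time followed by element-wise addition), with each pyramid level half as long in time as the level below it. The names `upsample_nearest` and `top_down_refine` are hypothetical, and the transformer encoder and frame-level self-attention stages are omitted.

```python
from typing import List

Feature = List[List[float]]  # indexed as [time step][channel]


def upsample_nearest(feat: Feature, factor: int = 2) -> Feature:
    """Repeat each time step `factor` times (nearest-neighbour upsampling along time)."""
    return [list(step) for step in feat for _ in range(factor)]


def top_down_refine(pyramid: List[Feature]) -> List[Feature]:
    """Assumed FPN-style top-down pass.

    pyramid[0] is the finest (longest) scale and pyramid[-1] the coarsest; each level is
    assumed to be exactly half as long in time as the one below it.
    """
    refined: List[Feature] = [[] for _ in pyramid]
    refined[-1] = pyramid[-1]  # the coarsest level keeps its own features
    for level in range(len(pyramid) - 2, -1, -1):
        up = upsample_nearest(refined[level + 1])
        # Add the upsampled coarser (more semantic) features to the finer features.
        refined[level] = [
            [a + b for a, b in zip(fine, coarse)]
            for fine, coarse in zip(pyramid[level], up)
        ]
    return refined


if __name__ == "__main__":
    # Three scales with one channel: 4, 2 and 1 time steps (toy values).
    pyramid = [[[1.0], [1.0], [1.0], [1.0]], [[2.0], [3.0]], [[10.0]]]
    for level, feat in enumerate(top_down_refine(pyramid)):
        print(level, feat)
```

Running the toy example propagates the coarsest value 10.0 down through both finer levels, so every fine-scale step ends up carrying information from the coarser scales while keeping its own temporal resolution.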

Preprint, 2007, arXiv

Treatment of Linear and Nonlinear Dielectric Property of Molecular Monolayer and Submonolayer with Microscopic Dipole Lattice Model: I. Second Harmonic Generation and Sum-Frequency Generation

In the currently accepted models of nonlinear optics, the nonlinear radiation is treated as the result of an infinitesimally thin polarization sheet layer, and a three-layer model is generally employed. The direct consequence of this approach is that an a priori dielectric constant, which still lacks a clear definition, has to be assigned to this polarization layer. Because Second Harmonic Generation (SHG) and Sum-Frequency Generation vibrational Spectroscopy (SFG-VS) have proven to be sensitive probes of interfaces with submonolayer coverage, a treatment based on the more realistic discrete induced-dipole model needs to be developed. Here we show that, following the molecular optics theory approach, the SHG and SFG-VS radiation from a monolayer or submonolayer at an interface can be rigorously treated as radiation from an induced dipole lattice at the interface. In this approach, the polarization sheet is no longer necessary, so the ambiguity of the unaccounted-for dielectric constant of the polarization layer no longer arises. Moreover, the anisotropic two-dimensional microscopic local field factors can be expressed explicitly in terms of the linear polarizability tensors of the interfacial molecules. Based on the planewise dipole sum rule in the molecular monolayer, crucial experimental tests of this microscopic treatment with SHG and SFG-VS are discussed. Many puzzles in the surface SHG and SFG spectroscopy literature can also be understood or resolved in this framework. This new treatment may provide a solid basis for quantitative analysis in surface SHG and SFG studies.
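As a rough illustration of the induced-dipole-lattice picture, the sketch below sums the static dipole-field tensor over a square lattice of identical point dipoles lying in one plane (Gaussian units, lattice constant a = 1) and derives a simple Lorentz-type local field factor L = 1/(1 - alpha * T_ii). The square lattice, the static (non-retarded) fields, the scalar polarizability and the function names `square_lattice_dipole_sums` and `local_field_factor` are all assumptions; the paper itself works with anisotropic polarizability tensors and may formulate its planewise sum rule differently. One property the code does exhibit: each term (3 r̂r̂ - I)/r³ is traceless, so T_xx + T_yy + T_zz vanishes and the in-plane field enhancement is tied to the out-of-plane depolarization.

```python
import math
from typing import Tuple


def square_lattice_dipole_sums(n_max: int, a: float = 1.0) -> Tuple[float, float, float]:
    """Diagonal components (T_xx, T_yy, T_zz) of sum_{r != 0} (3 r_hat r_hat - I) / |r|^3
    over a square lattice in the z = 0 plane (Gaussian units).

    The sum is truncated to |i|, |j| <= n_max, so it converges only slowly with n_max.
    """
    txx = tyy = tzz = 0.0
    for i in range(-n_max, n_max + 1):
        for j in range(-n_max, n_max + 1):
            if i == 0 and j == 0:
                continue  # a dipole does not act on its own site
            x, y = i * a, j * a
            r2 = x * x + y * y
            r3 = r2 * math.sqrt(r2)
            txx += (3.0 * x * x / r2 - 1.0) / r3
            tyy += (3.0 * y * y / r2 - 1.0) / r3
            tzz += -1.0 / r3  # z = 0 for every site, so the 3 z^2 / r^2 term vanishes
    return txx, tyy, tzz


def local_field_factor(alpha: float, t_ii: float) -> float:
    """L = 1 / (1 - alpha * T_ii), from E_loc = E_0 + T p with p = alpha * E_loc.

    Assumes every lattice site has the same scalar polarizability and alpha * T_ii < 1.
    """
    return 1.0 / (1.0 - alpha * t_ii)


if __name__ == "__main__":
    txx, tyy, tzz = square_lattice_dipole_sums(n_max=200)
    print(f"T_xx={txx:.4f}  T_yy={tyy:.4f}  T_zz={tzz:.4f}  trace={txx + tyy + tzz:.2e}")
    alpha = 0.05  # hypothetical polarizability in units of a^3
    print("in-plane L:", local_field_factor(alpha, txx))
    print("normal L:  ", local_field_factor(alpha, tzz))
```

For a small positive polarizability the in-plane factor comes out above 1 and the normal factor below 1, and T_zz approaches the known square-lattice sum of about -9.03 as `n_max` grows.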