Researcher profile

Xiaoyu Wu

Xiaoyu Wu contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
6 topics
4 close collaborators

Published work

4 published items

Preprint, 2026, arXiv

Taming Outlier Tokens in Diffusion Transformers

We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in generative models remains underexplored. We show that this phenomenon appears in both the encoder and denoiser of modern Representation Autoencoder (RAE)-DiT pipelines: pretrained ViT encoders can produce outlier representations, and DiTs themselves can develop internal outlier tokens, especially in intermediate layers. Moreover, simply masking high-norm tokens does not improve performance, indicating that the problem is not only caused by a few extreme values, but is more closely related to corrupted local patch semantics. To address this issue, we introduce Dual-Stage Registers (DSR), a register-based intervention for both components: trained registers when available, recursive test-time registers otherwise, and diffusion registers for the denoiser. Across ImageNet and large-scale text-to-image generation, these interventions consistently reduce outlier artifacts and improve generation quality. Our results highlight outlier-token control as an important ingredient in building stronger DiTs.
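The abstract characterizes outlier tokens as a small number of high-norm tokens. The following is a minimal diagnostic sketch of how such tokens might be flagged, not the Dual-Stage Registers method itself. The function names, the median-based cutoff, and the `ratio` parameter are all assumptions made for illustration; the abstract does not specify a detection rule.

```python
import math
from statistics import median


def token_norms(tokens):
    """Return the L2 norm of each token vector."""
    return [math.sqrt(sum(x * x for x in t)) for t in tokens]


def high_norm_indices(tokens, ratio=5.0):
    """Flag tokens whose norm exceeds `ratio` times the median norm.

    The median-based cutoff and the default ratio are hypothetical
    choices for this sketch, not values taken from the paper.
    """
    norms = token_norms(tokens)
    cutoff = ratio * median(norms)
    return [i for i, n in enumerate(norms) if n > cutoff]


# Toy example: four 2-dimensional tokens, one with a much larger norm.
tokens = [[1.0, 0.0], [0.0, 1.0], [30.0, 40.0], [0.6, 0.8]]
outliers = high_norm_indices(tokens)
print(outliers)
```

Note that the abstract reports that simply masking high-norm tokens does not improve performance, so a check like this is only useful for locating candidates, not as a fix.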

Preprint, 2022, arXiv

Approximation Algorithms for Interdiction Problem with Packing Constraints

We study a bilevel optimization problem which is a zero-sum Stackelberg game. In this problem, there are two players, a leader and a follower, who pick items from a common set. Both the leader and the follower have their own (multi-dimensional) budgets, respectively. Each item is associated with a profit, which is the same to the leader and the follower, and will consume the leader's (follower's) budget if it is selected by the leader (follower). The leader and the follower will select items in a sequential way: First, the leader selects items within the leader's budget. Then the follower selects items from the remaining items within the follower's budget. The goal of the leader is to minimize the maximum profit that the follower can obtain. Let $s_A$ and $s_B$ be the dimension of the leader's and follower's budget, respectively. A special case of our problem is the bilevel knapsack problem studied by Caprara et al. [SIAM Journal on Optimization, 2014], where $s_A=s_B=1$. We consider the general problem and obtain an $(s_B+ε)$-approximation algorithm when $s_A$ and $s_B$ are both constant. In particular, if $s_B=1$, our algorithm implies a PTAS for the bilevel knapsack problem, which is the first O(1)-approximation algorithm. We also complement our result by showing that there does not exist any $(4/3-ε)$-approximation algorithm even if $s_A=1$ and $s_B=2$. We also consider a variant of our problem with resource augmentation when $s_A$ and $s_B$ are both part of the input. We obtain an O(1)-approximation algorithm with O(1)-resource augmentation, that is, we give an algorithm that returns a solution which exceeds the given leader's budget by O(1) times, and the objective value achieved by the solution is O(1) times the optimal objective value that respects the leader's budget.
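The leader-follower game described in this abstract can be made concrete with an exhaustive search on a tiny instance. This is only a sketch of the problem definition under the assumption of nonnegative costs and budgets; it runs in exponential time and is not the approximation algorithm from the paper. All function names and the example numbers are hypothetical.

```python
from itertools import combinations


def subsets(indices):
    """Yield every subset of the given indices as a tuple."""
    indices = list(indices)
    for r in range(len(indices) + 1):
        yield from combinations(indices, r)


def fits(items, costs, budget):
    """True if the chosen items respect every dimension of the budget."""
    return all(
        sum(costs[i][d] for i in items) <= budget[d]
        for d in range(len(budget))
    )


def follower_best(remaining, profit, cost_b, budget_b):
    """Maximum profit the follower can collect from the remaining items."""
    return max(
        sum(profit[i] for i in s)
        for s in subsets(remaining)
        if fits(s, cost_b, budget_b)
    )


def leader_optimum(profit, cost_a, budget_a, cost_b, budget_b):
    """Leader's choice that minimizes the follower's best response."""
    n = len(profit)
    best_value, best_choice = None, None
    for choice in subsets(range(n)):
        if not fits(choice, cost_a, budget_a):
            continue
        remaining = [i for i in range(n) if i not in choice]
        value = follower_best(remaining, profit, cost_b, budget_b)
        if best_value is None or value < best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice


# Toy instance with one budget dimension per player (s_A = s_B = 1).
profit = [6, 5, 4]
cost_a, budget_a = [[2], [1], [1]], [2]
cost_b, budget_b = [[3], [2], [2]], [4]
value, choice = leader_optimum(profit, cost_a, budget_a, cost_b, budget_b)
print(value, choice)  # prints: 6 (1,)
```

In this instance the follower would take items 1 and 2 for a profit of 9 if left alone, so the leader does better by removing one of them than by removing the single most profitable item.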

Preprint, 2022, arXiv

Locality-aware Attention Network with Discriminative Dynamics Learning for Weakly Supervised Anomaly Detection

Video anomaly detection has recently been formulated as a multiple instance learning task under weak supervision, in which each video is treated as a bag of snippets and the goal is to determine whether the bag contains anomalies. Previous efforts mainly focus on discriminating individual snippets without modeling the temporal dynamics, that is, the variation between adjacent snippets. Therefore, we propose a Discriminative Dynamics Learning (DDL) method with two objective functions, i.e., dynamics ranking loss and dynamics alignment loss. The former aims to enlarge the score dynamics gap between positive and negative bags while the latter performs temporal alignment of the feature dynamics and score dynamics within the bag. Moreover, a Locality-aware Attention Network (LA-Net) is constructed to capture global correlations and re-calibrate the location preference across snippets, followed by a multilayer perceptron with causal convolution to obtain anomaly scores. Experimental results show that our method achieves significant improvements on two challenging benchmarks, i.e., UCF-Crime and XD-Violence.

Preprint, 2020, arXiv

Unexpected Giant Microwave Conductivity in a Nominally Silent BiFeO3 Domain Wall

Nanoelectronic devices based on ferroelectric domain walls (DWs), such as memories, transistors, and rectifiers, have been demonstrated in recent years. Practical high-speed electronics, on the other hand, usually demand operation frequencies in the giga-Hertz (GHz) regime, where the effect of dipolar oscillation is important. In this work, an unexpected giant GHz conductivity on the order of $10^3$ S/m is observed in certain BiFeO3 DWs, which is about 100,000 times greater than the carrier-induced dc conductivity of the same walls. Surprisingly, the nominal configuration of the DWs precludes the ac conduction under an excitation electric field perpendicular to the surface. Theoretical analysis shows that the inclined DWs are stressed asymmetrically near the film surface, whereas the vertical walls in a control sample are not. The resultant imbalanced polarization profile can then couple to the out-of-plane microwave fields and induce power dissipation, which is confirmed by the phase-field modeling. Since the contributions from mobile-carrier conduction and bound-charge oscillation to the ac conductivity are equivalent in a microwave circuit, the research on local structural dynamics may open a new avenue to implement DW nano-devices for RF applications.