Paper detail

Bi-Orthogonal Factor Decomposition for Vision Transformers

Self-attention is the central computational primitive of Vision Transformers, yet we lack a principled understanding of what information attention mechanisms exchange between tokens. Attention maps describe where weight mass concentrates; they do not reveal whether queries and keys trade position, content, or both. We introduce Bi-orthogonal Factor Decomposition (BFD), a two-stage analytical framework: first, an ANOVA-based decomposition statistically disentangles token activations into orthogonal positional and content factors; second, SVD of the query-key interaction matrix QK^T exposes bi-orthogonal modes that reveal how these factors mediate communication. After validating proper isolation of position and content, we apply BFD to state-of-the-art vision models and uncover three phenomena.(i) Attention operates primarily through content. Content-content interactions dominate attention energy, followed by content-position coupling. DINOv2 allocates more energy to content-position than supervised models and distributes computation across a richer mode spectrum. (ii) Attention mechanisms exhibit specialization: heads differentiate into content-content, content-position, and position-position operators, while singular modes within heads show analogous specialization. (iii) DINOv2's superior holistic shape processing emerges from intermediate layers that simultaneously preserve positional structure while contextually enriching semantic content. Overall, BFD exposes how tokens interact through attention and which informational factors - positional or semantic - mediate their communication, yielding practical insights into vision transformer mechanisms.

preprint2026arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.