Attention-based PCA

We study attention mechanisms through the lens of a canonical unsupervised problem: principal component analysis (PCA). We show that, when trained on Gaussian data, both softmax and linear attention layers learn parameters that align with the principal eigenvectors of the covariance matrix, thereby establishing a direct and explicit connection with PCA. Our analysis covers both the finite-prompt and infinite-prompt regimes. In the infinite-prompt limit, we prove convergence to globally optimal solutions aligned with the leading spectral direction, while in the finite-prompt setting we show that the same behavior emerges up to sampling effects. We further extend the analysis to an in-context setting with spiked Wishart covariances, where attention successfully recovers the underlying signal direction. These results demonstrate that attention inherently performs PCA-like computations under unsupervised objectives, providing a theoretical foundation for its representation-learning capabilities.
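The target that the abstract says attention aligns with is the leading eigenvector of the data covariance. The numpy sketch below computes only that PCA reference. It does not reproduce the paper's softmax or linear attention models or their training. It assumes a rank-one spiked covariance Sigma = I + beta * v v^T. The dimension d, spike strength beta, and helper names are hypothetical choices for illustration. With the population covariance (the infinite-prompt limit), the leading eigenvector is exactly ±v. With a finite prompt of n Gaussian tokens, the empirical covariance is a scaled Wishart matrix, and its leading eigenvector matches v only up to sampling error.

```python
import numpy as np


def spiked_covariance(d, beta, rng):
    """Hypothetical spiked model: Sigma = I_d + beta * v v^T with unit-norm v."""
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    sigma = np.eye(d) + beta * np.outer(v, v)
    return sigma, v


def leading_eigenvector(cov):
    """Top principal direction of a symmetric matrix."""
    # eigh returns eigenvalues in ascending order, so the last column is the top one
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]


def prompt_alignment(n, d=20, beta=5.0, seed=0):
    """|<v_hat, v>| when v_hat is estimated from a finite prompt of n tokens."""
    rng = np.random.default_rng(seed)
    sigma, v = spiked_covariance(d, beta, rng)
    # n i.i.d. Gaussian tokens x_i ~ N(0, Sigma) play the role of the prompt
    x = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    # X^T X is Wishart(Sigma, n); dividing by n gives the empirical covariance
    emp_cov = x.T @ x / n
    v_hat = leading_eigenvector(emp_cov)
    # An eigenvector's sign is arbitrary, so compare absolute cosine similarity
    return abs(v_hat @ v)


if __name__ == "__main__":
    for n in (10, 100, 2000):
        print(n, round(prompt_alignment(n), 3))
```

Running the loop should show alignment approaching 1 as the prompt length n grows. This mirrors the finite-prompt "up to sampling effects" behavior described above. It shows the PCA reference, not attention itself.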

Authors: Rodrigo Maulen-Soto, Claire Boyer
Topics: Machine Learning, math.OC

preprint / 2026