Researcher profile

Peter Cholak

Peter Cholak contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified | Verification L1 | Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

Overcoming a Theoretical Limitation of Self-Attention

Although transformers are remarkably effective for many tasks, there are some surprisingly easy-looking regular languages that they struggle with. Hahn shows that for languages where acceptance depends on a single input symbol, a transformer's classification decisions become less and less confident (that is, with cross-entropy approaching 1 bit per string) as input strings get longer and longer. We examine this limitation using two languages: PARITY, the language of bit strings with an odd number of 1s, and FIRST, the language of bit strings starting with a 1. We demonstrate three ways of overcoming the limitation suggested by Hahn's lemma. First, we settle an open question by constructing a transformer that recognizes PARITY with perfect accuracy, and similarly for FIRST. Second, we use layer normalization to bring the cross-entropy of both models arbitrarily close to zero. Third, when transformers need to focus on a single position, as for FIRST, we find that they can fail to generalize to longer strings; we offer a simple remedy to this problem that also improves length generalization in machine translation.
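The two languages named in the abstract, and the "1 bit per string" figure, can be made concrete with a minimal sketch. The function names below (`in_parity`, `in_first`, `cross_entropy_bits`) are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch only checks membership and computes a cross-entropy, and does not reproduce the transformer constructions.

```python
import math


def in_parity(s: str) -> bool:
    """PARITY: bit strings with an odd number of 1s."""
    return s.count("1") % 2 == 1


def in_first(s: str) -> bool:
    """FIRST: bit strings starting with a 1."""
    return s.startswith("1")


def cross_entropy_bits(p_correct: float) -> float:
    """Cross-entropy in bits for one string, given the probability
    the classifier assigns to the correct label."""
    return -math.log2(p_correct)


if __name__ == "__main__":
    for s in ["1011", "1001", "0110"]:
        print(s, in_parity(s), in_first(s))
    # A classifier that is maximally unsure (probability 0.5 on the
    # correct label) pays exactly 1 bit per string.
    print(cross_entropy_bits(0.5))
```

In both languages, changing a single symbol can flip membership: any one bit for PARITY, the first bit for FIRST. A classifier whose confidence decays toward 0.5 on long strings has cross-entropy approaching the 1 bit per string mentioned in the abstract.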

Preprint, 2020, arXiv

Realizing Computably Enumerable Degrees in Separating Classes

We investigate what collections of c.e.\ Turing degrees can be realized as the collection of elements of a separating $\Pi^0_1$ class of c.e.\ degree. We show that for every c.e.\ degree $\mathbf{c}$, the collection $\{\mathbf{c}, \mathbf{0}'\}$ can be thus realized. We also rule out several attempts at constructing separating classes realizing a unique c.e.\ degree. For example, we show that there is no \emph{super-maximal} pair: disjoint c.e.\ sets $A$ and $B$ whose separating class is infinite, but every separator of c.e.\ degree is a finite variant of either $A$ or $\overline{B}$.
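For readers unfamiliar with the term, the separating class of a disjoint pair can be written out as follows. This is the standard definition as usually stated, not quoted from the abstract, and the symbol $\mathcal{S}(A, B)$ is notation assumed here for illustration.

```latex
\[
  \mathcal{S}(A, B) \;=\; \{\, C \subseteq \mathbb{N} \;:\; A \subseteq C \ \text{and}\ C \cap B = \emptyset \,\}
\]
```

Each element $C$ of this class is a separator of the pair. Under this definition $A$ is the smallest separator and $\overline{B}$ is the largest, which is why the abstract singles out finite variants of those two sets.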