Source author record

Henryk Michalewski

Henryk Michalewski appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, Artificial Intelligence, Formal Languages and Automata Theory, Computation and Language, Logic in Computer Science, math.LO, Computer Vision, math.FA, math.GN, Neural and Evolutionary Computing

Catalog footprint

14 works

10 topics

4 close collaborators


preprint, 2022, arXiv

Hierarchical Transformers Are More Efficient Language Models

Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences, which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences. To verify this claim, we first study different ways to downsample and upsample activations in Transformers so as to make them hierarchical. We use the best performing upsampling and downsampling layers to create Hourglass, a hierarchical Transformer language model. Hourglass improves upon the Transformer baseline given the same amount of computation and can yield the same results as Transformers more efficiently. In particular, Hourglass sets a new state of the art for Transformer models on the ImageNet32 generation task and improves language modeling efficiency on the widely studied enwik8 benchmark.
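The downsample-then-upsample idea in this abstract can be illustrated with a minimal sketch. The pooling choices here (average pooling to shorten, repetition to restore, plus a residual connection) are assumptions picked for simplicity, not necessarily the layers the paper selects; the function names and the `inner` callback are hypothetical, and the sketch ignores the causal-masking concerns an autoregressive model would need to handle.

```python
def downsample(seq, k):
    """Average-pool consecutive groups of k vectors into one vector each."""
    if len(seq) % k != 0:
        raise ValueError("sequence length must be divisible by k")
    dim = len(seq[0])
    out = []
    for start in range(0, len(seq), k):
        group = seq[start:start + k]
        out.append([sum(v[j] for v in group) / k for j in range(dim)])
    return out


def upsample(seq, k):
    """Restore the original length by repeating each vector k times."""
    return [list(v) for v in seq for _ in range(k)]


def hourglass_block(seq, k, inner):
    """Apply `inner` (a stand-in for Transformer layers) to the shortened
    sequence, then add the upsampled result back to the full-length input."""
    shortened = inner(downsample(seq, k))
    restored = upsample(shortened, k)
    return [[a + b for a, b in zip(x, r)] for x, r in zip(seq, restored)]
```

The point of the structure is that `inner` runs on a sequence that is `k` times shorter, which is where the compute saving would come from in a real model.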

preprint, 2022, arXiv

Language Model Cascades

Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expand capabilities. These compositions are probabilistic models, and may be expressed in the language of graphical models with random variables whose values are complex data types such as strings. Cases with control flow and dynamic structure require techniques from probabilistic programming, which allow implementing disparate model structures and inference strategies in a unified language. We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use. We refer to the resulting programs as language model cascades.

preprint, 2022, arXiv

Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

How can we measure the reasoning capabilities of intelligent systems? Visual question answering provides a convenient framework for testing the model's abilities by interrogating the model through questions about the scene. However, despite scores of various visual QA datasets and architectures, which sometimes even yield super-human performance, the question of whether those architectures can actually reason remains open to debate. To answer this, we extend the visual question answering framework and propose the following behavioral test in the form of a two-player game. We consider black-box neural models of CLEVR. These models are trained on a diagnostic dataset benchmarking reasoning. Next, we train an adversarial player that re-configures the scene to fool the CLEVR model. We show that CLEVR models, which otherwise could perform at a human level, can easily be fooled by our agent. Our results put in doubt whether data-driven approaches can do reasoning without exploiting the numerous biases that are often present in those datasets. Finally, we also propose a controlled experiment measuring the efficiency of such models to learn and perform reasoning.

preprint, 2022, arXiv

Solving Quantitative Reasoning Problems with Language Models

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.

preprint, 2021, arXiv

Q-Value Weighted Regression: Reinforcement Learning with Limited Data

Sample efficiency and performance in the offline setting have emerged as significant challenges of deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in these aspects. QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, also in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces. We perform an analysis of AWR that explains its shortcomings and use these insights to motivate QWR. We show experimentally that QWR matches the state-of-the-art algorithms on tasks with both continuous and discrete actions. In particular, QWR yields results on par with SAC on the MuJoCo suite and, with the same set of hyperparameters, yields results on par with a highly tuned Rainbow implementation on a set of Atari games. We also verify that QWR performs well in the offline RL setting.

preprint, 2021, arXiv

Show Your Work: Scratchpads for Intermediate Computation with Language Models

Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations, even in the few-shot regime, when asked to perform the operation "step by step", showing the results of intermediate computations. In particular, we train transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.
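To make the scratchpad idea concrete for long addition, here is a small sketch that writes out the intermediate steps a model could be trained to emit before the final answer. The step format and the function name `addition_scratchpad` are hypothetical illustrations, not the format used in the paper, and the sketch assumes non-negative integers.

```python
def addition_scratchpad(a: int, b: int) -> list[str]:
    """Return the digit-by-digit steps of a + b, least significant digit
    first, followed by a final answer line. Assumes a, b >= 0."""
    xs = [int(d) for d in reversed(str(a))]
    ys = [int(d) for d in reversed(str(b))]
    width = max(len(xs), len(ys))
    xs += [0] * (width - len(xs))  # pad the shorter number with zeros
    ys += [0] * (width - len(ys))

    steps = []
    digits = []
    carry = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        total = x + y + carry
        digit, new_carry = total % 10, total // 10
        steps.append(
            f"step {i}: {x} + {y} + carry {carry} = {total} "
            f"-> write {digit}, carry {new_carry}"
        )
        digits.append(digit)
        carry = new_carry
    if carry:
        digits.append(carry)  # a leftover carry becomes the leading digit

    result = int("".join(str(d) for d in reversed(digits)))
    steps.append(f"answer: {result}")
    return steps
```

Each step depends only on two digits and the previous carry, which is the kind of bounded local computation that a scratchpad exposes instead of asking for the sum in one pass.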

preprint, 2020, arXiv

Neural heuristics for SAT solving

We use neural graph networks with a message-passing architecture and an attention mechanism to enhance the branching heuristic in two SAT-solving algorithms. We report improvements of learned neural heuristics compared with two standard human-designed heuristics.
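As an illustration of where a branching heuristic plugs into a SAT solver, the sketch below is a minimal DPLL procedure with the heuristic passed in as a function. It is an assumption-laden toy, not one of the two algorithms studied in the paper: `most_frequent_variable` is a simple hand-written stand-in, and a learned model would replace it by scoring variables from the clause structure. All names here are hypothetical.

```python
from typing import Callable, Optional

Clause = frozenset[int]  # literals are non-zero ints; negative means negated
Heuristic = Callable[[list[Clause]], int]


def simplify(clauses: list[Clause], literal: int) -> list[Clause]:
    """Set `literal` to true: drop satisfied clauses, remove its negation."""
    return [c - {-literal} for c in clauses if literal not in c]


def most_frequent_variable(clauses: list[Clause]) -> int:
    """Hand-written heuristic: branch on the variable occurring most often."""
    counts: dict[int, int] = {}
    for clause in clauses:
        for lit in clause:
            counts[abs(lit)] = counts.get(abs(lit), 0) + 1
    return max(counts, key=counts.get)


def dpll(clauses: list[Clause], choose: Heuristic,
         assignment: Optional[dict[int, bool]] = None) -> Optional[dict[int, bool]]:
    """Return a satisfying partial assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    if not clauses:
        return assignment  # every clause satisfied
    if any(len(c) == 0 for c in clauses):
        return None  # an empty clause cannot be satisfied
    for clause in clauses:
        if len(clause) == 1:  # unit propagation
            (lit,) = clause
            assignment[abs(lit)] = lit > 0
            return dpll(simplify(clauses, lit), choose, assignment)
    var = choose(clauses)  # the branching decision is delegated here
    for lit in (var, -var):
        result = dpll(simplify(clauses, lit), choose, {**assignment, var: lit > 0})
        if result is not None:
            return result
    return None


def satisfies(clauses: list[Clause], model: dict[int, bool]) -> bool:
    """Check a model; variables left unassigned are treated as False."""
    return all(any(model.get(abs(l), False) == (l > 0) for l in c) for c in clauses)
```

Keeping the heuristic behind a single `choose` callback means the search logic stays unchanged when the variable-selection rule is swapped out.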

preprint, 2016, arXiv

Learning from the memory of Atari 2600

We train a number of neural networks to play the games Bowling, Breakout and Seaquest using information stored in the memory of the Atari 2600 video game console. We consider four models of neural networks which differ in size and architecture: two networks which use only information contained in the RAM and two mixed networks which use both information in the RAM and information from the screen. As the benchmark we used the convolutional model proposed in NIPS and received comparable results in all considered games. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which behave better than the benchmark screen-only agent. Mixing screen and RAM did not lead to improved performance compared to screen-only and RAM-only agents.

preprint, 2016, arXiv

On the Regular Emptiness Problem of Subzero Automata

Subzero automata are a class of tree automata whose acceptance condition can express probabilistic constraints. Our main result is that the problem of determining if a subzero automaton accepts some regular tree is decidable.

preprint, 2016, arXiv

Unambiguous Buchi is weak

A non-deterministic automaton running on infinite trees is unambiguous if it has at most one accepting run on every tree. The class of languages recognisable by unambiguous tree automata is still not well-understood. In particular, decidability of the problem whether a given language is recognisable by some unambiguous automaton is open. Moreover, there are no known upper bounds on the descriptive complexity of unambiguous languages among all regular tree languages. In this paper we show the following complexity collapse: if a non-deterministic parity tree automaton $A$ is unambiguous and its priorities are between $i$ and $2n$ then the language recognised by $A$ is in the class $Comp(i+1,2n)$. A particular case of this theorem is for $i=n=1$: if $A$ is an unambiguous Buchi tree automaton then $L(A)$ is recognisable by a weak alternating automaton (or equivalently definable in weak MSO). The main motivation for this result is a theorem by Finkel and Simonnet stating that every unambiguous Buchi automaton recognises a Borel language. The assumptions of the presented theorem are syntactic (we require one automaton to be both unambiguous and of particular parity index). However, to the authors' best knowledge this is the first theorem showing a collapse of the parity index that exploits the fact that a given automaton is unambiguous.

preprint, 2015, arXiv

How unprovable is Rabin's decidability theorem?

We study the strength of set-theoretic axioms needed to prove Rabin's theorem on the decidability of the MSO theory of the infinite binary tree. We first show that the complementation theorem for tree automata, which forms the technical core of typical proofs of Rabin's theorem, is equivalent over the moderately strong second-order arithmetic theory $\mathsf{ACA}_0$ to a determinacy principle implied by the positional determinacy of all parity games and implying the determinacy of all Gale-Stewart games given by boolean combinations of ${\bf \Sigma^0_2}$ sets. It follows that complementation for tree automata is provable from $\Pi^1_3$- but not $\Delta^1_3$-comprehension. We then use results due to MedSalem-Tanaka, Möllerfeld and Heinatsch-Möllerfeld to prove that over $\Pi^1_2$-comprehension, the complementation theorem for tree automata, decidability of the MSO theory of the infinite binary tree, positional determinacy of parity games and determinacy of $\mathrm{Bool}({\bf \Sigma^0_2})$ Gale-Stewart games are all equivalent. Moreover, these statements are equivalent to the $\Pi^1_3$-reflection principle for $\Pi^1_2$-comprehension. It follows in particular that Rabin's decidability theorem is not provable in $\Delta^1_3$-comprehension.

preprint, 2015, arXiv

On the Problem of Computing the Probability of Regular Sets of Trees

We consider the problem of computing the probability of regular languages of infinite trees with respect to the natural coin-flipping measure. We propose an algorithm which computes the probability of languages recognizable by game automata. In particular this algorithm is applicable to all deterministic automata. We then use the algorithm to prove through examples three properties of measure: (1) there exist regular sets having irrational probability, (2) there exist comeager regular sets having probability $0$ and (3) the probability of game languages $W_{i,k}$, from automata theory, is $0$ if $k$ is odd and is $1$ otherwise.

preprint, 2014, arXiv

Deciding the Borel complexity of regular tree languages

We show that it is decidable whether a given regular tree language belongs to the class ${\bf \Delta^0_2}$ of the Borel hierarchy, or equivalently whether the Wadge degree of a regular tree language is countable.

preprint, 2005, arXiv

Small Valdivia compact spaces

We prove a preservation theorem for the class of Valdivia compact spaces, which involves inverse sequences of "simple" retractions. Consequently, a compact space of weight $\leq\aleph_1$ is Valdivia compact iff it is the limit of an inverse sequence of metric compacta whose bonding maps are retractions. As a corollary, we show that the class of Valdivia compacta of weight at most $\aleph_1$ is preserved both under retractions and under open 0-dimensional images. Finally, we characterize the class of all Valdivia compacta in the language of category theory, which implies that this class is preserved under all continuous weight preserving functors.
