Source author record

Marcin Andrychowicz

Marcin Andrychowicz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security Neural and Evolutionary Computing

Catalog footprint

What is connected

8works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Implicitly Regularized RL with Implicit Q-Values

The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy w.r.t. to $Q$. It is a powerful tool that allows action selection without a model of the environment and even without explicitly modeling the policy. Yet, this scheme can only be used in discrete action tasks, with small numbers of actions, as the softmax cannot be computed exactly otherwise. Especially the usage of function approximation, to deal with continuous action spaces in modern actor-critic architectures, intrinsically prevents the exact computation of a softmax. We propose to alleviate this issue by parametrizing the $Q$-function implicitly, as the sum of a log-policy and of a value function. We use the resulting parametrization to derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces the softmax relation between the policy and the $Q$-value. We provide a theoretical analysis of our algorithm: from an Approximate Dynamic Programming perspective, we show its equivalence to a regularized version of value iteration, accounting for both entropy and Kullback-Leibler regularization, and that enjoys beneficial error propagation results. We then evaluate our algorithm on classic control tasks, where its results compete with state-of-the-art methods.

preprint2020arXiv

What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such ``choices'' in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.

preprint2016arXiv

Learning Efficient Algorithms with Hierarchical Attentive Memory

In this paper, we propose and investigate a novel memory architecture for neural networks called Hierarchical Attentive Memory (HAM). It is based on a binary tree with leaves corresponding to memory cells. This allows HAM to perform memory access in O(log n) complexity, which is a significant improvement over the standard attention mechanism that requires O(n) operations, where n is the size of the memory. We show that an LSTM network augmented with HAM can learn algorithms for problems like merging, sorting or binary searching from pure input-output examples. In particular, it learns to sort n numbers in time O(n log n) and generalizes well to input sequences much longer than the ones seen during the training. We also show that HAM can be trained to act like classic data structures: a stack, a FIFO queue and a priority queue.

preprint2016arXiv

Learning to learn by gradient descent by gradient descent

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.

preprint2016arXiv

Neural Random-Access Machines

In this paper, we propose and investigate a new neural network architecture called Neural Random Access Machine. It can manipulate and dereference pointers to an external variable-size random-access memory. The model is trained from pure input-output examples using backpropagation. We evaluate the new model on a number of simple algorithmic tasks whose solutions require pointer manipulation and dereferencing. Our results show that the proposed model can learn to solve algorithmic tasks of such type and is capable of operating on simple data structures like linked-lists and binary trees. For easier tasks, the learned solutions generalize to sequences of arbitrary length. Moreover, memory access during inference can be done in a constant time under some assumptions.

preprint2014arXiv

Modeling Bitcoin Contracts by Timed Automata

Bitcoin is a peer-to-peer cryptographic currency system. Since its introduction in 2008, Bitcoin has gained noticeable popularity, mostly due to its following properties: (1) the transaction fees are very low, and (2) it is not controlled by any central authority, which in particular means that nobody can "print" the money to generate inflation. Moreover, the transaction syntax allows to create the so-called contracts, where a number of mutually-distrusting parties engage in a protocol to jointly perform some financial task, and the fairness of this process is guaranteed by the properties of Bitcoin. Although the Bitcoin contracts have several potential applications in the digital economy, so far they have not been widely used in real life. This is partly due to the fact that they are cumbersome to create and analyze, and hence risky to use. In this paper we propose to remedy this problem by using the methods originally developed for the computer-aided analysis for hardware and software systems, in particular those based on the timed automata. More concretely, we propose a framework for modeling the Bitcoin contracts using the timed automata in the UPPAAL model checker. Our method is general and can be used to model several contracts. As a proof-of-concept we use this framework to model some of the Bitcoin contracts from our recent previous work. We then automatically verify their security in UPPAAL, finding (and correcting) some subtle errors that were difficult to spot by the manual analysis. We hope that our work can draw the attention of the researchers working on formal modeling to the problem of the Bitcoin contract verification, and spark off more research on this topic.

preprint2013arXiv

How to deal with malleability of BitCoin transactions

BitCoin transactions are malleable in a sense that given a transaction an adversary can easily construct an equivalent transaction which has a different hash. This can pose a serious problem in some BitCoin distributed contracts in which changing a transaction's hash may result in the protocol disruption and a financial loss. The problem mostly concerns protocols, which use a "refund" transaction to withdraw a deposit in a case of the protocol interruption. In this short note, we show a general technique for creating malleability-resilient "refund" transactions, which does not require any modification of the BitCoin protocol. Applying our technique to our previous paper "Fair Two-Party Computations via the BitCoin Deposits" (Cryptology ePrint Archive, 2013) allows to achieve fairness in any Two-Party Computation using the BitCoin protocol in its current version.

preprint2012arXiv

Efficient Refreshing Protocol for Leakage-Resilient Storage Based on the Inner-Product Extractor

A recent trend in cryptography is to protect data and computation against various side-channel attacks. Dziembowski and Faust (TCC 2012) have proposed a general way to protect arbitrary circuits against any continual leakage assuming that: (i) the memory is divided into the parts, which leaks independently (ii) the leakage in each observation is bounded (iii) the circuit has an access to a leak-free component, which samples random orthogonal vectors. The pivotal element of their construction is a protocol for refreshing the so-called Leakage-Resilient Storage (LRS). In this note, we present a more efficient and simpler protocol for refreshing LRS under the same assumptions. Our solution needs O(n) operations to fully refresh the secret (in comparison to Ω(n^2) for a protocol of Dziembowski and Faust), where n is a security parameter that describes the maximal amount of leakage in each invocation of the refreshing procedure

Marcin Andrychowicz

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Implicitly Regularized RL with Implicit Q-Values

What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

Learning Efficient Algorithms with Hierarchical Attentive Memory

Learning to learn by gradient descent by gradient descent

Neural Random-Access Machines

Modeling Bitcoin Contracts by Timed Automata

How to deal with malleability of BitCoin transactions

Efficient Refreshing Protocol for Leakage-Resilient Storage Based on the Inner-Product Extractor