Researcher profile

Leonardo Querzoni

Leonardo Querzoni contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer

A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the embedding network is trained such that the translation from code to vectors partially preserves the semantic, the network effectively represents an assembly code model. In this paper we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a huge dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instructions sequences and it is fine-tunable, i.e. it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired with pre-training to the specific task. We evaluated BinBert on a multi-task benchmark that we specifically designed to test the understanding of assembly code. The benchmark is composed of several tasks, some taken from the literature, and a few novel tasks that we designed, with a mix of intrinsic and downstream tasks. Our results show that BinBert outperforms state-of-the-art models for binary instruction embedding, raising the bar for binary code understanding.

preprint2021arXiv

In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks

In this paper we investigate the problem of automatically naming pieces of assembly code. Where by naming we mean assigning to an assembly function a string of words that would likely be assigned by a human reverse engineer. We formally and precisely define the framework in which our investigation takes place. That is we define the problem, we provide reasonable justifications for the choices that we made for the design of training and the tests. We performed an analysis on a large real-world corpora constituted by nearly 9 millions of functions taken from more than 22k softwares. In such framework we test baselines coming from the field of Natural Language Processing (e.g., Seq2Seq networks and Transformer). Interestingly, our evaluation shows promising results beating the state-of-the-art and reaching good performance. We investigate the applicability of tine-tuning (i.e., taking a model already trained on a large generic corpora and retraining it for a specific task). Such technique is popular and well-known in the NLP field. Our results confirm that fine-tuning is effective even when neural networks are applied to binaries. We show that a model, pre-trained on the aforementioned corpora, when fine-tuned has higher performances on specific domains (such as predicting names in system utilites, malware, etc).

preprint2020arXiv

Byzantine Generalized Lattice Agreement

The paper investigates the Lattice Agreement (LA) problem in asynchronous systems. In LA each process proposes an element $e$ from a predetermined lattice, and has to decide on an element $e'$ of the lattice such that $e \leq e'$. Moreover, decisions of different processes have to be comparable (no two processes can decide two elements $e$ and $e'$ such that $(e \not\leq e') \land (e' \not\leq e)$). It has been shown that Generalized LA (i.e., a version of LA proposing and deciding on sequences of values) can be used to build a Replicated State Machine (RSM) with commutative update operations. The key advantage of LA and Generalized LA is that they can be solved in asynchronous systems prone to crash-failures (this is not the case with standard Consensus). In this paper we assume Byzantine failures. We propose the Wait Till Safe (WTS) algorithm for LA, and we show that its resilience to $f \leq (n-1)/3$ Byzantines is optimal. We then generalize WTS obtaining a Generalized LA algorithm, namely GWTS. We use GWTS to build a RSM with commutative updates. Our RSM works in asynchronous systems and tolerates $f \leq (n-1)/3$ malicious entities. All our algorithms use the minimal assumption of authenticated channels. When the more powerful public signatures are available, we discuss how to improve the message complexity of our results (from quadratic to linear, when $f={\cal O}(1)$). At the best of our knowledge this is the first paper proposing a solution for Byzantine LA that works on any possible lattice, and it is the first work proposing a Byzantine tolerant RSM built on it.

preprint2020arXiv

Synchronous Byzantine Lattice Agreement in ${\cal O}(\log (f))$ Rounds

In the Lattice Agreement (LA) problem, originally proposed by Attiya et al. \cite{Attiya:1995}, a set of processes has to decide on a chain of a lattice. More precisely, each correct process proposes an element $e$ of a certain join-semi lattice $L$ and it has to decide on a value that contains $e$. Moreover, any pair $p_i,p_j$ of correct processes has to decide two values $dec_i$ and $dec_j$ that are comparable (e.g., $dec_i \leq dec_j$ or $dec_j < dec_i$). LA has been studied for its practical applications, as example it can be used to implement a snapshot objects \cite{Attiya:1995} or a replicated state machine with commutative operations \cite{Faleiro:2012}. Interestingly, the study of the Byzantine Lattice Agreement started only recently, and it has been mainly devoted to asynchronous systems. The synchronous case has been object of a recent pre-print \cite{Zheng:aa} where Zheng et al. propose an algorithm terminating in ${\cal O}(\sqrt f)$ rounds and tolerating $f < \lceil n/3 \rceil$ Byzantine processes. In this paper we present new contributions for the synchronous case. We investigate the problem in the usual message passing model for a system of $n$ processes with distinct unique IDs. We first prove that, when only authenticated channels are available, the problem cannot be solved if $f=n/3$ or more processes are Byzantine. We then propose a novel algorithm that works in a synchronous system model with signatures (i.e., the {\em authenticated message} model), tolerates up to $f$ byzantine failures (where $f<n/3$) and that terminates in ${\cal O}(\log f)$ rounds. We discuss how to remove authenticated messages at the price of algorithm resiliency ($f < n/4$). Finally, we present a transformer that converts any synchronous LA algorithm to an algorithm for synchronous Generalised Lattice Agreement.