Researcher profile

Giuseppe Ateniese

Giuseppe Ateniese contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

Safe Language Generation in the Limit

Recent results in learning a language in the limit have shown that, although language identification is impossible, language generation is tractable. As this foundational area expands, we need to consider the implications of language generation in real-world settings. This work offers the first theoretical treatment of safe language generation. Building on the computational paradigm of learning in the limit, we formalize the tasks of safe language identification and generation. We prove that under this model, safe language identification is impossible, and that safe language generation is at least as hard as (vanilla) language identification, which is also impossible. Last, we discuss several intractable and tractable cases.

preprint2022arXiv

Eluding Secure Aggregation in Federated Learning via Model Inconsistency

Secure aggregation is a cryptographic protocol that securely computes the aggregation of its inputs. It is pivotal in keeping model updates private in federated learning. Indeed, the use of secure aggregation prevents the server from learning the value and the source of the individual model updates provided by the users, hampering inference and data attribution attacks. In this work, we show that a malicious server can easily elude secure aggregation as if the latter were not in place. We devise two different attacks capable of inferring information on individual private training datasets, independently of the number of users participating in the secure aggregation. This makes them concrete threats in large-scale, real-world federated learning applications. The attacks are generic and equally effective regardless of the secure aggregation protocol used. They exploit a vulnerability of the federated learning protocol caused by incorrect usage of secure aggregation and lack of parameter validation. Our work demonstrates that current implementations of federated learning with secure aggregation offer only a "false sense of security".

preprint2021arXiv

Reducing Bias in Modeling Real-world Password Strength via Deep Learning and Dynamic Dictionaries

Password security hinges on an in-depth understanding of the techniques adopted by attackers. Unfortunately, real-world adversaries resort to pragmatic guessing strategies such as dictionary attacks that are inherently difficult to model in password security studies. In order to be representative of the actual threat, dictionary attacks must be thoughtfully configured and tuned. However, this process requires a domain-knowledge and expertise that cannot be easily replicated. The consequence of inaccurately calibrating dictionary attacks is the unreliability of password security analyses, impaired by a severe measurement bias. In the present work, we introduce a new generation of dictionary attacks that is consistently more resilient to inadequate configurations. Requiring no supervision or domain-knowledge, this technique automatically approximates the advanced guessing strategies adopted by real-world attackers. To achieve this: (1) We use deep neural networks to model the proficiency of adversaries in building attack configurations. (2) Then, we introduce dynamic guessing strategies within dictionary attacks. These mimic experts' ability to adapt their guessing strategies on the fly by incorporating knowledge on their targets. Our techniques enable more robust and sound password strength estimates within dictionary attacks, eventually reducing overestimation in modeling real-world threats in password security. Code available: https://github.com/TheAdamProject/adams

preprint2020arXiv

Audita: A Blockchain-based Auditing Framework for Off-chain Storage

The cloud changed the way we manage and store data. Today, cloud storage services offer clients an infrastructure that allows them a convenient source to store, replicate, and secure data online. However, with these new capabilities also come limitations, such as lack of transparency, limited decentralization, and challenges with privacy and security. And, as the need for more agile, private and secure data solutions continues to grow exponentially, rethinking the current structure of cloud storage is mission-critical for enterprises. By leveraging and building upon blockchain's unique attributes, including immutability, security to the data element level, distributed (no single point of failure), we have developed a solution prototype that allows data to be reliably stored while simultaneously being secured, with tamper-evident auditability, via blockchain. The result, Audita, is a flexible solution that assures data protection and solves challenges such as scalability and privacy. Audita works via an augmented blockchain network of participants that include storage-nodes and block-creators. In addition, it provides an automatic and fair challenge system to assure that data is distributed and reliably and provably stored. While the prototype is built on Quorum, the solution framework can be used with any blockchain platform. The benefit is a system that is built to grow along with the data needs of enterprises, while continuing to build the network via incentives and solving for issues such as auditing and outsourcing.

preprint2020arXiv

Improving Password Guessing via Representation Learning

Learning useful representations from unstructured data is one of the core challenges, as well as a driving force, of modern data-driven approaches. Deep learning has demonstrated the broad advantages of learning and harnessing such representations. In this paper, we introduce a deep generative model representation learning approach for password guessing. We show that an abstract password representation naturally offers compelling and versatile properties that can be used to open new directions in the extensively studied, and yet presently active, password guessing field. These properties can establish novel password generation techniques that are neither feasible nor practical with the existing probabilistic and non-probabilistic approaches. Based on these properties, we introduce:(1) A general framework for conditional password guessing that can generate passwords with arbitrary biases; and (2) an Expectation Maximization-inspired framework that can dynamically adapt the estimated password distribution to match the distribution of the attacked password set.

preprint2015arXiv

From Pretty Good To Great: Enhancing PGP using Bitcoin and the Blockchain

PGP is built upon a Distributed Web of Trust in which the trustworthiness of a user is established by others who can vouch through a digital signature for that particular identity. Preventing its wholesale adoption are a number of inherent weaknesses to include (but not limited to) the following: 1) Trust Relationships are built on a subjective honor system, 2) Only first degree relationships can be fully trusted, 3) Levels of trust are difficult to quantify with actual values, and 4) Issues with the Web of Trust itself (Certification and Endorsement). Although the security that PGP provides is proven to be reliable, it has largely failed to garner large scale adoption. In this paper, we propose several novel contributions to address the aforementioned issues with PGP and associated Web of Trust. To address the subjectivity of the Web of Trust, we provide a new certificate format based on Bitcoin which allows a user to verify a PGP certificate using Bitcoin identity-verification transactions - forming first degree trust relationships that are tied to actual values (i.e., number of Bitcoins transferred during transaction). Secondly, we present the design of a novel Distributed PGP key server that leverages the Bitcoin transaction blockchain to store and retrieve Bitcoin-Based PGP certificates. Lastly, we provide a web prototype application that demonstrates several of these capabilities in an actual environment.

preprint2015arXiv

No Place to Hide that Bytes won't Reveal: Sniffing Location-Based Encrypted Traffic to Track a User's Position

News reports of the last few years indicated that several intelligence agencies are able to monitor large networks or entire portions of the Internet backbone. Such a powerful adversary has only recently been considered by the academic literature. In this paper, we propose a new adversary model for Location Based Services (LBSs). The model takes into account an unauthorized third party, different from the LBS provider itself, that wants to infer the location and monitor the movements of a LBS user. We show that such an adversary can extrapolate the position of a target user by just analyzing the size and the timing of the encrypted traffic exchanged between that user and the LBS provider. We performed a thorough analysis of a widely deployed location based app that comes pre-installed with many Android devices: GoogleNow. The results are encouraging and highlight the importance of devising more effective countermeasures against powerful adversaries to preserve the privacy of LBS users.

preprint2014arXiv

No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone

It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume to access the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, thus a few IP addresses may be associated to thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent to track people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis.

preprint2014arXiv

To Share or Not to Share in Client-Side Encrypted Clouds

With the advent of cloud computing, a number of cloud providers have arisen to provide Storage-as-a-Service (SaaS) offerings to both regular consumers and business organizations. SaaS (different than Software-as-a-Service in this context) refers to an architectural model in which a cloud provider provides digital storage on their own infrastructure. Three models exist amongst SaaS providers for protecting the confidentiality data stored in the cloud: 1) no encryption (data is stored in plain text), 2) server-side encryption (data is encrypted once uploaded), and 3) client-side encryption (data is encrypted prior to upload). This paper seeks to identify weaknesses in the third model, as it claims to offer 100% user data confidentiality throughout all data transactions (e.g., upload, download, sharing) through a combination of Network Traffic Analysis, Source Code Decompilation, and Source Code Disassembly. The weaknesses we uncovered primarily center around the fact that the cloud providers we evaluated were each operating in a Certificate Authority capacity to facilitate data sharing. In this capacity, they assume the role of both certificate issuer and certificate authorizer as denoted in a Public-Key Infrastructure (PKI) scheme - which gives them the ability to view user data contradicting their claims of 100% data confidentiality. We have collated our analysis and findings in this paper and explore some potential solutions to address these weaknesses in these sharing methods. The solutions proposed are a combination of best practices associated with the use of PKI and other cryptographic primitives generally accepted for protecting the confidentiality of shared information.

preprint2013arXiv

Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers

Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.