Source author record

Josep Domingo-Ferrer

Josep Domingo-Ferrer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Cryptography and Security; cs.CY; Machine Learning; Artificial Intelligence; Computer Science and Game Theory; Databases; Digital Libraries; Social and Information Networks

Catalog footprint

What is connected

21 works

8 topics

4 close collaborators


preprint, 2022, arXiv

A Critical Review on the Use (and Misuse) of Differential Privacy in Machine Learning

We review the use of differential privacy (DP) for privacy protection in machine learning (ML). We show that, driven by the aim of preserving the accuracy of the learned models, DP-based ML implementations are so loose that they do not offer the ex ante privacy guarantees of DP. Instead, what they deliver is basically noise addition similar to the traditional (and often criticized) statistical disclosure control approach. Due to the lack of formal privacy guarantees, the actual level of privacy offered must be experimentally assessed ex post, which is done very seldom. In this respect, we present empirical results showing that standard anti-overfitting techniques in ML can achieve a better utility/privacy/efficiency trade-off than DP.

preprint, 2022, arXiv

Bistochastic privacy

We introduce a new privacy model relying on bistochastic matrices, that is, matrices whose components are nonnegative and sum to 1 both row-wise and column-wise. This class of matrices is used both to define privacy guarantees and as a tool to apply protection to a data set. The bistochasticity assumption happens to connect several fields of the privacy literature, including the two most popular models, k-anonymity and differential privacy. Moreover, it establishes a bridge with information theory, which simplifies the thorny issue of evaluating the utility of a protected data set. Bistochastic privacy also clarifies the trade-off between protection and utility by using bits, which can be viewed as a natural currency to comprehend and operationalize this trade-off, in the same way that bits are used in information theory to capture uncertainty. A discussion on the suitable parameterization of bistochastic matrices to achieve the privacy guarantees of this new model is also provided.
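The definition of a bistochastic matrix given in this abstract can be checked mechanically. The following is a minimal sketch, not the paper's method: the function names, the example matrix `P` and the idea of applying it to a vector of category frequencies are assumptions made for illustration only.

```python
def is_bistochastic(matrix, tol=1e-9):
    """Return True if the matrix is square, nonnegative, and every row and column sums to 1."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    if any(x < 0 for row in matrix for x in row):
        return False
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in matrix)
    cols_ok = all(
        abs(sum(matrix[i][j] for i in range(n)) - 1.0) <= tol for j in range(n)
    )
    return rows_ok and cols_ok


def apply_to_frequencies(matrix, freqs):
    """Multiply a row vector of category frequencies by the matrix."""
    n = len(freqs)
    return [sum(freqs[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 3-category example: each category is kept with probability 0.8
# and moved to each of the other two categories with probability 0.1.
P = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
original = [0.6, 0.3, 0.1]
protected = apply_to_frequencies(P, original)
```

Because every row of `P` sums to 1, the protected frequencies still sum to 1. Because every column also sums to 1, a uniform frequency vector is left unchanged.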

preprint, 2022, arXiv

Defending against the Label-flipping Attack in Federated Learning

Federated learning (FL) provides autonomy and privacy by design to participating peers, who cooperatively build a machine learning (ML) model while keeping their private data in their devices. However, that same autonomy opens the door for malicious peers to poison the model by conducting either untargeted or targeted poisoning attacks. The label-flipping (LF) attack is a targeted poisoning attack where the attackers poison their training data by flipping the labels of some examples from one class (i.e., the source class) to another (i.e., the target class). Unfortunately, this attack is easy to perform and hard to detect, and it negatively impacts the performance of the global model. Existing defenses against LF are limited by assumptions on the distribution of the peers' data and/or do not perform well with high-dimensional models. In this paper, we deeply investigate the LF attack behavior and find that the contradicting objectives of attackers and honest peers on the source class examples are reflected in the parameter gradients corresponding to the neurons of the source and target classes in the output layer, making those gradients good discriminative features for the attack detection. Accordingly, we propose a novel defense that first dynamically extracts those gradients from the peers' local updates, and then clusters the extracted gradients, analyzes the resulting clusters and filters out potential bad updates before model aggregation. Extensive empirical analysis on three data sets shows the proposed defense's effectiveness against the LF attack regardless of the data distribution or model dimensionality. Also, the proposed defense outperforms several state-of-the-art defenses by offering lower test error, higher overall accuracy, higher source class accuracy, lower attack success rate, and higher stability of the source class accuracy.

preprint, 2021, arXiv

Circuit-Free General-Purpose Multi-Party Computation via Co-Utile Unlinkable Outsourcing

Multiparty computation (MPC) consists in several parties engaging in joint computation in such a way that each party's input and output remain private to that party. Whereas MPC protocols for specific computations have existed since the 1980s, only recently general-purpose compilers have been developed to allow MPC on arbitrary functions. Yet, using today's MPC compilers requires substantial programming effort and skill on the user's side, among other things because nearly all compilers translate the code of the computation into a Boolean or arithmetic circuit. In particular, the circuit representation requires unrolling loops and recursive calls, which forces programmers to (often manually) define loop bounds and hardly use recursion. We present an approach allowing MPC on an arbitrary computation expressed as ordinary code with all functionalities that does not need to be translated into a circuit. Our notion of input and output privacy is predicated on unlinkability. Our method leverages co-utile computation outsourcing using anonymous channels via decentralized reputation, makes a minimalistic use of cryptography and does not require participants to be honest-but-curious: it works as long as participants are rational (self-interested), which may include rationally malicious peers (who become attackers if this is advantageous to them). We present example applications, including e-voting. Our empirical work shows that reputation captures well the behavior of peers and ensures that parties with high reputation obtain correct results.

preprint, 2020, arXiv

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

The rapid dynamics of COVID-19 call for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large-scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy-preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely granulated geographic scale. Our recommendation is two-fold. First, we recommend extending existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

preprint, 2016, arXiv

Supplementary Materials for "How to Avoid Reidentification with Proper Anonymization"- Comment on "Unique in the shopping mall: on the reidentifiability of credit card metadata"

The study by De Montjoye et al. ("Science", 30 January 2015, p. 536) claimed that most individuals can be reidentified from a deidentified credit card transaction database and that anonymization mechanisms are not effective against reidentification. Such claims deserve detailed quantitative scrutiny, as they might seriously undermine the willingness of data owners and subjects to share data for research. In a recent Technical Comment published in "Science" (18 March 2016, p. 1274), we demonstrate that the reidentification risk reported by De Montjoye et al. was significantly overestimated (due to a misunderstanding of the reidentification attack) and that the alleged ineffectiveness of anonymization is due to the choice of poor and undocumented methods and to a general disregard of 40 years of anonymization literature. The technical comment also shows how to properly anonymize data, in order to reduce unequivocal reidentifications to zero while retaining even more analytical utility than with the poor anonymization mechanisms employed by De Montjoye et al. In conclusion, data owners, subjects and users can be reassured that sound privacy models and anonymization methods exist to produce safe and useful anonymized data. Supplementary materials detailing the data sets, algorithms and extended results of our study are available here. Moreover, unlike De Montjoye et al.'s data set, which was never made available, our data, anonymized results, and anonymization algorithms can be freely downloaded from http://crises-deim.urv.cat/opendata/SPD_Science.zip

preprint, 2015, arXiv

Co-Utility: Self-Enforcing Protocols without Coordination Mechanisms

Performing some task among a set of agents requires the use of some protocol that regulates the interactions between them. If those agents are rational, they may try to subvert the protocol for their own benefit, in an attempt to reach an outcome that provides greater utility. We revisit the traditional notion of self-enforcing protocols implemented using existing game-theoretic solution concepts, we describe its shortcomings in real-world applications, and we propose a new notion of self-enforcing protocols, namely co-utile protocols. The latter represent a solution concept that can be implemented without a coordination mechanism in situations when traditional self-enforcing protocols need a coordination mechanism. Co-utile protocols are preferable in decentralized systems of rational agents because of their efficiency and fairness. We illustrate the application of co-utile protocols to information technology, specifically to preserving the privacy of query profiles of database/search engine users.

preprint, 2015, arXiv

Flexible and Robust Privacy-Preserving Implicit Authentication

Implicit authentication consists of a server authenticating a user based on the user's usage profile, instead of/in addition to relying on something the user explicitly knows (passwords, private keys, etc.). While implicit authentication makes identity theft by third parties more difficult, it requires the server to learn and store the user's usage profile. Recently, the first privacy-preserving implicit authentication system was presented, in which the server does not learn the user's profile. It uses an ad hoc two-party computation protocol to compare the user's fresh sampled features against an encrypted stored user's profile. The protocol requires storing the usage profile and comparing against it using two different cryptosystems, one of them order-preserving; furthermore, features must be numerical. We present here a simpler protocol based on set intersection that has the advantages of: i) requiring only one cryptosystem; ii) not leaking the relative order of fresh feature samples; iii) being able to deal with any type of features (numerical or non-numerical). Keywords: Privacy-preserving implicit authentication, privacy-preserving set intersection, implicit authentication, active authentication, transparent authentication, risk mitigation, data brokers.

preprint, 2015, arXiv

Flexible Attribute-Based Encryption Applicable to Secure E-Healthcare Records

In e-healthcare record systems (EHRS), attribute-based encryption (ABE) appears as a natural way to achieve fine-grained access control on health records. Some proposals exploit key-policy ABE (KP-ABE) to protect privacy in such a way that all users are associated with specific access policies and only the ciphertexts matching the users' access policies can be decrypted. An issue with KP-ABE is that it requires an a priori formulation of access policies during key generation, which is not always practicable in EHRS because the policies to access health records are sometimes determined after key generation. In this paper, we revisit KP-ABE and propose a dynamic ABE paradigm, referred to as access policy redefinable ABE (APR-ABE). To address the above issue, APR-ABE allows users to redefine their access policies and delegate keys for the redefined ones; hence a priori precise policies are no longer mandatory. We construct an APR-ABE scheme with short ciphertexts and prove its full security in the standard model under several static assumptions.

preprint, 2015, arXiv

New Directions in Anonymization: Permutation Paradigm, Verifiability by Subjects and Intruders, Transparency to Users

There are currently two approaches to anonymization: "utility first" (use an anonymization method with suitable utility features, then empirically evaluate the disclosure risk and, if necessary, reduce the risk by possibly sacrificing some utility) or "privacy first" (enforce a target privacy level via a privacy model, e.g., k-anonymity or epsilon-differential privacy, without regard to utility). To get formal privacy guarantees, the second approach must be followed, but then data releases with no utility guarantees are obtained. Also, in general it is unclear how verifiable anonymization is by the data subject (how safely released is the record she has contributed?), what type of intruder is being considered (what does he know and want?) and how transparent anonymization is towards the data user (what is the user told about methods and parameters used?). We show that, using a generally applicable reverse mapping transformation, any anonymization for microdata can be viewed as a permutation plus (perhaps) a small amount of noise; permutation is thus shown to be the essential principle underlying any anonymization of microdata, which allows giving simple utility and privacy metrics. From this permutation paradigm, a new privacy model naturally follows, which we call (d,v)-permuted privacy. The privacy ensured by this method can be verified by each subject contributing an original record (subject-verifiability) and also at the data set level by the data protector. We then proceed to define a maximum-knowledge intruder model, which we argue should be the one considered in anonymization. Finally, we make the case for anonymization transparent to the data user, that is, compliant with Kerckhoffs's assumption (only the randomness used, if any, must stay secret).
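The idea of viewing an anonymized attribute as a permutation of the original values plus residual noise can be sketched for a single numerical attribute. This is an illustrative reading of the abstract, not the authors' published procedure: the function name `reverse_map`, the rank-matching rule and the toy data are assumptions.

```python
def reverse_map(original, anonymized):
    """For one attribute, replace each anonymized value by the original value of the same rank."""
    if len(original) != len(anonymized):
        raise ValueError("both lists must have the same length")
    sorted_original = sorted(original)
    # Indices of the anonymized values, ordered from smallest to largest value.
    order = sorted(range(len(anonymized)), key=lambda i: anonymized[i])
    mapped = [None] * len(anonymized)
    for rank, i in enumerate(order):
        mapped[i] = sorted_original[rank]
    return mapped


# Hypothetical data: X is the original attribute, Y an anonymized version of it.
X = [10, 20, 30, 40]
Y = [22, 9, 41, 28]
Z = reverse_map(X, Y)                      # a permutation of X with the ranks of Y
residual = [y - z for y, z in zip(Y, Z)]   # what remains once the permutation is removed
```

Here `Z` contains exactly the original values in a different order, and `residual` is the small per-record difference between `Y` and `Z`.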

preprint, 2015, arXiv

On the Security of MTA-OTIBASs (Multiple-TA One-Time Identity-Based Aggregate Signatures)

In [3] the authors proposed a new aggregate signature scheme referred to as multiple-TA (trusted authority) one-time identity-based aggregate signature (MTA-OTIBAS). Further, they gave a concrete MTA-OTIBAS scheme. We recall here the definition of MTA-OTIBAS and the concrete proposed scheme. Then we prove that our MTA-OTIBAS concrete scheme is existentially unforgeable against adaptively chosen-message attacks in the random oracle model under the co-CDH problem assumption.

preprint, 2015, arXiv

On the Security of Privacy-Preserving Vehicular Communication Authentication with Hierarchical Aggregation and Fast Response

In [3], the authors proposed a highly efficient secure and privacy-preserving scheme for secure vehicular communications. The proposed scheme consists of four protocols: system setup, protocol for STP and STK distribution, protocol for common string synchronization, and protocol for vehicular communications. Here we define the security models for the protocol for STP and STK distribution, and the protocol for vehicular communications, respectively. We then prove that these two protocols are secure in our models.

preprint, 2015, arXiv

Privacy and Data Protection by Design - from policy to engineering

Privacy and data protection constitute core values of individuals and of democratic societies. There have been decades of debate on how those values (and legal obligations) can be embedded into systems, preferably from the very beginning of the design process. One important element in this endeavour is the set of technical mechanisms known as privacy-enhancing technologies (PETs). Their effectiveness has been demonstrated by researchers and in pilot implementations. However, apart from a few exceptions, such as encryption, which has become widely used, PETs have not become a standard and widely used component in system design. Furthermore, for unfolding their full benefit for privacy and data protection, PETs need to be rooted in a data governance strategy to be applied in practice. This report contributes to bridging the gap between the legal framework and the available technological implementation measures by providing an inventory of existing approaches, privacy design strategies, and technical building blocks of various degrees of maturity from research and development. Starting from the privacy principles of the legislation, important elements are presented as a first step towards a design process for privacy-friendly systems and services. The report sketches a method to map legal obligations to design strategies, which allow the system designer to select appropriate techniques for implementing the identified privacy requirements. Furthermore, the report reflects on the limitations of the approach. It concludes with recommendations on how to overcome and mitigate these limits.

preprint, 2015, arXiv

Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics

The extensive collection and processing of personal information in big data analytics has given rise to serious privacy concerns, related to wide scale electronic surveillance, profiling, and disclosure of private data. To reap the benefits of analytics without invading the individuals' private sphere, it is essential to draw the limits of big data processing and integrate data protection safeguards in the analytics value chain. ENISA, with the current report, supports this approach and the position that the challenges of technology (for big data) should be addressed by the opportunities of technology (for privacy). We first explain the need to shift from "big data versus privacy" to "big data with privacy". In this respect, the concept of privacy by design is key to identifying the privacy requirements early in the big data analytics value chain and to subsequently implementing the necessary technical and organizational measures. After an analysis of the proposed privacy by design strategies in the different phases of the big data value chain, we review privacy enhancing technologies of special interest for the current and future big data landscape. In particular, we discuss anonymization, the "traditional" analytics technique, the emerging area of encrypted search and privacy preserving computations, granular access control mechanisms, policy enforcement and accountability, as well as data provenance issues. Moreover, new transparency and access tools in big data are explored, together with techniques for user empowerment and control. Achieving "big data with privacy" is no easy task and a lot of research and implementation is still needed. Yet, it remains a possible task, as long as all the involved stakeholders take the necessary steps to integrate privacy and data protection safeguards in the heart of big data, by design and by default.

preprint, 2015, arXiv

t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation

Microaggregation is a technique for disclosure limitation aimed at protecting the privacy of data subjects in microdata releases. It has been used as an alternative to generalization and suppression to generate $k$-anonymous data sets, where the identity of each subject is hidden within a group of $k$ subjects. Unlike generalization, microaggregation perturbs the data and this additional masking freedom allows improving data utility in several ways, such as increasing data granularity, reducing the impact of outliers and avoiding discretization of numerical data. $k$-Anonymity, on the other side, does not protect against attribute disclosure, which occurs if the variability of the confidential values in a group of $k$ subjects is too small. To address this issue, several refinements of $k$-anonymity have been proposed, among which $t$-closeness stands out as providing one of the strictest privacy guarantees. Existing algorithms to generate $t$-close data sets are based on generalization and suppression (they are extensions of $k$-anonymization algorithms based on the same principles). This paper proposes and shows how to use microaggregation to generate $k$-anonymous $t$-close data sets. The advantages of microaggregation are analyzed, and then several microaggregation algorithms for $k$-anonymous $t$-closeness are presented and empirically evaluated.
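To make the notion of microaggregation concrete, here is a minimal sketch of fixed-size microaggregation on a single numerical attribute. It is deliberately simpler than the k-anonymous t-closeness algorithms the abstract refers to and does not enforce t-closeness: the function name `microaggregate`, the grouping rule and the sample `ages` are assumptions for illustration.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k rank-neighbours."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    n = len(values)
    # Indices of the records, ordered by attribute value.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        # The last group absorbs the remainder so that no group has fewer than k members.
        end = n if n - start < 2 * k else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


# Hypothetical attribute values for seven subjects, grouped with k = 3.
ages = [23, 25, 31, 35, 36, 52, 58]
masked = microaggregate(ages, 3)
```

With seven values and `k = 3`, the sketch forms one group of three and one group of four, so every released value is shared by at least three subjects.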

preprint, 2015, arXiv

Utility-Preserving Differentially Private Data Releases Via Individual Ranking Microaggregation

Being able to release and exploit open data gathered in information systems is crucial for researchers, enterprises and the overall society. Yet, these data must be anonymized before release to protect the privacy of the subjects to whom the records relate. Differential privacy is a privacy model for anonymization that offers more robust privacy guarantees than previous models, such as $k$-anonymity and its extensions. However, it is often disregarded that the utility of differentially private outputs is quite limited, either because of the amount of noise that needs to be added to obtain them or because utility is only preserved for a restricted type and/or a limited number of queries. On the contrary, $k$-anonymity-like data releases make no assumptions on the uses of the protected data and, thus, do not restrict the number and type of doable analyses. Recently, some authors have proposed mechanisms to offer general-purpose differentially private data releases. This paper extends such works with a specific focus on the preservation of the utility of the protected data. Our proposal builds on microaggregation-based anonymization, which is more flexible and utility-preserving than alternative anonymization methods used in the literature, in order to reduce the amount of noise needed to satisfy differential privacy. In this way, we improve the utility of differentially private data releases. Moreover, the noise reduction we achieve does not depend on the size of the data set, but just on the number of attributes to be protected, which is a more desirable behavior for large data sets. The utility benefits brought by our proposal are empirically evaluated and compared with related works for several data sets and metrics.
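The link between aggregation and the amount of noise can be illustrated with the standard Laplace mechanism, whose noise scale is the query sensitivity divided by epsilon. The sketch below is a generic illustration, not the paper's algorithm. It rests on a simplifying assumption: changing one record changes only one value inside one group of `group_size` bounded values. All function names and numbers are hypothetical.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Scale of the Laplace noise used by the standard Laplace mechanism."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("sensitivity must be >= 0 and epsilon must be > 0")
    return sensitivity / epsilon


def mean_sensitivity(lower, upper, group_size):
    """Largest change in the mean of group_size values in [lower, upper] when one value changes."""
    return (upper - lower) / group_size


def sample_laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponential samples."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


# Hypothetical attribute bounded in [0, 100], released with epsilon = 1.
scale_single = laplace_scale(mean_sensitivity(0, 100, 1), 1.0)   # one raw value
scale_group = laplace_scale(mean_sensitivity(0, 100, 5), 1.0)    # mean of a group of 5

rng = random.Random(0)  # seeded only to make the sketch reproducible
noisy_centroid = 42.0 + sample_laplace(scale_group, rng)
```

Under the stated assumption, releasing the mean of a group of five values needs one fifth of the noise scale required for a single raw value.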

preprint, 2014, arXiv

Group Discounts Compatible with Buyer Privacy

We show how group discounts can be offered without forcing buyers to surrender their anonymity, as long as buyers can use their own computing devices (e.g. smartphone, tablet or computer) to perform a purchase. Specifically, we present a protocol for privacy-preserving group discounts. The protocol allows a group of buyers to prove how many they are without disclosing their identities. Coupled with an anonymous payment system, this makes group discounts compatible with buyer privacy (that is, buyer anonymity).

preprint, 2014, arXiv

Privacy-preserving Loyalty Programs

Loyalty programs are promoted by vendors to incentivize loyalty in buyers. Although such programs have become widespread, they have been criticized by business experts and consumer associations: loyalty results in profiling and hence in loss of privacy of consumers. We propose a protocol for privacy-preserving loyalty programs that allows vendors and consumers to enjoy the benefits of loyalty (returning customers and discounts, respectively), while allowing consumers to stay anonymous and empowering them to decide how much of their profile they reveal to the vendor. The vendor must offer additional reward if he wants to learn more details on the consumer's profile. Our protocol is based on partially blind signatures and generalization techniques, and provides anonymity to consumers and their purchases, while still allowing negotiated consumer profiling.

preprint, 2013, arXiv

Privacy-Preserving Trust Management Mechanisms from Private Matching Schemes

Cryptographic primitives are essential for constructing privacy-preserving communication mechanisms. There are situations in which two parties that do not know each other need to exchange sensitive information on the Internet. Trust management mechanisms make use of digital credentials and certificates in order to establish trust among these strangers. We address the problem of choosing which credentials are exchanged. During this process, each party should learn no information about the preferences of the other party other than strictly required for trust establishment. We present a method to reach an agreement on the credentials to be exchanged that preserves the privacy of the parties. Our method is based on secure two-party computation protocols for set intersection. Namely, it is constructed from private matching schemes.

preprint, 2012, arXiv

FuturICT - The Road towards Ethical ICT

The pervasive use of information and communication technology (ICT) in modern societies enables countless opportunities for individuals, institutions, businesses and scientists, but also raises difficult ethical and social problems. In particular, ICT helped to make societies more complex and thus harder to understand, which impedes social and political interventions to avoid harm and to increase the common good. To overcome this obstacle, the large-scale EU flagship proposal FuturICT intends to create a platform for accessing global human knowledge as a public good and instruments to increase our understanding of the information society by making use of ICT-based research. In this contribution, we outline the ethical justification for such an endeavor. We argue that the ethical issues raised by FuturICT research projects overlap substantially with many of the known ethical problems emerging from ICT use in general. By referring to the notion of Value Sensitive Design, we show for the example of privacy how this core value of responsible ICT can be protected in pursuing research in the framework of FuturICT. In addition, we discuss further ethical issues and outline the institutional design of FuturICT that allows them to be addressed.

preprint, 2012, arXiv

Marginality: a numerical mapping for enhanced treatment of nominal and hierarchical attributes

The purpose of statistical disclosure control (SDC) of microdata, a.k.a. data anonymization or privacy-preserving data mining, is to publish data sets containing the answers of individual respondents in such a way that the respondents corresponding to the released records cannot be re-identified and the released data are analytically useful. SDC methods are either based on masking the original data, generating synthetic versions of them or creating hybrid versions by combining original and synthetic data. The choice of SDC methods for categorical data, especially nominal data, is much smaller than the choice of methods for numerical data. We mitigate this problem by introducing a numerical mapping for hierarchical nominal data which allows computing means, variances and covariances on them.
