Source author record

N. Asokan

N. Asokan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Software Engineering Computation and Language Distributed, Parallel, and Cluster Computing Information Retrieval Operating Systems Social and Information Networks

Catalog footprint

What is connected

22works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

A User-centered Security Evaluation of Copilot

Code generation tools driven by artificial intelligence have recently become more popular due to advancements in deep learning and natural language processing that have increased their capabilities. The proliferation of these tools may be a double-edged sword because while they can increase developer productivity by making it easier to write code, research has shown that they can also generate insecure code. In this paper, we perform a user-centered evaluation GitHub's Copilot to better understand its strengths and weaknesses with respect to code security. We conduct a user study where participants solve programming problems (with and without Copilot assistance) that have potentially vulnerable solutions. The main goal of the user study is to determine how the use of Copilot affects participants' security performance. In our set of participants (n=25), we find that access to Copilot accompanies a more secure solution when tackling harder problems. For the easier problem, we observe no effect of Copilot access on the security of solutions. We also observe no disproportionate impact of Copilot use on particular kinds of vulnerabilities. Our results indicate that there are potential security benefits to using Copilot, but more research is warranted on the effects of the use of code generation tools on technically complex problems with security requirements.

preprint2024arXiv

Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?

Several advances in deep learning have been successfully applied to the software development process. Of recent interest is the use of neural language models to build tools, such as Copilot, that assist in writing code. In this paper we perform a comparative empirical analysis of Copilot-generated code from a security perspective. The aim of this study is to determine if Copilot is as bad as human developers. We investigate whether Copilot is just as likely to introduce the same software vulnerabilities as human developers. Using a dataset of C/C++ vulnerabilities, we prompt Copilot to generate suggestions in scenarios that led to the introduction of vulnerabilities by human developers. The suggestions are inspected and categorized in a 2-stage process based on whether the original vulnerability or fix is reintroduced. We find that Copilot replicates the original vulnerable code about 33% of the time while replicating the fixed code at a 25% rate. However this behaviour is not consistent: Copilot is more likely to introduce some types of vulnerabilities than others and is also more likely to generate vulnerable code in response to prompts that correspond to older vulnerabilities. Overall, given that in a significant number of cases it did not replicate the vulnerabilities previously introduced by human developers, we conclude that Copilot, despite performing differently across various vulnerability types, is not as bad as human developers at introducing vulnerabilities in code.

preprint2022arXiv

On the Effectiveness of Dataset Watermarking in Adversarial Settings

In a data-driven world, datasets constitute a significant economic value. Dataset owners who spend time and money to collect and curate the data are incentivized to ensure that their datasets are not used in ways that they did not authorize. When such misuse occurs, dataset owners need technical mechanisms for demonstrating their ownership of the dataset in question. Dataset watermarking provides one approach for ownership demonstration which can, in turn, deter unauthorized use. In this paper, we investigate a recently proposed data provenance method, radioactive data, to assess if it can be used to demonstrate ownership of (image) datasets used to train machine learning (ML) models. The original paper reported that radioactive data is effective in white-box settings. We show that while this is true for large datasets with many classes, it is not as effective for datasets where the number of classes is low $(\leq 30)$ or the number of samples per class is low $(\leq 500)$. We also show that, counter-intuitively, the black-box verification technique is effective for all datasets used in this paper, even when white-box verification is not. Given this observation, we show that the confidence in white-box verification can be improved by using watermarked samples directly during the verification process. We also highlight the need to assess the robustness of radioactive data if it were to be used for ownership demonstration since it is an adversarial setting unlike provenance identification. Compared to dataset watermarking, ML model watermarking has been explored more extensively in recent literature. However, most of the model watermarking techniques can be defeated via model extraction. We show that radioactive data can effectively survive model extraction attacks, which raises the possibility that it can be used for ML model ownership verification robust against model extraction.

preprint2022arXiv

SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning

Data used to train machine learning (ML) models can be sensitive. Membership inference attacks (MIAs), attempting to determine whether a particular data record was used to train an ML model, risk violating membership privacy. ML model builders need a principled definition of a metric to quantify the membership privacy risk of (a) individual training data records, (b) computed independently of specific MIAs, (c) which assesses susceptibility to different MIAs, (d) can be used for different applications, and (e) efficiently. None of the prior membership privacy risk metrics simultaneously meet all these requirements. We present SHAPr, a membership privacy metric based on Shapley values which is a leave-one-out (LOO) technique, originally intended to measure the contribution of a training data record on model utility. We conjecture that contribution to model utility can act as a proxy for memorization, and hence represent membership privacy risk. Using ten benchmark datasets, we show that SHAPr is indeed effective in estimating susceptibility of training data records to MIAs. We also show that, unlike prior work, SHAPr is significantly better in estimating susceptibility to newer, and more effective MIA. We apply SHAPr to evaluate the efficacy of several defenses against MIAs: using regularization and removing high risk training data records. Moreover, SHAPr is versatile: it can be used for estimating vulnerability of different subgroups to MIAs, and inherits applications of Shapley values (e.g., data valuation). We show that SHAPr has an acceptable computational cost (compared to naive LOO), varying from a few minutes for the smallest dataset to ~92 minutes for the largest dataset.

preprint2020arXiv

Effective writing style imitation via combinatorial paraphrasing

Stylometry can be used to profile or deanonymize authors against their will based on writing style. Style transfer provides a defence. Current techniques typically use either encoder-decoder architectures or rule-based algorithms. Crucially, style transfer must reliably retain original semantic content to be actually deployable. We conduct a multifaceted evaluation of three state-of-the-art encoder-decoder style transfer techniques, and show that all fail at semantic retainment. In particular, they do not produce appropriate paraphrases, but only retain original content in the trivial case of exactly reproducing the text. To mitigate this problem we propose ParChoice: a technique based on the combinatorial application of multiple paraphrasing algorithms. ParChoice strongly outperforms the encoder-decoder baselines in semantic retainment. Additionally, compared to baselines that achieve non-negligible semantic retainment, ParChoice has superior style transfer performance. We also apply ParChoice to multi-author style imitation (not considered by prior work), where we achieve up to 75% imitation success among five authors. Furthermore, when compared to two state-of-the-art rule-based style transfer techniques, ParChoice has markedly better semantic retainment. Combining ParChoice with the best performing rule-based baseline (Mutant-X) also reaches the highest style transfer success on the Brennan-Greenstadt and Extended-Brennan-Greenstadt corpora, with much less impact on original meaning than when using the rule-based baseline techniques alone. Finally, we highlight a critical problem that afflicts all current style transfer techniques: the adversary can use the same technique for thwarting style transfer via adversarial training. We show that adding randomness to style transfer helps to mitigate the effectiveness of adversarial training.

preprint2020arXiv

Extraction of Complex DNN Models: Real Threat or Boogeyman?

Recently, machine learning (ML) has introduced advanced solutions to many domains. Since ML models provide business advantage to model owners, protecting intellectual property of ML models has emerged as an important consideration. Confidentiality of ML models can be protected by exposing them to clients only via prediction APIs. However, model extraction attacks can steal the functionality of ML models using the information leaked to clients through the results returned via the API. In this work, we question whether model extraction is a serious threat to complex, real-life ML models. We evaluate the current state-of-the-art model extraction attack (Knockoff nets) against complex models. We reproduce and confirm the results in the original paper. But we also show that the performance of this attack can be limited by several factors, including ML model architecture and the granularity of API response. Furthermore, we introduce a defense based on distinguishing queries used for Knockoff nets from benign queries. Despite the limitations of the Knockoff nets, we show that a more realistic adversary can effectively steal complex ML models and evade known defenses.

preprint2018arXiv

S-FaaS: Trustworthy and Accountable Function-as-a-Service using Intel SGX

Function-as-a-Service (FaaS) is a recent and already very popular paradigm in cloud computing. The function provider need only specify the function to be run, usually in a high-level language like JavaScript, and the service provider orchestrates all the necessary infrastructure and software stacks. The function provider is only billed for the actual computational resources used by the function invocation. Compared to previous cloud paradigms, FaaS requires significantly more fine-grained resource measurement mechanisms, e.g. to measure compute time and memory usage of a single function invocation with sub-second accuracy. Thanks to the short duration and stateless nature of functions, and the availability of multiple open-source frameworks, FaaS enables non-traditional service providers e.g. individuals or data centers with spare capacity. However, this exacerbates the challenge of ensuring that resource consumption is measured accurately and reported reliably. It also raises the issues of ensuring computation is done correctly and minimizing the amount of information leaked to service providers. To address these challenges, we introduce S-FaaS, the first architecture and implementation of FaaS to provide strong security and accountability guarantees backed by Intel SGX. To match the dynamic event-driven nature of FaaS, our design introduces a new key distribution enclave and a novel transitive attestation protocol. A core contribution of S-FaaS is our set of resource measurement mechanisms that securely measure compute time inside an enclave, and actual memory allocations. We have integrated S-FaaS into the popular OpenWhisk FaaS framework. We evaluate the security of our architecture, the accuracy of our resource measurement mechanisms, and the performance of our implementation, showing that our resource measurement mechanisms add less than 6.3% latency on standardized benchmarks.

preprint2016arXiv

C-FLAT: Control-FLow ATtestation for Embedded Systems Software

Remote attestation is a crucial security service particularly relevant to increasingly popular IoT (and other embedded) devices. It allows a trusted party (verifier) to learn the state of a remote, and potentially malware-infected, device (prover). Most existing approaches are static in nature and only check whether benign software is initially loaded on the prover. However, they are vulnerable to run-time attacks that hijack the application's control or data flow, e.g., via return-oriented programming or data-oriented exploits. As a concrete step towards more comprehensive run-time remote attestation, we present the design and implementation of Control- FLow ATtestation (C-FLAT) that enables remote attestation of an application's control-flow path, without requiring the source code. We describe a full prototype implementation of C-FLAT on Raspberry Pi using its ARM TrustZone hardware security extensions. We evaluate C-FLAT's performance using a real-world embedded (cyber-physical) application, and demonstrate its efficacy against control-flow hijacking attacks.

preprint2016arXiv

Know Your Phish: Novel Techniques for Detecting Phishing Sites and their Targets

Phishing is a major problem on the Web. Despite the significant attention it has received over the years, there has been no definitive solution. While the state-of-the-art solutions have reasonably good performance, they require a large amount of training data and are not adept at detecting phishing attacks against new targets. In this paper, we begin with two core observations: (a) although phishers try to make a phishing webpage look similar to its target, they do not have unlimited freedom in structuring the phishing webpage; and (b) a webpage can be characterized by a small set of key terms; how these key terms are used in different parts of a webpage is different in the case of legitimate and phishing webpages. Based on these observations, we develop a phishing detection system with several notable properties: it is language-independent, can be implemented entirely on client-side, has excellent classification performance and is fast. In addition, we developed a target identification component that can identify the target website that a phishing webpage is attempting to mimic. The target detection component is faster than previously reported systems and can help minimize false positives in our phishing detection system.

preprint2016arXiv

OmniShare: Securely Accessing Encrypted Cloud Storage from Multiple Authorized Devices

Cloud storage services like Dropbox and Google Drive are widely used by individuals and businesses. Two attractive features of these services are 1) the automatic synchronization of files between multiple client devices and 2) the possibility to share files with other users. However, privacy of cloud data is a growing concern for both individuals and businesses. Encrypting data on the client-side before uploading it is an effective privacy safeguard, but it requires all client devices to have the decryption key. Current solutions derive these keys solely from user-chosen passwords, which have low entropy and are easily guessed. We present OmniShare, the first scheme to allow client-side encryption with high-entropy keys whilst providing an intuitive key distribution mechanism to enable access from multiple client devices. Instead of passwords, we use low bandwidth uni-directional out-of-band (OOB) channels, such as QR codes, to authenticate new devices. To complement these OOB channels, the cloud storage itself is used as a communication channel between devices in our protocols. We rely on a directory-based key hierarchy with individual file keys to limit the consequences of key compromise and allow efficient sharing of files without requiring re-encryption. OmniShare is open source software and currently available for Android and Windows with other platforms in development. We describe the design and implementation of OmniShare, and explain how we evaluated its security using formal methods, its performance via real-world benchmarks, and its usability through a cognitive walkthrough.

preprint2016arXiv

Pitfalls in Designing Zero-Effort Deauthentication: Opportunistic Human Observation Attacks

Deauthentication is an important component of any authentication system. The widespread use of computing devices in daily life has underscored the need for zero-effort deauthentication schemes. However, the quest for eliminating user effort may lead to hidden security flaws in the authentication schemes. As a case in point, we investigate a prominent zero-effort deauthentication scheme, called ZEBRA, which provides an interesting and a useful solution to a difficult problem as demonstrated in the original paper. We identify a subtle incorrect assumption in its adversary model that leads to a fundamental design flaw. We exploit this to break the scheme with a class of attacks that are much easier for a human to perform in a realistic adversary model, compared to the naïve attacks studied in the ZEBRA paper. For example, one of our main attacks, where the human attacker has to opportunistically mimic only the victim's keyboard typing activity at a nearby terminal, is significantly more successful compared to the naïve attack that requires mimicking keyboard and mouse activities as well as keyboard-mouse movements. Further, by understanding the design flaws in ZEBRA as cases of tainted input, we show that we can draw on well-understood design principles to improve ZEBRA's security.

preprint2016arXiv

Towards Fairness of Cryptocurrency Payments

Motivated by the great success and adoption of Bitcoin, a number of cryptocurrencies such as Litecoin, Dogecoin, and Ethereum are becoming increasingly popular. Although existing blockchain-based cryptocurrency schemes can ensure reasonable security for transactions, they do not consider any notion of fairness. Fair exchange allows two players to exchange digital "items", such as digital signatures, over insecure networks fairly, so that either each player gets the other's item, or neither player does. Given that blockchain participants typically do not trust each other, enabling fairness in existing cryptocurrencies is an essential but insufficiently explored problem. In this paper, we explore the solution space for enabling the fair exchange of a cryptocurrency payment for a receipt. We identify the timeliness of an exchange as an important property especially when one of the parties involved in the exchange is resource-constrained. We introduce the notion of strong timeliness for a fair exchange protocol and propose two fair payment-for-receipt protocol instantiations that leverage functionality of the blockchain to achieve strong timeliness. We implement both and compare their security and efficiency.

preprint2015arXiv

Characterizing SEAndroid Policies in the Wild

Starting from the 5.0 Lollipop release all Android processes must be run inside confined SEAndroid access control domains. As a result, Android device manufacturers were compelled to develop SEAndroid expertise in order to create policies for their device-specific components. In this paper we analyse SEAndroid policies from a number of 5.0 Lollipop devices on the market, and identify patterns of common problems we found. We also suggest some practical tools that can improve policy design and analysis. We implemented the first of such tools, SEAL.

preprint2015arXiv

How Far Removed Are You? Scalable Privacy-Preserving Estimation of Social Path Length with Social PaL

Social relationships are a natural basis on which humans make trust decisions. Online Social Networks (OSNs) are increasingly often used to let users base trust decisions on the existence and the strength of social relationships. While most OSNs allow users to discover the length of the social path to other users, they do so in a centralized way, thus requiring them to rely on the service provider and reveal their interest in each other. This paper presents Social PaL, a system supporting the privacy-preserving discovery of arbitrary-length social paths between any two social network users. We overcome the bootstrapping problem encountered in all related prior work, demonstrating that Social PaL allows its users to find all paths of length two and to discover a significant fraction of longer paths, even when only a small fraction of OSN users is in the Social PaL system - e.g., discovering 70% of all paths with only 40% of the users. We implement Social PaL using a scalable server-side architecture and a modular Android client library, allowing developers to seamlessly integrate it into their apps.

preprint2015arXiv

LookAhead: Augmenting Crowdsourced Website Reputation Systems With Predictive Modeling

Unsafe websites consist of malicious as well as inappropriate sites, such as those hosting questionable or offensive content. Website reputation systems are intended to help ordinary users steer away from these unsafe sites. However, the process of assigning safety ratings for websites typically involves humans. Consequently it is time consuming, costly and not scalable. This has resulted in two major problems: (i) a significant proportion of the web space remains unrated and (ii) there is an unacceptable time lag before new websites are rated. In this paper, we show that by leveraging structural and content-based properties of websites, it is possible to reliably and efficiently predict their safety ratings, thereby mitigating both problems. We demonstrate the effectiveness of our approach using four datasets of up to 90,000 websites. We use ratings from Web of Trust (WOT), a popular crowdsourced web reputation system, as ground truth. We propose a novel ensemble classification technique that makes opportunistic use of available structural and content properties of webpages to predict their eventual ratings in two dimensions used by WOT: trustworthiness and child safety. Ours is the first classification system to predict such subjective ratings and the same approach works equally well in identifying malicious websites. Across all datasets, our classification performs well with average F$_1$-score in the 74--90\% range.

preprint2015arXiv

On Making Emerging Trusted Execution Environments Accessible to Developers

New types of Trusted Execution Environment (TEE) architectures like TrustLite and Intel Software Guard Extensions (SGX) are emerging. They bring new features that can lead to innovative security and privacy solutions. But each new TEE environment comes with its own set of interfaces and programming paradigms, thus raising the barrier for entry for developers who want to make use of these TEEs. In this paper, we motivate the need for realizing standard TEE interfaces on such emerging TEE architectures and show that this exercise is not straightforward. We report on our on-going work in mapping GlobalPlatform standard interfaces to TrustLite and SGX.

preprint2015arXiv

Open-TEE - An Open Virtual Trusted Execution Environment

Hardware-based Trusted Execution Environments (TEEs) are widely deployed in mobile devices. Yet their use has been limited primarily to applications developed by the device vendors. Recent standardization of TEE interfaces by GlobalPlatform (GP) promises to partially address this problem by enabling GP-compliant trusted applications to run on TEEs from different vendors. Nevertheless ordinary developers wishing to develop trusted applications face significant challenges. Access to hardware TEE interfaces are difficult to obtain without support from vendors. Tools and software needed to develop and debug trusted applications may be expensive or non-existent. In this paper, we describe Open-TEE, a virtual, hardware-independent TEE implemented in software. Open-TEE conforms to GP specifications. It allows developers to develop and debug trusted applications with the same tools they use for developing software in general. Once a trusted application is fully debugged, it can be compiled for any actual hardware TEE. Through performance measurements and a user study we demonstrate that Open-TEE is efficient and easy to use. We have made Open- TEE freely available as open source.

preprint2014arXiv

Citizen Electronic Identities using TPM 2.0

Electronic Identification (eID) is becoming commonplace in several European countries. eID is typically used to authenticate to government e-services, but is also used for other services, such as public transit, e-banking, and physical security access control. Typical eID tokens take the form of physical smart cards, but successes in merging eID into phone operator SIM cards show that eID tokens integrated into a personal device can offer better usability compared to standalone tokens. At the same time, trusted hardware that enables secure storage and isolated processing of sensitive data have become commonplace both on PC platforms as well as mobile devices. Some time ago, the Trusted Computing Group (TCG) released the version 2.0 of the Trusted Platform Module (TPM) specification. We propose an eID architecture based on the new, rich authorization model introduced in the TCGs TPM 2.0. The goal of the design is to improve the overall security and usability compared to traditional smart card-based solutions. We also provide, to the best our knowledge, the first accessible description of the TPM 2.0 authorization model.

preprint2014arXiv

ConXsense - Automated Context Classification for Context-Aware Access Control

We present ConXsense, the first framework for context-aware access control on mobile devices based on context classification. Previous context-aware access control systems often require users to laboriously specify detailed policies or they rely on pre-defined policies not adequately reflecting the true preferences of users. We present the design and implementation of a context-aware framework that uses a probabilistic approach to overcome these deficiencies. The framework utilizes context sensing and machine learning to automatically classify contexts according to their security and privacy-related properties. We apply the framework to two important smartphone-related use cases: protection against device misuse using a dynamic device lock and protection against sensory malware. We ground our analysis on a sociological survey examining the perceptions and concerns of users related to contextual smartphone security and analyze the effectiveness of our approach with real-world context data. We also demonstrate the integration of our framework with the FlaskDroid architecture for fine-grained access control enforcement on the Android platform.

preprint2014arXiv

Security of OS-level virtualization technologies: Technical report

The need for flexible, low-overhead virtualization is evident on many fronts ranging from high-density cloud servers to mobile devices. During the past decade OS-level virtualization has emerged as a new, efficient approach for virtualization, with implementations in multiple different Unix-based systems. Despite its popularity, there has been no systematic study of OS-level virtualization from the point of view of security. In this report, we conduct a comparative study of several OS-level virtualization systems, discuss their security and identify some gaps in current solutions.

preprint2014arXiv

The Company You Keep: Mobile Malware Infection Rates and Inexpensive Risk Indicators

There is little information from independent sources in the public domain about mobile malware infection rates. The only previous independent estimate (0.0009%) [12], was based on indirect measurements obtained from domain name resolution traces. In this paper, we present the first independent study of malware infection rates and associated risk factors using data collected directly from over 55,000 Android devices. We find that the malware infection rates in Android devices estimated using two malware datasets (0.28% and 0.26%), though small, are significantly higher than the previous independent estimate. Using our datasets, we investigate how indicators extracted inexpensively from the devices correlate with malware infection. Based on the hypothesis that some application stores have a greater density of malicious applications and that advertising within applications and cross-promotional deals may act as infection vectors, we investigate whether the set of applications used on a device can serve as an indicator for infection of that device. Our analysis indicates that this alone is not an accurate indicator for pinpointing infection. However, it is a very inexpensive but surprisingly useful way for significantly narrowing down the pool of devices on which expensive monitoring and analysis mechanisms must be deployed. Using our two malware datasets we show that this indicator performs 4.8 and 4.6 times (respectively) better at identifying infected devices than the baseline of random checks. Such indicators can be used, for example, in the search for new or previously undetected malware. It is therefore a technique that can complement standard malware scanning by anti-malware tools. Our analysis also demonstrates a marginally significant difference in battery use between infected and clean devices.

preprint2013arXiv

PeerShare: A System Secure Distribution of Sensitive Data Among Social Contacts

We present the design and implementation of the PeerShare, a system that can be used by applications to securely distribute sensitive data to social contacts of a user. PeerShare incorporates a generic framework that allows different applications to distribute data with different security requirements. By using interfaces available from existing popular social networks. PeerShare is designed to be easy to use for both end users as well as developers of applications. PeerShare can be used to distribute shared keys, public keys and any other data that need to be distributed with authenticity and confidentiality guarantees to an authorized set of recipients, specified in terms of social relationships. We have used \peershare already in three different applications and plan to make it available for developers.

N. Asokan

What is connected

Connect this record

See the researcher in context

Building this map preview

22 published item(s)

A User-centered Security Evaluation of Copilot

Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?

On the Effectiveness of Dataset Watermarking in Adversarial Settings

SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning

Effective writing style imitation via combinatorial paraphrasing

Extraction of Complex DNN Models: Real Threat or Boogeyman?

S-FaaS: Trustworthy and Accountable Function-as-a-Service using Intel SGX

C-FLAT: Control-FLow ATtestation for Embedded Systems Software

Know Your Phish: Novel Techniques for Detecting Phishing Sites and their Targets

OmniShare: Securely Accessing Encrypted Cloud Storage from Multiple Authorized Devices

Pitfalls in Designing Zero-Effort Deauthentication: Opportunistic Human Observation Attacks

Towards Fairness of Cryptocurrency Payments

Characterizing SEAndroid Policies in the Wild

How Far Removed Are You? Scalable Privacy-Preserving Estimation of Social Path Length with Social PaL

LookAhead: Augmenting Crowdsourced Website Reputation Systems With Predictive Modeling

On Making Emerging Trusted Execution Environments Accessible to Developers

Open-TEE - An Open Virtual Trusted Execution Environment

Citizen Electronic Identities using TPM 2.0

ConXsense - Automated Context Classification for Context-Aware Access Control

Security of OS-level virtualization technologies: Technical report

The Company You Keep: Mobile Malware Infection Rates and Inexpensive Risk Indicators

PeerShare: A System Secure Distribution of Sensitive Data Among Social Contacts