Source author record

Vern Paxson

Vern Paxson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Networking and Internet Architecture Cryptography and Security Computation and Language

Catalog footprint

What is connected

6works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2017arXiv

Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate data from four different forums. Each of these forums constitutes its own "fine-grained domain" in that the forums cover different market sectors with different properties, even though all forums are in the broad domain of cybercrime. We characterize these domain differences in the context of a learning-based system: supervised models see decreased accuracy when applied to new forums, and standard techniques for semi-supervised learning and domain adaptation have limited effectiveness on this data, which suggests the need to improve these techniques. We release a dataset of 1,938 annotated posts from across the four forums.

preprint2016arXiv

A Multi-perspective Analysis of Carrier-Grade NAT Deployment

As ISPs face IPv4 address scarcity they increasingly turn to network address translation (NAT) to accommodate the address needs of their customers. Recently, ISPs have moved beyond employing NATs only directly at individual customers and instead begun deploying Carrier-Grade NATs (CGNs) to apply address translation to many independent and disparate endpoints spanning physical locations, a phenomenon that so far has received little in the way of empirical assessment. In this work we present a broad and systematic study of the deployment and behavior of these middleboxes. We develop a methodology to detect the existence of hosts behind CGNs by extracting non-routable IP addresses from peer lists we obtain by crawling the BitTorrent DHT. We complement this approach with improvements to our Netalyzr troubleshooting service, enabling us to determine a range of indicators of CGN presence as well as detailed insights into key properties of CGNs. Combining the two data sources we illustrate the scope of CGN deployment on today's Internet, and report on characteristics of commonly deployed CGNs and their effect on end users.

preprint2016arXiv

Haystack: A Multi-Purpose Mobile Vantage Point in User Space

Despite our growing reliance on mobile phones for a wide range of daily tasks, their operation remains largely opaque. A number of previous studies have addressed elements of this problem in a partial fashion, trading off analytic comprehensiveness and deployment scale. We overcome the barriers to large-scale deployment (e.g., requiring rooted devices) and comprehensiveness of previous efforts by taking a novel approach that leverages the VPN API on mobile devices to design Haystack, an in-situ mobile measurement platform that operates exclusively on the device, providing full access to the device's network traffic and local context without requiring root access. We present the design of Haystack and its implementation in an Android app that we deploy via standard distribution channels. Using data collected from 450 users of the app, we exemplify the advantages of Haystack over the state of the art and demonstrate its seamless experience even under demanding conditions. We also demonstrate its utility to users and researchers in characterizing mobile traffic and privacy risks.

preprint2015arXiv

A Primer on IPv4 Scarcity

With the ongoing exhaustion of free address pools at the registries serving the global demand for IPv4 address space, scarcity has become reality. Networks in need of address space can no longer get more address allocations from their respective registries. In this work we frame the fundamentals of the IPv4 address exhaustion phenomena and connected issues. We elaborate on how the current ecosystem of IPv4 address space has evolved since the standardization of IPv4, leading to the rather complex and opaque scenario we face today. We outline the evolution in address space management as well as address space use patterns, identifying key factors of the scarcity issues. We characterize the possible solution space to overcome these issues and open the perspective of address blocks as virtual resources, which involves issues such as differentiation between address blocks, the need for resource certification, and issues arising when transferring address space between networks.

preprint2015arXiv

Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners

The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system. Prior works have all assumed the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios, there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions on prior single data owner systems.

preprint2014arXiv

On Modeling the Costs of Censorship

We argue that the evaluation of censorship evasion tools should depend upon economic models of censorship. We illustrate our position with a simple model of the costs of censorship. We show how this model makes suggestions for how to evade censorship. In particular, from it, we develop evaluation criteria. We examine how our criteria compare to the traditional methods of evaluation employed in prior works.