Source author record

Nick Feamster

Nick Feamster appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Networking and Internet Architecture Cryptography and Security cs.CY Human-Computer Interaction Machine Learning

Catalog footprint

What is connected

18works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Towards Reproducible Network Traffic Analysis

Analysis techniques are critical for gaining insight into network traffic given both the higher proportion of encrypted traffic and increasing data rates. Unfortunately, the domain of network traffic analysis suffers from a lack of standardization, leading to incomparable results and barriers to reproducibility. Unlike other disciplines, no standard dataset format exists, forcing researchers and practitioners to create bespoke analysis pipelines for each individual task. Without standardization researchers cannot compare "apples-to-apples", preventing us from knowing with certainty if a new technique represents a methodological advancement or if it simply benefits from a different interpretation of a given dataset. In this work, we examine irreproducibility that arises from the lack of standardization in network traffic analysis. First, we study the literature, highlighting evidence of irreproducible research based on different interpretations of popular public datasets. Next, we investigate the underlying issues that have lead to the status quo and prevent reproducible research. Third, we outline the standardization requirements that any solution aiming to fix reproducibility issues must address. We then introduce pcapML, an open source system which increases reproducibility of network traffic analysis research by enabling metadata information to be directly encoded into raw traffic captures in a generic manner. Finally, we use the standardization pcapML provides to create the pcapML benchmarks, an open source leaderboard website and repository built to track the progress of network traffic analysis methods.

preprint2022arXiv

You, Me, and IoT: How Internet-Connected Consumer Devices Affect Interpersonal Relationships

Internet-connected consumer devices have rapidly increased in popularity; however, relatively little is known about how these technologies are affecting interpersonal relationships in multi-occupant households. In this study, we conduct 13 semi-structured interviews and survey 508 individuals from a variety of backgrounds to discover and categorize how consumer IoT devices are affecting interpersonal relationships in the United States. We highlight several themes, providing exploratory data about the pervasiveness of interpersonal costs and benefits of consumer IoT devices. These results inform follow-up studies and design priorities for future IoT technologies to amplify positive and reduce negative interpersonal effects.

preprint2021arXiv

Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic

Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.

preprint2021arXiv

Understanding How and Why University Students Use Virtual Private Networks

We study how and why university students chose and use VPNs, and whether they are aware of the security and privacy risks that VPNs pose. To answer these questions, we conducted 32 in-person interviews and a survey with 349 respondents, all university students in the United States. We find students are mostly concerned with access to content and privacy concerns were often secondary. They made tradeoffs to achieve a particular goal, such as using a free commercial VPN that may collect their online activities to access an online service in a geographic area. Many users expected that their VPNs were collecting data about them, although they did not understand how VPNs work. We conclude with a discussion of ways to help users make choices about VPNs.

preprint2020arXiv

Characterizing Service Provider Response to the COVID-19 Pandemic in the United States

The COVID-19 pandemic has resulted in dramatic changes to the daily habits of billions of people. Users increasingly have to rely on home broadband Internet access for work, education, and other activities. These changes have resulted in corresponding changes to Internet traffic patterns. This paper aims to characterize the effects of these changes with respect to Internet service providers in the United States. We study three questions: (1)How did traffic demands change in the United States as a result of the COVID-19 pandemic?; (2)What effects have these changes had on Internet performance?; (3)How did service providers respond to these changes? We study these questions using data from a diverse collection of sources. Our analysis of interconnection data for two large ISPs in the United States shows a 30-60% increase in peak traffic rates in the first quarter of 2020. In particular, we observe traffic downstream peak volumes for a major ISP increase of 13-20% while upstream peaks increased by more than 30%. Further, we observe significant variation in performance across ISPs in conjunction with the traffic volume shifts, with evident latency increases after stay-at-home orders were issued, followed by a stabilization of traffic after April. Finally, we observe that in response to changes in usage, ISPs have aggressively augmented capacity at interconnects, at more than twice the rate of normal capacity augmentation. Similarly, video conferencing applications have increased their network footprint, more than doubling their advertised IP address space.

preprint2020arXiv

Classifying Network Vendors at Internet Scale

In this paper, we develop a method to create a large, labeled dataset of visible network device vendors across the Internet by mapping network-visible IP addresses to device vendors. We use Internet-wide scanning, banner grabs of network-visible devices across the IPv4 address space, and clustering techniques to assign labels to more than 160,000 devices. We subsequently probe these devices and use features extracted from the responses to train a classifier that can accurately classify device vendors. Finally, we demonstrate how this method can be used to understand broader trends across the Internet by predicting device vendors in traceroutes from CAIDA's Archipelago measurement system and subsequently examining vendor distributions across these traceroutes.

preprint2020arXiv

Comparing the Effects of DNS, DoT, and DoH on Web Performance

Nearly every service on the Internet relies on the Domain Name System (DNS), which translates a human-readable name to an IP address before two endpoints can communicate. Today, DNS traffic is unencrypted, leaving users vulnerable to eavesdropping and tampering. Past work has demonstrated that DNS queries can reveal a user's browsing history and even what smart devices they are using at home. In response to these privacy concerns, two new protocols have been proposed: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). Instead of sending DNS queries and responses in the clear, DoH and DoT establish encrypted connections between users and resolvers. By doing so, these protocols provide privacy and security guarantees that traditional DNS (Do53) lacks. In this paper, we measure the effect of Do53, DoT, and DoH on query response times and page load times from five global vantage points. We find that although DoH and DoT response times are generally higher than Do53, both protocols can perform better than Do53 in terms of page load times. However, as throughput decreases and substantial packet loss and latency are introduced, web pages load fastest with Do53. Additionally, web pages successfully load more often with Do53 and DoT than DoH. Based on these results, we provide several recommendations to improve DNS performance, such as opportunistic partial responses and wire format caching.

preprint2019arXiv

Evaluating the Contextual Integrity of Privacy Regulation: Parents' IoT Toy Privacy Norms Versus COPPA

Increased concern about data privacy has prompted new and updated data protection regulations worldwide. However, there has been no rigorous way to test whether the practices mandated by these regulations actually align with the privacy norms of affected populations. Here, we demonstrate that surveys based on the theory of contextual integrity provide a quantifiable and scalable method for measuring the conformity of specific regulatory provisions to privacy norms. We apply this method to the U.S. Children's Online Privacy Protection Act (COPPA), surveying 195 parents and providing the first data that COPPA's mandates generally align with parents' privacy expectations for Internet-connected "smart" children's toys. Nevertheless, variations in the acceptability of data collection across specific smart toys, information types, parent ages, and other conditions emphasize the importance of detailed contextual factors to privacy norms, which may not be adequately captured by COPPA.

preprint2019arXiv

IoT Inspector: Crowdsourcing Labeled Network Traffic from Smart Home Devices at Scale

The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices from within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Since April 2019, 4,322 users have installed IoT Inspector, allowing us to collect labeled network traffic from 44,956 smart home devices across 13 categories and 53 vendors. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors use outdated TLS versions and advertise weak ciphers. Second, we discover about 350 distinct third-party advertiser and tracking domains on smart TVs. We also highlight other research areas, such as network management and healthcare, that can take advantage of IoT Inspector's dataset. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.

preprint2019arXiv

Keeping the Smart Home Private with Smart(er) IoT Traffic Shaping

The proliferation of smart home Internet of Things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home devices even when the devices use end-to-end transport-layer encryption. We evaluate common approaches for defending against these types of traffic analysis attacks, including firewalls, virtual private networks, and independent link padding, and find that none sufficiently conceal user activities with reasonable data overhead. We develop a new defense, "stochastic traffic padding" (STP), that makes it difficult for a passive network adversary to reliably distinguish genuine user activities from generated traffic patterns designed to look like user interactions. Our analysis provides a theoretical bound on an adversary's ability to accurately detect genuine user activities as a function of the amount of additional cover traffic generated by the defense technique.

preprint2018arXiv

Oblivious DNS: Practical Privacy for DNS Queries

Virtually every Internet communication typically involves a Domain Name System (DNS) lookup for the destination server that the client wants to communicate with. Operators of DNS recursive resolvers---the machines that receive a client's query for a domain name and resolve it to a corresponding IP address---can learn significant information about client activity. Past work, for example, indicates that DNS queries reveal information ranging from web browsing activity to the types of devices that a user has in their home. Recognizing the privacy vulnerabilities associated with DNS queries, various third parties have created alternate DNS services that obscure a user's DNS queries from his or her Internet service provider. Yet, these systems merely transfer trust to a different third party. We argue that no single party ought to be able to associate DNS queries with a client IP address that issues those queries. To this end, we present Oblivious DNS (ODNS), which introduces an additional layer of obfuscation between clients and their queries. To do so, ODNS uses its own authoritative namespace; the authoritative servers for the ODNS namespace act as recursive resolvers for the DNS queries that they receive, but they never see the IP addresses for the clients that initiated these queries. We present an initial deployment of ODNS; our experiments show that ODNS introduces minimal performance overhead, both for individual queries and for web page loads. We design ODNS to be compatible with existing DNS protocols and infrastructure, and we are actively working on an open standard with the IETF.

preprint2016arXiv

Characterizing and Avoiding Routing Detours Through Surveillance States

An increasing number of countries are passing laws that facilitate the mass surveillance of Internet traffic. In response, governments and citizens are increasingly paying attention to the countries that their Internet traffic traverses. In some cases, countries are taking extreme steps, such as building new Internet Exchange Points (IXPs), which allow networks to interconnect directly, and encouraging local interconnection to keep local traffic local. We find that although many of these efforts are extensive, they are often futile, due to the inherent lack of hosting and route diversity for many popular sites. By measuring the country-level paths to popular domains, we characterize transnational routing detours. We find that traffic is traversing known surveillance states, even when the traffic originates and ends in a country that does not conduct mass surveillance. Then, we investigate how clients can use overlay network relays and the open DNS resolver infrastructure to prevent their traffic from traversing certain jurisdictions. We find that 84\% of paths originating in Brazil traverse the United States, but when relays are used for country avoidance, only 37\% of Brazilian paths traverse the United States. Using the open DNS resolver infrastructure allows Kenyan clients to avoid the United States on 17\% more paths. Unfortunately, we find that some of the more prominent surveillance states (e.g., the U.S.) are also some of the least avoidable countries.

preprint2016arXiv

Identifying and characterizing Sybils in the Tor network

Being a volunteer-run, distributed anonymity network, Tor is vulnerable to Sybil attacks. Little is known about real-world Sybils in the Tor network, and we lack practical tools and methods to expose Sybil attacks. In this work, we develop sybilhunter, the first system for detecting Sybil relays based on their appearance, such as configuration; and behavior, such as uptime sequences. We used sybilhunter's diverse analysis techniques to analyze nine years of archived Tor network data, providing us with new insights into the operation of real-world attackers. Our findings include diverse Sybils, ranging from botnets, to academic research, and relays that hijack Bitcoin transactions. Our work shows that existing Sybil defenses do not apply to Tor, it delivers insights into real-world attacks, and provides practical tools to uncover and characterize Sybils, making the network safer for its users.

preprint2016arXiv

Revealing Utilization at Internet Interconnection Points

Recent Internet interconnection disputes have sparked an in- creased interest in developing methods for gathering and collecting data about utilization at interconnection points. One mechanism, developed by DeepField Networks, allows Internet service providers (ISPs) to gather and aggregate utilization information using network flow statistics, standardized in the Internet Engineering Task Force as IPFIX. This report (1) provides an overview of the method that DeepField Networks is using to measure the utilization of various interconnection links between content providers and ISPs or links over which traffic between content and ISPs flow; and (2) surveys the findings from five months of Internet utilization data provided by seven participating ISPs---Bright House Networks, Comcast, Cox, Mediacom, Midco, Suddenlink, and Time Warner Cable---whose access networks represent about 50% of all U.S. broadband subscribers. The dataset includes about 97% of the paid peering, settlement-free peering, and ISP-paid transit links of each of the participating ISPs. Initial analysis of the data---which comprises more than 1,000 link groups, representing the diverse and substitutable available routes---suggests that many interconnects have significant spare capacity, that this spare capacity exists both across ISPs in each region and in aggregate for any individual ISP, and that the aggregate utilization across interconnects interconnects is roughly 50% during peak periods.

preprint2016arXiv

Staggercast: Demand-Side Management for ISPs

The continuing expansion of Internet media consumption has increased traffic volumes, and hence congestion, on access links. In response, both mobile and wireline ISPs must either increase capacity or perform traffic engineering over existing resources. Unfortunately, provisioning timescales are long, the process is costly, and single-homing means operators cannot balance across the last mile. Inspired by energy and transport networks, we propose demand-side management of users to reduce the impact caused by consumption patterns out-pacing that of edge network provision. By directly affecting user behaviour through a range of incentives, our techniques enable resource management over shorter timescales than is possible in conventional networks. Using survey data from 100 participants we explore the feasibility of introducing the principles of demand-side management in today's networks.

preprint2016arXiv

The Effect of DNS on Tor's Anonymity

Previous attacks that link the sender and receiver of traffic in the Tor network ("correlation attacks") have generally relied on analyzing traffic from TCP connections. The TCP connections of a typical client application, however, are often accompanied by DNS requests and responses. This additional traffic presents more opportunities for correlation attacks. This paper quantifies how DNS traffic can make Tor users more vulnerable to correlation attacks. We investigate how incorporating DNS traffic can make existing correlation attacks more powerful and how DNS lookups can leak information to third parties about anonymous communication. We (i) develop a method to identify the DNS resolvers of Tor exit relays; (ii) develop a new set of correlation attacks (DefecTor attacks) that incorporate DNS traffic to improve precision; (iii) analyze the Internet-scale effects of these new attacks on Tor users; and (iv) develop improved methods to evaluate correlation attacks. First, we find that there exist adversaries who can mount DefecTor attacks: for example, Google's DNS resolver observes almost 40% of all DNS requests exiting the Tor network. We also find that DNS requests often traverse ASes that the corresponding TCP connections do not transit, enabling additional ASes to gain information about Tor users' traffic. We then show that an adversary who can mount a DefecTor attack can often determine the website that a Tor user is visiting with perfect precision, particularly for less popular websites where the set of DNS names associated with that website may be unique to the site. We also use the Tor Path Simulator (TorPS) in combination with traceroute data from vantage points co-located with Tor exit relays to estimate the power of AS-level adversaries who might mount DefecTor attacks in practice.

preprint2015arXiv

Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests

Despite the pervasiveness of Internet censorship, we have scant data on its extent, mechanisms, and evolution. Measuring censorship is challenging: it requires continual measurement of reachability to many target sites from diverse vantage points. Amassing suitable vantage points for longitudinal measurement is difficult; existing systems have achieved only small, short-lived deployments. We observe, however, that most Internet users access content via Web browsers, and the very nature of Web site design allows browsers to make requests to domains with different origins than the main Web page. We present Encore, a system that harnesses cross-origin requests to measure Web filtering from a diverse set of vantage points without requiring users to install custom software, enabling longitudinal measurements from many vantage points. We explain how Encore induces Web clients to perform cross-origin requests that measure Web filtering, design a distributed platform for scheduling and collecting these measurements, show the feasibility of a global-scale deployment with a pilot study and an analysis of potentially censored Web content, identify several cases of filtering in six months of measurements, and discuss ethical concerns that would arise with widespread deployment.

preprint2011arXiv

Modeling Tiered Pricing in the Internet Transit Market

ISPs are increasingly selling "tiered" contracts, which offer Internet connectivity to wholesale customers in bundles, at rates based on the cost of the links that the traffic in the bundle is traversing. Although providers have already begun to implement and deploy tiered pricing contracts, little is known about how such pricing affects ISPs and their customers. While contracts that sell connectivity on finer granularities improve market efficiency, they are also more costly for ISPs to implement and more difficult for customers to understand. In this work we present two contributions: (1) we develop a novel way of mapping traffic and topology data to a demand and cost model; and (2) we fit this model on three large real-world networks: an European transit ISP, a content distribution network, and an academic research network, and run counterfactuals to evaluate the effects of different pricing strategies on both the ISP profit and the consumer surplus. We highlight three core findings. First, ISPs gain most of the profits with only three or four pricing tiers and likely have little incentive to increase granularity of pricing even further. Second, we show that consumer surplus follows closely, if not precisely, the increases in ISP profit with more pricing tiers. Finally, the common ISP practice of structuring tiered contracts according to the cost of carrying the traffic flows (e.g., offering a discount for traffic that is local) can be suboptimal and that dividing contracts based on both traffic demand and the cost of carrying it into only three or four tiers yields near-optimal profit for the ISP.

Nick Feamster

What is connected

Connect this record

See the researcher in context

Building this map preview

18 published item(s)

Towards Reproducible Network Traffic Analysis

You, Me, and IoT: How Internet-Connected Consumer Devices Affect Interpersonal Relationships

Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic

Understanding How and Why University Students Use Virtual Private Networks

Characterizing Service Provider Response to the COVID-19 Pandemic in the United States

Classifying Network Vendors at Internet Scale

Comparing the Effects of DNS, DoT, and DoH on Web Performance

Evaluating the Contextual Integrity of Privacy Regulation: Parents' IoT Toy Privacy Norms Versus COPPA

IoT Inspector: Crowdsourcing Labeled Network Traffic from Smart Home Devices at Scale

Keeping the Smart Home Private with Smart(er) IoT Traffic Shaping

Oblivious DNS: Practical Privacy for DNS Queries

Characterizing and Avoiding Routing Detours Through Surveillance States

Identifying and characterizing Sybils in the Tor network

Revealing Utilization at Internet Interconnection Points

Staggercast: Demand-Side Management for ISPs

The Effect of DNS on Tor's Anonymity

Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests

Modeling Tiered Pricing in the Internet Transit Market