Researcher profile

Stephanie Forrest

Stephanie Forrest contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Cutting Through the Noise to Infer Autonomous System Topology

The Border Gateway Protocol (BGP) is a distributed protocol that manages interdomain routing without requiring a centralized record of which autonomous systems (ASes) connect to which others. Many methods have been devised to infer the AS topology from publicly available BGP data, but none provide a general way to handle the fact that the data are notoriously incomplete and subject to error. This paper describes a method for reliably inferring AS-level connectivity in the presence of measurement error using Bayesian statistical inference acting on BGP routing tables from multiple vantage points. We employ a novel approach for counting AS adjacency observations in the AS-PATH attribute data from public route collectors, along with a Bayesian algorithm to generate a statistical estimate of the AS-level network. Our approach also gives us a way to evaluate the accuracy of existing reconstruction methods and to identify advantageous locations for new route collectors or vantage points.

preprint2022arXiv

Understanding the Power of Evolutionary Computation for GPU Code Optimization

Achieving high performance for GPU codes requires developers to have significant knowledge in parallel programming and GPU architectures, and in-depth understanding of the application. This combination makes it challenging to find performance optimizations for GPU-based applications, especially in scientific computing. This paper shows that significant speedups can be achieved on two quite different scientific workloads using the tool, GEVO, to improve performance over human-optimized GPU code. GEVO uses evolutionary computation to find code edits that improve the runtime of a multiple sequence alignment kernel and a SARS-CoV-2 simulation by 28.9% and 29% respectively. Further, when GEVO begins with an early, unoptimized version of the sequence alignment program, it finds an impressive 30 times speedup -- a performance improvement similar to that of the hand-tuned version. This work presents an in-depth analysis of the discovered optimizations, revealing that the primary sources of improvement vary across applications; that most of the optimizations generalize across GPU architectures; and that several of the most important optimizations involve significant code interdependencies. The results showcase the potential of automated program optimization tools to help reduce the optimization burden for scientific computing developers and enhance performance portability for domain-specific accelerators.

preprint2021arXiv

MIMOSA: Reducing Malware Analysis Overhead with Coverings

There is a growing body of malware samples that evade automated analysis and detection tools. Malware may measure fingerprints ("artifacts") of the underlying analysis tool or environment and change their behavior when artifacts are detected. While analysis tools can mitigate artifacts to reduce exposure, such concealment is expensive. However, not every sample checks for every type of artifact-analysis efficiency can be improved by mitigating only those artifacts most likely to be used by a sample. Using that insight, we propose MIMOSA, a system that identifies a small set of "covering" tool configurations that collectively defeat most malware samples with increased efficiency. MIMOSA identifies a set of tool configurations that maximize analysis throughput and detection accuracy while minimizing manual effort, enabling scalable automation to analyze stealthy malware. We evaluate our approach against a benchmark of 1535 labeled stealthy malware samples. Our approach increases analysis throughput over state of the art on over 95% of these samples. We also investigate cost-benefit tradeoffs between the fraction of successfully-analyzed samples and computing resources required. MIMOSA provides a practical, tunable method for efficiently deploying analysis resources.

preprint2020arXiv

GEVO: GPU Code Optimization using Evolutionary Computation

GPUs are a key enabler of the revolution in machine learning and high performance computing, functioning as de facto co-processors to accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have also become accessible to programmers, who may lack detailed knowledge of the underlying architecture and fail to fully leverage the GPU's computation power. GEVO (Gpu optimization using EVOlutionary computation) is a tool for automatically discovering optimization opportunities and tuning the performance of GPU kernels in the LLVM representation. GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR and improves performance on desired criteria while retaining required functionality. We demonstrate that GEVO improves the execution time of the GPU programs in the Rodinia benchmark suite and the machine learning models, SVM and ResNet18, on NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves GPU kernel runtime performance by an average of 49.48% and by as much as 412% over the fully compiler-optimized baseline. If kernel output accuracy is relaxed to tolerate up to 1% error, GEVO can find kernel variants that outperform the baseline version by an average of 51.08%. For the machine learning workloads, GEVO achieves kernel performance improvement for SVM on the MNIST handwriting recognition (3.24X) and the a9a income prediction (2.93X) datasets with no loss of model accuracy. GEVO achieves 1.79X kernel performance improvement on image classification using ResNet18/CIFAR-10, with less than 1% model accuracy reduction.

preprint2012arXiv

Beyond the Blacklist: Modeling Malware Spread and the Effect of Interventions

Malware spread among websites and between websites and clients is an increasing problem. Search engines play an important role in directing users to websites and are a natural control point for intervening, using mechanisms such as blacklisting. The paper presents a simple Markov model of malware spread through large populations of websites and studies the effect of two interventions that might be deployed by a search provider: blacklisting infected web pages by removing them from search results entirely and a generalization of blacklisting, called depreferencing, in which a website's ranking is decreased by a fixed percentage each time period the site remains infected. We analyze and study the trade-offs between infection exposure and traffic loss due to false positives (the cost to a website that is incorrectly blacklisted) for different interventions. As expected, we find that interventions are most effective when websites are slow to remove infections. Surprisingly, we also find that low infection or recovery rates can increase traffic loss due to false positives. Our analysis also shows that heavy-tailed distributions of website popularity, as documented in many studies, leads to high sample variance of all measured outcomes. These result implies that it will be difficult to determine empirically whether certain website interventions are effective, and it suggests that theoretical models such as the one described in this paper have an important role to play in improving web security.

preprint2012arXiv

Internet Topology over Time

There are few studies that look closely at how the topology of the Internet evolves over time; most focus on snapshots taken at a particular point in time. In this paper, we investigate the evolution of the topology of the Autonomous Systems graph of the Internet, examining how eight commonly-used topological measures change from January 2002 to January 2010. We find that the distributions of most of the measures remain unchanged, except for average path length and clustering coefficient. The average path length has slowly and steadily increased since 2005 and the average clustering coefficient has steadily declined. We hypothesize that these changes are due to changes in peering policies as the Internet evolves. We also investigate a surprising feature, namely that the maximum degree has changed little, an aspect that cannot be captured without modeling link deletion. Our results suggest that evaluating models of the Internet graph by comparing steady-state generated topologies to snapshots of the real data is reasonable for many measures. However, accurately matching time-variant properties is more difficult, as we demonstrate by evaluating ten well-known models against the 2010 data.

preprint2012arXiv

Modeling Internet-Scale Policies for Cleaning up Malware

An emerging consensus among policy makers is that interventions undertaken by Internet Service Providers are the best way to counter the rising incidence of malware. However, assessing the suitability of countermeasures at this scale is hard. In this paper, we use an agent-based model, called ASIM, to investigate the impact of policy interventions at the Autonomous System level of the Internet. For instance, we find that coordinated intervention by the 0.2%-biggest ASes is more effective than uncoordinated efforts adopted by 30% of all ASes. Furthermore, countermeasures that block malicious transit traffic appear more effective than ones that block outgoing traffic. The model allows us to quantify and compare positive externalities created by different countermeasures. Our results give an initial indication of the types and levels of intervention that are most cost-effective at large scale.