Researcher profile

David Van Horn

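David Van Horn's published work centers on programming languages, in particular static analysis of higher-order programs, abstract interpretation, and behavioral contract verification.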

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
23 works · 0 followers · 3 topics · 4 close collaborators


Published work

23 published items

Preprint · 2022 · arXiv

A Formal Model of Checked C

We present a formal model of Checked C, a dialect of C that aims to enforce spatial memory safety. Our model pays particular attention to the semantics of dynamically sized, potentially null-terminated arrays. We formalize this model in Coq, and prove that any spatial memory safety errors can be blamed on portions of the program labeled unchecked; this is a Checked C feature that supports incremental porting and backward compatibility. While our model's operational semantics uses annotated ("fat") pointers to enforce spatial safety, we show that such annotations can be safely erased: Using PLT Redex we formalize an executable version of our model and a compilation procedure from it to an untyped C-like language, and use randomized testing to validate that generated code faithfully simulates the original. Finally, we develop a custom random generator for well-typed and almost-well-typed terms in our Redex model, and use it to search for inconsistencies between our model and the Clang Checked C implementation. We find these steps to be a useful way to co-develop a language (Checked C is still in development) and a core model of it.
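The blame result above rests on one mechanism. Checked pointers carry bounds metadata, so a bad access through them is caught by a dynamic check; unchecked pointers carry no such metadata. The Python sketch below is a toy model under stated assumptions. It is not the paper's Coq or Redex formalization and not Checked C syntax. The heap is a flat list of integers, and `FatPtr`, `checked_load`, `unchecked_load`, and `BoundsError` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


class BoundsError(Exception):
    """Raised when a checked access falls outside its pointer's bounds."""


@dataclass(frozen=True)
class FatPtr:
    # Hypothetical "fat" pointer: a raw address plus the bounds [lo, hi)
    # of the object it was derived from.
    addr: int
    lo: int
    hi: int


def checked_load(heap: list[int], p: FatPtr) -> int:
    # Checked dereference: the bounds check turns an out-of-bounds read
    # into a detected failure instead of silent corruption.
    if not (p.lo <= p.addr < p.hi):
        raise BoundsError(f"address {p.addr} outside [{p.lo}, {p.hi})")
    return heap[p.addr]


def unchecked_load(heap: list[int], addr: int) -> int:
    # Unchecked dereference: no bounds travel with the address, so a
    # stray address reads whatever neighbouring object lives there.
    return heap[addr]


# Object A occupies heap[0:3]; object B starts at heap[3].
heap = [10, 20, 30, 99]
past_end = FatPtr(addr=3, lo=0, hi=3)  # one past the end of A
print(unchecked_load(heap, past_end.addr))  # reads B's contents: 99
try:
    checked_load(heap, past_end)
except BoundsError as err:
    print("caught:", err)
```

In this toy model, the same stray address is silently accepted only on the unchecked path. This mirrors, informally, why the paper can blame spatial safety errors on code labeled unchecked.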

Preprint · 2014 · arXiv

Abstracting Abstract Control (Extended)

The strength of a dynamic language is also its weakness: run-time flexibility comes at the cost of compile-time predictability. Many of the hallmarks of dynamic languages such as closures, continuations, various forms of reflection, and a lack of static types make many programmers rejoice, while compiler writers, tool developers, and verification engineers lament. The dynamism of these features simply confounds statically reasoning about programs that use them. Consequently, static analyses for dynamic languages are few, far between, and seldom sound. The "abstracting abstract machines" (AAM) approach to constructing static analyses has recently been proposed as a method to ameliorate the difficulty of designing analyses for such language features. The approach, so called because it derives a function for the sound and computable approximation of program behavior starting from the abstract machine semantics of a language, provides a viable approach to dynamic language analysis since all that is required is a machine description of the interpreter. The original AAM recipe produces finite-state abstractions, which cannot faithfully represent an interpreter's control stack. Recent advances have shown that higher-order programs can be approximated with pushdown systems. However, these automata-theoretic models break down on features that inspect or modify the control stack. In this paper, we tackle the problem of bringing pushdown flow analysis to the domain of dynamic language features. We revise the abstracting abstract machines technique to target the stronger computational model of pushdown systems. In place of automata theory, we use only abstract machines and memoization. As case studies, we show the technique applies to a language with closures, garbage collection, stack inspection, and first-class composable continuations.

Preprint · 2014 · arXiv

Pushdown flow analysis with abstract garbage collection

In the static analysis of functional programs, pushdown flow analysis and abstract garbage collection push the boundaries of what we can learn about programs statically. This work illuminates and poses solutions to theoretical and practical challenges that stand in the way of combining the power of these techniques. Pushdown flow analysis grants unbounded yet computable polyvariance to the analysis of return-flow in higher-order programs. Abstract garbage collection grants unbounded polyvariance to abstract addresses which become unreachable between invocations of the abstract contexts in which they were created. Pushdown analysis solves the problem of precisely analyzing recursion in higher-order languages; abstract garbage collection is essential in solving the "stickiness" problem. Our benchmarks demonstrate that each method alone can reduce analysis times and boost precision by orders of magnitude. We combine these methods. The challenge in marrying these techniques is not subtle: computing the reachable control states of a pushdown system relies on limiting access during transition to the top of the stack; abstract garbage collection, on the other hand, needs full access to the entire stack to compute a root set, just as concrete collection does. Conditional pushdown systems were developed for just such a conundrum, but existing methods are ill-suited for the dynamic nature of garbage collection. We show fully precise and approximate solutions to the feasible paths problem for pushdown garbage-collecting control-flow analysis. Experiments reveal synergistic interplay between garbage collection and pushdown techniques, and the fusion demonstrates "better-than-both-worlds" precision.
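Abstract garbage collection, as described above, prunes the abstract store down to what is reachable from a root set before the analysis continues. The sketch below shows only that pruning step, under two assumptions. First, the store is a Python dict from addresses to sets of abstract values. Second, the caller supplies a `touches` function that lists the addresses a value refers to. Both representations and the name `abstract_gc` are assumptions for illustration. In a pushdown analysis the roots would also have to include every address held by stack frames. That whole-stack access is exactly the obstacle the abstract identifies.

```python
from collections.abc import Callable, Hashable, Iterable


def abstract_gc(
    store: dict[Hashable, set],
    roots: Iterable[Hashable],
    touches: Callable[[object], Iterable[Hashable]],
) -> dict[Hashable, set]:
    """Restrict `store` to the addresses transitively reachable from `roots`."""
    reachable: set[Hashable] = set()
    todo = list(roots)
    while todo:
        addr = todo.pop()
        if addr in reachable or addr not in store:
            continue
        reachable.add(addr)
        # Every abstract value bound at a live address keeps the
        # addresses it refers to alive as well.
        for value in store[addr]:
            todo.extend(touches(value))
    return {a: vs for a, vs in store.items() if a in reachable}


# Closures modeled as (name, addresses of free variables).
store = {
    "x": {("f", frozenset({"y"}))},
    "y": {("g", frozenset())},
    "z": {("h", frozenset())},  # unreachable from the root "x"
}
live = abstract_gc(store, roots={"x"}, touches=lambda clo: clo[1])
print(sorted(live))  # ['x', 'y']
```

Dropping `z` matters because later bindings to `z` can then start from an empty set instead of merging with stale values. This is the "stickiness" the abstract refers to.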

Preprint · 2014 · arXiv

Soft Contract Verification

Behavioral software contracts are a widely used mechanism for governing the flow of values between components. However, run-time monitoring and enforcement of contracts imposes significant overhead and delays discovery of faulty components to run time. To overcome these issues, we present soft contract verification, which aims to statically prove either complete or partial contract correctness of components, written in an untyped, higher-order language with first-class contracts. Our approach uses higher-order symbolic execution, leveraging contracts as a source of symbolic values including unknown behavioral values, and employs an updatable heap of contract invariants to reason about flow-sensitive facts. We prove that the symbolic execution soundly approximates the dynamic semantics and that verified programs cannot be blamed. The approach is able to analyze first-class contracts, recursive data structures, unknown functions, and control-flow-sensitive refinements of values, which are all idiomatic in dynamic languages. It makes effective use of an off-the-shelf solver to decide problems without heavy encodings. The approach is competitive with a wide range of existing tools (including type systems, flow analyzers, and model checkers) on their own benchmarks.
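Soft contract verification targets the cost of run-time contract monitoring. To show what that monitoring involves, the sketch below implements a minimal higher-order contract monitor with blame, in the style of Findler and Felleisen's function contracts. The abstract assumes this technique rather than describing it. `flat`, `arrow`, and `ContractViolation` are hypothetical names, not the API of any contract library. A function contract cannot be checked at the moment the function is handed over. It can only wrap the function and check every call, which is the overhead static verification aims to remove.

```python
from typing import Any, Callable


class ContractViolation(Exception):
    def __init__(self, blamed: str, message: str):
        super().__init__(f"blame {blamed}: {message}")
        self.blamed = blamed


# A monitor takes a value plus the two parties to a contract: `pos`
# supplied the value, `neg` consumes it.
Monitor = Callable[[Any, str, str], Any]


def flat(pred: Callable[[Any], bool], name: str) -> Monitor:
    # First-order contract: checked immediately; a failure blames the supplier.
    def monitor(value: Any, pos: str, neg: str) -> Any:
        if not pred(value):
            raise ContractViolation(pos, f"{value!r} violates {name}")
        return value
    return monitor


def arrow(dom: Monitor, rng: Monitor) -> Monitor:
    # Function contract: wrap the function and check each call.
    # Arguments flow from the consumer, so the domain check swaps the parties.
    def monitor(f: Any, pos: str, neg: str) -> Any:
        if not callable(f):
            raise ContractViolation(pos, f"{f!r} is not a function")
        def wrapped(x: Any) -> Any:
            return rng(f(dom(x, neg, pos)), pos, neg)
        return wrapped
    return monitor


nat = flat(lambda v: isinstance(v, int) and v >= 0, "nat?")
inc = arrow(nat, nat)(lambda n: n + 1, "server", "client")
print(inc(3))  # 4
try:
    inc(-1)
except ContractViolation as err:
    print(err)  # blame client: -1 violates nat?
```

Swapping `pos` and `neg` in the domain check is what lets blame land correctly even for contracts on functions that take functions.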

Preprint · 2013 · arXiv

AnaDroid: Malware Analysis of Android with User-supplied Predicates

Today's mobile platforms provide only coarse-grained permissions to users with regard to how third-party applications use sensitive private data. Unfortunately, it is easy to disguise malware within the boundaries of legitimately granted permissions. For instance, granting access to "contacts" and "internet" may be necessary for a text-messaging application to function, even though the user does not want contacts transmitted over the internet. To understand fine-grained application use of permissions, we need to statically analyze their behavior. Even then, malware detection faces three hurdles: (1) analyses may be prohibitively expensive, (2) automated analyses can only find behaviors that they are designed to find, and (3) the maliciousness of any given behavior is application-dependent and subject to human judgment. To remedy these issues, we propose semantic-based program analysis with a human in the loop as an alternative approach to malware detection. In particular, our analysis allows analyst-crafted semantic predicates to search and filter analysis results. Human-oriented semantic-based program analysis can systematically, quickly and concisely characterize the behaviors of mobile applications. We describe a tool that provides analysts with a library of the semantic predicates and the ability to dynamically trade speed and precision. It also provides analysts the ability to statically inspect details of every suspicious state of (abstract) execution in order to make a ruling as to whether or not the behavior is truly malicious with respect to the intent of the application. In addition, permission and profiling reports are generated to aid analysts in identifying common malicious behaviors.

Preprint · 2013 · arXiv

Flow analysis, linearity, and PTIME

Flow analysis is a ubiquitous and much-studied component of compiler technology, and its variations abound. Amongst the most well known is Shivers' 0CFA; however, the best known algorithm for 0CFA requires time cubic in the size of the analyzed program and is unlikely to be improved. Consequently, several analyses have been designed to approximate 0CFA by trading precision for faster computation. Henglein's simple closure analysis, for example, forfeits the notion of directionality in flows and enjoys an "almost linear" time algorithm. But in making trade-offs between precision and complexity, what has been given up and what has been gained? Where do these analyses differ and where do they coincide? We identify a core language, the linear λ-calculus, where 0CFA, simple closure analysis, and many other known approximations or restrictions to 0CFA are rendered identical. Moreover, for this core language, analysis corresponds with (instrumented) evaluation. Because analysis faithfully captures evaluation, and because the linear λ-calculus is complete for PTIME, we derive PTIME-completeness results for all of these analyses.
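The analysis at the center of this abstract can be made concrete as a minimal constraint-based 0CFA for a tiny labeled λ-calculus, written in Python for illustration. Several choices here are assumptions, not the paper's: the term representation, the `zero_cfa` name, and the requirement that every bound variable have a distinct name. The naive round-robin fixpoint loop is chosen for brevity; it is not the cubic algorithm the abstract refers to. The example program is linear (each bound variable occurs exactly once), so, as the abstract predicts, analysis coincides with evaluation. `(λx. x) (λy. y)` evaluates to `λy. y`, and 0CFA reports exactly that one closure.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str
    label: int


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"
    label: int


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"
    label: int


Term = Union[Var, Lam, App]


def subterms(t: Term):
    yield t
    if isinstance(t, Lam):
        yield from subterms(t.body)
    elif isinstance(t, App):
        yield from subterms(t.fun)
        yield from subterms(t.arg)


def zero_cfa(program: Term) -> tuple[dict[int, set[Lam]], dict[str, set[Lam]]]:
    """Least solution of the 0CFA constraints.

    cache[l]: lambdas that may flow to the expression labeled l.
    env[x]:   lambdas that may be bound to variable x (names assumed unique).
    """
    cache: dict[int, set[Lam]] = {}
    env: dict[str, set[Lam]] = {}

    def flow(table: dict, key, values: set) -> bool:
        # Add `values` to table[key]; report whether anything new arrived.
        entry = table.setdefault(key, set())
        before = len(entry)
        entry |= values
        return len(entry) != before

    terms = list(subterms(program))
    changed = True
    while changed:
        changed = False
        for t in terms:
            if isinstance(t, Var):    # env[x] ⊆ cache[l]
                changed |= flow(cache, t.label, env.get(t.name, set()))
            elif isinstance(t, Lam):  # {λ} ⊆ cache[l]
                changed |= flow(cache, t.label, {t})
            else:                     # for each λx.e flowing to the operator:
                for lam in list(cache.get(t.fun.label, set())):
                    # the argument flows to x, the body's result to the call
                    changed |= flow(env, lam.param, cache.get(t.arg.label, set()))
                    changed |= flow(cache, t.label, cache.get(lam.body.label, set()))
    return cache, env


id_y = Lam("y", Var("y", 3), 4)
program = App(Lam("x", Var("x", 1), 2), id_y, 5)  # (λx. x) (λy. y)
cache, env = zero_cfa(program)
print(cache[5] == {id_y}, env["x"] == {id_y})  # True True
```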

Preprint · 2013 · arXiv

Optimizing Abstract Abstract Machines

The technique of abstracting abstract machines (AAM) provides a systematic approach for deriving computable approximations of evaluators that are easily proved sound. This article contributes a complementary step-by-step process for subsequently going from a naive analyzer derived under the AAM approach to an efficient and correct implementation. The end result of the process is an improvement of two to three orders of magnitude over the systematically derived analyzer, making it competitive with hand-optimized implementations that compute fundamentally less precise results.

Preprint · 2013 · arXiv

Pushdown Exception-Flow Analysis of Object-Oriented Programs

Statically reasoning in the presence of and about exceptions is challenging: exceptions worsen the well-known mutual recursion between data-flow and control-flow analysis. The recent development of pushdown control-flow analysis for the λ-calculus hints at a way to improve analysis of exceptions: a pushdown stack can precisely match catches to throws in the same way it matches returns to calls. This work generalizes pushdown control-flow analysis to object-oriented programs and to exceptions. Pushdown analysis of exceptions improves precision over the next best analysis, Bravenboer and Smaragdakis's Doop, by orders of magnitude. By then generalizing abstract garbage collection to object-oriented programs, we reduce analysis time by half over pure pushdown analysis. We evaluate our implementation for Dalvik bytecode on standard benchmarks as well as several Android applications.

Preprint · 2013 · arXiv

Resolving and Exploiting the $k$-CFA Paradox

Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's $k$-CFA work, which has advanced the concept of "context-sensitive analysis" and is widely known in both communities. Recent results indicate that the relationship between the functional and object-oriented incarnations of $k$-CFA is not as well understood as thought. Van Horn and Mairson proved $k$-CFA for $k \geq 1$ to be EXPTIME-complete; hence, no polynomial-time algorithm can exist. Yet, there are several polynomial-time formulations of context-sensitive points-to analyses in object-oriented languages. Thus, it seems that functional $k$-CFA may actually be a profoundly different analysis from object-oriented $k$-CFA. We resolve this paradox by showing that the exact same specification of $k$-CFA is polynomial-time for object-oriented languages yet exponential-time for functional ones: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity and complexity. This illumination leads to an immediate payoff: by projecting the object-oriented treatment of objects onto closures, we derive a polynomial-time hierarchy of context-sensitive CFAs for functional programs.

Preprint · 2013 · arXiv

Sound and Precise Malware Analysis for Android via Pushdown Reachability and Entry-Point Saturation

We present Anadroid, a static malware analysis framework for Android apps. Anadroid exploits two techniques to soundly raise precision: (1) it uses a pushdown system to precisely model dynamically dispatched interprocedural and exception-driven control-flow; (2) it uses Entry-Point Saturation (EPS) to soundly approximate all possible interleavings of asynchronous entry points in Android applications. (It also integrates static taint-flow analysis and least permissions analysis to expand the class of malicious behaviors which it can catch.) Anadroid provides rich user interface support for human analysts, who must ultimately rule on the "maliciousness" of a behavior. To demonstrate the effectiveness of Anadroid's malware analysis, we had teams of analysts analyze a challenge suite of 52 Android applications released as part of the Automated Program Analysis for Cybersecurity (APAC) DARPA program. The first team analyzed the apps using a version of Anadroid that uses traditional (finite-state-machine-based) control-flow analysis found in existing malware analysis tools; the second team analyzed the apps using a version of Anadroid that uses our enhanced pushdown-based control-flow analysis. We measured machine analysis time, human analyst time, and their accuracy in flagging malicious applications. With pushdown analysis, we found statistically significant (p < 0.05) decreases in time: from 85 minutes per app to 35 minutes per app in human plus machine analysis time; and statistically significant (p < 0.05) increases in accuracy with the pushdown-driven analyzer: from 71% correct identification to 95% correct identification.

Preprint · 2013 · arXiv

The Complexity of Flow Analysis in Higher-Order Languages

This dissertation proves lower bounds on the inherent difficulty of deciding flow analysis problems in higher-order programming languages. We give exact characterizations of the computational complexity of 0CFA, the $k$CFA hierarchy, and related analyses. In each case, we precisely capture both the expressiveness and feasibility of the analysis, identifying the elements responsible for the trade-off. 0CFA is complete for polynomial time. This result relies on the insight that when a program is linear (each bound variable occurs exactly once), the analysis makes no approximation; abstract and concrete interpretation coincide, and therefore program analysis becomes evaluation under another guise. Moreover, this is true not only for 0CFA, but for a number of further approximations to 0CFA. In each case, we derive polynomial time completeness results. For any $k > 0$, $k$CFA is complete for exponential time. Even when $k = 1$, the distinction in binding contexts results in a limited form of closures, which do not occur in 0CFA. This theorem validates empirical observations that $k$CFA is intractably slow for any $k > 0$. There is, in the worst case (and plausibly, in practice), no way to tame the cost of the analysis. Exponential time is required. The empirically observed intractability of this analysis can be understood as being inherent in the approximation problem being solved, rather than reflecting unfortunate gaps in our programming abilities.

Preprint · 2012 · arXiv

Higher-Order Symbolic Execution via Contracts

We present a new approach to automated reasoning about higher-order programs by extending symbolic execution to use behavioral contracts as symbolic values, enabling symbolic approximation of higher-order behavior. Our approach is based on the idea of an abstract reduction semantics that gives an operational semantics to programs with both concrete and symbolic components. Symbolic components are approximated by their contract and our semantics gives an operational interpretation of contracts-as-values. The result is an executable semantics that soundly predicts program behavior, including contract failures, for all possible instantiations of symbolic components. We show that our approach scales to an expressive language of contracts including arbitrary programs embedded as predicates, dependent function contracts, and recursive contracts. Supporting this feature-rich language of specifications leads to powerful symbolic reasoning using existing program assertions. We then apply our approach to produce a verifier for contract correctness of components, including a sound and computable approximation to our semantics that facilitates fully automated contract verification. Our implementation is capable of verifying contracts expressed in existing programs, and of justifying valuable contract-elimination optimizations.

Preprint · 2012 · arXiv

Introspective Pushdown Analysis of Higher-Order Programs

In the static analysis of functional programs, pushdown flow analysis and abstract garbage collection skirt just inside the boundaries of soundness and decidability. Alone, each method reduces analysis times and boosts precision by orders of magnitude. This work illuminates and conquers the theoretical challenges that stand in the way of combining the power of these techniques. The challenge in marrying these techniques is not subtle: computing the reachable control states of a pushdown system relies on limiting access during transition to the top of the stack; abstract garbage collection, on the other hand, needs full access to the entire stack to compute a root set, just as concrete collection does. Introspective pushdown systems resolve this conflict. Introspective pushdown systems provide enough access to the stack to allow abstract garbage collection, but they remain restricted enough to compute control-state reachability, thereby enabling the sound and precise product of pushdown analysis and abstract garbage collection. Experiments reveal synergistic interplay between the techniques, and the fusion demonstrates "better-than-both-worlds" precision.

Preprint · 2011 · arXiv

A family of abstract interpretations for static analysis of concurrent higher-order programs

We develop a framework for computing two foundational analyses for concurrent higher-order programs: (control-)flow analysis (CFA) and may-happen-in-parallel analysis (MHP). We pay special attention to the unique challenges posed by the unrestricted mixture of first-class continuations and dynamically spawned threads. To set the stage, we formulate a concrete model of concurrent higher-order programs: the P(CEK*)S machine. We find that the systematic abstract interpretation of this machine is capable of computing both flow and MHP analyses. Yet, a closer examination finds that the precision for MHP is poor. As a remedy, we adapt a shape-analytic technique, singleton abstraction, to dynamically spawned threads (as opposed to objects in the heap). We then show that if MHP analysis is not of interest, we can substantially accelerate the computation of flow analysis alone by collapsing thread interleavings with a second layer of abstraction.

Preprint · 2011 · arXiv

Abstracting Abstract Machines: A Systematic Approach to Higher-Order Program Analysis

Predictive models are fundamental to engineering reliable software systems. However, designing conservative, computable approximations for the behavior of programs (static analyses) remains a difficult and error-prone process for modern high-level programming languages. What analysis designers need is a principled method for navigating the gap between semantics and analytic models: analysis designers need a method that tames the interaction of complex language features such as higher-order functions, recursion, exceptions, continuations, objects and dynamic allocation. We contribute a systematic approach to program analysis that yields novel and transparently sound static analyses. Our approach relies on existing derivational techniques to transform high-level language semantics into low-level deterministic state-transition systems (with potentially infinite state spaces). We then perform a series of simple machine refactorings to obtain a sound, computable approximation, which takes the form of a non-deterministic state-transition system with a finite state space. The approach scales up uniformly to enable program analysis of realistic language features, including higher-order functions, tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.

Preprint · 2011 · arXiv

Pushdown Abstractions of JavaScript

We design a family of program analyses for JavaScript that make no approximation in matching calls with returns, exceptions with handlers, and breaks with labels. We do so by starting from an established reduction semantics for JavaScript and systematically deriving its intensional abstract interpretation. Our first step is to transform the semantics into an equivalent low-level abstract machine: the JavaScript Abstract Machine (JAM). We then give an infinite-state yet decidable pushdown machine whose stack precisely models the structure of the concrete program stack. The precise model of stack structure in turn confers precise control-flow analysis even in the presence of control effects, such as exceptions and finally blocks. We give pushdown generalizations of traditional forms of analysis such as k-CFA, and prove the pushdown framework for abstract interpretation is sound and computable.

Preprint · 2011 · arXiv

Semantic Solutions to Program Analysis Problems

Problems in program analysis can be solved by developing novel program semantics and deriving abstractions conventionally. For over thirty years, higher-order program analysis has been sold as a hard problem. Its solutions have required ingenuity and complex models of approximation. We claim that this difficulty is due to premature focus on abstraction and propose a new approach that emphasizes semantics. Its simplicity enables new analyses that are beyond the current state of the art.

Preprint · 2011 · arXiv

Systematic Abstraction of Abstract Machines

We describe a derivational approach to abstract interpretation that yields novel and transparently sound static analyses when applied to well-established abstract machines for higher-order and imperative programming languages. To demonstrate the technique and support our claim, we transform the CEK machine of Felleisen and Friedman, a lazy variant of Krivine's machine, and the stack-inspecting CM machine of Clements and Felleisen into abstract interpretations of themselves. The resulting analyses bound temporal ordering of program events; predict return-flow and stack-inspection behavior; and approximate the flow and evaluation of by-need parameters. For all of these machines, we find that a series of well-known concrete machine refactorings, plus a technique of store-allocated continuations, leads to machines that abstract into static analyses simply by bounding their stores. We demonstrate that the technique scales up uniformly to allow static analysis of realistic language features, including tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection. In order to close the gap between formalism and implementation, we provide translations of the mathematics as running Haskell code for the initial development of our method.

Preprint · 2010 · arXiv

Abstracting Abstract Machines

We describe a derivational approach to abstract interpretation that yields novel and transparently sound static analyses when applied to well-established abstract machines. To demonstrate the technique and support our claim, we transform the CEK machine of Felleisen and Friedman, a lazy variant of Krivine's machine, and the stack-inspecting CM machine of Clements and Felleisen into abstract interpretations of themselves. The resulting analyses bound temporal ordering of program events; predict return-flow and stack-inspection behavior; and approximate the flow and evaluation of by-need parameters. For all of these machines, we find that a series of well-known concrete machine refactorings, plus a technique we call store-allocated continuations, leads to machines that abstract into static analyses simply by bounding their stores. We demonstrate that the technique scales up uniformly to allow static analysis of realistic language features, including tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.
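The recipe in this abstract can be sketched in a few dozen lines. First, refactor the machine so that continuations live in the store; then bound the store's address space. The Python below is a minimal abstract CESK*-style machine for the closed call-by-value λ-calculus, under these assumptions:

- Terms are plain tuples.
- Each variable's address is its own name, and each continuation's address is the expression it awaits. This is a monovariant, 0CFA-like choice.
- The store is a set of (address, value) pairs, so joining is set union.

The tuple encoding and the names `step` and `analyze` are illustrative, not the paper's notation. The sketch keeps a separate store per state, so it only suits tiny programs.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg)
# Continuation frames: ("mt",) | ("ar", arg, env, k) | ("fn", lam, env, k)
MT = ("mt",)
HALT = ("halt",)  # address of the initial (empty) continuation


def lookup(store: frozenset, addr) -> set:
    # The abstract store may bind several values to one address.
    return {v for (a, v) in store if a == addr}


def extend(env: frozenset, x: str, addr) -> frozenset:
    return frozenset((y, b) for (y, b) in env if y != x) | {(x, addr)}


def step(state) -> set:
    control, env, store, k = state
    if control[0] == "var":
        # Look up every closure the variable's address may hold.
        addr = dict(env)[control[1]]
        return {(lam, clo_env, store, k) for (lam, clo_env) in lookup(store, addr)}
    if control[0] == "app":
        # Store-allocate the continuation "evaluate the argument next".
        b = ("kont", control[1])
        return {(control[1], env, store | {(b, ("ar", control[2], env, k))}, b)}
    # control is a lambda (a value): return it to every frame stored at k.
    succs = set()
    for frame in lookup(store, k):
        if frame[0] == "ar":
            _, arg, arg_env, c = frame
            b = ("kont", arg)
            succs.add((arg, arg_env, store | {(b, ("fn", control, env, c))}, b))
        elif frame[0] == "fn":
            _, (_, x, body), fn_env, c = frame
            # Bind the parameter at the bounded address x (its own name).
            succs.add((body, extend(fn_env, x, x), store | {(x, (control, env))}, c))
    return succs


def analyze(program):
    """Explore all reachable abstract states; return them and the final values."""
    init = (program, frozenset(), frozenset({(HALT, MT)}), HALT)
    seen, todo = {init}, [init]
    while todo:
        for nxt in step(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    finals = {s[0] for s in seen if s[0][0] == "lam" and MT in lookup(s[2], s[3])}
    return seen, finals


id_x = ("lam", "x", ("var", "x"))
id_y = ("lam", "y", ("var", "y"))
_, finals = analyze(("app", id_x, id_y))
print(finals == {id_y})  # True
```

The address space is finite, so `analyze` terminates even on the divergent program `(λx. x x) (λx. x x)` and reports no final values. The concrete machine would run forever on that input.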

Preprint · 2010 · arXiv

Evaluating Call-By-Need on the Control Stack

Ariola and Felleisen's call-by-need λ-calculus replaces a variable occurrence with its value at the last possible moment. To support this gradual notion of substitution, function applications, once established, are never discharged. In this paper we show how to translate this notion of reduction into an abstract machine that resolves variable references via the control stack. In particular, the machine uses the static address of a variable occurrence to extract its current value from the dynamic control stack.

Preprint · 2010 · arXiv

Pushdown Control-Flow Analysis of Higher-Order Programs

Context-free approaches to static analysis gain precision over classical approaches by perfectly matching returns to call sites, a property that eliminates spurious interprocedural paths. Vardoulakis and Shivers's recent formulation of CFA2 showed that it is possible (if expensive) to apply context-free methods to higher-order languages and gain the same boost in precision achieved over first-order programs. To this young body of work on context-free analysis of higher-order programs, we contribute a pushdown control-flow analysis framework, which we derive as an abstract interpretation of a CESK machine with an unbounded stack. One instantiation of this framework marks the first polyvariant pushdown analysis of higher-order programs; another marks the first polynomial-time analysis. In the end, we arrive at a framework for control-flow analysis that can efficiently compute pushdown generalizations of classical control-flow analyses.

Preprint · 2010 · arXiv

Stack-Summarizing Control-Flow Analysis of Higher-Order Programs

Two sinks drain precision from higher-order flow analyses: (1) merging of argument values upon procedure call and (2) merging of return values upon procedure return. To combat the loss of precision, these two sinks have been addressed independently. In the case of procedure calls, abstract garbage collection reduces argument merging, while in the case of procedure returns, context-free approaches eliminate return value merging. It is natural to expect a combined analysis could enjoy the mutually beneficial interaction between the two approaches. The central contribution of this work is a direct product of abstract garbage collection with context-free analysis. The central challenge to overcome is the conflict between the core constraint of a pushdown system and the needs of garbage collection: a pushdown system can only see the top of the stack, yet garbage collection needs to see the entire stack during a collection. To make the direct product computable, we develop "stack summaries," a method for tracking stack properties at each control state in a pushdown analysis of higher-order programs.