Source author record

Isil Dillig

Isil Dillig appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Programming Languages Computation and Language Software Engineering Databases Human-Computer Interaction Machine Learning Operating Systems

Catalog footprint

What is connected

12works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Automated Transpilation of Imperative to Functional Code using Neural-Guided Program Synthesis (Extended Version)

While many mainstream languages such as Java, Python, and C# increasingly incorporate functional APIs to simplify programming and improve parallelization/performance, there are no effective techniques that can be used to automatically translate existing imperative code to functional variants using these APIs. Motivated by this problem, this paper presents a transpilation approach based on inductive program synthesis for modernizing existing code. Our method is based on the observation that the overwhelming majority of source/target programs in this setting satisfy an assumption that we call trace-compatibility: not only do the programs share syntactically identical low-level expressions, but these expressions also take the same values in corresponding execution traces. Our method leverages this observation to design a new neural-guided synthesis algorithm that (1) uses a novel neural architecture called cognate grammar network (CGN) and (2) leverages a form of concolic execution to prune partial programs based on intermediate values that arise during a computation. We have implemented our approach in a tool called NGST2 and use it to translate imperative Java and Python code to functional variants that use the Stream and functools APIs respectively. Our experiments show that NGST2 significantly outperforms several baselines and that our proposed neural architecture and pruning techniques are vital for achieving good results.

preprint2022arXiv

Finding and Analyzing Crash-Consistency Bugs in Persistent-Memory File Systems

We present a study of crash-consistency bugs in persistent-memory (PM) file systems and analyze their implications for file-system design and testing crash consistency. We develop FlyTrap, a framework to test PM file systems for crash-consistency bugs. FlyTrap discovered 18 new bugs across four PM file systems; the bugs have been confirmed by developers and many have been already fixed. The discovered bugs have serious consequences such as breaking the atomicity of rename or making the file system unmountable. We present a detailed study of the bugs we found and discuss some important lessons from these observations. For instance, one of our findings is that many of the bugs are due to logic errors, rather than errors in using flushes or fences; this has important applications for future work on testing PM file systems. Another key finding is that many bugs arise from attempts to improve efficiency by performing metadata updates in-place and that recovery code that deals with rebuilding in-DRAM state is a significant source of bugs. These observations have important implications for designing and testing PM file systems. Our code is available at https://github.com/utsaslab/flytrap .

preprint2022arXiv

Synthesizing Fine-Grained Synchronization Protocols for Implicit Monitors (Extended Version)

A monitor is a widely-used concurrent programming abstraction that encapsulates all shared state between threads. Monitors can be classified as being either implicit or explicit depending on the primitives they provide. Implicit monitors are much easier to program but typically not as efficient. To address this gap, there has been recent research on automatically synthesizing explicit-signal monitors from an implicit specification, but prior work does not exploit all paralellization opportunities due to the use of a single lock for the entire monitor. This paper presents a new technique for synthesizing fine-grained explicit-synchronization protocols from implicit monitors. Our method is based on two key innovations: First, we present a new static analysis for inferring safe interleavings that allow violating mutual exclusion of monitor operations without changing its semantics. Second, we use the results of this static analysis to generate a MaxSAT instance whose models correspond to correct-by-construction synchronization protocols. We have implemented our approach in a tool called Cortado and evaluate it on monitors that contain parallelization opportunities. Our evaluation shows that Cortado can synthesize synchronization policies that are competitive with, or even better than, expert-written ones on these benchmarks.

preprint2022arXiv

Type-Directed Synthesis of Visualizations from Natural Language Queries

We propose a new technique based on program synthesis for automatically generating visualizations from natural language queries. Our method parses the natural language query into a refinement type specification using the intents-and-slots paradigm and leverages type-directed synthesis to generate a set of visualization programs that are most likely to meet the user's intent. Our refinement type system captures useful hints present in the natural language query and allows the synthesis algorithm to reject visualizations that violate well-established design guidelines for the input data set. We have implemented our ideas in a tool called Graphy and evaluated it on NLVCorpus, which consists of 3 popular datasets and over 700 real-world natural language queries. Our experiments show that Graphy significantly outperforms state-of-the-art natural-language-based visualization tools, including transformer and rule-based ones.

preprint2021arXiv

Falx: Synthesis-Powered Visualization Authoring

Modern visualization tools aim to allow data analysts to easily create exploratory visualizations. When the input data layout conforms to the visualization design, users can easily specify visualizations by mapping data columns to visual channels of the design. However, when there is a mismatch between data layout and the design, users need to spend significant effort on data transformation. We propose Falx, a synthesis-powered visualization tool that allows users to specify visualizations in a similarly simple way but without needing to worry about data layout. In Falx, users specify visualizations using examples of how concrete values in the input are mapped to visual channels, and Falx automatically infers the visualization specification and transforms the data to match the design. In a study with 33 data analysts on four visualization tasks involving data transformation, we found that users can effectively adopt Falx to create visualizations they otherwise cannot implement.

preprint2020arXiv

Benchmarking Multimodal Regex Synthesis with Complex Structures

Existing datasets for regular expression (regex) generation from natural language are limited in complexity; compared to regex tasks that users post on StackOverflow, the regexes in these datasets are simple, and the language used to describe them is not diverse. We introduce StructuredRegex, a new regex synthesis dataset differing from prior ones in three aspects. First, to obtain structurally complex and realistic regexes, we generate the regexes using a probabilistic grammar with pre-defined macros observed from real-world StackOverflow posts. Second, to obtain linguistically diverse natural language descriptions, we show crowdworkers abstract depictions of the underlying regex and ask them to describe the pattern they see, rather than having them paraphrase synthetic language. Third, we augment each regex example with a collection of strings that are and are not matched by the ground truth regex, similar to how real users give examples. Our quantitative and qualitative analysis demonstrates the advantages of StructuredRegex over prior datasets. Further experimental results using various multimodal synthesis techniques highlight the challenge presented by our dataset, including non-local constraints and multi-modal inputs.

preprint2020arXiv

Data Migration using Datalog Program Synthesis

This paper presents a new technique for migrating data between different schemas. Our method expresses the schema mapping as a Datalog program and automatically synthesizes a Datalog program from simple input-output examples to perform data migration. This approach can transform data between different types of schemas (e.g., relational-to-graph, document-to-relational) and performs synthesis efficiently by leveraging the semantics of Datalog. We implement the proposed technique as a tool called Dynamite and show its effectiveness by evaluating Dynamite on 28 realistic data migration scenarios.

preprint2020arXiv

LambdaNet: Probabilistic Type Inference using Graph Neural Networks

As gradual typing becomes increasingly popular in languages like Python and TypeScript, there is a growing need to infer type annotations automatically. While type annotations help with tasks like code completion and static error catching, these annotations cannot be fully determined by compilers and are tedious to annotate by hand. This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network. Our approach first uses lightweight source code analysis to generate a program abstraction called a type dependency graph, which links type variables with logical constraints as well as name and usage information. Given this program abstraction, we then use a graph neural network to propagate information between related type variables and eventually make type predictions. Our neural architecture can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training. Our experimental results show that our approach outperforms prior work in this space by $14\%$ (absolute) on library types, while having the ability to make type predictions that are out of scope for existing techniques.

preprint2020arXiv

Multi-modal Synthesis of Regular Expressions

In this paper, we propose a multi-modal synthesis technique for automatically constructing regular expressions (regexes) from a combination of examples and natural language. Using multiple modalities is useful in this context because natural language alone is often highly ambiguous, whereas examples in isolation are often not sufficient for conveying user intent. Our proposed technique first parses the English description into a so-called hierarchical sketch that guides our programming-by-example (PBE) engine. Since the hierarchical sketch captures crucial hints, the PBE engine can leverage this information to both prioritize the search as well as make useful deductions for pruning the search space. We have implemented the proposed technique in a tool called Regel and evaluate it on over three hundred regexes. Our evaluation shows that Regel achieves 80% accuracy whereas the NLP-only and PBE-only baselines achieve 43% and 26% respectively. We also compare our proposed PBE engine against an adaptation of AlphaRegex, a state-of-the-art regex synthesis tool, and show that our proposed PBE engine is an order of magnitude faster, even if we adapt the search algorithm of AlphaRegex to leverage the sketch. Finally, we conduct a user study involving 20 participants and show that users are twice as likely to successfully come up with the desired regex using Regel compared to without it.

preprint2020arXiv

Sketch-Driven Regular Expression Generation from Natural Language and Examples

Recent systems for converting natural language descriptions into regular expressions (regexes) have achieved some success, but typically deal with short, formulaic text and can only produce simple regexes. Realworld regexes are complex, hard to describe with brief sentences, and sometimes require examples to fully convey the user's intent. We present a framework for regex synthesis in this setting where both natural language (NL) and examples are available. First, a semantic parser (either grammar-based or neural) maps the natural language description into an intermediate sketch, which is an incomplete regex containing holes to denote missing components. Then a program synthesizer searches over the regex space defined by the sketch and finds a regex that is consistent with the given string examples. Our semantic parser can be trained purely from weak supervision based on correctness of the synthesized regex, or it can leverage heuristically-derived sketches. We evaluate on two prior datasets (Kushman and Barzilay, 2013; Locascio et al., 2016) and a real-world dataset from Stack Overflow. Our system achieves state-of-the-art performance on the prior datasets and solves 57% of the real-world dataset, which existing neural systems completely fail on.

preprint2016arXiv

Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples

This paper presents an example-driven synthesis technique for automating a large class of data preparation tasks that arise in data science. Given a set of input tables and an out- put table, our approach synthesizes a table transformation program that performs the desired task. Our approach is not restricted to a fixed set of DSL constructs and can synthesize programs from an arbitrary set of components, including higher-order combinators. At a high-level, our approach performs type-directed enumerative search over partial pro- grams but incorporates two key innovations that allow it to scale: First, our technique can utilize any first-order specification of the components and uses SMT-based deduction to reject partial programs. Second, our algorithm uses partial evaluation to increase the power of deduction and drive enumerative search. We have evaluated our synthesis algorithm on dozens of data preparation tasks obtained from on-line forums, and we show that our approach can automatically solve a large class of problems encountered by R users.

preprint2016arXiv

Type-Directed Code Reuse using Integer Linear Programming

In many common scenarios, programmers need to implement functionality that is already provided by some third party library. This paper presents a tool called Hunter that facilitates code reuse by finding relevant methods in large code bases and automatically synthesizing any necessary wrapper code. The key technical idea underlying our approach is to use types to both improve search results and guide synthesis. Specifically, our method computes similarity metrics between types and uses this information to solve an integer linear programming (ILP) problem in which the objective is to minimize the cost of synthesis. We have implemented Hunter as an Eclipse plug-in and evaluate it by (a) comparing it against S6, a state-of-the-art code reuse tool, and (b) performing a user study. Our evaluation shows that Hunter compares favorably with S6 and significantly increases programmer productivity.

Isil Dillig

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Automated Transpilation of Imperative to Functional Code using Neural-Guided Program Synthesis (Extended Version)

Finding and Analyzing Crash-Consistency Bugs in Persistent-Memory File Systems

Synthesizing Fine-Grained Synchronization Protocols for Implicit Monitors (Extended Version)

Type-Directed Synthesis of Visualizations from Natural Language Queries

Falx: Synthesis-Powered Visualization Authoring

Benchmarking Multimodal Regex Synthesis with Complex Structures

Data Migration using Datalog Program Synthesis

LambdaNet: Probabilistic Type Inference using Graph Neural Networks

Multi-modal Synthesis of Regular Expressions

Sketch-Driven Regular Expression Generation from Natural Language and Examples

Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples

Type-Directed Code Reuse using Integer Linear Programming