Source author record

Armando Solar-Lezama

Armando Solar-Lezama appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Programming Languages Artificial Intelligence Machine Learning Logic in Computer Science Databases Software Engineering Computation and Language Computer Vision cs.CY Graphics Human-Computer Interaction physics.soc-ph Social and Information Networks Systems and Control

Catalog footprint

What is connected

24works

14topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MeMo: Memory as a Model

Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporate new knowledge. In this paper, we introduce MeMo (Memory as a Model), a modular framework that encodes new knowledge into a dedicated memory model while keeping the LLM parameters unchanged. Compared to existing methods, MeMo offers several advantages: (a) it captures complex cross-document relationships, (b) it is robust to retrieval noise, (c) it avoids catastrophic forgetting in the LLM, (d) it does not require access to the LLM's weights or output logits, enabling plug-and-play integration with both open and proprietary closed-source LLMs, and (e) its retrieval cost is independent of corpus size at inference time. Our experimental results on three benchmarks, BrowseComp-Plus, NarrativeQA, and MuSiQue, show that MeMo achieves strong performance compared to existing methods across diverse settings.

preprint2024arXiv

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an input-output pair, leading to two natural tasks: input prediction and output prediction. First, we propose a generic recipe for generating our execution benchmark which can be used to create future variation of the benchmark. Second, we evaluate twenty code models on our benchmark and discover that many recent high-scoring models on HumanEval do not show the same improvements on our benchmark. Third, we show that simple CoT and fine-tuning schemes can improve performance on our benchmark but remain far from solving it. The best setup, GPT-4 with chain of thought (CoT), achieves a pass@1 of 75% and 81% on input and output prediction, respectively. In contrast, Code Llama 34B achieves a pass@1 of 50% and 46% on input and output prediction, highlighting the gap between open and closed source models. As no model is close to acing CRUXEval, we provide examples of consistent GPT-4 failures on simple programs as a lens into its code reasoning capabilities and areas for improvement.

preprint2023arXiv

Top-Down Synthesis for Library Learning

This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain specific language (DSL). The algorithm builds abstractions directly from initial DSL primitives, using syntactic pattern matching of intermediate abstractions to intelligently prune the search space and guide the algorithm towards abstractions that maximally capture shared structures in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3-4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch's scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches and show empirically that it is robust to terminating the search procedure early -- further allowing it to scale to challenging datasets by means of early stopping.

preprint2022arXiv

Automatically Deriving Control-Flow Graph Generators from Operational Semantics

We develop the first theory of control-flow graphs from first principles, and use it to create an algorithm for automatically synthesizing many variants of control-flow graph generators from a language's operational semantics. Our approach first introduces a new algorithm for converting a large class of small-step operational semantics to an abstract machine. It next uses a technique called "abstract rewriting" to automatically abstract the semantics of a language, which is used both to directly generate a CFG from a program ("interpreted mode") and to generate standalone code, similar to a human-written CFG generator, for any program in a language. We show how the choice of two abstraction and projection parameters allow our approach to synthesize several families of CFG-generators useful for different kinds of tools. We prove the correspondence between the generated graphs and the original semantics. We provide and prove an algorithm for automatically proving the termination of interpreted-mode generators. In addition to our theoretical results, we have implemented this algorithm in a tool called Mandate, and show that it produces human-readable code on two medium-size languages with 60-80 rules, featuring nearly all intraprocedural control constructs common in modern languages. We then showed these CFG-generators were sufficient to build two static analyzers atop them. Our work is a promising step towards the grand vision of being able to synthesize all desired tools from the semantics of a programming language.

preprint2022arXiv

JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints

Physical products are often complex assemblies combining a multitude of 3D parts modeled in computer-aided design (CAD) software. CAD designers build up these assemblies by aligning individual parts to one another using constraints called joints. In this paper we introduce JoinABLe, a learning-based method that assembles parts together to form joints. JoinABLe uses the weak supervision available in standard parametric CAD files without the help of object class labels or human guidance. Our results show that by making network predictions over a graph representation of solid models we can outperform multiple baseline methods with an accuracy (79.53%) that approaches human performance (80%). Finally, to support future research we release the Fusion 360 Gallery assembly dataset, containing assemblies with rich information on joints, contact surfaces, holes, and the underlying assembly graph structure.

preprint2022arXiv

Searching Entangled Program Spaces

Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures -- version-space algebras, finite tree automata, or e-graphs -- to compactly represent these programs. To find a compact representation, existing data structures exploit independence of subterms; they blow up when the choices of subterms are entangled. We introduce equality-constrained tree automata (ECTAs), a generalization of the three aforementioned data structures that can efficiently represent large spaces of programs with entangled subterms. We present efficient algorithms for extracting programs from ECTAs, implemented in a performant Haskell library, \texttt{ecta}. Using \texttt{ecta} we construct \textsc{Hectare}, a type-driven program synthesizer for Haskell. \textsc{Hectare} significantly outperforms a state-of-the-art synthesizer Hoogle+ -- providing an average speedup of 8x -- despite its implementation being an order of magnitude smaller.

preprint2020arXiv

Computer-Aided Personalized Education

The shortage of people trained in STEM fields is becoming acute, and universities and colleges are straining to satisfy this demand. In the case of computer science, for instance, the number of US students taking introductory courses has grown three-fold in the past decade. Recently, massive open online courses (MOOCs) have been promoted as a way to ease this strain. This at best provides access to education. The bigger challenge though is coping with heterogeneous backgrounds of different students, retention, providing feedback, and assessment. Personalized education relying on computational tools can address this challenge. While automated tutoring has been studied at different times in different communities, recent advances in computing and education technology offer exciting opportunities to transform the manner in which students learn. In particular, at least three trends are significant. First, progress in logical reasoning, data analytics, and natural language processing has led to tutoring tools for automatic assessment, personalized instruction including targeted feedback, and adaptive content generation for a variety of subjects. Second, research in the science of learning and human-computer interaction is leading to a better understanding of how different students learn, when and what types of interventions are effective for different instructional goals, and how to measure the success of educational tools. Finally, the recent emergence of online education platforms, both in academia and industry, is leading to new opportunities for the development of a shared infrastructure. This CCC workshop brought together researchers developing educational tools based on technologies such as logical reasoning and machine learning with researchers in education, human-computer interaction, and cognitive psychology.

preprint2020arXiv

Deductive Optimization of Relational Data Storage

Optimizing the physical data storage and retrieval of data are two key database management problems. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation which operates on a specialized layout of the dataset. We build a compiler for this language and conduct experiments using a popular database benchmark, which shows that the performance of these specialized queries is competitive with a state-of-the-art in memory compiled database system.

preprint2020arXiv

DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning

Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A ``wake-sleep'' learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws. Concepts are built compositionally from those learned earlier, yielding multi-layered symbolic representations that are interpretable and transferrable to new tasks, while still growing scalably and flexibly with experience.

preprint2020arXiv

Liquid Information Flow Control

We present Lifty, a domain-specific language for data-centric applications that manipulate sensitive data. A Lifty programmer annotates the sources of sensitive data with declarative security policies, and the language statically and automatically verifies that the application handles the data according to the policies. Moreover, if verification fails, Lifty suggests a provably correct repair, thereby easing the programmer burden of implementing policy enforcing code throughout the application. The main insight behind Lifty is to encode information flow control using liquid types, an expressive yet decidable type system. Liquid types enable fully automatic checking of complex, data dependent policies, and power our repair mechanism via type-driven error localization and patch synthesis. Our experience using Lifty to implement three case studies from the literature shows that (1) the Lifty policy language is sufficiently expressive to specify many real-world policies, (2) the Lifty type checker is able to verify secure programs and find leaks in insecure programs quickly, and (3) even if the programmer leaves out all policy enforcing code, the Lifty repair engine is able to patch all leaks automatically within a reasonable time.

preprint2020arXiv

Verifiably Safe Exploration for End-to-End Reinforcement Learning

Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We also prove that our method of enforcing the safety constraints preserves all safe policies from the original environment.

preprint2019arXiv

Probabilistic Verification of Fairness Properties via Concentration

As machine learning systems are increasingly used to make real world legal and financial decisions, it is of paramount importance that we develop algorithms to verify that these systems do not discriminate against minorities. We design a scalable algorithm for verifying fairness specifications. Our algorithm obtains strong correctness guarantees based on adaptive concentration inequalities; such inequalities enable our algorithm to adaptively take samples until it has enough data to make a decision. We implement our algorithm in a tool called VeriFair, and show that it scales to large machine learning models, including a deep recurrent neural network that is more than five orders of magnitude larger than the largest previously-verified neural network. While our technique only gives probabilistic guarantees due to the use of random samples, we show that we can choose the probability of error to be extremely small.

preprint2016arXiv

Automatic Generation of Formula Simplifiers based on Conditional Rewrite Rules

This paper addresses the problem of creating simplifiers for logic formulas based on conditional term rewriting. In particular, the paper focuses on a program synthesis application where formula simplifications have been shown to have a significant impact. We show that by combining machine learning techniques with constraint-based synthesis, it is possible to synthesize a formula simplifier fully automatically from a corpus of representative problems, making it possible to create formula simplifiers tailored to specific problem domains. We demonstrate the benefits of our approach for synthesis benchmarks from the SyGuS competition and automated grading.

preprint2016arXiv

Precise, Dynamic Information Flow for Database-Backed Applications

We present an approach for dynamic information flow control across the application and database. Our approach reduces the amount of policy code required, yields formal guarantees across the application and database, works with existing relational database implementations, and scales for realistic applications. In this paper, we present a programming model that factors out information flow policies from application code and database queries, a dynamic semantics for the underlying λ^JDB core language, and proofs of termination-insensitive non-interference and policy compliance for the semantics. We implement these ideas in Jacqueline, a Python web framework, and demonstrate feasibility through three application case studies: a course manager, a health record system, and a conference management system used to run an academic workshop. We show that in comparison to traditional applications with hand-coded policy checks, Jacqueline applications have 1) a smaller trusted computing base, 2) fewer lines of policy code, and 2) reasonable, often negligible, additional overheads.

preprint2016arXiv

Program Synthesis from Polymorphic Refinement Types

We present a method for synthesizing recursive functions that provably satisfy a given specification in the form of a polymorphic refinement type. We observe that such specifications are particularly suitable for program synthesis for two reasons. First, they offer a unique combination of expressive power and decidability, which enables automatic verification---and hence synthesis---of nontrivial programs. Second, a type-based specification for a program can often be effectively decomposed into independent specifications for its components, causing the synthesizer to consider fewer component combinations and leading to a combinatorial reduction in the size of the search space. At the core of our synthesis procedure is a new algorithm for refinement type checking, which supports specification decomposition. We have evaluated our prototype implementation on a large set of synthesis problems and found that it exceeds the state of the art in terms of both scalability and usability. The tool was able to synthesize more complex programs than those reported in prior work (several sorting algorithms and operations on balanced search trees), as well as most of the benchmarks tackled by existing synthesizers, often starting from a more concise and intuitive user input.

preprint2016arXiv

Results and Analysis of SyGuS-Comp'15

Syntax-Guided Synthesis (SyGuS) is the computational problem of finding an implementation f that meets both a semantic constraint given by a logical formula $φ$ in a background theory T, and a syntactic constraint given by a grammar G, which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in SyGuS-IF, a language that is built on top of SMT-LIB. The Syntax-Guided Synthesis Competition (SyGuS-comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In this year's competition we added two specialized tracks: a track for conditional linear arithmetic, where the grammar need not be specified and is implicitly assumed to be that of the LIA logic of SMT-LIB, and a track for invariant synthesis problems, with special constructs conforming to the structure of an invariant synthesis problem. This paper presents and analyzes the results of SyGuS-comp'15.

preprint2016arXiv

sk_p: a neural program corrector for MOOCs

We present a novel technique for automatic program correction in MOOCs, capable of fixing both syntactic and semantic errors without manual, problem specific correction strategies. Given an incorrect student program, it generates candidate programs from a distribution of likely corrections, and checks each candidate for correctness against a test suite. The key observation is that in MOOCs many programs share similar code fragments, and the seq2seq neural network model, used in the natural-language processing task of machine translation, can be modified and trained to recover these fragments. Experiment shows our scheme can correct 29% of all incorrect submissions and out-performs state of the art approach which requires manual, problem specific correction strategies.

preprint2016arXiv

SyGuS-Comp 2016: Results and Analysis

Syntax-Guided Synthesis (SyGuS) is the computational problem of finding an implementation f that meets both a semantic constraint given by a logical formula $φ$ in a background theory T, and a syntactic constraint given by a grammar G, which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in SyGuS-IF, a language that is built on top of SMT-LIB. The Syntax-Guided Synthesis Competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In this year's competition we added a new track devoted to programming by examples. This track consisted of two categories, one using the theory of bit-vectors and one using the theory of strings. This paper presents and analyses the results of SyGuS-Comp'16.

preprint2015arXiv

JSKETCH: Sketching for Java

Sketch-based synthesis, epitomized by the SKETCH tool, lets developers synthesize software starting from a partial program, also called a sketch or template. This paper presents JSKETCH, a tool that brings sketch-based synthesis to Java. JSKETCH's input is a partial Java program that may include holes, which are unknown constants, expression generators, which range over sets of expressions, and class generators, which are partial classes. JSKETCH then translates the synthesis problem into a SKETCH problem; this translation is complex because SKETCH is not object-oriented. Finally, JSKETCH synthesizes an executable Java program by interpreting the output of SKETCH.

preprint2014arXiv

Proceedings Second Workshop on Synthesis

Software synthesis is rapidly developing into an important research area with vast potential for practical application. The SYNT Workshop on Synthesis aims to bringing together researchers interested in synthesis to present both ongoing and mature work on all aspects of automated synthesis and its applications. The second iteration of SYNT took place in Saint Petersburg, Russia, and was co-located with the 25th International Conference on Computer Aided Verification. The workshop included eleven presentations covering the full scope of the emerging synthesis community, as well as a discussion lead by Swen Jacobs on the organization of two new synthesis competitions focusing on reactive synthesis and syntax-guided functional synthesis respectively.

preprint2013arXiv

(Un)Decidability Results for Word Equations with Length and Regular Expression Constraints

We prove several decidability and undecidability results for the satisfiability and validity problems for languages that can express solutions to word equations with length constraints. The atomic formulas over this language are equality over string terms (word equations), linear inequality over the length function (length constraints), and membership in regular sets. These questions are important in logic, program analysis, and formal verification. Variants of these questions have been studied for many decades by mathematicians. More recently, practical satisfiability procedures (aka SMT solvers) for these formulas have become increasingly important in the context of security analysis for string-manipulating programs such as web applications. We prove three main theorems. First, we give a new proof of undecidability for the validity problem for the set of sentences written as a forall-exists quantifier alternation applied to positive word equations. A corollary of this undecidability result is that this set is undecidable even with sentences with at most two occurrences of a string variable. Second, we consider Boolean combinations of quantifier-free formulas constructed out of word equations and length constraints. We show that if word equations can be converted to a solved form, a form relevant in practice, then the satisfiability problem for Boolean combinations of word equations and length constraints is decidable. Third, we show that the satisfiability problem for quantifier-free formulas over word equations in regular solved form, length constraints, and the membership predicate over regular expressions is also decidable.

preprint2012arXiv

Automated Feedback Generation for Introductory Programming Assignments

We present a new method for automatically providing feedback for introductory programming problems. In order to use this method, we need a reference implementation of the assignment, and an error model consisting of potential corrections to errors that students might make. Using this information, the system automatically derives minimal corrections to student's incorrect solutions, providing them with a quantifiable measure of exactly how incorrect a given solution was, as well as feedback about what they did wrong. We introduce a simple language for describing error models in terms of correction rules, and formally define a rule-directed translation strategy that reduces the problem of finding minimal corrections in an incorrect program to the problem of synthesizing a correct program from a sketch. We have evaluated our system on thousands of real student attempts obtained from 6.00 and 6.00x. Our results show that relatively simple error models can correct on average 65% of all incorrect submissions.

preprint2012arXiv

Inferring SQL Queries Using Program Synthesis

Developing high-performance applications that interact with databases is a difficult task, as developers need to understand both the details of the language in which their applications are written in, and also the intricacies of the relational model. One popular solution to this problem is the use of object-relational mapping (ORM) libraries that provide transparent access to the database using the same language that the application is written in. Unfortunately, using such frameworks can easily lead to applications with poor performance because developers often end up implementing relational operations in application code, and doing so usually does not take advantage of the optimized implementations of relational operations, efficient query plans, or push down of predicates that database systems provide. In this paper we present QBS, an algorithm that automatically identifies fragments of application logic that can be pushed into SQL queries. The QBS algorithm works by automatically synthesizing invariants and postconditions for the original code fragment. The postconditions and invariants are expressed using a theory of ordered relations that allows us to reason precisely about the contents and order of the records produced even by complex code fragments that compute joins and aggregates. The theory is close in expressiveness to SQL, so the synthesized postconditions can be readily translated to SQL queries. Using 40 code fragments extracted from over 120k lines of open-source code written using the Java Hibernate ORM, we demonstrate that our approach can convert a variety of imperative constructs into relational specifications.

preprint2012arXiv

Using Program Synthesis for Social Recommendations

This paper presents a new approach to select events of interest to a user in a social media setting where events are generated by the activities of the user's friends through their mobile devices. We argue that given the unique requirements of the social media setting, the problem is best viewed as an inductive learning problem, where the goal is to first generalize from the users' expressed "likes" and "dislikes" of specific events, then to produce a program that can be manipulated by the system and distributed to the collection devices to collect only data of interest. The key contribution of this paper is a new algorithm that combines existing machine learning techniques with new program synthesis technology to learn users' preferences. We show that when compared with the more standard approaches, our new algorithm provides up to order-of-magnitude reductions in model training time, and significantly higher prediction accuracies for our target application. The approach also improves on standard machine learning techniques in that it produces clear programs that can be manipulated to optimize data collection and filtering.

Armando Solar-Lezama

What is connected

Connect this record

See the researcher in context

Building this map preview

24 published item(s)

MeMo: Memory as a Model

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Top-Down Synthesis for Library Learning

Automatically Deriving Control-Flow Graph Generators from Operational Semantics

JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints

Searching Entangled Program Spaces

Computer-Aided Personalized Education

Deductive Optimization of Relational Data Storage

DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning

Liquid Information Flow Control

Verifiably Safe Exploration for End-to-End Reinforcement Learning

Probabilistic Verification of Fairness Properties via Concentration

Automatic Generation of Formula Simplifiers based on Conditional Rewrite Rules

Precise, Dynamic Information Flow for Database-Backed Applications

Program Synthesis from Polymorphic Refinement Types

Results and Analysis of SyGuS-Comp'15

sk_p: a neural program corrector for MOOCs

SyGuS-Comp 2016: Results and Analysis

JSKETCH: Sketching for Java

Proceedings Second Workshop on Synthesis

(Un)Decidability Results for Word Equations with Length and Regular Expression Constraints

Automated Feedback Generation for Introductory Programming Assignments

Inferring SQL Queries Using Program Synthesis

Using Program Synthesis for Social Recommendations