Source author record

Ruben Martins

Ruben Martins appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Software Engineering Logic in Computer Science Programming Languages

Catalog footprint

What is connected

7works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

SOAR: A Synthesis Approach for Data Science API Refactoring

With the growth of the open-source data science community, both the number of data science libraries and the number of versions for the same library are increasing rapidly. To match the evolving APIs from those libraries, open-source organizations often have to exert manual effort to refactor the APIs used in the code base. Moreover, due to the abundance of similar open-source libraries, data scientists working on a certain application may have an abundance of libraries to choose, maintain and migrate between. The manual refactoring between APIs is a tedious and error-prone task. Although recent research efforts were made on performing automatic API refactoring between different languages, previous work relies on statistical learning with collected pairwise training data for the API matching and migration. Using large statistical data for refactoring is not ideal because such training data will not be available for a new library or a new version of the same library. We introduce Synthesis for Open-Source API Refactoring (SOAR), a novel technique that requires no training data to achieve API migration and refactoring. SOAR relies only on the documentation that is readily available at the release of the library to learn API representations and mapping between libraries. Using program synthesis, SOAR automatically computes the correct configuration of arguments to the APIs and any glue code required to invoke those APIs. SOAR also uses the interpreter's error messages when running refactored code to generate logical constraints that can be used to prune the search space. Our empirical evaluation shows that SOAR can successfully refactor 80% of our benchmarks corresponding to deep learning models with up to 44 layers with an average run time of 97.23 seconds, and 90% of the data wrangling benchmarks with an average run time of 17.31 seconds.

preprint2016arXiv

Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples

This paper presents an example-driven synthesis technique for automating a large class of data preparation tasks that arise in data science. Given a set of input tables and an out- put table, our approach synthesizes a table transformation program that performs the desired task. Our approach is not restricted to a fixed set of DSL constructs and can synthesize programs from an arbitrary set of components, including higher-order combinators. At a high-level, our approach performs type-directed enumerative search over partial pro- grams but incorporates two key innovations that allow it to scale: First, our technique can utilize any first-order specification of the components and uses SMT-based deduction to reject partial programs. Second, our algorithm uses partial evaluation to increase the power of deduction and drive enumerative search. We have evaluated our synthesis algorithm on dozens of data preparation tasks obtained from on-line forums, and we show that our approach can automatically solve a large class of problems encountered by R users.

preprint2016arXiv

Type-Directed Code Reuse using Integer Linear Programming

In many common scenarios, programmers need to implement functionality that is already provided by some third party library. This paper presents a tool called Hunter that facilitates code reuse by finding relevant methods in large code bases and automatically synthesizing any necessary wrapper code. The key technical idea underlying our approach is to use types to both improve search results and guide synthesis. Specifically, our method computes similarity metrics between types and uses this information to solve an integer linear programming (ILP) problem in which the objective is to minimize the cost of synthesis. We have implemented Hunter as an Eclipse plug-in and evaluate it by (a) comparing it against S6, a state-of-the-art code reuse tool, and (b) performing a user study. Our evaluation shows that Hunter compares favorably with S6 and significantly increases programmer productivity.

preprint2015arXiv

Exploiting Resolution-based Representations for MaxSAT Solving

Most recent MaxSAT algorithms rely on a succession of calls to a SAT solver in order to find an optimal solution. In particular, several algorithms take advantage of the ability of SAT solvers to identify unsatisfiable subformulas. Usually, these MaxSAT algorithms perform better when small unsatisfiable subformulas are found early. However, this is not the case in many problem instances, since the whole formula is given to the SAT solver in each call. In this paper, we propose to partition the MaxSAT formula using a resolution-based graph representation. Partitions are then iteratively joined by using a proximity measure extracted from the graph representation of the formula. The algorithm ends when only one partition remains and the optimal solution is found. Experimental results show that this new approach further enhances a state of the art MaxSAT solver to optimally solve a larger set of industrial problem instances.

preprint2015arXiv

Generalized Totalizer Encoding for Pseudo-Boolean Constraints

Pseudo-Boolean constraints, also known as 0-1 Integer Linear Constraints, are used to model many real-world problems. A common approach to solve these constraints is to encode them into a SAT formula. The runtime of the SAT solver on such formula is sensitive to the manner in which the given pseudo-Boolean constraints are encoded. In this paper, we propose generalized Totalizer encoding (GTE), which is an arc-consistency preserving extension of the Totalizer encoding to pseudo-Boolean constraints. Unlike some other encodings, the number of auxiliary variables required for GTE does not depend on the magnitudes of the coefficients. Instead, it depends on the number of distinct combinations of these coefficients. We show the superiority of GTE with respect to other encodings when large pseudo-Boolean constraints have low number of distinct coefficients. Our experimental results also show that GTE remains competitive even when the pseudo-Boolean constraints do not have this characteristic.

preprint2014arXiv

Incremental Bounded Model Checking for Embedded Software (extended version)

Program analysis is on the brink of mainstream in embedded systems development. Formal verification of behavioural requirements, finding runtime errors and automated test case generation are some of the most common applications of automated verification tools based on Bounded Model Checking. Existing industrial tools for embedded software use an off-the-shelf Bounded Model Checker and apply it iteratively to verify the program with an increasing number of unwindings. This approach unnecessarily wastes time repeating work that has already been done and fails to exploit the power of incremental SAT solving. This paper reports on the extension of the software model checker CBMC to support incremental Bounded Model Checking and its successful integration with the industrial embedded software verification tool BTC EmbeddedTester. We present an extensive evaluation over large industrial embedded programs, which shows that incremental Bounded Model Checking cuts runtimes by one order of magnitude in comparison to the standard non-incremental approach, enabling the application of formal verification to large and complex embedded software.

preprint2014arXiv

Incremental Cardinality Constraints for MaxSAT

Maximum Satisfiability (MaxSAT) is an optimization variant of the Boolean Satisfiability (SAT) problem. In general, MaxSAT algorithms perform a succession of SAT solver calls to reach an optimum solution making extensive use of cardinality constraints. Many of these algorithms are non-incremental in nature, i.e. at each iteration the formula is rebuilt and no knowledge is reused from one iteration to another. In this paper, we exploit the knowledge acquired across iterations using novel schemes to use cardinality constraints in an incremental fashion. We integrate these schemes with several MaxSAT algorithms. Our experimental results show a significant performance boost for these algo- rithms as compared to their non-incremental counterparts. These results suggest that incremental cardinality constraints could be beneficial for other constraint solving domains.

Ruben Martins

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

SOAR: A Synthesis Approach for Data Science API Refactoring

Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples

Type-Directed Code Reuse using Integer Linear Programming

Exploiting Resolution-based Representations for MaxSAT Solving

Generalized Totalizer Encoding for Pseudo-Boolean Constraints

Incremental Bounded Model Checking for Embedded Software (extended version)

Incremental Cardinality Constraints for MaxSAT