Researcher profile

Subhajit Roy

Subhajit Roy contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2022arXiv

Debug-Localize-Repair: A Symbiotic Construction for Heap Manipulations

We present Wolverine2, an integrated Debug-Localize-Repair environment for heap manipulating programs. Wolverine2 provides an interactive debugging environment: while concretely executing a program via on an interactive shell supporting common debugging facilities, Wolverine2 displays the abstract program states (as box-and-arrow diagrams) as a visual aid to the programmer, packages a novel, proof-directed repair algorithm to quickly synthesize the repair patches and a new bug localization algorithm to reduce the search space of repairs. Wolverine2 supports "hot-patching" of the generated patches to provide a seamless debugging environment, and also facilitates new debug-localize-repair possibilities: \textit{specification refinement} and \textit{checkpoint-based hopping}. We evaluate Wolverine2 on 6400 buggy programs (generated using automated fault injection) on a variety of data-structures like singly, doubly, and circular linked lists, AVL trees, Red-Black trees, Splay Trees and Binary Search Trees; Wolverine2 could repair all the buggy instances within realistic programmer wait-time (less than 5 sec in most cases). Wolverine2 could also repair more than 80\% of the 247 (buggy) student submissions where a reasonable attempt was made.

preprint2022arXiv

HOLL: Program Synthesis for Higher OrderLogic Locking

Logic locking "hides" the functionality of a digital circuit to protect it from counterfeiting, piracy, and malicious design modifications. The original design is transformed into a "locked" design such that the circuit reveals its correct functionality only when it is "unlocked" with a secret sequence of bits--the key bit-string. However, strong attacks, especially the SAT attack that uses a SAT solver to recover the key bitstring, have been profoundly effective at breaking the locked circuit and recovering the circuit functionality. We lift logic locking to Higher Order Logic Locking (HOLL) by hiding a higher-order relation, instead of a key of independent values, challenging the attacker to discover this key relation to recreate the circuit functionality. Our technique uses program synthesis to construct the locked design and synthesize a corresponding key relation. HOLL has low overhead and existing attacks for logic locking do not apply as the entity to be recovered is no more a value. To evaluate our proposal, we propose a new attack (SynthAttack) that uses an inductive synthesis algorithm guided by an operational circuit as an input-output oracle to recover the hidden functionality. SynthAttack is inspired by the SAT attack, and similar to the SAT attack, it is verifiably correct, i.e., if the correct functionality is revealed, a verification check guarantees the same. Our empirical analysis shows that SynthAttack can break HOLL for small circuits and small key relations, but it is ineffective for real-life designs.

preprint2022arXiv

Synthesis of Semantic Actions in Attribute Grammars

Attribute grammars allow the association of semantic actions to the production rules in context-free grammars, providing a simple yet effective formalism to define the semantics of a language. However, drafting the semantic actions can be tricky and a large drain on developer time. In this work, we propose a synthesis methodology to automatically infer the semantic actions from a set of examples associating strings to their meanings. We also propose a new coverage metric, derivation coverage. We use it to build a sampler to effectively and automatically draw strings to drive the synthesis engine. We build our ideas into our tool, PANINI, and empirically evaluate it on twelve benchmarks, including a forward differentiation engine, an interpreter over a subset of Java bytecode, and a mini-compiler for C language to two-address code. Our results show that PANINI scales well with the number of actions to be synthesized and the size of the context-free grammar, significantly outperforming simple baselines.

preprint2022arXiv

Synthesizing Abstract Transformers

This paper addresses the problem of creating abstract transformers automatically. The method we present automates the construction of static analyzers in a fashion similar to the way $\textit{yacc}$ automates the construction of parsers. Our method treats the problem as a program-synthesis problem. The user provides specifications of (i) the concrete semantics of a given operation $\textit{op}$, (ii) the abstract domain A to be used by the analyzer, and (iii) the semantics of a domain-specific language $L$ in which the abstract transformer is to be expressed. As output, our method creates an abstract transformer for $\textit{op}$ in abstract domain A, expressed in $L$ (an "$L$-transformer for $\textit{op}$ over A"). Moreover, the abstract transformer obtained is a most-precise $L$-transformer for $\textit{op}$ over A; that is, there is no other $L$-transformer for $\textit{op}$ over A that is strictly more precise. We implemented our method in a tool called AMURTH. We used AMURTH to create sets of replacement abstract transformers for those used in two existing analyzers, and obtained essentially identical performance. However, when we compared the existing transformers with the transformers obtained using AMURTH, we discovered that four of the existing transformers were unsound, which demonstrates the risk of using manually created transformers.

preprint2021arXiv

Learning Differentially Private Mechanisms

Differential privacy is a formal, mathematical definition of data privacy that has gained traction in academia, industry, and government. The task of correctly constructing differentially private algorithms is non-trivial, and mistakes have been made in foundational algorithms. Currently, there is no automated support for converting an existing, non-private program into a differentially private version. In this paper, we propose a technique for automatically learning an accurate and differentially private version of a given non-private program. We show how to solve this difficult program synthesis problem via a combination of techniques: carefully picking representative example inputs, reducing the problem to continuous optimization, and mapping the results back to symbolic expressions. We demonstrate that our approach is able to learn foundational algorithms from the differential privacy literature and significantly outperforms natural program synthesis baselines.

preprint2020arXiv

Distributed Bounded Model Checking

Program verification is a resource-hungry task. This paper looks at the problem of parallelizing SMT-based automated program verification, specifically bounded model-checking, so that it can be distributed and executed on a cluster of machines. We present an algorithm that dynamically unfolds the call graph of the program and frequently splits it to create sub-tasks that can be solved in parallel. The algorithm is adaptive, controlling the splitting rate according to available resources, and also leverages information from the SMT solver to split where most complexity lies in the search. We implemented our algorithm by modifying CORRAL, the verifier used by Microsoft's Static Driver Verifier (SDV), and evaluate it on a series of hard SDV benchmarks.

preprint2020arXiv

Manthan: A Data Driven Approach for Boolean Function Synthesis

Boolean functional synthesis is a fundamental problem in computer science with wide-ranging applications and has witnessed a surge of interest resulting in progressively improved techniques over the past decade. Despite intense algorithmic development, a large number of problems remain beyond the reach of the state of the art techniques. Motivated by the progress in machine learning, we propose Manthan, a novel data-driven approach to Boolean functional synthesis. Manthan views functional synthesis as a classification problem, relying on advances in constrained sampling for data generation, and advances in automated reasoning for a novel proof-guided refinement and provable verification. On an extensive and rigorous evaluation over 609 benchmarks, we demonstrate that Manthan significantly improves upon the current state of the art, solving 356 benchmarks in comparison to 280, which is the most solved by a state of the art technique; thereby, we demonstrate an increase of 76 benchmarks over the current state of the art. Furthermore, Manthan solves 60 benchmarks that none of the current state of the art techniques could solve. The significant performance improvements, along with our detailed analysis, highlights several interesting avenues of future work at the intersection of machine learning, constrained sampling, and automated reasoning.

preprint2020arXiv

Phase Transition Behavior in Knowledge Compilation

The study of phase transition behaviour in SAT has led to deeper understanding and algorithmic improvements of modern SAT solvers. Motivated by these prior studies of phase transitions in SAT, we seek to study the behaviour of size and compile-time behaviour for random k-CNF formulas in the context of knowledge compilation. We perform a rigorous empirical study and analysis of the size and runtime behavior for different knowledge compilation forms (and their corresponding compilation algorithms): d-DNNFs, SDDs and OBDDs across multiple tools and compilation algorithms. We employ instances generated from the random k-CNF model with varying generation parameters to empirically reason about the expected and median behavior of size and compilation-time for these languages. Our work is similar in spirit to the early work in CSP community on phase transition behavior in SAT/CSP. In a similar spirit, we identify the interesting behavior with respect to different parameters: clause density and solution density, a novel control parameter that we identify for the study of phase transition behavior in the context of knowledge compilation. Furthermore, we summarize our empirical study in terms of two concrete conjectures; a rigorous study of these conjectures will possibly require new theoretical tools.