Researcher profile

Sonal Mahajan

Sonal Mahajan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Providing Real-time Assistance for Repairing Runtime Exceptions using Stack Overflow Posts

Runtime Exceptions (REs) are an important class of bugs that occur frequently during code development. Traditional Automatic Program Repair (APR) tools are of limited use in this "in-development" use case, since they require a test-suite to be available as a patching oracle. Thus, developers typically tend to manually resolve their in-development REs, often by referring to technical forums, such as Stack Overflow (SO). To automate this manual process we extend our previous work, MAESTRO, to provide real-time assistance to developers for repairing Java REs by recommending a relevant patch-suggesting SO post and synthesizing a repair patch from this post to fix the RE in the developer's code. MAESTRO exploits a library of Runtime Exception Patterns (REPs) semi-automatically mined from SO posts, through a relatively inexpensive, one-time, incremental process. An REP is an abstracted sequence of statements that triggers a given RE. REPs are used to index SO posts, retrieve a post most relevant to the RE instance exhibited by a developer's code and then mediate the process of extracting a concrete repair from the SO post, abstracting out post-specific details, and concretizing the repair to the developer's buggy code. We evaluate MAESTRO on a published RE benchmark comprised of 78 instances. MAESTRO is able to generate a correct repair patch at the top position in 27% of the cases, within the top-3 in 40% of the cases and overall return a useful artifact in 81% of the cases. Further, the use of REPs proves instrumental to all aspects of MAESTRO's performance, from ranking and searching of SO posts to synthesizing patches from a given post. In particular, 45% of correct patches generated by MAESTRO could not be produced by a baseline technique not using REPs, even when provided with MAESTRO's SO-post ranking. MAESTRO is also fast, needing around 1 second, on average, to generate its output.

preprint2022arXiv

SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions

Automatic machine learning, or AutoML, holds the promise of truly democratizing the use of machine learning (ML), by substantially automating the work of data scientists. However, the huge combinatorial search space of candidate pipelines means that current AutoML techniques, generate sub-optimal pipelines, or none at all, especially on large, complex datasets. In this work we propose an AutoML technique SapientML, that can learn from a corpus of existing datasets and their human-written pipelines, and efficiently generate a high-quality pipeline for a predictive task on a new dataset. To combat the search space explosion of AutoML, SapientML employs a novel divide-and-conquer strategy realized as a three-stage program synthesis approach, that reasons on successively smaller search spaces. The first stage uses a machine-learned model to predict a set of plausible ML components to constitute a pipeline. In the second stage, this is then refined into a small pool of viable concrete pipelines using syntactic constraints derived from the corpus and the machine-learned model. Dynamically evaluating these few pipelines, in the third stage, provides the best solution. We instantiate SapientML as part of a fully automated tool-chain that creates a cleaned, labeled learning corpus by mining Kaggle, learns from it, and uses the learned models to then synthesize pipelines for new predictive tasks. We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets, including 10 new, large, real-world datasets from Kaggle, and against 3 state-of-the-art AutoML tools and 2 baselines. Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.

preprint2020arXiv

Recommending Stack Overflow Posts for Fixing Runtime Exceptions using Failure Scenario Matching

Using online Q&A forums, such as Stack Overflow (SO), for guidance to resolve program bugs, among other development issues, is commonplace in modern software development practice. Runtime exceptions (RE) is one such important class of bugs that is actively discussed on SO. In this work we present a technique and prototype tool called MAESTRO that can automatically recommend an SO post that is most relevant to a given Java RE in a developer's code. MAESTRO compares the exception-generating program scenario in the developer's code with that discussed in an SO post and returns the post with the closest match. To extract and compare the exception scenario effectively, MAESTRO first uses the answer code snippets in a post to implicate a subset of lines in the post's question code snippet as responsible for the exception and then compares these lines with the developer's code in terms of their respective Abstract Program Graph (APG) representations. The APG is a simplified and abstracted derivative of an abstract syntax tree, proposed in this work, that allows an effective comparison of the functionality embodied in the high-level program structure, while discarding many of the low-level syntactic or semantic differences. We evaluate MAESTRO on a benchmark of 78 instances of Java REs extracted from the top 500 Java projects on GitHub and show that MAESTRO can return either a highly relevant or somewhat relevant SO post corresponding to the exception instance in 71% of the cases, compared to relevant posts returned in only 8% - 44% instances, by four competitor tools based on state-of-the-art techniques. We also conduct a user experience study of MAESTRO with 10 Java developers, where the participants judge MAESTRO reporting a highly relevant or somewhat relevant post in 80% of the instances. In some cases the post is judged to be even better than the one manually found by the participant.