Researcher profile

Neil A. Ernst

Neil A. Ernst contributes to research discovery and scholarly infrastructure.

Researcher, Affiliation not imported, Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified, Verification L1, Unclaimed author
3 works
0 followers
1 topic
4 close collaborators

Published work

3 published items

Preprint, 2022, arXiv

Error Identification Strategies for Python Jupyter Notebooks

Computational notebooks -- such as Jupyter or Colab -- combine text and data analysis code. They have become ubiquitous in the world of data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors in these different forms might be different. In this paper, we present an exploratory, observational study on how Python Jupyter notebook users find and understand potential errors in notebooks. Through a conceptual replication of a study design investigating the error identification strategies of R notebook users, we presented users with Python Jupyter notebooks pre-populated with common notebook errors -- errors rooted in either the statistical data analysis, the knowledge of domain concepts, or in the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying errors. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes to improve how data scientists discover errors more easily in the notebooks they write.

Preprint, 2020, arXiv

Cross-Dataset Design Discussion Mining

Being able to identify software discussions that are primarily about design, which we call design mining, can improve documentation and maintenance of software systems. Existing design mining approaches have good classification performance using natural language processing (NLP) techniques, but the conclusion stability of these approaches is generally poor. A classifier trained on a given dataset of software projects has so far not worked well on different artifacts or different datasets. In this study, we replicate and synthesize these earlier results in a meta-analysis. We then apply recent work in transfer learning for NLP to the problem of design mining. However, for our datasets, these deep transfer learning classifiers perform no better than less complex classifiers. We conclude by discussing some reasons behind the transfer learning approach to design mining.

Preprint, 2010, arXiv

Code forking in open-source software: a requirements perspective

To fork a project is to copy the existing code base and move in a direction different than that of the erstwhile project leadership. Forking provides a rapid way to address new requirements by adapting an existing solution. However, it can also create a plethora of similar tools, and fragment the developer community. Hence, it is not always clear whether forking is the right strategy. In this paper, we describe a mixed-methods exploratory case study that investigated the process of forking a project. The study concerned the forking of an open-source tool for managing software projects, Trac. Trac was forked to address differing requirements in an academic setting. The paper makes two contributions to our understanding of code forking. First, our exploratory study generated several theories about code forking in open source projects, for further research. Second, we investigated one of these theories in depth, via a quantitative study. We conjectured that the features of the OSS forking process would allow new requirements to be addressed. We show that the forking process in this case was successful at fulfilling the new project's requirements.