Source author record

Jason Tsay

Jason Tsay appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Human-Computer Interaction Machine Learning Software Engineering

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Comments on Comments: Where Code Review and Documentation Meet

A central function of code review is to increase understanding; helping reviewers understand a code change aids in knowledge transfer and finding bugs. Comments in code largely serve a similar purpose, helping future readers understand the program. It is thus natural to study what happens when these two forms of understanding collide. We ask: what documentation-related comments do reviewers make and how do they affect understanding of the contribution? We analyze ca.700K review comments on 2,000 (Java and Python) GitHub projects, and propose several filters to identify which comments are likely to be either in response to a change in documentation and/or call for such a change. We identify 65K such cases. We next develop a taxonomy of the reviewer intents behind such "comments on comments". We find that achieving a shared understanding of the code is key: reviewer comments most often focused on clarification, followed by pointing out issues to fix, such as typos and outdated comments. Curiously, clarifying comments were frequently suggested (often verbatim) by the reviewer, indicating a desire to persist their understanding acquired during code review. We conclude with a discussion of implications of our comments-on-comments dataset for research on improving code review, including the potential benefits for automating code review.

preprint2022arXiv

Enabling Reproducibility and Meta-learning Through a Lifelong Database of Experiments (LDE)

Artificial Intelligence (AI) development is inherently iterative and experimental. Over the course of normal development, especially with the advent of automated AI, hundreds or thousands of experiments are generated and are often lost or never examined again. There is a lost opportunity to document these experiments and learn from them at scale, but the complexity of tracking and reproducing these experiments is often prohibitive to data scientists. We present the Lifelong Database of Experiments (LDE) that automatically extracts and stores linked metadata from experiment artifacts and provides features to reproduce these artifacts and perform meta-learning across them. We store context from multiple stages of the AI development lifecycle including datasets, pipelines, how each is configured, and training runs with information about their runtime environment. The standardized nature of the stored metadata allows for querying and aggregation, especially in terms of ranking artifacts by performance metrics. We exhibit the capabilities of the LDE by reproducing an existing meta-learning study and storing the reproduced metadata in our system. Then, we perform two experiments on this metadata: 1) examining the reproducibility and variability of the performance metrics and 2) implementing a number of meta-learning algorithms on top of the data and examining how variability in experimental results impacts recommendation performance. The experimental results suggest significant variation in performance, especially depending on dataset configurations; this variation carries over when meta-learning is built on top of the results, with performance improving when using aggregated results. This suggests that a system that automatically collects and aggregates results such as the LDE not only assists in implementing meta-learning but may also improve its performance.

preprint2014arXiv

Transparency and Coordination in Peer Production

This paper examines coordination in transparent work environments - environments where the content of work artifacts, and the actions taken on these artifacts, are fully visible to organizational members. Our qualitative study of a community of open source software developers revealed a coordination system characterized by interest-based, asynchronous interaction and knowledge transfer. At the core of asynchronous knowledge transfer, lies the concept of quasi-codification, which occurs when rich process knowledge is implicitly encoded in work artifacts. Our findings suggest that members are able to more selectively form dependencies, monitor the trajectory of projects, and make their work understandable to others which facilitates coordination. We discuss two important characteristics that enable coordination activities in a transparent environment: the presence of an imagined audience that dictates the way artifacts are crafted, and experience within the environment, that allows individuals to derive knowledge from these artifacts. By showing how transparency influences coordination, this research challenges previous conceptions of coordination for complex, collaborative work.