Source author record

Will Crichton

Will Crichton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

cs.CY Programming Languages Human-Computer Interaction Machine Learning Multimedia

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Profiling Programming Language Learning

This paper documents a year-long experiment to "profile" the process of learning a programming language: gathering data to understand what makes a language hard to learn, and using that data to improve the learning process. We added interactive quizzes to The Rust Programming Language, the official textbook for learning Rust. Over 13 months, 62,526 readers answered questions 1,140,202 times. First, we analyze the trajectories of readers. We find that many readers drop-out of the book early when faced with difficult language concepts like Rust's ownership types. Second, we use classical test theory and item response theory to analyze the characteristics of quiz questions. We find that better questions are more conceptual in nature, such as asking why a program does not compile vs. whether a program compiles. Third, we performed 12 interventions into the book to help readers with difficult questions. We find that on average, interventions improved quiz scores on the targeted questions by +20%. Fourth, we show that our technique can likely generalize to languages with smaller user bases by simulating our statistical inferences on small N. These results demonstrate that quizzes are a simple and useful technique for understanding language learning at all scales.

preprint2022arXiv

Analyzing Who and What Appears in a Decade of US Cable TV News

Cable TV news reaches millions of U.S. households each day, meaning that decisions about who appears on the news and what stories get covered can profoundly influence public opinion and discourse. We analyze a data set of nearly 24/7 video, audio, and text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, we detect faces in 244,038 hours of video, label each face's presented gender, identify prominent public figures, and align text captions to audio. We use these labels to perform screen time and word frequency analyses. For example, we find that overall, much more screen time is given to male-presenting individuals than to female-presenting individuals (2.4x in 2010 and 1.9x in 2019). We present an interactive web-based tool, accessible at https://tvnews.stanford.edu, that allows the general public to perform their own analyses on the full cable TV news data set.

preprint2022arXiv

Modular Information Flow through Ownership

Statically analyzing information flow, or how data influences other data within a program, is a challenging task in imperative languages. Analyzing pointers and mutations requires access to a program's complete source. However, programs often use pre-compiled dependencies where only type signatures are available. We demonstrate that ownership types can be used to soundly and precisely analyze information flow through function calls given only their type signature. From this insight, we built Flowistry, a system for analyzing information flow in Rust, an ownership-based language. We prove the system's soundness as a form of noninterference using the Oxide formal model of Rust. Then we empirically evaluate the precision of Flowistry, showing that modular flows are identical to whole-program flows in 94% of cases drawn from large Rust codebases. We illustrate the applicability of Flowistry by using it to implement prototypes of a program slicer and an information flow control system.

preprint2021arXiv

Automating Program Structure Classification

When students write programs, their program structure provides insight into their learning process. However, analyzing program structure by hand is time-consuming, and teachers need better tools for computer-assisted exploration of student solutions. As a first step towards an education-oriented program analysis toolkit, we show how supervised machine learning methods can automatically classify student programs into a predetermined set of high-level structures. We evaluate two models on classifying student solutions to the Rainfall problem: a nearest-neighbors classifier using syntax tree edit distance and a recurrent neural network. We demonstrate that these models can achieve 91% classification accuracy when trained on 108 programs. We further explore the generality, trade-offs, and failure cases of each model.

preprint2021arXiv

The Role of Working Memory in Program Tracing

Program tracing, or mentally simulating a program on concrete inputs, is an important part of general program comprehension. Programs involve many kinds of virtual state that must be held in memory, such as variable/value pairs and a call stack. In this work, we examine the influence of short-term working memory (WM) on a person's ability to remember program state during tracing. We first confirm that previous findings in cognitive psychology transfer to the programming domain: people can keep about 7 variable/value pairs in WM, and people will accidentally swap associations between variables due to WM load. We use a restricted focus viewing interface to further analyze the strategies people use to trace through programs, and the relationship of tracing strategy to WM. Given a straight-line program, we find half of our participants traced a program from the top-down line-by-line (linearly), and the other half start at the bottom and trace upward based on data dependencies (on-demand). Participants with an on-demand strategy made more WM errors while tracing straight-line code than with a linear strategy, but the two strategies contained an equal number of WM errors when tracing code with functions. We conclude with the implications of these findings for the design of programming tools: first, programs should be analyzed to identify and refactor human-memory-intensive sections of code. Second, programming environments should interactively visualize variable metadata to reduce WM load in accordance with a person's tracing strategy. Third, tools for program comprehension should enable externalizing program state while tracing.

Will Crichton

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Profiling Programming Language Learning

Analyzing Who and What Appears in a Decade of US Cable TV News

Modular Information Flow through Ownership

Automating Program Structure Classification

The Role of Working Memory in Program Tracing