Researcher profile

Sami Sarsa

Sami Sarsa contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

Automatic Generation of Programming Exercises and Code Explanations using Large Language Models

This article explores the natural language generation capabilities of large language models with application to the production of two types of learning resources common in programming courses. Using OpenAI Codex as the large language model, we create programming exercises (including sample solutions and test cases) and code explanations, assessing these qualitatively and quantitatively. Our results suggest that the majority of the automatically generated content is both novel and sensible, and in some cases ready to use as is. When creating exercises we find that it is remarkably easy to influence both the programming concepts and the contextual themes they contain, simply by supplying keywords as input to the model. Our analysis suggests that there is significant value in massive generative machine learning models as a tool for instructors, although there remains a need for some oversight to ensure the quality of the generated content before it is delivered to students. We further discuss the implications of OpenAI Codex and similar tools for introductory programming education and highlight future research streams that have the potential to improve the quality of the educational experience for both teachers and students alike.

preprint2022arXiv

Empirical Evaluation of Deep Learning Models for Knowledge Tracing: Of Hyperparameters and Metrics on Performance and Replicability

We review and evaluate a body of deep learning knowledge tracing (DLKT) models with openly available and widely-used data sets, and with a novel data set of students learning to program. The evaluated knowledge tracing models include Vanilla-DKT, two Long Short-Term Memory Deep Knowledge Tracing (LSTM-DKT) variants, two Dynamic Key-Value Memory Network (DKVMN) variants, and Self-Attentive Knowledge Tracing (SAKT). As baselines, we evaluate simple non-learning models, logistic regression and Bayesian Knowledge Tracing (BKT). To evaluate how different aspects of DLKT models influence model performance, we test input and output layer variations found in the compared models that are independent of the main architectures. We study maximum attempt count options, including filtering out long attempt sequences, that have been implicitly and explicitly used in prior studies. We contrast the observed performance variations against variations from non-model properties such as randomness and hardware. Performance of models is assessed using multiple metrics, whereby we also contrast the impact of the choice of metric on model performance. The key contributions of this work are: Evidence that DLKT models generally outperform more traditional models, but not necessarily by much and not always; Evidence that even simple baselines with little to no predictive value may outperform DLKT models, especially in terms of accuracy -- highlighting importance of selecting proper baselines for comparison; Disambiguation of properties that affect performance in DLKT models including metric choice, input and output layer variations, common hyperparameters, random seeding and hardware; Discussion of issues in replicability when evaluating DLKT models, including discrepancies in prior reported results and methodology. Model implementations, evaluation code, and data are published as a part of this work.