Researcher profile

Marcos D. Caballero

Marcos D. Caballero contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
2 topics
4 close collaborators

Published work

9 published items

preprint, 2021, arXiv

Predictive and explanatory models might miss informative features in educational data

In educational data mining (EDM), we often encounter variables with little variation because of the demographics of higher education and the questions we ask. Yet little work has examined how to analyze such data. Therefore, we conducted a simulation study using logistic regression, penalized regression, and random forest. We systematically varied the fraction of positive outcomes, feature imbalances, and odds ratios. We find that the algorithms treat features with the same odds ratios differently based on the features' imbalance and the outcome imbalance. While none of the algorithms fully solved the problem of handling imbalanced data, penalized approaches such as Firth and Log-F reduced the difference between the built-in odds ratio and the value determined by the algorithm. Our results suggest that EDM studies might contain false negatives when determining which variables are related to an outcome. We then apply our findings to a graduate admissions data set. We end by recommending that researchers consider penalized regression for data sets on the order of hundreds of cases and include more context about their data in publications, such as the outcome and feature imbalances.
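The kind of simulation this abstract describes can be illustrated with a minimal sketch. It is not the authors' code: it uses a single binary feature, estimates the odds ratio directly from the 2x2 table with a 0.5 continuity correction (a simple stand-in, not Firth or Log-F penalization), and all function names and parameter values are hypothetical. Here `baseline_rate` is assumed to mean the outcome probability when the feature is absent.

```python
import math
import random


def estimated_odds_ratio(n, feature_rate, baseline_rate, true_or, rng):
    """Simulate n cases with one binary feature; return the 2x2-table odds ratio."""
    baseline_logit = math.log(baseline_rate / (1 - baseline_rate))
    counts = [[0, 0], [0, 0]]  # counts[feature][outcome]
    for _ in range(n):
        x = 1 if rng.random() < feature_rate else 0
        # The built-in odds ratio shifts the log-odds only when the feature is present.
        logit = baseline_logit + x * math.log(true_or)
        p = 1 / (1 + math.exp(-logit))
        y = 1 if rng.random() < p else 0
        counts[x][y] += 1
    # Add 0.5 to each cell (Haldane-Anscombe) so empty cells cannot give 0 or infinity.
    a, b = counts[1][1] + 0.5, counts[1][0] + 0.5
    c, d = counts[0][1] + 0.5, counts[0][0] + 0.5
    return (a * d) / (b * c)


def spread_of_estimates(n, feature_rate, baseline_rate, true_or, reps=200, seed=0):
    """Return approximate (5th percentile, median, 95th percentile) of repeated estimates."""
    rng = random.Random(seed)
    estimates = sorted(
        estimated_odds_ratio(n, feature_rate, baseline_rate, true_or, rng)
        for _ in range(reps)
    )
    return estimates[reps // 20], estimates[reps // 2], estimates[(19 * reps) // 20]


if __name__ == "__main__":
    # Same built-in odds ratio, balanced versus rare feature, a few hundred cases.
    for rate in (0.5, 0.05):
        print(rate, spread_of_estimates(300, rate, 0.2, 2.0))
```

Comparing the spread for a balanced feature with that for a rare one, at the same built-in odds ratio, gives a rough feel for why feature imbalance matters in data sets of a few hundred cases.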

preprint, 2020, arXiv

Predicting time to graduation at a large enrollment American university

The time it takes a student to graduate with a university degree is influenced by a variety of factors, such as their background, their academic performance at university, and their integration into the social communities of the university they attend. Different universities have different populations, student services, instruction styles, and degree programs; however, they all collect institutional data. This study presents data for 160,933 students attending a large American research university. The data includes performance, enrollment, demographics, and preparation features. Discrete time hazard models for the time-to-graduation are presented in the context of Tinto's Theory of Drop Out. Additionally, a novel machine learning method, gradient boosted trees, is applied and compared to the typical maximum likelihood method. We demonstrate that enrollment factors (such as changing a major) lead to greater increases in model performance when predicting when a student graduates than performance factors (such as grades) or preparation factors (such as high school GPA).
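As a rough illustration of the discrete-time hazard idea mentioned in this abstract, the sketch below computes the covariate-free life-table estimate: for each term, the hazard is the number of students graduating in that term divided by the number still enrolled (at risk) in that term. The paper's models include covariates; this sketch does not, and the record format and function names are assumptions made for illustration.

```python
from collections import Counter


def discrete_hazards(records):
    """Estimate per-term graduation hazards.

    records: iterable of (last_term_observed, graduated) pairs, where
    graduated is False for students who left or were censored.
    Returns {term: graduates in term / students at risk in term}.
    """
    at_risk = Counter()
    events = Counter()
    for last_term, graduated in records:
        # Person-period expansion: a student is at risk in every term up to the last one.
        for term in range(1, last_term + 1):
            at_risk[term] += 1
        if graduated:
            events[last_term] += 1
    return {term: events[term] / at_risk[term] for term in sorted(at_risk)}


def not_yet_graduated(hazards):
    """Cumulative product of (1 - hazard): the fraction still without a degree after each term."""
    remaining, curve = 1.0, {}
    for term in sorted(hazards):
        remaining *= 1 - hazards[term]
        curve[term] = remaining
    return curve
```

For example, `discrete_hazards([(8, True), (8, True), (10, True), (6, False)])` treats the fourth student as censored after term 6, so only three students remain at risk in term 8.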

preprint, 2020, arXiv

Thematic Analysis of 18 Years of PERC Proceedings using Natural Language Processing

We have used an unsupervised machine learning method called Latent Dirichlet Allocation (LDA) to thematically analyze all papers published in the Physics Education Research Conference Proceedings between 2001 and 2018. By looking at co-occurrences of words across the data corpus, this technique has allowed us to identify ten distinct themes or "topics" that have seen varying levels of prevalence in Physics Education Research (PER) over time and to rate the distribution of these topics within each paper. Our analysis suggests that although all identified topics have seen sustained interest over time, PER has also seen several waves of increased interest in certain topics, beginning with initial interest in qualitative, theory-building studies of student understanding, which gave way to a focus on problem solving in the late 2000s. Since 2010 the field has seen a shift towards more sociocultural views of teaching and learning, with a particular focus on communities of practice, student identities, and institutional change. Based on these results, we suggest that unsupervised text analysis techniques like LDA may hold promise for providing quantitative, independent, and replicable analyses of educational research literature.
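To make the LDA step more concrete, here is a small, self-contained sketch of LDA fitted by collapsed Gibbs sampling using only the Python standard library. It is an illustration of the general technique, not the pipeline used in the paper; the function name, the hyperparameter defaults (`alpha`, `beta`), and the tokenized-document input format are all assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of word tokens.
    Returns (theta, topic_word, vocab): per-document topic proportions,
    per-topic word counts, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                # Remove the token, then resample its topic given all other tokens.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab
```

Each row of `theta` gives the estimated share of each topic within one document, which corresponds to the per-paper topic distribution the abstract refers to. A real corpus would also need tokenization and stop-word removal, which are omitted here.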

preprint, 2019, arXiv

Computational Essays: An Avenue for Scientific Creativity in Physics

Computation holds great potential for introducing new opportunities for creativity and exploration into the physics curriculum. At the University of Oslo we have begun development of a new class of assignment called computational essays to help facilitate creative, open-ended computational physics projects. Computational essays are a type of essay or narrative that combines text and code to express an idea or make an argument, usually written in computational notebooks. During a pilot implementation of computational essays in an introductory electricity and magnetism course, students reported that computational essays facilitated creative investigation at a variety of levels within their physics course. They also reported finding this creativity both challenging and motivating. Based on these reflections, we argue that computational essays are a useful tool for leveraging the creative affordances of programming in physics education.

preprint, 2019, arXiv

Physics Computational Literacy: An Exploratory Case Study Using Computational Essays

Computation is becoming an increasingly important part of physics education. However, there are currently few theories of learning that can be used to help explain and predict the unique challenges and affordances associated with computation in physics. In this study, we adapt the existing theory of computational literacy, which posits that computational learning can be divided into material, cognitive, and social aspects, to the context of undergraduate physics. Based on an exploratory study of undergraduate physics computational literacy, using a newly-developed teaching tool known as a computational essay, we have identified a variety of student practices, knowledge, and beliefs across these three aspects of computational literacy. We illustrate these categories with data collected from students who engaged in an initial implementation of computational essays in an introductory electricity and magnetism class. We conclude by arguing that this framework can be used to theoretically diagnose student difficulties with computation, distinguish educational approaches that focus on material vs. cognitive aspects of computational literacy, and highlight the benefits and limitations of open-ended projects like computational essays to student learning.

preprint, 2019, arXiv

Using machine learning to understand physics graduate school admissions

Among all of the first-year graduate students enrolled in doctoral-granting physics departments, the percentage of female and racial minority students has remained unchanged for the past 20 years. The current graduate program admissions process can create challenges for achieving diversity goals in physics. In this paper, we will investigate how the various aspects of a prospective student's application to a physics doctoral program affect the likelihood the applicant will be admitted. Admissions data was collected from a large, Midwestern public research university that has a decentralized admissions process and included applicants' undergraduate GPAs and institutions, research interests, and GRE scores. Because the collected data varied in scale, we used supervised machine learning algorithms to create models that predict who was admitted into the PhD program. We find that using only the applicant's undergraduate GPA and physics GRE score, we are able to predict with 75% accuracy who will be admitted to the program.

preprint, 2018, arXiv

Denoting and Comparing Leadership Attributes and Behaviors in Group Work

Projects and Practices in Physics (P$^3$) is an introductory physics class at Michigan State University that replaces lectures with a problem-based learning environment. To promote the development of group-based practices, all students receive group and individual feedback at the end of each week. Each group comprises four students, one of whom often takes on the role of the group's "leader." Developing leadership skills is a specific learning goal of the P$^3$ learning environment, and the goal of this research is to examine what leadership-specific actions and traits students in P$^3$ demonstrate while working in their groups. The initial phase of this study reviewed the literature to identify characteristics and behaviors that potential leaders may exhibit, producing a codebook. This phase of the study applies the codebook to in-class data to compare two tutor-labeled leaders and their leadership styles.

preprint, 2018, arXiv

Examining the relationship between student performance and video interactions

In this work, we attempted to predict student performance on a suite of laboratory assessments using students' interactions with associated instructional videos. The students' performance is measured by a graded presentation for each of four laboratory presentations in an introductory mechanics course. Each lab assessment was associated with between one and three videos of instructional content. Using video clickstream data, we define summary features (number of pauses, seeks) and contextual information (fraction of time played, in-semester order). These features serve as inputs to a logistic regression (LR) model that aims to predict student performance on the laboratory assessments. Our findings show that LR models are unable to predict student performance. Adding contextual information did not change the model performance. We compare our findings to those of other studies and explore caveats to the null result, such as the representation of the features, the possibility of underfitting, and the complexity of the assessment.

preprint, 2018, arXiv

The difficulties associated with integrating computation into undergraduate physics

The challenges of integrating computation into the undergraduate physics curriculum are varied, ranging from departments that resist change to students who do not buy into new computational activities. The Partnership for Integration of Computation into Undergraduate Physics (PICUP) aims to expand the role of computation in the undergraduate physics curriculum. The research presented in this paper is part of a larger project examining the role of the PICUP workshop in facilitating the integration of computation into classrooms and in developing a community that supports this integration. An important part of providing the necessary supports for integration is understanding and categorizing the problems members of this community face when integrating computation into their courses. Through individual and group interviews, we discuss the barriers to integration that new and experienced members of the PICUP community have experienced in the past or perceive could exist in the future.