Researcher profile

Johanna Hardin

Johanna Hardin contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2025arXiv

Measuring the Impact of Missingness in Traffic Stop Data

In this article we explore the data available through the Stanford Open Policing Project. The data consist of information on millions of traffic stops across close to 100 different cities and highway patrols. Using a variety of metrics, we identify that the data is not missing completely at random. Furthermore, we develop ways of quantifying and visualizing missingness trends for different variables across the datasets. We follow up by performing a sensitivity analysis to extend work done on the outcome test as well as to extend work done on sharp bounds on the average treatment effect. We demonstrate that bias calculations can fundamentally shift depending on the assumptions made about the observations for which the race variable has not been recorded. We suggest ways that our missingness sensitivity analysis can be extended to myriad different contexts.

preprint2022arXiv

Community, Collaboration, and Climate

The Department of Mathematics & Statistics at Pomona College has long worked to create an inclusive and welcoming space for all individuals to study mathematics. Many years ago, our approach to the lack of diversity we saw in our majors was remediation through programming which sought to ameliorate student deficits. More recently, however, we have taken an anti-deficit approach with focus on changes to the department itself. The programs we have implemented are described below as enhancing community, collaboration, and climate within our department.

preprint2021arXiv

A Unified Framework for Random Forest Prediction Error Estimation

We introduce a unified framework for random forest prediction error estimation based on a novel estimator of the conditional prediction error distribution function. Our framework enables simple plug-in estimation of key prediction uncertainty metrics, including conditional mean squared prediction errors, conditional biases, and conditional quantiles, for random forests and many variants. Our approach is especially well-adapted for prediction interval estimation; we show via simulations that our proposed prediction intervals are competitive with, and in some settings outperform, existing methods. To establish theoretical grounding for our framework, we prove pointwise uniform consistency of a more stringent version of our estimator of the conditional prediction error distribution function. The estimators introduced here are implemented in the R package forestError.

preprint2020arXiv

"Playing the whole game": A data collection and analysis exercise with Google Calendar

We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to 'play the whole game' of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, such resources are not as abundant for data collection given the inherent difficulty of the task. Our proposed exercise centers around student use of Google Calendar to collect data with the goal of answering the question 'How do I spend my time?' On the one hand, the exercise involves answering a question with near universal appeal, but on the other hand, the data collection mechanism is not beyond the reach of a typical undergraduate student. A further benefit of the exercise is that it provides an opportunity for discussions on ethical questions and considerations that data providers and data analysts face in today's age of large-scale internet-based data collection.

preprint2015arXiv

Data Science in Statistics Curricula: Preparing Students to "Think with Data"

A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to utilize databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this paper is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engagement of undergraduates with data and data science.

preprint2013arXiv

A method for generating realistic correlation matrices

Simulating sample correlation matrices is important in many areas of statistics. Approaches such as generating Gaussian data and finding their sample correlation matrix or generating random uniform $[-1,1]$ deviates as pairwise correlations both have drawbacks. We develop an algorithm for adding noise, in a highly controlled manner, to general correlation matrices. In many instances, our method yields results which are superior to those obtained by simply simulating Gaussian data. Moreover, we demonstrate how our general algorithm can be tailored to a number of different correlation models. Using our results with a few different applications, we show that simulating correlation matrices can help assess statistical methodology.