Source author record

Christopher M. Homan

Christopher M. Homan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Social and Information Networks Applications Artificial Intelligence Computation and Language Computational Complexity Discrete Mathematics Human-Computer Interaction Machine Learning

Catalog footprint

What is connected

6works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling

As generative AI models such as large language models (LLMs) become more pervasive, ensuring the safety, robustness, and overall trustworthiness of these systems is paramount. However, AI is currently facing a reproducibility crisis driven by unreliable evaluations and unrepeatable experimental results. While human raters are often used to assess models for utility and safety, they introduce divergent biases and subjective opinions into their annotations. Overcoming this variance is exceptionally challenging because very little data exists to study how experimental repeatability actually improves as the annotator pool grows. Standard evaluation practices typically rely on a small number of annotations per item (often 3 to 5) and lack the persistent rater identifiers necessary to model individual variance across items. In this work, we introduce a multi-level bootstrapping approach to realistically model annotator behavior. Leveraging datasets with a large number of ratings and persistent rater identifiers, we analyze the tradeoffs between the number of items ($N$) and the number of responses per item ($K$) required to achieve statistical significance.

preprint2020arXiv

Assessing Human Translations from French to Bambara for Machine Learning: a Pilot Study

We present novel methods for assessing the quality of human-translated aligned texts for learning machine translation models of under-resourced languages. Malian university students translated French texts, producing either written or oral translations to Bambara. Our results suggest that similar quality can be obtained from either written or spoken translations for certain kinds of texts. They also suggest specific instructions that human translators should be given in order to improve the quality of their work.

preprint2015arXiv

Job-related discourse on social media

Working adults spend nearly one third of their daily time at their jobs. In this paper, we study job-related social media discourse from a community of users. We use both crowdsourcing and local expertise to train a classifier to detect job-related messages on Twitter. Additionally, we analyze the linguistic differences in a job-related corpus of tweets between individual users vs. commercial accounts. The volumes of job-related tweets from individual users indicate that people use Twitter with distinct monthly, daily, and hourly patterns. We further show that the moods associated with jobs, positive and negative, have unique diurnal rhythms.

preprint2014arXiv

Tuning the Diversity of Open-Ended Responses from the Crowd

Crowdsourcing can solve problems that current fully automated systems cannot. Its effectiveness depends on the reliability, accuracy, and speed of the crowd workers that drive it. These objectives are frequently at odds with one another. For instance, how much time should workers be given to discover and propose new solutions versus deliberate over those currently proposed? How do we determine if discovering a new answer is appropriate at all? And how do we manage workers who lack the expertise or attention needed to provide useful input to a given task? We present a mechanism that uses distinct payoffs for three possible worker actions---propose,vote, or abstain---to provide workers with the necessary incentives to guarantee an effective (or even optimal) balance between searching for new answers, assessing those currently available, and, when they have insufficient expertise or insight for the task at hand, abstaining. We provide a novel game theoretic analysis for this mechanism and test it experimentally on an image---labeling problem and show that it allows a system to reliably control the balance betweendiscovering new answers and converging to existing ones.

preprint2013arXiv

Respondent-Driven Sampling in Online Social Networks

Respondent-driven sampling (RDS) is a commonly used method for acquiring data on hidden communities, i.e., those that lack unbiased sampling frames or face social stigmas that make their mem- bers unwilling to identify themselves. Obtaining accurate statistical data about such communities is important because, for instance, they often have different health burdens from the greater population, and without good statistics it is hard and expensive to effectively reach them for pre- vention or treatment interventions. Online social networks (OSN) have the potential to transform RDS for the better. We present a new RDS recruitment protocol for (OSNs) and show via simulation that it out- performs the standard RDS protocol in terms of sampling accuracy and approaches the accuracy of Markov chain Monte Carlo random walks.

preprint2005arXiv

The Complexity of Computing the Size of an Interval

Given a p-order A over a universe of strings (i.e., a transitive, reflexive, antisymmetric relation such that if (x, y) is an element of A then |x| is polynomially bounded by |y|), an interval size function of A returns, for each string x in the universe, the number of strings in the interval between strings b(x) and t(x) (with respect to A), where b(x) and t(x) are functions that are polynomial-time computable in the length of x. By choosing sets of interval size functions based on feasibility requirements for their underlying p-orders, we obtain new characterizations of complexity classes. We prove that the set of all interval size functions whose underlying p-orders are polynomial-time decidable is exactly #P. We show that the interval size functions for orders with polynomial-time adjacency checks are closely related to the class FPSPACE(poly). Indeed, FPSPACE(poly) is exactly the class of all nonnegative functions that are an interval size function minus a polynomial-time computable function. We study two important functions in relation to interval size functions. The function #DIV maps each natural number n to the number of nontrivial divisors of n. We show that #DIV is an interval size function of a polynomial-time decidable partial p-order with polynomial-time adjacency checks. The function #MONSAT maps each monotone boolean formula F to the number of satisfying assignments of F. We show that #MONSAT is an interval size function of a polynomial-time decidable total p-order with polynomial-time adjacency checks. Finally, we explore the related notion of cluster computation.