Researcher profile

Seth Frey

Seth Frey contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
11 works
0 followers
9 topics
4 close collaborators

Published work

11 published items

preprint, 2026, arXiv

Self-reflection in Automated Qualitative Coding: Improving Text Annotation through Secondary LLM Critique

Large language models (LLMs) allow for sophisticated qualitative coding of large datasets, but zero- and few-shot classifiers can produce an intolerable number of errors, even with careful, validated prompting. We present a simple, generalizable two-stage workflow: an LLM applies a human-designed, LLM-adapted codebook; a secondary LLM critic performs self-reflection on each positive label by re-reading the source text alongside the first model's rationale and issuing a final decision. We evaluate this approach on six qualitative codes over 3,000 high-content emails from Apache Software Foundation project evaluation discussions. Our human-derived audit of 360 positive annotations (60 passages by six codes) found that the first-line LLM had a false-positive rate of 8% to 54%, despite F1 scores of 0.74 and 1.00 in testing. Subsequent recoding of all stage-one annotations via a second self-reflection stage improved F1 by 0.04 to 0.25, bringing two especially poor performing codes up to 0.69 and 0.79 from 0.52 and 0.55 respectively. Our manual evaluation identified two recurrent error classes: misinterpretation (violations of code definitions) and meta-discussion (debate about a project evaluation criterion mistaken for its use as a decision justification). Code-specific critic clauses addressing observed failure modes were especially effective with testing and refinement, replicating the codebook-adaption process for LLM interpretation in stage-one. We explain how favoring recall in first-line LLM annotation combined with secondary critique delivers precision-first, compute-light control. With human guidance and validation, self-reflection slots into existing LLM-assisted annotation pipelines to reduce noise and potentially salvage unusable classifiers.

preprint, 2023, arXiv

Machine Translation for Accessible Multi-Language Text Analysis

English is the international standard of social research, but scholars are increasingly conscious of their responsibility to meet the need for scholarly insight into communication processes globally. This tension is as true in computational methods as any other area, with revolutionary advances in the tools for English language texts leaving most other languages far behind. In this paper, we aim to leverage those very advances to demonstrate that multi-language analysis is currently accessible to all computational scholars. We show that English-trained measures computed after translation to English have adequate-to-excellent accuracy compared to source-language measures computed on original texts. We show this for three major analytics -- sentiment analysis, topic analysis, and word embeddings -- over 16 languages, including Spanish, Chinese, Hindi, and Arabic. We validate this claim by comparing predictions on original language tweets and their backtranslations: double translations from their source language to English and back to the source language. Overall, our results suggest that Google Translate, a simple and widely accessible tool, is effective in preserving semantic content across languages and methods. Modern machine translation can thus help computational scholars make more inclusive and general claims about human communication.

preprint, 2023, arXiv

Political, economic, and governance attitudes of blockchain users

We present a survey to evaluate crypto-political, crypto-economic, and crypto-governance sentiment in people who are part of a blockchain ecosystem. Based on 3710 survey responses, we describe their beliefs, attitudes, and modes of participation in crypto and investigate how self-reported political affiliation and blockchain ecosystem affiliation are associated with these. We observed polarization in questions on perceptions of the distribution of economic power, personal attitudes towards crypto, normative beliefs about the distribution of power in governance, and external regulation of blockchain technologies. Differences in political self-identification correlated with opinions on economic fairness, gender equity, decision-making power and how to obtain favorable regulation, while blockchain affiliation correlated with opinions on governance and regulation of crypto and respondents' semantic conception of crypto and personal goals for their involvement. We also find that a theory-driven constructed political axis is supported by the data and investigate the possibility of other groupings of respondents or beliefs arising from the data.

preprint, 2022, arXiv

Deconstructing written rules and hierarchy in peer produced software communities

We employ recent advances in computational institutional analysis and NLP to investigate the systems of authority reflected in the written policy documents of the Apache Software Foundation (ASF). Our study of how the ASF model resembles or departs from conventional software companies reveals evidence of both flat and bureaucratic governance in a peer production setup, suggesting a complicated relationship between business-based theories of administrative hierarchy and foundational principles of the OSS movement.

preprint, 2022, arXiv

Governing online goods: Maturity and formalization in Minecraft, Reddit, and World of Warcraft communities

Building a successful community means governing active populations and limited resources. This challenge often requires communities to design formal governance systems from scratch. But the characteristics of successful institutional designs are unclear. Communities that are more mature and established may have more elaborate formal policy systems. Alternatively, they may require less formalization precisely because of their maturity. Indeed, scholars often downplay the role of formal rules relative to unwritten rules, norms, and values. But in a community with formal rules, decisions are more consistent, transparent, and legitimate. To understand the relationship of formal institutions to community maturity and governance style, we conduct a large-scale quantitative analysis applying institutional analysis frameworks of self-governance scholar Elinor Ostrom to 80,000 communities across 3 platforms: the sandbox game Minecraft, the MMO game World of Warcraft, and Reddit. We classify communities' written rules to test predictors of institutional formalization. From this analysis we extract two major findings. First, institutional formalization, the size and complexity of an online community's governance system, is generally positively associated with maturity, as measured by age, population size, or degree of user engagement. Second, we find that online communities employ similar governance styles across platforms, strongly favoring "weak" norms over "strong" requirements. These findings suggest that designers and founders of online communities converge on styles of governance practice that are correlated with successful self-governance. With deeper insights into the patterns of successful self-governance, we can help more communities overcome the challenges of self-governance and create for their members powerful experiences of shared meaning and collective empowerment.

preprint, 2022, arXiv

Open Source Software Sustainability: Combining Institutional Analysis and Socio-Technical Networks

Open Source Software (OSS) projects, especially successful and sustainable ones, form much of the fabric of our digital society. But many OSS projects do not become sustainable, resulting in abandonment and even risks for the world's digital infrastructure. Prior work has looked at the reasons for this mainly from two very different perspectives. In software engineering, the focus has been on understanding success and sustainability from the socio-technical perspective: the OSS programmers' day-to-day activities and the artifacts they create. In institutional analysis, on the other hand, emphasis has been on institutional designs (e.g., policies, rules, and norms) that structure governance. Even though each is necessary for a comprehensive understanding of OSS projects, the connection and interaction between the two approaches have barely been explored. In this paper, we make the first effort toward understanding OSS project sustainability using a dual-view analysis, by combining institutional analysis with socio-technical systems analysis. In particular, we (i) use linguistic approaches to extract institutional rules and norms from OSS contributors' communications to represent the evolution of their governance systems, and (ii) construct socio-technical networks based on longitudinal collaboration records to represent each project's organizational structure. We combined the two methods and applied them to a dataset of developer traces from 253 nascent OSS projects within the Apache Software Foundation (ASF) incubator. We find that the socio-technical and institutional features relate to each other, and provide complementary views into the progress of the ASF's OSS projects. Refining these combined analyses can help provide a more precise understanding of the synchronization between the evolution of institutional governance and organizational structure.

preprint, 2022, arXiv

Quantifying the selective, stochastic, and complementary drivers of the institutional evolution in online communities

Institutions and cultures usually evolve adaptively in response to current environmental incentives. But sometimes institutional change is due to stochastic drives beyond current fitness, including drift, path dependency, blind imitation, and complementary cooperation in fluctuating environments. Disentangling the selective and stochastic components of social system change enables us to identify the key features of organizational development in the long run. Evolutionary approaches provide organizational science with abundant theories to demonstrate organizational evolution by tracking particular beneficial or harmful features. We measure these different drivers empirically in institutional evolution among 20,000 Minecraft communities with the help of two of the most applied evolutionary models, the Price equation and the bet-hedging model. As a result, we find strong selection pressure on administrative rules and information rules, suggesting that their positive correlation with community fitness is the main reason for their frequency change. We also find that stochastic drives decrease the average frequency of administrative rules. The result makes sense when explained in light of evolutionary bet-hedging. We show through the bet-hedging result that institutional diversity contributes to the growth and stability of rules related to information, communication, and economic behaviors.
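
For readers unfamiliar with the Price equation named in this abstract, the sketch below shows its standard textbook form: the change in a population's mean trait splits into a selection term, Cov(w, z) / mean(w), and a transmission term, E[w * dz] / mean(w). This is only an illustration of the general identity. The function names, the toy numbers, and the reading of `z` as a rule frequency and `w` as community fitness are assumptions made here, not the authors' implementation or data.

```python
def price_decomposition(w, z, z_next):
    """Split the change in mean trait into the two Price equation terms.

    w      : fitness of each unit (hypothetical, e.g. community growth)
    z      : trait value of each unit before (e.g. rule frequency)
    z_next : trait value carried by each unit's successors
    """
    n = len(w)
    w_bar = sum(w) / n
    z_bar = sum(z) / n
    # Selection term: population covariance of fitness and trait, over mean fitness.
    cov_wz = sum((wi - w_bar) * (zi - z_bar) for wi, zi in zip(w, z)) / n
    selection = cov_wz / w_bar
    # Transmission term: fitness-weighted average change within lineages.
    transmission = sum(wi * (zn - zi) for wi, zi, zn in zip(w, z, z_next)) / n / w_bar
    return selection, transmission


def mean_trait_change(w, z, z_next):
    """Directly computed change in mean trait, used to check the identity."""
    z_bar_next = sum(wi * zn for wi, zn in zip(w, z_next)) / sum(w)
    return z_bar_next - sum(z) / len(z)


# Toy data, invented for illustration only.
w = [1.0, 2.0, 3.0]
z = [0.0, 1.0, 1.0]
z_next = [0.0, 1.0, 0.5]
selection, transmission = price_decomposition(w, z, z_next)
total = mean_trait_change(w, z, z_next)
```

In this toy case the selection term is positive and the transmission term is negative, and their sum equals the directly computed change in the mean. How the paper maps "stochastic drives" onto these terms is not stated in the abstract.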

preprint, 2021, arXiv

Effective Voice: Beyond Exit and Affect in Online Communities

Online communities provide ample opportunities for user self-expression but generally lack the means for average users to exercise direct control over community policies. This paper sets out to identify a set of strategies and techniques through which the voices of participants might be better heard through defined mechanisms for institutional governance. Drawing on Albert O. Hirschman's distinction between "exit" and "voice" in institutional life, it introduces a further distinction between two kinds of participation: effective voice, as opposed to the far more widespread practices of affective voice. Effective voice is a form of individual or collective speech that brings about a binding effect according to transparent processes. Platform developers and researchers might explore this neglected form of voice by introducing mechanisms for authority and accountability, collective action, and community evolution.

preprint, 2020, arXiv

Institutional Similarity Drives Cultural Similarity among Online Communities

Understanding online communities requires an appreciation of both structure and culture. But basic questions remain difficult to pose. How do these facets interact and drive each other? Using data on the membership and governance styles of 5,000 small-scale online communities, we construct empirical measures for cross-server similarities in institutional structure and culture to explore the influence of institutional environment on their culture, and the influence of culture on their institutional environment. To establish the influence of culture and institutions on each other, we construct networks of communities, linking those that are more similar either in their members or governance. We then use network analysis to assess the causal relationships between shared culture and institutions. Our results show that while effects in both directions are evident, institutions shape culture much more strongly than culture shapes institutions. These processes are evident within administrative and informational rules.

preprint, 2020, arXiv

Super-teams or fair leagues? Parity policies by powerful regulators don't prevent capture

Much of modern society is founded on orchestrating institutions that produce social goods by fostering motivated teams, pitting them against each other, and distributing the fruits of the arms races that ensue. However, even when the "market maker" is willing and able to maintain parity between teams, it may fail to maintain a level playing field, as some teams acquire enough advantage within the system to gain influence over it and institutionalize their advantage. Using outcomes of over 60,000 games from four professional basketball leagues and more than 100 years' worth of seasons, we compute the evolving rate of transitivity violations (A>B, B>C, but C>A) to measure the ability of leagues to maintain parity between teams, and support the efficient generation and distribution of innovation. Comparing against a baseline of randomly permuted outcomes, we find that basketball has become less competitive over time, suggesting that teams diverge in performance, and reflecting a possible failure of market makers to tame their overpowered teams. Our results suggest that rich-get-richer dynamics are so pernicious that they can even emerge under the watch of a powerful administrator that is motivated to prevent them.
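
The transitivity violation mentioned in this abstract (A>B, B>C, but C>A) can be made concrete with a short sketch. The code below counts cyclic triads among teams. It assumes each pair of teams has been reduced to a single head-to-head winner, which is a simplification chosen here. The function name and the toy results are hypothetical, and the paper's actual aggregation and permutation baseline are not reproduced.

```python
from itertools import combinations


def count_intransitive_triads(beats):
    """Count triads that form a cycle (A>B, B>C, C>A).

    beats: set of (winner, loser) pairs, assumed to hold at most one
           direction per pair of teams.
    Returns (violations, complete_triads).
    """
    teams = sorted({team for pair in beats for team in pair})
    violations = 0
    complete = 0
    for a, b, c in combinations(teams, 3):
        pairs = [(a, b), (b, c), (a, c)]
        # Skip triads where some pair has no recorded outcome.
        if not all((x, y) in beats or (y, x) in beats for x, y in pairs):
            continue
        complete += 1
        cycle_one = (a, b) in beats and (b, c) in beats and (c, a) in beats
        cycle_two = (b, a) in beats and (c, b) in beats and (a, c) in beats
        if cycle_one or cycle_two:
            violations += 1
    return violations, complete


# Toy outcomes, invented for illustration only.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
ordered = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic_result = count_intransitive_triads(cyclic)
ordered_result = count_intransitive_triads(ordered)
```

Dividing violations by complete triads gives a violation rate per season. A league with a strict pecking order would show a rate near zero, while the abstract describes comparing observed rates against randomly permuted outcomes.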

preprint, 2010, arXiv

Investigations of Attractor Behavior over the Decay of Modular RBNs

When is it safe to approximate a complicated random Boolean network (RBN) as a simplified, easier to model RBN? When can static measures of network structure be reliably used to infer the network's dynamics? This simple experiment tests the ability of disjoint modular RBNs to approximate the dynamics of progressively more interconnected RBNs, while characterizing the performance of both static and dynamic measures of modularity as both break down. We find that, at least in the small networks investigated, the Newman 2004 [1] measure of static modularity performs as well as a more complex dynamic measure of modularity, and that the progressively increasing failure of one tracks that of the other. The dynamic measure is based on the Hamming distance of attractor schemata in rewired networks from those in perfectly modular networks. This result holds for a range of p-values.