Source author record

James Bagrow

James Bagrow appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

cs.CY Machine Learning Software Engineering Applications Human-Computer Interaction Multiagent Systems nlin.AO physics.data-an physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

5works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

KANs need curvature: penalties for compositional smoothness

Kolmogorov-Arnold networks (KANs) offer a potent combination of accuracy and interpretability, thanks to their compositions of learnable univariate activation functions. However, the activations of well-fitting KANs tend to exhibit pathologically high-curvature oscillations, making them difficult to interpret, and standard regularization penalties do not prevent this. Here we derive a basis-agnostic curvature penalty and show that penalized models can maintain accuracy while achieving substantially smoother activations. Accounting for how function composition shapes curvature, we prove an upper bound on the full model's curvature relative to the curvature penalty, and use this to motivate richer forms of penalties. Scientific machine learning is increasingly bottlenecked by the trade-off between accuracy and interpretability. Results such as ours that improve interpretability without sacrificing accuracy will further strengthen KANs as a practical tool for both prediction and insight.

preprint2022arXiv

Accurate inference of crowdsourcing properties when using efficient allocation strategies

Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.

preprint2022arXiv

Hierarchical team structure and multidimensional localization (or siloing) on networks

Knowledge silos emerge when structural properties of organizational interaction networks limit the diffusion of information. These structural barriers are known to take many forms at different scales - hubs in otherwise sparse organisations, large dense teams, or global core-periphery structure - but we lack an understanding of how these different structures interact. Here we bridge the gap between the mathematical literature on localization of spreading dynamics and the more applied literature on knowledge silos in organizational interaction networks. To do so, we introduce a new model that considers a layered structure of teams to unveil a new form of hierarchical localization (i.e., the localization of information at the top or center of an organization) and study its interplay with known phenomena of mesoscopic localization (i.e., the localization of information in large groups), $k$-core localization (i.e., around denser $k$-cores) and hub localization (i.e., around high degree stars). We also include a complex contagion mechanism by considering a general infection kernel which can depend on hierarchical level (influence), degree (popularity), infectious neighbors (social reinforcement) or team size (importance). This general model allows us to study the multifaceted phenomenon of information siloing in complex organizational interaction networks and opens the door to new optimization problems to promote or hinder the emergence of different localization regimes.

preprint2022arXiv

The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists to discuss the different aspects of their projects. Multimodal tool use, with software development and communication happening on different channels, complicates the study of open source projects as a sociotechnical system. Here, we combine and standardize mailing lists of the Python community, resulting in 954,287 messages from 1995 to the present. We share all scraping and cleaning code to facilitate reproduction of this work, as well as smaller datasets for the Golang (122,721 messages), Angular (20,041 messages) and Node.js (12,514 messages) communities. To showcase the usefulness of these data, we focus on the CPython repository and merge the technical layer (which GitHub account works on what file and with whom) with the social layer (messages from unique email addresses) by identifying 33% of GitHub contributors in the mailing list data. We then explore correlations between the valence of social messaging and the structure of the collaboration network. We discuss how these data provide a laboratory to test theories from standard organizational science in large open source projects.

preprint2022arXiv

The penumbra of open source: projects outside of centralized platforms are longer maintained, more academic and more collaborative

GitHub has become the central online platform for much of open source, hosting most open source code repositories. With this popularity, the public digital traces of GitHub are now a valuable means to study teamwork and collaboration. In many ways, however, GitHub is a convenience sample, and may not be representative of open source development off the platform. Here we develop a novel, extensive sample of public open source project repositories outside of centralized platforms. We characterized these projects along a number of dimensions, and compare to a time-matched sample of corresponding GitHub projects. Our sample projects tend to have more collaborators, are maintained for longer periods, and tend to be more focused on academic and scientific problems.

James Bagrow

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

KANs need curvature: penalties for compositional smoothness

Accurate inference of crowdsourcing properties when using efficient allocation strategies

Hierarchical team structure and multidimensional localization (or siloing) on networks

The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

The penumbra of open source: projects outside of centralized platforms are longer maintained, more academic and more collaborative