Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
20 works
0 followers
19 topics
4 close collaborators

Published work

20 published items

preprint, 2022, arXiv

Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set

Massive data from software repositories and collaboration tools are widely used to study social aspects in software development. One question that several recent works have addressed is how a software project's size and structure influence team productivity, a question famously considered in Brooks' law. Recent studies using massive repository data suggest that developers in larger teams tend to be less productive than smaller teams. Despite using similar methods and data, other studies argue for a positive linear or even super-linear relationship between team size and productivity, thus contesting the view of software economics that software projects are diseconomies of scale. In our work, we study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data. We further provide, to the best of our knowledge, the largest, curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity. Our work contributes to the ongoing discussion on the choice of productivity metrics in the operationalisation of hypotheses about determinants of successful software projects. It further highlights general pitfalls in big data analysis and shows that the use of bigger data sets does not automatically lead to more reliable insights.

preprint, 2022, arXiv

Consensus from group interactions: An adaptive voter model on hypergraphs

We study the effect of group interactions on the emergence of consensus in a spin system. Agents with discrete opinions $\{0,1\}$ form groups. They can change their opinion based on their group's influence (voter dynamics), but groups can also split and merge (adaptation). In a hypergraph, these groups are represented by hyperedges of different sizes. The heterogeneity of group sizes is controlled by a parameter $β$. To study the impact of $β$ on reaching consensus, we provide extensive computer simulations and compare them with an analytic approach for the dynamics of the average magnetization. We find that group interactions amplify small initial opinion biases, accelerate the formation of consensus and lead to a drift of the average magnetization. The conservation of the initial magnetization, known for basic voter models, is no longer obtained.

preprint, 2022, arXiv

Disentangling Active and Passive Cosponsorship in the U.S. Congress

In the U.S. Congress, legislators can use active and passive cosponsorship to support bills. We show that these two types of cosponsorship are driven by two different motivations: the backing of political colleagues and the backing of the bill's content. To this end, we develop an Encoder+RGCN based model that learns legislator representations from bill texts and speech transcripts. These representations predict active and passive cosponsorship with an F1-score of 0.88. Applying our representations to predict voting decisions, we show that they are interpretable and generalize to unseen tasks.

preprint, 2022, arXiv

Group relations, resilience and the I Ching

We evaluate the robustness and adaptivity of social groups with heterogeneous agents that are characterized by their binary state, their ability to change this state, their status and their preferred relations to other agents. To define group structures, we operationalize the hexagrams of the I Ching. The relations and properties of agents are used to quantify their influence according to social impact theory. From these influence values we derive a weighted stability measure for triads involving three agents, which is based on weighted balance theory. It allows us to quantify the robustness of groups and to propose a novel measure for group resilience which combines robustness and adaptivity. A stochastic approach determines the probabilities of finding robust and adaptive groups. The discussion focuses on the generalization of our approach.

preprint, 2022, arXiv

Network embeddedness indicates the innovation potential of firms

Firms' innovation potential depends on their position in the R&D network. But details of this relation remain unclear because measures to quantify network embeddedness have been controversially discussed. We propose and validate a new measure, coreness, obtained from the weighted k-core decomposition of the R&D network. Using data on R&D alliances, we analyse the change of coreness and the patenting activity of 14,000 firms over 25 years. A regression analysis demonstrates that coreness explains firms' R&D output by predicting future patenting.

preprint, 2022, arXiv

Reconstructing signed relations from interaction data

Positive and negative relations play an essential role in human behavior and shape the communities we live in. Despite their importance, data about signed relations is rare and commonly gathered through surveys. Interaction data is more abundant, for instance, in the form of proximity or communication data. So far, though, it could not be utilized to detect signed relations. In this paper, we show how the underlying signed relations can be extracted with such data. Employing a statistical network approach, we construct networks of signed relations in four communities. We then show that these relations correspond to the ones reported in surveys. Additionally, the inferred relations allow us to study the homophily of individuals with respect to gender, religious beliefs, and financial backgrounds. We evaluate the importance of triads in the signed network to study group cohesion.

preprint, 2022, arXiv

The role of network embeddedness on the selection of collaboration partners: An agent-based model with empirical validation

We use a data-driven agent-based model to study the core-periphery structure of two collaboration networks, R&D alliances between firms and co-authorship relations between scientists. To characterize the network embeddedness of agents, we introduce a coreness value, obtained from a weighted $k$-core decomposition. We study the change of these coreness values when collaborations with newcomers or established agents are formed. Our agent-based model is able to reproduce the empirical coreness differences of collaboration partners and to explain why we observe a change in partner selection for agents with high network embeddedness.
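
To make the notion of coreness more concrete, here is a minimal sketch of the standard, unweighted k-core decomposition by iterative peeling. It is an illustration only: the abstract refers to a weighted $k$-core decomposition whose details are not given here, and the function name `core_numbers` and the toy edge list are assumptions made for this example.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return the (unweighted) k-core number of each node.

    `edges` is an iterable of undirected (u, v) pairs. This is the textbook
    peeling procedure, not the weighted variant mentioned in the abstract.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neighbours) for node, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # core numbers never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for neighbour in adj[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Hypothetical toy network: a triangle (a, b, c) with one peripheral node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
toy_core = core_numbers(toy_edges)
```

In this toy network the triangle forms the core and `d` sits in the periphery, which is the kind of core-periphery distinction a coreness value is meant to capture.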

preprint, 2021, arXiv

Quantifying the importance of firms by means of reputation and network control

The reputation of firms is largely channeled through their ownership structure. We use this relation to determine reputation spillovers between transnational companies and their participated companies in an ownership network core of 1318 firms. We then apply concepts of network controllability to identify minimum sets of driver nodes (MDS) of 314 firms in this network. The importance of these driver nodes is classified regarding their control contribution, their operating revenue, and their reputation. The latter two are also taken as proxies for the access costs when utilizing firms as driver nodes. Using an enrichment analysis, we find that firms with high reputation maintain the controllability of the network, but rarely become top drivers, whereas firms with medium reputation most likely become top driver nodes. We further show that MDSs with lower access costs can be used to control the reputation dynamics in the whole network.
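
As a rough illustration of how a minimum set of driver nodes can be identified, the sketch below uses the maximum-matching construction commonly used in structural controllability: nodes left unmatched by a maximum matching of the directed network act as driver nodes. Whether the paper follows exactly this procedure is not stated in the abstract; the function name `driver_nodes` and the toy networks are assumptions for this example.

```python
def driver_nodes(nodes, edges):
    """Return one minimum driver-node set of a directed graph.

    Uses augmenting paths to find a maximum matching in which each node can
    be "matched" by at most one incoming edge and can match at most one
    outgoing edge. Unmatched nodes are drivers. Minimum driver sets are in
    general not unique; this returns just one of them.
    """
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)

    matched_by = {}  # target node -> source node whose edge matches it

    def try_match(u, seen):
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            # Use v if it is free, or if its current source can be re-routed.
            if v not in matched_by or try_match(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        try_match(u, set())

    unmatched = [n for n in nodes if n not in matched_by]
    # A perfectly matched network still needs one driver node.
    return unmatched or [nodes[0]]


# Hypothetical toy networks.
chain = driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")])
star = driver_nodes(["a", "b", "c"], [("a", "b"), ("a", "c")])
cycle = driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
```

A directed chain needs a single driver at its start, whereas a hub with two leaves needs two drivers because the hub can only steer one leaf independently.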

preprint, 2021, arXiv

The downside of heterogeneity: How established relations counteract systemic adaptivity in tasks assignments

We study the lock-in effect in a network of task assignments. Agents have a heterogeneous fitness for solving tasks and can redistribute unfinished tasks to other agents. They learn over time to whom to reassign tasks and preferably choose agents with higher fitness. A lock-in occurs if reassignments can no longer adapt. Agents overwhelmed with tasks then fail, leading to failure cascades. We find that the probability of lock-ins and systemic failures increases with the heterogeneity in fitness values. To study this dependence, we use the Shannon entropy of the network of task assignments. A detailed discussion links our findings to the problem of resilience and observations in social systems.
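
For readers unfamiliar with the measure, the following sketch computes the Shannon entropy of a distribution of counts. Applying it to the number of tasks reassigned to each agent is an assumption made for illustration; the abstract does not specify how the entropy of the assignment network is defined, and the function name `shannon_entropy` is hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the distribution given by `counts`."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one positive entry")
    # Zero counts contribute nothing, by the convention 0 * log(0) = 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical reassignment counts per receiving agent.
spread_out = shannon_entropy([1, 1, 1, 1])  # tasks spread evenly over agents
locked_in = shannon_entropy([4, 0, 0, 0])   # all tasks go to a single agent
```

Under this reading, low entropy corresponds to reassignments concentrated on a few agents, the situation in which a lock-in would be expected.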

preprint, 2020, arXiv

A multi-layer network approach to modelling authorship influence on citation dynamics in physics journals

We provide a general framework to model the growth of networks consisting of different coupled layers. Our aim is to estimate the impact of one such layer on the dynamics of the others. As an application, we study a scientometric network, where one layer consists of publications as nodes and citations as links, whereas the second layer represents the authors. This allows us to address the question of how characteristics of authors, such as their number of publications or number of previous co-authors, impact the citation dynamics of a new publication. To test different hypotheses about this impact, our model combines citation constituents and social constituents in different ways. We then evaluate their performance in reproducing the citation dynamics in nine different physics journals. For this, we develop a general method for statistical parameter estimation and model selection that is applicable to growing multi-layer networks. It takes both the parameter errors and the model complexity into account and is computationally efficient and scalable to large networks.

preprint, 2020, arXiv

Enhanced or distorted wisdom of crowds? An agent-based model of opinion formation under social influence

We propose an agent-based model of collective opinion formation to study the wisdom of crowds under social influence. The opinion of an agent is a continuous positive value, denoting its subjective answer to a factual question. The wisdom of crowds states that the average of all opinions is close to the truth, i.e. the correct answer. But if agents have the chance to adjust their opinion in response to the opinions of others, this effect can be destroyed. Our model investigates this scenario by evaluating two competing effects: (i) agents tend to keep their own opinion (individual conviction $β$), (ii) they tend to adjust their opinion if they have information about the opinions of others (social influence $α$). For the latter, two different regimes (full information vs. aggregated information) are compared. Our simulations show that social influence only in rare cases enhances the wisdom of crowds. Most often, we find that agents converge to a collective opinion that is even farther away from the true answer. So, under social influence the wisdom of crowds can be systematically wrong.

preprint, 2020, arXiv

Fragile, yet resilient: Adaptive decline in a collaboration network of firms

The dynamics of collaboration networks of firms follow a life-cycle of growth and decline. That does not imply they also become less resilient. Instead, declining collaboration networks may still have the ability to mitigate shocks from firms leaving, and to recover from these losses by adapting to new partners. To demonstrate this, we analyze 21,500 R&D collaborations of 14,500 firms in six different industrial sectors over 25 years. We calculate time-dependent probabilities of firms leaving the network and simulate drop-out cascades, to determine the expected dynamics of decline. We then show that deviations from these expectations result from the adaptivity of the network, which mitigates the decline. These deviations can be used as a measure of network resilience.

preprint, 2020, arXiv

HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.

preprint, 2020, arXiv

Intervention scenarios to enhance knowledge transfer in a network of firms

We investigate a multi-agent model of firms in an R&D network. Each firm is characterized by its knowledge stock $x_{i}(t)$, which follows non-linear dynamics. It can grow with the input from other firms, i.e., by knowledge transfer, and decays otherwise. Maintaining interactions is costly. Firms can leave the network if their expected knowledge growth is not realized, which may cause other firms to also leave the network. The paper discusses two bottom-up intervention scenarios to prevent, reduce, or delay cascades of firms leaving. The first one is based on the formalism of network controllability, in which driver nodes are identified and subsequently incentivized by reducing their costs. The second one combines node interventions and network interventions. It proposes the controlled removal of a single firm and the random replacement of firms leaving. This generates small cascades, which prevents the occurrence of large cascades. We find that both approaches successfully mitigate cascades and thus improve the resilience of the R&D network.

preprint, 2020, arXiv

Reproducing scientists' mobility: A data-driven model

High-skill labour is an important factor underpinning the competitive advantage of modern economies. Therefore, attracting and retaining scientists has become a major concern for migration policy. In this work, we study the migration of scientists on a global scale, by combining two large data sets covering the publications of 3.5 million scientists over 60 years. We analyse the geographical distances they moved for a new affiliation and their age when moving, thereby reconstructing their geographical "career paths". These paths are used to derive the world network of scientists' mobility between cities and to analyse its topological properties. We further develop and calibrate an agent-based model, such that it reproduces the empirical findings both at the level of scientists and of the global network. Our model takes into account that the academic hiring process is largely demand-driven and demonstrates that the probability of scientists relocating decreases both with age and with distance. Our results allow interpreting the model assumptions as micro-based decision rules that can explain the observed mobility patterns of scientists.

preprint, 2020, arXiv

The ambiguous role of social influence on the wisdom of crowds: An analytic approach

"Wisdom of crowds" refers to the phenomenon that the average opinion of a group of individuals on a given question can be very close to the true answer. It requires a large group diversity of opinions, but the collective error, the difference between the average opinion and the true value, has to be small. We consider a stochastic opinion dynamics where individuals can change their opinion based on the opinions of others (social influence $α$), but to some degree also stick to their initial opinion (individual conviction $β$). We then derive analytic expressions for the dynamics of the collective error and the group diversity. We analyze their long-term behavior to determine the impact of the two parameters $(α,β)$ and the initial opinion distribution on the wisdom of crowds. This allows us to quantify the ambiguous role of social influence: only if the initial collective error is large, it helps to improve the wisdom of crowds, but in most cases it deteriorates the outcome. In these cases, individual conviction still improves the wisdom of crowds because it mitigates the impact of social influence.
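
The two quantities named in this abstract can be illustrated with a small numerical sketch. It assumes, as is common in the wisdom-of-crowds literature, that the collective error is the squared deviation of the mean opinion from the truth and that group diversity is the variance of opinions; the paper may define them differently (for instance on transformed opinions). The function name `crowd_metrics` and the sample values are hypothetical.

```python
def crowd_metrics(opinions, truth):
    """Return (collective_error, group_diversity, mean_individual_error).

    Assumed definitions (not taken from the paper):
      collective_error      = (mean opinion - truth)^2
      group_diversity       = variance of opinions around their mean
      mean_individual_error = mean squared deviation of opinions from truth
    """
    n = len(opinions)
    mean = sum(opinions) / n
    collective_error = (mean - truth) ** 2
    group_diversity = sum((x - mean) ** 2 for x in opinions) / n
    mean_individual_error = sum((x - truth) ** 2 for x in opinions) / n
    return collective_error, group_diversity, mean_individual_error


# Hypothetical example: four estimates of a quantity whose true value is 10.
error, diversity, individual = crowd_metrics([8, 10, 12, 14], truth=10)
```

With these definitions the mean individual error equals the collective error plus the group diversity, so a crowd can be diverse and still collectively accurate as long as its mean stays close to the truth.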

preprint, 2019, arXiv

Improving the robustness of online social networks: A simulation approach of network interventions

Online social networks (OSN) are prime examples of socio-technical systems in which individuals interact via a technical platform. OSN are very volatile because users enter and exit and frequently change their interactions. This makes the robustness of such systems difficult to measure and to control. To quantify robustness, we propose a coreness value obtained from the directed interaction network. We study the emergence of large drop-out cascades of users leaving the OSN by means of an agent-based model. For agents, we define a utility function that depends on their relative reputation and their costs for interactions. The decision of agents to leave the OSN depends on this utility. Our aim is to prevent drop-out cascades by influencing specific agents with low utility. We identify strategies to control agents in the core and the periphery of the OSN such that drop-out cascades are significantly reduced, and the robustness of the OSN is increased.

preprint, 2019, arXiv

International crop trade networks: The impact of shocks and cascades

Analyzing available FAO data from 176 countries over 21 years, we observe an increase in complexity in the international trade of maize, rice, soy, and wheat. A larger number of countries play a role as producers or intermediaries, either for trade or food processing. In consequence, we find that the trade networks become more prone to failure cascades caused by exogenous shocks. In our model, countries compensate for demand deficits by imposing export restrictions. To capture these, we construct higher-order trade dependency networks for the different crops and years. These networks reveal hidden dependencies between countries and allow us to discuss policy implications.

preprint, 2019, arXiv

Quantifying Triadic Closure in Multi-Edge Social Networks

Multi-edge networks capture repeated interactions between individuals. In social networks, such edges often form closed triangles, or triads. Standard approaches to measure this triadic closure, however, fail for multi-edge networks, because they do not consider that triads can be formed by edges of different multiplicity. We propose a novel measure of triadic closure for multi-edge networks of social interactions based on a shared partner statistic. We demonstrate that our operationalization is able to detect meaningful closure in synthetic and empirical multi-edge networks, where common approaches fail. This is a cornerstone in driving inferential network analyses from the analysis of binary networks towards the analyses of multi-edge and weighted networks, which offer a more realistic representation of social interactions and relations.

preprint, 2017, arXiv

From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.