Researcher profile

Taha Yasseri

Taha Yasseri contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
45 works
0 followers
16 topics
4 close collaborators

Published work

45 published item(s)

preprint · 2023 · arXiv

Rapid rise and decay in petition signing

Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with rising use of social media and provides such data for the whole population of petition signatories for a given platform. This paper tracks the growth curves of all 20,000 petitions to the UK government petitions website (http://epetitions.direct.gov.uk) and 1,800 petitions to the US White House site (https://petitions.whitehouse.gov), analyzing the rate of growth and outreach mechanism. Previous research has suggested the importance of the first day to the ultimate success of a petition, but has not examined early growth within that day, made possible here through hourly resolution in the data. The analysis shows that the vast majority of petitions do not achieve any measure of success; over 99 percent fail to get the 10,000 signatures required for an official response and only 0.1 percent attain the 100,000 required for a parliamentary debate (0.7 percent in the US). We analyze the data through a multiplicative process model framework to explain the heterogeneous growth of signatures at the population level. We define and measure an average outreach factor for petitions and show that it decays very fast (reducing to 0.1 percent after 10 hours in the UK and 30 hours in the US). After a day or two, a petition's fate is virtually set. The findings challenge conventional analyses of collective action from economics and political science, where the production function has been assumed to follow an S-shaped curve.

preprint · 2022 · arXiv

Can crowdsourcing rescue the social marketplace of ideas?

Facebook and Twitter recently announced community-based review platforms to address misinformation. We provide an overview of the potential affordances of such community-based approaches to content moderation based on past research and preliminary analysis of Twitter's Birdwatch data. While our analysis generally supports a community-based approach to content moderation, it also warns against potential pitfalls, particularly when the implementation of the new infrastructure focuses on crowd-based "validation" rather than "collaboration." We call for multidisciplinary research utilizing methods from complex systems studies, behavioural sociology, and computational social science to advance the research on crowd-based content moderation.

preprint · 2022 · arXiv

Collective Memory in the Digital Age

The digital transformation of our societies and in particular information and communication technologies have revolutionized how we generate, communicate, and acquire information. Collective memory as a core and unifying force in our societies has not been an exception among many societal concepts which have been revolutionized through digital transformation. In this chapter, we have distinguished between "the digitalized collective memory" and "collective memory in the digital age". In addition to discussing these two main concepts, we discuss how digital tools and trace data can open doorways into the study of collective memory that is formed inside and outside of the digital space.

preprint · 2022 · arXiv

Football is becoming more predictable; Network analysis of 88 thousands matches in 11 major leagues

In recent years excessive monetization of football and professionalism among the players has been argued to have affected the quality of the match in different ways. On the one hand, playing football has become a high-income profession and the players are highly motivated; on the other hand, stronger teams have higher incomes and therefore afford better players leading to an even stronger appearance in tournaments that can make the game more imbalanced and hence predictable. To quantify and document this observation, in this work we take a minimalist network science approach to measure the predictability of football over 26 years in major European leagues. We show that over time, the games in major leagues have indeed become more predictable. We provide further support for this observation by showing that inequality between teams has increased and the home-field advantage has been vanishing ubiquitously. We do not include any direct analysis of the effects of monetization on football's predictability or, therefore, lack of excitement; however, we propose several hypotheses which could be tested in future analyses.

preprint · 2022 · arXiv

What drives passion? An empirical examination on the impact of personality trait interactions and job environments on work passion

Passionate employees are essential for organisational success as they foster higher performance and exhibit lower turnover or absenteeism. While a large body of research has investigated the consequences of passion, we know only little about its antecedents. Integrating trait interaction theory with trait activation theory, this paper examines how personality traits, i.e. conscientiousness, agreeableness, and neuroticism impact passion at work across different job situations. Passion has been conceptualized as a two-dimensional construct, consisting of harmonious work passion (HWP) and obsessive work passion (OWP). Our study is based on a sample of N = 824 participants from the myPersonality project. We find a positive relationship between neuroticism and OWP in enterprising environments. Further, we find a three-way interaction between conscientiousness, agreeableness, and enterprising environment in predicting OWP. Our findings imply that the impact of personality configurations on different forms of passion is contingent on the job environment. Moreover, in line with self-regulation theory, the results reveal agreeableness as a "cool influencer" and neuroticism as a "hot influencer" of the relationship between conscientiousness and work passion. We derive practical implications for organisations on how to foster work passion, particularly HWP, in organisations.

preprint · 2021 · arXiv

Credit Crunch: The Role of Household Lending Capacity in the Dutch Housing Boom and Bust 1995-2018

What causes house prices to rise and fall? Economists identify household access to credit as a crucial factor. "Loan-to-Value" and "Debt-to-GDP" ratios are the standard measures for credit access. However, these measures fail to explain the depth of the Dutch housing bust after the 2009 Financial Crisis. This work is the first to model household lending capacity based on the formulas that Dutch banks use in the mortgage application process. We compare the ability of regression models to forecast housing prices when different measures of credit access are utilised. We show that our measure of household lending capacity is a forward-looking, highly predictive variable that outperforms "Loan-to-Value" and debt ratios in forecasting the Dutch crisis. Sharp declines in lending capacity foreshadow the market deceleration.

preprint · 2021 · arXiv

Dissent and Rebellion in the House of Commons: A Social Network Analysis of Brexit-Related Divisions in the 57th Parliament

The British party system is known for its discipline and cohesion, but it remains wedged on one issue: European integration. We offer a methodology using social network analysis that considers the individual interactions of MPs in the voting process. Using public Parliamentary records, we scraped votes of individual MPs in the 57th parliament (June 2017 to April 2019), computed pairwise similarity scores and calculated rebellion metrics based on eigenvector centralities. Comparing the networks of Brexit- and non-Brexit divisions, our methodology was able to detect a significant difference in eurosceptic behaviour for the former, and using a rebellion metric we predicted how MPs would vote in a forthcoming Brexit deal with over 90% accuracy.

preprint · 2021 · arXiv

Gender Imbalance and Spatiotemporal Patterns of Contributions to Citizen Science Projects: the case of Zooniverse

Citizen Science is research undertaken by professional scientists and members of the public collaboratively. Despite numerous benefits of citizen science for both the advancement of science and the community of the citizen scientists, there is still no comprehensive knowledge of patterns of contributions, and the demography of contributors to citizen science projects. In this paper we provide a first overview of spatiotemporal and gender distribution of citizen science workforce by analyzing 54 million classifications contributed by more than 340 thousand citizen science volunteers from 198 countries to one of the largest citizen science platforms, Zooniverse. First we report on the uneven geographical distribution of the citizen scientists and model the variations among countries based on the socio-economic conditions as well as the level of research investment in each country. Analyzing the temporal features of contributions, we report on high "burstiness" of participation instances as well as the leisurely nature of participation suggested by the time of the day that the citizen scientists were the most active. Finally, we discuss the gender imbalance among citizen scientists (about 30% female) and compare it with other collaborative projects as well as the gender distribution in more formal scientific activities. Citizen science projects need further attention from outside of the academic community, and our findings can help attract the attention of public and private stakeholders, as well as to inform the design of the platforms and science policy making processes.

preprint · 2021 · arXiv

Islamophobes are not all the same! A study of far right actors on Twitter

Far-right actors are often purveyors of Islamophobic hate speech online, using social media to spread divisive and prejudiced messages which can stir up intergroup tensions and conflict. Hateful content can inflict harm on targeted victims, create a sense of fear amongst communities and stir up intergroup tensions and conflict. Accordingly, there is a pressing need to better understand at a granular level how Islamophobia manifests online and who produces it. We investigate the dynamics of Islamophobia amongst followers of a prominent UK far right political party on Twitter, the British National Party. Analysing a new data set of five million tweets, collected over a period of one year, using a machine learning classifier and latent Markov modelling, we identify seven types of Islamophobic far right actors, capturing qualitative, quantitative and temporal differences in their behaviour. Notably, we show that a small number of users are responsible for most of the Islamophobia that we observe. We then discuss the policy implications of this typology in the context of social media regulation.

preprint · 2021 · arXiv

Positive algorithmic bias cannot stop fragmentation in homophilic networks

Fragmentation, echo chambers, and their amelioration in social networks have been a growing concern in the academic and non-academic world. This paper shows how, under the assumption of homophily, echo chambers and fragmentation are system-immanent phenomena of highly flexible social networks, even under ideal conditions for heterogeneity. We achieve this by finding an analytical, network-based solution to the Schelling model and by proving that weak ties do not hinder the process. Furthermore, we derive that no level of positive algorithmic bias in the form of rewiring is capable of preventing fragmentation and its effect on reducing the fragmentation speed is negligible.

preprint · 2021 · arXiv

Selling sex: what determines rates and popularity? An analysis of 11,500 online profiles

Sex work, or the exchange of sexual services for money or goods, is ubiquitous across eras and cultures. However, the practice of selling sex is often hidden due to stigma and the varying legal status of sex work. Online platforms that sex workers use to advertise services have become an increasingly important means of studying a market that is largely hidden. Although prior literature has primarily shed light on sex work from a public health or policy perspective (focusing largely on female sex workers), there are few studies that empirically research patterns of service provision in online sex work. This study investigated the determinants of pricing and popularity in the market for commercial sexual services online by using data from the largest UK network of online sexual services, a platform that is the industry-standard for sex workers. While the size of these influences varies across genders, nationality, age and the services provided are shown to be primary drivers of rates and popularity in sex work.

preprint · 2021 · arXiv

Tweeting for the Cause: Network analysis of UK petition sharing

Online government petitions represent a new data-rich mode of political participation. This work examines the thus far understudied dynamics of sharing petitions on social media in order to garner signatures and, ultimately, a government response. Using 20 months of Twitter data comprising over 1 million tweets linking to a petition, we perform analyses of networks constructed of petitions and supporters on Twitter, revealing implicit social dynamics therein. We find that Twitter users do not exclusively share petitions on one issue nor do they share exclusively popular petitions. Among the over 240,000 Twitter users, we find latent support groups, with the most central users primarily being politically active "average" individuals. Twitter as a platform for sharing government petitions, thus, appears to hold potential to foster the creation of and coordination among a new form of latent support interest groups online.

preprint · 2020 · arXiv

Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis

Have we become more tolerant of dating people of different social backgrounds compared to ten years ago? Has the rise of online dating exacerbated or alleviated gender inequalities in modern courtship? Are the most attractive people on these platforms necessarily the most successful? In this work, we examine the mate preferences and communication patterns of male and female users of the online dating site eHarmony over the past decade to identify how attitudes and behaviors have changed over this time period. While other studies have investigated disparities in user behavior between male and female users, this study is unique in its longitudinal approach. Specifically, we analyze how men and women differ in their preferences for certain traits in potential partners and how those preferences have changed over time. The second line of inquiry investigates to what extent physical attractiveness determines the rate of messages a user receives, and how this relationship varies between men and women. Thirdly, we explore whether online dating practices between males and females have become more equal over time or if biases and inequalities have remained constant (or increased). Fourthly, we study the behavioural traits in sending and replying to messages based on one's own experience of receiving messages and being replied to. Finally, we found that similarity between profiles is not a predictor for success except for the number of children and smoking habits. This work could have broader implications for shifting gender norms and social attitudes, reflected in online courtship rituals. Apart from the data-based research, we connect the results to existing theories that concern the role of ICTs in societal change. As searching for love online becomes increasingly common across generations and geographies, these findings may shed light on how people can build relationships through the Internet.

preprint · 2019 · arXiv

Fooling with facts: Quantifying anchoring bias through a large-scale online experiment

Living in the 'Information Age' means that not only access to information has become easier but also that the distribution of information is more dynamic than ever. Through a large-scale online field experiment, we provide new empirical evidence for the presence of the anchoring bias in people's judgment due to irrational reliance on a piece of information that they are initially given. The comparison of the anchoring stimuli and respective responses across different tasks reveals a positive, yet complex relationship between the anchors and the bias in participants' predictions of the outcomes of events in the future. Participants in the treatment group were equally susceptible to the anchors regardless of their level of engagement, previous performance, or gender. Given the strong and ubiquitous influence of anchors quantified here, we should take great care to closely monitor and regulate the distribution of information online to facilitate less biased decision making.

preprint · 2019 · arXiv

What, When and Where of petitions submitted to the UK Government during a time of chaos

In times marked by political turbulence and uncertainty, as well as increasing divisiveness and hyperpartisanship, Governments need to use every tool at their disposal to understand and respond to the concerns of their citizens. We study issues raised by the UK public to the Government during 2015-2017 (surrounding the UK EU-membership referendum), mining public opinion from a dataset of 10,950 petitions (representing 30.5 million signatures). We extract the main issues with a ground-up natural language processing (NLP) method, latent Dirichlet allocation (LDA). We then investigate their temporal dynamics and geographic features. We show that whilst the popularity of some issues is stable across the two years, others are highly influenced by external events, such as the referendum in June 2016. We also study the relationship between petitions' issues and where their signatories are geographically located. We show that some issues receive support from across the whole country but others are far more local. We then identify six distinct clusters of constituencies based on the issues which constituents sign. Finally, we validate our approach by comparing the petitions' issues with the top issues reported in Ipsos MORI survey data. These results show the huge power of computationally analyzing petitions to understand not only what issues citizens are concerned about but also when and from where.

preprint · 2018 · arXiv

Emo, Love, and God: Making Sense of Urban Dictionary, a Crowd-Sourced Online Dictionary

The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the "wisdom of the crowd" has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often un-monitored environment of such projects may make them susceptible to low quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary's voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.

preprint · 2018 · arXiv

Topic Modelling of Everyday Sexism Project Entries

The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyze the content of reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports, and to map the semantic relations between those topics. The resulting picture closely resembles and adds to that arrived at through qualitative analysis, showing that this form of topic modeling could be useful for sifting through datasets that had not previously been subject to any analysis. More precisely, we come up with a map of topics for two different resolutions of our topic model and discuss the connection between the identified topics. In the low resolution picture, for instance, we found Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another.

preprint · 2017 · arXiv

Inspiration, Captivation, and Misdirection: Emergent Properties in Networks of Online Navigation

The World Wide Web (WWW) has fundamentally changed the ways billions of people are able to access information. Thus, understanding how people seek information online is an important issue of study. Wikipedia is a hugely important part of information provision on the web, with hundreds of millions of users browsing and contributing to its network of knowledge. The study of navigational behaviour on Wikipedia, due to the site's popularity and breadth of content, can reveal more general information seeking patterns that may be applied beyond Wikipedia and the Web. Our work addresses the relative shortcomings of existing literature in relating how information structure influences patterns of navigation online. We study aggregated clickstream data for articles on the English Wikipedia in the form of a weighted, directed navigational network. We introduce two parameters that describe how articles act to source and spread traffic through the network, based on their in/out strength and entropy. From these, we construct a navigational phase space where different article types occupy different, distinct regions, indicating how the structure of information online has differential effects on patterns of navigation. Finally, we go on to suggest applications for this analysis in identifying and correcting deficiencies in the Wikipedia page network that may also be adapted to more general information networks.
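As a rough illustration of the kind of quantities mentioned in this abstract, the sketch below computes in-strength, out-strength and the Shannon entropy of outgoing traffic for each node of a weighted, directed network. It is not the authors' implementation: the abstract does not give the exact definitions of the two parameters, so the entropy definition, the function name `node_stats` and the toy edge list are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def node_stats(edges):
    """Return {node: (in_strength, out_strength, out_entropy)}.

    `edges` is an iterable of (source, target, weight) tuples, e.g. click
    counts between pages. The entropy (in bits) of a node's outgoing weight
    distribution is an assumed stand-in for "how widely traffic is spread".
    """
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    out_weights = defaultdict(list)
    for source, target, weight in edges:
        out_strength[source] += weight
        in_strength[target] += weight
        out_weights[source].append(weight)

    stats = {}
    for node in set(in_strength) | set(out_strength):
        total = out_strength[node]
        if total > 0:
            # Shannon entropy of the normalised outgoing weights.
            entropy = sum(
                (w / total) * math.log2(total / w) for w in out_weights[node]
            )
        else:
            entropy = 0.0  # no outgoing traffic
        stats[node] = (in_strength[node], total, entropy)
    return stats


# Hypothetical toy clickstream: A sends equal traffic to B and C.
toy_edges = [("A", "B", 2), ("A", "C", 2), ("B", "C", 1)]
toy_stats = node_stats(toy_edges)
print(toy_stats["A"])
```

In this toy network, node A spreads its traffic evenly over two targets, so its outgoing entropy is one bit, while B has a single outgoing link and an entropy of zero.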

preprint · 2017 · arXiv

Social Complex Contagion in Music Listenership: A Natural Experiment with 1.3 Million Participants

Can live music events generate complex contagion in music streaming? This paper finds evidence in the affirmative, but only for the most popular artists. We generate a novel dataset from Last.fm, a music tracking website, to analyse the listenership history of 1.3 million users over a two-month time horizon. We use daily play counts along with event attendance data to run a regression discontinuity analysis in order to show the causal impact of concert attendance on music listenership among attendees and their friends network. First, we show that attending a music artist's live concert increases that artist's listenership among the attendees of the concert by approximately 1 song per day per attendee (p-value<0.001). Moreover, we show that this effect is contagious and can spread to users who did not attend the event. However, the extent of contagion depends on the type of artist. We only observe contagious increases in listenership for well-established, popular artists (0.06 more daily plays per friend of an attendee [p<0.001]), while the effect is absent for emerging stars. We also show that the contagion effect size increases monotonically with the number of friends who have attended the live event.

preprint · 2017 · arXiv

Understanding Human-Machine Networks: A Cross-Disciplinary Survey

In the current hyper-connected era, modern Information and Communication Technology systems form sophisticated networks where not only do people interact with other people, but also machines take an increasingly visible and participatory role. Such human-machine networks (HMNs) are embedded in the daily lives of people, both for personal and professional use. They can have a significant impact by producing synergy and innovations. The challenge in designing successful HMNs is that they cannot be developed and implemented in the same manner as networks of machine nodes alone, nor following a wholly human-centric view of the network. The problem requires an interdisciplinary approach. Here, we review current research of relevance to HMNs across many disciplines. Extending the previous theoretical concepts of socio-technical systems, actor-network theory, cyber-physical-social systems, and social machines, we concentrate on the interactions among humans and between humans and machines. We identify eight types of HMNs: public-resource computing, crowdsourcing, web search engines, crowdsensing, online markets, social media, multiplayer online games and virtual worlds, and mass collaboration. We systematically select literature on each of these types and review it with a focus on implications for designing HMNs. Moreover, we discuss risks associated with HMNs and identify emerging design and development trends.

preprint · 2016 · arXiv

A Biased Review of Biases in Twitter Studies on Political Collective Action

In recent years researchers have gravitated to social media platforms, especially Twitter, as fertile ground for empirical analysis of social phenomena. Social media provides researchers access to trace data of interactions and discourse that once went unrecorded in the offline world. Researchers have sought to use these data to explain social phenomena both particular to social media and applicable to the broader social world. This paper offers a minireview of Twitter-based research on political crowd behavior. This literature offers insight into particular social phenomena on Twitter, but often fails to use standardized methods that permit interpretation beyond individual studies. Moreover, the literature fails to ground methodologies and results in social or political theory, divorcing empirical research from the theory needed to interpret it. Rather, papers focus primarily on methodological innovations for social media analyses, but these too often fail to sufficiently demonstrate the validity of such methodologies. This minireview considers a small number of selected papers; we analyze their (often lack of) theoretical approaches, review their methodological innovations, and offer suggestions as to the relevance of their results for political scientists and sociologists.

preprint · 2016 · arXiv

Dynamics and Biases of Online Attention: The Case of Aircraft Crashes

The Internet not only has changed the dynamics of our collective attention, but also through the transactional log of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we particularly study attention to aircraft incidents and accidents using Wikipedia transactional data in two different language editions, English and Spanish. We study both the editorial activities on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by different parameters such as number of deaths, airline region, and event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with high number of deaths. Finally we show that the rate and time span of the decay of attention is independent of the number of deaths and a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.

preprint · 2016 · arXiv

Human-Machine Networks: Towards a Typology and Profiling Framework

In this paper we outline an initial typology and framework for the purpose of profiling human-machine networks, that is, collective structures where humans and machines interact to produce synergistic effects. Profiling a human-machine network along the dimensions of the typology is intended to facilitate access to relevant design knowledge and experience. In this way the profiling of an envisioned or existing human-machine network will both facilitate relevant design discussions and, more importantly, serve to identify the network type. We present experiences and results from two case trials: a crisis management system and a peer-to-peer reselling network. Based on the lessons learnt from the case trials we suggest potential benefits and challenges, and point out needed future work.

preprint · 2016 · arXiv

Memory Remains: Understanding Collective Memory in the Digital Age

Recently developed information communication technologies, particularly the Internet, have affected how we, both as individuals and as a society, create, store, and recall information. The Internet also provides us with a great opportunity to study memory using transactional large scale data, in a quantitative framework similar to the practice in statistical physics. In this project, we make use of online data by analysing viewership statistics of Wikipedia articles on aircraft crashes. We study the relation between recent events and past events and particularly focus on understanding memory triggering patterns. We devise a quantitative model that explains the flow of viewership from a current event to past events based on similarity in time, geography, topic, and the hyperlink structure of Wikipedia articles. We show that on average the secondary flow of attention to past events generated by such remembering processes is larger than the primary attention flow to the current event. We are the first to report these cascading effects.

preprint · 2016 · arXiv

P-values: misunderstood and misused

P-values are widely used in both the social and natural sciences to quantify the statistical significance of observed results. The recent surge of big data research has made the p-value an even more popular tool to test the significance of a study. However, substantial literature has been produced critiquing how p-values are used and understood. In this paper we review this recent critical literature, much of which is rooted in the life sciences, and consider its implications for social scientific research. We provide a coherent picture of what the main criticisms are, and draw together and disambiguate common themes. In particular, we explain how the False Discovery Rate is calculated, and how this differs from a p-value. We also make explicit the Bayesian nature of many recent criticisms, a dimension that is often underplayed or ignored. We conclude by identifying practical steps to help remediate some of the concerns identified. We recommend that (i) far lower significance levels are used, such as 0.01 or 0.001, and (ii) p-values are interpreted contextually, and situated within both the findings of the individual study and the broader field of inquiry (through, for example, meta-analyses).
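To make the distinction between a significance level and the False Discovery Rate concrete, here is a minimal sketch of the standard textbook calculation. The abstract itself does not state a formula, so the function name and the input values (the share of tested hypotheses that are truly non-null, the statistical power, and the significance level) are illustrative assumptions rather than figures from the paper.

```python
def false_discovery_rate(prior: float, power: float, alpha: float) -> float:
    """Expected share of 'significant' findings that are false positives.

    prior: assumed fraction of tested hypotheses with a real effect
    power: probability of detecting a real effect when it exists
    alpha: significance threshold applied to the p-value
    """
    false_positives = alpha * (1.0 - prior)  # true nulls that pass the test
    true_positives = power * prior           # real effects that pass the test
    return false_positives / (false_positives + true_positives)


# Hypothetical scenario: 10% of hypotheses are true, 80% power, alpha = 0.05.
fdr = false_discovery_rate(prior=0.1, power=0.8, alpha=0.05)
print(f"{fdr:.2f}")  # prints 0.36
```

Under these assumed inputs, roughly a third of results with p < 0.05 would be false discoveries, even though the significance level is 5 percent. Lowering alpha to 0.001 in the same scenario shrinks that share considerably, which is consistent with the direction of recommendation (i) in the abstract.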

preprint2016arXiv

Two Roads Diverged: A Semantic Network Analysis of Guanxi on Twitter

Guanxi, roughly translated as "social connection", is a term commonly used in the Chinese language. In this research, we employed a linguistic approach to explore popular discourses on Guanxi. Although sharing the same Confucian roots, Chinese communities inside and outside Mainland China have undergone different historical trajectories. Hence, we took a comparative approach to examine guanxi in Mainland China and in Taiwan, Hong Kong, and Macau (TW-HK-M). Comparing guanxi discourses in two Chinese societies aims at revealing the divergence of guanxi culture. The data for this research were collected on Twitter over a three-week period by searching tweets containing guanxi written in Simplified Chinese characters and in Traditional Chinese characters. After building, visualising, and conducting community detection on both semantic networks, two guanxi discourses were then compared in terms of their major concept sub-communities. This research aims at addressing two questions: Has the meaning of guanxi transformed in contemporary Chinese societies? And how do different socio-economic configurations affect the practice of guanxi? Results suggest that guanxi in interpersonal relationships has adapted to a new family structure in both Chinese societies. In addition, the practice of guanxi in business varies in Mainland China and in TW-HK-M. Furthermore, an extended domain was identified where guanxi is used in a macro-level discussion of state relations. Network representations of the guanxi discourses enabled reification of the concept and shed light on the understanding of social connections and social orders in contemporary China.

preprint2016arXiv

Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods

Massive Open Online Courses (MOOCs) offer unprecedented opportunities to learn at scale. Within a few years, the phenomenon of crowd-based learning has gained enormous popularity with millions of learners across the globe participating in courses ranging from Popular Music to Astrophysics. They have captured the imaginations of many, attracting significant media attention, with The New York Times naming 2012 "The Year of the MOOC." For those engaged in learning analytics and educational data mining, MOOCs have provided an exciting opportunity to develop innovative methodologies that harness big data in education.

preprint2016arXiv

Wikipedia traffic data and electoral prediction: towards theoretically informed models

The aim of this article is to explore the potential use of Wikipedia page view data for predicting electoral results. Responding to previous critiques of work using socially generated data to predict elections, which have argued that these predictions take place without any understanding of the mechanism which enables them, we first develop a theoretical model which highlights why people might seek information online at election time, and how this activity might relate to overall electoral outcomes, focussing especially on how different types of parties such as new and established parties might generate different information seeking patterns. We test this model on a novel dataset drawn from a variety of countries in the 2009 and 2014 European Parliament elections. We show that while Wikipedia offers little insight into absolute vote outcomes, it offers good information about changes in both overall turnout at elections and in vote share for particular parties. These results are used to enhance existing theories about the drivers of aggregate patterns in online information seeking.

preprint2014arXiv

Can electoral popularity be predicted using socially generated big data?

Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large-scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams, Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

preprint2014arXiv

Investigating Political Participation and Social Information Using Big Data and a Natural Experiment

Social information is particularly prominent in digital settings where the design of platforms can more easily give real-time information about the behaviour of peers and reference groups and thereby stimulate political activity. Changes to these platforms can generate natural experiments allowing an assessment of the impact of changes in social information and design on participation. This paper investigates the impact of the introduction of trending information on the homepage of the UK government petitions platform. Using interrupted time series and a regression discontinuity design, we find that the introduction of the trending feature had no statistically significant effect on the overall number of signatures per day, but that the distribution of signatures across petitions changes: the most popular petitions gain even more signatures at the expense of those with fewer signatories. We find significant differences between petitions trending at different ranks, even after controlling for each petition's individual growth prior to trending. The findings suggest a non-negligible group of individuals visit the homepage of the site looking for petitions to sign and therefore see the list of trending petitions, and a significant proportion of this group responds to the social information that it provides. These findings contribute to our understanding of how social information, and the form in which it is presented, affects individual political behaviour in digital settings.

preprint2014arXiv

Mapping the UK Webspace: Fifteen Years of British Universities on the Web

This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.

preprint2014arXiv

Modeling Social Dynamics in a Collaborative Environment

Wikipedia is a prime example of today's value production in a collaborative environment. Using this example, we model the emergence, persistence and resolution of severe conflicts during collaboration by coupling opinion formation with article editing in a bounded confidence dynamics. The complex social behavior involved in editing articles is implemented as a minimal model with two basic elements: (i) individuals interact directly to share information and convince each other, and (ii) they edit a common medium to establish their own opinions. The opinions of the editors and the opinion represented by the article are each characterised by a scalar variable. When the pool of editors is fixed, three regimes can be distinguished: (a) a stable mainstream article opinion is continuously contested by editors with extremist views and there is slow convergence towards consensus, (b) the article oscillates between editors with extremist views, reaching consensus relatively fast at one of the extremes, and (c) the extremist editors are converted very fast to the mainstream opinion and the article has an erratic evolution. When editors are renewed with a certain rate, a dynamical transition occurs between different kinds of edit wars, which qualitatively reflect the dynamics of conflicts as observed in real Wikipedia data.

preprint2014arXiv

Structural limitations of learning in a crowd: communication vulnerability and information diffusion in MOOCs

Massive Open Online Courses (MOOCs) bring together a global crowd of thousands of learners for several weeks or months. In theory, the openness and scale of MOOCs can promote iterative dialogue that facilitates group cognition and knowledge construction. Using data from two successive instances of a popular business strategy MOOC, we filter observed communication patterns to arrive at the "significant" interaction networks between learners and use complex network analysis to explore the vulnerability and information diffusion potential of the discussion forums. We find that different discussion topics and pedagogical practices promote varying levels of 1) "significant" peer-to-peer engagement, 2) participant inclusiveness in dialogue, and ultimately, 3) modularity, which impacts information diffusion to prevent a truly "global" exchange of knowledge and learning. These results indicate the structural limitations of large-scale crowd-based learning and highlight the different ways that learners in MOOCs leverage, and learn within, social contexts. We conclude by exploring how these insights may inspire new developments in online education.

preprint2013arXiv

Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of society's reaction to a new product in the sense of popularity and adoption rate. However, bridging the gap between "real time monitoring" and "early predicting" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's corresponding entry in Wikipedia, the well-known online encyclopedia.

preprint2013arXiv

Petition Growth and Success Rates on the UK No. 10 Downing Street Website

Now that so much of collective action takes place online, web-generated data can further understanding of the mechanics of Internet-based mobilisation. This trace data offers social science researchers the potential for new forms of analysis, using real-time transactional data based on entire populations, rather than sample-based surveys of what people think they did or might do. This paper uses a 'big data' approach to track the growth of over 8,000 petitions to the UK Government on the No. 10 Downing Street website for two years, analysing the rate of growth per day and testing the hypothesis that the distribution of daily change will be leptokurtic (rather than normal) as previous research on agenda setting would suggest. This hypothesis is confirmed, suggesting that Internet-based mobilisation is characterized by tipping points (or punctuated equilibria) and explaining some of the volatility in online collective action. We also find that most successful petitions grow quickly and that the number of signatures a petition receives on its first day is a significant factor in explaining the overall number of signatures a petition receives during its lifetime. These findings have implications for the strategies of those initiating petitions and the design of web sites with the aim of maximising citizen engagement with policy issues.
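To make the term "leptokurtic" concrete, the sketch below computes the excess kurtosis of a series of daily changes; a positive value indicates heavier tails and a sharper peak than a normal distribution. This is a generic moment-based estimate, not the test used in the paper, and the sample series are invented purely for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: 0 for a normal distribution,
    positive for leptokurtic (heavy-tailed, sharply peaked) data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0


# Hypothetical daily signature changes: mostly quiet days and one burst.
bursty = [0, 0, 0, 0, 0, 0, 0, 0, 0, 100]
# Hypothetical evenly alternating changes, with no extreme days.
flat = [-1, 1, -1, 1]

print(excess_kurtosis(bursty))  # positive: leptokurtic
print(excess_kurtosis(flat))    # negative: flatter than normal
```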

preprint2013arXiv

Temporal Analysis of Activity Patterns of Editors in Collaborative Mapping Project of OpenStreetMap

In recent years, wikis have become an attractive platform for social studies of human behaviour. Containing millions of records of edits across the globe, collaborative systems such as Wikipedia have allowed researchers to gain a better understanding of editors' participation and their activity patterns. However, contributions made to geo-wikis (wiki-based collaborative mapping projects) differ from systems such as Wikipedia in a fundamental way: the spatial dimension of the content limits the contributors to those who possess local knowledge about a specific area. Cross-platform studies and comparisons are therefore required to build a comprehensive image of online open collaboration phenomena. In this work, we study the temporal behavioural patterns of editors of OpenStreetMap, a successful example of a geo-wiki, for two European capital cities. We categorise different types of temporal patterns and report on the historical trend over a period of seven years of the project's history. We also draw a comparison with the previously observed editing activity patterns of Wikipedia.

preprint2013arXiv

The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics

Activity of modern scholarship creates online footprints galore. Along with traditional metrics of research quality, such as citation counts, online images of researchers and institutions increasingly matter in evaluating academic impact, decisions about grant allocation, and promotion. We examined 400 biographical Wikipedia articles on academics from four scientific fields to test if being featured in the world's largest online encyclopedia is correlated with higher academic notability (assessed through citation counts). We found no statistically significant correlation between Wikipedia article metrics (length, number of edits, number of incoming links from other articles, etc.) and academic notability of the mentioned researchers. We also did not find any evidence that the scientists with better WP representation are necessarily more prominent in their fields. In addition, we inspected the Wikipedia coverage of notable scientists sampled from the Thomson Reuters list of "highly cited researchers". In each of the examined fields, Wikipedia failed to cover notable scholars properly. Both findings imply that Wikipedia might be producing an inaccurate image of academics on the front end of science. By shedding light on how public perception of academic progress is formed, this study warns that a subjective element might have been introduced into the hitherto structured system of academic evaluation.

preprint2013arXiv

The most controversial topics in Wikipedia: A multilingual and geographical analysis

We present, visualize and analyse the similarities and differences between the controversial topics related to "edit wars" identified in 10 different language versions of Wikipedia. After a brief review of the related work we describe the methods developed to locate, measure, and categorize the controversial topics in the different languages. We present visualizations of the degree of overlap between the top 100 lists of most controversial articles in different languages and of the content related to geographical locations. We discuss what the presented analysis and visualizations can tell us about the multicultural aspects of Wikipedia and practices of peer-production. Our results indicate that Wikipedia is more than just an encyclopaedia; it is also a window into convergent and divergent social-spatial priorities, interests and preferences.

preprint2013arXiv

Value production in a collaborative environment

We review some recent endeavors and add some new results to characterize and understand underlying mechanisms in Wikipedia (WP), the paradigmatic example of collaborative value production. We analyzed the statistics of editorial activity in different languages and observed typical circadian and weekly patterns, which enabled us to estimate the geographical origins of contributions to WPs in languages spoken in several time zones. Using a recently introduced measure we showed that the editorial activities have intrinsic dependencies in the burstiness of events. A comparison of the English and Simple English WPs revealed important aspects of language complexity and showed how peer cooperation solved the task of enhancing readability. One of our focus issues was characterizing the conflicts or edit wars in WPs, which helped us to automatically filter out controversial pages. When studying the temporal evolution of the controversiality of such pages we identified typical patterns and classified conflicts accordingly. Our quantitative analysis provides the basis for modeling conflicts and their resolution in collaborative environments and contributes to the understanding of this issue, which becomes increasingly important with the development of information communication technology.

preprint2012arXiv

A practical approach to language complexity: a Wikipedia case study

In this paper we present a statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part of speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, e.g. that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.

preprint2012arXiv

Dynamics of conflicts in Wikipedia

In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases eventually leading to consensus from those cases where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by only a few editors.

preprint2012arXiv

Edit wars in Wikipedia

We present a new, efficient method for automatically detecting severe conflicts ('edit wars') in Wikipedia and evaluate this method on six different language WPs. We discuss how the number of edits and reverts, the length of discussions, and the burstiness of edits and reverts in such pages deviate from those following the general workflow, and argue that earlier work has significantly over-estimated the contentiousness of the Wikipedia editing process.

preprint2012arXiv

Opinions, Conflicts and Consensus: Modeling Social Dynamics in a Collaborative Environment

Information-communication technology promotes collaborative environments like Wikipedia where, however, controversiality and conflicts can appear. To describe the rise, persistence, and resolution of such conflicts we devise an extended opinion dynamics model where agents with different opinions perform a single task to make a consensual product. As a function of the convergence parameter describing the influence of the product on the agents, the model shows spontaneous symmetry breaking of the final consensus opinion represented by the medium. In the case when agents are replaced with new ones at a certain rate, a transition from mainly consensus to a perpetual conflict occurs, which is in qualitative agreement with the scenarios observed in Wikipedia.

preprint2011arXiv

A Monte Carlo study of surface sputtering by dual and rotated ion beams

Several recently proposed methods of surface manufacturing based on ion beam sputtering, which involve dual beam setups, sequential application of ion beams from different directions, or sample rotation, are studied with the method of kinetic Monte Carlo simulation of ion beam erosion and surface diffusion. In this work, we only consider erosion-dominated situations. The results are discussed by comparing them to a number of theoretical propositions and to experimental findings. Two ion beams aligned opposite to each other produce stationary, symmetric ripples. Two ion beams crossing at a right angle will produce square patterns only if they are exactly balanced. In all other cases of crossed beams, ripple patterns are created, and their orientations are shown to be predictable from linear continuum theory. In sequential ion beam sputtering we find a very rapid destruction of structures created from the previous beam direction after a rotation step, which leads to a transient decrease of overall roughness. Superpositions of patterns from several rotation steps are difficult to obtain, as they exist only in very short time windows. In setups with a single beam directed towards a rotating sample, we find a non-monotonic dependence of roughness on rotation frequency, with a very pronounced minimum appearing at the frequency scale set by the relaxation of prestructures observed in sequential ion beam setups. Furthermore, we find that the logarithm of the height of structures decreases in proportion to the inverse frequency.

preprint2011arXiv

Circadian patterns of Wikipedia editorial activity: A demographic analysis

Wikipedia (WP), as a collaborative, dynamical system of humans, is an appropriate subject of social studies. Every action of the members of this society, i.e. editors, is well recorded and accessible. Using the cumulative data of 34 Wikipedias in different languages, we try to characterize and find the universalities and differences in temporal activity patterns of editors. Based on this data, we estimate the geographical distribution of editors around the globe for each WP. Furthermore, we clarify the differences among different groups of WPs, which originate in the variance of cultural and social features of the communities of editors.