Source author record

David Garcia

David Garcia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

36 works

23 topics

4 close collaborators

preprint, 2026, arXiv

Automatic Classifiers Underdetect Emotions Expressed by Men

The widespread adoption of automatic sentiment and emotion classifiers makes it important to ensure that these tools perform reliably across different populations. Yet their reliability is typically assessed using benchmarks that rely on third-party annotators rather than the individuals experiencing the emotions themselves, potentially concealing systematic biases. In this paper, we use a unique, large-scale dataset of more than one million self-annotated posts and a pre-registered research design to investigate gender biases in emotion detection across 414 combinations of models and emotion-related classes. We find that across different types of automatic classifiers and various underlying emotions, error rates are consistently higher for texts authored by men compared to those authored by women. We quantify how this bias could affect results in downstream applications and show that current machine learning tools, including large language models, should be applied with caution when the gender composition of a sample is not known or variable. Our findings demonstrate that sentiment analysis is not yet a solved problem, especially in ensuring equitable model behaviour across demographic groups.

preprint, 2026, arXiv

Conformity and Social Impact on AI Agents

As AI agents increasingly operate in multi-agent environments, understanding their collective behavior becomes critical for predicting the dynamics of artificial societies. This study examines conformity, the tendency to align with group opinions under social pressure, in large multimodal language models functioning as AI agents. By adapting classic visual experiments from social psychology, we investigate how AI agents respond to group influence as social actors. Our experiments reveal that AI agents exhibit a systematic conformity bias, aligned with Social Impact Theory, showing sensitivity to group size, unanimity, task difficulty, and source characteristics. Critically, AI agents achieving near-perfect performance in isolation become highly susceptible to manipulation through social influence. This vulnerability persists across model scales: while larger models show reduced conformity on simple tasks due to improved capabilities, they remain vulnerable when operating at their competence boundary. These findings reveal fundamental security vulnerabilities in AI agent decision-making that could enable malicious manipulation, misinformation campaigns, and bias propagation in multi-agent systems, highlighting the urgent need for safeguards in collective AI deployments.

preprint, 2026, arXiv

Conformity Generates Collective Misalignment in AI Agents Societies

Artificial intelligence safety research focuses on aligning individual language models with human values, yet deployed AI systems increasingly operate as interacting populations where social influence may override individual alignment. Here we show that populations of individually aligned AI agents can be driven into stable misaligned states through conformity dynamics. Simulating opinion dynamics across nine large language models and one hundred opinion pairs, we find that each agent's behavior is governed by two competing forces: a tendency to follow the majority and an intrinsic bias toward specific positions. Using tools from statistical physics, we derive a quantitative theory that predicts when populations become trapped in long-lived misaligned configurations, and identifies predictable tipping points where small numbers of adversarial agents can irreversibly shift population-level alignment even after manipulation ceases. These results demonstrate that individual-level alignment provides no guarantee of collective safety, calling for evaluation frameworks that account for emergent behavior in AI populations.
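The two competing forces named in the abstract above (following the majority versus an intrinsic bias toward a position) can be illustrated with a toy response function. This is only a sketch under stated assumptions: the logistic form and the parameter names `conformity` and `bias` are hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def prob_adopt_a(majority_share: float, conformity: float, bias: float) -> float:
    """Probability that an agent adopts opinion A (hypothetical logistic form).

    majority_share: fraction of the population currently holding A (0..1).
    conformity:     assumed strength of the pull toward the majority.
    bias:           assumed intrinsic preference for A (positive) or B (negative).
    """
    if not 0.0 <= majority_share <= 1.0:
        raise ValueError("majority_share must lie in [0, 1]")
    # Social field is zero at an even split and grows with the majority's size.
    field = conformity * (2.0 * majority_share - 1.0) + bias
    return 1.0 / (1.0 + math.exp(-field))
```

With an even split and no bias the function returns 0.5; a positive `bias` tilts the agent toward A, and a sufficiently large opposing majority can outweigh that tilt.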

preprint, 2022, arXiv

Assessment of the effectiveness of Omicron transmission mitigation strategies for European universities using an agent-based network model

Returning universities to full on-campus operations while the COVID-19 pandemic is ongoing has been a controversial discussion in many countries. The risk of large outbreaks in dense course settings is contrasted by the benefits of in-person teaching. Transmission risk depends on a range of parameters, such as vaccination coverage and efficacy, number of contacts and adoption of non-pharmaceutical intervention measures (NPIs). Due to the generalised academic freedom in Europe, many universities are asked to autonomously decide on and implement intervention measures and regulate on-campus operations. In the context of rapidly changing vaccination coverage and parameters of the virus, universities often lack sufficient scientific insight to base these decisions on. To address this problem, we analyse a calibrated, data-driven agent-based simulation of transmission dynamics of 10755 students and 974 faculty members in a medium-sized European university. We use a co-location network reconstructed from student enrollment data and calibrate transmission risk based on outbreak size distributions in education institutions. We focus on actionable interventions that are part of the already existing decision-making process of universities to provide guidance for concrete policy decisions. Here we show that, with the Omicron variant of the SARS-CoV-2 virus, even a reduction to 25% occupancy and universal mask mandates are not enough to prevent large outbreaks given the vaccination coverage of about 80% recently reported for students in Austria. Our results show that controlling the spread of the virus with available vaccines in combination with NPIs is not feasible in the university setting if presence of students and faculty on campus is required.

preprint, 2022, arXiv

Detecting potentially harmful and protective suicide-related content on twitter: A machine learning approach

Research shows that exposure to suicide-related news media content is associated with suicide rates, with some content characteristics likely having harmful and others potentially protective effects. Although good evidence exists for a few selected characteristics, systematic large scale investigations are missing in general, and in particular for social media data. We apply machine learning methods to classify large quantities of Twitter data according to a novel annotation scheme that distinguishes 12 categories of suicide-related tweets. We then trained a benchmark of machine learning models including a majority classifier, an approach based on word frequency (TF-IDF with a linear SVM) and two state-of-the-art deep learning models (BERT, XLNet). The two deep learning models achieved the best performance in two classification tasks: In the first task, we classified six main content categories, including personal stories about either suicidal ideation and attempts or coping, calls for action intending to spread either problem awareness or prevention-related information, reporting of suicide cases, and other tweets irrelevant to these categories. The deep learning models reached accuracy scores above 73% on average across the six categories, and F1-scores between 0.70 and 0.85 for all but the suicidal ideation and attempts category (0.51-0.55). In the second task, separating tweets referring to actual suicide from off-topic tweets, they correctly labeled around 88% of tweets, with BERT achieving F1-scores of 0.93 and 0.74 for the two categories, respectively. These classification performances are comparable to the state-of-the-art on similar tasks. By making data labeling more efficient, this work has enabled large-scale investigations on harmful and protective associations of social media content with suicide rates and help-seeking behavior.

preprint, 2022, arXiv

Social media sharing by political elites: An asymmetric American exceptionalism

Increased sharing of untrustworthy information on social media platforms is one of the main challenges of our modern information society. Because information disseminated by political elites is known to shape citizen and media discourse, it is particularly important to examine the quality of information shared by politicians. Here we show that from 2016 onward, members of the Republican party in the U.S. Congress have been increasingly sharing links to untrustworthy sources. The proportion of untrustworthy information posted by Republicans versus Democrats is diverging at an accelerating rate, and this divergence has worsened since President Biden was elected. This divergence between parties seems to be unique to the U.S. as it cannot be observed in other western democracies such as Germany and the United Kingdom, where left-right disparities are smaller and have remained largely constant.

preprint, 2022, arXiv

Validating daily social media macroscopes of emotions

To study emotions at the macroscopic level, affective scientists have made extensive use of sentiment analysis on social media text. However, this approach can suffer from a series of methodological issues with respect to sampling biases and measurement error. To date, it has not been validated if social media sentiment can measure the day to day temporal dynamics of emotions aggregated at the macro level of a whole online community. We ran a large-scale survey at an online newspaper to gather daily self-reports of affective states from its users and compare these with aggregated results of sentiment analysis of user discussions on the same online platform. Additionally, we preregistered a replication of our study using Twitter text as a macroscope of emotions for the same community. For both platforms, we find strong correlations between text analysis results and levels of self-reported emotions, as well as between inter-day changes of both measurements. We further show that a combination of supervised and unsupervised text analysis methods is the most accurate approach to measure emotion aggregates. We illustrate the application of such social media macroscopes when studying the association between the number of new COVID-19 cases and emotions, showing that the strength of associations is comparable when using survey data as when using social media data. Our findings indicate that macro level dynamics of affective states of users of an online platform can be tracked with social media text, complementing surveys when self-reported data is not available or difficult to gather.

preprint, 2020, arXiv

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking. In this paper, we develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpasses a conventional model in both quality and latency. On the quality side, we incorporate a large number of utterances across varied domains to increase acoustic diversity and the vocabulary seen by the model. We also train with accented English speech to make the model more robust to different pronunciations. In addition, given the increased amount of training data, we explore a varied learning rate schedule. On the latency front, we explore using the end-of-sentence decision emitted by the RNN-T model to close the microphone, and also introduce various optimizations to improve the speed of LAS rescoring. Overall, we find that RNN-T+LAS offers a better WER and latency tradeoff compared to a conventional model. For example, for the same latency, RNN-T+LAS obtains an 8% relative improvement in WER, while being more than 400 times smaller in model size.
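For readers unfamiliar with the term, a relative improvement in WER is the reduction in word error rate expressed as a fraction of the baseline WER. The sketch below shows the arithmetic; the function name and the example numbers are hypothetical and are not values reported in the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative numbers only (assumed, not from the paper):
# a baseline WER of 10.0% and a new WER of 9.2% give roughly an 8% relative gain.
example = relative_wer_improvement(baseline_wer=10.0, new_wer=9.2)
```

Note that this is different from the absolute improvement, which in the illustrative case would be 0.8 percentage points.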

preprint, 2020, arXiv

Dashboard of sentiment in Austrian social media during COVID-19

To track online emotional expressions of the Austrian population close to real-time during the COVID-19 pandemic, we build a self-updating monitor of emotion dynamics using digital traces from three different data sources. This enables decision makers and the interested public to assess issues such as the attitude towards counter-measures taken during the pandemic and the possible emergence of a (mental) health crisis early on. We use web scraping and API access to retrieve data from the news platform derstandard.at, Twitter and a chat platform for students. We document the technical details of our workflow in order to provide materials for other researchers interested in building a similar tool for different contexts. Automated text analysis allows us to highlight changes of language use during COVID-19 in comparison to a neutral baseline. We use special word clouds to visualize that overall difference. Longitudinally, our time series show spikes in anxiety that can be linked to several events and media reporting. Additionally, we find a marked decrease in anger. The changes last for remarkably long periods of time (up to 12 weeks). We discuss these and more patterns and connect them to the emergence of collective emotions. The interactive dashboard showcasing our data is available online under http://www.mpellert.at/covid19_monitor_austria/. Our work has attracted media attention and is part of a web archive of resources on COVID-19 collected by the Austrian National Library.

preprint, 2020, arXiv

Fragile, yet resilient: Adaptive decline in a collaboration network of firms

The dynamics of collaboration networks of firms follow a life-cycle of growth and decline. That does not imply they also become less resilient. Instead, declining collaboration networks may still have the ability to mitigate shocks from firms leaving, and to recover from these losses by adapting to new partners. To demonstrate this, we analyze 21,500 R&D collaborations of 14,500 firms in six different industrial sectors over 25 years. We calculate time-dependent probabilities of firms leaving the network and simulate drop-out cascades, to determine the expected dynamics of decline. We then show that deviations from these expectations result from the adaptivity of the network, which mitigates the decline. These deviations can be used as a measure of network resilience.

preprint, 2020, arXiv

Improving accuracy and speeding up Document Image Classification through parallel systems

This paper presents a study showing the benefits of the EfficientNet models compared with heavier Convolutional Neural Networks (CNNs) in the Document Classification task, an essential problem in the digitalization process of institutions. We show in the RVL-CDIP dataset that we can improve previous results with a much lighter model and present its transfer learning capabilities on a smaller in-domain dataset such as Tobacco3482. Moreover, we present an ensemble pipeline which is able to boost solely image input by combining image model predictions with the ones generated by a BERT model on text extracted by OCR. We also show that the batch size can be effectively increased without hindering its accuracy so that the training process can be sped up by parallelizing throughout multiple GPUs, decreasing the computational time needed. Lastly, we expose the training performance differences between PyTorch and TensorFlow Deep Learning frameworks.

preprint, 2016, arXiv

Emotions, Demographics and Sociability in Twitter Interactions

The social connections people form online affect the quality of information they receive and their online experience. Although a host of socioeconomic and cognitive factors were implicated in the formation of offline social ties, few of them have been empirically validated, particularly in an online setting. In this study, we analyze a large corpus of geo-referenced messages, or tweets, posted by social media users from a major US metropolitan area. We linked these tweets to US Census data through their locations. This allowed us to measure emotions expressed in the tweets posted from an area, the structure of social connections, and also use that area's socioeconomic characteristics in analysis. We find that at an aggregate level, places where social media users engage more deeply with less diverse social contacts are those where they express more negative emotions, like sadness and anger. Demographics also has an impact: these places have residents with lower household income and education levels. Conversely, places where people engage less frequently but with diverse contacts have happier, more positive messages posted from them and also have better educated, younger, more affluent residents. Results suggest that cognitive factors and offline characteristics affect the quality of online interactions. Our work highlights the value of linking social media data to traditional data sources, such as US Census, to drive novel analysis of online behavior.

preprint, 2016, arXiv

The Dynamics of Emotions in Online Interaction

We study the changes in emotional states induced by reading and participating in online discussions, empirically testing a computational model of online emotional interaction. Using principles of dynamical systems, we quantify changes in valence and arousal through subjective reports, as recorded in three independent studies including 207 participants (110 female). In the context of online discussions, the dynamics of valence and arousal are composed of two forces: an internal relaxation towards baseline values independent of the emotional charge of the discussion, and a driving force of emotional states that depends on the content of the discussion. The dynamics of valence show the existence of positive and negative tendencies, while arousal increases when reading emotional content regardless of its polarity. The tendency of participants to take part in the discussion increases with positive arousal. When participating in an online discussion, the content of participants' expression depends on their valence, and their arousal significantly decreases afterwards as a regulation mechanism. We illustrate how these results allow the design of agent-based models to reproduce and analyze emotions in online communities. Our work empirically validates the microdynamics of a model of online collective emotions, bridging online data analysis with research in the laboratory.
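The two forces described in the abstract above (an internal relaxation towards a baseline and a content-dependent driving force) can be sketched as a simple discrete-time update. This is a minimal illustration under assumptions: the linear relaxation term, the Euler step, and the names `gamma`, `baseline`, and `drive` are hypothetical and do not reproduce the paper's fitted model.

```python
from typing import Callable, List


def simulate_valence(
    v0: float,
    baseline: float,
    gamma: float,
    drive: Callable[[int], float],
    steps: int,
    dt: float = 1.0,
) -> List[float]:
    """Euler integration of dv/dt = -gamma * (v - baseline) + drive(t).

    gamma is an assumed relaxation rate; drive(t) is an assumed external
    forcing that stands in for the emotional content read at step t.
    """
    trajectory = [v0]
    v = v0
    for t in range(steps):
        v = v + dt * (-gamma * (v - baseline) + drive(t))
        trajectory.append(v)
    return trajectory


# Without any driving content, valence decays back towards the baseline.
relaxing = simulate_valence(v0=1.0, baseline=0.0, gamma=0.5, drive=lambda t: 0.0, steps=3)
```

With a constant drive `d`, this toy system settles at `baseline + d / gamma`, which gives one way to picture how sustained emotional content could shift the resting state.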

preprint, 2016, arXiv

The QWERTY effect on the web: How typing shapes the meaning of words in online human-computer interaction

The QWERTY effect postulates that the keyboard layout influences word meanings by linking positivity to the use of the right hand and negativity to the use of the left hand. For example, previous research has established that words with more right hand letters are rated more positively than words with more left hand letters by human subjects in small scale experiments. In this paper, we perform large scale investigations of the QWERTY effect on the web. Using data from eleven web platforms related to products, movies, books, and videos, we conduct observational tests whether a hand-meaning relationship can be found in decoding text on the web. Furthermore, we investigate whether encoding text on the web exhibits the QWERTY effect as well, by analyzing the relationship between the text of online reviews and their star ratings in four additional datasets. Overall, we find robust evidence for the QWERTY effect both at the point of text interpretation (decoding) and at the point of text creation (encoding). We also find under which conditions the effect might not hold. Our findings have implications for any algorithmic method aiming to evaluate the meaning of words on the web, including for example semantic or sentiment analysis, and show the existence of "dactilar onomatopoeias" that shape the dynamics of word-meaning associations. To the best of our knowledge, this is the first work to reveal the extent to which the QWERTY effect exists in large scale human-computer interaction on the web.

preprint, 2016, arXiv

When the Filter Bubble Bursts: Collective Evaluation Dynamics in Online Communities

We analyze online collective evaluation processes through positive and negative votes in various social media. We find two modes of collective evaluations that stem from the existence of filter bubbles. Above a threshold of collective attention, negativity grows faster with positivity, as a sign of the burst of a filter bubble when information reaches beyond the local social context of a user. We analyze how collectively evaluated content can reach large social contexts and create polarization, showing that emotions expressed through text play a key role in collective evaluation processes.

preprint, 2016, arXiv

Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership

We combine user-centric Twitter data with video-centric YouTube data to analyze who watches and shares what on YouTube. Combination of two data sets, with 87k Twitter users, 5.6 million YouTube videos and 15 million video sharing events, allows rich analysis going beyond what could be obtained with either of the two data sets individually. For Twitter, we generate user features relating to activity, interests and demographics. For YouTube, we obtain video features for topic, popularity and polarization. These two feature sets are combined through sharing events for YouTube URLs on Twitter. This combination is done in a user-, a video- and a sharing-event-centric manner. For the user-centric analysis, we show how Twitter user features correlate both with YouTube features and with sharing-related features. As two examples, we show urban users are quicker to share than rural users and for some notions of "influence" influential users on Twitter share videos with a higher number of views. For the video-centric analysis, we find a superlinear relation between initial Twitter shares and the final amounts of views, showing the correlated behavior of Twitter. On user impact, we find the total amount of followers of users that shared the video in the first week does not affect its final popularity. However, aggregated user retweet rates serve as a better predictor for YouTube video popularity. For the sharing-centric analysis, we reveal existence of correlated behavior concerning the time between video creation and sharing within certain timescales, showing the time onset for a coherent response, and the time limit after which collective responses are extremely unlikely. We show that response times depend on video category, revealing that Twitter sharing of a video is highly dependent on its content. To the best of our knowledge this is the first large-scale study combining YouTube and Twitter data.

preprint, 2016, arXiv

Women Through the Glass Ceiling: Gender Asymmetries in Wikipedia

Contributing to the writing of history has never been as easy as it is today thanks to Wikipedia, a community-created encyclopedia that aims to document the world's knowledge from a neutral point of view. Though everyone can participate it is well known that the editor community has a narrow diversity, with a majority of white male editors. While this participatory gender gap has been studied extensively in the literature, this work sets out to assess potential gender inequalities in Wikipedia articles along different dimensions: notability, topical focus, linguistic bias, structural properties, and meta-data presentation. We find that (i) women in Wikipedia are more notable than men, which we interpret as the outcome of a subtle glass ceiling effect; (ii) family-, gender-, and relationship-related topics are more present in biographies about women; (iii) linguistic bias manifests in Wikipedia since abstract terms tend to be used to describe positive aspects in the biographies of men and negative aspects in the biographies of women; and (iv) there are structural differences in terms of meta-data and hyperlinks, which have consequences for information-seeking activities. While some differences are expected, due to historical and social contexts, other differences are attributable to Wikipedia editors. The implications of such differences are discussed having Wikipedia contribution policies in mind. We hope that the present work will contribute to increased awareness about, first, gender issues in the content of Wikipedia, and second, the different levels on which gender biases can manifest on the Web.

preprint, 2015, arXiv

Geography of Emotion: Where in a City are People Happier?

Location-sharing services were built upon people's desire to share their activities and locations with others. By "checking-in" to a place, such as a restaurant, a park, gym, or train station, people disclose where they are, thereby providing valuable information about land use and utilization of services in urban areas. This information may, in turn, be used to design smarter, happier, more equitable cities. We use data from Foursquare location-sharing service to identify areas within a major US metropolitan area with many check-ins, i.e., areas that people like to use. We then use data from the Twitter microblogging platform to analyze the properties of these areas. Specifically, we have extracted a large corpus of geo-tagged messages, called tweets, from a major metropolitan area and linked them to US Census data through their locations. This allows us to measure the sentiment expressed in tweets that are posted from a specific area, and also use that area's demographic properties in analysis. Our results reveal that areas with many check-ins are different from other areas within the metropolitan region. In particular, these areas have happier tweets, which also encourage people from other areas to commute longer distances to these places. These findings shed light on human mobility patterns, as well as how physical environment influences human emotions.

preprint, 2015, arXiv

Ideological and Temporal Components of Network Polarization in Online Political Participatory Media

Political polarization is traditionally analyzed through the ideological stances of groups and parties, but it also has a behavioral component that manifests in the interactions between individuals. We present an empirical analysis of the digital traces of politicians in politnetz.ch, a Swiss online platform focused on political activity, in which politicians interact by creating support links, comments, and likes. We analyze network polarization as the level of intra-party cohesion with respect to inter-party connectivity, finding that supports show a very strongly polarized structure with respect to party alignment. The analysis of this multiplex network shows that each layer of interaction contains relevant information, where comment groups follow topics related to Swiss politics. Our analysis reveals that polarization in the layer of likes evolves in time, increasing close to the federal elections of 2011. Furthermore, we analyze the internal social network of each party through metrics related to hierarchical structures, information efficiency, and social resilience. Our results suggest that the online social structure of a party is related to its ideology, and reveal that the degree of connectivity across two parties increases when they are close in the ideological space of a multi-party system.
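The idea of comparing intra-party cohesion with inter-party connectivity can be illustrated with a very simple edge-counting index. This is a sketch under assumptions: the index below (the share of within-party links minus the share of cross-party links) is a simplified stand-in chosen for illustration and is not claimed to be the measure used in the paper; the node and party names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def polarization_index(
    edges: Iterable[Tuple[Hashable, Hashable]],
    party: Dict[Hashable, str],
) -> float:
    """Return (internal - external) / total over all links.

    +1 means every link stays within a party, -1 means every link
    crosses party lines, and 0 means an even mix.
    """
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("at least one edge is required")
    internal = sum(1 for u, v in edge_list if party[u] == party[v])
    external = len(edge_list) - internal
    return (internal - external) / len(edge_list)


# Hypothetical toy network: two parties, two internal links, one cross link.
toy_party = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
toy_edges = [("a", "b"), ("c", "d"), ("a", "c")]
toy_score = polarization_index(toy_edges, toy_party)
```

An index like this ignores party sizes and degree heterogeneity, so a real analysis would normally compare it against a null model.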

preprint, 2015, arXiv

It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia

Wikipedia is a community-created encyclopedia that contains information about notable people from different countries, epochs and disciplines and aims to document the world's knowledge from a neutral point of view. However, the narrow diversity of the Wikipedia editor community has the potential to introduce systemic biases such as gender biases into the content of Wikipedia. In this paper we aim to tackle a subproblem of this larger challenge by presenting and applying a computational method for assessing gender bias on Wikipedia along multiple dimensions. We find that while women on Wikipedia are covered and featured well in many Wikipedia language editions, the way women are portrayed starkly differs from the way men are portrayed. We hope our work contributes to increasing awareness about gender biases online, and in particular to raising attention to the different levels in which gender biases can manifest themselves on the web.

preprint, 2015, arXiv

Sentiment cascades in the 15M movement

Recent grassroots movements have suggested that online social networks might play a key role in their organization, as adherents have a fast, many-to-many, communication channel to help coordinate their mobilization. The structure and dynamics of the networks constructed from the digital traces of protesters have been analyzed to some extent recently. However, less effort has been devoted to the analysis of the semantic content of messages exchanged during the protest. Using the data obtained from a microblogging service during the brewing and active phases of the 15M movement in Spain, we perform the first large scale test of theories on collective emotions and social interaction in collective actions. Our findings show that activity and information cascades in the movement are larger in the presence of negative collective emotions and when users express themselves in terms related to social content. At the level of individual participants, our results show that their social integration in the movement, as measured through social network metrics, increases with their level of engagement and of expression of negativity. Our findings show that non-rational factors play a role in the formation and activity of social movements through online media, having important consequences for viral spreading.

preprint, 2015, arXiv

Social signals and algorithmic trading of Bitcoin

The availability of data on digital traces is growing to unprecedented sizes, but inferring actionable knowledge from large-scale data is far from being trivial. This is especially important for computational finance, where digital traces of human behavior offer a great potential to drive trading strategies. We contribute to this by providing a consistent approach that integrates various data sources in the design of algorithmic traders. This allows us to derive insights into the principles behind the profitability of our trading strategies. We illustrate our approach through the analysis of Bitcoin, a cryptocurrency known for its large price fluctuations. In our analysis, we include economic signals of volume and price of exchange for USD, adoption of the Bitcoin technology, and transaction volume of Bitcoin. We add social signals related to information search, word of mouth volume, emotional valence, and opinion polarization as expressed in tweets related to Bitcoin for more than 3 years. Our analysis reveals that increases in opinion polarization and exchange volume precede rising Bitcoin prices, and that emotional valence precedes opinion polarization and rising exchange volumes. We apply these insights to design algorithmic trading strategies for Bitcoin, reaching very high profits in less than a year. We verify this high profitability with robust statistical methods that take into account risk and trading costs, confirming the long-standing hypothesis that trading based on social media sentiment has the potential to yield positive returns on investment.

preprint2014arXiv

Gender Asymmetries in Reality and Fiction: The Bechdel Test of Social Media

The subjective nature of gender inequality motivates the analysis and comparison of data from real and fictional human interaction. We present a computational extension of the Bechdel test: a popular tool to assess whether a movie contains a male gender bias, by looking for two female characters who discuss something besides a man. We provide the tools to quantify Bechdel scores for both genders, and we measure them in movie scripts and large datasets of dialogues between users of MySpace and Twitter. Comparing movies and users of social media, we find that movies and Twitter conversations have a consistent male bias, which does not appear when analyzing MySpace. Furthermore, the narrative of Twitter is closer to the movies that do not pass the Bechdel test than to those that pass it. We link the properties of movies and the users that share trailers of those movies. Our analysis reveals some particularities of movies that pass the Bechdel test: their trailers are less popular, female users are more likely to share them than male users, and users that share them tend to interact less with male users. Based on our datasets, we define gender independence measurements to analyze the gender biases of a society, as manifested through digital traces of online behavior. Using the profile information of Twitter users, we find larger gender independence for urban users in comparison to rural ones. Additionally, the asymmetry between genders is larger for parents and lower for students. Gender asymmetry varies across US states, increasing with higher average income and latitude. This points to the relation between gender inequality and social, economic, and cultural factors of a society, and how gender roles exist in both fictional narratives and public online dialogues.
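
The check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dialogue format `(speaker, listener, text)`, the tiny word lists, the gender labels `"F"`/`"M"` and the function name `passes_bechdel` are all assumptions made for illustration. The `target` argument mirrors the idea of computing the test for both genders.

```python
import re

# Hypothetical, deliberately tiny word lists; a real study would need far richer resources.
MALE_WORDS = {"he", "him", "his", "man", "men", "boyfriend", "husband"}
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "girlfriend", "wife"}


def passes_bechdel(dialogues, gender, target="F"):
    """Return True if two `target`-gender characters talk without referring to the other gender.

    dialogues: iterable of (speaker, listener, text) tuples (assumed format).
    gender: dict mapping character name to "F" or "M" (assumed labels).
    """
    other_gender = "M" if target == "F" else "F"
    other_words = MALE_WORDS if target == "F" else FEMALE_WORDS
    # Names of characters of the other gender also count as references to them.
    other_names = {name.lower() for name, g in gender.items() if g == other_gender}
    for speaker, listener, text in dialogues:
        if gender.get(speaker) != target or gender.get(listener) != target:
            continue  # both participants must be of the target gender
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if not tokens & (other_words | other_names):
            return True  # found one qualifying dialogue
    return False


# Toy data, invented for this example only.
gender = {"Ann": "F", "Bea": "F", "Carl": "M", "Dan": "M"}
dialogues = [
    ("Ann", "Bea", "Did you see what he said?"),
    ("Ann", "Carl", "Nice weather today"),
    ("Carl", "Dan", "Ann is late again"),
    ("Bea", "Ann", "The experiment finally worked"),
]
print(passes_bechdel(dialogues, gender, target="F"))
print(passes_bechdel(dialogues, gender, target="M"))
```

In this toy data the female version passes because of the last dialogue, while the male version fails because the only male-to-male exchange mentions a female character by name.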

preprint2014arXiv

Online Privacy as a Collective Phenomenon

The problem of online privacy is often reduced to individual decisions to hide or reveal personal information in online social networks (OSNs). However, with the increasing use of OSNs, it becomes more important to understand the role of the social network in disclosing personal information that a user has not revealed voluntarily: How much of our private information do our friends disclose about us, and how much of our privacy is lost simply because of online social interaction? Without strong technical effort, an OSN may be able to exploit the assortativity of human private features, thereby constructing shadow profiles with information that users chose not to share. Furthermore, because many users share their phone and email contact lists, this allows an OSN to create full shadow profiles for people who do not even have an account on this OSN. We empirically test the feasibility of constructing shadow profiles of sexual orientation for users and non-users, using data from more than 3 million accounts of a single OSN. We quantify a lower bound for the predictive power derived from the social network of a user, to demonstrate how the predictability of sexual orientation increases with the size of this network and the tendency to share personal information. This allows us to define a privacy leak factor that links individual privacy loss with the decision of other individuals to disclose information. Our statistical analysis reveals that some individuals are at a higher risk of privacy loss, as prediction accuracy increases for users with a larger and more homogeneous first- and second-order neighborhood of their social network. While we do not provide evidence that shadow profiles exist at all, our results show that the disclosure of private information is not restricted to an individual choice, but becomes a collective decision that has implications for policy and privacy regulation.

preprint2014arXiv

The digital traces of bubbles: feedback cycles between socio-economic signals in the Bitcoin economy

What is the role of social interactions in the creation of price bubbles? Answering this question requires obtaining collective behavioural traces generated by the activity of a large number of actors. Digital currencies offer a unique possibility to measure socio-economic signals from such digital traces. Here, we focus on Bitcoin, the most popular cryptocurrency. Bitcoin has experienced periods of rapid increase in exchange rates (price) followed by sharp decline; we hypothesise that these fluctuations are largely driven by the interplay between different social phenomena. We thus quantify four socio-economic signals about Bitcoin from large data sets: price on on-line exchanges, volume of word-of-mouth communication in on-line social media, volume of information search, and user base growth. By using vector autoregression, we identify two positive feedback loops that lead to price bubbles in the absence of exogenous stimuli: one driven by word of mouth, and the other by new Bitcoin adopters. We also observe that spikes in information search, presumably linked to external events, precede drastic price declines. Understanding the interplay between the socio-economic signals we measured can lead to applications beyond cryptocurrencies to other phenomena which leave digital footprints, such as on-line social network usage.

preprint2013arXiv

NEMESYS: Enhanced Network Security for Seamless Service Provisioning in the Smart Mobile Ecosystem

As a consequence of the growing popularity of smart mobile devices, mobile malware is clearly on the rise, with attackers targeting valuable user information and exploiting vulnerabilities of the mobile ecosystems. With the emergence of large-scale mobile botnets, smartphones can also be used to launch attacks on mobile networks. The NEMESYS project will develop novel security technologies for seamless service provisioning in the smart mobile ecosystem, and improve mobile network security through better understanding of the threat landscape. NEMESYS will gather and analyze information about the nature of cyber-attacks targeting mobile users and the mobile network so that appropriate counter-measures can be taken. We will develop a data collection infrastructure that incorporates virtualized mobile honeypots and a honeyclient, to gather, detect and provide early warning of mobile attacks and better understand the modus operandi of cyber-criminals that target mobile devices. By correlating the extracted information with the known patterns of attacks from wireline networks, we will reveal and identify trends in the way that cyber-criminals launch attacks against mobile devices.

preprint2013arXiv

Security for Smart Mobile Networks: The NEMESYS Approach

The growing popularity of smart mobile devices such as smartphones and tablets has made them an attractive target for cyber-criminals, resulting in a rapidly growing and evolving mobile threat as attackers experiment with new business models by targeting mobile users. With the emergence of the first large-scale mobile botnets, the core network has also become vulnerable to distributed denial-of-service attacks such as the signaling attack. Furthermore, complementary access methods such as Wi-Fi and femtocells introduce additional vulnerabilities for the mobile users as well as the core network. In this paper, we present the NEMESYS approach to smart mobile network security. The goal of the NEMESYS project is to develop novel security technologies for seamless service provisioning in the smart mobile ecosystem, and to improve mobile network security through a better understanding of the threat landscape. To this purpose, NEMESYS will collect and analyze information about the nature of cyber-attacks targeting smart mobile devices and the core network so that appropriate counter-measures can be taken. We are developing a data collection infrastructure that incorporates virtualized mobile honeypots and honeyclients in order to gather, detect and provide early warning of mobile attacks and understand the modus operandi of cyber-criminals that target mobile devices. By correlating the extracted information with known attack patterns from wireline networks, we plan to reveal and identify the possible shift in the way that cyber-criminals launch attacks against smart mobile devices.

preprint2013arXiv

Social Resilience in Online Communities: The Autopsy of Friendster

We empirically analyze five online communities (Friendster, LiveJournal, Facebook, Orkut, and MySpace) to identify causes for the decline of social networks. We define social resilience as the ability of a community to withstand changes. We do not argue about the cause of such changes, but concentrate on their impact. Changes may cause users to leave, which may trigger further departures of others who lost connection to their friends. This may lead to cascades of users leaving. A social network is said to be resilient if the size of such cascades can be limited. To quantify resilience, we use k-core analysis to identify subsets of the network in which all users have at least k friends. These connections generate benefits (b) for each user, which have to outweigh the costs (c) of being a member of the network. If this difference is not positive, users leave. After all cascades, the remaining network is the k-core of the original network determined by the cost-to-benefit ratio c/b. By analysing the cumulative distribution of k-cores we are able to calculate the number of users remaining in each community. This allows us to infer the impact of the c/b ratio on the resilience of these online communities. We find that the different online communities have different k-core distributions. Consequently, similar changes in the c/b ratio have a different impact on the number of active users. As a case study, we focus on the evolution of Friendster. We identify time periods when new users entering the network observed an insufficient c/b ratio. This measure can be seen as a precursor of the later collapse of the community. Our analysis can be applied to estimate the impact of changes in the user interface, which may temporarily increase the c/b ratio, thus posing a threat for the community to shrink, or even to collapse.
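
The cascade of departures described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names `k_core` and `surviving_users` are hypothetical, and the mapping from the c/b ratio to k assumes that each friend contributes the same benefit b, so a user with n friends stays only if n·b − c > 0.

```python
import math
from collections import defaultdict


def k_core(edges, k):
    """Return the nodes of the k-core: the largest subgraph where every node has >= k neighbours."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    alive = set(neighbours)
    # Users with fewer than k friends leave first; each departure may push friends below k.
    queue = [n for n in alive if len(neighbours[n]) < k]
    while queue:
        node = queue.pop()
        if node not in alive:
            continue  # already removed by an earlier step of the cascade
        alive.discard(node)
        for friend in neighbours[node]:
            if friend in alive:
                neighbours[friend].discard(node)
                if len(neighbours[friend]) < k:
                    queue.append(friend)
    return alive


def surviving_users(edges, cost, benefit):
    """Users left after all cascades, ASSUMING a linear benefit of `benefit` per friend."""
    # n * benefit - cost > 0  <=>  n > cost / benefit  <=>  n >= floor(cost / benefit) + 1
    k = math.floor(cost / benefit) + 1
    return k_core(edges, k)


# Toy network invented for this example: a triangle a-b-c with a chain c-d-e attached.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
print(sorted(k_core(edges, 2)))
print(sorted(surviving_users(edges, cost=1.5, benefit=1.0)))
```

In the toy network, user e has a single friend and leaves when k = 2, which leaves d with one friend and triggers a second departure; only the triangle remains.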

preprint2013arXiv

The Role of Emotions in Contributors Activity: A Case Study on the GENTOO Community

We analyse the relation between the emotions and the activity of contributors in the Open Source Software project Gentoo. Our case study builds on extensive data sets from the project's bug tracking platform Bugzilla, to quantify the activity of contributors, and its mail archives, to quantify the emotions of contributors by means of sentiment analysis. The Gentoo project is known for a period of centralization within its bug triaging community. This was followed by considerable changes in community organization and performance after the sudden retirement of the central contributor. We analyse how this event correlates with negative emotions, both in bilateral email discussions with the central contributor, and at the level of the whole community of contributors. We then extend our study to consider the activity patterns of Gentoo contributors in general. We find that contributors are more likely to become inactive when they express strong positive or negative emotions in the bug tracker, or when they deviate from the expected value of emotions in the mailing list. We use these insights to develop a Bayesian classifier that detects the risk of contributors leaving the project. Our analysis opens new perspectives for measuring online contributor motivation by means of sentiment analysis and for real-time predictions of contributor turnover in Open Source Software projects.

preprint2012arXiv

Agent-based simulations of emotion spreading in online social networks

Quantitative analysis of empirical data from online social networks reveals group dynamics in which emotions are involved (Šuvakov et al). Full understanding of the underlying mechanisms, however, remains a challenging task. Using agent-based computer simulations, in this paper we study the dynamics of emotional communications in online social networks. The rules that guide how the agents interact are motivated, and the realistic network structure and some important parameters are inferred from the empirical dataset of the MySpace social network. An agent's emotional state is characterized by two variables: psychological arousal (reactivity to stimuli) and valence (attractiveness or aversiveness), by which common emotions can be defined. An agent's action is triggered by increased arousal. High-resolution dynamics is implemented where each message carrying an agent's emotion along the network link is identified and its effect on the recipient agent is considered as continuously aging in time. Our results demonstrate that (i) aggregated group behaviors may arise from individual emotional actions of agents; (ii) collective states characterized by temporal correlations and dominant positive emotions emerge, similar to the empirical system; (iii) the nature of the driving signal (the rate at which users step into the online world) has profound effects on building the coherent behaviors, which are observed for users in online social networks. Further, our simulations suggest that spreading patterns differ for the emotions, e.g., "enthusiastic" and "ashamed", which have entirely different emotional content. All data used in this study are fully anonymized.

preprint2012arXiv

Emotional persistence in online chatting communities

How do users behave in online chatrooms, where they instantaneously read and write posts? We analyzed about 2.5 million posts covering various topics in Internet relay channels, and found that user activity patterns follow known power-law and stretched exponential distributions, indicating that online chat activity is not different from other forms of communication. Analysing the emotional expressions (positive, negative, neutral) of users, we revealed a remarkable persistence both for individual users and channels. That is, despite their anonymity, users tend to follow social norms in repeated interactions in online chats, which results in a specific emotional "tone" of the channels. We provide an agent-based model of emotional interaction, which recovers qualitatively both the activity patterns in chatrooms and the emotional persistence of users and channels. While our assumptions about agents' emotional expressions are rooted in psychology, the model allows testing different hypotheses regarding their emotional impact in online communication.

preprint2012arXiv

Positive words carry less information than negative words

We show that the frequency of word use is not only determined by the word length (Zipf 1935) and the average information content (Piantadosi 2011), but also by its emotional content. We have analyzed three established lexica of affective word usage in English, German, and Spanish, to verify that these lexica have a neutral, unbiased, emotional content. Taking into account the frequency of word usage, we find that words with a positive emotional content are more frequently used. This lends support to the Pollyanna hypothesis (Boucher 1969) that there should be a positive bias in human expression. We also find that negative words contain more information than positive words, as the informativeness of a word increases uniformly as its valence decreases. Our findings support earlier conjectures about (i) the relation between word frequency and information content, and (ii) the impact of positive emotions on communication and social links.
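
The link between word frequency and information can be made concrete with the simplest context-free measure, self-information I(w) = -log2 P(w): the rarer a word, the more bits it carries. The Python sketch below uses that textbook definition only; the measure used in the paper may differ (the abstract refers to an average information content), and the word counts are invented for illustration.

```python
import math


def self_information(count, total):
    """Self-information in bits of a word seen `count` times among `total` tokens."""
    return -math.log2(count / total)


# Invented toy counts, not data from the paper: the positive word is used more often.
counts = {"good": 8, "bad": 2, "table": 6}
total = sum(counts.values())  # 16 tokens in this toy corpus

for word, count in counts.items():
    print(word, round(self_information(count, total), 3))
```

With these made-up counts the frequent positive word carries 1 bit and the rarer negative word carries 3 bits, which is the direction of the effect the abstract reports.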

preprint2010arXiv

An Agent-Based Model of Collective Emotions in Online Communities

We develop an agent-based framework to model the emergence of collective emotions, which is applied to online communities. Agents' individual emotions are described by their valence and arousal. Using the concept of Brownian agents, these variables change according to a stochastic dynamics, which also considers the feedback from online communication. Agents generate emotional information, which is stored and distributed in a field modeling the online medium. This field affects the emotional states of agents in a non-linear manner. We derive conditions for the emergence of collective emotions, observable in a bimodal valence distribution. Depending on a saturated or a superlinear feedback between the information field and the agent's arousal, we further identify scenarios where collective emotions only appear once or in a repeated manner. The analytical results are illustrated by agent-based computer simulations. Our framework provides testable hypotheses about the emergence of collective emotions, which can be verified by data from online communities.

preprint1997arXiv

Heavy charged Higgs boson decaying into top quark in the MSSM

Observing a heavy charged Higgs boson produced in the near future at the Tevatron or at the LHC would be instant evidence of physics beyond the Standard Model. Whether such a Higgs boson would be supersymmetric or not could only be decided after accurate prediction of its properties. Here we compute the decay width of the dominant decay of such a boson, namely $H^{+}\rightarrow t\bar{b}$, including the leading electroweak corrections originating from large Yukawa couplings within the MSSM. These electroweak effects turn out to be of comparable size to the $O(\alpha_s)$ QCD corrections in relevant portions of the MSSM parameter space. Our analysis incorporates the stringent low-energy constraints imposed by radiative B-meson decays.

preprint1996arXiv

Quantum effects on $t\to H^{+} b$ in the MSSM: a window to ``virtual'' supersymmetry?

We analyze the one-loop effects (strong and electroweak) on the unconventional top quark decay mode $t\rightarrow H^{+} b$ within the MSSM. The results are presented in the on-shell renormalization scheme with a physically well motivated definition of $\tan\beta$. The study of this process at the quantum level is useful to unravel the potential supersymmetric nature of the charged Higgs emerging from that decay. As compared with the standard mode $t\rightarrow W^{+} b$, the corrections to $t\rightarrow H^{+} b$ are large, slowly decoupling and persist at a sizeable level even for all sparticle masses well above the LEP 200 discovery range. As a matter of fact, the potential size of the SUSY effects, which amount to corrections of several tens of percent, could counterbalance the standard QCD corrections and even make them appear with the ``wrong'' sign. Therefore, if the charged Higgs decay of the top quark is kinematically allowed (a possibility which is not excluded by the recent measurements of the branching ratio $BR(t\rightarrow W^{+} b)$ at the Tevatron), it could be an invaluable laboratory to search for ``virtual'' supersymmetry. While a first significant test of these effects could possibly be performed at the upgraded Tevatron, a more precise verification would most likely be carried out in future experiments at the LHC.

preprint1994arXiv

Electroweak Supersymmetric Quantum Corrections to the Top Quark Width

Within the framework of the MSSM, we compute the electroweak one-loop supersymmetric quantum corrections to the width $\Gamma(t\rightarrow W^{+}\, b)$ of the canonical main decay of the top quark. The results are presented in two on-shell renormalization schemes parametrized either by $\alpha$ or $G_F$. While in the standard model, and in the Higgs sector of the MSSM, the electroweak radiative corrections in the $G_F$-scheme are rather insensitive to the top quark mass and are of order $1\%$ at most, the rest (``genuine'' part) of the supersymmetric quantum effects in the MSSM amount to a non-negligible correction that could be about one order of magnitude larger, depending on the top quark mass and on the region of the supersymmetric parameter space. These new electroweak effects, therefore, could be of the same order (and go in the same direction) as the conventional leading QCD corrections.

36 published item(s)

Automatic Classifiers Underdetect Emotions Expressed by Men

Conformity and Social Impact on AI Agents

Conformity Generates Collective Misalignment in AI Agents Societies

Assessment of the effectiveness of Omicron transmission mitigation strategies for European universities using an agent-based network model

Detecting potentially harmful and protective suicide-related content on twitter: A machine learning approach

Social media sharing by political elites: An asymmetric American exceptionalism

Validating daily social media macroscopes of emotions

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

Dashboard of sentiment in Austrian social media during COVID-19

Fragile, yet resilient: Adaptive decline in a collaboration network of firms

Improving accuracy and speeding up Document Image Classification through parallel systems

Emotions, Demographics and Sociability in Twitter Interactions

The Dynamics of Emotions in Online Interaction

The QWERTY effect on the web: How typing shapes the meaning of words in online human-computer interaction

When the Filter Bubble Bursts: Collective Evaluation Dynamics in Online Communities

Who Watches (and Shares) What on YouTube? And When? Using Twitter to Understand YouTube Viewership

Women Through the Glass Ceiling: Gender Asymmetries in Wikipedia

Geography of Emotion: Where in a City are People Happier?

Ideological and Temporal Components of Network Polarization in Online Political Participatory Media

It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia

Sentiment cascades in the 15M movement

Social signals and algorithmic trading of Bitcoin

Gender Asymmetries in Reality and Fiction: The Bechdel Test of Social Media

Online Privacy as a Collective Phenomenon

The digital traces of bubbles: feedback cycles between socio-economic signals in the Bitcoin economy

NEMESYS: Enhanced Network Security for Seamless Service Provisioning in the Smart Mobile Ecosystem

Security for Smart Mobile Networks: The NEMESYS Approach

Social Resilience in Online Communities: The Autopsy of Friendster

The Role of Emotions in Contributors Activity: A Case Study on the GENTOO Community

Agent-based simulations of emotion spreading in online social networks

Emotional persistence in online chatting communities

Positive words carry less information than negative words

An Agent-Based Model of Collective Emotions in Online Communities

Heavy charged Higgs boson decaying into top quark in the MSSM

Quantum effects on $t\to H^{+} b$ in the MSSM: a window to ``virtual'' supersymmetry?

Electroweak Supersymmetric Quantum Corrections to the Top Quark Width