Source author record

Markus Strohmaier

Markus Strohmaier appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Social and Information Networks, physics.soc-ph, cs.CY, Computation and Language, Information Retrieval, Artificial Intelligence, Digital Libraries, Machine Learning, Human-Computer Interaction, physics.data-an, Data Structures and Algorithms, Multiagent Systems, nlin.AO

Catalog footprint

What is connected

36 works

13 topics

4 close collaborators


preprint 2023 arXiv

SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings

Adding interpretability to word embeddings represents an area of active research in text representation. Recent work has explored the potential of embedding words via so-called polar dimensions (e.g. good vs. bad, correct vs. wrong). Examples of such recent approaches include SemAxis, POLAR, FrameAxis, and BiImp. Although these approaches provide interpretable dimensions for words, they have not been designed to deal with polysemy, i.e. they cannot easily distinguish between different senses of words. To address this limitation, we present SensePOLAR, an extension of the original POLAR framework that enables word-sense aware interpretability for pre-trained contextual word embeddings. The resulting interpretable word embeddings achieve a level of performance that is comparable to original contextual word embeddings across a variety of natural language processing tasks including the GLUE and SQuAD benchmarks. Our work removes a fundamental limitation of existing approaches by offering users sense-aware interpretations for contextual word embeddings.

preprint 2022 arXiv

Characterizing the country-wide adoption and evolution of the Jodel messaging app in Saudi Arabia

Social media is subject to constant growth and evolution, yet little is known about its early phases of adoption. To shed light on this aspect, this paper empirically characterizes the initial and country-wide adoption of a new type of social media in Saudi Arabia that happened in 2017. Unlike established social media, the studied network Jodel is anonymous and location-based, forming hundreds of independent communities country-wide whose adoption patterns we compare. We take a detailed and full view from the operator's perspective on the temporal and geographical dimensions of the evolution of these different communities, from their very first months of establishment to saturation. This way, we make the early adoption of a new type of social media visible, a process that is often invisible due to the lack of data covering the first days of a new network.

preprint 2022 arXiv

Combining sensors and surveys to study social contexts: Case of scientific conferences

In this paper, we present a unique collection of four data sets to study social behaviour. The data were collected at four international scientific conferences, during which we measured face-to-face contacts along with additional information about individuals. Building on innovative methods developed in the last decade to study human social behaviour, interactions between participants were monitored using the SocioPatterns platform, which allows collecting face-to-face physical proximity events every 20 seconds in a well-defined social context. Through accompanying surveys, we gathered extensive information about the participants, including sociodemographic characteristics, Big Five personality traits, DIAMONDS situational perceptions, measure of scientific attractiveness, motivations for attending the conferences, and perceptions of the crowd (e.g., in terms of gender distribution). Linking the sensor and survey data provides a rich window into social behaviour: At the individual level, the data sets allow personality scientists to investigate individual differences in social behaviour and pinpoint which individual characteristics (e.g., social roles, personality traits, situational perceptions) drive these individual differences. At the group level, the data also allow studying the mechanisms responsible for interacting patterns within a scientific crowd during a social, networking and idea-sharing event. The data are available for secondary analysis.

preprint 2022 arXiv

Group mixing drives inequality in face-to-face gatherings

Uncovering how inequality emerges from human interaction is imperative for just societies. Here we show that the way social groups interact in face-to-face situations can enable the emergence of disparities in the visibility of social groups. These disparities translate into members of specific social groups having fewer social ties than the average (i.e., degree inequality). We characterize group degree inequality in sensor-based data sets and present a mechanism that explains these disparities as the result of group mixing and group-size imbalance. We investigate how group sizes affect this inequality, thereby uncovering the critical size and mixing conditions under which a critical minority group emerges. If a minority group is larger than this critical size, it can be a well-connected, cohesive group; if it is smaller, minority cohesion widens degree inequality. Finally, we expose the under-representation of individuals in degree rankings due to mixing dynamics and propose a way to reduce such biases.

preprint 2022 arXiv

Inequality and Inequity in Network-based Ranking and Recommendation Algorithms

Though algorithms promise many benefits including efficiency, objectivity and accuracy, they may also introduce or amplify biases. Here we study two well-known algorithms, namely PageRank and Who-to-Follow (WTF), and show to what extent their ranks produce inequality and inequity when applied to directed social networks. To this end, we propose a directed network model with preferential attachment and homophily (DPAH) and demonstrate the influence of network structure on the rank distributions of these algorithms. Our main findings suggest that (i) inequality is positively correlated with inequity, (ii) inequality is driven by the interplay between preferential attachment, homophily, node activity and edge density, and (iii) inequity is driven by the interplay between homophily and minority size. In particular, these two algorithms reduce, replicate and amplify the representation of minorities in top ranks when majorities are homophilic, neutral and heterophilic, respectively. Moreover, when this representation is reduced, minorities may improve their visibility in the rank by connecting strategically in the network, for instance by increasing their out-degree or homophily when majorities are also homophilic. These findings shed light on the social and algorithmic mechanisms that hinder equality and equity in network-based ranking and recommendation algorithms.
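To make the ranking question concrete, the following is a minimal sketch of power-iteration PageRank on a small directed network, followed by a check of how many minority nodes land in the top ranks. The toy graph, the group labels, and the helper `minority_share_in_top` are hypothetical illustrations; this is not the DPAH model or the authors' evaluation code.

```python
def pagerank(edges, nodes, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def minority_share_in_top(rank, group, k):
    """Fraction of the k highest-ranked nodes that belong to the minority."""
    top = sorted(rank, key=rank.get, reverse=True)[:k]
    return sum(1 for v in top if group[v] == "minority") / k


# Hypothetical toy network: majority nodes a, b, c attract most links.
nodes = ["a", "b", "c", "x", "y"]
group = {"a": "majority", "b": "majority", "c": "majority",
         "x": "minority", "y": "minority"}
edges = [("a", "b"), ("b", "a"), ("c", "a"), ("c", "b"),
         ("x", "a"), ("y", "a"), ("x", "y")]

rank = pagerank(edges, nodes)
share = minority_share_in_top(rank, group, k=2)
print(sorted(rank, key=rank.get, reverse=True), share)
```

In this toy case the minority makes up 40% of the nodes but none of the top two ranks. Comparing the two numbers is one simple way to read "representation in top ranks"; the paper's own measures of inequality and inequity may be defined differently.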

preprint 2022 arXiv

Minorities in networks and algorithms

In this chapter, we provide an overview of recent advances in data-driven and theory-informed complex models of social networks and their potential in understanding societal inequalities and marginalization. We focus on inequalities arising from networks and network-based algorithms and how they affect minorities. In particular, we examine how homophily and mixing biases shape large and small social networks, influence perception of minorities, and affect collaboration patterns. We also discuss dynamical processes on and of networks and the formation of norms and health inequalities. Additionally, we argue that network modeling is paramount for unveiling the effect of ranking and social recommendation algorithms on the visibility of minorities. Finally, we highlight the key challenges and future opportunities in this emerging research topic.

preprint 2021 arXiv

Simulating systematic bias in attributed social networks and its effect on rankings of minority nodes

Network analysis provides powerful tools to learn about a variety of social systems. However, most analyses implicitly assume that the considered relational data is error-free, reliable and accurately reflects the system to be analysed. Especially if the network consists of multiple groups, this assumption conflicts with a range of systematic biases, measurement errors and other inaccuracies that are well documented in the literature. To investigate the effects of such errors we introduce a framework for simulating systematic bias in attributed networks. Our framework enables us to model erroneous edge observations that are driven by external node attributes or errors arising from the (hidden) network structure itself. We exemplify how systematic inaccuracies distort conclusions drawn from network analyses on the network analysis task of minority representations in degree-based rankings. By analysing synthetic and real networks with varying homophily levels and group sizes, we find that introducing systematic edge errors can result both in a strongly increased or decreased ranking of the minority. The observed effect depends both on the type of edge error considered and level of homophily in the system. We thus conclude that the implications of systematic bias in edge data depend on an interplay between network topology and type of systematic error. This emphasises the need for an error model framework as developed here, which provides a first step towards studying the effects of systematic edge-uncertainty for various network analysis tasks.

preprint 2020 arXiv

Global gender differences in Wikipedia readership

Wikipedia represents the largest and most popular source of encyclopedic knowledge in the world today, aiming to provide equal access to information worldwide. From a global online survey of 65,031 readers of Wikipedia and their corresponding reading logs, we present novel evidence of gender differences in Wikipedia readership and how they manifest in records of user behavior. More specifically we report that (1) women are underrepresented among readers of Wikipedia, (2) women view fewer pages per reading session than men do, (3) men and women visit Wikipedia for similar reasons, and (4) men and women exhibit specific topical preferences. Our findings lay the foundation for identifying pathways toward knowledge equity in the usage of online encyclopedic knowledge.

preprint 2020 arXiv

Joint Multiclass Debiasing of Word Embeddings

Bias in Word Embeddings has been a subject of recent interest, along with efforts for its reduction. Current approaches show promising progress towards debiasing single bias dimensions such as gender or race. In this paper, we present a joint multiclass debiasing approach that is capable of debiasing multiple bias dimensions simultaneously. In that direction, we present two approaches, HardWEAT and SoftWEAT, that aim to reduce biases by minimizing the scores of the Word Embeddings Association Test (WEAT). We demonstrate the viability of our methods by debiasing Word Embeddings on three classes of biases (religion, gender and race) in three different publicly available word embeddings and show that our concepts can both reduce or even completely eliminate bias, while maintaining meaningful relationships between vectors in word embeddings. Our work strengthens the foundation for more unbiased neural representations of textual data.
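Since both approaches are described as minimizing WEAT scores, a small sketch of the WEAT effect size may help. It follows the commonly cited definition (difference in mean association of two target sets with two attribute sets, divided by the standard deviation over all targets). The 2-D vectors are hypothetical toy data, and the use of the sample standard deviation is an assumption; this is not the HardWEAT or SoftWEAT procedure itself.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """How much more strongly w associates with attribute set A than with B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Effect size for target sets X, Y and attribute sets A, B."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation over all target words.
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy embeddings: X leans towards A, Y leans towards B.
X = [(1.0, 0.1), (0.9, 0.2)]
Y = [(0.1, 1.0), (0.2, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]

effect = weat_effect_size(X, Y, A, B)
print(effect)
```

A value near zero would indicate little measurable association bias between the chosen target and attribute sets, which is the quantity a debiasing method of this kind would try to drive down.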

preprint 2020 arXiv

The Effects of Gender Signals and Performance in Online Product Reviews

This work quantifies the effects of signaling and performing gender on the success of reviews written on the popular Amazon shopping platform. Highly rated reviews play an important role in e-commerce since they are prominently displayed below products. Differences in how gender-signaling and gender-performing review authors are received can lead to important biases in what content and perspectives are represented among top reviews. To investigate this, we extract signals of author gender from user names, distinguishing reviews where the author's likely gender can be inferred. Using reviews authored by these gender-signaling authors, we train a deep-learning classifier to quantify the gendered writing style or gendered performance of reviews written by authors who do not send clear gender signals via their user name. We contrast the effects of gender signaling and performance on review success using matching experiments. While we find no general trend that gendered signals or performances influence overall review success, we find strong context-specific effects. For example, reviews in product categories such as Electronics or Computers are perceived as less helpful when authors signal that they are likely women, but are received as more helpful in categories such as Beauty or Clothing. In addition to these interesting findings, our work provides a general chain of tools for studying gender-specific effects across various social media platforms.

preprint 2020 arXiv

The POLAR Framework: Polar Opposites Enable Interpretability of Pre-Trained Word Embeddings

We introduce POLAR - a framework that adds interpretability to pre-trained word embeddings via the adoption of semantic differentials. Semantic differentials are a psychometric construct for measuring the semantics of a word by analysing its position on a scale between two polar opposites (e.g., cold -- hot, soft -- hard). The core idea of our approach is to transform existing, pre-trained word embeddings via semantic differentials to a new "polar" space with interpretable dimensions defined by such polar opposites. Our framework also allows for selecting the most discriminative dimensions from a set of polar dimensions provided by an oracle, i.e., an external source. We demonstrate the effectiveness of our framework by deploying it to various downstream tasks, in which our interpretable word embeddings achieve a performance that is comparable to the original word embeddings. We also show that the interpretable dimensions selected by our framework align with human judgement. Together, these results demonstrate that interpretability can be added to word embeddings without compromising performance. Our work is relevant for researchers and engineers interested in interpreting pre-trained word embeddings.
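The idea of a "polar" space can be sketched as follows: each pair of polar opposites defines a direction (the difference of the two pole vectors), and a word's coordinate on that dimension is its projection onto the direction. The embeddings below are hypothetical toy values, and the plain normalised projection is a simplifying assumption; the paper's actual transformation may differ in how it performs the change of basis.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def polar_coordinates(word_vec, embeddings, polar_pairs):
    """Project a word vector onto the direction of each (left, right) pole pair.

    Negative values lean towards the left pole, positive towards the right.
    """
    coords = {}
    for left, right in polar_pairs:
        direction = [r - l for l, r in zip(embeddings[left], embeddings[right])]
        norm = math.sqrt(dot(direction, direction))
        coords[(left, right)] = dot(word_vec, direction) / norm
    return coords


# Hypothetical 2-D embeddings, chosen only to make the geometry visible.
embeddings = {
    "cold": [-1.0, 0.0], "hot": [1.0, 0.0],
    "soft": [0.0, -1.0], "hard": [0.0, 1.0],
    "ice": [-0.8, 0.6], "fire": [0.9, -0.1],
}
pairs = [("cold", "hot"), ("soft", "hard")]

ice = polar_coordinates(embeddings["ice"], embeddings, pairs)
fire = polar_coordinates(embeddings["fire"], embeddings, pairs)
print(ice, fire)
```

Here "ice" lands on the cold and hard sides while "fire" lands on the hot side, so each coordinate can be read directly against its pair of opposites.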

preprint 2020 arXiv

Word-Emoji Embeddings from large scale Messaging Data reflect real-world Semantic Associations of Expressive Icons

We train word-emoji embeddings on large scale messaging data obtained from the Jodel online social network. Our data set contains more than 40 million sentences, of which 11 million sentences are annotated with a subset of the Unicode 13.0 standard Emoji list. We explore semantic emoji associations contained in this embedding by analyzing associations between emojis, between emojis and text, and between text and emojis. Our investigations demonstrate anecdotally that word-emoji embeddings trained on large scale messaging data can reflect real-world semantic associations. To enable further research we release the Jodel Emoji Embedding Dataset (JEED1488) containing 1488 emojis and their embeddings along 300 dimensions.

preprint 2019 arXiv

Homophily and minority size explain perception biases in social networks

People's perceptions about the size of minority groups in social networks can be biased, often showing systematic over- or underestimation. These social perception biases are often attributed to biased cognitive or motivational processes. Here we show that both over- and underestimation of the size of a minority group can emerge solely from structural properties of social networks. Using a generative network model, we show analytically that these biases depend on the level of homophily and its asymmetric nature, as well as on the size of the minority group. Our model predictions correspond well with empirical data from a cross-cultural survey and with numerical calculations on six real-world networks. We also show under what circumstances individuals can reduce their biases by relying on perceptions of their neighbors. This work advances our understanding of the impact of network structure on social perception biases and offers a quantitative approach for addressing related issues in society.
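A rough way to see how network structure alone can bias perception: let each node estimate the minority share from its own neighbours and compare the group averages with the true share. The toy homophilic network and the simple neighbourhood-fraction measure are assumptions made for illustration; the paper's generative model and exact bias definition are not reproduced here.

```python
from collections import defaultdict


def perceived_minority_fraction(edges, group):
    """Average, per group, of the minority share each node sees among its neighbours."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    per_group = defaultdict(list)
    for node, nbrs in neighbours.items():
        seen = sum(1 for n in nbrs if group[n] == "minority") / len(nbrs)
        per_group[group[node]].append(seen)
    return {g: sum(vals) / len(vals) for g, vals in per_group.items()}


# Hypothetical homophilic network: six majority nodes, three minority nodes,
# and a single tie between the two groups.
group = {i: "majority" for i in range(6)}
group.update({6: "minority", 7: "minority", 8: "minority"})
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # majority ring
         (6, 7), (7, 8), (8, 6),                          # minority triangle
         (0, 6)]                                          # one cross-group tie

true_fraction = 3 / 9
perception = perceived_minority_fraction(edges, group)
print(true_fraction, perception)
```

In this example the minority overestimates its own size and the majority underestimates it, purely because ties are concentrated within groups.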

preprint 2016 arXiv

A System for Probabilistic Linking of Thesauri and Classification Systems

This paper presents a system which creates and visualizes probabilistic semantic links between concepts in a thesaurus and classes in a classification system. For creating the links, we build on the Polylingual Labeled Topic Model (PLL-TM). PLL-TM identifies probable thesaurus descriptors for each class in the classification system by using information from the natural language text of documents, their assigned thesaurus descriptors and their designated classes. The links are then presented to users of the system in an interactive visualization, providing them with an automatically generated overview of the relations between the thesaurus and the classification system.

preprint 2016 arXiv

Activity Dynamics in Collaboration Networks

Many online collaboration networks struggle to gain user activity and become self-sustaining due to the ramp-up problem or dwindling activity within the system. Prominent examples include online encyclopedias such as (Semantic) MediaWikis, Question and Answering portals such as StackOverflow, and many others. Only a small fraction of these systems manage to reach self-sustaining activity, a level of activity that prevents the system from reverting to a non-active state. In this paper, we model and analyze activity dynamics in synthetic and empirical collaboration networks. Our approach is based on two opposing and well-studied principles: (i) without incentives, users tend to lose interest in contributing and thus, systems become inactive, and (ii) people are susceptible to actions taken by their peers (social or peer influence). With the activity dynamics model that we introduce in this paper we can represent typical situations of such collaboration networks. For example, activity in a collaborative network, without external impulses or investments, will vanish over time, eventually rendering the system inactive. However, by appropriately manipulating the activity dynamics and/or the underlying collaboration networks, we can jump-start a previously inactive system and advance it towards an active state. To be able to do so, we first describe our model and its underlying mechanisms. We then provide illustrative examples of empirical datasets and characterize the barrier that has to be breached by a system before it can become self-sustaining in terms of critical mass and activity dynamics. Additionally, we expand on this empirical illustration and introduce a new metric p, the Activity Momentum, to assess the activity robustness of collaboration networks.

preprint 2016 arXiv

Assessing the Navigational Effects of Click Biases and Link Insertion on the Web

Websites have an inherent interest in steering user navigation in order to, for example, increase sales of specific products or categories, or to guide users towards specific information. In general, website administrators can use the following two strategies to influence their visitors' navigation behavior. First, they can introduce click biases to reinforce specific links on their website by changing their visual appearance, for example, by locating them on the top of the page. Second, they can utilize link insertion to generate new paths for users to navigate over. In this paper, we present a novel approach for measuring the potential effects of these two strategies on user navigation. Our results suggest that, depending on the pages for which we want to increase user visits, optimal link modification strategies vary. Moreover, simple topological measures can be used as proxies for assessing the impact of the intended changes on the navigation of users, even before these changes are implemented.

preprint 2016 arXiv

Discovering and Characterizing Mobility Patterns in Urban Spaces: A Study of Manhattan Taxi Data

Nowadays, human movement in urban spaces can be traced digitally in many cases. It can be observed that movement patterns are not constant, but vary across time and space. In this work, we characterize such spatio-temporal patterns with an innovative combination of two separate approaches that have been utilized for studying human mobility in the past. First, by using non-negative tensor factorization (NTF), we are able to cluster human behavior based on spatio-temporal dimensions. Second, for understanding these clusters, we propose to use HypTrails, a Bayesian approach for expressing and comparing hypotheses about human trails. To formalize hypotheses we utilize data that is publicly available on the Web, namely Foursquare data and census data provided by an open data platform. By applying this combination of approaches to taxi data in Manhattan, we can discover and explain different patterns in human mobility that cannot be identified in a collective analysis. As one example, we can find a group of taxi rides that end at locations with a high number of party venues (according to Foursquare) on weekend nights. Overall, our work demonstrates that human mobility is not one-dimensional but rather contains different facets both in time and space which we explain by utilizing online data. The findings of this paper argue for a more fine-grained analysis of human mobility in order to make more informed decisions for e.g., enhancing urban structures, tailored traffic control and location-based recommender systems.

preprint 2016 arXiv

Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases (ICD) as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the ICD, which is currently under active development by the WHO, contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding how these stakeholders collaborate will enable us to improve editing environments that support such collaborations. We uncover how large ontology-engineering projects, such as the ICD in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users subsequently change) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain.

preprint 2016 arXiv

How to Apply Markov Chains for Modeling Sequential Edit Patterns in Collaborative Ontology-Engineering Projects

With the growing popularity of large-scale collaborative ontology-engineering projects, such as the creation of the 11th revision of the International Classification of Diseases, we need new methods and insights to help project- and community-managers to cope with the constantly growing complexity of such projects. In this paper, we present a novel application of Markov chains to model sequential usage patterns that can be found in the change-logs of collaborative ontology-engineering projects. We provide a detailed presentation of the analysis process, describing all the required steps that are necessary to apply and determine the best fitting Markov chain model. Amongst others, the model and results allow us to identify structural properties and regularities as well as predict future actions based on usage sequences. We are specifically interested in determining the appropriate Markov chain orders which postulate on how many previous actions future ones depend on. To demonstrate the practical usefulness of the extracted Markov chains we conduct sequential pattern analyses on a large-scale collaborative ontology-engineering dataset, the International Classification of Diseases in its 11th revision. To further expand on the usefulness of the presented analysis, we show that the collected sequential patterns provide potentially actionable information for user-interface designers, ontology-engineering tool developers and project-managers to monitor, coordinate and dynamically adapt to the natural development processes that occur when collaboratively engineering an ontology. We hope that the presented work will spur a new line of ontology-development tools, evaluation-techniques and new insights, further taking the interactive nature of the collaborative ontology-engineering process into consideration.
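The question of the appropriate Markov chain order can be illustrated with a small sketch: fit chains of different orders by maximum likelihood on a sequence of actions and compare them with a penalised criterion. The action names are hypothetical, and AIC is used here only as one common criterion; the abstract does not say which selection methods the authors apply.

```python
import math
from collections import Counter


def log_likelihood(sequence, order, skip):
    """Maximised log-likelihood of an order-`order` chain, scored on positions skip..end.

    `skip` must be at least `order`; using the same `skip` for every order
    keeps the scored positions identical across models.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(skip, len(sequence)):
        context = tuple(sequence[i - order:i])
        context_counts[context] += 1
        transition_counts[(context, sequence[i])] += 1
    return sum(n * math.log(n / context_counts[ctx])
               for (ctx, _), n in transition_counts.items())


def aic(sequence, order, skip):
    """Akaike information criterion; lower values indicate a better trade-off."""
    states = len(set(sequence))
    free_parameters = (states ** order) * (states - 1)
    return 2 * free_parameters - 2 * log_likelihood(sequence, order, skip)


# Hypothetical change-log: a user alternates strictly between two edit actions.
log = ["edit_title", "edit_definition"] * 5

aic_order0 = aic(log, order=0, skip=1)
aic_order1 = aic(log, order=1, skip=1)
print(aic_order0, aic_order1)
```

Because each action in this toy log is fully determined by the previous one, the first-order chain fits perfectly and wins despite its extra parameters. On real change-logs the comparison would typically run over several orders and much longer sequences.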

preprint 2016 arXiv

How Users Explore Ontologies on the Web: A Study of NCBO's BioPortal Usage Logs

Ontologies in the biomedical domain are numerous, highly specialized and very expensive to develop. Thus, a crucial prerequisite for ontology adoption and reuse is effective support for exploring and finding existing ontologies. Towards that goal, the National Center for Biomedical Ontology (NCBO) has developed BioPortal---an online repository designed to support users in exploring and finding more than 500 existing biomedical ontologies. In 2016, BioPortal represents one of the largest portals for exploration of semantic biomedical vocabularies and terminologies, which is used by many researchers and practitioners. While usage of this portal is high, we know very little about how exactly users search and explore ontologies and what kind of usage patterns or user groups exist in the first place. Deeper insights into user behavior on such portals can provide valuable information to devise strategies for a better support of users in exploring and finding existing ontologies, and thereby enable better ontology reuse. To that end, we study and group users according to their browsing behavior on BioPortal using data mining techniques. Additionally, we use the obtained groups to characterize and compare exploration strategies across ontologies. In particular, we were able to identify seven distinct browsing-behavior types, which all make use of different functionality provided by BioPortal. For example, Search Explorers make extensive use of the search functionality while Ontology Tree Explorers mainly rely on the class hierarchy to explore ontologies. Further, we show that specific characteristics of ontologies influence the way users explore and interact with the website. Our results may guide the development of more user-oriented systems for ontology exploration on the Web.

preprint 2016 arXiv

Inferring Gender from Names on the Web: A Comparative Evaluation of Gender Detection Methods

Computational social scientists often harness the Web as a "societal observatory" where data about human social behavior is collected. This data enables novel investigations of psychological, anthropological and sociological research questions. However, in the absence of demographic information, such as gender, many relevant research questions cannot be addressed. To tackle this problem, researchers often rely on automated methods to infer gender from name information provided on the web. However, little is known about the accuracy of existing gender-detection methods and how biased they are against certain sub-populations. In this paper, we address this question by systematically comparing several gender detection methods on a random sample of scientists for whom we know their full name, their gender and the country of their workplace. We further suggest a novel method that employs web-based image retrieval and gender recognition in facial images in order to augment name-based approaches. Our findings show that the performance of name-based gender detection approaches can be biased towards countries of origin and such biases can be reduced by combining name-based and image-based gender detection methods.

preprint 2016 arXiv

Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity

In this paper, we study the network of global interconnections between language communities, based on shared co-editing interests of Wikipedia editors, and show that although English is discussed as a potential lingua franca of the digital space, its domination disappears in the network of co-editing similarities, and instead local connections come to the forefront. Out of the hypotheses we explored, bilingualism, linguistic similarity of languages, and shared religion provide the best explanations for the similarity of interests between cultural communities. Population attraction and geographical proximity are also significant, but much weaker factors bringing communities together. In addition, we present an approach that allows for extracting significant cultural borders from editing activity of Wikipedia users, and comparing a set of hypotheses about the social mechanisms generating these borders. Our study sheds light on how culture is reflected in the collective process of archiving knowledge on Wikipedia, and demonstrates that cross-lingual interconnections on Wikipedia are not dominated by one powerful language. Our findings also raise some important policy questions for the Wikimedia Foundation.

preprint 2016 arXiv

The QWERTY effect on the web: How typing shapes the meaning of words in online human-computer interaction

The QWERTY effect postulates that the keyboard layout influences word meanings by linking positivity to the use of the right hand and negativity to the use of the left hand. For example, previous research has established that words with more right hand letters are rated more positively than words with more left hand letters by human subjects in small scale experiments. In this paper, we perform large scale investigations of the QWERTY effect on the web. Using data from eleven web platforms related to products, movies, books, and videos, we conduct observational tests whether a hand-meaning relationship can be found in decoding text on the web. Furthermore, we investigate whether encoding text on the web exhibits the QWERTY effect as well, by analyzing the relationship between the text of online reviews and their star ratings in four additional datasets. Overall, we find robust evidence for the QWERTY effect both at the point of text interpretation (decoding) and at the point of text creation (encoding). We also find under which conditions the effect might not hold. Our findings have implications for any algorithmic method aiming to evaluate the meaning of words on the web, including for example semantic or sentiment analysis, and show the existence of "dactilar onomatopoeias" that shape the dynamics of word-meaning associations. To the best of our knowledge, this is the first work to reveal the extent to which the QWERTY effect exists in large scale human-computer interaction on the web.

preprint 2015 arXiv

HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music playlists. Understanding the factors that drive the production of these trails can be useful, for example, for improving underlying network structures, predicting user clicks or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors on the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains including website navigation, business reviews and music played online. Our work expands the repertoire of methods available for studying human trails on the Web.
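The core mechanism described above (hypotheses as Dirichlet priors, compared through Bayes factors) can be sketched as follows. The code computes the log marginal likelihood of first-order Markov chain transition counts under row-wise Dirichlet priors and compares two hypotheses. The function names, the toy counts, and the way a hypothesis is turned into pseudo-counts (`k` pseudo-counts per row distributed according to the belief, plus a uniform pseudo-count of 1) are assumptions for illustration. This is a simplified stand-in, not the trial roulette elicitation from the paper.

```python
from math import lgamma


def log_evidence(counts, alpha):
    """Log marginal likelihood of transition counts under Dirichlet priors.

    counts[i][j] is the number of observed transitions i -> j,
    alpha[i][j] is the Dirichlet pseudo-count for that transition.
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        for n, a in zip(n_row, a_row):
            total += lgamma(a + n) - lgamma(a)
    return total


def prior_from_hypothesis(belief, k):
    """Hypothetical elicitation: spread k pseudo-counts per row according to
    the (row-normalized) belief and add a uniform pseudo-count of 1."""
    prior = []
    for row in belief:
        row_sum = sum(row)
        prior.append([1.0 + k * (b / row_sum) for b in row])
    return prior


# Toy transition counts between three states (made-up data).
counts = [[0, 8, 2],
          [1, 0, 9],
          [7, 3, 0]]

# Two competing hypotheses about how users move between states.
h_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]    # users follow the cycle 0 -> 1 -> 2 -> 0
h_uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # users pick the next state at random

k = 10  # strength of belief in a hypothesis
log_bf = (log_evidence(counts, prior_from_hypothesis(h_cycle, k))
          - log_evidence(counts, prior_from_hypothesis(h_uniform, k)))
print("log Bayes factor (cycle vs. uniform):", log_bf)
```

A positive log Bayes factor favors the first hypothesis. Repeating the comparison for several values of `k` shows how the ranking of hypotheses changes as belief in them grows stronger.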

preprint 2015 arXiv

Improving Reachability and Navigability in Recommender Systems

In this paper, we study recommender systems from a network perspective by examining recommendation networks, where nodes are items (e.g., movies) and edges are constructed from top-N recommendations (e.g., related movies). In particular, we focus on evaluating the reachability and navigability of recommendation networks and address the following questions: (i) How well do recommendation networks support navigation and exploratory search? (ii) What is the influence of parameters, in particular different recommendation algorithms and the number of recommendations shown, on reachability and navigability? and (iii) How can reachability and navigability be improved in these networks? We tackle these questions by first evaluating the reachability of recommendation networks through their structural properties. Second, we evaluate navigability by simulating three different models of information seeking scenarios. We find that with standard algorithms, recommender systems are not well suited to navigation and exploration, and we propose methods to modify recommendations to improve this. Our work extends one-click-based evaluations of recommender systems towards multi-click analysis (i.e., sequences of dependent clicks) and presents a general, comprehensive approach to evaluating navigability of arbitrary recommendation networks.
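As a minimal sketch of the network view described above, the code below builds a recommendation network from ranked recommendation lists cut off at N and measures what fraction of ordered item pairs are connected by a directed path. The helper names, the toy data, and the choice of this particular reachability measure are assumptions for illustration; the paper's own structural analysis may use different measures.

```python
from collections import deque


def top_n(ranked, n):
    """Keep only the first n recommendations per item (edges of the network)."""
    return {item: recs[:n] for item, recs in ranked.items()}


def reachable_from(graph, start):
    """Set of nodes reachable from start by following recommendation links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reachability(graph):
    """Fraction of ordered node pairs (u, v), u != v, with a directed path u -> v."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    if len(nodes) < 2:
        return 0.0
    pairs = sum(len(reachable_from(graph, n)) - 1 for n in nodes)
    return pairs / (len(nodes) * (len(nodes) - 1))


# Made-up ranked recommendations for four items.
ranked = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["D", "A"],
    "D": ["C", "A"],
}

for n in (1, 2):
    print("top-%d reachability:" % n, reachability(top_n(ranked, n)))
```

In this toy example, showing a single recommendation splits the items into two isolated pairs. Showing two recommendations connects every item to every other.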

preprint 2015 arXiv

It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia

Wikipedia is a community-created encyclopedia that contains information about notable people from different countries, epochs and disciplines and aims to document the world's knowledge from a neutral point of view. However, the narrow diversity of the Wikipedia editor community has the potential to introduce systemic biases such as gender biases into the content of Wikipedia. In this paper we aim to tackle a subproblem of this larger challenge by presenting and applying a computational method for assessing gender bias on Wikipedia along multiple dimensions. We find that while women on Wikipedia are covered and featured well in many Wikipedia language editions, the way women are portrayed starkly differs from the way men are portrayed. We hope our work contributes to increasing awareness about gender biases online, and in particular to raising attention to the different levels at which gender biases can manifest themselves on the web.

preprint 2015 arXiv

Mining cross-cultural relations from Wikipedia - A study of 31 European food cultures

For many people, Wikipedia represents one of the primary sources of knowledge about foreign cultures. Yet, different Wikipedia language editions offer different descriptions of cultural practices. Unveiling diverging representations of cultures provides an important insight, since they may foster the formation of cross-cultural stereotypes, misunderstandings and potentially even conflict. In this work, we explore to what extent the descriptions of cultural practices in various European language editions of Wikipedia differ, using the example of culinary practices, and propose an approach to mine cultural relations between different language communities through their description of and interest in their own and other communities' food culture. We assess the validity of the extracted relations using 1) various external reference data sources (i.e., the European Social Survey, migration statistics), 2) crowdsourcing methods and 3) simulations.

preprint 2015 arXiv

Random Surfers on a Web Encyclopedia

The random surfer model is a frequently used model for simulating user navigation behavior on the Web. Various algorithms, such as PageRank, are based on the assumption that the model represents a good approximation of users browsing a website. However, the way users browse the Web has been drastically altered over the last decade due to the rise of search engines. Hence, new adaptations for the established random surfer model might be required, which better capture and simulate this change in navigation behavior. In this article we compare the classical uniform random surfer to empirical navigation and page access data in a Web Encyclopedia. Our high level contributions are (i) a comparison of stationary distributions of different types of the random surfer to quantify the similarities and differences between those models as well as (ii) new insights into the impact of search engines on traditional user navigation. Our results suggest that the behavior of the random surfer is very similar to that of users, as long as users do not use search engines. We also find that classical website navigation structures, such as navigation hierarchies or breadcrumbs, now exercise only limited influence on user navigation. Rather, new kinds of navigational tools (e.g., recommendation systems) might be needed to better reflect the changes in browsing behavior of existing users.
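For readers unfamiliar with the model, the sketch below computes the stationary distribution of a uniform random surfer by power iteration on a toy link graph. The teleportation step with `damping=0.85` follows the usual PageRank formulation and is an assumption here, as are the function name and the example graph; the surfer variants compared in the article may be defined differently.

```python
def stationary_distribution(links, damping=0.85, iterations=100):
    """Power iteration for a uniform random surfer with teleportation.

    links maps each page to the list of pages it links to. With probability
    `damping` the surfer follows a uniformly chosen outgoing link, otherwise
    it jumps to a uniformly chosen page. Pages without outgoing links
    distribute their probability mass uniformly over all pages.
    """
    nodes = sorted(set(links) | {v for out in links.values() for v in out})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        nxt = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            out = links.get(node, [])
            if out:
                share = damping * rank[node] / len(out)
                for target in out:
                    nxt[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    nxt[target] += share
        rank = nxt
    return rank


# Made-up link structure: A and B link to each other, C links only to A.
toy_links = {"A": ["B"], "B": ["A"], "C": ["A"]}
pi = stationary_distribution(toy_links)
print(pi)
```

The resulting distribution can then be compared with empirical page access frequencies, for example through a rank correlation, to quantify how closely the model matches observed behavior.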

preprint 2015 arXiv

Voting Behaviour and Power in Online Democracy: A Study of LiquidFeedback in Germany's Pirate Party

In recent years, political parties have adopted Online Delegative Democracy platforms such as LiquidFeedback to organise themselves and their political agendas via a grassroots approach. A common objection against the use of these platforms is the delegation system, where a user can delegate their vote to another user, giving rise to so-called super-voters, i.e. powerful users who receive many delegations. It has been asserted in the past that the presence of these super-voters undermines the democratic process, and therefore delegative democracy should be avoided. In this paper, we look at the emergence of super-voters in the largest delegative online democracy platform worldwide, operated by Germany's Pirate Party. We investigate the distribution of power within the party systematically, study whether super-voters exist, and explore the influence they have on the outcome of votes conducted online. While we find that the theoretical power of super-voters is indeed high, we also observe that they use their power wisely. Super-voters do not fully act on their power to change the outcome of votes; in many cases they vote in favour of proposals together with the majority of voters, thereby exhibiting a stabilising effect on the system. We use these findings to present a novel class of power indices that considers observed voting biases and gives significantly better predictions than state-of-the-art measures.

preprint 2014 arXiv

A categorization scheme for socialbot attacks in online social networks

In the past, online social networks (OSN) like Facebook and Twitter became powerful instruments for communication and networking. Unfortunately, they have also become a welcome target for socialbot attacks. Therefore, a deep understanding of the nature of such attacks is important to protect the ecosystem of OSNs. In this extended abstract we propose a categorization scheme of socialbot attacks that aims at providing an overview of the state of the art of techniques in this emerging field. Finally, we demonstrate the usefulness of our categorization scheme by using it to characterize recent socialbot attacks.

preprint 2014 arXiv

Detecting Memory and Structure in Human Navigation Patterns Using Markov Chain Models of Varying Order

One of the most frequently used models for understanding human navigation on the Web is the Markov chain model, where Web pages are represented as states and hyperlinks as probabilities of navigating from one page to another. Predominantly, human navigation on the Web has been thought to satisfy the memoryless Markov property stating that the next page a user visits only depends on her current page and not on previously visited ones. This idea has found its way into numerous applications, such as Google's PageRank algorithm. Recently, new studies suggested that human navigation may better be modeled using higher order Markov chain models, i.e., the next page depends on a longer history of past clicks. Yet, this finding is preliminary and does not account for the higher complexity of higher order Markov chain models, which is why the memoryless model is still widely used. In this work we thoroughly present a diverse array of advanced inference methods for determining the appropriate Markov chain order. We highlight strengths and weaknesses of each method and apply them for investigating memory and structure of human navigation on the Web. Our experiments reveal that the complexity of higher order models grows faster than their utility, and thus we confirm that the memoryless model represents a quite practical model for human navigation on a page level. However, when we expand our analysis to a topical level, where we abstract away from specific page transitions to transitions between topics, we find that the memoryless assumption is violated and specific regularities can be observed. We report results from experiments with two types of navigational datasets (goal-oriented vs. free form) and observe interesting structural differences that make a strong argument for more contextual studies of human navigation in future work.
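The trade-off between fit and complexity mentioned above can be illustrated with a small order-selection sketch. It fits Markov chains of order 0 to 2 by maximum likelihood and scores each with the Akaike information criterion (AIC), which penalizes the number of free parameters. AIC is used here as one simple, standard criterion; this is an assumption for illustration and does not claim to reproduce the set of inference methods applied in the paper. The function names and the synthetic trail are likewise hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def transition_counts(trails, k, start):
    """Count next-state occurrences per length-k history, from position `start` on.

    Using the same `start` for every order ensures all models are scored on
    exactly the same observations.
    """
    counts = defaultdict(Counter)
    for trail in trails:
        for i in range(start, len(trail)):
            counts[tuple(trail[i - k:i])][trail[i]] += 1
    return counts


def log_likelihood(counts):
    """Log-likelihood of the data under the maximum likelihood estimates."""
    ll = 0.0
    for nxt in counts.values():
        total = sum(nxt.values())
        for c in nxt.values():
            ll += c * log(c / total)
    return ll


def aic(trails, k, n_states, max_order):
    """AIC of an order-k chain: 2 * (free parameters) - 2 * log-likelihood."""
    params = (n_states ** k) * (n_states - 1)
    return 2 * params - 2 * log_likelihood(transition_counts(trails, k, max_order))


# Synthetic trail in which the next state depends on the previous two states.
trails = [list("AABB" * 10)]
scores = {k: aic(trails, k, n_states=2, max_order=2) for k in range(3)}
best_order = min(scores, key=scores.get)
print(scores, "selected order:", best_order)
```

Because the number of parameters grows exponentially with the order, a higher order model is only selected when its gain in likelihood outweighs the penalty, as it does for this deliberately second-order toy trail.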

preprint 2014 arXiv

Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?

In the past few years, Reddit -- a community-driven platform for submitting, commenting and rating links and text posts -- has grown exponentially, from a small community of users into one of the largest online communities on the Web. To the best of our knowledge, this work represents the most comprehensive longitudinal study of Reddit's evolution to date, studying both (i) how user submissions have evolved over time and (ii) how the community's allocation of attention and its perception of submissions have changed over 5 years based on an analysis of almost 60 million submissions. Our work reveals an ever-increasing diversification of topics accompanied by a simultaneous concentration towards a few selected domains both in terms of posted submissions as well as perception and attention. By and large, our investigations suggest that Reddit has transformed itself from a dedicated gateway to the Web to an increasingly self-referential community that focuses on and reinforces its own user-generated image- and textual content over external sources.

preprint 2014 arXiv

Of course we share! Testing Assumptions about Social Tagging Systems

Social tagging systems have established themselves as an important part of today's web and have attracted the interest of our research community in a variety of investigations. The overall vision of our community is that simply through interactions with the system, i.e., through tagging and sharing of resources, users would contribute to building useful semantic structures as well as resource indexes using uncontrolled vocabulary, not only due to the easy-to-use mechanics. Hence, a variety of assumptions about social tagging systems have emerged, yet testing them has been difficult due to the absence of suitable data. In this work we thoroughly investigate three available assumptions - e.g., is a tagging system really social? - by examining live log data gathered from the real-world public social tagging system BibSonomy. Our empirical results indicate that while some of these assumptions hold to a certain extent, other assumptions need to be reflected upon and viewed in a very critical light. Our observations have implications for the design of future search and other algorithms to better reflect the actual user behavior.

preprint 2014 arXiv

Understanding the impact of socialbot attacks in online social networks

Online social networks (OSN) like Twitter or Facebook are popular and powerful since they allow reaching millions of users online. They are also a popular target for socialbot attacks. Without a deep understanding of the impact of such attacks, the potential of online social networks as an instrument for facilitating discourse or democratic processes is in jeopardy. In this extended abstract we present insights from a live lab experiment in which socialbots aimed at manipulating the social graph of an online social network, in our case Twitter. We explored the link creation behavior between targeted human users, and our results suggest that socialbots may indeed have the ability to shape and influence the social graph in online social networks. However, our results also show that external factors may play an important role in the creation of social links in OSNs.

preprint 2014 arXiv

When Politicians Talk: Assessing Online Conversational Practices of Political Parties on Twitter

Assessing political conversations in social media requires a deeper understanding of the underlying practices and styles that drive these conversations. In this paper, we present a computational approach for assessing online conversational practices of political parties. Following a deductive approach, we devise a number of quantitative measures from a discussion of theoretical constructs in sociological theory. The resulting measures make different - mostly qualitative - aspects of online conversational practices amenable to computation. We evaluate our computational approach by applying it in a case study. In particular, we study online conversational practices of German politicians on Twitter during the German federal election 2013. We find that political parties share some interesting patterns of behavior, but also exhibit some unique and interesting idiosyncrasies. Our work sheds light on (i) how complex cultural phenomena such as online conversational practices are amenable to quantification and (ii) the way social media such as Twitter are utilized by political parties.

preprint 2013 arXiv

Semantic Stability in Social Tagging Streams

One potential disadvantage of social tagging systems is that due to the lack of a centralized vocabulary, a crowd of users may never manage to reach a consensus on the description of resources (e.g., books, users or songs) on the Web. Yet, previous research has provided interesting evidence that the tag distributions of resources may become semantically stable over time as more and more users tag them. At the same time, previous work has raised an array of new questions such as: (i) How can we assess the semantic stability of social tagging systems in a robust and methodical way? (ii) Does semantic stabilization of tags vary across different social tagging systems? and ultimately, (iii) What are the factors that can explain semantic stabilization in such systems? In this work we tackle these questions by (i) presenting a novel and robust method which overcomes a number of limitations in existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties and (iii) detecting potential causes for semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language. Our results show that tagging streams which are generated by a combination of imitation dynamics and shared background knowledge exhibit faster and higher semantic stability than tagging streams which are generated via imitation dynamics or natural language streams alone.

36 published item(s)

SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings

Characterizing the country-wide adoption and evolution of the Jodel messaging app in Saudi Arabia

Combining sensors and surveys to study social contexts: Case of scientific conferences

Group mixing drives inequality in face-to-face gatherings

Inequality and Inequity in Network-based Ranking and Recommendation Algorithms

Minorities in networks and algorithms

Simulating systematic bias in attributed social networks and its effect on rankings of minority nodes

Global gender differences in Wikipedia readership

Joint Multiclass Debiasing of Word Embeddings

The Effects of Gender Signals and Performance in Online Product Reviews

The POLAR Framework: Polar Opposites Enable Interpretability of Pre-Trained Word Embeddings

Word-Emoji Embeddings from large scale Messaging Data reflect real-world Semantic Associations of Expressive Icons

Homophily and minority size explain perception biases in social networks

A System for Probabilistic Linking of Thesauri and Classification Systems

Activity Dynamics in Collaboration Networks

Assessing the Navigational Effects of Click Biases and Link Insertion on the Web

Discovering and Characterizing Mobility Patterns in Urban Spaces: A Study of Manhattan Taxi Data

Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

How to Apply Markov Chains for Modeling Sequential Edit Patterns in Collaborative Ontology-Engineering Projects

How Users Explore Ontologies on the Web: A Study of NCBO's BioPortal Usage Logs

Inferring Gender from Names on the Web: A Comparative Evaluation of Gender Detection Methods

Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity

The QWERTY effect on the web: How typing shapes the meaning of words in online human-computer interaction

HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web

Improving Reachability and Navigability in Recommender Systems

It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia

Mining cross-cultural relations from Wikipedia - A study of 31 European food cultures

Random Surfers on a Web Encyclopedia

Voting Behaviour and Power in Online Democracy: A Study of LiquidFeedback in Germany's Pirate Party

A categorization scheme for socialbot attacks in online social networks

Detecting Memory and Structure in Human Navigation Patterns Using Markov Chain Models of Varying Order

Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?

Of course we share! Testing Assumptions about Social Tagging Systems

Understanding the impact of socialbot attacks in online social networks

When Politicians Talk: Assessing Online Conversational Practices of Political Parties on Twitter

Semantic Stability in Social Tagging Streams