Source author record

Karl Aberer

Karl Aberer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

cs.CY Social and Information Networks Computation and Language Machine Learning physics.soc-ph Artificial Intelligence Computer Science and Game Theory Computer Vision Cryptography and Security Digital Libraries Human-Computer Interaction Information Retrieval Multimedia

Catalog footprint

What is connected

8works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Characterizing Retweet Bots: The Case of Black Market Accounts

Malicious Twitter bots are detrimental to public discourse on social media. Past studies have looked at spammers, fake followers, and astroturfing bots, but retweet bots, which artificially inflate content, are not well understood. In this study, we characterize retweet bots that have been uncovered by purchasing retweets from the black market. We detect whether they are fake or genuine accounts involved in inauthentic activities and what they do in order to appear legitimate. We also analyze their differences from human-controlled accounts. From our findings on the nature and life-cycle of retweet bots, we also point out several inconsistencies between the retweet bots used in this work and bots studied in prior works. Our findings challenge some of the fundamental assumptions related to bots and in particular how to detect them.

preprint2022arXiv

SciLander: Mapping the Scientific News Landscape

The COVID-19 pandemic has fueled the spread of misinformation on social media and the Web as a whole. The phenomenon dubbed `infodemic' has taken the challenges of information veracity and trust to new heights by massively introducing seemingly scientific and technical elements into misleading content. Despite the existing body of work on modeling and predicting misinformation, the coverage of very complex scientific topics with inherent uncertainty and an evolving set of findings, such as COVID-19, provides many new challenges that are not easily solved by existing tools. To address these issues, we introduce SciLander, a method for learning representations of news sources reporting on science-based topics. SciLander extracts four heterogeneous indicators for the news sources; two generic indicators that capture (1) the copying of news stories between sources, and (2) the use of the same terms to mean different things (i.e., the semantic shift of terms), and two scientific indicators that capture (1) the usage of jargon and (2) the stance towards specific citations. We use these indicators as signals of source agreement, sampling pairs of positive (similar) and negative (dissimilar) samples, and combine them in a unified framework to train unsupervised news source embeddings with a triplet margin loss objective. We evaluate our method on a novel COVID-19 dataset containing nearly 1M news articles from 500 sources spanning a period of 18 months since the beginning of the pandemic in 2020. Our results show that the features learned by our model outperform state-of-the-art baseline methods on the task of news veracity classification. Furthermore, a clustering analysis suggests that the learned representations encode information about the reliability, political leaning, and partisanship bias of these sources.

preprint2020arXiv

SciLens News Platform: A System for Real-Time Evaluation of News Articles

We demonstrate the SciLens News Platform, a novel system for evaluating the quality of news articles. The SciLens News Platform automatically collects contextual information about news articles in real-time and provides quality indicators about their validity and trustworthiness. These quality indicators derive from i) social media discussions regarding news articles, showcasing the reach and stance towards these articles, and ii) their content and their referenced sources, showcasing the journalistic foundations of these articles. Furthermore, the platform enables domain-experts to review articles and rate the quality of news sources. This augmented view of news articles, which combines automatically extracted indicators and domain-expert reviews, has provably helped the platform users to have a better consensus about the quality of the underlying articles. The platform is built in a distributed and robust fashion and runs operationally handling daily thousands of news articles. We evaluate the SciLens News Platform on the emerging topic of COVID-19 where we highlight the discrepancies between low and high-quality news outlets based on three axes, namely their newsroom activity, evidence seeking and social engagement. A live demonstration of the platform can be found here: http://scilens.epfl.ch.

preprint2020arXiv

Spoken dialect identification in Twitter using a multi-filter architecture

This paper presents our approach for SwissText & KONVENS 2020 shared task 2, which is a multi-stage neural model for Swiss German (GSW) identification on Twitter. Our model outputs either GSW or non-GSW and is not meant to be used as a generic language identifier. Our architecture consists of two independent filters where the first one favors recall, and the second one filter favors precision (both towards GSW). Moreover, we do not use binary models (GSW vs. not-GSW) in our filters but rather a multi-class classifier with GSW being one of the possible labels. Our model reaches F1-score of 0.982 on the test set of the shared task.

preprint2020arXiv

Upgrading the Newsroom: An Automated Image Selection System for News Articles

We propose an automated image selection system to assist photo editors in selecting suitable images for news articles. The system fuses multiple textual sources extracted from news articles and accepts multilingual inputs. It is equipped with char-level word embeddings to help both modeling morphologically rich languages, e.g. German, and transferring knowledge across nearby languages. The text encoder adopts a hierarchical self-attention mechanism to attend more to both keywords within a piece of text and informative components of a news article. We extensively experiment with our system on a large-scale text-image database containing multimodal multilingual news articles collected from Swiss local news media websites. The system is compared with multiple baselines with ablation studies and is shown to beat existing text-image retrieval methods in a weakly-supervised learning setting. Besides, we also offer insights on the advantage of using multiple textual sources and multilingual data.

preprint2016arXiv

ScienceWISE: Topic Modeling over Scientific Literature Networks

We provide an up-to-date view on the knowledge management system ScienceWISE (SW) and address issues related to the automatic assignment of articles to research topics. So far, SW has been proven to be an effective platform for managing large volumes of technical articles by means of ontological concept-based browsing. However, as the publication of research articles accelerates, the expressivity and the richness of the SW ontology turns into a double-edged sword: a more fine-grained characterization of articles is possible, but at the cost of introducing more spurious relations among them. In this context, the challenge of continuously recommending relevant articles to users lies in tackling a network partitioning problem, where nodes represent articles and co-occurring concepts create edges between them. In this paper, we discuss the three research directions we have taken for solving this issue: i) the identification of generic concepts to reinforce inter-article similarities; ii) the adoption of a bipartite network representation to improve scalability; iii) the design of a clustering algorithm to identify concepts for cross-disciplinary articles and obtain fine-grained topics for all articles.

preprint2016arXiv

The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps

Third party apps that work on top of personal cloud services such as Google Drive and Dropbox, require access to the user's data in order to provide some functionality. Through detailed analysis of a hundred popular Google Drive apps from Google's Chrome store, we discover that the existing permission model is quite often misused: around two thirds of analyzed apps are over-privileged, i.e., they access more data than is needed for them to function. In this work, we analyze three different permission models that aim to discourage users from installing over-privileged apps. In experiments with 210 real users, we discover that the most successful permission model is our novel ensemble method that we call Far-reaching Insights. Far-reaching Insights inform the users about the data-driven insights that apps can make about them (e.g., their topics of interest, collaboration and activity patterns etc.) Thus, they seek to bridge the gap between what third parties can actually know about users and users perception of their privacy leakage. The efficacy of Far-reaching Insights in bridging this gap is demonstrated by our results, as Far-reaching Insights prove to be, on average, twice as effective as the current model in discouraging users from installing over-privileged apps. In an effort for promoting general privacy awareness, we deploy a publicly available privacy oriented app store that uses Far-reaching Insights. Based on the knowledge extracted from data of the store's users (over 115 gigabytes of Google Drive data from 1440 users with 662 installed apps), we also delineate the ecosystem for third-party cloud apps from the standpoint of developers and cloud providers. Finally, we present several general recommendations that can guide other future works in the area of privacy for the cloud.

preprint2013arXiv

Matching Demand with Supply in the Smart Grid using Agent-Based Multiunit Auction

Recent work has suggested reducing electricity generation cost by cutting the peak to average ratio (PAR) without reducing the total amount of the loads. However, most of these proposals rely on consumer's willingness to act. In this paper, we propose an approach to cut PAR explicitly from the supply side. The resulting cut loads are then distributed among consumers by the means of a multiunit auction which is done by an intelligent agent on behalf of the consumer. This approach is also in line with the future vision of the smart grid to have the demand side matched with the supply side. Experiments suggest that our approach reduces overall system cost and gives benefit to both consumers and the energy provider.

Karl Aberer

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Characterizing Retweet Bots: The Case of Black Market Accounts

SciLander: Mapping the Scientific News Landscape

SciLens News Platform: A System for Real-Time Evaluation of News Articles

Spoken dialect identification in Twitter using a multi-filter architecture

Upgrading the Newsroom: An Automated Image Selection System for News Articles

ScienceWISE: Topic Modeling over Scientific Literature Networks

The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps

Matching Demand with Supply in the Smart Grid using Agent-Based Multiunit Auction