Source author record

Kalina Bontcheva

Kalina Bontcheva appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computation and Language; cs.CY; Social and Information Networks; Artificial Intelligence; Information Retrieval; Machine Learning; Digital Libraries; Neural and Evolutionary Computing; physics.soc-ph

Catalog footprint

What is connected

17 works

9 topics

4 close collaborators


preprint, 2021, arXiv

MP Twitter Engagement and Abuse Post-first COVID-19 Lockdown in the UK: White Paper

The UK has had a volatile political environment for some years now, with Brexit and leadership crises marking the past five years. With this work, we wanted to understand more about how the global health emergency, COVID-19, influences the amount, type or topics of abuse that UK politicians receive when engaging with the public. This work covers the period of June to December 2020 and analyses Twitter abuse in replies to UK MPs. This work is a follow-up from our analysis of online abuse during the first four months of the COVID-19 pandemic in the UK. The paper examines overall abuse levels during this new seven-month period, analyses reactions to members of different political parties and the UK government, and the relationship between online abuse and topics such as Brexit, government's COVID-19 response and policies, and social issues. In addition, we have also examined the presence of conspiracy theories posted in abusive replies to MPs during the period. We have found that abuse levels toward UK MPs were at an all-time high in December 2020 (5.4% of all reply tweets sent to MPs). This is almost 1% higher than in the two months preceding the General Election. In a departure from the trend seen in the first four months of the pandemic, MPs from the Tory party received the highest percentage of abusive replies from July 2020 onward, staying above 5% from September 2020 onward, as the COVID-19 crisis deepened and the Brexit negotiations with the EU started nearing completion.

preprint, 2021, arXiv

Multistage BiCross encoder for multilingual access to COVID-19 health information

The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high precision and high recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline which uses the Okapi BM25 retrieval algorithm and transformer-based bi-encoder and cross-encoder to effectively rank the documents with respect to the given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics in cases of both monolingual and bilingual runs.
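The three-stage idea in this abstract (lexical retrieval, then a bi-encoder, then a cross-encoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the BM25 function is a plain textbook version, and `bi_encoder_score` and `cross_encoder_score` are hypothetical toy stand-ins for the transformer models, chosen only so the example runs with the standard library. The cut-off sizes `keep_stage1` and `keep_stage2` are likewise assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (textbook form)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def bi_encoder_score(query, doc):
    # Toy stand-in (assumption): query and document are "encoded" separately
    # as bag-of-words vectors and compared by cosine similarity.
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def cross_encoder_score(query, doc):
    # Toy stand-in (assumption): the pair is scored jointly, here by the
    # Jaccard overlap of the two token sets.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def top_k(items, scores, k):
    ranked = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:k]]


def multistage_rank(query, docs, keep_stage1=3, keep_stage2=2):
    """Narrow the candidate list stage by stage, cheapest scorer first."""
    stage1 = top_k(docs, bm25_scores(query, docs), keep_stage1)
    stage2 = top_k(stage1, [bi_encoder_score(query, d) for d in stage1], keep_stage2)
    return top_k(stage2, [cross_encoder_score(query, d) for d in stage2], len(stage2))


docs = [
    "covid vaccine side effects",
    "football results today",
    "covid symptoms and vaccine advice",
    "weather forecast",
]
ranking = multistage_rank("covid vaccine", docs)
print(ranking)
```

The ordering of the stages reflects the usual trade-off: the cheap lexical scorer sees every document, while the most expensive pairwise scorer only sees the few candidates that survive the earlier stages.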

preprint, 2020, arXiv

European Language Grid: An Overview

With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 National Competence Centres (NCCs) and the European LT Council (LTC) for outreach and coordination purposes.

preprint, 2020, arXiv

Local Media and Geo-situated Responses to Brexit: A Quantitative Analysis of Twitter, News and Survey Data

Societal debates and political outcomes are subject to news and social media influences, which are in turn subject to commercial and other forces. Local press are in decline, creating a "news gap". Research shows a contrary relationship between UK regions' economic dependence on EU membership and their voting in the 2016 UK EU membership referendum, raising questions about local awareness. We draw on a corpus of Twitter data which has been annotated for user location and Brexit vote intent, allowing us to investigate how location, topics of concern and Brexit stance are related. We compare this with a large corpus of articles from local and national news outlets, as well as survey data, finding evidence of a distinctly different focus in local reporting. National press focused more on terrorism and immigration than local press in most areas. Some Twitter users focused on immigration. Local press focused on trade, unemployment, local politics and agriculture. We find that remain voters shared interests more in keeping with local press on a per-region basis.

preprint, 2020, arXiv

MP Twitter Abuse in the Age of COVID-19: White Paper

As COVID-19 sweeps the globe, outcomes depend on effective relationships between the public and decision-makers. In the UK there were uncivil tweets to MPs about perceived UK tardiness to go into lockdown. The pandemic has led to increased attention on ministers with a role in the crisis. However, generally this surge has been civil. Prime minister Boris Johnson's severe illness with COVID-19 resulted in an unusual peak of supportive responses on Twitter. Those who receive more COVID-19 mentions in their replies tend to receive less abuse (significant negative correlation). Following Mr Johnson's recovery, with rising economic concerns and anger about lockdown violations by influential figures, abuse levels began to rise in May. 1,902 replies to MPs within the study period were found containing hashtags or terms that refute the existence of the virus (e.g. #coronahoax, #coronabollocks, 0.04% of a total 4.7 million replies, or 9% of the number of mentions of "stay home save lives" and variants). These have tended to be more abusive. Evidence of some members of the public believing in COVID-19 conspiracy theories was also found. Higher abuse levels were associated with hashtags blaming China for the pandemic.

preprint, 2020, arXiv

Online Abuse toward Candidates during the UK General Election 2019: Working Paper

The 2019 UK general election took place against a background of rising online hostility levels toward politicians and concerns about its impact on democracy. We collected 4.2 million tweets sent to or from election candidates in the six-week period spanning from the start of November until shortly after the December 12th election. We found abuse in 4.46% of replies received by candidates, up from 3.27% in the matching period for the 2017 UK general election. Abuse levels have also been climbing month on month throughout 2019. Abuse also escalated throughout the campaign period. Abuse focused mainly on a small number of high profile politicians. Abuse is "spiky", triggered by external events such as debates, or certain tweets. Abuse increases when politicians discuss inflammatory topics such as borders and immigration. There may also be a backlash on topics such as social justice. Some tweets may become viral targets for personal abuse. On average, men received more general and political abuse; women received more sexist abuse. MPs choosing not to stand again had received more abuse during 2019.
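The rise in the abuse share between the two elections can be read either in percentage points or as a relative change. The short sketch below works out both from the two figures quoted in the abstract; the function name `share_change` is an illustrative assumption, and the derived numbers are arithmetic on the abstract's figures rather than results reported by the authors.

```python
def share_change(before_pct, after_pct):
    """Return (percentage-point difference, relative change in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100
    return points, relative


# Figures quoted in the abstract: 3.27% (2017 period) and 4.46% (2019 period).
points, relative = share_change(3.27, 4.46)
print(f"{points:.2f} percentage points, {relative:.1f}% relative increase")
```

Keeping the two readings apart matters because a change of roughly one percentage point on a base of about three percent is a much larger relative shift.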

preprint, 2020, arXiv

Partisanship, Propaganda and Post-Truth Politics: Quantifying Impact in Online Debate

The recent past has highlighted the influential role of social networks and online media in shaping public debate on current affairs and political issues. This paper is focused on studying the role of politically-motivated actors and their strategies for influencing and manipulating public opinion online: partisan media, state-backed propaganda, and post-truth politics. In particular, we present quantitative research on the presence and impact of these three 'Ps' in online Twitter debates in two contexts: (i) the run-up to the UK EU membership referendum ('Brexit'); and (ii) the information operations of Russia-backed online troll accounts. We first compare the impact of highly partisan versus mainstream media during the Brexit referendum, specifically comparing tweets by half a million 'leave' and 'remain' supporters. Next, online propaganda strategies are examined, specifically left- and right-wing troll accounts. Lastly, we study the impact of misleading claims made by the political leaders of the leave and remain campaigns. This is then compared to the impact of the Russia-backed partisan media and propaganda accounts during the referendum. In particular, just two of the many misleading claims made by politicians during the referendum were found to be cited in 4.6 times more tweets than the 7,103 tweets related to Russia Today and Sputnik and in 10.2 times more tweets than the 3,200 Brexit-related tweets by the Russian troll accounts.
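The two multipliers in the last sentence can be cross-checked against each other, since both describe the same set of tweets citing the misleading claims. The sketch below is only arithmetic on the numbers quoted in the abstract; the abstract does not state the implied total itself, so the resulting figure of roughly 32,600 tweets is an inference, and the helper name `implied_total` is an assumption.

```python
def implied_total(multiplier, baseline_tweets):
    """Tweet count implied by 'multiplier times more than baseline'."""
    return multiplier * baseline_tweets


# Baselines and multipliers as quoted in the abstract.
via_media = implied_total(4.6, 7103)    # Russia Today and Sputnik tweets
via_trolls = implied_total(10.2, 3200)  # Brexit-related troll tweets

relative_gap = abs(via_media - via_trolls) / via_trolls
print(f"{via_media:.0f} vs {via_trolls:.0f} (gap {relative_gap:.2%})")
```

The two estimates agree to within rounding of the multipliers, which suggests the quoted ratios are mutually consistent.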

preprint, 2020, arXiv

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities, synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

preprint, 2020, arXiv

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability

With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest to implement in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.

preprint, 2020, arXiv

Vindication, Virtue and Vitriol: A study of online engagement and abuse toward British MPs during the COVID-19 Pandemic

COVID-19 has given rise to malicious content online, including online abuse and hate toward British MPs. In order to understand and contextualise the level of abuse MPs receive, we consider how ministers use social media to communicate about the crisis, and the citizen engagement that this generates. The focus of the paper is on a large-scale, mixed methods study of abusive and antagonistic responses to UK politicians during the pandemic from early February to late May 2020. We find that pressing subjects such as financial concerns attract high levels of engagement, but not necessarily abusive dialogue. Rather, criticising authorities appears to attract higher levels of abuse. In particular, those who carry the flame for subjects like racism and inequality, may be accused of virtue signalling or receive higher abuse levels due to the topics they are required by their role to address. This work contributes to the wider understanding of abusive language online, in particular that which is directed at public officials.

preprint, 2016, arXiv

Stance Detection with Bidirectional Conditional Encoding

Stance detection is the task of classifying the attitude expressed in a text towards a target such as Hillary Clinton as "positive", "negative" or "neutral". Previous work has assumed that either the target is mentioned in the text or that training data for every target is given. This paper considers the more challenging version of this task, where targets are not always mentioned and no training data is available for the test targets. We experiment with conditional LSTM encoding, which builds a representation of the tweet that is dependent on the target, and demonstrate that it outperforms encoding the tweet and the target independently. Performance is improved further when the conditional model is augmented with bidirectional encoding. We evaluate our approach on the SemEval 2016 Task 6 Twitter Stance Detection corpus achieving performance second best only to a system trained on semi-automatically labelled tweets for the test target. When such weak supervision is added, our approach achieves state-of-the-art results.

preprint, 2016, arXiv

Using Gaussian Processes for Rumour Stance Classification in Social Media

Social media tend to be rife with rumours while new reports are released piecemeal during breaking news. Interestingly, one can mine multiple reactions expressed by social media users in those situations, exploring their stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. In this work, we set out to develop an automated, supervised classifier that uses multi-task learning to classify the stance expressed in each individual tweet in a rumourous conversation as either supporting, denying or questioning the rumour. Using a classifier based on Gaussian Processes, and exploring its effectiveness on two datasets with very different characteristics and varying distributions of stances, we show that our approach consistently outperforms competitive baseline classifiers. Our classifier is especially effective in estimating the distribution of different types of stance associated with a given rumour, which we set forth as a desired characteristic for a rumour-tracking system that will warn both ordinary users of Twitter and professional news practitioners when a rumour is being rebutted.

preprint, 2015, arXiv

Classifying Tweet Level Judgements of Rumours in Social Media

Social media is a rich source of rumours and corresponding community reactions. Rumours reflect different characteristics, some shared and some individual. We formulate the problem of classifying tweet level judgements of rumours as a supervised learning task. Both supervised and unsupervised domain adaptation are considered, in which tweets from a rumour are classified on the basis of other annotated rumours. We demonstrate how multi-task learning helps achieve good results on rumours from the 2011 England riots.

preprint, 2015, arXiv

Towards Detecting Rumours in Social Media

The spread of false rumours during emergencies can jeopardise the well-being of citizens as they are monitoring the stream of news from social media to stay abreast of the latest updates. In this paper, we describe the methodology we have developed within the PHEME project for the collection and sampling of conversational threads, as well as the tool we have developed to facilitate the annotation of these threads so as to identify rumourous ones. We describe the annotation task conducted on threads collected during the 2014 Ferguson unrest and we present and analyse our findings. Our results show that we can effectively collect social media rumours and identify multiple rumours associated with a range of stories that would have been hard to identify by relying on existing techniques that need manual input of rumour-specific keywords.

preprint, 2015, arXiv

USFD: Twitter NER with Drift Compensation and Linked Data

This paper describes a pilot NER system for Twitter, comprising the USFD system entry to the W-NUT 2015 NER shared task. The goal is to correctly label entities in a tweet dataset, using an inventory of ten types. We employ structured learning, drawing on gazetteers taken from Linked Data, and on unsupervised clustering features, and attempting to compensate for stylistic and topic drift - a key challenge in social media text. Our result is competitive; we provide an analysis of the components of our methodology, and an examination of the target dataset in the context of this task.

preprint, 2014, arXiv

Analysis of Named Entity Recognition and Linking for Tweets

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.

preprint, 2013, arXiv

Social Media and Information Overload: Survey Results

A UK-based online questionnaire investigating aspects of usage of user-generated media (UGM), such as Facebook, LinkedIn and Twitter, attracted 587 participants. Results show a high degree of engagement with social networking media such as Facebook, and a significant engagement with other media such as professional media, microblogs and blogs. Participants who experience information overload are those who engage less frequently with the media, rather than those who have fewer posts to read. Professional users show different behaviours to social users. Microbloggers complain of information overload to the greatest extent. Two thirds of Twitter-users have felt that they receive too many posts, and over half of Twitter-users have felt the need for a tool to filter out the irrelevant posts. Generally speaking, participants express satisfaction with the media, though a significant minority express a range of concerns including information overload and privacy.
