Source author record

Vasileios Lampos

Vasileios Lampos appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Social and Information Networks; Computation and Language; Information Retrieval; physics.soc-ph; Applications; Artificial Intelligence; cs.CY; Machine Learning

Catalog footprint

7 works, 8 topics, 4 close collaborators

preprint, 2026, arXiv

Modelling the Spread of New Information on X

There has been considerable interest in modelling the spread of information on X (formerly Twitter) using machine learning models. Here, we consider the problem of predicting the reposting of new information, i.e., when a user propagates information about a topic previously unseen by the user. In existing work, information and users are randomly assigned to a test or training set, ensuring that both sets are drawn from the same distribution. In the spread of new information, the problem becomes an out-of-distribution classification task. Our experimental results reveal that while existing algorithms, which predominantly use features derived from the content of posts, perform well when the training and test distributions are the same, they perform much worse when the test set is out-of-distribution, i.e., when the topic of the testing data is absent from the training data. We then show that if the post features are supplemented or replaced with features derived from user profiles and past behaviours, the out-of-distribution prediction is greatly improved, with the F1 score increasing from 0.117 to 0.705. Our experimental results suggest that a significant component of reposting behaviour for previously unseen topics can be predicted from user profiles and past behaviours, and is largely content-agnostic.
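The difference between the two evaluation setups described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the record fields (`user`, `topic`, `reposted`), the helper names and the toy records are all hypothetical.

```python
import random

def random_split(records, test_fraction=0.25, seed=0):
    """In-distribution split: records are shuffled, so a topic can land in both sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def topic_held_out_split(records, held_out_topics):
    """Out-of-distribution split: every record on a held-out topic goes to the test set."""
    held_out = set(held_out_topics)
    train = [r for r in records if r["topic"] not in held_out]
    test = [r for r in records if r["topic"] in held_out]
    return train, test

def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = reposted, 0 = not reposted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy records (made up for illustration only).
records = [
    {"user": "u1", "topic": "sports", "reposted": 1},
    {"user": "u2", "topic": "sports", "reposted": 0},
    {"user": "u1", "topic": "health", "reposted": 1},
    {"user": "u3", "topic": "health", "reposted": 0},
]
train, test = topic_held_out_split(records, ["health"])
train_topics = {r["topic"] for r in train}
test_topics = {r["topic"] for r in test}
```

With the topic-held-out split, `train_topics` and `test_topics` never overlap, which is what makes the task out-of-distribution; `random_split` gives no such guarantee.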

preprint, 2024, arXiv

Unsupervised hard Negative Augmentation for contrastive learning

We present Unsupervised hard Negative Augmentation (UNA), a method that generates synthetic negative instances based on the term frequency-inverse document frequency (TF-IDF) retrieval model. UNA uses TF-IDF scores to ascertain the perceived importance of terms in a sentence and then produces negative samples by replacing terms according to those scores. Our experiments demonstrate that models trained with UNA improve overall performance on semantic textual similarity tasks. Additional performance gains are obtained when combining UNA with the paraphrasing augmentation. Further results show that our method is compatible with different backbone models. Ablation studies also support the choice of having a TF-IDF-driven control on negative augmentation.
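The general idea of TF-IDF-guided term replacement can be illustrated with a short sketch. The abstract does not specify how terms are selected or what replaces them, so the choices here are assumptions: the highest-scoring terms are replaced, replacements are drawn at random from the rest of the corpus vocabulary, and a smoothed IDF is used. The function names `tfidf_scores` and `make_negative` are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf_scores(sentence_tokens, corpus):
    """TF-IDF score of each term in one sentence, with a smoothed IDF (assumed variant)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))
    term_freq = Counter(sentence_tokens)
    return {
        term: (count / len(sentence_tokens))
        * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in term_freq.items()
    }

def make_negative(sentence_tokens, corpus, n_replace=1, seed=0):
    """Replace the n_replace highest-scoring terms with random out-of-sentence terms."""
    rng = random.Random(seed)
    scores = tfidf_scores(sentence_tokens, corpus)
    targets = set(sorted(scores, key=scores.get, reverse=True)[:n_replace])
    vocabulary = sorted({t for doc in corpus for t in doc} - set(sentence_tokens))
    if not vocabulary:
        return list(sentence_tokens)  # nothing to substitute with
    return [rng.choice(vocabulary) if tok in targets else tok for tok in sentence_tokens]

# Toy corpus (made up for illustration only).
corpus = [
    ["the", "cat", "sat", "on", "a", "mat"],
    ["the", "dog", "sat", "on", "a", "rug"],
    ["the", "bird", "flew", "over", "a", "house"],
]
sentence = corpus[0]
negative = make_negative(sentence, corpus)
```

In this toy corpus "cat" is among the rarest terms of the sentence, so it is the one replaced, while frequent words such as "the" are left untouched.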

preprint, 2020, arXiv

Providing early indication of regional anomalies in COVID19 case counts in England using search engine queries

COVID19 was first reported in England at the end of January 2020, and by mid-June over 150,000 cases were reported. We assume that, similarly to influenza-like illnesses, people who suffer from COVID19 may query for their symptoms prior to accessing the medical system (or in lieu of it). Therefore, we analyzed searches to Bing from users in England, identifying cases where unexpected rises in relevant symptom searches occurred in specific areas of the country. Our analysis shows that searches for "fever" and "cough" were the most correlated with future case counts, with searches preceding case counts by 16-17 days. Unexpected rises in search patterns were predictive of future case counts multiplying by 2.5 or more within a week, reaching an Area Under Curve (AUC) of 0.64. Similar rises in mortality were predicted with an AUC of approximately 0.61 at a lead time of 3 weeks. Thus, our metric provided Public Health England with an indication that could be used to plan the response to COVID19 and could possibly be utilized to detect regional anomalies of other pathogens.
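A lead time between search activity and case counts is typically estimated by correlating one series against shifted copies of the other. The sketch below shows that idea on synthetic data; it is an assumption about the general approach, not the paper's analysis, and the function names and the series values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def best_lead(searches, cases, max_lag):
    """Return (lag, r) maximising the correlation of searches[t] with cases[t + lag]."""
    best = (0, pearson(searches, cases))
    for lag in range(1, max_lag + 1):
        r = pearson(searches[:-lag], cases[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic daily series: cases repeat the search curve three days later.
searches = [0, 0, 1, 3, 6, 3, 1, 0, 0, 0, 0, 0]
cases = [0, 0, 0, 0, 0, 1, 3, 6, 3, 1, 0, 0]
lag, r = best_lead(searches, cases, max_lag=5)
```

On this synthetic input the best lag is the three-day shift built into the data. Real series would need detrending and a significance check before any lead time is reported.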

preprint, 2016, arXiv

Flu Detector: Estimating influenza-like illness rates from online user-generated content

We provide a brief technical description of an online platform for disease monitoring called Flu Detector (fludetector.cs.ucl.ac.uk). Flu Detector, in its current version (v.0.5), uses either Twitter or Google search data in conjunction with statistical Natural Language Processing models to estimate the rate of influenza-like illness in the population of England. Its back-end is a live service that collects online data, utilises modern technologies for large-scale text processing, and finally applies statistical inference models that are trained offline. The front-end visualises the various disease rate estimates. Notably, the models based on Google data achieve a high level of accuracy with respect to the most recent four flu seasons in England (2012/13 to 2015/16). This highlights Flu Detector's potential to become a complementary source to traditional domestic flu surveillance schemes.

preprint, 2013, arXiv

Analysing Mood Patterns in the United Kingdom through Twitter Content

Social Media offer a vast amount of geo-located and time-stamped textual content directly generated by people. This information can be analysed to obtain insights about the general state of a large population of users and to address scientific questions from a diversity of disciplines. In this work, we estimate temporal patterns of mood variation through the use of emotionally loaded words contained in Twitter messages, possibly reflecting underlying circadian and seasonal rhythms in the mood of the users. We present a method for computing mood scores from text using affective word taxonomies, and apply it to millions of tweets collected in the United Kingdom during the seasons of summer and winter. Our analysis results in the detection of strong and statistically significant circadian patterns for all the investigated mood types. Seasonal variation does not seem to register any important divergence in the signals, but a periodic oscillation within a 24-hour period is identified for each mood type. The main common characteristic of all emotions is their mid-morning peak; however, their mood score patterns differ in the evenings.
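A lexicon-based mood score aggregated by hour of day can be sketched as below. The abstract does not give the scoring formula, so this assumes the simplest variant (the fraction of tokens that appear in a mood's word list, averaged per hour). The lexicon entries, the function names and the sample tweets are hypothetical.

```python
from collections import defaultdict

# Hypothetical affective word lists; a real taxonomy would be far larger.
MOOD_LEXICON = {
    "joy": {"happy", "glad", "delighted"},
    "sadness": {"sad", "unhappy", "miserable"},
}

def mood_score(tokens, mood_words):
    """Fraction of tokens that belong to the given mood's word list."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok in mood_words) / len(tokens)

def hourly_mood(tweets, mood_words):
    """Mean mood score per hour of day; tweets is an iterable of (hour, text) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, text in tweets:
        totals[hour] += mood_score(text.lower().split(), mood_words)
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}

# Made-up sample input for illustration only.
tweets = [
    (9, "so happy and glad today"),
    (9, "nothing special here"),
    (21, "feeling sad tonight"),
]
joy_by_hour = hourly_mood(tweets, MOOD_LEXICON["joy"])
sadness_by_hour = hourly_mood(tweets, MOOD_LEXICON["sadness"])
```

Plotting such hourly means over a 24-hour axis is one way to look for the circadian patterns the abstract describes.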

preprint, 2012, arXiv

Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

A vast amount of textual web streams is influenced by events or phenomena emerging in the real world. The social web forms an excellent modern paradigm, where unstructured user generated content is published on a regular basis and on most occasions is freely distributed. The present Ph.D. Thesis deals with the problem of inferring information - or patterns in general - about events emerging in real life based on the contents of this textual stream. We show that it is possible to extract valuable information about social phenomena, such as an epidemic or even rainfall rates, by automatic analysis of the content published in Social Media, and in particular Twitter, using Statistical Machine Learning methods. An important intermediate task regards the formation and identification of features which characterise a target event; we select and use those textual features in several linear, non-linear and hybrid inference approaches achieving a significantly good performance in terms of the applied loss function. By further examining this rich data set, we also propose methods for extracting various types of mood signals revealing how affective norms - at least within the social web's population - evolve during the day and how significant events emerging in the real world are influencing them. Lastly, we present some preliminary findings showing several spatiotemporal characteristics of this textual information as well as the potential of using it to tackle tasks such as the prediction of voting intentions.

preprint, 2012, arXiv

On voting intentions inference from Twitter content: a case study on UK 2010 General Election

This report presents preliminary work on inferring voting intention from Social Media such as Twitter. Our case study is the UK 2010 General Election and we focus on predicting the percentages of voting intention polls (conducted by YouGov) for the three major political parties - Conservatives, Labour and Liberal Democrats - during a 5-month period before the election date (May 6, 2010). We form three methodologies for extracting positive or negative sentiment from tweets, which build on each other, and then propose two supervised models for turning sentiment into voting intention percentages. Interestingly, when the content of tweets is enriched by attaching synonymous words, a significant improvement on inference performance is achieved reaching a mean absolute error of 4.34% +/- 2.13%; in that case, the predictions are also shown to be statistically significant. The presented methods should be considered as work-in-progress; limitations and suggestions for future work appear in the final section of this script.
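For readers unfamiliar with the error measure quoted in this abstract, the sketch below computes a mean absolute error together with a spread. Reading the "+/-" figure as the standard deviation of the absolute errors is an assumption, as is the use of the population form; the function name and the percentages are made up and are not poll data.

```python
import statistics

def mae_with_std(predicted, observed):
    """Mean absolute error and (population) standard deviation of the absolute errors."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    return statistics.mean(errors), statistics.pstdev(errors)

# Made-up percentages for three parties on one day (illustration only).
predicted = [36.0, 30.0, 22.0]
observed = [38.0, 29.0, 25.0]
mae, spread = mae_with_std(predicted, observed)
```

Here the absolute errors are 2, 1 and 3 percentage points, giving a mean absolute error of 2.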
