Source author record

Mark Levene

Mark Levene appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

physics.soc-ph; Computation and Language; Social and Information Networks; Applications; Digital Libraries; Information Retrieval; Artificial Intelligence; Data Structures and Algorithms; Machine Learning

Catalog footprint

What is connected

14 works

9 topics

4 close collaborators


preprint, 2022, arXiv

A skew logistic distribution with application to modelling COVID-19 epidemic waves

A novel yet simple extension of the symmetric logistic distribution is proposed by introducing a skewness parameter. It is shown how the three parameters of the ensuing skew logistic distribution may be estimated using maximum likelihood. The skew logistic distribution is then extended to the skew bi-logistic distribution to allow the modelling of multiple waves in epidemic time series data. The proposed skew-logistic model is validated on COVID-19 data from the UK, and is evaluated for goodness-of-fit against the logistic and normal distributions using the recently formulated empirical survival Jensen-Shannon divergence (${\cal E}SJS$) and the Kolmogorov-Smirnov two-sample test statistic ($KS2$). We employ 95\% bootstrap confidence intervals to assess the improvement in goodness-of-fit of the skew logistic distribution over the other distributions. The obtained confidence intervals for the ${\cal E}SJS$ are narrower than those for the $KS2$ on using this data set, implying that the ${\cal E}SJS$ is more powerful than the $KS2$.
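The $KS2$ statistic named in this abstract is the largest vertical gap between the empirical distribution functions of two samples. A minimal sketch of that computation follows; the function name `ks_two_sample_statistic` is illustrative and is not taken from the paper, and the ${\cal E}SJS$ divergence is not reproduced here because its definition is not given in this record.

```python
import bisect


def ks_two_sample_statistic(sample_a, sample_b):
    """Return max |F_a(x) - F_b(x)| over all observed points x."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    # The gap between two step functions can only change at observed values.
    points = sorted(set(a_sorted) | set(b_sorted))
    largest_gap = 0.0
    for x in points:
        # Empirical CDF: fraction of the sample that is <= x.
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.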

preprint, 2022, arXiv

Monitoring Covid-19 on social media using a novel triage and diagnosis approach

Objective: This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and public health practitioners with additional information on the symptoms, severity and prevalence of the disease rather than to provide an actionable decision at the individual level. Materials and Methods: The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients' posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. Results: We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones.
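The abstract reports both macro- and micro-averaged F1 scores. As a rough illustration of how the two averages differ for a single-label, multi-class task such as three-way triage, the sketch below computes both from lists of true and predicted labels. The function name `macro_micro_f1` and the example labels are hypothetical; the paper's own evaluation code is not shown in this record.

```python
from collections import Counter


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_count, fp_count, fn_count):
        denom = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so every post counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

Macro-averaging weights rare classes as heavily as common ones, which is one reason the two figures can diverge on imbalanced data.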

preprint, 2021, arXiv

A stochastic differential equation approach to the analysis of the UK 2017 and 2019 general election polls

Human dynamics and sociophysics build on statistical models that can shed light on and add to our understanding of social phenomena. We propose a generative model based on a stochastic differential equation that enables us to model the opinion polls leading up to the UK 2017 and 2019 general elections, and to make predictions relating to the actual result of the elections. After a brief analysis of the time series of the poll results, we provide empirical evidence that the gamma distribution, which is often used in financial modelling, fits the marginal distribution of this time series. We demonstrate that the proposed poll-based forecasting model may improve upon predictions based solely on polls. The method uses the Euler-Maruyama method to simulate the time series, measuring the prediction error with the mean absolute error and the root mean square error, and as such could be used as part of a toolkit for forecasting elections.
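The Euler-Maruyama scheme and the two error measures mentioned in this abstract can be sketched as follows. The abstract does not state the stochastic differential equation itself, so the sketch assumes, purely for illustration, a mean-reverting square-root process dX = theta (mu - X) dt + sigma sqrt(X) dW, chosen because its stationary distribution is a gamma distribution. The parameter names and the function `euler_maruyama_sqrt_process` are hypothetical.

```python
import math
import random


def euler_maruyama_sqrt_process(x0, theta, mu, sigma, dt, n_steps, seed=0):
    """Simulate one path of the assumed SDE with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over a step of length dt has variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        drift = theta * (mu - x) * dt
        diffusion = sigma * math.sqrt(max(x, 0.0)) * dw
        # Truncate at zero so the square root stays defined on the next step.
        x = max(x + drift + diffusion, 0.0)
        path.append(x)
    return path


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def root_mean_square_error(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

In a forecasting setting one would typically simulate many paths with different seeds and compare a summary of their end points with the observed result using the two error functions.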

preprint, 2020, arXiv

A general centrality framework based on node navigability

Centrality metrics are a popular tool in Network Science to identify important nodes within a graph. We introduce the Potential Gain as a centrality measure that unifies many walk-based centrality metrics in graphs and captures the notion of node navigability, interpreted as the property of being reachable from anywhere else (in the graph) through short walks. Two instances of the Potential Gain (called the Geometric and the Exponential Potential Gain) are presented and we describe scalable algorithms for computing them on large graphs. We also give a proof of the relationship between the new measures and established centralities. The geometric potential gain of a node can thus be characterized as the product of its Degree centrality and its Katz centrality score. At the same time, the exponential potential gain of a node is proved to be the product of its Degree centrality and its Communicability index. These formal results connect potential gain to both the "popularity" and "similarity" properties that are captured by the above centralities.
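The stated characterisation of the geometric potential gain as degree times Katz score can be illustrated on a small undirected graph. The sketch below assumes one common Katz convention (a truncated sum over walk lengths k >= 1 of alpha^k times the number of walks of length k ending at the node) and no normalisation; the paper's exact convention and its scalable algorithms are not given in this record, and the function names are hypothetical.

```python
def katz_scores(adjacency, alpha, n_terms=50):
    """Truncated Katz score per node for an undirected adjacency-list graph."""
    nodes = list(adjacency)
    walks = {v: 1.0 for v in nodes}  # one walk of length 0 ends at each node
    scores = {v: 0.0 for v in nodes}
    for k in range(1, n_terms + 1):
        # A walk of length k ending at v extends a walk of length k-1
        # that ends at one of v's neighbours.
        walks = {v: sum(walks[u] for u in adjacency[v]) for v in nodes}
        for v in nodes:
            scores[v] += (alpha ** k) * walks[v]
    return scores


def degree_times_katz(adjacency, alpha, n_terms=50):
    """Product of degree and Katz score for every node."""
    katz = katz_scores(adjacency, alpha, n_terms)
    return {v: len(adjacency[v]) * katz[v] for v in adjacency}
```

The series only converges when alpha is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, so alpha should be chosen small relative to the maximum degree.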

preprint, 2020, arXiv

A Problem in Human Dynamics: Modelling the Population Density of a Social Space

Human dynamics and sociophysics suggest statistical models that may explain and provide us with better insight into social phenomena. Here we tackle the problem of determining the distribution of the population density of a social space over time by modelling the dynamics of agents entering and exiting the space as a birth-death process. We show that, for a simple agent-based model in which the probabilities of entering and exiting the space depend on the number of agents currently present in the space, the population density of the space follows a gamma distribution. We also provide empirical evidence supporting the validity of the model by applying it to a data set of occupancy traces of a common space in an office building.

preprint, 2020, arXiv

Potential gain as a centrality measure

Navigability is a distinctive feature of graphs associated with artificial or natural systems whose primary goal is the transportation of information or goods. We say that a graph $\mathcal{G}$ is navigable when an agent is able to efficiently reach any target node in $\mathcal{G}$ by means of local routing decisions. In a social network navigability translates to the ability to reach an individual through personal contacts. Graph navigability is well-studied, but a fundamental question is still open: why are some individuals more likely than others to be reached via short, friend-of-a-friend, communication chains? In this article we answer the question above by proposing a novel centrality metric called the potential gain, which, in an informal sense, quantifies the ease with which a target node can be reached. We define two variants of the potential gain, called the geometric and the exponential potential gain, and present fast algorithms to compute them. The geometric and the exponential potential gain are the first instances of a novel class of composite centrality metrics, i.e., centrality metrics which combine the popularity of a node in $\mathcal{G}$ with its similarity to all other nodes. As shown in previous studies, popularity and similarity are two main criteria which regulate the way humans seek information in large networks such as Wikipedia. We give a formal proof that the potential gain of a node is always equivalent to the product of its degree centrality (which captures popularity) and its Katz centrality (which captures similarity).

preprint, 2020, arXiv

Supervised Phrase-boundary Embeddings

We propose a new word embedding model, called SPhrase, that incorporates supervised phrase information. Our method modifies traditional word embeddings by ensuring that all target words in a phrase have exactly the same context. We demonstrate that including this information within a context window produces superior embeddings for both intrinsic evaluation tasks and downstream extrinsic tasks.

preprint, 2016, arXiv

A multiplicative process for generating the rank-order distribution of UK election results

Human dynamics and sociophysics suggest statistical models that may explain and provide us with a better understanding of social phenomena. Here we propose a generative multiplicative decrease model that gives rise to a rank-order distribution and allows us to analyse the results of the last three UK parliamentary elections. We provide empirical evidence that the additive Weibull distribution, which can be generated from our model, is a close fit to the electoral data, offering a novel interpretation of the recent election results.

preprint, 2016, arXiv

A stochastic evolutionary model generating a mixture of exponential distributions

Recent interest in human dynamics has stimulated the investigation of the stochastic processes that explain human behaviour in various contexts, such as mobile phone networks and social media. In this paper, we extend the stochastic urn-based model proposed in \cite{FENN15} so that it can generate mixture models, in particular, a mixture of exponential distributions. The model is designed to capture the dynamics of survival analysis, traditionally employed in clinical trials, reliability analysis in engineering, and more recently in the analysis of large data sets recording human dynamics. The mixture modelling approach, which is relatively simple and well understood, is very effective in capturing heterogeneity in data. We provide empirical evidence for the validity of the model, using a data set of popular search engine queries collected over a period of 114 months. We show that the survival function of these queries is closely matched by the exponential mixture solution for our model.
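For reference, the survival function of a mixture of exponential distributions is the weighted sum S(t) = sum_i w_i exp(-lambda_i t), with non-negative weights that sum to one. The sketch below evaluates it; the weights and rates in any call are illustrative values, not parameters fitted in the paper, and the urn-based generative model itself is not reproduced.

```python
import math


def exponential_mixture_survival(t, weights, rates):
    """Probability that an item survives beyond time t under the mixture."""
    if len(weights) != len(rates):
        raise ValueError("weights and rates must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Each component contributes its own exponential survival curve,
    # scaled by the share of the population it represents.
    return sum(w * math.exp(-rate * t) for w, rate in zip(weights, rates))
```

With two components, a large rate captures items that drop out quickly while a small rate captures the long-lived tail, which is how a mixture can represent heterogeneity in the data.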

preprint, 2016, arXiv

The Anatomy of a Search and Mining System for Digital Archives

Samtla (Search And Mining Tools with Linguistic Analysis) is a digital humanities system designed in collaboration with historians and linguists to assist them with their research work in quantifying the content of any textual corpus through approximate phrase search and document comparison. The retrieval engine uses a character-based n-gram language model rather than the conventional word-based one so as to achieve great flexibility in language-agnostic query processing. The index is implemented as a space-optimised character-based suffix tree with an accompanying database of document content and metadata. A number of text mining tools are integrated into the system to allow researchers to discover textual patterns, perform comparative analysis, and find out what is currently popular in the research community. Herein we describe the system architecture, user interface, models and algorithms, and data storage of the Samtla system. We also present several case studies of its usage in practice together with an evaluation of the system's ranking performance through crowdsourcing.
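To illustrate why character-based n-grams are language agnostic, the sketch below splits a string into overlapping character n-grams without any tokenisation step and counts them. This is only a toy illustration of the unit of analysis; it is not Samtla's language model or suffix-tree index, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # No word segmentation is needed, so the same code works for any script.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_ngram_counts(text, n):
    """Count how often each character n-gram occurs in text."""
    return Counter(char_ngrams(text, n))
```

Because near-identical spellings share most of their character n-grams, counts of this kind can support approximate matching where exact word matching would fail.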

preprint, 2015, arXiv

A stochastic evolutionary model for capturing human dynamics

The recent interest in human dynamics has led researchers to investigate the stochastic processes that explain human behaviour in various contexts. Here we propose a generative model to capture the dynamics of survival analysis, traditionally employed in clinical trials and reliability analysis in engineering. We derive a general solution for the model in the form of a product, and then a continuous approximation to the solution via the renewal equation describing age-structured population dynamics. This enables us to model a wide range of survival distributions, according to the choice of the mortality distribution. We provide empirical evidence for the validity of the model from a longitudinal data set of popular search engine queries over 114 months, showing that the survival function of these queries is closely matched by the solution for our model with power-law mortality.

preprint, 2014, arXiv

A stochastic evolutionary model for survival dynamics

The recent interest in human dynamics has led researchers to investigate the stochastic processes that explain human behaviour in different contexts. Here we propose a generative model to capture the essential dynamics of survival analysis, traditionally employed in clinical trials and reliability analysis in engineering. In our model, the only implicit assumption made is that the longer an actor has been in the system, the more likely it is to have failed. We derive a power-law distribution for the process and provide preliminary empirical evidence for the validity of the model from two well-known survival analysis data sets.

preprint, 2013, arXiv

A bibliometric index based on the complete list of cited publications

We propose a new index, the $j$-index, which is defined for an author as the sum of the square roots of the numbers of citations to each of the author's publications. The idea behind the $j$-index is to remedy a drawback of the $h$-index, namely that the $h$-index does not take into account the full citation record of a researcher. The square root function is motivated by our desire to avoid the possible bias that may occur with a simple sum when an author has several very highly cited papers. We compare the $j$-index to the $h$-index, the $g$-index and the total citation count for three subject areas using several association measures. Our results indicate that the association between the $j$-index and the other indices varies according to the subject area. One explanation of this variation may be the proportion of citations to publications of the researcher that are in the $h$-core. The $j$-index is {\em not} an $h$-index variant, and as such is intended to complement rather than necessarily replace the $h$-index and other bibliometric indicators, thus providing a more complete picture of a researcher's achievements.
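The definition of the $j$-index given in this abstract translates directly into a few lines of code. The sketch below also includes the standard $h$-index for comparison; the citation counts used in any call are made-up examples, not data from the paper.

```python
import math


def j_index(citations):
    """Sum of the square roots of the citation counts of all publications."""
    return sum(math.sqrt(c) for c in citations)


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```

For an author with citation counts 4, 9 and 16, the $j$-index is 2 + 3 + 4 = 9, while the $h$-index is 3; every publication contributes to the former, with diminishing returns for very highly cited ones.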

preprint, 2011, arXiv

A Discrete Evolutionary Model for Chess Players' Ratings

The Elo system for rating chess players, also used in other games and sports, was adopted by the World Chess Federation over four decades ago. Although not without controversy, it is accepted as generally reliable and provides a method for assessing players' strengths and ranking them in official tournaments. It is generally accepted that the distribution of players' rating data is approximately normal but, to date, no stochastic model of how the distribution might have arisen has been proposed. We propose such an evolutionary stochastic model, which models the arrival of players into the rating pool, the games they play against each other, and how the results of these games affect their ratings. Using a continuous approximation to the discrete model, we derive the distribution for players' ratings at time $t$ as a normal distribution, where the variance increases in time as a logarithmic function of $t$. We validate the model using published rating data from 2007 to 2010, showing that the parameters obtained from the data can be recovered through simulations of the stochastic model. The distribution of players' ratings is only approximately normal and has been shown to have a small negative skew. We show how to modify our evolutionary stochastic model to take this skewness into account, and we validate the modified model using the published official rating data.
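For readers unfamiliar with the Elo system that the model builds on, the standard rating update after a single game can be sketched as follows. The K-factor of 20 is an illustrative default, and this is the textbook Elo rule rather than the paper's evolutionary stochastic model, whose details are not given in this record.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return the new ratings of A and B after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw and 0.0 for a loss.
    """
    change = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: what A gains, B loses.
    return rating_a + change, rating_b - change
```

For two players rated 1500, a win moves the winner to 1510 and the loser to 1490 with K = 20; repeated updates of this kind across a pool of players are what shape the rating distribution the abstract describes.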
