Researcher profile

Mark Levene

Mark Levene contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
8 works
0 followers
8 topics
4 close collaborators

Published work

8 published items

Preprint · 2022 · arXiv

A skew logistic distribution with application to modelling COVID-19 epidemic waves

A novel yet simple extension of the symmetric logistic distribution is proposed by introducing a skewness parameter. It is shown how the three parameters of the ensuing skew logistic distribution may be estimated using maximum likelihood. The skew logistic distribution is then extended to the skew bi-logistic distribution to allow the modelling of multiple waves in epidemic time series data. The proposed skew logistic model is validated on COVID-19 data from the UK, and is evaluated for goodness-of-fit against the logistic and normal distributions using the recently formulated empirical survival Jensen-Shannon divergence ($\mathcal{E}SJS$) and the Kolmogorov-Smirnov two-sample test statistic ($KS2$). We employ 95% bootstrap confidence intervals to assess the improvement in goodness-of-fit of the skew logistic distribution over the other distributions. The confidence intervals obtained for the $\mathcal{E}SJS$ are narrower than those for the $KS2$ on this data set, implying that the $\mathcal{E}SJS$ is more powerful than the $KS2$.
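The goodness-of-fit comparison above relies on the two-sample Kolmogorov-Smirnov statistic and on bootstrap confidence intervals. The sketch below shows only those two generic ingredients in Python, under our own assumptions: `ks2_statistic` and `bootstrap_ci` are hypothetical helper names, the percentile bootstrap is one common interval construction (the paper's exact bootstrap scheme is not stated in the abstract), and neither the skew logistic fit nor the empirical survival Jensen-Shannon divergence is reproduced here.

```python
import random
from bisect import bisect_right


def ks2_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: sup over t of |F_x(t) - F_y(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at sample points,
    # so the supremum is reached at one of the pooled observations.
    for t in xs + ys:
        fx = bisect_right(xs, t) / n  # fraction of xs that are <= t
        fy = bisect_right(ys, t) / m  # fraction of ys that are <= t
        d = max(d, abs(fx - fy))
    return d


def bootstrap_ci(xs, ys, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(xs, ys); each sample is resampled with replacement."""
    rng = random.Random(seed)
    values = sorted(
        stat(rng.choices(xs, k=len(xs)), rng.choices(ys, k=len(ys)))
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = values[round(tail * (n_boot - 1))]
    hi = values[round((1.0 - tail) * (n_boot - 1))]
    return lo, hi
```

A narrower interval for one statistic than for another, on the same resamples, is the kind of comparison the abstract draws between the two measures.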

Preprint · 2022 · arXiv

Monitoring Covid-19 on social media using a novel triage and diagnosis approach

Objective: This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and public health practitioners with additional information on the symptoms, severity and prevalence of the disease rather than to provide an actionable decision at the individual level.

Materials and Methods: The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients' posts using conditional random fields. Next, an unsupervised rule-based algorithm establishes relations between these concepts. The extracted concepts and relations are then used to construct two different vector representations of each post. Each representation is used separately to train support vector machine models that triage patients into three categories and diagnose them for COVID-19.

Results: We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. We also highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones.
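The results above are reported as macro- and micro-averaged F1 scores. As a generic illustration, not the authors' evaluation code, the hypothetical helper `f1_scores` below computes both averages for single-label multiclass predictions such as the three triage categories. Macro-averaging gives every class equal weight; micro-averaging pools the counts across classes first and, for single-label data, equals accuracy.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average per-class F1, so rare classes count as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool TP/FP/FN over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

The gap between the two averages indicates how unevenly a model performs across classes of different sizes.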

Preprint · 2021 · arXiv

A stochastic differential equation approach to the analysis of the UK 2017 and 2019 general election polls

Human dynamics and sociophysics build on statistical models that can shed light on and add to our understanding of social phenomena. We propose a generative model based on a stochastic differential equation that enables us to model the opinion polls leading up to the UK 2017 and 2019 general elections, and to make predictions relating to the actual result of the elections. After a brief analysis of the time series of the poll results, we provide empirical evidence that the gamma distribution, which is often used in financial modelling, fits the marginal distribution of this time series. We demonstrate that the proposed poll-based forecasting model may improve upon predictions based solely on polls. The time series is simulated with the Euler-Maruyama method, and the prediction error is measured with the mean absolute error and the root mean square error; as such, the method could be used as part of a toolkit for forecasting elections.
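The abstract describes simulating a stochastic differential equation with the Euler-Maruyama method and scoring predictions by mean absolute error and root mean square error, but it does not state the equation itself. The sketch below assumes, purely for illustration, the Cox-Ingersoll-Ross (CIR) equation $dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t$, chosen because its stationary distribution is a gamma distribution (shape $2\kappa\theta/\sigma^2$, scale $\sigma^2/(2\kappa)$). The paper's actual SDE may differ, and the helper names and parameters are hypothetical.

```python
import math
import random


def euler_maruyama_cir(x0, kappa, theta, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of the (assumed) CIR SDE dX = kappa*(theta - X) dt + sigma*sqrt(X) dW."""
    rng = random.Random(seed)
    path = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        # max(x, 0.0) keeps the square root defined if a step overshoots below zero.
        x = x + kappa * (theta - x) * dt + sigma * math.sqrt(max(x, 0.0)) * dw
        path.append(x)
    return path


def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def rmse(pred, actual):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))
```

Each Euler-Maruyama step adds the drift times `dt` plus the diffusion coefficient times a Gaussian increment with variance `dt`; with `sigma=0` it reduces to the ordinary Euler method.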

Preprint · 2020 · arXiv

A general centrality framework based on node navigability

Centrality metrics are a popular tool in Network Science for identifying important nodes within a graph. We introduce the Potential Gain as a centrality measure that unifies many walk-based centrality metrics in graphs and captures the notion of node navigability, interpreted as the property of being reachable from anywhere else (in the graph) through short walks. Two instances of the Potential Gain (called the Geometric and the Exponential Potential Gain) are presented, and we describe scalable algorithms for computing them on large graphs. We also give a proof of the relationship between the new measures and established centralities: the geometric potential gain of a node can be characterized as the product of its degree centrality and its Katz centrality score, while the exponential potential gain of a node is proved to be the product of its degree centrality and its communicability index. These formal results connect potential gain to both the "popularity" and "similarity" properties that are captured by the above centralities.
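The abstract characterizes the geometric potential gain of a node as the product of its degree centrality and its Katz centrality. A minimal sketch of that product, assuming an undirected graph stored as an adjacency dictionary, raw (unnormalized) degree, and the Katz variant $x = \sum_{k \ge 0} \alpha^k A^k \mathbf{1}$. Katz conventions differ in whether the $k = 0$ term is included and how scores are normalized, and the paper's exact choice is not given here; the function names are hypothetical and the paper's scalable algorithms are not reproduced.

```python
def katz_centrality(adj, alpha, tol=1e-12, max_iter=10_000):
    """Katz scores x = sum_{k>=0} alpha^k A^k 1, found via the fixed point x = 1 + alpha * A x.

    adj maps each node to a list of its neighbours (undirected graph assumed).
    The iteration converges only if alpha < 1 / (largest eigenvalue of A).
    """
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        new = {v: 1.0 + alpha * sum(x[u] for u in adj[v]) for v in adj}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    raise RuntimeError("Katz iteration did not converge; try a smaller alpha")


def geometric_potential_gain(adj, alpha):
    """Hypothetical helper: degree of each node times its Katz score."""
    katz = katz_centrality(adj, alpha)
    return {v: len(adj[v]) * katz[v] for v in adj}


# Usage on a three-node path a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
pg = geometric_potential_gain(adj, alpha=0.1)
```

On this path the middle node scores highest (about 2.449 against about 1.122 for each end node), since it has both the larger degree and the larger Katz score.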

Preprint · 2020 · arXiv

A Problem in Human Dynamics: Modelling the Population Density of a Social Space

Human dynamics and sociophysics suggest statistical models that may explain and provide us with better insight into social phenomena. Here we tackle the problem of determining the distribution of the population density of a social space over time by modelling the dynamics of agents entering and exiting the space as a birth-death process. We show that, for a simple agent-based model in which the probabilities of entering and exiting the space depend on the number of agents currently present in the space, the population density of the space follows a gamma distribution. We also provide empirical evidence supporting the validity of the model by applying it to a data set of occupancy traces of a common space in an office building.
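As a generic illustration of a birth-death process whose entry and exit probabilities depend on the current occupancy, the sketch below computes the stationary distribution of a discrete birth-death chain from detailed balance, $\pi_{n+1}/\pi_n = b(n)/d(n+1)$. The functions `birth` and `death`, the cap `n_max`, and the example rates are hypothetical. This does not reproduce the paper's gamma-distribution result, since the abstract does not give its exact entry and exit rules.

```python
def birth_death_stationary(birth, death, n_max):
    """Stationary distribution of a birth-death chain on {0, ..., n_max}.

    birth(n): probability that an agent enters when n agents are present.
    death(n): probability that an agent leaves when n agents are present
              (must be > 0 for n >= 1; valid chains need birth(n) + death(n) <= 1).
    Detailed balance gives pi[n + 1] / pi[n] = birth(n) / death(n + 1).
    """
    weights = [1.0]
    for n in range(n_max):
        weights.append(weights[-1] * birth(n) / death(n + 1))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical rates: constant entry probability, exit probability proportional to occupancy.
pi = birth_death_stationary(lambda n: 0.3, lambda n: 0.05 * n, n_max=14)
```

With a constant entry probability and an exit probability proportional to occupancy, the ratio $\pi_{n+1}/\pi_n = 6/(n+1)$ gives a Poisson distribution with mean 6, truncated at 14 agents; other occupancy-dependent rules lead to other shapes.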

Preprint · 2020 · arXiv

Potential gain as a centrality measure

Navigability is a distinctive feature of graphs associated with artificial or natural systems whose primary goal is the transportation of information or goods. We say that a graph $\mathcal{G}$ is navigable when an agent is able to efficiently reach any target node in $\mathcal{G}$ by means of local routing decisions. In a social network, navigability translates to the ability to reach an individual through personal contacts. Graph navigability is well studied, but a fundamental question is still open: why are some individuals more likely than others to be reached via short, friend-of-a-friend, communication chains? In this article we answer this question by proposing a novel centrality metric called the potential gain, which, in an informal sense, quantifies the ease with which a target node can be reached. We define two variants of the potential gain, called the geometric and the exponential potential gain, and present fast algorithms to compute them. The geometric and the exponential potential gain are the first instances of a novel class of composite centrality metrics, i.e., centrality metrics which combine the popularity of a node in $\mathcal{G}$ with its similarity to all other nodes. As shown in previous studies, popularity and similarity are two main criteria which regulate the way humans seek information in large networks such as Wikipedia. We give a formal proof that the geometric potential gain of a node is always equivalent to the product of its degree centrality (which captures popularity) and its Katz centrality (which captures similarity).

Preprint · 2011 · arXiv

A Discrete Evolutionary Model for Chess Players' Ratings

The Elo system for rating chess players, also used in other games and sports, was adopted by the World Chess Federation over four decades ago. Although not without controversy, it is accepted as generally reliable and provides a method for assessing players' strengths and ranking them in official tournaments. It is generally accepted that the distribution of players' rating data is approximately normal but, to date, no stochastic model of how the distribution might have arisen has been proposed. We propose such an evolutionary stochastic model, which models the arrival of players into the rating pool, the games they play against each other, and how the results of these games affect their ratings. Using a continuous approximation to the discrete model, we derive the distribution for players' ratings at time $t$ as a normal distribution, where the variance increases in time as a logarithmic function of $t$. We validate the model using published rating data from 2007 to 2010, showing that the parameters obtained from the data can be recovered through simulations of the stochastic model. The distribution of players' ratings is only approximately normal and has been shown to have a small negative skew. We show how to modify our evolutionary stochastic model to take this skewness into account, and we validate the modified model using the published official rating data.
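For readers unfamiliar with the Elo system that this model starts from, the sketch below shows the widely used logistic form of the expected score, $E_A = 1/(1 + 10^{(R_B - R_A)/400})$, and the rating update $R'_A = R_A + K(S_A - E_A)$. It is background only: the paper's evolutionary model of player arrival and game outcomes is not reproduced, the K-factor of 20 is an illustrative assumption, and the function names are hypothetical.

```python
def elo_expected(r_a, r_b):
    """Expected score of player A against player B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a, r_b, score_a, k=20):
    """Return A's new rating after one game (score_a: 1 for a win, 0.5 for a draw, 0 for a loss).

    k is the K-factor; 20 is an illustrative value, not taken from the paper.
    """
    return r_a + k * (score_a - elo_expected(r_a, r_b))
```

For example, two players rated 1500 each have an expected score of 0.5, so with K = 20 a win moves the winner to 1510.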