Researcher profile

Jeremy Levesley

Jeremy Levesley contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
10 topics
4 close collaborators

Published work

6 published items

Preprint | 2022 | arXiv

An Informational Space Based Semantic Analysis for Scientific Texts

A major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous, and both a deeper understanding of semantics and the creation of human-to-machine interaction have required effort in creating schemes for the act of communication and in building common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts. Computational methods for extracting semantic features are used to analyse the relations between the texts of messages and 'representations of situations' for a newly created large collection of scientific texts, the Leicester Scientific Corpus. The representation of science-specific meaning is standardised by replacing the situation representations with vectors of attributes, rather than psychological properties: namely, the list of scientific subject categories that the text belongs to. First, this paper introduces a 'Meaning Space' in which the informational representation of meaning is extracted from the occurrence of a word in texts across the scientific categories, i.e., the meaning of a word is represented by a vector of Relative Information Gain about the subject categories. Then, the meaning space is statistically analysed for the Leicester Scientific Dictionary-Core, and we investigate the 'Principal Components of the Meaning' to describe the adequate dimensions of the meaning. The research in this paper lays the foundation for the geometric representation of the meaning of texts.
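The idea of representing a word by a vector of Relative Information Gain values can be illustrated with a minimal sketch. It assumes the common definition RIG = (H(C) - H(C|W)) / H(C), where C is the binary event "text belongs to the category" and W is the binary event "text contains the word"; the paper's exact definition may differ. The function names and the toy corpus are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def relative_information_gain(docs, word, category):
    """RIG about `category` given presence of `word` (assumed definition).

    docs is a list of (set_of_words, set_of_categories) pairs.
    """
    n = len(docs)
    in_cat = [category in cats for _, cats in docs]
    has_word = [word in words for words, _ in docs]
    h_cat = binary_entropy(sum(in_cat) / n)
    if h_cat == 0.0:
        return 0.0  # the category carries no uncertainty to reduce
    h_cond = 0.0
    for flag in (True, False):
        subset = [c for c, w in zip(in_cat, has_word) if w == flag]
        if subset:
            h_cond += len(subset) / n * binary_entropy(sum(subset) / len(subset))
    return (h_cat - h_cond) / h_cat


def meaning_vector(docs, word, categories):
    """One RIG value per subject category."""
    return [relative_information_gain(docs, word, c) for c in categories]


# Hypothetical toy corpus: each text is a set of words plus its categories.
toy_docs = [
    ({"kernel", "the"}, {"Mathematics"}),
    ({"kernel", "the"}, {"Mathematics"}),
    ({"cell", "the"}, {"Biology"}),
    ({"cell", "the"}, {"Biology"}),
]
```

In this toy corpus, a word that appears only in one category receives a RIG of 1 for that category, while a word that appears in every text receives 0 everywhere.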

Preprint | 2022 | arXiv

Convergence of sparse grid Gaussian convolution approximation for multi-dimensional periodic function

We consider the problem of approximating $[0,1]^{d}$-periodic functions by convolution with a scaled Gaussian kernel. We start by establishing convergence rates for functions from periodic Sobolev spaces, and we show that the saturation rate is $O(h^{2})$, where $h$ is the scale of the Gaussian kernel. Taken from a discrete point of view, this result can be interpreted as the accuracy that can be achieved on the uniform grid with spacing $h$. In the discrete setting, the curse of dimensionality would place severe restrictions on the computation of the approximation. For instance, a spacing of $2^{-n}$ would provide an approximation converging at a rate of $O(2^{-2n})$ but would require $(2^{n}+1)^{d}$ grid points. To overcome this, we introduce a sparse grid version of Gaussian convolution approximation, where substantially fewer grid points are required, and show that the sparse grid version delivers a saturation rate of $O(n^{d-1}2^{-2n})$. This rate is in line with what one would expect in the sparse grid setting (where the full grid error only deteriorates by a factor of order $n^{d-1}$); however, the analysis that leads to the result is novel in that it draws on results from the theory of special functions and key observations regarding the form of certain weighted geometric sums.
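The quantities quoted in this abstract can be tabulated with a short sketch. The grid count and the two rates are taken directly from the text (constants omitted). The last function is an illustration under an assumption: it takes the kernel to be a unit-mass Gaussian with standard deviation $h$, for which convolution multiplies $\sin(2\pi kx)$ by $e^{-2\pi^{2}k^{2}h^{2}}$, so the error behaves like $O(h^{2})$. The paper's kernel scaling may differ, and all function names are hypothetical.

```python
import math


def full_grid_points(n, d):
    """Number of points on the full grid with spacing 2**-n in d dimensions."""
    return (2 ** n + 1) ** d


def full_grid_rate(n):
    """Full grid rate from the abstract, O(2**(-2n)), without constants."""
    return 2.0 ** (-2 * n)


def sparse_grid_rate(n, d):
    """Sparse grid rate from the abstract, O(n**(d-1) * 2**(-2n)), without constants."""
    return n ** (d - 1) * 2.0 ** (-2 * n)


def smoothing_error(h, k=1):
    """Sup-norm error of convolving sin(2*pi*k*x) with a Gaussian of std h.

    Assumed kernel: exp(-x**2 / (2*h**2)) / (h*sqrt(2*pi)). The result is
    1 - exp(-2*pi**2*k**2*h**2); expm1 keeps precision for small h.
    """
    return -math.expm1(-2.0 * math.pi ** 2 * k ** 2 * h ** 2)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, full_grid_points(n, 3), full_grid_rate(n), sparse_grid_rate(n, 3))
    # Halving h should reduce the error by a factor close to 4.
    print(smoothing_error(0.01) / smoothing_error(0.005))
```

Halving $h$ reduces `smoothing_error` by a factor close to 4, which is the second-order saturation behaviour described above.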

Preprint | 2020 | arXiv

Convergence of Multilevel Stationary Gaussian Convolution

It is well known that polynomial reproduction is not possible when approximating with Gaussian kernels. Quasi-interpolation schemes have been developed which use a finite number of Gaussians at different scales, which then reproduce polynomials of low degree \cite{beatson}, and thus achieve polynomial orders of convergence. At the same time, interpolation with kernels of fixed width suffers from an explosion in condition number, and information from all data points influences the approximation at any one data point (no localisation). In \cite{HL1} the authors show that, for periodic convolution with the Gaussian kernel, a multilevel scheme can give orders of approximation faster than any polynomial. In this paper we present a new multilevel quasi-interpolation algorithm, the discrete version of the algorithm in \cite{HL1}, which mimics the continuous algorithm well, to single precision accuracy, and gives excellent convergence rates for band-limited periodic functions. We explain how the algorithm works and why we achieve the numerical results we do. The estimates developed have two parts, one involving the convergence of a low-degree polynomial truncation term and one involving the control of the remainder of the truncation as the algorithm proceeds.

Preprint | 2020 | arXiv

Personality Traits and Drug Consumption. A Story Told by Data

This is a preprint version of the first book in the series "Stories told by data". The book tells a story about the psychological traits associated with drug consumption. The book includes:
- A review of published works on the psychological profiles of drug users.
- Analysis of a new original database with information on 1885 respondents and usage of 18 drugs. (The database is available online.)
- An introductory description of the data mining and machine learning methods used for the analysis of this dataset.
- A demonstration that personality traits (five factor model, impulsivity, and sensation seeking), together with simple demographic data, make it possible to predict the risk of consumption of individual drugs with sensitivity and specificity above 70% for most drugs.
- An analysis of correlations in the use of different substances and a description of the groups of drugs with correlated use (correlation pleiades).
- Proof of significant differences in personality profiles for users of different drugs. This is explicitly proved for benzodiazepines, ecstasy, and heroin.
- Tables of personality profiles for users and non-users of 18 substances.
The book is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of machine learning, advanced data mining concepts or modern psychology of personality is assumed. For a more detailed introduction to statistical methods we recommend several undergraduate textbooks. Familiarity with basic statistics and some experience in the use of probabilities would be helpful, as well as some basic technical understanding of psychology.
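The two performance measures quoted in the list above, sensitivity and specificity, can be computed from binary labels as in the following minimal sketch. The function name and the example labels are hypothetical and are not taken from the book's database.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = user, 0 = non-user)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: share of actual users that the classifier flags as users.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual non-users that the classifier clears.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical labels for illustration only.
example_true = [1, 1, 1, 0, 0, 0, 0]
example_pred = [1, 1, 0, 0, 0, 0, 1]
```

Reporting both numbers matters because a classifier that labels everyone a user reaches 100% sensitivity with 0% specificity.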

Preprint | 2020 | arXiv

Principal Components of the Meaning

In this paper we argue that (lexical) meaning in science can be represented in a 13-dimensional Meaning Space. This space is constructed using principal component analysis (singular value decomposition) on the matrix of word-category relative information gains, where the categories are those used by the Web of Science, and the words are taken from a reduced word set from texts in the Web of Science. We show that this reduced word set plausibly represents all texts in the corpus, so that the principal component analysis has some objective meaning with respect to the corpus. We argue that 13 dimensions are adequate to describe the meaning of scientific texts, and hypothesise about the qualitative meaning of the principal components.
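Principal component analysis on a word-by-category matrix can be illustrated with a minimal sketch. The abstract mentions singular value decomposition; to stay within the standard library, this sketch swaps in power iteration on the covariance matrix, which recovers only the first principal component. The toy matrix and the function name are hypothetical, and nothing here reproduces the paper's criterion for choosing 13 dimensions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, eigenvalue, explained_fraction) of the leading component.

    rows: one list of column values per word (for example RIG per category).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    total = sum(cov[i][i] for i in range(d))
    return v, eigenvalue, (eigenvalue / total if total else 0.0)


# Hypothetical toy matrix: four "words" described by two "categories".
toy_matrix = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```

Because the toy rows lie on a single line, the leading component explains all of the variance. In a real word-category matrix, the explained fractions of successive components are what would inform how many dimensions to keep.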

Preprint | 2019 | arXiv

Automatic Short Answer Grading and Feedback Using Text Mining Methods

Automatic grading is not a new approach, but the need to adapt the latest technology to automatic grading has become very important. As technology has rapidly become more powerful at scoring exams and essays, especially from the 1990s onwards, partially or wholly automated grading systems using computational methods have evolved and have become a major area of research. In particular, the demand for scoring natural language responses has created a need for tools that can automatically grade these responses. In this paper, we focus on the automatic grading of short answer questions, such as are typical in the UK GCSE system, and on providing students with useful feedback on their answers. We present experimental results on a dataset from an introductory computer science class at the University of North Texas. We first apply standard data mining techniques to the corpus of student answers to measure the similarity between the student answers and the model answer, based on the number of common words. We then evaluate the relation between these similarities and the marks awarded by scorers. We then consider an approach that groups student answers into clusters. Each cluster would be awarded the same mark, and the same feedback would be given to each answer in a cluster. In this manner, we demonstrate that clusters indicate the groups of students who are awarded the same or similar scores. Words in each cluster are compared to show that clusters are constructed based on how many and which words of the model answer have been used. The main novelty in this paper is that we design a model to predict marks based on the similarities between the student answers and the model answer.
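A similarity based on the number of common words can be sketched as follows. The abstract does not specify tokenisation or normalisation, so the lower-casing, the regular expression, and the division by the number of distinct model-answer words are assumptions; the function names and example sentences are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def common_word_similarity(student_answer, model_answer):
    """Return (number of shared words, share of model-answer words covered)."""
    student_words = tokenize(student_answer)
    model_words = tokenize(model_answer)
    common = student_words & model_words
    coverage = len(common) / len(model_words) if model_words else 0.0
    return len(common), coverage


# Hypothetical example answers.
model = "A stack is last in first out"
student = "Stack uses last in, first out order"
```

With these sentences, five of the seven distinct model-answer words appear in the student answer. Scores of this kind could then be compared with the marks awarded by scorers, or used as features when grouping answers into clusters.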