Source author record

Phillip Lord

Phillip Lord appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

12works
8topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2015arXiv

A Highly Literate Approach to Ontology Building

Ontologies present an attractive technology for describing bio-medicine, because they can be shared, and have rich computational properties. However, they lack the rich expressivity of English and fit poorly with the current scientific "publish or perish" model. While, there have been attempts to combine free text and ontologies, most of these perform \textit{post-hoc} annotation of text. In this paper, we introduce our new environment which borrows from literate programming, to allow an author to co-develop both text and ontological description. We are currently using this environment to document the Karyotype Ontology which allows rich descriptions of the chromosomal complement in humans. We explore some of the advantages and difficulties of this form of ontology development.

preprint2015arXiv

How, What and Why to test an ontology

Ontology development relates to software development in that they both involve the production of formal computational knowledge. It is possible, therefore, that some of the techniques used in software engineering could also be used for ontologies; for example, in software engineering testing is a well-established process, and part of many different methodologies. The application of testing to ontologies, therefore, seems attractive. The Karyotype Ontology is developed using the novel Tawny-OWL library. This provides a fully programmatic environment for ontology development, which includes a complete test harness. In this paper, we describe how we have used this harness to build an extensive series of tests as well as used a commodity continuous integration system to link testing deeply into our development process; this environment, is applicable to any OWL ontology whether written using Tawny-OWL or not. Moreover, we present a novel analysis of our tests, introducing a new classification of what our different tests are. For each class of test, we describe why we use these tests, also by comparison to software tests. We believe that this systematic comparison between ontology and software development will help us move to a more agile form of ontology development.

preprint2015arXiv

Scaffolding the Mitochondrial Disease Ontology from extant knowledge sources

Bio-medical ontologies can contain a large number of concepts. Often many of these concepts are very similar to each other, and similar or identical to concepts found in other bio-medical databases. This presents both a challenge and opportunity: maintaining many similar concepts is tedious and fastidious work, which could be substantially reduced if the data could be derived from pre-existing knowledge sources. In this paper, we describe how we have achieved this for an ontology of the mitochondria using our novel ontology development environment, the Tawny-OWL library.

preprint2013arXiv

A pattern-driven approach to biomedical ontology engineering

Developing ontologies can be expensive, time-consuming, as well as difficult to develop and maintain. This is especially true for more expressive and/or larger ontologies. Some ontologies are, however, relatively repetitive, reusing design patterns; building these with both generic and bespoke patterns should reduce duplication and increase regularity which in turn should impact on the cost of development. Here we report on the usage of patterns applied to two biomedical ontologies: firstly a novel ontology for karyotypes which has been built ground-up using a pattern based approach; and, secondly, our initial refactoring of the SIO ontology to make explicit use of patterns at development time. To enable this, we use the Tawny-OWL library which enables full-programmatic development of ontologies. We show how this approach can generate large numbers of classes from much simpler data structures which is highly beneficial within biomedical ontology engineering.

preprint2013arXiv

An evolutionary approach to Function

Background: Understanding the distinction between function and role is vexing and difficult. While it appears to be useful, in practice this distinction is hard to apply, particularly within biology. Results: I take an evolutionary approach, considering a series of examples, to develop and generate definitions for these concepts. I test them in practice against the Ontology for Biomedical Investigations (OBI). Finally, I give an axiomatisation and discuss methods for applying these definitions in practice. Conclusions: The definitions in this paper are applicable, formalizing current practice. As such, they make a significant contribution to the use of these concepts within biomedical ontologies.

preprint2013arXiv

Can inferred provenance and its visualisation be used to detect erroneous annotation? A case study using UniProtKB

A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which often contains the richest source of knowledge. Many databases reuse existing knowledge, during the curation process annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, potentially impossible, for a reader to identify where an annotation originated from. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB), at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially levels of sentence reuse within UniProtKB were analysed, showing that reuse is heavily prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over 100, 000 sentences. Over 8000 sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggest that approximately 30% are erroneous, whilst 35% appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in the determination of the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors website.

preprint2013arXiv

The Karyotype Ontology: a computational representation for human cytogenetic patterns

The karyotype ontology describes the human chromosome complement as determined cytogenetically, and is designed as an initial step toward the goal of replacing the current system which is based on semantically meaningful strings. This ontology uses a novel, semi-programmatic methodology based around the tawny library to construct many classes rapidly. Here, we describe our use case, methodology and the event-based approach that we use to represent karyotypes. The ontology is available at http://www.purl.org/ontolink/karyotype/. The clojure code is available at http://code.google.com/p/karyotype-clj/.

preprint2013arXiv

The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL

The Tawny-OWL library provides a fully-programmatic environment for ontology building; it enables the use of a rich set of tools for ontology development, by recasting development as a form of programming. It is built in Clojure - a modern Lisp dialect, and is backed by the OWL API. Used simply, it has a similar syntax to OWL Manchester syntax, but it provides arbitrary extensibility and abstraction. It builds on existing facilities for Clojure, which provides a rich and modern programming tool chain, for versioning, distributed development, build, testing and continuous integration. In this paper, we describe the library, this environment and the its potential implications for the ontology development process.

preprint2013arXiv

Twenty-Five Shades of Greycite: Semantics for referencing and preservation

Semantic publishing can enable richer documents with clearer, computationally interpretable properties. For this vision to become reality, however, authors must benefit from this process, so that they are incentivised to add these semantics. Moreover, the publication process that generates final content must allow and enable this semantic content. Here we focus on author-led or "grey" literature, which uses a convenient and simple publication pipeline. We describe how we have used metadata in articles to enable richer referencing of these articles and how we have customised the addition of these semantics to articles. Finally, we describe how we use the same semantics to aid in digital preservation and non-repudiability of research articles.

preprint2012arXiv

An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually, however manual curation is costly, time consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we look at addressing within this article. Specifically we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use UniProt Knowledge Base (UniProtKB) as a case study to demonstrate this approach since it allows us to compare annotation change, both over time and between automated and manually curated annotations. Results: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. Availability: Source code is available at the authors website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation. Contact: phillip.lord@newcastle.ac.uk

preprint2012arXiv

Three Steps to Heaven: Semantic Publishing in a Real World Workflow

Semantic publishing offers the promise of computable papers, enriched visualisation and a realisation of the linked data ideal. In reality, however, the publication process contrives to prevent richer semantics while culminating in a `lumpen' PDF. In this paper, we discuss a web-first approach to publication, and describe a three-tiered approach which integrates with the existing authoring tooling. Critically, although it adds limited semantics, it does provide value to all the participants in the process: the author, the reader and the machine.

preprint2011arXiv

Lost in translation: data integration tools meet the Semantic Web (experiences from the Ondex project)

More information is now being published in machine processable form on the web and, as de-facto distributed knowledge bases are materializing, partly encouraged by the vision of the Semantic Web, the focus is shifting from the publication of this information to its consumption. Platforms for data integration, visualization and analysis that are based on a graph representation of information appear first candidates to be consumers of web-based information that is readily expressible as graphs. The question is whether the adoption of these platforms to information available on the Semantic Web requires some adaptation of their data structures and semantics. Ondex is a network-based data integration, analysis and visualization platform which has been developed in a Life Sciences context. A number of features, including semantic annotation via ontologies and an attention to provenance and evidence, make this an ideal candidate to consume Semantic Web information, as well as a prototype for the application of network analysis tools in this context. By analyzing the Ondex data structure and its usage, we have found a set of discrepancies and errors arising from the semantic mismatch between a procedural approach to network analysis and the implications of a web-based representation of information. We report in the paper on the simple methodology that we have adopted to conduct such analysis, and on issues that we have found which may be relevant for a range of similar platforms