Source author record

Uwe Aickelin

Uwe Aickelin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

211 works

28 topics

4 close collaborators

preprint 2022 arXiv

Fast Rate Generalization Error Bounds: Variations on a Theme

A recent line of works, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{\lambda/n})$, where $\lambda$ is some information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered to be "slow", compared to a "fast rate" of $O(1/n)$ in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and a fast rate ($O(1/n)$) result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the key conditions needed for the fast rate generalization error, which we call the $(\eta,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of $O(\lambda/n)$ for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.

preprint 2022 arXiv

Multi-objective Semi-supervised Clustering for Finding Predictive Clusters

This study concentrates on clustering problems and aims to find compact clusters that are informative regarding the outcome variable. The main goal is to partition data points so that observations in each cluster are similar and, at the same time, the outcome variable can be predicted using these clusters. We model this semi-supervised clustering problem as a multi-objective optimization problem, with the deviation of data points in clusters and the prediction error of the outcome variable as two objective functions to be minimized. For finding optimal clustering solutions, we employ a non-dominated sorting genetic algorithm II approach, and local regression is applied as the prediction method for the output variable. To compare the performance of the proposed model, we compute seven models using five real-world data sets. Furthermore, we investigate the impact of using local regression for predicting the outcome variable in all models, and examine the performance of the multi-objective models compared to single-objective models.
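The two-objective formulation above relies on Pareto dominance, the comparison at the heart of non-dominated sorting. The following is a minimal sketch of that comparison only, not the authors' NSGA-II implementation; the function names and the candidate objective values are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (cluster deviation, prediction error) pairs for four clusterings.
candidates = [(1.0, 0.9), (2.0, 0.4), (3.0, 0.5), (1.5, 0.6)]
front = pareto_front(candidates)
```

Here the third candidate is dropped because the second is better on both objectives; the remaining three are mutually incomparable trade-offs.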

preprint 2020 arXiv

A new interval-based aggregation approach based on bagging and Interval Agreement Approach (IAA) in ensemble learning

The main aim in ensemble learning is to aggregate the outputs of multiple individual classifiers, rather than rely on the output of a single classifier, for more accurate classification. Generating an ensemble classifier generally comprises three steps: selecting the base classifier, applying a sampling strategy to generate different individual classifiers, and aggregating the classifiers' outputs. This paper focuses on the aggregation step and presents a new interval-based aggregation model using the bagging resampling approach and the Interval Agreement Approach (IAA) in ensemble learning. IAA is an interesting and practical aggregation approach in decision making, which was introduced to combine decision makers' opinions when they present their opinions as intervals. In this paper, in addition to implementing a new aggregation approach in ensemble learning, we designed some experiments to encourage researchers to use interval modeling in ensemble learning because it preserves more uncertainty and this leads to more accurate classification. For this purpose, we compared the results of the proposed method to the majority vote, the most common and successful aggregation function in the literature, on 10 medical data sets to show the better performance of interval modeling and the proposed interval-based aggregation function in binary classification. The results confirm the good performance of our proposed approach.
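For orientation, the baseline that the abstract compares against (bagging with majority-vote aggregation) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the IAA-based aggregation is not implemented, and the one-dimensional threshold classifier, the toy data and all names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (feature value, binary label)


def fit_stump(data: List[Sample]) -> Callable[[float], int]:
    """Toy base classifier: threshold at the midpoint of the two class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    if not pos or not neg:  # the bootstrap sample contained a single class
        only = 1 if pos else 0
        return lambda x: only
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    upper = 1 if mean_pos > mean_neg else 0  # label predicted above the threshold
    return lambda x: upper if x > threshold else 1 - upper


def bagged_ensemble(data: List[Sample], n_models: int = 25, seed: int = 0):
    """Train each model on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def majority_vote(models, x: float) -> int:
    """Aggregate the individual predictions by taking the most common label."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
         (1.6, 1), (1.7, 1), (1.8, 1), (1.9, 1)]
models = bagged_ensemble(train)
```

An odd number of models avoids ties in the binary case. An interval-based aggregator would replace `majority_vote` while the resampling step stays the same.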

preprint 2020 arXiv

Information-theoretic analysis for transfer learning

Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different distributions (denoted as $\mu$ and $\mu'$, respectively). In this work, we give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms, following a line of work initiated by Russo and Zhou. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu\|\mu')$ plays an important role in characterizing the generalization error in the settings of domain adaptation. Specifically, we provide generalization error upper bounds for general transfer learning algorithms and extend the results to a specific empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the method to iterative, noisy gradient descent algorithms, and obtain upper bounds which can be easily calculated, only using parameters from the learning algorithms. A few illustrative examples are provided to demonstrate the usefulness of the results. In particular, our bound is tighter in specific classification problems than the bound derived using Rademacher complexity.
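As a reminder of the quantity involved, the KL divergence between two discrete distributions can be computed as in the sketch below. It assumes both distributions are given as probability vectors over the same finite support; the function name is hypothetical and the paper's bounds themselves are not reproduced here.

```python
import math
from typing import Sequence


def kl_divergence(mu: Sequence[float], mu_prime: Sequence[float]) -> float:
    """D(mu || mu') in nats for discrete distributions on a shared finite support."""
    total = 0.0
    for p, q in zip(mu, mu_prime):
        if p == 0.0:
            continue  # terms with p = 0 contribute nothing (0 * log 0 := 0)
        if q == 0.0:
            return math.inf  # mu puts mass where mu' has none
        total += p * math.log(p / q)
    return total


source = [0.5, 0.5]
target = [0.25, 0.75]
d_forward = kl_divergence(source, target)
d_backward = kl_divergence(target, source)
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so `d_forward` and `d_backward` differ.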

preprint 2016 arXiv

Adaptive Data Communication Interface: A User-Centric Visual Data Interpretation Framework

In this position paper, we present ideas about creating a next generation framework towards an adaptive interface for data communication and visualisation systems. Our objective is to develop a system that accepts large data sets as inputs and provides user-centric, meaningful visual information to assist owners to make sense of their data collection. The proposed framework comprises four stages: (i) the knowledge base compilation, where we search and collect existing state-of-the-art visualisation techniques per domain and user preferences; (ii) the development of the learning and inference system, where we apply artificial intelligence techniques to learn, predict and recommend new graphic interpretations; (iii) results evaluation; and (iv) reinforcement and adaptation, where valid outputs are stored in our knowledge base and the system is iteratively tuned to address new demands. These stages, as well as our overall vision, limitations and possible challenges are introduced in this article. We also discuss further extensions of this framework for other knowledge discovery tasks.

preprint 2016 arXiv

An ensemble of machine learning and anti-learning methods for predicting tumour patient survival rates

This paper primarily addresses a dataset relating to cellular, chemical and physical conditions of patients gathered at the time they are operated upon to remove colorectal tumours. This data provides a unique insight into the biochemical and immunological status of patients at the point of tumour removal along with information about tumour classification and post-operative survival. The relationship between severity of tumour, based on TNM staging, and survival is still unclear for patients with TNM stage 2 and 3 tumours. We ask whether it is possible to predict survival rate more accurately using a selection of machine learning techniques applied to subsets of data to gain a deeper understanding of the relationships between a patient's biochemical markers and survival. We use a range of feature selection and single classification techniques to predict the 5-year survival rate of TNM stage 2 and 3 patients, which initially produces less than ideal results. The performance of each model individually is then compared with subsets of the data where agreement is reached for multiple models. This novel method of selective ensembling demonstrates that significant improvements in model accuracy on an unseen test set can be achieved for patients where agreement between models is achieved. Finally we point at a possible method to identify whether a patient's prognosis can be accurately predicted or not.

preprint 2016 arXiv

Applying Interval Type-2 Fuzzy Rule Based Classifiers Through a Cluster-Based Class Representation

Fuzzy Rule-Based Classification Systems (FRBCSs) have the potential to provide so-called interpretable classifiers, i.e. classifiers which can be introspected, understood, validated and augmented by human experts by relying on fuzzy-set based rules. This paper builds on prior work for interval type-2 fuzzy set based FRBCs where the fuzzy sets and rules of the classifier are generated using an initial clustering stage. By introducing Subtractive Clustering in order to identify multiple cluster prototypes, the proposed approach has the potential to deliver improved classification performance while maintaining good interpretability, i.e. without resulting in an excessive number of rules. The paper provides a detailed overview of the proposed FRBC framework, followed by a series of exploratory experiments on both linearly and non-linearly separable datasets, comparing results to existing rule-based and SVM approaches. Overall, initial results indicate that the approach enables comparable classification performance to non rule-based classifiers such as SVM, while often achieving this with a very small number of rules.

preprint 2016 arXiv

Exploring Differences in Interpretation of Words Essential in Medical Expert-Patient Communication

In the context of cancer treatment and surgery, quality of life assessment is a crucial part of determining treatment success and viability. To assess it, patient-completed questionnaires which employ words to capture aspects of patients' well-being are the norm. As the results of these questionnaires are often used to assess patient progress and to determine future treatment options, it is important to establish that the words used are interpreted in the same way by both patients and medical professionals. In this paper, we capture and model patients' perceptions and associated uncertainty about the words used to describe the level of their physical function used in the highly common (in Sarcoma Services) Toronto Extremity Salvage Score (TESS) questionnaire. The paper provides detail about the interval-valued data capture as well as the subsequent modelling of the data using fuzzy sets. Based on an initial sample of participants, we use Jaccard similarity on the resulting word models to show that there may be considerable differences in the interpretation of commonly used questionnaire terms, thus presenting a very real risk of miscommunication between patients and medical professionals as well as within the group of medical professionals.
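A common way to compute Jaccard similarity between two fuzzy sets is the ratio of the summed pointwise minimum to the summed pointwise maximum of their membership grades. The sketch below assumes type-1 fuzzy sets sampled on a shared discretised domain; the paper's exact word models and similarity variant may differ, and the membership values shown are hypothetical.

```python
from typing import Sequence


def fuzzy_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Jaccard similarity of two fuzzy sets given as membership grades
    sampled at the same domain points: sum(min) / sum(max)."""
    if len(a) != len(b):
        raise ValueError("membership vectors must share the same domain")
    intersection = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    # Convention assumed here: two empty sets are treated as identical.
    return intersection / union if union > 0 else 1.0


# Hypothetical models of one questionnaire word as understood by two groups.
patients = [0.0, 0.5, 1.0, 0.5, 0.0]
clinicians = [0.0, 0.0, 0.5, 1.0, 0.5]
similarity = fuzzy_jaccard(patients, clinicians)
```

A value of 1 means the two groups' models coincide; values well below 1, as in this toy case, indicate that the word is interpreted differently.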

preprint 2016 arXiv

Identifying Candidate Risk Factors for Prescription Drug Side Effects using Causal Contrast Set Mining

Big longitudinal observational databases present the opportunity to extract new knowledge in a cost-effective manner. Unfortunately, the ability of these databases to be used for causal inference is limited due to the passive way in which the data are collected, resulting in various forms of bias. In this paper we investigate a method that can overcome these limitations and determine causal contrast set rules efficiently from big data. In particular, we present a new methodology for the purpose of identifying risk factors that increase a patient's likelihood of experiencing the known rare side effect of renal failure after ingesting aminosalicylates. The results show that the methodology was able to identify previously researched risk factors such as being prescribed diuretics and highlighted that patients with a higher than average risk of renal failure may be even more susceptible to experiencing it as a side effect after ingesting aminosalicylates.

preprint 2016 arXiv

Indebted households profiling: a knowledge discovery from database approach

A major challenge in consumer credit risk portfolio management is to classify households according to their risk profile. In order to build such risk profiles it is necessary to employ an approach that analyses data systematically in order to detect important relationships, interactions, dependencies and associations amongst the available continuous and categorical variables altogether and accurately generate profiles of the most interesting household segments according to their credit risk. The objective of this work is to employ a knowledge discovery from database process to identify groups of indebted households and describe their profiles using a database collected by the Consumer Credit Counselling Service (CCCS) in the UK. Employing a framework that allows the usage of both categorical and continuous data altogether to find hidden structures in unlabelled data, the ideal number of clusters was established, and these clusters were described in order to identify the households who exhibit a high propensity of excessive debt levels.

preprint 2016 arXiv

Juxtaposition of System Dynamics and Agent-based Simulation for a Case Study in Immunosenescence

Advances in healthcare and in the quality of life significantly increase human life expectancy. With the ageing of populations, new un-faced challenges are brought to science. The human body is naturally selected to be well-functioning until the age of reproduction to keep the species alive. However, as the lifespan extends, unseen problems due to the body's deterioration emerge. There are several age-related diseases with no appropriate treatment; therefore, the complex ageing phenomenon needs further understanding. Immunosenescence, the ageing of the immune system, is highly correlated to the negative effects of ageing, such as the increase of auto-inflammatory diseases and decrease in responsiveness to new diseases. Besides clinical and mathematical tools, we believe there is opportunity to further exploit simulation tools to understand immunosenescence. Compared to real-world experimentation, benefits include time and cost effectiveness, given the laborious and resource-intensive nature of the biological environment, and the possibility of conducting experiments without ethical restrictions. Contrasted with mathematical models, simulation modelling is more suitable for representing complex systems and emergence. In addition, there is the belief that simulation models are easier to communicate in interdisciplinary contexts. Our work investigates the usefulness of simulations to understand immunosenescence by applying two different simulation methods, agent-based and system dynamics simulation, to a case study of immune cell depletion with age.

preprint 2016 arXiv

Measuring Player's Behaviour Change over Time in Public Goods Game

An important issue in the public goods game is whether players' behaviour changes over time, and if so, how significant the change is. In this game players can be classified into different groups according to the level of their participation in the public good. This problem can be considered as a concept drift problem by asking how much the clusters of players change over a sequence of game rounds. In this study we present a method for measuring changes in clusters with the same items over discrete time points using external clustering validation indices and area under the curve. External clustering indices were originally used to measure the difference between the clusters suggested by clustering algorithms and ground truth labels for items provided by experts. Instead of comparing against such labels, we use these indices to compare the clusters of any two consecutive time points, or the first time point and the remaining time points, to measure the difference between clusters through time. In theory, any external clustering index can be used to measure changes for any traditional (non-temporal) clustering algorithm, because any time point alone carries no temporal information. For the public goods game, our results indicate that the players are changing over time but the change is smooth and relatively constant between any two time points.
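The idea of applying an external index to consecutive time points can be sketched with the Rand index, one well-known external clustering index. The abstract does not say which indices the authors used, so this choice, the function names and the toy labelings are assumptions; the area-under-the-curve step is not shown.

```python
from itertools import combinations
from typing import List, Sequence


def rand_index(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Fraction of item pairs on which two clusterings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def change_between_rounds(labelings: List[Sequence[int]]) -> List[float]:
    """Cluster change (1 - Rand index) between each pair of consecutive time points."""
    return [1.0 - rand_index(prev, cur) for prev, cur in zip(labelings, labelings[1:])]


# Hypothetical cluster labels for the same four players over three game rounds.
rounds = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
changes = change_between_rounds(rounds)
```

Comparing every round against the first one instead only requires pairing `labelings[0]` with each later labeling.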

preprint 2016 arXiv

Modelling Cyber-Security Experts' Decision Making Processes using Aggregation Operators

An important role carried out by cyber-security experts is the assessment of proposed computer systems, during their design stage. This task is fraught with difficulties and uncertainty, making the knowledge provided by human experts essential for successful assessment. Today, the increasing number of progressively complex systems has led to an urgent need to produce tools that support the expert-led process of system-security assessment. In this research, we use weighted averages (WAs) and ordered weighted averages (OWAs) with evolutionary algorithms (EAs) to create aggregation operators that model parts of the assessment process. We show how individual overall ratings for security components can be produced from ratings of their characteristics, and how these individual overall ratings can be aggregated to produce overall rankings of potential attacks on a system. As well as the identification of salient attacks and weak points in a prospective system, the proposed method also highlights which factors and security components contribute most to a component's difficulty and attack ranking respectively. A real world scenario is used in which experts were asked to rank a set of technical attacks, and to answer a series of questions about the security components that are the subject of the attacks. The work shows how finding good aggregation operators, and identifying important components and factors of a cyber-security problem can be automated. The resulting operators have the potential for use as decision aids for systems designers and cyber-security experts, increasing the amount of assessment that can be achieved with the limited resources available.
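The two aggregation operators named above differ in what the weights attach to: a weighted average weights each source, whereas an ordered weighted average weights each rank position after sorting. The sketch below illustrates both under stated assumptions: the ratings and weights are hypothetical, and the evolutionary search for weights described in the abstract is not implemented.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """WA: weight i belongs to source i (for example, a specific characteristic)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def ordered_weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """OWA: weight i belongs to the i-th largest value; weights should sum to 1."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical ratings of three characteristics of one security component.
ratings = [3.0, 9.0, 5.0]
weights = [0.5, 0.3, 0.2]
wa = weighted_average(ratings, weights)          # emphasises the first characteristic
owa = ordered_weighted_average(ratings, weights)  # emphasises the highest rating
```

With weights `[1, 0, 0]` the OWA reduces to the maximum, and with `[0, 0, 1]` to the minimum, which is why it can model attitudes ranging from "the worst characteristic decides" to "the best one decides".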

preprint 2016 arXiv

Modelling Office Energy Consumption: An Agent Based Approach

In this paper, we develop an agent-based model which integrates four important elements, i.e. organisational energy management policies/regulations, energy management technologies, electric appliances and equipment, and human behaviour, based on a case study, to simulate the energy consumption in office buildings. With the model, we test the effectiveness of different energy management strategies, and solve practical office energy consumption problems. This paper theoretically contributes to an integration of four elements involved in the complex organisational issue of office energy consumption, and practically contributes to an application of agent-based approach for office building energy consumption study.

preprint 2016 arXiv

Optimising Rule-Based Classification in Temporal Data

This study optimises manually derived rule-based expert system classification of objects according to changes in their properties over time. One of the key challenges that this study tries to address is how to classify objects that exhibit changes in their behaviour over time, for example how to classify companies' share price stability over a period of time or how to classify students' preferences for subjects while they are progressing through school. A specific case the paper considers is the strategy of players in public goods games (as common in economics) across multiple consecutive games. Initial classification starts from expert definitions specifying class allocation for players based on aggregated attributes of the temporal data. Based on these initial classifications, the optimisation process tries to find an improved classifier which produces the best possible compact classes of objects (players) for every time point in the temporal data. The compactness of the classes is measured by a cost function based on internal cluster indices like the Dunn Index, distance measures like Euclidean distance or statistically derived measures like standard deviation. The paper discusses the approach in the context of incorporating changing player strategies in the aforementioned public goods games, where common classification approaches so far do not consider such changes in behaviour resulting from learning or in-game experience. By using the proposed process for classifying temporal data and the actual players' contribution during the games, we aim to produce a more refined classification which in turn may inform the interpretation of public goods game data.
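One of the internal indices mentioned, the Dunn index, is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The sketch below uses this common point-wise definition with Euclidean distance; how the authors fold the index into their cost function is not stated in the abstract, and the example points are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, ...]


def dunn_index(clusters: List[List[Point]]) -> float:
    """Minimum inter-cluster distance divided by maximum cluster diameter.
    Larger values indicate compact, well-separated classes."""
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # If every cluster is a single point the diameter is zero; treat as ideal.
    return min_separation / max_diameter if max_diameter > 0 else math.inf


# Hypothetical player classes at one time point, as 2-D attribute vectors.
classes = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
score = dunn_index(classes)
```

Evaluating the index at each time point and summing or averaging the results is one possible way to obtain a single cost over the whole temporal data set.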

preprint 2016 arXiv

Refining adverse drug reaction signals by incorporating interaction variables identified using emergent pattern mining

Purpose: To develop a framework for identifying and incorporating candidate confounding interaction terms into a regularised Cox regression analysis to refine adverse drug reaction signals obtained via longitudinal observational data. Methods: We considered six drug families that are commonly associated with myocardial infarction in observational healthcare data, but where the causal relationship ground truth is known (adverse drug reaction or not). We applied emergent pattern mining to find itemsets of drugs and medical events that are associated with the development of myocardial infarction. These are the candidate confounding interaction terms. We then implemented a cohort study design using regularised Cox regression that incorporated and accounted for the candidate confounding interaction terms. Results: The methodology was able to account for signals generated due to confounding and a Cox regression with elastic net regularisation correctly ranked the drug families known to be true adverse drug reactions above those.

preprint 2016 arXiv

Self-Organising Maps in Computer Security

Some argue that biologically inspired algorithms are the future of solving difficult problems in computer science. Others strongly believe that the future lies in the exploration of mathematical foundations of problems at hand. The field of computer security tends to accept the latter view as a more appropriate approach due to its more workable validation and verification possibilities. The lack of rigorous scientific practices prevalent in biologically inspired security research does not aid in presenting bio-inspired security approaches as a viable way of dealing with complex security problems. This chapter introduces a biologically inspired algorithm, called the Self Organising Map (SOM), that was developed by Teuvo Kohonen in 1981. Since the algorithm's inception it has been scrutinised by the scientific community and analysed in more than 4000 research papers, many of which dealt with various computer security issues, from anomaly detection, analysis of executables all the way to wireless network monitoring. In this chapter a review of security related SOM research undertaken in the past is presented and analysed. The algorithm's biological analogies are detailed and the author's view on the future possibilities of this successful bio-inspired approach are given. The SOM algorithm's close relation to a number of vital functions of the human brain and the emergence of multi-core computer architectures are the two main reasons behind our assumption that the future of the SOM algorithm and its variations is promising, notably in the field of computer security.

preprint 2016 arXiv

Simulating user learning in authoritative technology adoption: An agent based model for council-led smart meter deployment planning in the UK

How do technology users effectively transit from having zero knowledge about a technology to making the best use of it after an authoritative technology adoption? This post-adoption user learning has received little research attention in technology management literature. In this paper we investigate user learning in authoritative technology adoption by developing an agent-based model using the case of council-led smart meter deployment in the UK City of Leeds. Energy consumers gain experience of using smart meters based on the learning curve in behavioural learning. With the agent-based model we carry out experiments to validate the model and test different energy interventions that local authorities can use to facilitate energy consumers' learning and maintain their continuous use of the technology. Our results show that the easier energy consumers become experienced, the more energy-efficient they are and the more energy saving they can achieve; encouraging energy consumers' contacts via various informational means can facilitate their learning; and developing and maintaining their positive attitude toward smart metering can enable them to use the technology continuously. Contributions and energy policy/intervention implications are discussed in this paper.

preprint 2016 arXiv

Supervised Adverse Drug Reaction Signalling Framework Imitating Bradford Hill's Causality Considerations

Big longitudinal observational medical data potentially hold a wealth of information and have been recognised as potential sources for gaining new drug safety knowledge. Unfortunately there are many complexities and underlying issues when analysing longitudinal observational data. Due to these complexities, existing methods for large-scale detection of negative side effects using observational data all tend to have issues distinguishing between association and causality. New methods that can better discriminate causal and non-causal relationships need to be developed to fully utilise the data. In this paper we propose using a set of causality considerations developed by the epidemiologist Bradford Hill as a basis for engineering features that enable the application of supervised learning for the problem of detecting negative side effects. The Bradford Hill considerations look at various perspectives of a drug and outcome relationship to determine whether it shows causal traits. We taught a classifier to find patterns within these perspectives and it learned to discriminate between association and causality. The novelty of this research is the combination of supervised learning and Bradford Hill's causality considerations to automate Bradford Hill's causality assessment. We evaluated the framework on a drug safety gold standard known as the observational medical outcomes partnership's nonspecified association reference set. The methodology obtained excellent discriminative ability, with areas under the curve ranging between 0.792 and 0.940 (existing method optimal: 0.73) and a mean average precision of 0.640 (existing method optimal: 0.141). The proposed features can be calculated efficiently and be readily updated, making the framework suitable for big observational data.

preprint 2016 arXiv

Supervised Anomaly Detection in Uncertain Pseudoperiodic Data Streams

Uncertain data streams have been widely generated in many Web applications. The uncertainty in data streams makes anomaly detection from sensor data streams far more challenging. In this paper, we present a novel framework that supports anomaly detection in uncertain data streams. The proposed framework adopts an efficient uncertainty pre-processing procedure to identify and eliminate uncertainties in data streams. Based on the corrected data streams, we develop effective period pattern recognition and feature extraction techniques to improve the computational efficiency. We use classification methods for anomaly detection in the corrected data stream. We also empirically show that the proposed approach shows a high accuracy of anomaly detection on a number of real datasets.

preprint 2015 arXiv

A Data Mining framework to model Consumer Indebtedness with Psychological Factors

Modelling Consumer Indebtedness has proven to be a problem of complex nature. In this work we utilise Data Mining techniques and methods to explore the multifaceted aspect of Consumer Indebtedness by examining the contribution of Psychological Factors, like Impulsivity, to the analysis of Consumer Debt. Our results confirm the beneficial impact of Psychological Factors in modelling Consumer Indebtedness and suggest a new approach to analysing Consumer Debt that would take into consideration more Psychological characteristics of consumers and adopt techniques and practices from Data Mining.

preprint 2015 arXiv

Incorporating Spontaneous Reporting System Data to Aid Causal Inference in Longitudinal Healthcare Data

Inferring causality using longitudinal observational databases is challenging due to the passive way the data are collected. The majority of associations found within longitudinal observational data are often non-causal and occur due to confounding. The focus of this paper is to investigate incorporating information from additional databases to complement the longitudinal observational database analysis. We investigate the detection of prescription drug side effects as this is an example of a causal relationship. In previous work a framework was proposed for detecting side effects only using longitudinal data. In this paper we combine a measure of association derived from mining a spontaneous reporting system database with a previously proposed analysis that extracts domain expertise features for causal analysis of a UK general practice longitudinal database. The results show that there is a significant improvement to the performance of detecting prescription drug side effects when the longitudinal observation data analysis is complemented by incorporating additional drug safety sources into the framework. The area under the receiver operating characteristic curve (AUC) for correctly classifying a side effect when other data were considered was 0.967, whereas without it the AUC was 0.923. However, the results of this paper may be biased by the evaluation and future work should overcome this by developing an unbiased reference set.

preprint 2015 arXiv

Personalising Mobile Advertising Based on Users Installed Apps

Mobile advertising is a billion pound industry that is rapidly expanding. The success of an advert is measured based on how users interact with it. In this paper we investigate whether the application of unsupervised learning and association rule mining could be used to enable personalised targeting of mobile adverts with the aim of increasing the interaction rate. Over May and June 2014 we recorded advert interactions such as tapping the advert or watching the whole advert video along with the set of apps a user has installed at the time of the interaction. Based on the apps that the users have installed we applied k-means clustering to profile the users into one of ten classes. Due to the large number of apps considered we implemented dimension reduction to reduce the app feature space by mapping the apps to their iTunes category and clustered users based on the percentage of their apps that correspond to each iTunes app category. The clustering was externally validated by investigating differences between the way the ten profiles interact with the various advert genres (lifestyle, finance and entertainment adverts). In addition, association rule mining was performed to find whether the time of day that the advert is served and the number of apps a user has installed make certain profiles more likely to interact with the advert genres. The results showed there were clear differences in the way the profiles interact with the different advert genres, and the results of this paper suggest that mobile advert targeting would improve the frequency with which users interact with an advert.

preprint2015arXiv

Refining Adverse Drug Reactions using Association Rule Mining for Electronic Healthcare Data

Side effects of prescribed medications are a common occurrence. Electronic healthcare databases present the opportunity to identify new side effects efficiently but currently the methods are limited due to confounding (i.e. when an association between two variables is identified due to them both being associated to a third variable). In this paper we propose a proof of concept method that learns common associations and uses this knowledge to automatically refine side effect signals (i.e. exposure-outcome associations) by removing instances of the exposure-outcome associations that are caused by confounding. This leaves the signal instances that are most likely to correspond to true side effect occurrences. We then calculate a novel measure termed the confounding-adjusted risk value, a more accurate absolute risk value of a patient experiencing the outcome within 60 days of the exposure. Tentative results suggest that the method works. For the four signals (i.e. exposure-outcome associations) investigated we are able to correctly filter the majority of exposure-outcome instances that were unlikely to correspond to true side effects. The method is likely to improve when tuning the association rule mining parameters for specific health outcomes. This paper shows that it may be possible to filter signals at a patient level based on association rules learned from considering patients' medical histories. However, additional work is required to develop a way to automate the tuning of the method's parameters.

preprint2014arXiv

A Fuzzy Directional Distance Measure

The measure of distance between two fuzzy sets is a fundamental tool within fuzzy set theory; however, distance measures currently within the literature use a crisp value to represent the distance between fuzzy sets. A real-valued distance measure is developed into a fuzzy distance measure which better reflects the uncertainty inherent in fuzzy sets, and a fuzzy directional distance measure is presented, which accounts for the direction of change between fuzzy sets. Because a full maximal assignment is computationally intractable, a multiplicative version is explored as an intermediate solution.

preprint2014arXiv

A Novel Semi-Supervised Algorithm for Rare Prescription Side Effect Discovery

Drugs are frequently prescribed to patients with the aim of improving each patient's medical state, but an unfortunate consequence of most prescription drugs is the occurrence of undesirable side effects. Side effects that occur in more than one in a thousand patients are likely to be signalled efficiently by current drug surveillance methods; however, these same methods may take decades before generating signals for rarer side effects, risking medical morbidity or mortality in patients prescribed the drug while the rare side effect is undiscovered. In this paper we propose a novel computational meta-analysis framework for signalling rare side effects that integrates existing methods, knowledge from the web, metric learning and semi-supervised clustering. The novel framework was able to signal many known rare and serious side effects for the selection of drugs investigated, such as tendon rupture when prescribed Ciprofloxacin or Levofloxacin, renal failure with Naproxen and depression associated with Rimonabant. Furthermore, for the majority of the drugs investigated it generated signals for rare side effects at a more stringent signalling threshold than existing methods, and it shows the potential to become a fundamental part of post-marketing surveillance to detect rare side effects.

preprint2014arXiv

An Approach for Assessing Clustering of Households by Electricity Usage

How a household varies its regular usage of electricity is useful information for organisations to allow accurate targeting of behaviour modification initiatives with the aim of improving the overall efficiency of the electricity network. The variability of regular activities in a household is one possible indication of that household's willingness to accept incentives to change its behaviour. An approach is presented for identifying a way of representing the variability of a household's behaviour and developing an efficient way of clustering the households, using these measures of variability, into a few usable groupings. To evaluate the effectiveness of the variability measures, a number of cluster validity indexes are explored with regard to how the indexes vary with the number of clusters, the number of attributes, and the quality of the attributes. The Cluster Dispersion Indicator (CDI) and the Davies-Bouldin Indicator (DBI) are selected for future work developing various indicators of household behaviour variability. The approach is tested using data from 180 UK households monitored for over a year at a sampling interval of 5 minutes. Data is taken from the evening peak electricity usage period of 4pm to 8pm.
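As an illustration of one of the validity indexes named in this abstract, the following is a minimal sketch of the standard Davies-Bouldin index, assuming that is what DBI refers to here. The function names and the toy data are hypothetical, and the sketch assumes at least two non-empty clusters with distinct centroids.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Lower values indicate more compact, better separated clusters.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance of each cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, cen) for p in c) / len(c)
        for c, cen in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data (hypothetical): two clusters of two points each.
toy = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
score = davies_bouldin(toy)
```

Recomputing the index for different numbers of clusters or attribute sets is one way to compare candidate groupings, since a lower score indicates tighter and better separated clusters.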

preprint2014arXiv

Analysing Fuzzy Sets Through Combining Measures of Similarity and Distance

Reasoning with fuzzy sets can be achieved through measures such as similarity and distance. However, these measures can often give misleading results when considered independently, for example giving the same value for two different pairs of fuzzy sets. This is particularly a problem where many fuzzy sets are generated from real data, and while two different measures may be used to automatically compare such fuzzy sets, it is difficult to interpret two different results. This is especially true where a large number of fuzzy sets are being compared as part of a reasoning system. This paper introduces a method for combining the results of multiple measures into a single measure for the purpose of analysing and comparing fuzzy sets. The combined measure alleviates ambiguous results and aids in the automatic comparison of fuzzy sets. The properties of the combined measure are given, and demonstrations are presented with discussions on the advantages over using a single measure.

preprint2014arXiv

Attributes for Causal Inference in Longitudinal Observational Databases

The pharmaceutical industry is plagued by the problem of side effects that can occur anytime a prescribed medication is ingested. There has been recent interest in using the vast quantities of medical data available in longitudinal observational databases to identify causal relationships between drugs and medical events. Unfortunately the majority of existing post-marketing surveillance algorithms measure how dependent or associated an event is on the presence of a drug rather than measuring causality. In this paper we investigate potential attributes that can be used in causal inference to identify side effects based on the Bradford-Hill causality criteria. Potential attributes are developed by considering five of the causality criteria, and feature selection is applied to identify the most suitable of these attributes for detecting side effects. We found that attributes based on the specificity criterion may improve side effect signalling algorithms but the experiment and dosage criteria attributes investigated in this paper did not offer sufficient additional information.

preprint2014arXiv

Augmented Neural Networks for Modelling Consumer Indebtness

Consumer debt has become an important problem in modern societies, generating a lot of research aimed at understanding the nature of consumer indebtedness, which so far has been modelled using statistical models. In this work we show that Computational Intelligence can offer a more holistic approach that is better suited to the complex relationships in an indebtedness dataset, which Linear Regression cannot uncover. In particular, as our results show, Neural Networks achieve the best performance in modelling consumer indebtedness, especially when they incorporate the significant and experimentally verified results of the Data Mining process in the model, exploiting the flexibility Neural Networks offer in designing their topology. This novel method forms an elaborate framework for modelling consumer indebtedness that can be extended to any other real world application.

preprint2014arXiv

Comparing Stochastic Differential Equations and Agent-Based Modelling and Simulation for Early-stage Cancer

There is great potential to be explored regarding the use of agent-based modelling and simulation as an alternative paradigm to investigate early-stage cancer interactions with the immune system. It does not suffer from some limitations of ordinary differential equation models, such as the lack of stochasticity, representation of individual behaviours rather than aggregates and individual memory. In this paper we investigate the potential contribution of agent-based modelling and simulation when contrasted with stochastic versions of ODE models using early-stage cancer examples. We seek answers to the following questions: (1) Does this new stochastic formulation produce similar results to the agent-based version? (2) Can these methods be used interchangeably? (3) Do agent-based model outcomes reveal any benefit when compared to the Gillespie results? To answer these research questions we investigate three well-established mathematical models describing interactions between tumour cells and immune elements. These case studies were re-conceptualised under an agent-based perspective and also converted to the Gillespie algorithm formulation. Our interest in this work, therefore, is to establish a methodological discussion regarding the usability of different simulation approaches, rather than provide further biological insights into the investigated case studies. Our results show that it is possible to obtain equivalent models that implement the same mechanisms; however, the incapacity of the Gillespie algorithm to retain individual memory of past events affects the similarity of some results. Furthermore, the emergent behaviour of ABMS produces extra patterns of behaviour in the system, which were not obtained by the Gillespie algorithm.
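For readers unfamiliar with the Gillespie algorithm mentioned in this abstract, the following is a minimal sketch for a simple birth-death process. The model, the rate values and the function name are assumptions chosen for illustration; they are not the tumour-immune models studied in the paper.

```python
import random


def gillespie_birth_death(n0, birth, death, t_end, rng):
    """Simulate a birth-death process with per-capita rates `birth` and `death`.

    Returns the event times and the population count after each event.
    """
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0:
        rate_birth = birth * n
        rate_death = death * n
        total = rate_birth + rate_death
        if total == 0:
            break  # no reaction can fire
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which event fires, with probability proportional to its rate.
        if rng.random() * total < rate_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


# Hypothetical parameters: 10 initial cells, slightly more births than deaths.
times, counts = gillespie_birth_death(10, 1.0, 0.9, 5.0, random.Random(0))
```

Note that the state here is a single aggregate count, so no individual history is stored, which is the limitation the abstract contrasts with agent-based models.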

preprint2014arXiv

Comparison of algorithms that detect drug side effects using electronic healthcare databases

Electronic healthcare databases are becoming more readily available and are thought to have excellent potential for generating adverse drug reaction (ADR) signals. The Health Improvement Network (THIN) database is an electronic healthcare database containing medical information on over 11 million patients that has excellent potential for detecting ADRs. In this paper we apply four existing electronic healthcare database signal detecting algorithms (MUTARA, HUNT, Temporal Pattern Discovery and modified ROR) on the THIN database for a selection of drugs from six chosen drug families. This is the first comparison of ADR signalling algorithms that includes MUTARA and HUNT, and it enabled us to set a benchmark for the adverse drug reaction signalling ability of the THIN database. The drugs were selectively chosen to enable a comparison with previous work and for variety. It was found that no algorithm was generally superior and the algorithms' natural thresholds act at variable stringencies. Furthermore, none of the algorithms perform well at detecting rare ADRs.

preprint2014arXiv

Comparison of Distance Metrics for Hierarchical Data in Medical Databases

Distance metrics are broadly used in different research areas and applications, such as bio-informatics, data mining and many other fields. However, some metrics, like pq-gram and Edit Distance, are used specifically for data with a hierarchical structure. Other metrics used for non-hierarchical data are the geometric and Hamming metrics. We have applied these metrics to The Health Improvement Network (THIN) database, which has some hierarchical data. The THIN data has to be converted into a tree-like structure for the first group of metrics. For the second group of metrics, the data are converted into a frequency table or matrix; then, for all metrics, all distances are found and normalised. Based on this particular data set, our research question is: which of these metrics is useful for THIN data? This paper compares the metrics, particularly the pq-gram metric, on finding the similarities of patients' data. It also investigates the similar patients who have the same close distances, as well as the metrics' suitability for clustering the whole patient population. Our results show that the two groups of metrics perform differently as they represent different structures of the data. Nevertheless, all the metrics could represent some similar data of patients as well as discriminate sufficiently well in clustering the patient population using the k-means clustering algorithm.

preprint2014arXiv

Data classification using the Dempster-Shafer method

In this paper, the Dempster-Shafer method is employed as the theoretical basis for creating data classification systems. Testing is carried out using three popular (multiple attribute) benchmark datasets that have two, three and four classes. In each case, a subset of the available data is used for training to establish thresholds, limits or likelihoods of class membership for each attribute, and hence create mass functions that establish probability of class membership for each attribute of the test data. Classification of each data item is achieved by combination of these probabilities via Dempster's Rule of Combination. Results for the first two datasets show extremely high classification accuracy that is competitive with other popular methods. The third dataset is non-numerical and difficult to classify, but good results can be achieved provided the system and mass functions are designed carefully and the right attributes are chosen for combination. In all cases the Dempster-Shafer method provides comparable performance to other more popular algorithms, but the overhead of generating accurate mass functions increases the complexity with the addition of new attributes. Overall, the results suggest that the D-S approach provides a suitable framework for the design of classification systems and that automating the mass function design and calculation would increase the viability of the algorithm for complex classification problems.
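The combination step named in this abstract, Dempster's Rule of Combination, can be sketched as follows. The two mass functions and the class labels `A` and `B` are hypothetical values chosen for illustration, not figures from the paper.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two focal sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to the empty set measures conflict between sources.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical evidence from two attributes over the classes {A, B}.
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B  # mass on AB expresses ignorance between the two classes
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}
fused = combine(m1, m2)
```

In this toy case the conflicting mass is 0.3, so the surviving products are divided by 0.7. With more attributes the rule would be applied repeatedly, one mass function at a time.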

preprint2014arXiv

Detect Adverse Drug Reactions for Drug Aspirin

Adverse drug reactions (ADRs) are a widely recognised public health concern. In this study we propose an original approach to detect ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Aspirin. Major side effects for the drug are detected and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on a computerized method, further investigation is needed.

preprint2014arXiv

Detecting adverse drug reactions for the drug Simvastatin

Adverse drug reactions (ADRs) are a widely recognised public health concern. In this study we propose an original approach to detect ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Simvastatin. Major side effects for the drug are detected and better performance is achieved compared to other computerized methods. Because the detected ADRs are currently based solely on computerized methods, further expert investigation is needed.

preprint2014arXiv

Ensemble Learning of Colorectal Cancer Survival Rates

In this paper, we describe a dataset relating to cellular and physical conditions of patients who are operated upon to remove colorectal tumours. This data provides a unique insight into immunological status at the point of tumour removal, tumour classification and post-operative survival. We build on existing research on clustering and machine learning facets of this data to demonstrate a role for an ensemble approach to highlighting patients with clearer prognosis parameters. Results for survival prediction using 3 different approaches are shown for a subset of the data which is most difficult to model. The performance of each model individually is compared with subsets of the data where some agreement is reached for multiple models. Significant improvements in model accuracy on an unseen test set can be achieved for patients where agreement between models is achieved.

preprint2014arXiv

Feature selection in detection of adverse drug reactions from the Health Improvement Network (THIN) database

Adverse drug reactions (ADRs) are a widely recognised public health concern and one of the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting ADRs. The main problem with this method is how to automatically extract the medical events or side effects from high-throughput medical events, which are collected from day-to-day clinical practice. In this study we propose a novel concept, the feature matrix, to detect ADRs. The feature matrix, which is extracted from big medical data in The Health Improvement Network (THIN) database, is created to characterize the medical events for the patients who take drugs. The feature matrix provides a foundation for handling the irregular and big medical data. Feature selection methods are then performed on the feature matrix to detect the significant features. Finally, the ADRs can be located based on the significant features. The experiments are carried out on three drugs: Atorvastatin, Alendronate, and Metoclopramide. Major side effects for each drug are detected and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on computerized methods, further investigation is needed.

preprint2014arXiv

Modelling Electrical Car Diffusion Based on Agents

Replacing traditional fossil fuel vehicles with innovative zero-emission vehicles for transport in cities is one of the major tactics to achieve the UK government's 2020 target of cutting emissions. We are developing an agent-based simulation model to study the possible impact of different governmental interventions on the diffusion of such vehicles. Options that could be studied with our what-if analysis include car parking charges, the price of electrical cars, energy awareness and word of mouth. In this paper we present a first case study related to the introduction of a new car park charging scheme at the University of Nottingham. We have developed an agent-based model to simulate the impact of different car parking rates and other incentives on the uptake of electrical cars. The goal of this case study is to demonstrate the usefulness of agent-based modelling and simulation for such investigations.

preprint2014arXiv

Signalling Paediatric Side Effects using an Ensemble of Simple Study Designs

Background: Children are frequently prescribed medication off-label, meaning there has not been sufficient testing of the medication to determine its safety or effectiveness. This safety knowledge is lacking mainly because of ethical restrictions that prevent children from being included in the majority of clinical trials. Objective: The objective of this paper is to investigate whether an ensemble of simple study designs can be implemented to signal acutely occurring side effects effectively within the paediatric population by using historical longitudinal data. The majority of pharmacovigilance techniques are unsupervised, but this research presents a supervised framework. Methods: Multiple measures of association are calculated for each drug and medical event pair and these are used as features that are fed into a classifier to determine the likelihood of the drug and medical event pair corresponding to an adverse drug reaction. The classifier is trained using known adverse drug reactions or known non-adverse drug reaction relationships. Results: The novel ensemble framework obtained a false positive rate of 0.149, a sensitivity of 0.547 and a specificity of 0.851 when implemented on a reference set of drug and medical event pairs. The novel framework consistently outperformed each individual simple study design. Conclusion: This research shows that it is possible to exploit the mechanism of causality and presents a framework for signalling adverse drug reactions effectively.

preprint2014arXiv

Tuning a Multiple Classifier System for Side Effect Discovery using Genetic Algorithms

In previous work, a novel supervised framework implementing a binary classifier was presented that obtained excellent results for side effect discovery. Interestingly, unique side effects were identified when different binary classifiers were used within the framework, prompting the investigation of applying a multiple classifier system. In this paper we investigate tuning a side effect multiple classifying system using genetic algorithms. The results of this research show that the novel framework implementing a multiple classifying system trained using genetic algorithms can obtain a higher partial area under the receiver operating characteristic curve than implementing a single classifier. Furthermore, the framework is able to detect side effects efficiently and obtains a low false positive rate.

preprint2014arXiv

Variability of Behaviour in Electricity Load Profile Clustering; Who Does Things at the Same Time Each Day?

UK electricity market changes provide opportunities to alter households' electricity usage patterns for the benefit of the overall electricity network. Work on clustering similar households has concentrated on daily load profiles and the variability in regular household behaviours has not been considered. Those households with most variability in regular activities may be the most receptive to incentives to change timing. Whether using the variability of regular behaviour allows the creation of more consistent groupings of households is investigated and compared with daily load profile clustering. 204 UK households are analysed to find repeating patterns (motifs). Variability in the time of the motif is used as the basis for clustering households. Different clustering algorithms are assessed by the consistency of the results. Findings show that variability of behaviour, using motifs, provides more consistent groupings of households across different clustering algorithms and allows for more efficient targeting of behaviour change interventions.

preprint2013arXiv

A Beginners Guide to Systems Simulation in Immunology

Some common systems modelling and simulation approaches for immune problems are Monte Carlo simulations, system dynamics, discrete-event simulation and agent-based simulation. These methods, however, are still not widely adopted in immunology research. In addition, to our knowledge, there is little research on the processes for the development of simulation models for the immune system. Hence, this work makes two contributions to knowledge. The first is to show the importance of systems simulation to help immunological research and to draw the attention of simulation developers to this research field. The second is the introduction of a quick guide containing the main steps for modelling and simulation in immunology, together with challenges that occur during model development. Further, this paper introduces an example of a simulation problem, where we test our guidelines.

preprint2013arXiv

A Comparison of Non-stationary, Type-2 and Dual Surface Fuzzy Control

Type-1 fuzzy logic has frequently been used in control systems. However this method is sometimes shown to be too restrictive and unable to adapt in the presence of uncertainty. In this paper we compare type-1 fuzzy control with several other fuzzy approaches under a range of uncertain conditions. Interval type-2 and non-stationary fuzzy controllers are compared, along with 'dual surface' type-2 control, named due to utilising both the lower and upper values produced from standard interval type-2 systems. We tune a type-1 controller, then derive the membership functions and footprints of uncertainty from the type-1 system and evaluate them using a simulated autonomous sailing problem with varying amounts of environmental uncertainty. We show that while these more sophisticated controllers can produce better performance than the type-1 controller, this is not guaranteed and that selection of Footprint of Uncertainty (FOU) size has a large effect on this relative performance.

preprint2013arXiv

A New Graphical Password Scheme Resistant to Shoulder-Surfing

Shoulder-surfing is a known risk where an attacker can capture a password by direct observation or by recording the authentication session. Due to the visual interface, this problem is exacerbated in graphical passwords. There have been some graphical schemes resistant or immune to shoulder-surfing, but they have significant usability drawbacks, usually in the time and effort to log in. In this paper, we propose and evaluate a new shoulder-surfing resistant scheme which has desirable usability for PDAs. Our inspiration comes from the drawing input method in DAS and the association mnemonics in Story for sequence retrieval. The new scheme requires users to draw a curve across their password images in order rather than click directly on them. The drawing input trick, along with complementary measures such as erasing the drawing trace, displaying degraded images, and starting and ending with randomly designated images, provides good resistance to shoulder-surfing. A preliminary user study showed that users were able to enter their passwords accurately and to remember them over time.

preprint2013arXiv

A Three-Dimensional Model of Residential Energy Consumer Archetypes for Local Energy Policy Design in the UK

This paper reviews major studies in three traditional lines of research in residential energy consumption in the UK, i.e. economic/infrastructure, behaviour, and load profiling. Based on the review the paper proposes a three-dimensional model for archetyping residential energy consumers in the UK by considering property energy efficiency levels, the greenness of household behaviour of using energy, and the duration of property daytime occupancy. With the proposed model, eight archetypes of residential energy consumers in the UK have been identified. They are: pioneer greens, follower greens, concerned greens, home stayers, unconscientious wasters, regular wasters, daytime wasters, and disengaged wasters. Using a case study, these archetypes of residential energy consumers demonstrate the robustness of the 3-D model in aiding local energy policy/intervention design in the UK.

preprint2013arXiv

Adaptive Alert Throttling for Intrusion Detection Systems

Each time that an intrusion detection system raises an alert it must make some attempt to communicate the information to an operator. This communication channel can easily become the target of a denial of service attack because, like all communication channels, it has a fixed capacity. If this channel can become overwhelmed with bogus data, an attacker can quickly achieve complete neutralisation of intrusion detection capability. Although these types of attack are very hard to stop completely, our aim is to present techniques that improve alert throughput and capacity to such an extent that the resources required to successfully mount the attack become prohibitive.

preprint2013arXiv

Against Spyware Using CAPTCHA in Graphical Password Scheme

Text-based password schemes have inherent security and usability problems, leading to the development of graphical password schemes. However, most of these alternative schemes are vulnerable to spyware attacks. We propose a new scheme using CAPTCHA (Completely Automated Public Turing tests to tell Computers and Humans Apart) that retains the advantages of graphical password schemes while simultaneously raising the cost to adversaries by orders of magnitude. Furthermore, some preliminary experiments are conducted and the results indicate that the usability should be improved in future work.

preprint2013arXiv

An audio CAPTCHA to distinguish humans from computers

CAPTCHAs are employed as a security measure to differentiate human users from bots. A new sound-based CAPTCHA is proposed in this paper, which exploits the gaps between human voice and synthetic voice rather than relying on human auditory perception. The user is required to read out a given sentence, which is selected randomly from a specified book. The generated audio file is analyzed automatically to judge whether the user is a human or not. In this paper, the design of the new CAPTCHA, the analysis of the audio files, and the choice of the audio frame window function are described in detail. In addition, experiments are conducted to fix the critical threshold and the coefficients of three indicators to ensure security. The proposed audio CAPTCHA proved accessible to users. The user study has shown that the human success rate reaches approximately 97% and the pass rate of attack software using Microsoft SDK 5.1 is only 4%. The experiments also indicated that it could be solved by most human users in less than 14 seconds and that the average time is only 7.8 seconds.

preprint2013arXiv

An investigation into the relationship between type-2 FOU size and environmental uncertainty in robotic control

It has been suggested that, when faced with large amounts of uncertainty in situations of automated control, type-2 fuzzy logic based controllers will outperform the simpler type-1 varieties due to the latter lacking the flexibility to adapt accordingly. This paper aims to investigate this problem in detail in order to analyse when a type-2 controller will improve upon type-1 performance. A robotic sailing boat is subjected to several experiments in which the uncertainty and difficulty of the sailing problem is increased in order to observe the effects on measured performance. Improved performance is observed but not in every case. The size of the FOU is shown to have a large effect on performance, with potentially severe performance penalties for incorrectly sized footprints.

preprint2013arXiv

Application of a clustering framework to UK domestic electricity data

This paper takes an approach to clustering domestic electricity load profiles that has been successfully used with data from Portugal and applies it to UK data. Clustering techniques are applied and it is found that the preferred technique in the Portuguese work (a two-stage process combining Self Organised Maps and k-means) is not appropriate for the UK data. The work shows that up to nine clusters of households can be identified, with the differences in usage profiles being visually striking. This demonstrates the appropriateness of breaking the electricity usage patterns down into more detail than the two load profiles currently published by the electricity industry. The paper details initial results using data collected in Milton Keynes around 1990. Further work is described and will concentrate on building accurate and meaningful clusters of similar electricity users in order to better direct demand side management initiatives to the most relevant target customers.

preprint2013arXiv

Artificial Immune Systems (INTROS 2)

The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self or non-self substances. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune system have been extracted and applied for solution to real world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here.

preprint2013arXiv

Biomarker Clustering of Colorectal Cancer Data to Complement Clinical Classification

In this paper, we describe a dataset relating to cellular and physical conditions of patients who are operated upon to remove colorectal tumours. This data provides a unique insight into immunological status at the point of tumour removal, tumour classification and post-operative survival. Attempts are made to cluster this dataset and important subsets of it in an effort to characterize the data and validate existing standards for tumour classification. It is apparent from optimal clustering that existing tumour classification is largely unrelated to immunological factors within a patient and that there may be scope for re-evaluating treatment options and survival estimates based on a combination of tumour physiology and patient histochemistry.

preprint2013arXiv

Can background baroque music help to improve the memorability of graphical passwords?

Graphical passwords have been proposed as an alternative to alphanumeric passwords with their advantages in usability and security. However, they still tend to follow predictable patterns that are easier for attackers to exploit, probably due to users' memory limitations. Various studies show that baroque music has positive effects on human learning and memorizing. To alleviate users' memory burden, we investigate the novel idea of introducing baroque music to graphical password schemes (specifically DAS, PassPoints and Story) and conduct a laboratory study to see whether it is helpful. In a ten-minute short-term recall test, we found that participants in all conditions had high recall success rates that were not statistically different from each other. After one week, the music group coped with PassPoints passwords significantly better than the group without music. But there was no statistical difference between the two groups in recalling DAS passwords or Story passwords. Furthermore, we found that the music group tended to set significantly more complicated PassPoints passwords but less complicated DAS passwords.

preprint2013arXiv

Comparing Data-mining Algorithms Developed for Longitudinal Observational Databases

Longitudinal observational databases have become a recent interest in the post-marketing drug surveillance community due to their ability to present a new perspective for detecting negative side effects. Algorithms mining longitudinal observational databases are not restricted by many of the limitations associated with the more conventional methods that have been developed for spontaneous reporting system databases. In this paper we investigate the robustness of four recently developed algorithms that mine longitudinal observational databases by applying them to The Health Improvement Network (THIN) for six drugs with well-documented known negative side effects. Our results show that none of the existing algorithms was able to consistently identify known adverse drug reactions above events related to the cause of the drug and no algorithm was superior.

preprint2013arXiv

Comparing Decision Support Tools for Cargo Screening Processes

When planning to change operations at ports there are two key stakeholders with very different interests involved in the decision making processes. Port operators are attentive to their standards, a smooth service flow and economic viability while border agencies are concerned about national security. The time taken for security checks often interferes with compliance with the service standards that port operators would like to achieve. Decision support tools such as Cost-Benefit Analysis or Multi Criteria Analysis are useful aids for better understanding the impact of changes to a system. They allow investigating future scenarios and help to find solutions that are acceptable for all parties involved in port operations. In this paper we evaluate two different modelling methods, namely scenario analysis and discrete event simulation. These are useful for driving the decision support tools (i.e. they provide the inputs the decision support tools require). Our aims are, on the one hand, to guide the reader through the modelling processes and, on the other hand, to demonstrate what kind of decision support information one can obtain from the different modelling methods presented.

preprint2013arXiv

Creating Personalised Energy Plans. From Groups to Individuals using Fuzzy C Means Clustering

Changes in the UK electricity market mean that domestic users will be required to modify their usage behaviour in order that supplies can be maintained. Clustering allows usage profiles collected at the household level to be clustered into groups and assigned a stereotypical profile which can be used to target marketing campaigns. Fuzzy C Means clustering extends this by allowing each household to be a member of many groups and hence provides the opportunity to make personalised offers to the household dependent on their degree of membership of each group. In addition, feedback can be provided on how users' changing behaviour is moving them towards more "green" or cost-effective stereotypical usage.
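
The degree-of-membership idea in this abstract can be illustrated with a minimal sketch. It assumes the cluster centres (the stereotypical profiles) are already known and applies the standard Fuzzy C-Means membership formula; the centre values, the four-period profile layout, the fuzzifier m = 2 and the function name `fcm_memberships` are all hypothetical and are not taken from the paper.

```python
import math

def fcm_memberships(profile, centres, m=2.0):
    """Return the degree of membership of one profile in each cluster centre."""
    dists = [math.dist(profile, c) for c in centres]
    # A profile that coincides with a centre belongs fully to that centre.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # Standard FCM update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical stereotypical profiles (kWh in four periods of the day).
centres = [
    [0.2, 0.3, 0.4, 1.5],  # assumed "evening peak" group
    [0.2, 1.2, 1.0, 0.4],  # assumed "daytime user" group
]
household = [0.2, 0.6, 0.6, 1.1]
memberships = fcm_memberships(household, centres)
print(memberships)
```

The memberships sum to one, so a household that sits between two stereotypes receives a graded share of each rather than a single hard label.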

preprint2013arXiv

Defining a Simulation Strategy for Cancer Immunocompetence

Although there are various types of cancer treatments, none of these currently take into account the effect of ageing of the immune system and hence altered responses to cancer. Recent studies have shown that in vitro stimulation of T cells can help in the treatment of patients. There are many factors that have to be considered when simulating an organism's immunocompetence. Our particular interest lies in the study of loss of immunocompetence with age. We are trying to answer questions such as: Given a certain age of a patient, how fit is their immune system to fight cancer? Would an immune boost improve the effectiveness of a cancer treatment given the patient's immune phenotype and age? We believe that understanding the processes of immune system ageing and degradation through computer simulation may help in answering these questions. Specifically, we have decided to look at the change in numbers of naive T cells with age, as they play an important role in responses to cancer and anti-tumour vaccination. In this work we present an agent-based simulation model to understand the interactions which influence the naive T cell populations over time. Our agent model is based on an existing mathematical system dynamics model, but in comparison offers better scope for customisation and detailed analysis. We believe that the results obtained can in future help with the modelling of T cell populations inside tumours.

preprint2013arXiv

Detect adverse drug reactions for drug Alendronate

Adverse drug reactions (ADRs) are a widely recognised public health concern. In this study we propose an original approach to detect ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Simvastatin. Major side effects for the drug are detected and better performance is achieved compared to other computerized methods. The detected ADRs are based on the computerized method; further investigation is needed.

preprint2013arXiv

Detect adverse drug reactions for drug Atorvastatin

Adverse drug reactions (ADRs) are a big concern for public health. ADRs are one of the most common causes of drugs being withdrawn from the market. Currently, the two major methods for detecting ADRs are spontaneous reporting systems (SRS) and prescription event monitoring (PEM). The World Health Organization (WHO) defines a signal in pharmacovigilance as "any reported information on a possible causal relationship between an adverse event and a drug, the relationship being unknown or incompletely documented previously". For spontaneous reporting systems, many machine learning methods are used to detect ADRs, such as Bayesian confidence propagation neural network (BCPNN), decision support methods, genetic algorithms, knowledge based approaches, etc. One limitation is the reporting mechanism to submit ADR reports, which has serious underreporting and is not able to accurately quantify the corresponding risk. Another limitation is that it is hard to detect ADRs with a small number of occurrences of each drug-event association in the database. In this paper we propose a feature selection approach to detect ADRs from The Health Improvement Network (THIN) database. First a feature matrix, which represents the medical events for the patients before and after taking drugs, is created by linking patients' prescriptions and corresponding medical events together. Then significant features are selected based on feature selection methods, comparing the feature matrix before patients take drugs with the one after patients take drugs. Finally the significant ADRs can be detected from thousands of medical events based on corresponding features. Experiments are carried out on the drug Atorvastatin. Good performance is achieved.

preprint2013arXiv

Detect adverse drug reactions for drug Pioglitazone

In this study we propose a novel method to successfully detect ADRs using a feature matrix and feature selection. A feature matrix, which characterizes the medical events before patients take drugs or after patients take drugs, is created from the THIN database. The feature selection method of Student's t-test is used to detect the significant features from thousands of medical events. The significant ADRs, which correspond to significant features, are detected. Experiments are performed on the drug Pioglitazone. Compared to other computerized methods, our proposed method achieves good performance.
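
The before/after feature matrix and t-test selection described here might be sketched as follows. This is only an illustration under stated assumptions: rows are patients, columns are medical events (1 if recorded, 0 otherwise), the pooled-variance two-sample form of Student's t statistic is used, and the event names, toy data and cut-off of 2.228 are hypothetical rather than taken from the paper or the THIN database.

```python
import math
from statistics import mean, variance

def t_statistic(before, after):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(before), len(after)
    pooled = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    if pooled == 0.0:
        return 0.0  # no variation in either window, so no evidence of change
    return (mean(after) - mean(before)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

def select_features(before_matrix, after_matrix, names, threshold):
    """Keep events whose rate rises after the prescription with t above threshold."""
    selected = []
    for j, name in enumerate(names):
        before = [row[j] for row in before_matrix]
        after = [row[j] for row in after_matrix]
        t = t_statistic(before, after)
        if t > threshold:
            selected.append((name, t))
    return sorted(selected, key=lambda item: item[1], reverse=True)

# Hypothetical toy data: six patients, three medical events.
names = ["muscle pain", "cough", "headache"]
before_matrix = [[0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
after_matrix = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0], [1, 0, 1]]

# 2.228 is an assumed cut-off, not a value from the paper.
candidates = select_features(before_matrix, after_matrix, names, threshold=2.228)
print(candidates)
```

Only events that become more frequent after the prescription are kept in this sketch; a real study would also need to handle multiple testing and events related to the reason the drug was prescribed.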

preprint2013arXiv

Detect adverse drug reactions for the drug Pravastatin

preprint2013arXiv

Dienstplanerstellung in Krankenhaeusern mittels genetischer Algorithmen

preprint2013arXiv

Discovering Sequential Patterns in a UK General Practice Database

The wealth of computerised medical information becoming readily available presents the opportunity to examine patterns of illnesses, therapies and responses. These patterns may be able to predict illnesses that a patient is likely to develop, allowing the implementation of preventative actions. In this paper sequential rule mining is applied to a General Practice database to find rules involving a patient's age, gender and medical history. By incorporating these rules into current health-care a patient can be highlighted as susceptible to a future illness based on past or current illnesses, gender and year of birth. This knowledge has the ability to greatly improve health-care and reduce health-care costs.
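
As a rough illustration of what a sequential rule measures, the sketch below computes the support and confidence of a single rule "antecedent then consequent" over time-ordered patient histories. The histories, the illness labels and the function name `rule_stats` are hypothetical, and the sketch ignores age and gender, which the paper's rules also involve.

```python
def rule_stats(histories, antecedent, consequent):
    """Support and confidence of the sequential rule antecedent -> consequent.

    A history matches the rule when the consequent is recorded at some point
    after the first occurrence of the antecedent.
    """
    with_antecedent = 0
    with_rule = 0
    for events in histories:
        if antecedent in events:
            with_antecedent += 1
            first = events.index(antecedent)
            if consequent in events[first + 1:]:
                with_rule += 1
    support = with_rule / len(histories)
    confidence = with_rule / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Hypothetical time-ordered medical histories, one list per patient.
histories = [
    ["obesity", "hypertension", "type 2 diabetes"],
    ["hypertension", "type 2 diabetes"],
    ["type 2 diabetes", "hypertension"],
    ["hypertension", "asthma"],
    ["asthma"],
]
support, confidence = rule_stats(histories, "hypertension", "type 2 diabetes")
print(support, confidence)
```

Because order matters, the third patient (diabetes recorded before hypertension) counts towards the antecedent but not towards the rule.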

preprint2013arXiv

Draw a line on your PDA to authenticate

The trend toward a highly mobile workforce and the ubiquity of graphical interfaces (such as the stylus and touch-screen) has enabled the emergence of graphical authentications in Personal Digital Assistants (PDAs) [1]. However, most of the current graphical password schemes are vulnerable to shoulder-surfing [2,3], a known risk where an attacker can capture a password by direct observation or by recording the authentication session. Several approaches have been developed to deal with this problem, but they have significant usability drawbacks, usually in the time and effort to log in, making them less suitable for authentication [4, 8]. For example, it is time-consuming for users to log in with CHC [4] and there are complex text memory requirements in the scheme proposed by Hong [5]. With respect to the scheme proposed by Weinshall [6], not only is it intricate to log in, but also the main claim of resisting shoulder-surfing is proven false [7]. In this paper, we introduce a new graphical password scheme which provides good resistance to shoulder-surfing and preserves a desirable usability.

preprint2013arXiv

Evaluating Different Cost-Benefit Analysis Methods for Port Security Operations

Service industries, such as ports, are attentive to their standards, a smooth service flow and economic viability. Cost benefit analysis has proven itself as a useful tool to support this type of decision making; it has been used by businesses and governmental agencies for many years. In this book chapter we demonstrate different modelling methods that are used for estimating input factors required for conducting cost benefit analysis based on a single case study. These methods are: scenario analysis, decision trees, Monte-Carlo simulation modelling and discrete event simulation modelling. Our aims are, on the one hand, to guide the analyst through the modelling processes and, on the other hand, to demonstrate what additional decision support information can be obtained from applying each of these modelling methods.

preprint2013arXiv

Examining the Classification Accuracy of TSVMs with Feature Selection in Comparison with the GLAD Algorithm

Gene expression data sets are used to classify and predict patient diagnostic categories. As we know, it is extremely difficult and expensive to obtain gene expression labelled examples. Moreover, conventional supervised approaches using Support Vector Machines (SVM) algorithms cannot function properly when labelled data (training examples) are insufficient. Therefore, in this paper, we suggest Transductive Support Vector Machines (TSVMs) as semi-supervised learning algorithms, learning with both labelled and unlabelled samples to perform the classification of microarray data. To prune the superfluous genes and samples we used a feature selection method called Recursive Feature Elimination (RFE), which is supposed to enhance the output of classification and avoid the local optimization problem. We examined the classification prediction accuracy of the TSVM-RFE algorithm in comparison with the Genetic Learning Across Datasets (GLAD) algorithm, as both are semi-supervised learning methods. Comparing these two methods, we found that the TSVM-RFE surpassed both an SVM using RFE and GLAD.

preprint2013arXiv

Extending a Microsimulation of the Port of Dover

Modelling and simulating the traffic of heavily used but secure environments such as seaports and airports is of increasing importance. This paper discusses issues and problems that may arise when extending an existing microsimulation strategy. We also discuss how extensions of these simulations can aid planners with optimal physical and operational feedback. Conclusions are drawn about how microsimulations can be moved forward as a robust planning tool for the 21st century.

preprint2013arXiv

Extending Similarity Measures of Interval Type-2 Fuzzy Sets to General Type-2 Fuzzy Sets

Similarity measures provide one of the core tools that enable reasoning about fuzzy sets. While many types of similarity measures exist for type-1 and interval type-2 fuzzy sets, there are very few similarity measures that enable the comparison of general type-2 fuzzy sets. In this paper, we introduce a general method for extending existing interval type-2 similarity measures to similarity measures for general type-2 fuzzy sets. Specifically, we show how similarity measures for interval type-2 fuzzy sets can be employed in conjunction with the zSlices based general type-2 representation for fuzzy sets to provide measures of similarity which preserve all the common properties (i.e. reflexivity, symmetry, transitivity and overlapping) of the original interval type-2 similarity measure. We demonstrate examples of such extended fuzzy measures and provide comparisons between (different types of) interval and general type-2 fuzzy measures.

preprint2013arXiv

Finding the creatures of habit; Clustering households based on their flexibility in using electricity

Changes in the UK electricity market, particularly with the roll out of smart meters, will provide greatly increased opportunities for initiatives intended to change households' electricity usage patterns for the benefit of the overall system. Users show differences in their regular behaviours and clustering households into similar groupings based on this variability provides for efficient targeting of initiatives. Those people who are stuck in a regular pattern of activity may be the least receptive to an initiative to change behaviour. A sample of 180 households from the UK is clustered into four groups as an initial test of the concept and useful, actionable groupings are found.

preprint2013arXiv

Immune System Approaches to Intrusion Detection - A Review (ICARIS)

The use of artificial immune systems in intrusion detection is an appealing concept for two reasons. Firstly, the human immune system provides the human body with a high level of protection from invading pathogens, in a robust, self-organised and distributed manner. Secondly, current techniques used in computer security are not able to cope with the dynamic and increasingly complex nature of computer systems and their security. It is hoped that biologically inspired approaches in this area, including the use of immune-based systems will be able to meet this challenge. Here we collate the algorithms used, the development of the systems and the outcome of their implementation. It provides an introduction and review of the key developments within this field, in addition to making suggestions for future research.

preprint2013arXiv

Investigating Immune System Aging: System Dynamics and Agent-Based Modeling

System dynamics and agent based simulation models can both be used to model and understand interactions of entities within a population. Our modeling work presented here is concerned with understanding the suitability of the different types of simulation for the immune system aging problems and comparing their results. We are trying to answer questions such as: How fit is the immune system given a certain age? Would an immune boost be of therapeutic value, e.g. to improve the effectiveness of a simultaneous vaccination? Understanding the processes of immune system aging and degradation may also help in development of therapies that reverse some of the damages caused thus improving life expectancy. Therefore as a first step our research focuses on T cells; major contributors to immune system functionality. One of the main factors influencing immune system aging is the output rate of naive T cells. Of further interest is the number and phenotypical variety of these cells in an individual, which will be the case study focused on in this paper.

preprint2013arXiv

Investigating Mathematical Models of Immuno-Interactions with Early-Stage Cancer under an Agent-Based Modelling Perspective

Many advances in research regarding immuno-interactions with cancer were developed with the help of ordinary differential equation (ODE) models. These models, however, are not effectively capable of representing problems involving individual localisation, memory and emerging properties, which are common characteristics of cells and molecules of the immune system. Agent-based modelling and simulation is an alternative paradigm to ODE models that overcomes these limitations. In this paper we investigate the potential contribution of agent-based modelling and simulation when compared to ODE modelling and simulation. We seek answers to the following questions: Is it possible to obtain an equivalent agent-based model from the ODE formulation? Do the outcomes differ? Are there any benefits of using one method compared to the other? To answer these questions, we have considered three case studies using established mathematical models of immune interactions with early-stage cancer. These case studies were re-conceptualised under an agent-based perspective and the simulation results were then compared with those from the ODE models. Our results show that it is possible to obtain equivalent agent-based models (i.e. implementing the same mechanisms); the simulation output of both types of models however might differ depending on the attributes of the system to be modelled. In some cases, additional insight from using agent-based modelling was obtained. Overall, we can confirm that agent-based modelling is a useful addition to the tool set of immunologists, as it has extra features that allow for simulations with characteristics that are closer to the biological phenomena.

preprint2013arXiv

Investigating the Detection of Adverse Drug Events in a UK General Practice Electronic Health-Care Database

Data-mining techniques have frequently been developed for spontaneous reporting databases. These techniques aim to find adverse drug events accurately and efficiently. Spontaneous reporting databases are prone to missing information, under-reporting and incorrect entries. This often results in a detection lag or prevents the detection of some adverse drug events. These limitations do not occur in electronic health-care databases. In this paper, existing methods developed for spontaneous reporting databases are implemented on both a spontaneous reporting database and a general practice electronic health-care database and compared. The results suggest that the application of existing methods to the general practice database may help find signals that have gone undetected when using the spontaneous reporting system database. In addition the general practice database provides far more supplementary information that, if incorporated in the analysis, could provide a wealth of information for identifying adverse events more accurately.

preprint2013arXiv

Investigating the effectiveness of Variance Reduction Techniques in Manufacturing, Call Center and Cross-docking Discrete Event Simulation Models

Variance reduction techniques have been shown by others in the past to be a useful tool to reduce variance in simulation studies. However, their application and success in the past has been mainly domain specific, with relatively few guidelines as to their general applicability, in particular for novices in this area. To facilitate their use, this study aims to investigate the robustness of individual techniques across a set of scenarios from different domains. Experimental results show that Control Variates is the only technique which achieves a reduction in variance across all domains. Furthermore, applied individually, Antithetic Variates and Control Variates perform particularly well in the Cross-docking scenarios, which was previously unknown.
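
For readers new to variance reduction, the following minimal sketch shows the Antithetic Variates idea on a toy problem rather than on a discrete event simulation model: estimating the mean of exp(U) for U uniform on (0, 1), whose true value is e - 1. The toy integrand, the seed and the sample sizes are assumptions chosen for illustration and do not come from the study.

```python
import math
import random
from statistics import mean, variance

def crude_estimates(rng, n_pairs):
    """Each estimate averages two independent draws."""
    return [(math.exp(rng.random()) + math.exp(rng.random())) / 2
            for _ in range(n_pairs)]

def antithetic_estimates(rng, n_pairs):
    """Each estimate averages a draw u with its antithetic partner 1 - u."""
    estimates = []
    for _ in range(n_pairs):
        u = rng.random()
        estimates.append((math.exp(u) + math.exp(1.0 - u)) / 2)
    return estimates

rng = random.Random(42)  # fixed seed so the comparison is repeatable
crude = crude_estimates(rng, 2000)
anti = antithetic_estimates(rng, 2000)

print("crude:      mean", mean(crude), "variance", variance(crude))
print("antithetic: mean", mean(anti), "variance", variance(anti))
```

Both estimators use two function evaluations per estimate, so the lower variance of the antithetic version comes from the negative correlation between exp(u) and exp(1 - u), not from extra work.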

preprint2013arXiv

Measuring the Directional Distance Between Fuzzy Sets

The measure of distance between two fuzzy sets is a fundamental tool within fuzzy set theory. However, current distance measures within the literature do not account for the direction of change between fuzzy sets; a useful concept in a variety of applications, such as Computing With Words. In this paper, we highlight this utility and introduce a distance measure which takes the direction between sets into account. We provide details of its application for normal and non-normal, as well as convex and non-convex fuzzy sets. We demonstrate the new distance measure using real data from the MovieLens dataset and establish the benefits of measuring the direction between fuzzy sets.

preprint2013arXiv

Memory Implementations - Current Alternatives

Memory can be defined as the ability to retain and recall information in a diverse range of forms. It is a vital component of the way in which we as human beings operate on a day to day basis. Given a particular situation, decisions are made and actions undertaken in response to that situation based on our memory of related prior events and experiences. By utilising our memory we can anticipate the outcome of our chosen actions to avoid unexpected or unwanted events. In addition, as we subtly alter our actions and recognise altered outcomes we learn and create new memories, enabling us to improve the efficiency of our actions over time. However, as this process occurs so naturally in the subconscious its importance is often overlooked.

preprint2013arXiv

Modelling and Analysing Cargo Screening Processes: A Project Outline

The efficiency of current cargo screening processes at sea and air ports is unknown as no benchmarks exist against which they could be measured. Some manufacturer benchmarks exist for individual sensors but we have not found any benchmarks that take a holistic view of the screening procedures, assessing a combination of sensors and also taking operator variability into account. Just adding up resources and manpower used is not an effective way of assessing systems where human decision-making and operator compliance with rules play a vital role. For such systems more advanced assessment methods need to be used, taking into account that the cargo screening process is of a dynamic and stochastic nature. Our project aim is to develop a decision support tool (cargo-screening system simulator) that will map the right technology and manpower to the right commodity-threat combination in order to maximize detection rates. In this paper we present a project outline and highlight the research challenges we have identified so far. In addition we introduce our first case study, where we investigate the cargo screening process at the ferry port in Calais.

preprint2013arXiv

Modelling Electricity Consumption in Office Buildings: An Agent Based Approach

In this paper, we develop an agent-based model which integrates four important elements, i.e. organisational energy management policies/regulations, energy management technologies, electric appliances and equipment, and human behaviour, to simulate the electricity consumption in office buildings. Based on a case study, we use this model to test the effectiveness of different electricity management strategies, and solve practical office electricity consumption problems. This paper theoretically contributes to an integration of the four elements involved in the complex organisational issue of office electricity consumption, and practically contributes to an application of an agent-based approach for office building electricity consumption study.

preprint2013arXiv

Modelling Reactive and Proactive Behaviour in Simulation: A Case Study in a University Organisation

Simulation is a well-established what-if scenario analysis tool in Operational Research (OR). While traditionally Discrete Event Simulation (DES) and System Dynamics Simulation (SDS) are the predominant simulation techniques in OR, a new simulation technique, namely Agent-Based Simulation (ABS), has emerged and is gaining more attention. In our research we focus on discrete simulation methods (i.e. DES and ABS). The contribution made by this paper is the comparison of DES and combined DES/ABS for modelling human reactive behaviour and different levels of detail of human proactive behaviour in service systems. The results of our experiments show that the level of proactiveness considered in the model has a big impact on the simulation output. However, there is not a big difference between the results from the DES and the combined DES/ABS simulation models. Therefore, for service systems of the type we investigated we would suggest using DES as the preferred analysis tool.

preprint2013arXiv

Modelling the Effects of User Learning on Forced Innovation Diffusion

Technology adoption theories assume that users' acceptance of an innovative technology is on a voluntary basis. However, sometimes users are forced to accept an innovation. In this case users have to learn what it is useful for and how to use it. This learning process will enable users to move from zero knowledge about the innovation to making the best use of it. So far the effects of user learning on technology adoption have received little research attention. In this paper - for the first time - we investigate the effects of user learning on forced innovation adoption by using an agent-based simulation approach using the case of forced smart metering deployments in the city of Leeds.

preprint2013arXiv

Motif Detection Inspired by Immune Memory (JORS)

preprint2013arXiv

Performance Measurement Under Increasing Environmental Uncertainty In The Context of Interval Type-2 Fuzzy Logic Based Robotic Sailing

Performance measurement of robotic controllers based on fuzzy logic, operating under uncertainty, is a subject area which has been somewhat ignored in the current literature. In this paper standard measures such as RMSE are shown to be inappropriate for use under conditions where the environmental uncertainty changes significantly between experiments. An overview of current methods which have been applied by other authors is presented, followed by a design of a more sophisticated method of comparison. This method is then applied to a robotic control problem to observe its outcome compared with a single measure. Results show that the technique described provides a more robust method of performance comparison than less complex methods allowing better comparisons to be drawn.

preprint2013arXiv

Privileged Information for Data Clustering

Many machine learning algorithms assume that all input samples are independently and identically distributed from some common distribution on either the input space X, in the case of unsupervised learning, or the input and output space X x Y in the case of supervised and semi-supervised learning. In recent years the relaxation of this assumption has been explored and the importance of incorporating additional information within machine learning algorithms has become more apparent. Traditionally such fusion of information was the domain of semi-supervised learning. More recently the inclusion of knowledge from separate hypothetical spaces has been proposed by Vapnik as part of the supervised setting. In this work we are interested in exploring Vapnik's idea of master-class learning and the associated learning using privileged information, however within the unsupervised setting. Adoption of the advanced supervised learning paradigm for the unsupervised setting instigates investigation into the difference between privileged and technical data. By means of our proposed aRi-MAX method, stability of the KMeans algorithm is improved and identification of the best clustering solution is achieved on an artificial dataset. Subsequently an information theoretic dot product based algorithm called P-Dot is proposed. This method has the ability to utilize a wide variety of clustering techniques, individually or in combination, while fusing privileged and technical data for improved clustering. Application of the P-Dot method to the task of digit recognition confirms our findings in a real-world scenario.

preprint2013arXiv

Quiet in Class: Classification, Noise and the Dendritic Cell Algorithm

Theoretical analyses of the Dendritic Cell Algorithm (DCA) have yielded several criticisms about its underlying structure and operation. As a result, several alterations and fixes have been suggested in the literature to correct for these findings. A contribution of this work is to investigate the effects of replacing the classification stage of the DCA (which is known to be flawed) with a traditional machine learning technique. This work goes on to question the merits of those unique properties of the DCA that are yet to be thoroughly analysed. If none of these properties can be found to have a benefit over traditional approaches, then "fixing" the DCA is arguably less efficient than simply creating a new algorithm. This work examines the dynamic filtering property of the DCA and questions the utility of this unique feature for the anomaly detection problem. It is found that this feature, while advantageous for noisy, time-ordered classification, is not as useful as a traditional static filter for processing a synthetic dataset. It is concluded that there are still unique features of the DCA left to investigate. Areas that may be of benefit to the Artificial Immune Systems community are suggested.

preprint2013arXiv

Real-world Transfer of Evolved Artificial Immune System Behaviours between Small and Large Scale Robotic Platforms

In mobile robotics, a solid test for adaptation is the ability of a control system to function not only in a diverse number of physical environments, but also on a number of different robotic platforms. This paper demonstrates that a set of behaviours evolved in simulation on a miniature robot (epuck) can be transferred to a much larger-scale platform (Pioneer), both in simulation and in the real world. The chosen architecture uses artificial evolution of epuck behaviours to obtain a genetic sequence, which is then employed to seed an idiotypic, artificial immune system (AIS) on the Pioneers. Despite numerous hardware and software differences between the platforms, navigation and target-finding experiments show that the evolved behaviours transfer very well to the larger robot when the idiotypic AIS technique is used. In contrast, transferability is poor when reinforcement learning alone is used, which validates the adaptability of the chosen architecture.

preprint2013arXiv

Scenario Analysis, Decision Trees and Simulation for Cost Benefit Analysis of the Cargo Screening Process

In this paper we present our ideas for conducting a cost benefit analysis by using three different methods: scenario analysis, decision trees and simulation. Then we introduce our case study and examine these methods in a real world situation. We show how these tools can be used and what the results are for each of them. Our aim is to conduct a comparison of these different probabilistic methods of estimating costs for port security risk assessment studies. Methodologically, we are trying to understand the limits of all the tools mentioned above by focusing on rare events.

preprint2013arXiv

Simulating the Dynamics of T Cell Subsets Throughout the Lifetime

It is widely accepted that the immune system undergoes age-related changes correlating with increased disease in the elderly. T cell subsets have been implicated. The aim of this work is firstly to implement and validate a simulation of T regulatory cell (Treg) dynamics throughout the lifetime, based on a model by Baltcheva. We show that our initial simulation produces an inversion between precursor and mature Tregs at around 20 years of age, though the output differs significantly from the original laboratory dataset. Secondly, this report discusses development of the model to incorporate new data from a cross-sectional study of healthy blood donors addressing the balance between Tregs and Th17 cells with novel markers for Treg. The potential for simulation to add insight into immune aging is discussed.

preprint2013arXiv

Supervised Learning and Anti-learning of Colorectal Cancer Classes and Survival Rates from Cellular Biology Parameters

In this paper, we describe a dataset relating to cellular and physical conditions of patients who are operated upon to remove colorectal tumours. This data provides a unique insight into immunological status at the point of tumour removal, tumour classification and post-operative survival. Attempts are made to learn relationships between attributes (physical and immunological) and the resulting tumour stage and survival. Results for conventional machine learning approaches can be considered poor, especially for predicting tumour stages for the most important types of cancer. This poor performance is further investigated and compared with a synthetic dataset based on the logical exclusive-OR function, and it is shown that there is a significant level of 'anti-learning' present in all supervised methods used. This can be explained by the highly dimensional, complex and sparsely representative dataset. For predicting the stage of cancer from the immunological attributes, anti-learning approaches outperform a range of popular algorithms.
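As a rough illustration of what 'anti-learning' on an exclusive-OR dataset can look like, the sketch below runs leave-one-out validation with a 1-nearest-neighbour classifier on the four-point XOR problem. This is an assumption-laden toy, not the paper's setup: the abstract does not say how the synthetic dataset was built or which classifiers were compared, and the names `POINTS`, `nearest_neighbour_label` and `leave_one_out_accuracy` are invented for this example.

```python
from math import dist

# Hypothetical four-point XOR dataset: the label is 1 when exactly one input is 1.
POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def nearest_neighbour_label(query, training):
    """Return the label of the training point closest to `query`."""
    return min(training, key=lambda item: dist(query, item[0]))[1]


def leave_one_out_accuracy(data):
    """Hold out each point in turn and classify it from the remaining points."""
    correct = 0
    for index, (features, label) in enumerate(data):
        training = data[:index] + data[index + 1:]
        if nearest_neighbour_label(features, training) == label:
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    # Every held-out point sits closest to points of the opposite class,
    # so the classifier scores below the 50% chance level.
    print(leave_one_out_accuracy(POINTS))
```

In this toy case, inverting the classifier's prediction would score better than the classifier itself, which is the intuition behind an anti-learning approach.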

preprint2013arXiv

Systems Dynamics or Agent-Based Modelling for Immune Simulation?

In immune system simulation there are two competing simulation approaches: System Dynamics Simulation (SDS) and Agent-Based Simulation (ABS). In the literature there is little guidance on how to choose the best approach for a specific immune problem. Our overall research aim is to develop a framework that helps researchers with this choice. In this paper we investigate if it is possible to easily convert simulation models between approaches. With no explicit guidelines available from the literature we develop and test our own set of guidelines for converting SDS models into ABS models in a non-spatial scenario. We also define guidelines to convert ABS into SDS considering a non-spatial and a spatial scenario. After running some experiments with the developed models we found that in all cases there are significant differences between the results produced by the different simulation methods.

preprint2013arXiv

The Application of a Data Mining Framework to Energy Usage Profiling in Domestic Residences using UK data

This paper describes a method for defining representative load profiles for domestic electricity users in the UK. It considers bottom up and clustering methods and then details the research plans for implementing and improving existing framework approaches based on the overall usage profile. The work focuses on adapting and applying analysis framework approaches to UK energy data in order to determine the effectiveness of creating a few (single figures) archetypical users with the intention of improving on the current methods of determining usage profiles. The work is currently in progress and the paper details initial results using data collected in Milton Keynes around 1990. Various possible enhancements to the work are considered including a split based on temperature to reflect the varying UK weather conditions.

preprint2013arXiv

The Dendritic Cell Algorithm for Intrusion Detection

As one of the solutions to intrusion detection problems, Artificial Immune Systems (AIS) have shown their advantages. Unlike genetic algorithms, there is no one archetypal AIS; instead, there are four major paradigms. Among them, the Dendritic Cell Algorithm (DCA) has produced promising results in various applications. The aim of this chapter is to demonstrate the potential for the DCA as a suitable candidate for intrusion detection problems. We review some of the commonly used AIS paradigms for intrusion detection problems and demonstrate the advantages of one particular algorithm, the DCA. In order to clearly describe the algorithm, the background to its development and a formal definition are given. In addition, improvements to the original DCA are presented and their implications are discussed, including previous work done on an online analysis component with segmentation and ongoing work on automated data preprocessing. Based on preliminary results, both improvements appear to be promising for online anomaly-based intrusion detection.

preprint2013arXiv

The effect of baroque music on the PassPoints graphical password

Graphical passwords have been demonstrated to be a possible alternative to traditional alphanumeric passwords. However, they still tend to follow predictable patterns that are easier to attack. The crux of the problem is users' memory limitations: users are the weakest link in the password authentication mechanism. It has been shown that baroque music has positive effects on human memorizing and learning. We introduce baroque music to the PassPoints graphical password scheme and conduct a laboratory study in this paper. Results show that there is no statistically significant difference between the music group and the control group without music in short-term recall experiments; both had high recall success rates. But in long-term recall, the music group performed significantly better. We also found that the music group tended to set significantly more complicated passwords, which are usually more resistant to dictionary and other guessing attacks. But compared with the control group, the music group took more time to log in, in both short-term and long-term tests. Besides, it appears that background music does not work in terms of hotspots.

preprint2013arXiv

Theoretical formulation and analysis of the deterministic dendritic cell algorithm

As one of the emerging algorithms in the field of Artificial Immune Systems (AIS), the Dendritic Cell Algorithm (DCA) has been successfully applied to a number of challenging real-world problems. However, one criticism is the lack of a formal definition, which could result in ambiguity for understanding the algorithm. Moreover, previous investigations have mainly focused on its empirical aspects. Therefore, it is necessary to provide a formal definition of the algorithm, as well as to perform runtime analyses to reveal its theoretical aspects. In this paper, we define the deterministic version of the DCA, named the dDCA, using set theory and mathematical functions. Runtime analyses of the standard algorithm and the one with additional segmentation are performed. Our analysis suggests that the standard dDCA has a runtime complexity of O(n^2) for the worst-case scenario, where n is the number of input data instances. The introduction of segmentation changes the algorithm's worst-case runtime complexity to O(max(nN, nz)), for DC population size N with size of each segment z. Finally, two runtime variables of the algorithm are formulated based on the input data, to understand its runtime behaviour as guidelines for further development.
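For readers unfamiliar with the dDCA, the Python sketch below illustrates the general shape of a deterministic dendritic cell loop. It is a simplified illustration and not the formal definition given in the paper: the signal formulas (csm = danger + safe, k = danger - 2 * safe), the round-robin antigen sampling, the final flush of cells that have not yet migrated, and all names (`ddca`, `lifespans`, `present`) are assumptions made for this example.

```python
from collections import defaultdict


def ddca(instances, lifespans):
    """Toy deterministic DCA loop (assumed simplification, not the paper's definition).

    instances: list of (antigen, danger_signal, safe_signal) tuples.
    lifespans: one initial lifespan per artificial dendritic cell.
    Returns a dict mapping each antigen type to its mean context score.
    """
    cells = [{"life": span, "max": span, "k": 0.0, "antigens": []} for span in lifespans]
    totals = defaultdict(float)
    counts = defaultdict(int)

    def present(cell):
        # The cell reports its accumulated context k for every antigen it sampled,
        # then resets to its initial state.
        for antigen in cell["antigens"]:
            totals[antigen] += cell["k"]
            counts[antigen] += 1
        cell["life"], cell["k"], cell["antigens"] = cell["max"], 0.0, []

    for index, (antigen, danger, safe) in enumerate(instances):
        csm = danger + safe      # assumed: costimulation drains the lifespan
        k = danger - 2 * safe    # assumed: positive k suggests an anomalous context
        cells[index % len(cells)]["antigens"].append(antigen)  # round-robin sampling
        for cell in cells:       # every cell sees every signal instance
            cell["life"] -= csm
            cell["k"] += k
            if cell["life"] <= 0:
                present(cell)

    for cell in cells:           # assumed: flush remaining cells at the end of the stream
        present(cell)
    return {antigen: totals[antigen] / counts[antigen] for antigen in counts}


if __name__ == "__main__":
    stream = [("p", 10, 0)] * 4 + [("q", 0, 10)] * 4
    print(ddca(stream, lifespans=[15, 25]))
```

In this sketch a higher score marks an antigen type that was mostly sampled while danger signals dominated. Differing lifespans make the cells report over time windows of different lengths.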

preprint2013arXiv

Towards a More Systematic Approach to Secure Systems Design and Analysis

The task of designing secure software systems is fraught with uncertainty, as data on uncommon attacks is limited, costs are difficult to estimate, and technology and tools are continually changing. Consequently, experts may interpret the security risks posed to a system in different ways, leading to variation in assessment. This paper presents research into measuring the variability in decision making between security professionals, with the ultimate goal of improving the quality of security advice given to software system designers. A set of thirty nine cyber-security experts took part in an exercise in which they independently assessed a realistic system scenario. This study quantifies agreement in the opinions of experts, examines methods of aggregating opinions, and produces an assessment of attacks from ratings of their components. We show that when aggregated, a coherent consensus view of security emerges which can be used to inform decisions made during systems design.

preprint2013arXiv

Towards modelling cost and risks of infrequent events in the cargo screening process

We introduce a simulation model of the port of Calais with a focus on the operation of immigration controls. Our aim is to compare the cost and benefits of different screening policies. Methodologically, we are trying to understand the limits of discrete event simulation of rare events. When will they become 'too rare' for simulation to give meaningful results?
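One way to make the 'too rare' question concrete is the standard relative-error argument for crude Monte Carlo estimation. The derivation below assumes n independent simulation replications and a rare event of probability p. Both the independence assumption and the symbols are introduced here for illustration and are not taken from the paper.

```latex
\hat{p} = \frac{1}{n}\sum_{k=1}^{n} I_k, \qquad
\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}, \qquad
\mathrm{RE}(\hat{p}) = \frac{\sqrt{\operatorname{Var}(\hat{p})}}{p} = \sqrt{\frac{1-p}{n\,p}}

\mathrm{RE}(\hat{p}) \le \varepsilon
\;\Longleftrightarrow\;
n \ge \frac{1-p}{\varepsilon^{2}\,p}
```

For example, with p = 10^{-6} and a target relative error of 10% (ε = 0.1), roughly 10^8 replications would be needed, which shows how quickly the cost grows as events become rarer.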

preprint2013arXiv

Using Clustering to extract Personality Information from socio economic data

It has become apparent that models that have been applied widely in economics, including Machine Learning techniques and Data Mining methods, should take into consideration principles that derive from the theories of Personality Psychology in order to discover more comprehensive knowledge regarding complicated economic behaviours. In this work, we present a method to extract Behavioural Groups by using simple clustering techniques that can potentially reveal aspects of the Personalities for their members. We believe that this is very important because the psychological information regarding the Personalities of individuals is limited in real world applications and because it can become a useful tool in improving the traditional models of Knowledge Economy.

preprint2013arXiv

Validation of a Microsimulation of the Port of Dover

Modelling and simulating the traffic of heavily used but secure environments such as seaports and airports is of increasing importance. Errors made when simulating these environments can have long-standing economic, social and environmental implications. This paper discusses issues and problems that may arise when designing a simulation strategy. Data for the Port is presented, and methods for lightweight vehicle assessment that can be used to calibrate and validate simulations are discussed, along with a diagnosis of overcalibration issues. We show that decisions about where the intelligence lies in a system have important repercussions for the reliability of system statistics. Finally, conclusions are drawn about how microsimulations can be moved forward as a robust planning tool for the 21st century.

preprint2013arXiv

Variance in System Dynamics and Agent Based Modelling Using the SIR Model of Infectious Disease

Classical deterministic simulations of epidemiological processes, such as those based on System Dynamics, produce a single result based on a fixed set of input parameters with no variance between simulations. Input parameters are subsequently modified on these simulations using Monte-Carlo methods, to understand how changes in the input parameters affect the spread of results for the simulation. Agent Based simulations are able to produce different output results on each run based on knowledge of the local interactions of the underlying agents and without making any changes to the input parameters. In this paper we compare the influence and effect of variation within these two distinct simulation paradigms and show that the Agent Based simulation of the epidemiological SIR (Susceptible, Infectious, and Recovered) model is more effective at capturing the natural variation within SIR compared to an equivalent model using System Dynamics with Monte-Carlo simulation. To demonstrate this effect, the SIR model is implemented using both System Dynamics (with Monte-Carlo simulation) and Agent Based Modelling based on previously published empirical data.
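The contrast described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the models used in the paper: the System Dynamics side is a simple Euler integration of the SIR equations, the agent-based side is a well-mixed population in which each individual draws its own random number per time step, and the parameter values and function names are invented for this example.

```python
import random


def sir_system_dynamics(s, i, r, beta, gamma, steps, dt=1.0):
    """Deterministic SIR via Euler steps: identical inputs give identical output."""
    n = s + i + r
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def sir_agent_based(s, i, r, beta, gamma, steps, rng):
    """Stochastic SIR: each susceptible and each infectious agent draws independently."""
    n = s + i + r
    history = [(s, i, r)]
    for _ in range(steps):
        p_infection = min(1.0, beta * i / n)  # assumed well-mixed contact model
        new_infections = sum(rng.random() < p_infection for _ in range(s))
        new_recoveries = sum(rng.random() < gamma for _ in range(i))
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    print("SD final state:", sir_system_dynamics(990, 10, 0, 0.3, 0.1, 50)[-1])
    for seed in range(3):
        run = sir_agent_based(990, 10, 0, 0.3, 0.1, 50, random.Random(seed))
        print("ABS final state, seed", seed, ":", run[-1])
```

With fixed parameters the deterministic version always returns the same trajectory, whereas the agent-based version varies from seed to seed without any change to its inputs.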

preprint2013arXiv

Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data

Biomarkers which predict patient survival can play an important role in medical diagnosis and treatment. How to select the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, a genetic algorithm, and a Bayes classifier. The one-dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study the one-dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. The one-dimensional CWT cannot reduce the dimensionality of the data, but it captures features that the DWT misses and is thus complementary to the DWT. A genetic algorithm was applied to the extracted wavelet coefficients to select the optimized features, using a Bayes classifier to build its fitness function. The corresponding protein markers were located based on the position of the optimized features. Kaplan-Meier curves and a Cox regression model were used to evaluate the performance of the selected biomarkers. Experiments were conducted on a colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker, CD46, was found to be significantly associated with survival time.

preprint2011arXiv

A First Approach on Modelling Staff Proactiveness in Retail Simulation Models

There has been a noticeable shift in the relative composition of the industry in the developed countries in recent years; manufacturing is decreasing while the service sector is becoming more important. However, currently most simulation models for investigating service systems are still built in the same way as manufacturing simulation models, using a process-oriented world view, i.e. they model the flow of passive entities through a system. These kinds of models allow studying aspects of operational management but are not well suited for studying the dynamics that appear in service systems due to human behaviour. For these kinds of studies we require tools that allow modelling the system and entities using an object-oriented world view, where intelligent objects serve as abstract "actors" that are goal directed and can behave proactively. In our work we combine process-oriented discrete event simulation modelling and object-oriented agent based simulation modelling to investigate the impact of people management practices on retail productivity. In this paper, we reveal in a series of experiments what impact considering proactivity can have on the output accuracy of simulation models of human centric systems. The model and data we use for this investigation are based on a case study in a UK department store. We show that considering proactivity positively influences the validity of these kinds of models and therefore allows analysts to make better recommendations regarding strategies to apply people management practices.

preprint2011arXiv

Comparing System Dynamics and Agent-Based Simulation for Tumour Growth and its Interactions with Effector Cells

There is little research concerning comparisons and combination of System Dynamics Simulation (SDS) and Agent Based Simulation (ABS). ABS is a paradigm used in many levels of abstraction, including those levels covered by SDS. We believe that the establishment of frameworks for the choice between these two simulation approaches would contribute to the simulation research. Hence, our work aims for the establishment of directions for the choice between SDS and ABS approaches for immune system-related problems. Previously, we compared the use of ABS and SDS for modelling agents' behaviour in an environment with no movement or interactions between these agents. We concluded that for these types of agents it is preferable to use SDS, as it takes up less computational resources and produces the same results as those obtained by the ABS model. In order to move this research forward, our next research question is: if we introduce interactions between these agents will SDS still be the most appropriate paradigm to be used? To answer this question for immune system simulation problems, we will use, as case studies, models involving interactions between tumour cells and immune effector cells. Experiments show that there are cases where SDS and ABS cannot be used interchangeably, and therefore, their comparison is not straightforward.

preprint2010arXiv

Artificial Immune Systems (2010)

The human immune system has numerous properties that make it ripe for exploitation in the computational domain, such as robustness and fault tolerance, and many different algorithms, collectively termed Artificial Immune Systems (AIS), have been inspired by it. Two generations of AIS are currently in use, with the first generation relying on simplified immune models and the second generation utilising interdisciplinary collaboration to develop a deeper understanding of the immune system and hence produce more complex models. Both generations of algorithms have been successfully applied to a variety of problems, including anomaly detection, pattern recognition, optimisation and robotics. In this chapter an overview of AIS is presented, its evolution is discussed, and it is shown that the diversification of the field is linked to the diversity of the immune system itself, leading to a number of algorithms as opposed to one archetypal system. Two case studies are also presented to help provide insight into the mechanisms of AIS; these are the idiotypic network approach and the Dendritic Cell Algorithm.

preprint2010arXiv

Behavioural Correlation for Detecting P2P Bots

In the past few years, IRC bots, malicious programs which are remotely controlled by the attacker through IRC servers, have become a major threat to the Internet and its users. These bots can be used in different malicious ways, such as issuing distributed denial of service attacks to shut down other networks and services, keystroke logging, spamming and traffic sniffing, all of which cause serious disruption to networks and users. New bots that use peer-to-peer (P2P) protocols are starting to appear as the upcoming threat to Internet security because P2P bots do not have a centralized point to shut down or trace back, which makes the detection of P2P bots a real challenge. In response to these threats, we present an algorithm to detect an individual P2P bot running on a system by correlating its activities. Our evaluation shows that correlating different activities generated by P2P bots within a specified time period can detect these kinds of bots.

preprint2010arXiv

Biological Inspiration for Artificial Immune Systems

Artificial immune systems (AISs) to date have generally been inspired by naive biological metaphors. This has limited the effectiveness of these systems. In this position paper two ways in which AISs could be made more biologically realistic are discussed. We propose that AISs should draw their inspiration from organisms which possess only innate immune systems, and that AISs should employ systemic models of the immune system to structure their overall design. An outline of plant and invertebrate immune systems is presented, and a number of contemporary research that more biologically-realistic AISs could have is also discussed.

preprint2010arXiv

Cheating for Problem Solving: A Genetic Algorithm with Social Interactions

We propose a variation of the standard genetic algorithm that incorporates social interaction between the individuals in the population. Our goal is to understand the evolutionary role of social systems and its possible application as a non-genetic new step in evolutionary algorithms. In biological populations, i.e. animals, even human beings and microorganisms, social interactions often affect the fitness of individuals. It is conceivable that the perturbation of the fitness via social interactions is an evolutionary strategy to avoid becoming trapped in a local optimum, thus avoiding a fast convergence of the population. We model the social interactions according to Game Theory. The population is, therefore, composed of cooperator and defector individuals whose interactions produce payoffs according to well known game models (prisoner's dilemma, chicken game, and others). Our results on Knapsack problems show, for some game models, a significant performance improvement as compared to a standard genetic algorithm.

preprint2010arXiv

Comparing Simulation Output Accuracy of Discrete Event and Agent Based Models: A Quantitive Approach

In our research we investigate the output accuracy of discrete event simulation models and agent based simulation models when studying human centric complex systems. In this paper we focus on human reactive behaviour as it is possible in both modelling approaches to implement human reactive behaviour in the model by using standard methods. As a case study we have chosen the retail sector, and here in particular the operations of the fitting room in the womenswear department of a large UK department store. In our case study we looked at ways of determining the efficiency of implementing new management policies for the fitting room operation through modelling the reactive behaviour of staff and customers of the department. First, we have carried out a validation experiment in which we compared the results from our models to the performance of the real system. This experiment also allowed us to establish differences in output accuracy between the two modelling methods. In a second step a multi-scenario experiment was carried out to study the behaviour of the models when they are used for the purpose of operational improvement. Overall we have found that for our case study example both discrete event simulation and agent based simulation have the same potential to support the investigation into the efficiency of implementing new management policies.

preprint2010arXiv

Cooperative Automated Worm Response and Detection Immune Algorithm

The role of T-cells within the immune system is to confirm and assess anomalous situations and then either respond to or tolerate the source of the effect. To illustrate how these mechanisms can be harnessed to solve real-world problems, we present the blueprint of a T-cell inspired algorithm for computer security worm detection. We show how the three central T-cell processes, namely T-cell maturation, differentiation and proliferation, naturally map into this domain and further illustrate how such an algorithm fits into a complete immune inspired computer security system and framework.

preprint2010arXiv

DCA for Bot Detection

Ensuring the security of computers is a non-trivial task, with many techniques used by malicious users to compromise these systems. In recent years a new threat has emerged in the form of networks of hijacked zombie machines used to perform complex distributed attacks such as denial of service and to obtain sensitive data such as password information. These zombie machines are said to be infected with a 'bot' - a malicious piece of software which is installed on a host machine and is controlled by a remote attacker, termed the 'botmaster of a botnet'. In this work, we use the biologically inspired Dendritic Cell Algorithm (DCA) to detect the existence of a single bot on a compromised host machine. The DCA is an immune-inspired algorithm based on an abstract model of the behaviour of the dendritic cells of the human body. The basis of anomaly detection performed by the DCA is facilitated using the correlation of behavioural attributes such as keylogging and packet flooding behaviour. The results of the application of the DCA to the detection of a single bot show that the algorithm is a successful technique for the detection of such malicious software without responding to normally running programs.

preprint2010arXiv

Dendritic Cells for Anomaly Detection

Artificial immune systems, more specifically the negative selection algorithm, have previously been applied to intrusion detection. The aim of this research is to develop an intrusion detection system based on a novel concept in immunology, the Danger Theory. Dendritic Cells (DCs) are antigen presenting cells and key to the activation of the human immune system. DCs collect signals from the host tissue and correlate these signals with proteins known as antigens. In algorithmic terms, individual DCs perform multi-sensor data fusion based on time-windows. The whole population of DCs asynchronously correlates the fused signals with a secondary data stream. The behaviour of human DCs is abstracted to form the DC Algorithm (DCA), which is implemented using an immune inspired framework, libtissue. This system is used to detect context switching for a basic machine learning dataset and to detect outgoing portscans in real-time. Experimental results show a significant difference between an outgoing portscan and normal traffic.

preprint2010arXiv

Dendritic Cells for Real-Time Anomaly Detection

Dendritic Cells (DCs) are innate immune system cells which have the power to activate or suppress the immune system. The behaviour of human DCs is abstracted to form an algorithm suitable for anomaly detection. We test this algorithm on the real-time problem of port scan detection. Our results show a significant difference in artificial DC behaviour for an outgoing portscan when compared to behaviour for normal processes.

preprint2010arXiv

Dendritic Cells for SYN Scan Detection

Artificial immune systems have previously been applied to the problem of intrusion detection. The aim of this research is to develop an intrusion detection system based on the function of Dendritic Cells (DCs). DCs are antigen presenting cells and key to activation of the human immune system, behaviour which has been abstracted to form the Dendritic Cell Algorithm (DCA). In algorithmic terms, individual DCs perform multi-sensor data fusion, asynchronously correlating the fused data signals with a secondary data stream. The aggregate output of a population of cells is analysed and forms the basis of an anomaly detection system. In this paper the DCA is applied to the detection of outgoing port scans using TCP SYN packets. Results show that detection can be achieved with the DCA, yet some false positives can be encountered when simultaneously scanning and using other network services. Suggestions are made for using adaptive signals to alleviate this uncovered problem.

preprint2010arXiv

Detecting Anomalous Process Behaviour using Second Generation Artificial Immune Systems

Artificial Immune Systems have been successfully applied to a number of problem domains including fault tolerance and data mining, but have been shown to scale poorly when applied to computer intrusion detection despite the fact that the biological immune system is a very effective anomaly detector. This may be because AIS algorithms have previously been based on the adaptive immune system and biologically-naive models. This paper focuses on describing and testing a more complex and biologically-authentic AIS model, inspired by the interactions between the innate and adaptive immune systems. Its performance on a realistic process anomaly detection problem is shown to be better than standard AIS methods (negative-selection), policy-based anomaly detection methods (systrace), and an alternative innate AIS approach (the DCA). In addition, it is shown that runtime information can be used in combination with system call information to enhance detection capability.

preprint2010arXiv

Detecting Botnets Through Log Correlation

Botnets, which consist of thousands of compromised machines, can cause significant threats to other systems by launching Distributed Denial of Service (DDoS) attacks, keylogging, and backdoors. In response to these threats, new effective techniques are needed to detect the presence of botnets. In this paper, we have used an interception technique to monitor Windows Application Programming Interface (API) function calls made by communication applications and store these calls with their arguments in log files. Our algorithm detects botnets based on monitoring abnormal activity by correlating the changes in log file sizes from different hosts.
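The idea of correlating changes in log file sizes across hosts can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the correlation statistic or any threshold, so the Pearson coefficient, the 0.9 cut-off, the sample data, and the names `size_changes`, `pearson` and `correlated_hosts` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def size_changes(sizes):
    """Turn a series of log file sizes into per-interval growth values."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return covariance / sqrt(var_x * var_y)


def correlated_hosts(log_sizes, threshold=0.9):
    """Return host pairs whose log growth patterns correlate at or above the threshold."""
    changes = {host: size_changes(sizes) for host, sizes in log_sizes.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(changes), 2)
        if pearson(changes[a], changes[b]) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical sizes in bytes sampled at equal intervals on three hosts.
    samples = {
        "host_a": [0, 10, 10, 40, 40, 90],
        "host_b": [5, 25, 25, 85, 85, 185],
        "host_c": [0, 5, 10, 15, 20, 25],
    }
    print(correlated_hosts(samples))
```

Here the two hosts whose logs grow in synchronized bursts are flagged as a pair, while the steadily growing log is not.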

preprint2010arXiv

Detecting Bots Based on Keylogging Activities

A bot is a piece of software that is usually installed on an infected machine without the user's knowledge. A bot is controlled remotely by the attacker under a Command and Control structure. Recent statistics show that bots represent one of the fastest growing threats to our network by performing malicious activities such as email spamming or keylogging. However, few bot detection techniques have been developed to date. In this paper, we investigate a behavioural algorithm to detect a single bot that uses keylogging activity. Our approach involves the use of function call analysis for the detection of the bot with a keylogging component. Correlation of the frequency of a specified time-window is performed to enhance the detection scheme. We perform a range of experiments with the spybot. Our results show that there is a high correlation between some function calls executed by this bot which indicates abnormal activity in our system.

preprint2010arXiv

Detecting Danger: Applying a Novel Immunological Concept to Intrusion Detection Systems

In recent years computer systems have become increasingly complex and consequently the challenge of protecting these systems has become increasingly difficult. Various techniques have been implemented to counteract the misuse of computer systems in the form of firewalls, anti-virus software and intrusion detection systems. The complexity of networks and dynamic nature of computer systems leaves current methods with significant room for improvement. Computer scientists have recently drawn inspiration from mechanisms found in biological systems and, in the context of computer security, have focused on the human immune system (HIS). The human immune system provides a high level of protection from constant attacks. By examining the precise mechanisms of the human immune system, it is hoped the paradigm will improve the performance of real intrusion detection systems. This paper presents an introduction to recent developments in the field of immunology. It discusses the incorporation of a novel immunological paradigm, Danger Theory, and how this concept is inspiring artificial immune systems (AIS). Applications within the context of computer security are outlined drawing direct reference to the underlying principles of Danger Theory and finally, the current state of intrusion detection systems is discussed and improvements suggested.

preprint2010arXiv

Detecting Danger: The Dendritic Cell Algorithm

The Dendritic Cell Algorithm (DCA) is inspired by the function of the dendritic cells of the human immune system. In nature, dendritic cells are the intrusion detection agents of the human body, policing the tissue and organs for potential invaders in the form of pathogens. In this research, an abstract model of DC behaviour is developed and subsequently used to form an algorithm, the DCA. The abstraction process was facilitated through close collaboration with laboratory-based immunologists, who performed bespoke experiments, the results of which are used as an integral part of this algorithm. The DCA is a population based algorithm, with each agent in the system represented as an 'artificial DC'. Each DC has the ability to combine multiple data streams and can add context to data suspected as anomalous. In this chapter the abstraction process and details of the resultant algorithm are given. The algorithm is applied to numerous intrusion detection problems in computer security including the detection of port scans and botnets, where it has produced impressive results with relatively low rates of false positives.

preprint2010arXiv

Detecting Motifs in System Call Sequences

The search for patterns or motifs in data represents an area of key interest to many researchers. In this paper we present the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs which repeat within time series data. The power of the algorithm is derived from its use of a small number of parameters with minimal assumptions. The algorithm searches from a completely neutral perspective that is independent of the data being analysed, and the underlying motifs. In this paper the motif tracking algorithm is applied to the search for patterns within sequences of low level system calls between the Linux kernel and the operating system's user space. The MTA is able to compress data found in large system call data sets to a limited number of motifs which summarise that data. The motifs provide a resource from which a profile of executed processes can be built. The potential for these profiles and new implications for security research are highlighted. A higher level call system language for measuring similarity between patterns of such calls is also suggested.

preprint2010arXiv

Development of a Cargo Screening Process Simulator: A First Approach

The efficiency of current cargo screening processes at sea and air ports is largely unknown as few benchmarks exist against which they could be measured. Some manufacturers provide benchmarks for individual sensors but we found no benchmarks that take a holistic view of the overall screening procedures and no benchmarks that take operator variability into account. Just adding up resources and manpower used is not an effective way of assessing systems where human decision-making and operator compliance with rules play a vital role. Our aim is to develop a decision support tool (cargo-screening system simulator) that will map the right technology and manpower to the right commodity-threat combination in order to maximise detection rates. In this paper we present our ideas for developing such a system and highlight the research challenges we have identified. Then we introduce our first case study and report on the progress we have made so far.

preprint2010arXiv

Experimenting with Innate Immunity

In a previous paper the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline for a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step - the libtissue system.

preprint2010arXiv

Exploration Of The Dendritic Cell Algorithm Using The Duration Calculus

As one of the newest members in Artificial Immune Systems (AIS), the Dendritic Cell Algorithm (DCA) has been applied to a range of problems. These applications mainly belong to the field of anomaly detection. However, real-time detection, a new challenge to anomaly detection, requires improvement on the real-time capability of the DCA. To assess such capability, formal methods in the research of real-time systems can be employed. The findings of the assessment can provide guidelines for the future development of the algorithm. Therefore, in this paper we use an interval logic based method, named the Duration Calculus (DC), to specify a simplified single-cell model of the DCA. Based on the DC specifications with further induction, we find that each individual cell in the DCA can perform its function as a detector in real-time. Since the DCA can be seen as many such cells operating in parallel, it is potentially capable of performing real-time detection. However, the analysis process of the standard DCA constricts its real-time capability. As a result, we conclude that the analysis process of the standard DCA should be replaced by a real-time analysis component, which can perform periodic analysis for the purpose of real-time detection.

preprint2010arXiv

Further Exploration of the Dendritic Cell Algorithm: Antigen Multiplier and Time Windows

As an immune-inspired algorithm, the Dendritic Cell Algorithm (DCA) produces promising performances in the field of anomaly detection. This paper presents the application of the DCA to a standard data set, the KDD 99 data set. The results of different implementation versions of the DCA, including the antigen multiplier and moving time windows, are reported. The real-valued Negative Selection Algorithm (NSA) using constant-sized detectors and the C4.5 decision tree algorithm are used to conduct a baseline comparison. The results suggest that the DCA is applicable to the KDD 99 data set, and the antigen multiplier and moving time windows have the same effect on the DCA for this particular data set. The real-valued NSA with constant-sized detectors is not applicable to the data set, and the C4.5 decision tree algorithm provides a benchmark of the classification performance for this data set.

preprint2010arXiv

Genetic Algorithms for Multiple-Choice Problems

This thesis investigates the use of problem-specific knowledge to enhance a genetic algorithm approach to multiple-choice optimisation problems. It shows that such information can significantly enhance performance, but that the choice of information and the way it is included are important factors for success. Two multiple-choice problems are considered. The first is constructing a feasible nurse roster that considers as many requests as possible. In the second problem, shops are allocated to locations in a mall subject to constraints and maximising the overall income. Genetic algorithms are chosen for their well-known robustness and ability to solve large and complex discrete optimisation problems. However, a survey of the literature reveals room for further research into generic ways to include constraints into a genetic algorithm framework. Hence, the main theme of this work is to balance feasibility and cost of solutions. In particular, co-operative co-evolution with hierarchical sub-populations, problem structure exploiting repair schemes and indirect genetic algorithms with self-adjusting decoder functions are identified as promising approaches. The research starts by applying standard genetic algorithms to the problems and explaining the failure of such approaches due to epistasis. To overcome this, problem-specific information is added in a variety of ways, some of which are designed to increase the number of feasible solutions found whilst others are intended to improve the quality of such solutions. As well as a theoretical discussion as to the underlying reasons for using each operator, extensive computational experiments are carried out on a variety of data. These show that the indirect approach relies less on problem structure and hence is easier to implement and superior in solution quality.

preprint2010arXiv

Information Fusion for Anomaly Detection with the Dendritic Cell Algorithm

Dendritic cells are antigen presenting cells that provide a vital link between the innate and adaptive immune system, providing the initial detection of pathogenic invaders. Research into this family of cells has revealed that they perform information fusion which directs immune responses. We have derived a Dendritic Cell Algorithm based on the functionality of these cells, by modelling the biological signals and differentiation pathways to build a control mechanism for an artificial immune system. We present algorithmic details in addition to experimental results, when the algorithm was applied to anomaly detection for the detection of port scans. The results show the Dendritic Cell Algorithm is successful at detecting port scans.

preprint2010arXiv

Information Fusion in the Immune System

Biologically-inspired methods such as evolutionary algorithms and neural networks are proving useful in the field of information fusion. Artificial Immune Systems (AISs) are a biologically-inspired approach which takes inspiration from the biological immune system. Interestingly, recent research has shown how AISs which use multi-level information sources as input data can be used to build effective algorithms for real time computer intrusion detection. This research is based on biological information fusion mechanisms used by the human immune system and as such might be of interest to the information fusion community. The aim of this paper is to present a summary of some of the biological information fusion mechanisms seen in the human immune system, and of how these mechanisms have been implemented as AISs.

preprint2010arXiv

Integrating Innate and Adaptive Immunity for Intrusion Detection

Network Intrusion Detection Systems (NIDS) monitor a network with the aim of discerning malicious from benign activity on that network. While a wide range of approaches have met varying levels of success, most IDSs rely on having access to a database of known attack signatures which are written by security experts. Nowadays, in order to solve problems with false positive alerts, correlation algorithms are used to add additional structure to sequences of IDS alerts. However, such techniques are of no help in discovering novel attacks or variations of known attacks, something the human immune system (HIS) is capable of doing in its own specialised domain. This paper presents a novel immune algorithm for application to an intrusion detection problem. The goal is to discover packets containing novel variations of attacks covered by an existing signature base.

preprint2010arXiv

Integrating Real-Time Analysis With The Dendritic Cell Algorithm Through Segmentation

As an immune inspired algorithm, the Dendritic Cell Algorithm (DCA) has been applied to a range of problems, particularly in the area of intrusion detection. Ideally, the intrusion detection should be performed in real-time, to continuously detect misuses as soon as they occur. Consequently, the analysis process performed by an intrusion detection system must operate in real-time or near-to real-time. The analysis process of the DCA is currently performed offline, therefore to improve the algorithm's performance we suggest the development of a real-time analysis component. The initial step of the development is to apply segmentation to the DCA. This involves segmenting the current output of the DCA into slices and performing the analysis in various ways. Two segmentation approaches are introduced and tested in this paper, namely antigen based segmentation (ABS) and time based segmentation (TBS). The results of the corresponding experiments suggest that applying segmentation produces different and significantly better results in some cases, when compared to the standard DCA without segmentation. Therefore, we conclude that the segmentation is applicable to the DCA for the purpose of real-time analysis.
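
The two segmentation ideas named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout `(timestamp, antigen, flagged)`, the function names, and the per-antigen "flagged fraction" used as the per-slice analysis are all assumptions made for illustration, since the abstract does not specify how each slice is analysed.

```python
from collections import defaultdict

def time_based_segments(records, window):
    """Group (timestamp, antigen, flagged) records into fixed-length time slices.

    Assumption: `window` is the slice length in the same unit as the timestamps.
    """
    segments = defaultdict(list)
    for timestamp, antigen, flagged in records:
        segments[int(timestamp // window)].append((antigen, flagged))
    # Return the non-empty slices in chronological order.
    return [segments[key] for key in sorted(segments)]

def antigen_based_segments(records, size):
    """Group records into slices that each hold at most `size` antigen presentations."""
    ordered = sorted(records, key=lambda record: record[0])
    return [
        [(antigen, flagged) for _, antigen, flagged in ordered[i:i + size]]
        for i in range(0, len(ordered), size)
    ]

def flagged_fraction(segment):
    """Hypothetical per-slice analysis: fraction of flagged presentations per antigen."""
    totals = defaultdict(int)
    flagged_counts = defaultdict(int)
    for antigen, flagged in segment:
        totals[antigen] += 1
        flagged_counts[antigen] += int(flagged)
    return {antigen: flagged_counts[antigen] / totals[antigen] for antigen in totals}

# Hypothetical output records: (timestamp, antigen id, presented in anomalous context?)
records = [(0.5, "p1", True), (1.2, "p1", False), (1.8, "p2", True), (3.1, "p1", True)]
tbs = time_based_segments(records, window=1.0)
abs_slices = antigen_based_segments(records, size=2)
per_slice = [flagged_fraction(segment) for segment in abs_slices]
```

Analysing each slice as soon as it closes, rather than waiting for the full run, is what would let such an analysis operate periodically.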

preprint2010arXiv

Introducing Dendritic Cells as a Novel Immune-Inspired Algorithm for Anomaly Detection

Dendritic cells are antigen presenting cells that provide a vital link between the innate and adaptive immune system. Research into this family of cells has revealed that they perform the role of coordinating T-cell based immune responses, both reactive and for generating tolerance. We have derived an algorithm based on the functionality of these cells, and have used the signals and differentiation pathways to build a control mechanism for an artificial immune system. We present our algorithmic details in addition to some preliminary results, where the algorithm was applied for the purpose of anomaly detection. We hope that this algorithm will eventually become the key component within a large, distributed immune system, based on sound immunological concepts.

preprint2010arXiv

Investigating Output Accuracy for a Discrete Event Simulation Model and an Agent Based Simulation Model

In this paper, we investigate output accuracy for a Discrete Event Simulation (DES) model and Agent Based Simulation (ABS) model. The purpose of this investigation is to find out which of these simulation techniques is the best one for modelling human reactive behaviour in the retail sector. In order to study the output accuracy in both models, we have carried out a validation experiment in which we compared the results from our simulation models to the performance of a real system. Our experiment was carried out using a large UK department store as a case study. We had to determine an efficient implementation of management policy in the store's fitting room using DES and ABS. Overall, we have found that both simulation models were a good representation of the real system when modelling human reactive behaviour.

preprint2010arXiv

libtissue - implementing innate immunity

In a previous paper the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline for a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step - the libtissue system.

preprint2010arXiv

Malicious Code Execution Detection and Response Immune System inspired by the Danger Theory

The analysis of system calls is one method employed by anomaly detection systems to recognise malicious code execution. Similarities can be drawn between this process and the behaviour of certain cells belonging to the human immune system, and can be applied to construct an artificial immune system. A recently developed hypothesis in immunology, the Danger Theory, states that our immune system responds to the presence of intruders through sensing molecules belonging to those invaders, plus signals generated by the host indicating danger and damage. We propose the incorporation of this concept into a responsive intrusion detection system, where behavioural information of the system and running processes is combined with information regarding individual system calls.

preprint2010arXiv

Mimicking the Behaviour of Idiotypic AIS Robot Controllers Using Probabilistic Systems

Previous work has shown that robot navigation systems that employ an architecture based upon the idiotypic network theory of the immune system have an advantage over control techniques that rely on reinforcement learning only. This is thought to be a result of intelligent behaviour selection on the part of the idiotypic robot. In this paper an attempt is made to imitate idiotypic dynamics by creating controllers that use reinforcement with a number of different probabilistic schemes to select robot behaviour. The aims are to show that the idiotypic system is not merely performing some kind of periodic random behaviour selection, and to try to gain further insight into the processes that govern the idiotypic mechanism. Trials are carried out using simulated Pioneer robots that undertake navigation exercises. Results show that a scheme that boosts the probability of selecting highly-ranked alternative behaviours to 50% during stall conditions comes closest to achieving the properties of the idiotypic system, but remains unable to match it in terms of all round performance.

preprint2010arXiv

Modelling and simulating retail management practices: a first approach

Multi-agent systems offer a new and exciting way of understanding the world of work. We apply agent-based modeling and simulation to investigate a set of problems in a retail context. Specifically, we are working to understand the relationship between people management practices on the shop-floor and retail performance. Despite the fact we are working within a relatively novel and complex domain, it is clear that using an agent-based approach offers great potential for improving organizational capabilities in the future. Our multi-disciplinary research team has worked closely with one of the UK's top ten retailers to collect data and build an understanding of shop-floor operations and the key actors in a department (customers, staff, and managers). Based on this case study we have built and tested our first version of a retail branch agent-based simulation model where we have focused on how we can simulate the effects of people management practices on customer satisfaction and sales. In our experiments we have looked at employee development and cashier empowerment as two examples of shop floor management practices. In this paper we describe the underlying conceptual ideas and the features of our simulation model. We present a selection of experiments we have conducted in order to validate our simulation model and to show its potential for answering "what-if" questions in a retail context. We also introduce a novel performance measure which we have created to quantify customers' satisfaction with service, based on their individual shopping experiences.

preprint2010arXiv

Modelling Immunological Memory

Accurate immunological models offer the possibility of performing high-throughput experiments in silico that can predict, or at least suggest, in vivo phenomena. In this chapter, we compare various models of immunological memory. We first validate an experimental immunological simulator, developed by the authors, by simulating several theories of immunological memory with known results. We then use the same system to evaluate the predicted effects of a theory of immunological memory. The resulting model has not been explored before in artificial immune systems research, and we compare the simulated in silico output with in vivo measurements. Although the theory appears valid, we suggest that there is a common set of reasons why immunological memory models are a useful support tool; not conclusive in themselves.

preprint2010arXiv

Modelling Reactive and Proactive Behaviour in Simulation

This research investigated the behaviour of a traditional discrete event simulation model and a combined discrete event and agent based simulation model when modelling human reactive and proactive behaviour in human centric complex systems. A department store was chosen as a human centric complex case study, in which the operation of a fitting room in the WomensWear department was investigated. We looked at ways to determine the efficiency of new management policies for the fitting room operation by simulating the reactive and proactive behaviour of staff towards customers. Once the simulation models had been developed and verified, we carried out a validation experiment in the form of a sensitivity analysis. Subsequently, we executed a statistical analysis in which the mixed reactive and proactive behaviour experimental results were compared with some reactive experimental results from previously published works. Overall, this case study found that simple proactive individual behaviour could be modelled in both simulation models. In addition, we found that the traditional discrete event model produced similar output to the combined discrete event and agent based simulation model when modelling similar human behaviour.

preprint2010arXiv

Motif Detection Inspired by Immune Memory

preprint2010arXiv

Multi-Agent Simulation and Management Practices

Intelligent agents offer a new and exciting way of understanding the world of work. Agent-Based Simulation (ABS), one way of using intelligent agents, carries great potential for progressing our understanding of management practices and how they link to retail performance. We have developed simulation models based on research by a multi-disciplinary team of economists, work psychologists and computer scientists. We will discuss our experiences of implementing these concepts working with a well-known retail department store. There is no doubt that management practices are linked to the performance of an organisation (Reynolds et al., 2005; Wall & Wood, 2005). Best practices have been developed, but when it comes down to the actual application of these guidelines considerable ambiguity remains regarding their effectiveness within particular contexts (Siebers et al., forthcoming a). Most Operational Research (OR) methods can only be used as analysis tools once management practices have been implemented. Often they are not very useful for giving answers to speculative 'what-if' questions, particularly when one is interested in the development of the system over time rather than just the state of the system at a certain point in time. Simulation can be used to analyse the operation of dynamic and stochastic systems. ABS is particularly useful when complex interactions between system entities exist, such as autonomous decision making or negotiation. In an ABS model the researcher explicitly describes the decision process of simulated actors at the micro level. Structures emerge at the macro level as a result of the actions of the agents and their interactions with other agents and the environment. We will show how ABS experiments can deal with testing and optimising management practices such as training, empowerment or teamwork. Hence, questions such as "will staff setting their own break times improve performance?" can be investigated.

preprint2010arXiv

Nurse Rostering with Genetic Algorithms

In recent years genetic algorithms have emerged as a useful tool for the heuristic solution of complex discrete optimisation problems. In particular there has been considerable interest in their use in tackling problems arising in the areas of scheduling and timetabling. However, the classical genetic algorithm paradigm is not well equipped to handle constraints and successful implementations usually require some sort of modification to enable the search to exploit problem specific knowledge in order to overcome this shortcoming. This paper is concerned with the development of a family of genetic algorithms for the solution of a nurse rostering problem at a major UK hospital. The hospital is made up of wards of up to 30 nurses. Each ward has its own group of nurses whose shifts have to be scheduled on a weekly basis. In addition to fulfilling the minimum demand for staff over three daily shifts, nurses' wishes and qualifications have to be taken into account. The schedules must also be seen to be fair, in that unpopular shifts have to be spread evenly amongst all nurses, and other restrictions, such as team nursing and special conditions for senior staff, have to be satisfied. The basis of the family of genetic algorithms is a classical genetic algorithm consisting of n-point crossover, single-bit mutation and a rank-based selection. The solution space consists of all schedules in which each nurse works the required number of shifts, but the remaining constraints, both hard and soft, are relaxed and penalised in the fitness function. The talk will start with a detailed description of the problem and the initial implementation and will go on to highlight the shortcomings of such an approach, in terms of the key element of balancing feasibility, i.e. covering the demand and work regulations, and quality, as measured by the nurses' preferences. 
A series of experiments involving parameter adaptation, niching, intelligent weights, delta coding, local hill climbing, migration and special selection rules will then be outlined and it will be shown how a series of these enhancements were able to eradicate these difficulties. Results based on several months' real data will be used to measure the impact of each modification, and to show that the final algorithm is able to compete with a tabu search approach currently employed at the hospital. The talk will conclude with some observations as to the overall quality of this approach to this and similar problems.
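
The three classical operators named in the abstract above (n-point crossover, single-bit mutation and rank-based selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the hospital implementation: individuals are assumed to be plain bit lists, fitness is treated as a penalised cost to be minimised, and the linear rank weights are one common choice among several.

```python
import random

def n_point_crossover(parent_a, parent_b, n, rng):
    """Swap alternating segments of two equal-length bit lists at n random cut points."""
    cuts = sorted(rng.sample(range(1, len(parent_a)), n))
    child_1, child_2 = [], []
    swap, previous = False, 0
    for cut in cuts + [len(parent_a)]:
        source_1, source_2 = (parent_b, parent_a) if swap else (parent_a, parent_b)
        child_1.extend(source_1[previous:cut])
        child_2.extend(source_2[previous:cut])
        swap, previous = not swap, cut
    return child_1, child_2

def single_bit_mutation(bits, rng):
    """Return a copy of `bits` with exactly one randomly chosen bit flipped."""
    mutated = list(bits)
    mutated[rng.randrange(len(mutated))] ^= 1
    return mutated

def rank_based_selection(population, cost, rng):
    """Pick one individual with probability proportional to its rank.

    Assumption: lower cost is better, so the worst individual gets weight 1
    and the best gets weight len(population).
    """
    worst_first = sorted(population, key=cost, reverse=True)
    weights = list(range(1, len(worst_first) + 1))
    return rng.choices(worst_first, weights=weights, k=1)[0]

rng = random.Random(0)
zeros, ones = [0] * 8, [1] * 8
child_1, child_2 = n_point_crossover(zeros, ones, 2, rng)
mutant = single_bit_mutation(zeros, rng)
chosen = rank_based_selection([[0, 0], [0, 1], [1, 1]], cost=sum, rng=rng)
```

Selecting on rank rather than raw cost keeps selection pressure stable when penalty terms make the cost values vary over several orders of magnitude.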

preprint2010arXiv

Oil Price Trackers Inspired by Immune Memory

We outline initial concepts for an immune inspired algorithm to evaluate and predict oil price time series data. The proposed solution evolves a short term pool of trackers dynamically, with each member attempting to map trends and anticipate future price movements. Successful trackers feed into a long term memory pool that can generalise across repeating trend patterns. The resulting sequence of trackers, ordered in time, can be used as a forecasting tool. Examination of the pool of evolving trackers also provides valuable insight into the properties of the crude oil market.

preprint2010arXiv

Optimisation of a Crossdocking Distribution Centre Simulation Model

This paper reports on continuing research into the modelling of an order picking process within a Crossdocking distribution centre using Simulation Optimisation. The aim of this project is to optimise a discrete event simulation model and to understand factors that affect finding its optimal performance. Our initial investigation revealed that the precision of the selected simulation output performance measure and the number of replications required for the evaluation of the optimisation objective function through simulation influences the ability of the optimisation technique. We experimented with Common Random Numbers, in order to improve the precision of our simulation output performance measure, and intended to use the number of replications utilised for this purpose as the initial number of replications for the optimisation of our Crossdocking distribution centre simulation model. Our results demonstrate that we can improve the precision of our selected simulation output performance measure value using Common Random Numbers at various levels of replications. Furthermore, after optimising our Crossdocking distribution centre simulation model, we are able to achieve optimal performance using fewer simulation runs for the simulation model which uses Common Random Numbers as compared to the simulation model which does not use Common Random Numbers.
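
The effect of Common Random Numbers on precision can be shown with a toy Python sketch. It does not model the Crossdocking centre: the exponential "pick time" model, the two mean values, and all function names are assumptions for illustration. The point is only that feeding both configurations the same random stream per replication makes the estimated difference between them far less noisy.

```python
import random
import statistics

def mean_pick_time(mean, seed, n_orders=200):
    """Toy simulation: average of n_orders exponential pick times (hypothetical model)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(1.0 / mean) for _ in range(n_orders)) / n_orders

def difference_samples(use_crn, replications=30):
    """Estimate (config A - config B) once per replication."""
    differences = []
    for replication in range(replications):
        seed_a = replication
        # With CRN both configurations share a seed; otherwise B gets its own stream.
        seed_b = replication if use_crn else 10_000 + replication
        differences.append(mean_pick_time(1.0, seed_a) - mean_pick_time(0.9, seed_b))
    return differences

crn_variance = statistics.variance(difference_samples(use_crn=True))
independent_variance = statistics.variance(difference_samples(use_crn=False))
```

In this toy setting `crn_variance` comes out much smaller than `independent_variance`, which is why fewer replications are needed to reach a given precision when comparing configurations.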

preprint2010arXiv

Parcellation of fMRI Datasets with ICA and PLS - A Data Driven Approach

Inter-subject parcellation of functional Magnetic Resonance Imaging (fMRI) data based on a standard General Linear Model (GLM) and spectral clustering was recently proposed as a means to alleviate the issues associated with spatial normalization in fMRI. However, for all its appeal, a GLM-based parcellation approach introduces its own biases, in the form of a priori knowledge about the shape of Hemodynamic Response Function (HRF) and task-related signal changes, or about the subject behaviour during the task. In this paper, we introduce a data-driven version of the spectral clustering parcellation, based on Independent Component Analysis (ICA) and Partial Least Squares (PLS) instead of the GLM. First, a number of independent components are automatically selected. Seed voxels are then obtained from the associated ICA maps and we compute the PLS latent variables between the fMRI signal of the seed voxels (which covers regional variations of the HRF) and the principal components of the signal across all voxels. Finally, we parcellate all subjects' data with a spectral clustering of the PLS latent variables. We present results of the application of the proposed method on both single-subject and multi-subject fMRI datasets. Preliminary experimental results, evaluated with intra-parcel variance of GLM t-values and PLS derived t-values, indicate that this data-driven approach offers improvement in terms of parcellation accuracy over GLM based techniques.

preprint2010arXiv

PCA 4 DCA: The Application Of Principal Component Analysis To The Dendritic Cell Algorithm

As one of the newest members in the field of artificial immune systems (AIS), the Dendritic Cell Algorithm (DCA) is based on behavioural models of natural dendritic cells (DCs). Unlike other AIS, the DCA does not rely on training data; instead, domain or expert knowledge is required to predetermine the mapping between input signals from a particular instance to the three categories used by the DCA. This data preprocessing phase has received the criticism of having manually over-fitted the data to the algorithm, which is undesirable. Therefore, in this paper we have attempted to ascertain if it is possible to use principal component analysis (PCA) techniques to automatically categorise input data while still generating useful and accurate classification results. The integrated system is tested with a biometrics dataset for the stress recognition of automobile drivers. The experimental results have shown the application of PCA to the DCA for the purpose of automated data preprocessing is successful.

preprint2010arXiv

Performance Evaluation of DCA and SRC on a Single Bot Detection

Malicious users try to compromise systems using new techniques. One of the recent techniques used by the attacker is to perform complex distributed attacks such as denial of service and to obtain sensitive data such as password information. These compromised machines are said to be infected with malicious software termed a "bot". In this paper, we investigate the correlation of behavioural attributes such as keylogging and packet flooding behaviour to detect the existence of a single bot on a compromised machine by applying (1) Spearman's rank correlation (SRC) algorithm and (2) the Dendritic Cell Algorithm (DCA). We also compare the output results generated from these two methods to the detection of a single bot. The results show that the DCA has a better performance in detecting malicious activities.
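
Spearman's rank correlation, the first of the two methods named in the abstract above, can be written out in a few lines of standard-library Python. This is a generic sketch of the statistic, not the paper's detection pipeline: the per-window counts and their names are hypothetical, and ties are handled with average ranks.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-time-window counts of two monitored behaviours.
keystroke_calls = [3, 10, 25, 40, 42]
outbound_packets = [1, 4, 30, 90, 500]
rho = spearman(keystroke_calls, outbound_packets)
```

Because only the ordering of the values matters, a strongly non-linear but monotonic relationship between two behaviours still yields a coefficient close to 1.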

preprint2010arXiv

Price Trackers Inspired by Immune Memory

In this paper we outline initial concepts for an immune inspired algorithm to evaluate price time series data. The proposed solution evolves a short term pool of trackers dynamically through a process of proliferation and mutation, with each member attempting to map to trends in price movements. Successful trackers feed into a long term memory pool that can generalise across repeating trend patterns. Tests are performed to examine the algorithm's ability to successfully identify trends in a small data set. The influence of the long term memory pool is then examined. We find the algorithm is able to identify price trends presented successfully and efficiently.

preprint2010arXiv

Real-Time Alert Correlation with Type Graphs

The premise of automated alert correlation is to accept that false alerts from a low level intrusion detection system are inevitable and use attack models to explain the output in an understandable way. Several algorithms exist for this purpose which use attack graphs to model the ways in which attacks can be combined. These algorithms can be classified into two broad categories: scenario-graph approaches, which create an attack model starting from a vulnerability assessment, and type-graph approaches, which rely on an abstract model of the relations between attack types. Some research into improving the efficiency of type-graph correlation has been carried out, but this research has ignored the hypothesizing of missing alerts. Our work is to present a novel type-graph algorithm which unifies correlation and hypothesizing into a single operation. Our experimental results indicate that the approach is extremely efficient in the face of intensive alerts and produces compact output graphs comparable to other techniques.

preprint2010arXiv

Simulating Customer Experience and Word Of Mouth in Retail - A Case Study

Agents offer a new and exciting way of understanding the world of work. In this paper we describe the development of agent-based simulation models, designed to help to understand the relationship between people management practices and retail performance. We report on the current development of our simulation models which includes new features concerning the evolution of customers over time. To test the features we have conducted a series of experiments dealing with customer pool sizes, standard and noise reduction modes, and the spread of customers' word of mouth. To validate and evaluate our model, we introduce new performance measures specific to retail operations. We show that by varying different parameters in our model we can simulate a range of customer experiences leading to significant differences in performance measures. Ultimately, we are interested in better understanding the impact of changes in staff behavior due to changes in store management practices. Our multi-disciplinary research team draws upon expertise from work psychologists and computer scientists. Despite the fact we are working within a relatively novel and complex domain, it is clear that intelligent agents offer potential for fostering sustainable organizational capabilities in the future.

preprint2010arXiv

STORM - A Novel Information Fusion and Cluster Interpretation Technique

Analysis of data without labels is commonly subject to scrutiny by unsupervised machine learning techniques. Such techniques provide more meaningful representations, useful for better understanding of a problem at hand, than by looking only at the data itself. Although abundant expert knowledge exists in many areas where unlabelled data is examined, such knowledge is rarely incorporated into automatic analysis. Incorporation of expert knowledge is frequently a matter of combining multiple data sources from disparate hypothetical spaces. In cases where such spaces belong to different data types, this task becomes even more challenging. In this paper we present a novel immune-inspired method that enables the fusion of such disparate types of data for a specific set of problems. We show that our method provides a better visual understanding of one hypothetical space with the help of data from another hypothetical space. We believe that our model has implications for the field of exploratory data analysis and knowledge discovery.

preprint2010arXiv

System Dynamics Modelling of the Processes Involving the Maintenance of the Naive T Cell Repertoire

The study of immune system aging, i.e. immunosenescence, is a relatively new research topic. It deals with understanding the processes of immunodegradation that indicate signs of functionality loss possibly leading to death. Even though it is not possible to prevent immunosenescence, there is great benefit in comprehending its causes, which may help to reverse some of the damage done and thus improve life expectancy. One of the main factors influencing the process of immunosenescence is the number and phenotypical variety of naive T cells in an individual. This work presents a review of immunosenescence, proposes system dynamics modelling of the processes involving the maintenance of the naive T cell repertoire and presents some preliminary results.

preprint2010arXiv

Tailored RF pulse optimization for magnetization inversion at ultra high field

The radiofrequency (RF) transmit field is severely inhomogeneous at ultrahigh field due to both RF penetration and RF coil design issues. This particularly impairs image quality for sequences that use inversion pulses such as magnetization prepared rapid acquisition gradient echo and limits the use of quantitative arterial spin labeling sequences such as flow-attenuated inversion recovery. Here we have used a search algorithm to produce inversion pulses tailored to take into account the heterogeneity of the RF transmit field at 7 T. This created a slice selective inversion pulse that worked well (good slice profile and uniform inversion) over the range of RF amplitudes typically obtained in the head at 7 T while still maintaining an experimentally achievable pulse length and pulse amplitude in the brain at 7 T. The pulses used were based on the frequency offset correction inversion technique, as well as time dilation of functions, but the RF amplitude, frequency sweep, and gradient functions were all generated using a genetic algorithm with an evaluation function that took into account both the desired inversion profile and the transmit field inhomogeneity.

preprint2010arXiv

The Application of a Dendritic Cell Algorithm to a Robotic Classifier

The dendritic cell algorithm is an immune-inspired technique for processing time-dependent data. Here we propose it as a possible solution for a robotic classification problem. The dendritic cell algorithm is implemented on a real robot and an investigation is performed into the effects of varying the migration threshold median for the cell population. The algorithm performs well on a classification task with very little tuning. Ways of extending the implementation to allow it to be used as a classifier within the field of robotic security are suggested.

preprint2010arXiv

The DCA:SOMe Comparison: A comparative study between two biologically-inspired algorithms

The Dendritic Cell Algorithm (DCA) is an immune-inspired algorithm, developed for the purpose of anomaly detection. The algorithm performs multi-sensor data fusion and correlation which results in a 'context aware' detection system. Previous applications of the DCA have included the detection of potentially malicious port scanning activity, where it has produced high rates of true positives and low rates of false positives. In this work we aim to compare the performance of the DCA and of a Self-Organizing Map (SOM) when applied to the detection of SYN port scans, through experimental analysis. A SOM is an ideal candidate for comparison as it shares similarities with the DCA in terms of the data fusion method employed. It is shown that the results of the two systems are comparable, and both produce false positives for the same processes. This shows that the DCA can produce anomaly detection results to the same standard as an established technique.

preprint2010arXiv

The Deterministic Dendritic Cell Algorithm

The Dendritic Cell Algorithm is an immune-inspired algorithm originally based on the function of natural dendritic cells. The original instantiation of the algorithm is a highly stochastic algorithm. While the performance of the algorithm is good when applied to large real-time datasets, it is difficult to analyse due to the number of random-based elements. In this paper a deterministic version of the algorithm is proposed, implemented and tested using a port scan dataset to provide a controllable system. This version consists of a controllable number of parameters, which are experimented with in this paper. In addition, the effects of the use of time windows and of variation in the number of cells are examined, both of which are shown to influence the algorithm. Finally, a novel metric for the assessment of the algorithm's output is introduced and proves to be a more sensitive metric than the metric used with the original Dendritic Cell Algorithm.

preprint2010arXiv

The Motif Tracking Algorithm

The search for patterns or motifs in data represents a problem area of key interest to finance and economic researchers. In this paper we introduce the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs of a non specified length which repeat within time series data. The power of the algorithm comes from the fact that it uses a small number of parameters with minimal assumptions regarding the data being examined or the underlying motifs. Our interest lies in applying the algorithm to financial time series data to identify unknown patterns that exist. The algorithm is tested using three separate data sets. Particular suitability to financial data is shown by applying it to oil price data. In all cases the algorithm identifies the presence of a motif population in a fast and efficient manner due to the utilisation of an intuitive symbolic representation. The resulting population of motifs is shown to have considerable potential value for other applications such as forecasting and algorithm seeding.

preprint2010arXiv

The Transfer of Evolved Artificial Immune System Behaviours between Small and Large Scale Robotic Platforms

This paper demonstrates that a set of behaviours evolved in simulation on a miniature robot (epuck) can be transferred to a much larger scale platform (a virtual Pioneer P3-DX) that also differs in shape, sensor type, sensor configuration and programming interface. The chosen architecture uses a reinforcement learning-assisted genetic algorithm to evolve the epuck behaviours, which are encoded as a genetic sequence. This sequence is then used by the Pioneers as part of an adaptive, idiotypic artificial immune system (AIS) control architecture. Testing in three different simulated worlds shows that the Pioneer can use these behaviours to navigate and solve object-tracking tasks successfully, as long as its adaptive AIS mechanism is in place.

preprint2010arXiv

The Use of Probabilistic Systems to Mimic the Behaviour of Idiotypic AIS Robot Controllers

preprint2010arXiv

ToLeRating UR-STD

A new emerging paradigm of Uncertain Risk of Suspicion, Threat and Danger, observed across the field of information security, is described. Based on this paradigm a novel approach to anomaly detection is presented. Our approach is based on a simple yet powerful analogy from the innate part of the human immune system, the Toll-Like Receptors. We argue that such receptors incorporated as part of an anomaly detector enhance the detector's ability to distinguish normal and anomalous behaviour. In addition we propose that Toll-Like Receptors enable the classification of detected anomalies based on the types of attacks that perpetrate the anomalous behaviour. Classification of such type is either missing in existing literature or is not fit for the purpose of reducing the burden of an administrator of an intrusion detection system. For our model to work, we propose the creation of a taxonomy of the digital Acytota, based on which our receptors are created.

preprint2010arXiv

Towards a Conceptual Framework for Innate Immunity

Innate immunity now occupies a central role in immunology. However, artificial immune system models have largely been inspired by adaptive not innate immunity. This paper reviews the biological principles and properties of innate immunity and, adopting a conceptual framework, asks how these can be incorporated into artificial models. The aim is to outline a meta-framework for models of innate immunity.

preprint2010arXiv

Towards the Development of a Simulator for Investigating the Impact of People Management Practices on Retail Performance

Often models for understanding the impact of management practices on retail performance are developed under the assumption of stability, equilibrium and linearity, whereas retail operations are considered in reality to be dynamic, non-linear and complex. Alternatively, discrete event and agent-based modelling are approaches that allow the development of simulation models of heterogeneous non-equilibrium systems for testing out different scenarios. When developing simulation models one has to abstract and simplify from the real world, which means that one has to try and capture the 'essence' of the system required for developing a representation of the mechanisms that drive the progression in the real system. Simulation models can be developed at different levels of abstraction. To know the appropriate level of abstraction for a specific application is often more of an art than a science. We have developed a retail branch simulation model to investigate which level of model accuracy is required for such a model to obtain meaningful results for practitioners.

preprint2010arXiv

Two-Timescale Learning Using Idiotypic Behaviour Mediation For A Navigating Mobile Robot

A combined Short-Term Learning (STL) and Long-Term Learning (LTL) approach to solving mobile-robot navigation problems is presented and tested in both the real and virtual domains. The LTL phase consists of rapid simulations that use a Genetic Algorithm to derive diverse sets of behaviours, encoded as variable sets of attributes, and the STL phase is an idiotypic Artificial Immune System. Results from the LTL phase show that sets of behaviours develop very rapidly, and significantly greater diversity is obtained when multiple autonomous populations are used, rather than a single one. The architecture is assessed under various scenarios, including removal of the LTL phase and switching off the idiotypic mechanism in the STL phase. The comparisons provide substantial evidence that the best option is the inclusion of both the LTL phase and the idiotypic system. In addition, this paper shows that structurally different environments can be used for the two phases without compromising transferability.

preprint2009arXiv

A Component Based Heuristic Search Method with Evolutionary Eliminations

Nurse rostering is a complex scheduling problem that affects hospital personnel on a daily basis all over the world. This paper presents a new component-based approach with evolutionary eliminations, for a nurse scheduling problem arising at a major UK hospital. The main idea behind this technique is to decompose a schedule into its components (i.e. the allocated shift pattern of each nurse), and then to implement two evolutionary elimination strategies mimicking natural selection and natural mutation process on these components respectively to iteratively deliver better schedules. The worthiness of all components in the schedule has to be continuously demonstrated in order for them to remain there. This demonstration employs an evaluation function which evaluates how well each component contributes towards the final objective. Two elimination steps are then applied: the first elimination eliminates a number of components that are deemed not worthy to stay in the current schedule; the second elimination may also throw out, with a low level of probability, some worthy components. The eliminated components are replenished with new ones using a set of constructive heuristics using local optimality criteria. Computational results using 52 data instances demonstrate the applicability of the proposed approach in solving real-world problems.

preprint2009arXiv

An Agent Based Classification Model

The major function of this model is to access the UCI Wisconsin Breast Cancer data-set[1] and classify the data items into two categories, which are normal and anomalous. This kind of classification can be referred to as anomaly detection, which discriminates anomalous behaviour from normal behaviour in computer systems. One popular solution for anomaly detection is Artificial Immune Systems (AIS). AIS are adaptive systems inspired by theoretical immunology and observed immune functions, principles and models which are applied to problem solving. The Dendritic Cell Algorithm (DCA)[2] is an AIS algorithm that is developed specifically for anomaly detection. It has been successfully applied to intrusion detection in computer security. It is believed that agent-based modelling is an ideal approach for implementing AIS, as intelligent agents could be the perfect representations of immune entities in AIS. This model evaluates the feasibility of re-implementing the DCA in an agent-based simulation environment called AnyLogic, where the immune entities in the DCA are represented by intelligent agents. If this model can be successfully implemented, it makes it possible to implement more complicated and adaptive AIS models in the agent-based simulation environment.

preprint2009arXiv

An Evolutionary Squeaky Wheel Optimisation Approach to Personnel Scheduling

The quest for robust heuristics that are able to solve more than one problem is ongoing. In this paper, we present, discuss and analyse a technique called Evolutionary Squeaky Wheel Optimisation and apply it to two different personnel scheduling problems. Evolutionary Squeaky Wheel Optimisation improves the original Squeaky Wheel Optimisation's effectiveness and execution speed by incorporating two extra steps (Selection and Mutation) for added evolution. In the Evolutionary Squeaky Wheel Optimisation, a cycle of Analysis-Selection-Mutation-Prioritization-Construction continues until stopping conditions are reached. The aim of the Analysis step is to identify below average solution components by calculating a fitness value for all components. The Selection step then chooses amongst these underperformers and discards some probabilistically based on fitness. The Mutation step further discards a few components at random. Solutions can become incomplete and thus repairs may be required. The repairs are carried out by using the Prioritization step to first produce priorities that determine an order by which the following Construction step then schedules the remaining components. Therefore, improvement in the Evolutionary Squeaky Wheel Optimisation is achieved by selective solution disruption mixed with iterative improvement and constructive repair. Strong experimental results are reported on two different domains of personnel scheduling: bus and rail driver scheduling and hospital nurse scheduling.
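The Analysis-Selection-Mutation-Prioritization-Construction cycle described in this abstract can be sketched on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the task-to-worker assignment problem, the fitness definition, the discard probabilities, the greedy repair rule and all function names (`construct`, `total_cost`, `eswo`) are hypothetical choices made here for illustration.

```python
import random
from typing import List, Optional

Cost = List[List[float]]  # cost[task][worker]; a toy stand-in for a scheduling problem


def construct(assign: List[Optional[int]], order: List[int], cost: Cost, capacity: int) -> None:
    """Construction: give each task in `order` the cheapest worker with spare capacity."""
    load = [0] * len(cost[0])
    for w in assign:
        if w is not None:
            load[w] += 1
    for t in order:
        free = [w for w in range(len(load)) if load[w] < capacity]
        best = min(free, key=lambda w: cost[t][w])
        assign[t] = best
        load[best] += 1


def total_cost(assign: List[Optional[int]], cost: Cost) -> float:
    return sum(cost[t][w] for t, w in enumerate(assign))


def eswo(cost: Cost, capacity: int, iterations: int = 200,
         mutation_rate: float = 0.05, seed: int = 0) -> List[Optional[int]]:
    n, m = len(cost), len(cost[0])
    if n > m * capacity:
        raise ValueError("not enough total capacity for all tasks")
    rng = random.Random(seed)
    assign: List[Optional[int]] = [None] * n
    construct(assign, list(range(n)), cost, capacity)  # initial greedy solution
    best = assign[:]
    for _ in range(iterations):
        # Analysis: fitness in [0, 1]; 1.0 means the task has its cheapest worker.
        fitness = []
        for t, w in enumerate(assign):
            lo, hi = min(cost[t]), max(cost[t])
            fitness.append(1.0 if hi == lo else (hi - cost[t][w]) / (hi - lo))
        mean_fit = sum(fitness) / n
        # Selection: below-average components are discarded with probability 1 - fitness.
        removed = [t for t in range(n)
                   if fitness[t] < mean_fit and rng.random() < 1.0 - fitness[t]]
        # Mutation: a few further components are discarded at random.
        removed += [t for t in range(n)
                    if t not in removed and rng.random() < mutation_rate]
        for t in removed:
            assign[t] = None
        # Prioritization: repair the worst-fitting components first.
        removed.sort(key=lambda t: fitness[t])
        # Construction: greedily reschedule the discarded components.
        construct(assign, removed, cost, capacity)
        if total_cost(assign, cost) < total_cost(best, cost):
            best = assign[:]
    return best


toy_cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2], [1, 6, 4]]  # illustrative values only
solution = eswo(toy_cost, capacity=2)
```

Because the best solution found so far is kept separately, the returned assignment is never worse than the initial greedy one; the stopping condition here is simply a fixed iteration count, which is one of several possible choices.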

preprint2009arXiv

An Idiotypic Immune Network as a Short Term Learning Architecture for Mobile Robots

A combined Short-Term Learning (STL) and Long-Term Learning (LTL) approach to solving mobile robot navigation problems is presented and tested in both real and simulated environments. The LTL consists of rapid simulations that use a Genetic Algorithm to derive diverse sets of behaviours. These sets are then transferred to an idiotypic Artificial Immune System (AIS), which forms the STL phase, and the system is said to be seeded. The combined LTL-STL approach is compared with using STL only, and with using a hand-designed controller. In addition, the STL phase is tested when the idiotypic mechanism is turned off. The results provide substantial evidence that the best option is the seeded idiotypic system, i.e. the architecture that merges LTL with an idiotypic AIS for the STL. They also show that structurally different environments can be used for the two phases without compromising transferability.

preprint2009arXiv

An Immune Inspired Approach to Anomaly Detection

The immune system provides a rich metaphor for computer security: anomaly detection that works in nature should work for machines. However, early artificial immune system approaches for computer security had only limited success. Arguably, this was due to these artificial systems being based on too simplistic a view of the immune system. We present here a second generation artificial immune system for process anomaly detection. It improves on earlier systems by having different artificial cell types that process information. Following detailed information about how to build such second generation systems, we find that communication between cell types is key to performance. Through realistic testing and validation we show that second generation artificial immune systems are capable of anomaly detection beyond generic system policies. The paper concludes with a discussion and outline of the next steps in this exciting area of computer security.

preprint2009arXiv

An Immune Inspired Network Intrusion Detection System Utilising Correlation Context

Network Intrusion Detection Systems (NIDS) are computer systems which monitor a network with the aim of discerning malicious from benign activity on that network. While a wide range of approaches have met varying levels of success, most IDSs rely on having access to a database of known attack signatures which are written by security experts. Nowadays, in order to solve problems with false positive alerts, correlation algorithms are used to add additional structure to sequences of IDS alerts. However, such techniques are of no help in discovering novel attacks or variations of known attacks, something the human immune system (HIS) is capable of doing in its own specialised domain. This paper presents a novel immune algorithm for application to the IDS problem. The goal is to discover packets containing novel variations of attacks covered by an existing signature base.

preprint2009arXiv

Articulation and Clarification of the Dendritic Cell Algorithm

The Dendritic Cell algorithm (DCA) is inspired by recent work in innate immunity. In this paper a formal description of the DCA is given. The DCA is described in detail, and its use as an anomaly detector is illustrated within the context of computer security. A port scan detection task is performed to substantiate the influence of signal selection on the behaviour of the algorithm. Experimental results provide a comparison of differing input signal mappings.

preprint2009arXiv

Artificial Dendritic Cells: Multi-faceted Perspectives

Dendritic cells are the crime scene investigators of the human immune system. Their function is to correlate potentially anomalous invading entities with observed damage to the body. The detection of such invaders by dendritic cells results in the activation of the adaptive immune system, eventually leading to the removal of the invader from the host body. This mechanism has provided inspiration for the development of a novel bio-inspired algorithm, the Dendritic Cell Algorithm. This algorithm processes information at multiple levels of resolution, resulting in the creation of information granules of variable structure. In this chapter we examine the multi-faceted nature of immunology and how research in this field has shaped the function of the resulting Dendritic Cell Algorithm. A brief overview of the algorithm is given in combination with the details of the processes used for its development. The chapter is concluded with a discussion of the parallels between our understanding of the human immune system and how such knowledge influences the design of artificial immune systems.

preprint2009arXiv

Artificial Immune Systems

The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune system have been extracted and applied to solve real-world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of Artificial Immune Systems with other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here.

preprint2009arXiv

Artificial Immune Tissue using Self-Organizing Networks

As introduced by Bentley et al. (2005), artificial immune systems (AIS) are lacking tissue, which is present in one form or another in all living multi-cellular organisms. Some have argued that this concept in the context of AIS brings little novelty to the already saturated field of the immune inspired computational research. This article aims to show that such a component of an AIS has the potential to bring an advantage to a data processing algorithm in terms of data pre-processing, clustering and extraction of features desired by the immune inspired system. The proposed tissue algorithm is based on self-organizing networks, such as self-organizing maps (SOM) developed by Kohonen (1996) and an analogy of the so called Toll-Like Receptors (TLR) affecting the activation function of the clusters developed by the SOM.

preprint2008arXiv

A Bayesian Optimisation Algorithm for the Nurse Scheduling Problem

A Bayesian optimization algorithm for the nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse's assignment. Unlike our previous work that used GAs to implement implicit learning, the learning in the proposed algorithm is explicit, i.e. eventually, we will be able to identify and mix building blocks directly. The Bayesian optimization algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated, i.e. in our case, a new rule string has been obtained. Another set of rule strings will be generated in this way, some of which will replace previous strings based on fitness selection. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. Computational results from 52 real data instances demonstrate the success of this approach. It is also suggested that the learning mechanism in the proposed approach might be suitable for other scheduling problems.

preprint2008arXiv

A Component Based Heuristic Search method with Adaptive Perturbations for Hospital Personnel Scheduling

Nurse rostering is a complex scheduling problem that affects hospital personnel on a daily basis all over the world. This paper presents a new component-based approach with adaptive perturbations, for a nurse scheduling problem arising at a major UK hospital. The main idea behind this technique is to decompose a schedule into its components (i.e. the allocated shift pattern of each nurse), and then mimic a natural evolutionary process on these components to iteratively deliver better schedules. The worthiness of all components in the schedule has to be continuously demonstrated in order for them to remain there. This demonstration employs a dynamic evaluation function which evaluates how well each component contributes towards the final objective. Two perturbation steps are then applied: the first perturbation eliminates a number of components that are deemed not worthy to stay in the current schedule; the second perturbation may also throw out, with a low level of probability, some worthy components. The eliminated components are replenished with new ones using a set of constructive heuristics using local optimality criteria. Computational results using 52 data instances demonstrate the applicability of the proposed approach in solving real-world problems.

preprint2008arXiv

A Pyramidal Evolutionary Algorithm with Different Inter-Agent Partnering Strategies for Scheduling Problems

This paper combines the idea of a hierarchical distributed genetic algorithm with different inter-agent partnering strategies. Cascading clusters of sub-populations are built from bottom up, with higher-level sub-populations optimising larger parts of the problem. Hence higher-level sub-populations search a larger search space with a lower resolution whilst lower-level sub-populations search a smaller search space with a higher resolution. The effects of different partner selection schemes amongst the agents on solution quality are examined for two multiple-choice optimisation problems. It is shown that partnering strategies that exploit problem-specific knowledge are superior and can counter inappropriate (sub-) fitness measurements.

preprint2008arXiv

A Recommender System based on Idiotypic Artificial Immune Networks

The immune system is a complex biological system with a highly distributed, adaptive and self-organising nature. This paper presents an Artificial Immune System (AIS) that exploits some of these characteristics and is applied to the task of film recommendation by Collaborative Filtering (CF). Natural evolution and in particular the immune system have not been designed for classical optimisation. However, for this problem, we are not interested in finding a single optimum. Rather we intend to identify a sub-set of good matches on which recommendations can be based. It is our hypothesis that an AIS built on two central aspects of the biological immune system will be an ideal candidate to achieve this: Antigen-antibody interaction for matching and idiotypic antibody-antibody interaction for diversity. Computational results are presented in support of this conjecture and compared to those found by other CF techniques.

preprint2008arXiv

A Recommender System based on the Immune Network

The immune system is a complex biological system with a highly distributed, adaptive and self-organising nature. This paper presents an artificial immune system (AIS) that exploits some of these characteristics and is applied to the task of film recommendation by collaborative filtering (CF). Natural evolution and in particular the immune system have not been designed for classical optimisation. However, for this problem, we are not interested in finding a single optimum. Rather we intend to identify a sub-set of good matches on which recommendations can be based. It is our hypothesis that an AIS built on two central aspects of the biological immune system will be an ideal candidate to achieve this: Antigen - antibody interaction for matching and antibody - antibody interaction for diversity. Computational results are presented in support of this conjecture and compared to those found by other CF techniques.

preprint2008arXiv

An Agent-Based Simulation of In-Store Customer Experiences

Agent-based modelling and simulation offers a new and exciting way of understanding the world of work. In this paper we describe the development of an agent-based simulation model, designed to help to understand the relationship between human resource management practices and retail productivity. We report on the current development of our simulation model which includes new features concerning the evolution of customers over time. To test some of these features we have conducted a series of experiments dealing with customer pool sizes, standard and noise reduction modes, and the spread of the word of mouth. Our multi-disciplinary research team draws upon expertise from work psychologists and computer scientists. Despite the fact we are working within a relatively novel and complex domain, it is clear that intelligent agents offer potential for fostering sustainable organisational capabilities in the future.

preprint2008arXiv

An Artificial Immune System as a Recommender System for Web Sites

Artificial Immune Systems have been used successfully to build recommender systems for film databases. In this research, an attempt is made to extend this idea to web site recommendation. A collection of more than 1000 individuals' web profiles (alternatively called preferences / favourites / bookmarks files) will be used. URLs will be classified using the DMOZ (Directory Mozilla) database of the Open Directory Project as our ontology. This will then be used as the data for the Artificial Immune Systems rather than the actual addresses. The first attempt will involve using a simple classification code number coupled with the number of pages within that classification code. However, this implementation does not make use of the hierarchical tree-like structure of DMOZ. Consideration will then be given to the construction of a similarity measure for web profiles that makes use of this hierarchical information to build a better-informed Artificial Immune System.

preprint2008arXiv

An Estimation of Distribution Algorithm for Nurse Scheduling

Schedules can be built in a similar way to a human scheduler by using a set of rules that involve domain knowledge. This paper presents an Estimation of Distribution Algorithm (EDA) for the nurse scheduling problem, which involves choosing a suitable scheduling rule from a set for the assignment of each nurse. Unlike previous work that used Genetic Algorithms (GA) to implement implicit learning, the learning in the proposed algorithm is explicit, i.e. we identify and mix building blocks directly. The EDA is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated by using the corresponding conditional probabilities, until all variables have been generated, i.e. in our case, a new rule string has been obtained. Another set of rule strings will be generated in this way, some of which will replace previous strings based on fitness selection. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. Computational results from 52 real data instances demonstrate the success of this approach. It is also suggested that the learning mechanism in the proposed approach might be suitable for other scheduling problems.
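The sample, select and re-estimate loop described in this abstract can be illustrated with a deliberately simplified sketch. It is not the paper's method: it assumes an independent probability table per nurse in place of the Bayesian network, it resamples the whole population each generation rather than replacing only some strings, and the names `run_eda`, `toy_fitness` and `TARGET` are hypothetical, with a toy fitness standing in for a real schedule evaluation.

```python
import random

def run_eda(fitness, n_nurses, n_rules, pop_size=40, n_select=10,
            generations=30, seed=0):
    """Simplified EDA: one independent probability table per nurse.

    Assumption: independent marginals replace the Bayesian network
    described in the abstract, purely to keep the sketch short.
    """
    rng = random.Random(seed)
    # probs[i][r] = estimated probability that rule r is chosen for nurse i
    probs = [[1.0 / n_rules] * n_rules for _ in range(n_nurses)]
    best = None
    for _ in range(generations):
        # Sample a population of rule strings from the current model.
        population = [
            [rng.choices(range(n_rules), weights=probs[i])[0]
             for i in range(n_nurses)]
            for _ in range(pop_size)
        ]
        # Keep the most promising rule strings.
        population.sort(key=fitness, reverse=True)
        promising = population[:n_select]
        if best is None or fitness(promising[0]) > fitness(best):
            best = promising[0]
        # Re-estimate the probabilities from the promising set,
        # with add-one smoothing so no rule reaches probability zero.
        for i in range(n_nurses):
            counts = [1.0] * n_rules
            for rule_string in promising:
                counts[rule_string[i]] += 1.0
            total = sum(counts)
            probs[i] = [c / total for c in counts]
    return best

# Hypothetical toy problem: fitness counts positions matching a target string.
TARGET = [0, 1, 2, 3, 0, 1]

def toy_fitness(rule_string):
    return sum(1 for a, b in zip(rule_string, TARGET) if a == b)

best = run_eda(toy_fitness, n_nurses=len(TARGET), n_rules=4)
```

In a real setting the fitness function would decode the rule string into a roster and score it against the scheduling constraints.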

preprint2008arXiv

An Estimation of Distribution Algorithm with Intelligent Local Search for Rule-based Nurse Rostering

This paper proposes a new memetic evolutionary algorithm to achieve explicit learning in rule-based nurse rostering, which involves applying a set of heuristic rules for each nurse's assignment. The main framework of the algorithm is an estimation of distribution algorithm, in which an ant-miner methodology improves the individual solutions produced in each generation. Unlike our previous work (where learning is implicit), the learning in the memetic estimation of distribution algorithm is explicit, i.e. we are able to identify building blocks directly. The overall approach learns by building a probabilistic model, i.e. an estimation of the probability distribution of individual nurse-rule pairs that are used to construct schedules. The local search processor (i.e. the ant-miner) reinforces nurse-rule pairs that receive higher rewards. A challenging real world nurse rostering problem is used as the test problem. Computational results show that the proposed approach outperforms most existing approaches. It is suggested that the learning methodologies proposed in this paper may be applied to other scheduling problems where schedules are built systematically according to specific rules.

preprint2008arXiv

An Indirect Genetic Algorithm for a Nurse Scheduling Problem

This paper describes a Genetic Algorithms approach to a manpower-scheduling problem arising at a major UK hospital. Although Genetic Algorithms have been successfully used for similar problems in the past, they always had to overcome the limitations of the classical Genetic Algorithms paradigm in handling the conflict between objectives and constraints. The approach taken here is to use an indirect coding based on permutations of the nurses, and a heuristic decoder that builds schedules from these permutations. Computational experiments based on 52 weeks of live data are used to evaluate three different decoders with varying levels of intelligence, and four well-known crossover operators. Results are further enhanced by introducing a hybrid crossover operator and by making use of simple bounds to reduce the size of the solution space. The results reveal that the proposed algorithm is able to find high quality solutions and is both faster and more flexible than a recently published Tabu Search approach.

preprint2008arXiv

An Indirect Genetic Algorithm for Set Covering Problems

This paper presents a new type of genetic algorithm for the set covering problem. It differs from previous evolutionary approaches first because it is an indirect algorithm, i.e. the actual solutions are found by an external decoder function. The genetic algorithm itself provides this decoder with permutations of the solution variables and other parameters. Second, it will be shown that results can be further improved by adding another indirect optimisation layer. The decoder will not directly seek out low cost solutions but instead aims for good exploitable solutions. These are then post optimised by another hill-climbing algorithm. Although seemingly more complicated, we will show that this three-stage approach has advantages in terms of solution quality, speed and adaptability to new types of problems over more direct approaches. Extensive computational results are presented and compared to the latest evolutionary and other heuristic approaches to the same data instances.

preprint2008arXiv

An Investigation of the Sequential Sampling Method for Crossdocking Simulation Output Variance Reduction

This paper investigates the reduction of variance associated with a simulation output performance measure, using the Sequential Sampling method while applying minimum simulation replications, for a class of JIT (Just in Time) warehousing system called crossdocking. We initially used the Sequential Sampling method to attain a desired 95% confidence interval half width of plus/minus 0.5 for our chosen performance measure (Total usage cost, given the mean maximum level of 157,000 pounds and a mean minimum level of 149,000 pounds). From our results, we achieved a 95% confidence interval half width of plus/minus 2.8 for our chosen performance measure (Total usage cost, with an average mean value of 115,000 pounds). However, the Sequential Sampling method requires a huge number of simulation replications to reduce variance for our simulation output value to the target level. Arena (version 11) simulation software was used to conduct this study.

preprint2008arXiv

Artificial Immune Systems (AIS) - A New Paradigm for Heuristic Decision Making

Over the last few years, more and more heuristic decision making techniques have been inspired by nature, e.g. evolutionary algorithms, ant colony optimisation and simulated annealing. More recently, a novel computational intelligence technique inspired by immunology has emerged, called Artificial Immune Systems (AIS). This immune system inspired technique has already been useful in solving some computational problems. In this keynote, we will very briefly describe the immune system metaphors that are relevant to AIS. We will then give some illustrative real-world problems suitable for AIS use and show a step-by-step algorithm walkthrough. A comparison of AIS to other well-known algorithms and areas for future work will round this keynote off. It should be noted that as AIS is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from the examples given here.

preprint2008arXiv

Artificial Immune Systems Tutorial

The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune system have been extracted and applied to the solution of real world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here.

preprint2008arXiv

Bayesian Optimisation Algorithm for Nurse Scheduling

Our research has shown that schedules can be built mimicking a human scheduler by using a set of rules that involve domain knowledge. This chapter presents a Bayesian Optimization Algorithm (BOA) for the nurse scheduling problem that chooses such suitable scheduling rules from a set for each nurse's assignment. Based on the idea of using probabilistic models, the BOA builds a Bayesian network for the set of promising solutions and samples these networks to generate new candidate solutions. Computational results from 52 real data instances demonstrate the success of this approach. It is also suggested that the learning mechanism in the proposed algorithm may be suitable for other scheduling problems.

preprint2008arXiv

Building Better Nurse Scheduling Algorithms

The aim of this research is twofold: Firstly, to model and solve a complex nurse scheduling problem with an integer programming formulation and evolutionary algorithms. Secondly, to detail a novel statistical method of comparing and hence building better scheduling algorithms by identifying successful algorithm modifications. The comparison method captures the results of algorithms in a single figure that can then be compared using traditional statistical techniques. Thus, the proposed method of comparing algorithms is an objective procedure designed to assist in the process of improving an algorithm. This is achieved even when some results are non-numeric or missing due to infeasibility. The final algorithm outperforms all previous evolutionary algorithms, which relied on human expertise for modification.

preprint2008arXiv

Danger Theory: The Link between AIS and IDS?

We present ideas about creating a next generation Intrusion Detection System based on the latest immunological theories. The central challenge with computer security is determining the difference between normal and potentially harmful activity. For half a century, developers have protected their systems by coding rules that identify and block specific events. However, the nature of current and future threats in conjunction with ever larger IT systems urgently requires the development of automated and adaptive defensive tools. A promising solution is emerging in the form of Artificial Immune Systems. The Human Immune System can detect and defend against harmful and previously unseen invaders, so can we not build a similar Intrusion Detection System for our computers?

preprint2008arXiv

Data Reduction in Intrusion Alert Correlation

Network intrusion detection sensors are usually built around low level models of network traffic. This means that their output is of a similarly low level and as a consequence, is difficult to analyze. Intrusion alert correlation is the task of automating some of this analysis by grouping related alerts together. Attack graphs provide an intuitive model for such analysis. Unfortunately alert flooding attacks can still cause a loss of service on sensors, and when performing attack graph correlation, there can be a large number of extraneous alerts included in the output graph. This obscures the fine structure of genuine attacks and makes them more difficult for human operators to discern. This paper explores modified correlation algorithms which attempt to minimize the impact of this attack.

preprint2008arXiv

Dempster-Shafer for Anomaly Detection

In this paper, we implement an anomaly detection system using the Dempster-Shafer method. Using two standard benchmark problems we show that by combining multiple signals it is possible to achieve better results than by using a single signal. We further show that by applying this approach to a real-world email dataset the algorithm works for email worm detection. Dempster-Shafer can be a promising method for anomaly detection problems with multiple features (data sources), and two or more classes.
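How combining multiple signals can sharpen a decision is easiest to see with Dempster's rule of combination, the standard fusion step in Dempster-Shafer theory. The sketch below is a generic illustration, not the paper's implementation; the two signals and their mass values are invented for the example.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1];
    the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses is set aside as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

NORMAL = frozenset({"normal"})
ANOMALY = frozenset({"anomaly"})
EITHER = NORMAL | ANOMALY  # mass here expresses uncertainty

# Invented example masses for two independent signals.
signal_a = {ANOMALY: 0.6, EITHER: 0.4}
signal_b = {ANOMALY: 0.5, NORMAL: 0.2, EITHER: 0.3}
fused = combine(signal_a, signal_b)
```

With these example inputs the fused belief in "anomaly" is 0.68 / 0.88, roughly 0.77, which is higher than either signal reports on its own.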

preprint2008arXiv

Enhanced Direct and Indirect Genetic Algorithm Approaches for a Mall Layout and Tenant Selection Problem

During our earlier research, it was recognised that in order to be successful with an indirect genetic algorithm approach using a decoder, the decoder has to strike a balance between being an optimiser in its own right and finding feasible solutions. Previously this balance was achieved manually. Here we extend this by presenting an automated approach where the genetic algorithm itself, simultaneously to solving the problem, sets weights to balance the components out. Subsequently we were able to solve a complex and non-linear scheduling problem better than with a standard direct genetic algorithm implementation.

preprint2008arXiv

Explicit Learning: an Effort towards Human Scheduling Algorithms

Scheduling problems are generally NP-hard combinatorial problems, and a lot of research has been done to solve these problems heuristically. However, most of the previous approaches are problem-specific and research into the development of a general scheduling algorithm is still in its infancy. Mimicking the natural evolutionary process of the survival of the fittest, Genetic Algorithms (GAs) have attracted much attention in solving difficult scheduling problems in recent years. Some obstacles exist when using GAs: there is no canonical mechanism to deal with constraints, which are commonly met in most real-world scheduling problems, and small changes to a solution are difficult. To overcome both difficulties, indirect approaches have been presented (in [1] and [2]) for nurse scheduling and driver scheduling, where GAs are used by mapping the solution space, and separate decoding routines then build solutions to the original problem.

preprint2008arXiv

Exploiting problem structure in a genetic algorithm approach to a nurse rostering problem

There is considerable interest in the use of genetic algorithms to solve problems arising in the areas of scheduling and timetabling. However, the classical genetic algorithm paradigm is not well equipped to handle the conflict between objectives and constraints that typically occurs in such problems. In order to overcome this, successful implementations frequently make use of problem specific knowledge. This paper is concerned with the development of a GA for a nurse rostering problem at a major UK hospital. The structure of the constraints is used as the basis for a co-evolutionary strategy using co-operating sub-populations. Problem specific knowledge is also used to define a system of incentives and disincentives, and a complementary mutation operator. Empirical results based on 52 weeks of live data show how these features are able to improve an unsuccessful canonical GA to the point where it is able to provide a practical solution to the problem.

preprint2008arXiv

Genetic-Algorithm Seeding Of Idiotypic Networks For Mobile-Robot Navigation

Robot-control designers have begun to exploit the properties of the human immune system in order to produce dynamic systems that can adapt to complex, varying, real-world tasks. Jerne's idiotypic-network theory has proved the most popular artificial-immune-system (AIS) method for incorporation into behaviour-based robotics, since idiotypic selection produces highly adaptive responses. However, previous efforts have mostly focused on evolving the network connections and have often worked with a single, pre-engineered set of behaviours, limiting variability. This paper describes a method for encoding behaviours as a variable set of attributes, and shows that when the encoding is used with a genetic algorithm (GA), multiple sets of diverse behaviours can develop naturally and rapidly, providing much greater scope for flexible behaviour-selection. The algorithm is tested extensively with a simulated e-puck robot that navigates around a maze by tracking colour. Results show that highly successful behaviour sets can be generated within about 25 minutes, and that much greater diversity can be obtained when multiple autonomous populations are used, rather than a single one.

preprint2008arXiv

Idiotypic Immune Networks in Mobile Robot Control

Jerne's idiotypic network theory postulates that the immune response involves inter-antibody stimulation and suppression as well as matching to antigens. The theory has proved the most popular Artificial Immune System (AIS) model for incorporation into behavior-based robotics but guidelines for implementing idiotypic selection are scarce. Furthermore, the direct effects of employing the technique have not been demonstrated in the form of a comparison with non-idiotypic systems. This paper aims to address these issues. A method for integrating an idiotypic AIS network with a Reinforcement Learning based control system (RL) is described and the mechanisms underlying antibody stimulation and suppression are explained in detail. Some hypotheses that account for the network advantage are put forward and tested using three systems with increasing idiotypic complexity. The basic RL, a simplified hybrid AIS-RL that implements idiotypic selection independently of derived concentration levels and a full hybrid AIS-RL scheme are examined. The test bed takes the form of a simulated Pioneer robot that is required to navigate through maze worlds detecting and tracking door markers.

preprint2008arXiv

Immune System Approaches to Intrusion Detection - A Review

The use of artificial immune systems in intrusion detection is an appealing concept for two reasons. Firstly, the human immune system provides the human body with a high level of protection from invading pathogens, in a robust, self-organised and distributed manner. Secondly, current techniques used in computer security are not able to cope with the dynamic and increasingly complex nature of computer systems and their security. It is hoped that biologically inspired approaches in this area, including the use of immune-based systems will be able to meet this challenge. Here we review the algorithms used, the development of the systems and the outcome of their implementation. We provide an introduction and analysis of the key developments within this field, in addition to making suggestions for future research.

preprint2008arXiv

Improved Squeaky Wheel Optimisation for Driver Scheduling

This paper presents a technique called Improved Squeaky Wheel Optimisation (ISWO) for driver scheduling problems. It improves the original Squeaky Wheel Optimisation's effectiveness and execution speed by incorporating two additional steps of Selection and Mutation which implement evolution within a single solution. In the ISWO, a cycle of Analysis-Selection-Mutation-Prioritization-Construction continues until stopping conditions are reached. The Analysis step first computes the fitness of a current solution to identify troublesome components. The Selection step then discards these troublesome components probabilistically by using the fitness measure, and the Mutation step follows to further discard a small number of components at random. After the above steps, an input solution becomes partial and thus the resulting partial solution needs to be repaired. The repair is carried out by using the Prioritization step to first produce priorities that determine an order by which the following Construction step then schedules the remaining components. Therefore, the optimisation in the ISWO is achieved by solution disruption, iterative improvement and an iterative constructive repair process. Encouraging experimental results are reported.

preprint2008arXiv

Introduction to Multi-Agent Simulation

When designing systems that are complex, dynamic and stochastic in nature, simulation is generally recognised as one of the best design support technologies, and a valuable aid in the strategic and tactical decision making process. A simulation model consists of a set of rules that define how a system changes over time, given its current state. Unlike analytical models, a simulation model is not solved but is run and the changes of system states can be observed at any point in time. This provides an insight into system dynamics rather than just predicting the output of a system based on specific inputs. Simulation is not a decision making tool but a decision support tool, allowing better informed decisions to be made. Due to the complexity of the real world, a simulation model can only be an approximation of the target system. The essence of the art of simulation modelling is abstraction and simplification. Only those characteristics that are important for the study and analysis of the target system should be included in the simulation model.

preprint2008arXiv

Investigating a Hybrid Metaheuristic For Job Shop Rescheduling

Previous research has shown that artificial immune systems can be used to produce robust schedules in a manufacturing environment. The main goal is to develop building blocks (antibodies) of partial schedules that can be used to construct backup solutions (antigens) when disturbances occur during production. The building blocks are created based upon underpinning ideas from artificial immune systems and evolved using a genetic algorithm (Phase I). Each partial schedule (antibody) is assigned a fitness value and the best partial schedules are selected to be converted into complete schedules (antigens). We further investigate whether simulated annealing and the great deluge algorithm can improve the results when hybridised with our artificial immune system (Phase II). We use ten fixed solutions as our target and measure how well we cover these specific scenarios.

preprint2008arXiv

Investigating Artificial Immune Systems For Job Shop Rescheduling In Changing Environments

An artificial immune system can be used to generate schedules in changing environments, and these schedules have been shown to be more robust than schedules developed using a genetic algorithm. Good schedules can be produced, especially when the number of antigens is increased. However, an increase in the range of the antigens somehow affected the fitness of the immune system. In this research, we try to improve the results of the system by rescheduling the same problem using the same method while maintaining the robustness of the schedules.

preprint2008arXiv

Movie Recommendation Systems Using An Artificial Immune System

We apply the Artificial Immune System (AIS) technology to the Collaborative Filtering (CF) technology when we build the movie recommendation system. Two different affinity measure algorithms of AIS, Kendall tau and Weighted Kappa, are used to calculate the correlation coefficients for this movie recommendation system. From the testing we think that Weighted Kappa is more suitable than Kendall tau for movie problems.
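As an illustration of one of the two affinity measures named above, the sketch below computes a plain Kendall tau between two users' ratings of the same films. It assumes the simplest variant (tau-a, where tied pairs count as neither concordant nor discordant); the paper may use a different variant, and the function name is only illustrative.

```python
from itertools import combinations

def kendall_tau(ratings_a, ratings_b):
    """Kendall tau-a between two equal-length rating lists.

    Each position holds the two users' ratings of the same film.
    Tied pairs contribute to neither count (an assumption of this sketch).
    """
    n = len(ratings_a)
    if n != len(ratings_b) or n < 2:
        raise ValueError("need two rating lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (ratings_a[i] - ratings_a[j]) * (ratings_b[i] - ratings_b[j])
        if direction > 0:
            concordant += 1   # both users order films i and j the same way
        elif direction < 0:
            discordant += 1   # the users order films i and j oppositely
    return (concordant - discordant) / (n * (n - 1) // 2)
```

A value near 1 means the two users rank their shared films in the same order, which a recommender could treat as high affinity; a value near -1 means opposite tastes.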

preprint2008arXiv

On the Application of Hierarchical Coevolutionary Genetic Algorithms: Recombination and Evaluation Partners

This paper examines the use of a hierarchical coevolutionary genetic algorithm under different partnering strategies. Cascading clusters of sub-populations are built from the bottom up, with higher-level sub-populations optimising larger parts of the problem. Hence higher-level sub-populations potentially search a larger search space with a lower resolution whilst lower-level sub-populations search a smaller search space with a higher resolution. The effects of different partner selection schemes amongst the sub-populations on solution quality are examined for two constrained optimisation problems. We examine a number of recombination partnering strategies in the construction of higher-level individuals and a number of related schemes for evaluating sub-solutions. It is shown that partnering strategies that exploit problem-specific knowledge are superior and can counter inappropriate (sub)fitness measurements.

preprint2008arXiv

On the Effects of Idiotypic Interactions for Recommendation Communities in Artificial Immune Systems

It has previously been shown that a recommender based on immune system idiotypic principles can outperform one based on correlation alone. This paper reports the results of work in progress, where we undertake some investigations into the nature of this beneficial effect. The initial findings are that the immune system recommender tends to produce different neighbourhoods, and that the superior performance of this recommender is due partly to the different neighbourhoods, and partly to the way that the idiotypic effect is used to weight each neighbour's recommendations.

preprint2008arXiv

Partnering Strategies for Fitness Evaluation in a Pyramidal Evolutionary Algorithm

This paper combines the idea of a hierarchical distributed genetic algorithm with different inter-agent partnering strategies. Cascading clusters of sub-populations are built from bottom up, with higher-level sub-populations optimising larger parts of the problem. Hence higher-level sub-populations search a larger search space with a lower resolution whilst lower-level sub-populations search a smaller search space with a higher resolution. The effects of different partner selection schemes for (sub-)fitness evaluation purposes are examined for two multiple-choice optimisation problems. It is shown that random partnering strategies perform best by providing better sampling and more diversity.

preprint2008arXiv

Rule Generalisation in Intrusion Detection Systems using Snort

Intrusion Detection Systems (IDS) provide an important layer of security for computer systems and networks, and are becoming more and more necessary as reliance on Internet services increases and systems with sensitive data are more commonly open to Internet access. An IDS's responsibility is to detect suspicious or unacceptable system and network activity and to alert a systems administrator to this activity. The majority of IDS use a set of signatures that define what suspicious traffic is, and Snort is one popular and actively developing open-source IDS that uses such a set of signatures known as Snort rules. Our aim is to identify a way in which Snort could be developed further by generalising rules to identify novel attacks. In particular, we attempted to relax and vary the conditions and parameters of current Snort rules, using a similar approach to classic rule learning operators such as generalisation and specialisation. We demonstrate the effectiveness of our approach through experiments with standard datasets and show that we are able to detect previously undetected variants of various attacks. We conclude by discussing the general effectiveness and appropriateness of generalisation in Snort-based IDS rule processing.

preprint2008arXiv

Sensing Danger: Innate Immunology for Intrusion Detection

The immune system provides an ideal metaphor for anomaly detection in general and computer security in particular. Based on this idea, artificial immune systems have been used for a number of years for intrusion detection, unfortunately so far with little success. However, these previous systems were largely based on immunological theory from the 1970s and 1980s and over the last decade our understanding of immunological processes has vastly improved. In this paper we present two new immune inspired algorithms based on the latest immunological discoveries, such as the behaviour of Dendritic Cells. The resultant algorithms are applied to real world intrusion problems and show encouraging results. Overall, we believe there is a bright future for these next generation artificial immune algorithms.

preprint2008arXiv

Simulation Optimization of the Crossdock Door Assignment Problem

The purpose of this report is to present the Crossdock Door Assignment Problem, which involves assigning destinations to outbound dock doors of Crossdock centres such that travel distance by material handling equipment is minimized. We propose a two-fold solution: simulation, and optimization of the simulation model (simulation optimization). The novel aspect of our solution approach is that we intend to use simulation to derive a more realistic objective function and use Memetic algorithms to find an optimal solution. The main advantage of using Memetic algorithms is that they combine a local search with Genetic Algorithms. The Crossdock Door Assignment Problem is a new application domain for Memetic Algorithms and it is as yet unknown how they will perform.

preprint2008arXiv

Strategic Alert Throttling for Intrusion Detection Systems

Network intrusion detection systems are themselves becoming targets of attackers. Alert flood attacks may be used to conceal malicious activity by hiding it among a deluge of false alerts sent by the attacker. Although these types of attacks are very hard to stop completely, our aim is to present techniques that improve alert throughput and capacity to such an extent that the resources required to successfully mount the attack become prohibitive. The key idea presented is to combine a token bucket filter with a realtime correlation algorithm. The proposed algorithm throttles alert output from the IDS when an attack is detected. The attack graph used in the correlation algorithm is used to make sure that alerts crucial to forming strategies are not discarded by throttling.
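The token bucket half of this idea can be sketched as follows. This is a generic token bucket, not the paper's algorithm: the `critical` flag is a hypothetical stand-in for the attack-graph correlation step that decides which alerts must never be dropped, and the class and parameter names are illustrative.

```python
class TokenBucket:
    """Generic token bucket: at most `capacity` alerts in a burst,
    refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = 0.0          # time of the previous call, in seconds

    def allow(self, now, critical=False):
        # Refill according to the time elapsed since the last alert.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if critical:
            # Hypothetical: alerts flagged as crucial by a correlation
            # step bypass throttling and do not consume a token.
            return True
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False             # bucket empty: throttle this alert

bucket = TokenBucket(rate=1.0, capacity=2.0)
decisions = [
    bucket.allow(0.0),                 # passes, one token left
    bucket.allow(0.0),                 # passes, bucket now empty
    bucket.allow(0.0),                 # throttled
    bucket.allow(0.0, critical=True),  # passes despite the empty bucket
    bucket.allow(1.0),                 # one token refilled after 1 second
]
```

Passing the timestamp in explicitly keeps the sketch deterministic; a live system would read a monotonic clock instead.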

preprint2008arXiv

The Application of Bayesian Optimization and Classifier Systems in Nurse Scheduling

Two ideas taken from Bayesian optimization and classifier systems are presented for personnel scheduling, based on choosing a suitable scheduling rule from a set for each person's assignment. Unlike our previous work using genetic algorithms, whose learning is implicit, the learning in both approaches is explicit, i.e. we are able to identify building blocks directly. To achieve this, the Bayesian optimization algorithm builds a Bayesian network of the joint probability distribution of the rules used to construct solutions, while the adapted classifier system assigns each rule a strength value that is constantly updated according to its usefulness in the current situation. Computational results from 52 real data instances of nurse scheduling demonstrate the success of both approaches. It is also suggested that the learning mechanism in the proposed approaches might be suitable for other scheduling problems.
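
The rule-strength idea in this abstract could be sketched as follows. The abstract does not state the actual update rule, so the decay-and-reward scheme, the function names `pick_rule` and `update_strengths`, and the parameters `reward` and `decay` are all assumptions made for illustration.

```python
import random


def pick_rule(strengths, rng):
    """Choose a rule index with probability proportional to its strength."""
    return rng.choices(range(len(strengths)), weights=strengths, k=1)[0]


def update_strengths(strengths, used_rules, reward, decay=0.9):
    """Return new strengths after one constructed schedule.

    Every strength is first multiplied by `decay` (assumed scheme), then
    each occurrence of a rule in `used_rules` is credited with `reward`.
    """
    new_strengths = [s * decay for s in strengths]
    for rule in used_rules:
        new_strengths[rule] += reward
    return new_strengths
```

In this sketch a rule that keeps contributing to good schedules accumulates strength and is chosen more often, while unused rules fade through decay.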

preprint2008arXiv

The Danger Theory and Its Application to Artificial Immune Systems

Over the last decade, a new idea challenging the classical self-non-self viewpoint has become popular amongst immunologists. It is called the Danger Theory. In this conceptual paper, we look at this theory from the perspective of Artificial Immune System practitioners. An overview of the Danger Theory is presented with particular emphasis on analogies in the Artificial Immune Systems world. A number of potential application areas are then used to provide a framing for a critical assessment of the concept, and its relevance for Artificial Immune Systems.

preprint2008arXiv

The Role of Management Practices in Closing the Productivity Gap

There is no doubt that management practices are linked to the productivity and performance of a company. However, research findings are mixed. This paper provides a multi-disciplinary review of the current evidence of such a relationship and offers suggestions for further exploration. We provide an extensive review of the literature in terms of research findings from studies that have tried to measure and understand the impact that individual management practices and clusters of management practices have on productivity at different levels of analysis. We focus our review on Operations Management (OM) and Human Resource Management (HRM) practices as well as joint applications of these practices. In conclusion, taken as a whole, the research findings are equivocal. Some studies have found a positive relationship between the adoption of management practices and productivity, some a negative one, and some no association whatsoever. We believe that the lack of universal consensus on the effect of the adoption of complementary management practices might be driven either by measurement issues or by the level of analysis. Consequently, there is a need for further research, in particular for a multi-level approach from the lowest possible level of aggregation up to the firm level of analysis, in order to assess the impact of management practices upon the productivity of firms.

preprint2008arXiv

Using Intelligent Agents to Understand Management Practices and Retail Productivity

Intelligent agents offer a new and exciting way of understanding the world of work. In this paper we apply agent-based modeling and simulation to investigate a set of problems in a retail context. Specifically, we are working to understand the relationship between human resource management practices and retail productivity. Although we are working within a relatively novel and complex domain, it is clear that intelligent agents could offer potential for fostering sustainable organizational capabilities in the future. The project is still at an early stage. So far we have conducted a case study in a UK department store to collect data and capture impressions about operations and actors within departments. Furthermore, based on our case study, we have built and tested the first version of a retail branch simulator, which we present in this paper.

preprint2008arXiv

Using Intelligent Agents to understand organisational behaviour

This paper introduces two ongoing research projects which seek to apply computer modelling techniques in order to simulate human behaviour within organisations. Previous research in other disciplines has suggested that complex social behaviours are governed by relatively simple rules which, when identified, can be used to accurately model such processes using computer technology. The broad objective of our research is to develop a similar capability within organisational psychology.

211 published item(s)

Fast Rate Generalization Error Bounds: Variations on a Theme

Multi-objective Semi-supervised Clustering for Finding Predictive Clusters

A new interval-based aggregation approach based on bagging and Interval Agreement Approach (IAA) in ensemble learning

Information-theoretic analysis for transfer learning

Adaptive Data Communication Interface: A User-Centric Visual Data Interpretation Framework

An ensemble of machine learning and anti-learning methods for predicting tumour patient survival rates

Applying Interval Type-2 Fuzzy Rule Based Classifiers Through a Cluster-Based Class Representation

Exploring Differences in Interpretation of Words Essential in Medical Expert-Patient Communication

Identifying Candidate Risk Factors for Prescription Drug Side Effects using Causal Contrast Set Mining

Indebted households profiling: a knowledge discovery from database approach

Juxtaposition of System Dynamics and Agent-based Simulation for a Case Study in Immunosenescence

Measuring Player's Behaviour Change over Time in Public Goods Game

Modelling Cyber-Security Experts' Decision Making Processes using Aggregation Operators

Modelling Office Energy Consumption: An Agent Based Approach

Optimising Rule-Based Classification in Temporal Data

Refining adverse drug reaction signals by incorporating interaction variables identified using emergent pattern mining

Self-Organising Maps in Computer Security

Simulating user learning in authoritative technology adoption: An agent based model for council-led smart meter deployment planning in the UK

Supervised Adverse Drug Reaction Signalling Framework Imitating Bradford Hill's Causality Considerations

Supervised Anomaly Detection in Uncertain Pseudoperiodic Data Streams

A Data Mining framework to model Consumer Indebtedness with Psychological Factors

Incorporating Spontaneous Reporting System Data to Aid Causal Inference in Longitudinal Healthcare Data

Personalising Mobile Advertising Based on Users Installed Apps

Refining Adverse Drug Reactions using Association Rule Mining for Electronic Healthcare Data

A Fuzzy Directional Distance Measure

A Novel Semi-Supervised Algorithm for Rare Prescription Side Effect Discovery

An Approach for Assessing Clustering of Households by Electricity Usage

Analysing Fuzzy Sets Through Combining Measures of Similarity and Distance

Attributes for Causal Inference in Longitudinal Observational Databases

Augmented Neural Networks for Modelling Consumer Indebtness

Comparing Stochastic Differential Equations and Agent-Based Modelling and Simulation for Early-stage Cancer

Comparison of algorithms that detect drug side effects using electronic healthcare databases

Comparison of Distance Metrics for Hierarchical Data in Medical Databases

Data classification using the Dempster-Shafer method

Detect Adverse Drug Reactions for Drug Aspirin

Detecting adverse drug reactions for the drug Simvastatin

Ensemble Learning of Colorectal Cancer Survival Rates

Feature selection in detection of adverse drug reactions from the Health Improvement Network (THIN) database

Modelling Electrical Car Diffusion Based on Agents

Signalling Paediatric Side Effects using an Ensemble of Simple Study Designs

Tuning a Multiple Classifier System for Side Effect Discovery using Genetic Algorithms

Variability of Behaviour in Electricity Load Profile Clustering; Who Does Things at the Same Time Each Day?

A Beginners Guide to Systems Simulation in Immunology

A Comparison of Non-stationary, Type-2 and Dual Surface Fuzzy Control

A New Graphical Password Scheme Resistant to Shoulder-Surfing

A Three-Dimensional Model of Residential Energy Consumer Archetypes for Local Energy Policy Design in the UK

Adaptive Alert Throttling for Intrusion Detection Systems

Against Spyware Using CAPTCHA in Graphical Password Scheme

An audio CAPTCHA to distinguish humans from computers

An investigation into the relationship between type-2 FOU size and environmental uncertainty in robotic control

Application of a clustering framework to UK domestic electricity data

Artificial Immune Systems (INTROS 2)

Biomarker Clustering of Colorectal Cancer Data to Complement Clinical Classification

Can background baroque music help to improve the memorability of graphical passwords?

Comparing Data-mining Algorithms Developed for Longitudinal Observational Databases

Comparing Decision Support Tools for Cargo Screening Processes

Creating Personalised Energy Plans. From Groups to Individuals using Fuzzy C Means Clustering

Defining a Simulation Strategy for Cancer Immunocompetence

Detect adverse drug reactions for drug Alendronate

Detect adverse drug reactions for drug Atorvastatin

Detect adverse drug reactions for drug Pioglitazone

Detect adverse drug reactions for the drug Pravastatin

Dienstplanerstellung in Krankenhaeusern mittels genetischer Algorithmen

Discovering Sequential Patterns in a UK General Practice Database

Draw a line on your PDA to authenticate

Evaluating Different Cost-Benefit Analysis Methods for Port Security Operations

Examining the Classification Accuracy of TSVMs with Feature Selection in Comparison with the GLAD Algorithm

Extending a Microsimulation of the Port of Dover

Extending Similarity Measures of Interval Type-2 Fuzzy Sets to General Type-2 Fuzzy Sets

Finding the creatures of habit; Clustering households based on their flexibility in using electricity

Immune System Approaches to Intrusion Detection - A Review (ICARIS)

Investigating Immune System Aging: System Dynamics and Agent-Based Modeling

Investigating Mathematical Models of Immuno-Interactions with Early-Stage Cancer under an Agent-Based Modelling Perspective

Investigating the Detection of Adverse Drug Events in a UK General Practice Electronic Health-Care Database