Researcher profile

Andrew Collins

Andrew Collins contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
6works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2020arXiv

Per-Instance Algorithm Selection for Recommender Systems via Instance Clustering

Recommendation algorithms perform differently if the users, recommendation contexts, applications, and user interfaces vary even slightly. It is similarly observed in other fields, such as combinatorial problem solving, that algorithms perform differently for each instance presented. In those fields, meta-learning is successfully used to predict an optimal algorithm for each instance, to improve overall system performance. Per-instance algorithm selection has thus far been unsuccessful for recommender systems. In this paper we propose a per-instance meta-learner that clusters data instances and predicts the best algorithm for unseen instances according to cluster membership. We test our approach using 10 collaborative- and 4 content-based filtering algorithms, for varying clustering parameters, and find a significant improvement over the best performing base algorithm at alpha=0.053 (MAE: 0.7107 vs LightGBM 0.7214; t-test). We also explore the performances of our base algorithms on a ratings dataset and empirically show that the error of a perfect algorithm selector monotonically decreases for larger pools of algorithm. To the best of our knowledge, this is the first effective meta-learning technique for per-instance algorithm selection in recommender systems.

preprint2020arXiv

Siamese Meta-Learning and Algorithm Selection with 'Algorithm-Performance Personas' [Proposal]

Automated per-instance algorithm selection often outperforms single learners. Key to algorithm selection via meta-learning is often the (meta) features, which sometimes though do not provide enough information to train a meta-learner effectively. We propose a Siamese Neural Network architecture for automated algorithm selection that focuses more on 'alike performing' instances than meta-features. Our work includes a novel performance metric and method for selecting training samples. We introduce further the concept of 'Algorithm Performance Personas' that describe instances for which the single algorithms perform alike. The concept of 'alike performing algorithms' as ground truth for selecting training samples is novel and provides a huge potential as we believe. In this proposal, we outline our ideas in detail and provide the first evidence that our proposed metric is better suitable for training sample selection that standard performance metrics such as absolute errors.

preprint2016arXiv

Deciding the Bell number for hereditary graph properties

The paper [J. Balogh, B. Bollobás, D. Weinreich, A jump to the Bell number for hereditary graph properties, J. Combin. Theory Ser. B 95 (2005) 29--48] identifies a jump in the speed of hereditary graph properties to the Bell number $B_n$ and provides a partial characterisation of the family of minimal classes whose speed is at least $B_n$. In the present paper, we give a complete characterisation of this family. Since this family is infinite, the decidability of the problem of determining if the speed of a hereditary property is above or below the Bell number is questionable. We answer this question positively by showing that there exists an algorithm which, given a finite set $\mathcal{F}$ of graphs, decides whether the speed of the class of graphs containing no induced subgraphs from the set $\mathcal{F}$ is above or below the Bell number. For properties defined by infinitely many minimal forbidden induced subgraphs, the speed is known to be above the Bell number.

preprint2014arXiv

Genome disorder and breast cancer susceptibility

Many common diseases have a complex genetic basis in which large numbers of genetic variations combine with environmental and lifestyle factors to determine risk. However, quantifying such polygenic effects and their relationship to disease risk has been challenging. In order to address these difficulties we developed a global measure of the information content of an individual's genome relative to a reference population, which may be used to assess differences in global genome structure between cases and appropriate controls. Informally this measure, which we call relative genome information (RGI), quantifies the relative "disorder" of an individual's genome. In order to test its ability to predict disease risk we used RGI to compare single nucleotide polymorphism genotypes from two independent samples of women with early-onset breast cancer with three independent sets of controls. We found that RGI was significantly elevated in both sets of breast cancer cases in comparison with all three sets of controls, with disease risk rising sharply with RGI (odds ratio greater than 12 for the highest percentile RGI). Furthermore, we found that these differences are not due to associations with common variants at a small number of disease-associated loci, but rather are due to the combined associations of thousands of markers distributed throughout the genome. Our results indicate that the information content of an individual's genome may be used to measure the risk of a complex disease, and suggest that early-onset breast cancer has a strongly polygenic basis.

preprint2014arXiv

Implicit Representations and Factorial Properties of Graphs

The idea of implicit representation of graphs was introduced in [S. Kannan, M. Naor, S. Rudich, Implicit representation of graphs, SIAM J. Discrete Mathematics, 5 (1992) 596--603] and can be defined as follows. A representation of an $n$-vertex graph $G$ is said to be implicit if it assigns to each vertex of $G$ a binary code of length $O(\log n)$ so that the adjacency of two vertices is a function of their codes. Since an implicit representation of an $n$-vertex graph uses $O(n\log n)$ bits, any class of graphs admitting such a representation contains $2^{O(n\log n)}$ labelled graphs with $n$ vertices. In the terminology of [J. Balogh, B. Bollobás, D. Weinreich, The speed of hereditary properties of graphs, J. Combin. Theory B 79 (2000) 131--156] such classes have at most factorial speed of growth. In this terminology, the implicit graph conjecture can be stated as follows: every class with at most factorial speed of growth which is hereditary admits an implicit representation. The question of deciding whether a given hereditary class has at most factorial speed of growth is far from being trivial. In the present paper, we introduce a number of tools simplifying this question. Some of them can be used to obtain a stronger conclusion on the existence of an implicit representation. We apply our tools to reveal new hereditary classes with the factorial speed of growth. For many of them we show the existence of an implicit representation.

preprint2014arXiv

New results on word-representable graphs

A graph $G=(V,E)$ is word-representable if there exists a word $w$ over the alphabet $V$ such that letters $x$ and $y$ alternate in $w$ if and only if $(x,y)\in E$ for each $x\neq y$. The set of word-representable graphs generalizes several important and well-studied graph families, such as circle graphs, comparability graphs, 3-colorable graphs, graphs of vertex degree at most 3, etc. By answering an open question from [M. Halldorsson, S. Kitaev and A. Pyatkin, Alternation graphs, Lect. Notes Comput. Sci. 6986 (2011) 191--202. Proceedings of the 37th International Workshop on Graph-Theoretic Concepts in Computer Science, WG 2011, Tepla Monastery, Czech Republic, June 21-24, 2011.], in the present paper we show that not all graphs of vertex degree at most 4 are word-representable. Combining this result with some previously known facts, we derive that the number of $n$-vertex word-representable graphs is $2^{\frac{n^2}{3}+o(n^2)}$.