Source author record

Alexandros Kalousis

Alexandros Kalousis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Artificial Intelligence, Computer Vision, Applications, Databases, eess.IV, eess.SP, Information Retrieval, Neural and Evolutionary Computing

Catalog footprint

15 works, 9 topics, 4 close collaborators

preprint, 2022, arXiv

Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory

Episodic and semantic memory are critical components of the human memory model. The theory of complementary learning systems (McClelland et al., 1995) suggests that the compressed representation produced by a serial event (episodic memory) is later restructured to build a more generalized form of reusable knowledge (semantic memory). In this work we develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory via a hierarchical latent variable model. We take inspiration from traditional heap allocation and extend the idea of locally contiguous memory to the Kanerva Machine, enabling a novel differentiable block allocated latent memory. In contrast to the Kanerva Machine, we simplify the process of memory writing by treating it as a fully feed-forward deterministic process, relying on the stochasticity of the read key distribution to disperse information within the memory. We demonstrate that this allocation scheme improves performance in memory conditional image generation, resulting in new state-of-the-art conditional likelihood values on binarized MNIST (<=41.58 nats/image) and binarized Omniglot (<=66.24 nats/image), as well as presenting competitive performance on CIFAR10, DMLab Mazes, Celeb-A and ImageNet32x32.

preprint, 2022, arXiv

Optimality Inductive Biases and Agnostic Guidelines for Offline Reinforcement Learning

The performance of state-of-the-art offline RL methods varies widely over the spectrum of dataset qualities, ranging from far-from-optimal random data to close-to-optimal expert demonstrations. We re-implement these methods to test their reproducibility, and show that when a given method outperforms the others on one end of the spectrum, it never does on the other end. This prevents us from naming a victor across the board. We attribute the asymmetry to the amount of inductive bias injected into the agent to entice it to posit that the behavior underlying the offline dataset is optimal for the task. Our investigations confirm that careless injections of such optimality inductive biases make dominant agents subpar as soon as the offline policy is sub-optimal. To bridge this gap, we generalize importance-weighted regression methods that have proved the most versatile across the spectrum of dataset grades into a modular framework that allows for the design of methods that align with how much we know about the dataset. This modularity enables qualitatively different injections of optimality inductive biases. We show that certain orchestrations strike the right balance, improving the return on one end of the spectrum without harming it on the other end. While the formulation of guidelines for the design of an offline method reduces to aligning the amount of optimality bias to inject with what we know about the quality of the data, the design of an agnostic method for which we need not know the quality of the data beforehand is more nuanced. Only our framework allowed us to design a method that performed well across the spectrum while remaining modular if more information about the quality of the data ever becomes available.

preprint, 2022, arXiv

ProxyFAUG: Proximity-based Fingerprint Augmentation

The proliferation of data-demanding machine learning methods has brought to light the need for methodologies that can enlarge training datasets with simple, rule-based methods. In line with this concept, the fingerprint augmentation scheme proposed in this work aims to augment fingerprint datasets that are used to train positioning models. The proposed method uses fingerprints recorded in spatial proximity to perform fingerprint augmentation, creating new fingerprints that combine the features of the original ones. The proposed method of composing the new, augmented fingerprints is inspired by the crossover and mutation operators of genetic algorithms. The ProxyFAUG method aims to improve the achievable positioning accuracy of fingerprint datasets by introducing a rule-based, stochastic, proximity-based method of fingerprint augmentation. The performance of ProxyFAUG is evaluated in an outdoor Sigfox setting using a public dataset. With the use of the augmented dataset, the best performing published positioning method on this dataset is improved by 40% in terms of median error and 6% in terms of mean error. The analysis of the results indicates a systematic and significant performance improvement at the lower error quartiles, as indicated by the impressive improvement of the median error.
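The idea of combining nearby fingerprints with crossover and mutation operators can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `augment_by_proximity`, the dictionary layout with `pos` and `rssi` keys, the uniform per-feature crossover, the Gaussian mutation, and the choice of the midpoint as the location of the new fingerprint.

```python
import math
import random


def augment_by_proximity(fingerprints, radius, n_new, mutation_sigma=1.0, seed=0):
    """Create n_new synthetic fingerprints from pairs recorded within `radius`.

    Each fingerprint is a dict with a "pos" tuple and an "rssi" list
    (hypothetical layout, chosen only for this sketch).
    """
    rng = random.Random(seed)
    # Candidate parents: all pairs of fingerprints recorded in spatial proximity.
    pairs = [
        (a, b)
        for i, a in enumerate(fingerprints)
        for b in fingerprints[i + 1:]
        if math.dist(a["pos"], b["pos"]) <= radius
    ]
    if not pairs:
        return []
    augmented = []
    for _ in range(n_new):
        a, b = rng.choice(pairs)
        # Crossover: each feature is copied from one of the two parents.
        rssi = [rng.choice((ra, rb)) for ra, rb in zip(a["rssi"], b["rssi"])]
        # Mutation: small random perturbation of every feature.
        rssi = [r + rng.gauss(0.0, mutation_sigma) for r in rssi]
        # Assumed labeling rule: place the child at the parents' midpoint.
        pos = tuple((pa + pb) / 2 for pa, pb in zip(a["pos"], b["pos"]))
        augmented.append({"pos": pos, "rssi": rssi})
    return augmented
```

With `mutation_sigma=0.0` the sketch reduces to pure crossover, which makes it easy to check that every feature of a new fingerprint comes from one of its two parents.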

preprint, 2021, arXiv

Analysing the Data-Driven Approach of Dynamically Estimating Positioning Accuracy

The primary expectation from positioning systems is for them to provide the users with reliable estimates of their position. An additional piece of information that can greatly help the users utilize position estimates is the level of uncertainty that a positioning system assigns to the position estimate it produced. The concept of dynamically estimating the accuracy of position estimates of fingerprinting positioning systems has been sporadically discussed over the last decade in the literature of the field, where mainly handcrafted rules based on domain knowledge have been proposed. The emergence of IoT devices and the proliferation of data from Low Power Wide Area Networks (LPWANs) have facilitated the conceptualization of data-driven methods of determining the estimated certainty over position estimates. In this work, we analyze the data-driven approach of determining the Dynamic Accuracy Estimation (DAE), considering it in the broader context of a positioning system. More specifically, with the use of a public LoRaWAN dataset, the current work analyzes: the repartition of the available training set between the tasks of determining the location estimates and the DAE, the concept of selecting a subset of the most reliable estimates, and the impact that the spatial distribution of the data has on the accuracy of the DAE. The work provides a wide overview of the data-driven approach of DAE determination in the context of the overall design of a positioning system.

preprint, 2020, arXiv

Lifelong Generative Modeling

Lifelong learning is the problem of learning multiple consecutive tasks in a sequential manner, where knowledge gained from previous tasks is retained and used to aid future learning over the lifetime of the learner. It is essential towards the development of intelligent machines that can adapt to their surroundings. In this work we focus on a lifelong learning approach to unsupervised generative modeling, where we continuously incorporate newly observed distributions into a learned model. We do so through a student-teacher Variational Autoencoder architecture which allows us to learn and preserve all the distributions seen so far, without the need to retain the past data nor the past models. Through the introduction of a novel cross-model regularizer, inspired by a Bayesian update rule, the student model leverages the information learned by the teacher, which acts as a probabilistic knowledge store. The regularizer reduces the effect of catastrophic interference that appears when we learn over sequences of distributions. We validate our model's performance on sequential variants of MNIST, FashionMNIST, PermutedMNIST, SVHN and Celeb-A and demonstrate that our model mitigates the effects of catastrophic interference faced by neural networks in sequential learning scenarios.

preprint, 2016, arXiv

Learning Leading Indicators for Time Series Predictions

We consider the problem of learning models for forecasting multiple time-series systems together with discovering the leading indicators that serve as good predictors for the system. We model the systems by linear vector autoregressive models (VAR) and link the discovery of leading indicators to inferring sparse graphs of Granger-causality. We propose new problem formulations and develop two new methods to learn such models, gradually increasing the complexity of assumptions and approaches. While the first method assumes common structures across the whole system, our second method uncovers model clusters based on the Granger-causality and leading indicators together with learning the model parameters. We study the performance of our methods on a comprehensive set of experiments and confirm their efficacy and their advantages over state-of-the-art sparse VAR and graphical Granger learning methods.
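For orientation, the link between a linear VAR model and a sparse Granger-causality graph can be written out as follows. This is the generic textbook formulation with an l1 penalty, given as a sketch under assumed notation ($K$ series, $L$ lags, penalty weight $\lambda$); it is not the specific problem formulation proposed in the paper.

```latex
% VAR(L) model for K jointly observed series, y_t \in \mathbb{R}^K
y_t = \sum_{l=1}^{L} A_l \, y_{t-l} + \varepsilon_t ,
\qquad A_l \in \mathbb{R}^{K \times K}

% Series j Granger-causes series i if some lag has a non-zero coefficient
j \rightarrow i \iff \exists\, l \in \{1,\dots,L\} : \; [A_l]_{ij} \neq 0

% A sparse graph can be encouraged with a lasso-type penalty
\min_{A_1,\dots,A_L} \;
\sum_{t=L+1}^{T} \Bigl\| y_t - \sum_{l=1}^{L} A_l \, y_{t-l} \Bigr\|_2^2
+ \lambda \sum_{l=1}^{L} \| A_l \|_1
```

Under this reading, the leading indicators of the system are the series $j$ whose columns in the matrices $A_l$ carry non-zero coefficients for many target series $i$.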

preprint, 2015, arXiv

Factorizing LambdaMART for cold start recommendations

Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art algorithm in learning to rank which relies on such a metric. Despite its success, it does not have a principled regularization mechanism and relies on empirical approaches to control model complexity, which leaves it prone to overfitting. Motivated by the fact that very often the users' and items' descriptions as well as the preference behavior can be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank is essentially a model complexity controller; on top of it we propose additional regularizers to constrain the learned latent representations so that they reflect the user and item manifolds as these are defined by their original feature-based descriptors and the preference behavior. Finally, we also propose to use a weighted variant of NDCG to reduce the penalty for similar items with large rating discrepancy. We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold start setting as well as in the simpler matrix completion setting. In both cases it significantly outperforms current state-of-the-art algorithms.
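The two ingredients named in the abstract, a relevance score defined as an inner product of latent representations and a rank-based metric, can be sketched as below. This is a hedged illustration only: the latent vectors are given by hand rather than learned with gradient boosted trees, the metric is the standard NDCG rather than the weighted variant proposed in the paper, and the function names are hypothetical.

```python
import math


def score(user_vec, item_vec):
    """Relevance score as the inner product of two latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def dcg(relevances, k=None):
    """Discounted cumulative gain with the usual (2^rel - 1) / log2(rank + 1) terms."""
    rels = relevances if k is None else relevances[:k]
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(user_vec, item_vecs, true_relevance, k=None):
    """Standard NDCG of the ranking induced by inner-product scores."""
    order = sorted(
        range(len(item_vecs)),
        key=lambda j: score(user_vec, item_vecs[j]),
        reverse=True,
    )
    ranked = [true_relevance[j] for j in order]
    ideal_dcg = dcg(sorted(true_relevance, reverse=True), k)
    return dcg(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Because the metric depends only on the order of the scores, it rewards placing the few items a user actually sees correctly, which is the motivation the abstract gives for preferring it to a point-wise loss.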

preprint, 2014, arXiv

Two-Stage Metric Learning

In this paper, we present a novel two-stage metric learning algorithm. We first map each learning instance to a probability distribution by computing its similarities to a set of fixed anchor points. Then, we define the distance in the input data space as the Fisher information distance on the associated statistical manifold. This induces in the input data space a new family of distance metrics with unique properties. Unlike kernelized metric learning, we do not require the similarity measure to be positive semi-definite. Moreover, it can also be interpreted as a local metric learning algorithm with a well-defined distance approximation. We evaluate its performance on a number of datasets. It significantly outperforms other metric learning methods and SVM.
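The two stages described above can be sketched in a few lines. The sketch assumes a normalized Gaussian similarity to the anchor points and a hand-picked `bandwidth`, neither of which is specified in the abstract; the second stage uses the closed-form Fisher information (Fisher-Rao) distance between two discrete distributions, $2 \arccos \sum_i \sqrt{p_i q_i}$. No metric is learned here, so this only illustrates the geometry.

```python
import math


def to_distribution(x, anchors, bandwidth=1.0):
    """Stage 1: map x to a discrete distribution over the anchor points.

    Assumed similarity: Gaussian kernel, normalized to sum to one.
    Points very far from every anchor can underflow to a zero total.
    """
    sims = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, anchor)) / (2 * bandwidth ** 2))
        for anchor in anchors
    ]
    total = sum(sims)
    return [s / total for s in sims]


def fisher_distance(p, q):
    """Stage 2: Fisher information distance between two discrete distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against rounding before taking arccos.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))


def two_stage_distance(x, y, anchors, bandwidth=1.0):
    """Distance between inputs x and y measured on the statistical manifold."""
    return fisher_distance(
        to_distribution(x, anchors, bandwidth),
        to_distribution(y, anchors, bandwidth),
    )
```

Note that the similarity function only has to produce non-negative values that can be normalized; nothing in the construction requires it to be a positive semi-definite kernel.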

preprint, 2013, arXiv

A Metric-learning based framework for Support Vector Machines and Multiple Kernel Learning

Most metric learning algorithms, as well as Fisher's Discriminant Analysis (FDA), optimize some cost function of different measures of within- and between-class distances. On the other hand, Support Vector Machines (SVMs) and several Multiple Kernel Learning (MKL) algorithms are based on the SVM large margin theory. Recently, SVMs have been analyzed from a metric learning perspective, which makes it possible to relate SVM and metric learning and to develop new algorithms that build on the strengths of each. Inspired by the metric learning interpretation of SVM, we develop here a new metric-learning based SVM framework in which we incorporate metric learning concepts within SVM. We extend the optimization problem of SVM to include some measure of the within-class distance, and along the way we develop a new within-class distance measure which is appropriate for SVM. In addition, we adopt the same approach for MKL and show that it can also be formulated as a Mahalanobis metric learning problem. Our end result is a number of SVM/MKL algorithms that incorporate metric learning concepts. We experiment with them on a set of benchmark datasets and observe important predictive performance improvements.

preprint, 2012, arXiv

A metric learning perspective of SVM: on the relation of SVM and LMNN

Support Vector Machines, SVMs, and the Large Margin Nearest Neighbor algorithm, LMNN, are two very popular learning algorithms with quite different learning biases. In this paper we bring them into a unified view and show that they have a much stronger relation than what is commonly thought. We analyze SVMs from a metric learning perspective and cast them as a metric learning problem, a view which helps us uncover the relations of the two algorithms. We show that LMNN can be seen as learning a set of local SVM-like models in a quadratic space. Along the way, and inspired by the metric-based interpretation of SVMs, we derive a novel variant of SVMs, epsilon-SVM, to which LMNN is even more similar. We give a unified view of LMNN and the different SVM variants. Finally, we provide some preliminary experiments on a number of benchmark datasets which show that epsilon-SVM compares favorably with both LMNN and SVM.

preprint, 2012, arXiv

Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining

The notion of meta-mining has appeared recently and extends the traditional meta-learning in two ways. First, it does not learn meta-models that provide support only for the learning algorithm selection task but ones that support the whole data-mining process. In addition, it abandons the so-called black-box approach to algorithm description followed in meta-learning. Now, in addition to the datasets, algorithms and workflows also have descriptors. For the latter two, these descriptions are semantic, describing properties of the algorithms. With the availability of descriptors both for datasets and data mining workflows, the traditional modelling techniques followed in meta-learning, typically based on classification and regression algorithms, are no longer appropriate. Instead we are faced with a problem the nature of which is much more similar to the problems that appear in recommendation systems. The most important meta-mining requirements are that suggestions should use only dataset and workflow descriptors, and that the cold-start problem be handled, e.g. providing workflow suggestions for new datasets. In this paper we take a different view on the meta-mining modelling problem and treat it as a recommender problem. In order to account for the meta-mining specificities we derive a novel metric-based-learning recommender approach. Our method learns two homogeneous metrics, one in the dataset and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. We demonstrate our method on meta-mining over biological (microarray datasets) problems. The application of our method is not limited to the meta-mining problem; its formulation is general enough to be applied to problems with similar requirements.

preprint, 2012, arXiv

Learning Neighborhoods for Metric Learning

Metric learning methods have been shown to perform well on different learning tasks. Many of them rely on target neighborhood relationships that are computed in the original feature space and remain fixed throughout learning. As a result, the learned metric reflects the original neighborhood relations. We propose a novel formulation of the metric learning problem in which, in addition to the metric, the target neighborhood relations are also learned in a two-step iterative approach. The new formulation can be seen as a generalization of many existing metric learning methods. The formulation includes a target neighbor assignment rule that assigns different numbers of neighbors to instances according to their quality; "high quality" instances get more neighbors. We experiment with two of its instantiations that correspond to the metric learning algorithms LMNN and MCML and compare it to other metric learning methods on a number of datasets. The experimental results show state-of-the-art performance and provide evidence that learning the neighborhood relations does improve predictive performance.

preprint, 2012, arXiv

Parametric Local Metric Learning for Nearest Neighbor Classification

We study the problem of learning local metrics for nearest neighbor classification. Most previous works on local metric learning learn a number of local unrelated metrics. While this "independence" approach delivers an increased flexibility its downside is the considerable risk of overfitting. We present a new parametric local metric learning method in which we learn a smooth metric matrix function over the data manifold. Using an approximation error bound of the metric matrix function we learn local metrics as linear combinations of basis metrics defined on anchor points over different regions of the instance space. We constrain the metric matrix function by imposing on the linear combinations manifold regularization which makes the learned metric matrix function vary smoothly along the geodesics of the data manifold. Our metric learning method has excellent performance both in terms of predictive power and scalability. We experimented with several large-scale classification problems, tens of thousands of instances, and compared it with several state of the art metric learning methods, both global and local, as well as to SVM with automatic kernel selection, all of which it outperforms in a significant manner.

preprint, 2012, arXiv

Relationship-aware sequential pattern mining

Relationship-aware sequential pattern mining is the problem of mining frequent patterns in sequences in which the events of a sequence are mutually related by one or more concepts from some respective hierarchical taxonomies, based on the type of the events. Additionally, events themselves are also described with a certain number of taxonomical concepts. We present RaSP, an algorithm that is able to mine relationship-aware patterns over such sequences; RaSP follows a two-stage approach. In the first stage it mines for frequent type patterns and all their occurrences within the different sequences. In the second stage it performs hierarchical mining where, for each frequent type pattern and its occurrences, it mines for more specific frequent patterns in the lower levels of the taxonomies. We test RaSP on a real-world medical application, which provided the inspiration for its development, in which we mine for frequent patterns of medical behavior in the antibiotic treatment of microbes, and show that it has a very good computational performance given the complexity of the relationship-aware sequential pattern mining problem.

preprint, 2012, arXiv

Structuring Relevant Feature Sets with Multiple Model Learning

Feature selection is one of the most prominent learning tasks, especially in high-dimensional datasets in which the goal is to understand the mechanisms that underlie the learning dataset. However, most feature selection methods typically deliver just a flat set of relevant features and provide no further information on what kind of structures, e.g. feature groupings, might underlie the set of relevant features. In this paper we propose a new learning paradigm in which our goal is to uncover the structures that underlie the set of relevant features for a given learning problem. We uncover two types of feature sets: non-replaceable features, which contain important information about the target variable and cannot be replaced by other features, and functionally similar feature sets, which can be used interchangeably in learned models, given the presence of the non-replaceable features, with no change in the predictive performance. To do so we propose a new learning algorithm that learns a number of disjoint models using a model disjointness regularization constraint together with a constraint on the predictive agreement of the disjoint models. We explore the behavior of our approach on a number of high-dimensional datasets, and show that, as expected by their construction, these satisfy a number of properties: model disjointness, a high predictive agreement, and a similar predictive performance to models learned on the full set of relevant features. The ability to structure the set of relevant features in such a manner can become a valuable tool in different applications of scientific knowledge discovery.
