Researcher profile

Joeran Beel

Joeran Beel contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
6 works
0 followers
4 topics
4 close collaborators

Published work

6 published items

Preprint · 2020 · arXiv

Auto-Surprise: An Automated Recommender-System (AutoRecSys) Library with Tree of Parzens Estimator (TPE) Optimization

We introduce Auto-Surprise, an Automated Recommender System library. Auto-Surprise is an extension of the Surprise recommender system library and eases the algorithm selection and configuration process. Compared to the out-of-the-box Surprise library, Auto-Surprise performs better when evaluated with the MovieLens, Book Crossing and Jester datasets. It may also result in the selection of an algorithm with significantly lower runtime. Compared to Surprise's grid search, Auto-Surprise performs equally well or slightly better in terms of RMSE, and is notably faster in finding the optimum hyperparameters.

Preprint · 2020 · arXiv

Finite Group Equivariant Neural Networks for Games

Games such as Go, chess and checkers have multiple equivalent game states, i.e. multiple board positions where symmetrical and opposite moves should be made. These equivalences are not exploited by current state-of-the-art neural agents, which instead must relearn similar information, thereby wasting computing time. Group equivariant CNNs in existing work create networks which can exploit symmetries to improve learning; however, they lack the expressiveness to correctly reflect the move embeddings necessary for games. We introduce Finite Group Neural Networks (FGNNs), a method for creating agents with an innate understanding of these board positions. FGNNs are shown to improve the performance of networks playing checkers (draughts), and can be easily adapted to other games and learning problems. Additionally, FGNNs can be created from existing network architectures. These include, for the first time, those with skip connections and arbitrary layer types. We demonstrate that an equivariant version of U-Net (FGNN-U-Net) outperforms the unmodified network in image segmentation.
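As a rough illustration of what "equivariant" means for board positions, the sketch below averages an arbitrary board-to-board map over the eight symmetries of a square board (four rotations, each with an optional mirror). This is generic group averaging, not the FGNN construction from the paper; the function names and the toy map `position_weighted` are assumptions made for the example.

```python
from fractions import Fraction

def rot90(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def flip(board):
    """Mirror a square board left to right."""
    return [row[::-1] for row in board]

# Each symmetry is (k, f): mirror f times, then rotate k quarter turns.
GROUP = [(k, f) for f in (0, 1) for k in range(4)]

def act(board, g):
    """Apply symmetry g to the board."""
    k, f = g
    if f:
        board = flip(board)
    for _ in range(k):
        board = rot90(board)
    return board

def act_inv(board, g):
    """Undo symmetry g: rotate back, then mirror back."""
    k, f = g
    for _ in range((4 - k) % 4):
        board = rot90(board)
    if f:
        board = flip(board)
    return board

def position_weighted(board):
    """Hypothetical toy map that is NOT equivariant: it scales each
    cell by a weight that depends on the cell's position."""
    n = len(board)
    return [[board[i][j] * (i * n + j + 1) for j in range(n)] for i in range(n)]

def symmetrize(h):
    """Average g^-1(h(g(board))) over all symmetries g.
    The averaged map commutes with every symmetry in GROUP."""
    def equivariant(board):
        n = len(board)
        total = [[Fraction(0)] * n for _ in range(n)]
        for g in GROUP:
            out = act_inv(h(act(board, g)), g)
            total = [[total[i][j] + out[i][j] for j in range(n)] for i in range(n)]
        return [[value / len(GROUP) for value in row] for row in total]
    return equivariant

board = [[1, 0, 2], [0, 3, 0], [5, 0, 7]]
f = symmetrize(position_weighted)
```

With this setup, `f(act(board, g))` equals `act(f(board), g)` for every `g` in `GROUP`, whereas the unsymmetrized `position_weighted` does not have that property. Averaging costs eight evaluations per input, which is one reason to build the symmetry into the architecture instead.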

Preprint · 2020 · arXiv

Per-Instance Algorithm Selection for Recommender Systems via Instance Clustering

Recommendation algorithms perform differently if the users, recommendation contexts, applications, and user interfaces vary even slightly. It is similarly observed in other fields, such as combinatorial problem solving, that algorithms perform differently for each instance presented. In those fields, meta-learning is successfully used to predict an optimal algorithm for each instance, to improve overall system performance. Per-instance algorithm selection has thus far been unsuccessful for recommender systems. In this paper we propose a per-instance meta-learner that clusters data instances and predicts the best algorithm for unseen instances according to cluster membership. We test our approach using 10 collaborative- and 4 content-based filtering algorithms, for varying clustering parameters, and find a significant improvement over the best performing base algorithm at alpha=0.053 (MAE: 0.7107 vs LightGBM 0.7214; t-test). We also explore the performances of our base algorithms on a ratings dataset and empirically show that the error of a perfect algorithm selector monotonically decreases for larger pools of algorithms. To the best of our knowledge, this is the first effective meta-learning technique for per-instance algorithm selection in recommender systems.
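The idea of selecting an algorithm by cluster membership can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the paper's implementation: the features, the per-instance errors, the algorithm names (`knn`, `svd`, `baseline`), and the use of plain k-means with fixed starting centroids are all made up for the example.

```python
from statistics import mean

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )

def kmeans(points, centroids, rounds=10):
    """Plain k-means from fixed starting centroids, so the run is deterministic."""
    for _ in range(rounds):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(mean(dim) for dim in zip(*members)) if members else old)
        centroids = updated
    return centroids

def fit_selector(points, errors, start):
    """Cluster the training instances, then record the algorithm with the
    lowest mean error inside each non-empty cluster."""
    centroids = kmeans(points, start)
    labels = [nearest(p, centroids) for p in points]
    best = {}
    for c in range(len(centroids)):
        rows = [e for e, label in zip(errors, labels) if label == c]
        if rows:
            best[c] = min(rows[0], key=lambda name: mean(r[name] for r in rows))
    return centroids, best

def select(point, centroids, best):
    """Pick the algorithm attached to the unseen instance's nearest cluster."""
    return best[nearest(point, centroids)]

def oracle_error(errors, pool):
    """Mean error of a perfect selector restricted to the algorithms in pool."""
    return mean(min(row[name] for name in pool) for row in errors)

# Made-up instance features and per-instance absolute errors.
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
errors = [
    {"knn": 0.60, "svd": 0.90, "baseline": 0.95},
    {"knn": 0.55, "svd": 0.80, "baseline": 0.50},
    {"knn": 0.65, "svd": 0.85, "baseline": 1.00},
    {"knn": 0.95, "svd": 0.60, "baseline": 1.00},
    {"knn": 0.90, "svd": 0.70, "baseline": 0.65},
    {"knn": 1.00, "svd": 0.65, "baseline": 0.90},
]
centroids, best = fit_selector(points, errors, start=[(0.0, 0.0), (5.0, 5.0)])
names = ["knn", "svd", "baseline"]
oracle_curve = [oracle_error(errors, names[:k]) for k in range(1, len(names) + 1)]
```

On this toy data, `select((0.3, 0.1), centroids, best)` picks `knn` and `select((4.8, 5.3), centroids, best)` picks `svd`. The `oracle_curve` list also shows why a perfect selector's error cannot increase as the pool grows: each added algorithm can only lower, or leave unchanged, the per-instance minimum.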

Preprint · 2020 · arXiv

Siamese Meta-Learning and Algorithm Selection with 'Algorithm-Performance Personas' [Proposal]

Automated per-instance algorithm selection often outperforms single learners. Key to algorithm selection via meta-learning is often the (meta) features, which, however, sometimes do not provide enough information to train a meta-learner effectively. We propose a Siamese Neural Network architecture for automated algorithm selection that focuses more on 'alike performing' instances than meta-features. Our work includes a novel performance metric and method for selecting training samples. We further introduce the concept of 'Algorithm Performance Personas' that describe instances for which the single algorithms perform alike. The concept of 'alike performing algorithms' as ground truth for selecting training samples is novel and, we believe, has considerable potential. In this proposal, we outline our ideas in detail and provide the first evidence that our proposed metric is better suited for training sample selection than standard performance metrics such as absolute errors.

Preprint · 2020 · arXiv

Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and Cora

Citation parsing, particularly with deep neural networks, suffers from a lack of training data as available datasets typically contain only a few thousand training instances. Manually labelling citation strings is very time-consuming, hence synthetically created training data could be a solution. However, as of now, it is unknown if synthetically created reference strings are suitable to train machine learning algorithms for citation parsing. To find out, we train GROBID, which uses Conditional Random Fields, with a) human-labelled reference strings from 'real' bibliographies and b) synthetically created reference strings from the GIANT dataset. We find that both synthetic and organic reference strings are equally suited for training GROBID (F1 = 0.74). We additionally find that retraining GROBID has a notable impact on its performance, for both synthetic and real data (+30% in F1). Having as many types of labelled fields as possible during training also improves effectiveness, even if these fields are not available in the evaluation data (+13.5% F1). We conclude that synthetic data is suitable for training (deep) citation parsing models. We further suggest that in future evaluations of reference parsers both evaluation data similar and dissimilar to the training data should be used for more meaningful evaluations.
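For readers unfamiliar with the F1 figures quoted above, the sketch below shows one common way to score a citation parser: micro-averaged precision, recall and F1 over exactly matching labelled fields. The abstract does not say how its F1 was computed, so the matching rule, the function name `field_f1`, and the placeholder reference strings are assumptions for illustration only.

```python
def field_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 over labelled fields.

    gold and predicted are parallel lists of dicts that map a field
    name (e.g. "author") to its extracted string. A predicted field is
    counted as correct only if it matches the gold value exactly.
    """
    tp = fp = fn = 0
    for gold_fields, pred_fields in zip(gold, predicted):
        for field, value in pred_fields.items():
            if gold_fields.get(field) == value:
                tp += 1  # field found with the right value
            else:
                fp += 1  # field predicted but wrong or not in gold
        for field, value in gold_fields.items():
            if pred_fields.get(field) != value:
                fn += 1  # gold field missed or extracted incorrectly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Placeholder data, not taken from any real bibliography.
gold = [
    {"author": "A. Author", "year": "2020", "title": "An Example Title"},
    {"author": "B. Writer", "year": "2019"},
]
predicted = [
    {"author": "A. Author", "year": "2020", "title": "An Example"},
    {"author": "B. Writer"},
]
precision, recall, f1 = field_f1(gold, predicted)
```

Here three of the four predicted fields are correct and three of the five gold fields are recovered, so precision is 0.75, recall is 0.6, and F1 is their harmonic mean, about 0.67.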

Preprint · 2020 · arXiv

Towards an Interoperable Data Protocol Aimed at Linking the Fashion Industry with AI Companies

The fashion industry is looking to use artificial intelligence technologies to enhance its processes, services, and applications. Although the amount of fashion data currently in use is increasing, there is a large gap in data exchange between the fashion industry and the related AI companies, not to mention the different structures used for each fashion dataset. As a result, AI companies are relying on manually annotated fashion data to build different applications. Furthermore, as of this writing, the terminology, vocabulary and methods of data representation used to denote fashion items are still ambiguous and confusing. Hence, it is clear that the fashion industry and AI companies will benefit from a protocol that allows them to exchange and organise fashion information in a unified way. To achieve this goal we aim (1) to define a protocol called DDOIF that will allow interoperability of fashion data; (2) for DDOIF to contain diverse entities including extensive information on clothing and accessories attributes in the form of text and various media formats; and (3) to design and implement an API that includes, among other things, functions for importing and exporting a file built according to the DDOIF protocol that stores all information about a single item of clothing. To this end, we identified over 1000 class and subclass names used to name fashion items and use them to build the DDOIF dictionary. We make DDOIF publicly available to all interested users and developers and look forward to engaging more collaborators to improve and enrich it.