Source author record

Aske Plaat

Aske Plaat appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning cs.CY Distributed, Parallel, and Cluster Computing Human-Computer Interaction Neural and Evolutionary Computing Robotics Computation and Language Digital Libraries Information Retrieval Performance

Catalog footprint

What is connected

30works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Agentic Large Language Models, a survey

Background: There is great interest in agentic LLMs, large language models that act as agents. Objectives: We review the growing body of work in this area and provide a research agenda. Methods: Agentic LLMs are LLMs that (1) reason, (2) act, and (3) interact. We organize the literature according to these three categories. Results: The research in the first category focuses on reasoning, reflection, and retrieval, aiming to improve decision making; the second category focuses on action models, robots, and tools, aiming for agents that act as useful assistants; the third category focuses on multi-agent systems, aiming for collaborative task solving and simulating interaction to study emergent social behavior. We find that works mutually benefit from results in other categories: retrieval enables tool use, reflection improves multi-agent collaboration, and reasoning benefits all categories. Conclusions: We discuss applications of agentic LLMs and provide an agenda for further research. Important applications are in medical diagnosis, logistics and financial market analysis. Meanwhile, self-reflective agents playing roles and interacting with one another augment the process of scientific research itself. Further, agentic LLMs provide a solution for the problem of LLMs running out of training data: inference-time behavior generates new training states, such that LLMs can keep learning without needing ever larger datasets. We note that there is risk associated with LLM assistants taking action in the real world-safety, liability and security are open problems-while agentic LLMs are also likely to benefit society.

preprint2023arXiv

First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation

Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration in a general intrinsically motivated goal exploration process (IMGEP) framework, that the Go-Explore paper did not show. We study the isolated potential of post-exploration, by turning it on and off within the same algorithm under both tabular and deep RL settings on both discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and Mujoco environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider to use post-exploration in IMGEP when possible since it is effective, method-agnostic and easy to implement.

preprint2022arXiv

A Unifying Framework for Reinforcement Learning and Planning

Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight in the algorithmic design space of planning and reinforcement learning.

preprint2022arXiv

Model-based Reinforcement Learning: A Survey

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.

preprint2022arXiv

On Credit Assignment in Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning (HRL) has held longstanding promise to advance reinforcement learning. Yet, it has remained a considerable challenge to develop practical algorithms that exhibit some of these promises. To improve our fundamental understanding of HRL, we investigate hierarchical credit assignment from the perspective of conventional multistep reinforcement learning. We show how e.g., a 1-step `hierarchical backup' can be seen as a conventional multistep backup with $n$ skip connections over time connecting each subsequent state to the first independent of actions inbetween. Furthermore, we find that generalizing hierarchy to multistep return estimation methods requires us to consider how to partition the environment trace, in order to construct backup paths. We leverage these insight to develop a new hierarchical algorithm Hier$Q_k(λ)$, for which we demonstrate that hierarchical credit assignment alone can already boost agent performance (i.e., when eliminating generalization or exploration). Altogether, our work yields fundamental insight into the nature of hierarchical backups and distinguishes this as an additional basis for reinforcement learning research.

preprint2022arXiv

Reliable validation of Reinforcement Learning Benchmarks

Reinforcement Learning (RL) is one of the most dynamic research areas in Game AI and AI as a whole, and a wide variety of games are used as its prominent test problems. However, it is subject to the replicability crisis that currently affects most algorithmic AI research. Benchmarking in Reinforcement Learning could be improved through verifiable results. There are numerous benchmark environments whose scores are used to compare different algorithms, such as Atari. Nevertheless, reviewers must trust that figures represent truthful values, as it is difficult to reproduce an exact training curve. We propose improving this situation by providing access to the original experimental data to validate study results. To that end, we rely on the concept of minimal traces. These allow re-simulation of action sequences in deterministic RL environments and, in turn, enable reviewers to verify, re-use, and manually inspect experimental results without needing large compute clusters. It also permits validation of presented reward graphs, an inspection of individual episodes, and re-use of result data (baselines) for proper comparison in follow-up papers. We offer plug-and-play code that works with Gym so that our measures fit well in the existing RL and reproducibility eco-system. Our approach is freely available, easy to use, and adds minimal overhead, as minimal traces allow a data compression ratio of up to $\approx 10^4:1$ (94GB to 8MB for Atari Pong) compared to a regular MDP trace used in offline RL datasets. The paper presents proof-of-concept results for a variety of games.

preprint2022arXiv

When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation

Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper we present a systematic study of post-exploration, answering open questions that the Go-Explore paper did not answer yet. First, we study the isolated potential of post-exploration, by turning it on and off within the same algorithm. Subsequently, we introduce new methodology to adaptively decide when to post-explore and for how long to post-explore. Experiments on a range of MiniGrid environments show that post-exploration indeed boosts performance (with a bigger impact than tuning regular exploration parameters), and this effect is further enhanced by adaptively deciding when and for how long to post-explore. In short, our work identifies adaptive post-exploration as a promising direction for RL exploration research.

preprint2021arXiv

Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning

The idea of transfer in reinforcement learning (TRL) is intriguing: being able to transfer knowledge from one problem to another problem without learning everything from scratch. This promises quicker learning and learning more complex methods. To gain an insight into the field and to detect emerging trends, we performed a database search. We note a surprisingly late adoption of deep learning that starts in 2018. The introduction of deep learning has not yet solved the greatest challenge of TRL: generalization. Transfer between different domains works well when domains have strong similarities (e.g. MountainCar to Cartpole), and most TRL publications focus on different tasks within the same domain that have few differences. Most TRL applications we encountered compare their improvements against self-defined baselines, and the field is still missing unified benchmarks. We consider this to be a disappointing situation. For the future, we note that: (1) A clear measure of task similarity is needed. (2) Generalization needs to improve. Promising approaches merge deep learning with planning via MCTS or introduce memory through LSTMs. (3) The lack of benchmarking tools will be remedied to enable meaningful comparison and measure progress. Already Alchemy and Meta-World are emerging as interesting benchmark suites. We note that another development, the increase in procedural content generation (PCG), can improve both benchmarking and generalization in TRL.

preprint2021arXiv

Visualizing MuZero Models

MuZero, a model-based reinforcement learning algorithm that uses a value equivalent dynamics model, achieved state-of-the-art performance in Chess, Shogi and the game of Go. In contrast to standard forward dynamics models that predict a full next state, value equivalent models are trained to predict a future value, thereby emphasizing value relevant information in the representations. While value equivalent models have shown strong empirical success, there is no research yet that visualizes and investigates what types of representations these models actually learn. Therefore, in this paper we visualize the latent representation of MuZero agents. We find that action trajectories may diverge between observation embeddings and internal state transition dynamics, which could lead to instability during planning. Based on this insight, we propose two regularization techniques to stabilize MuZero's performance. Additionally, we provide an open-source implementation of MuZero along with an interactive visualizer of learned representations, which may aid further investigation of value equivalent algorithms.

preprint2020arXiv

A New Challenge: Approaching Tetris Link with AI

Decades of research have been invested in making computer programs for playing games such as Chess and Go. This paper focuses on a new game, Tetris Link, a board game that is still lacking any scientific analysis. Tetris Link has a large branching factor, hampering a traditional heuristic planning approach. We explore heuristic planning and two other approaches: Reinforcement Learning, Monte Carlo tree search. We document our approach and report on their relative performance in a tournament. Curiously, the heuristic approach is stronger than the planning/learning approaches. However, experienced human players easily win the majority of the matches against the heuristic planning AIs. We, therefore, surmise that Tetris Link is more difficult than expected. We offer our findings to the community as a challenge to improve upon.

preprint2020arXiv

Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?

The landmark achievements of AlphaGo Zero have created great research interest into self-play in reinforcement learning. In self-play, Monte Carlo Tree Search is used to train a deep neural network, that is then used in tree searches. Training itself is governed by many hyperparameters.There has been surprisingly little research on design choices for hyper-parameter values and loss-functions, presumably because of the prohibitive computational cost to explore the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. We use small games, to achieve meaningful exploration with moderate computational effort. The experimental results show that training is highly sensitive to hyper-parameter choices. Through multi-objective analysis we identify 4 important hyper-parameters to further assess. To start, we find surprising results where too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game-episodes, and training epochs. The intuition is that these three increase together as self-play iterations increase, and that increasing them individually is sub-optimal. A consequence of our experiments is a direct recommendation for setting hyper-parameter values in self-play: the overarching outer-loop of self-play iterations should be maximized, in favor of the three inner-loop hyper-parameters, which should be set at lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.

preprint2020arXiv

Tackling Morpion Solitaire with AlphaZero-likeRanked Reward Reinforcement Learning

Morpion Solitaire is a popular single player game, performed with paper and pencil. Due to its large state space (on the order of the game of Go) traditional search algorithms, such as MCTS, have not been able to find good solutions. A later algorithm, Nested Rollout Policy Adaptation, was able to find a new record of 82 steps, albeit with large computational resources. After achieving this record, to the best of our knowledge, there has been no further progress reported, for about a decade. In this paper we take the recent impressive performance of deep self-learning reinforcement learning approaches from AlphaGo/AlphaZero as inspiration to design a searcher for Morpion Solitaire. A challenge of Morpion Solitaire is that the state space is sparse, there are few win/loss signals. Instead, we use an approach known as ranked reward to create a reinforcement learning self-play framework for Morpion Solitaire. This enables us to find medium-quality solutions with reasonable computational effort. Our record is a 67 steps solution, which is very close to the human best (68) without any other adaptation to the problem than using ranked reward. We list many further avenues for potential improvement.

preprint2020arXiv

The Second Type of Uncertainty in Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) efficiently balances exploration and exploitation in tree search based on count-derived uncertainty. However, these local visit counts ignore a second type of uncertainty induced by the size of the subtree below an action. We first show how, due to the lack of this second uncertainty type, MCTS may completely fail in well-known sparse exploration problems, known from the reinforcement learning community. We then introduce a new algorithm, which estimates the size of the subtree below an action, and leverages this information in the UCB formula to better direct exploration. Subsequently, we generalize these ideas by showing that loops, i.e., the repeated occurrence of (approximately) the same state in the same trace, are actually a special case of subtree depth variation. Testing on a variety of tasks shows that our algorithms increase sample efficiency, especially when the planning budget per timestep is small.

preprint2020arXiv

Warm-Start AlphaZero Self-Play Search Enhancements

Recently, AlphaZero has achieved landmark results in deep reinforcement learning, by providing a single self-play architecture that learned three different games at super human level. AlphaZero is a large and complicated system with many parameters, and success requires much compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data ordomain specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE) and dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with especially RAVE based variants playing strongly.

preprint2016arXiv

A New Method for Parallel Monte Carlo Tree Search

In recent years there has been much interest in the Monte Carlo tree search algorithm, a new, adaptive, randomized optimization algorithm. In fields as diverse as Artificial Intelligence, Operations Research, and High Energy Physics, research has established that Monte Carlo tree search can find good solutions without domain dependent heuristics. However, practice shows that reaching high performance on large parallel machines is not so successful as expected. This paper proposes a new method for parallel Monte Carlo tree search based on the pipeline computation pattern.

preprint2015arXiv

Best-First and Depth-First Minimax Search in Practice

Most practitioners use a variant of the Alpha-Beta algorithm, a simple depth-first pro- cedure, for searching minimax trees. SSS*, with its best-first search strategy, reportedly offers the potential for more efficient search. However, the complex formulation of the al- gorithm and its alleged excessive memory requirements preclude its use in practice. For two decades, the search efficiency of "smart" best-first SSS* has cast doubt on the effectiveness of "dumb" depth-first Alpha-Beta. This paper presents a simple framework for calling Alpha-Beta that allows us to create a variety of algorithms, including SSS* and DUAL*. In effect, we formulate a best-first algorithm using depth-first search. Expressed in this framework SSS* is just a special case of Alpha-Beta, solving all of the perceived drawbacks of the algorithm. In practice, Alpha-Beta variants typically evaluate less nodes than SSS*. A new instance of this framework, MTD(f), out-performs SSS* and NegaScout, the Alpha-Beta variant of choice by practitioners.

preprint2015arXiv

Data Science and Ebola

Data Science---Today, everybody and everything produces data. People produce large amounts of data in social networks and in commercial transactions. Medical, corporate, and government databases continue to grow. Sensors continue to get cheaper and are increasingly connected, creating an Internet of Things, and generating even more data. In every discipline, large, diverse, and rich data sets are emerging, from astrophysics, to the life sciences, to the behavioral sciences, to finance and commerce, to the humanities and to the arts. In every discipline people want to organize, analyze, optimize and understand their data to answer questions and to deepen insights. The science that is transforming this ocean of data into a sea of knowledge is called data science. This lecture will discuss how data science has changed the way in which one of the most visible challenges to public health is handled, the 2014 Ebola outbreak in West Africa.

preprint2015arXiv

Ensemble UCT Needs High Exploitation

Recent results have shown that the MCTS algorithm (a new, adaptive, randomized optimization algorithm) is effective in a remarkably diverse set of applications in Artificial Intelligence, Operations Research, and High Energy Physics. MCTS can find good solutions without domain dependent heuristics, using the UCT formula to balance exploitation and exploration. It has been suggested that the optimum in the exploitation- exploration balance differs for different search tree sizes: small search trees needs more exploitation; large search trees need more exploration. Small search trees occur in variations of MCTS, such as parallel and ensemble approaches. This paper investigates the possibility of improving the performance of Ensemble UCT by increasing the level of exploitation. As the search trees becomes smaller we achieve an improved performance. The results are important for improving the performance of large scale parallelism of MCTS.

preprint2015arXiv

Scaling Monte Carlo Tree Search on Intel Xeon Phi

Many algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to parallelize efficiently. They are, for instance, present in artificial intelligence search algorithms such as Monte Carlo Tree Search (MCTS). In this paper we study the scaling behavior of MCTS, on a highly optimized real-world application, on real hardware. The Intel Xeon Phi allows shared memory scaling studies up to 61 cores and 244 hardware threads. We compare work-stealing (Cilk Plus and TBB) and work-sharing (FIFO scheduling) approaches. Interestingly, we find that a straightforward thread pool with a work-sharing FIFO queue shows the best performance. A crucial element for this high performance is the controlling of the grain size, an approach that we call Grain Size Controlled Parallel MCTS. Our subsequent comparing with the Xeon CPUs shows an even more comprehensible distinction in performance between different threading libraries. We achieve, to the best of our knowledge, the fastest implementation of a parallel MCTS on the 61 core Intel Xeon Phi using a real application (47 relative to a sequential run).

preprint2014arXiv

A New Paradigm for Minimax Search

This paper introduces a new paradigm for minimax game-tree search algo- rithms. MT is a memory-enhanced version of Pearls Test procedure. By changing the way MT is called, a number of best-first game-tree search algorithms can be simply and elegantly constructed (including SSS*). Most of the assessments of minimax search algorithms have been based on simulations. However, these simulations generally do not address two of the key ingredients of high performance game-playing programs: iterative deepening and memory usage. This paper presents experimental data from three game-playing programs (checkers, Othello and chess), covering the range from low to high branching factor. The improved move ordering due to iterative deepening and memory usage results in significantly different results from those portrayed in the literature. Whereas some simulations show Alpha-Beta expanding almost 100% more leaf nodes than other algorithms [12], our results showed variations of less than 20%. One new instance of our framework (MTD-f) out-performs our best alpha- beta searcher (aspiration NegaScout) on leaf nodes, total nodes and execution time. To our knowledge, these are the first reported results that compare both depth-first and best-first algorithms given the same amount of memory

preprint2014arXiv

Computer-mediated communication in adults with high-functioning Autism Spectrum Conditions

It has been suggested that people with Autism Spectrum Conditions (ASC) are attracted to computer-mediated communication (CMC). In this study, several open questions regarding CMC use in people with ASC which are investigated. We compare CMC use in adults with high-functioning ASC (N = 113) and a control group (N = 72). We find that people with ASC (1) spend more time on CMC than controls, (2) are more positive about CMC, (3) report relatively high levels of online social life satisfaction, and that (4) CMC use is negatively related to satisfaction with life for people with ASC. Our results indicate that the ASC subjects in this study use CMC at least as enthusiastically as controls, and are proficient and successful in its use.

preprint2014arXiv

Crowd-Sourcing Fuzzy and Faceted Classification for Concept Search

Searching for concepts in science and technology is often a difficult task. To facilitate concept search, different types of human-generated metadata have been created to define the content of scientific and technical disclosures. Classification schemes such as the International Patent Classification (IPC) and MEDLINE's MeSH are structured and controlled, but require trained experts and central management to restrict ambiguity (Mork, 2013). While unstructured tags of folksonomies can be processed to produce a degree of structure (Kalendar, 2010; Karampinas, 2012; Sarasua, 2012; Bragg, 2013) the freedom enjoyed by the crowd typically results in less precision (Stock 2007). Existing classification schemes suffer from inflexibility and ambiguity. Since humans understand language, inference, implication, abstraction and hence concepts better than computers, we propose to harness the collective wisdom of the crowd. To do so, we propose a novel classification scheme that is sufficiently intuitive for the crowd to use, yet powerful enough to facilitate search by analogy, and flexible enough to deal with ambiguity. The system will enhance existing classification information. Linking up with the semantic web and computer intelligence, a Citizen Science effort (Good, 2013) would support innovation by improving the quality of granted patents, reducing duplicitous research, and stimulating problem-oriented solution design. A prototype of our design is in preparation. A crowd-sourced fuzzy and faceted classification scheme will allow for better concept search and improved access to prior art in science and technology.

preprint2014arXiv

HEPGAME and the Simplification of Expressions

Advances in high energy physics have created the need to increase computational capacity. Project HEPGAME was composed to address this challenge. One of the issues is that numerical integration of expressions of current interest have millions of terms and takes weeks to compute. We have investigated ways to simplify these expressions, using Horner schemes and common subexpression elimination. Our approach applies MCTS, a search procedure that has been successful in AI. We use it to find near-optimal Horner schemes. Although MCTS finds better solutions, this approach gives rise to two further challenges. (1) MCTS (with UCT) introduces a constant, $C_p$ that governs the balance between exploration and exploitation. This constant has to be tuned manually. (2) There should be more guided exploration at the bottom of the tree, since the current approach reduces the quality of the solution towards the end of the expression. We investigate NMCS (Nested Monte Carlo Search) to address both issues, but find that NMCS is computationally unfeasible for our problem. Then, we modify the MCTS formula by introducing a dynamic exploration-exploitation parameter $T$ that decreases linearly with the iteration number. Consequently, we provide a performance analysis. We observe that a variable $C_p$ solves our domain: it yields more exploration at the bottom and as a result the tuning problem has been simplified. The region in $C_p$ for which good values are found is increased by more than a tenfold. This result encourages us to continue our research to solve other prominent problems in High Energy Physics.

preprint2014arXiv

MTD(f), A Minimax Algorithm Faster Than NegaScout

MTD(f) is a new minimax search algorithm, simpler and more efficient than previous algorithms. In tests with a number of tournament game playing programs for chess, checkers and Othello it performed better, on average, than NegaScout/PVS (the AlphaBeta variant used in practically all good chess, checkers, and Othello programs). One of the strongest chess programs of the moment, MIT's parallel chess program Cilkchess uses MTD(f) as its search algorithm, replacing NegaScout, which was used in StarSocrates, the previous version of the program.

preprint2014arXiv

Nearly Optimal Minimax Tree Search?

Knuth and Moore presented a theoretical lower bound on the number of leaves that any fixed-depth minimax tree-search algorithm traversing a uniform tree must explore, the so-called minimal tree. Since real-life minimax trees are not uniform, the exact size of this tree is not known for most applications. Further, most games have transpositions, implying that there exists a minimal graph which is smaller than the minimal tree. For three games (chess, Othello and checkers) we compute the size of the minimal tree and the minimal graph. Empirical evidence shows that in all three games, enhanced Alpha-Beta search is capable of building a tree that is close in size to that of the minimal graph. Hence, it appears game-playing programs build nearly optimal search trees. However, the conventional definition of the minimal graph is wrong. There are ways in which the size of the minimal graph can be reduced: by maximizing the number of transpositions in the search, and generating cutoffs using branches that lead to smaller search trees. The conventional definition of the minimal graph is just a left-most approximation. Calculating the size of the real minimal graph is too computationally intensive. However, upper bound approximations show it to be significantly smaller than the left-most minimal graph. Hence, it appears that game-playing programs are not searching as efficiently as is widely believed. Understanding the left-most and real minimal search graphs leads to some new ideas for enhancing Alpha-Beta search. One of them, enhanced transposition cutoffs, is shown to significantly reduce search tree size.

preprint2014arXiv

Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi

In 2013 Intel introduced the Xeon Phi, a new parallel co-processor board. The Xeon Phi is a cache-coherent many-core shared memory architecture claiming CPU-like versatility, programmability, high performance, and power efficiency. The first published micro-benchmark studies indicate that many of Intel's claims appear to be true. The current paper is the first study on the Phi of a complex artificial intelligence application. It contains an open source MCTS application for playing tournament quality Go (an oriental board game). We report the first speedup figures for up to 240 parallel threads on a real machine, allowing a direct comparison to previous simulation studies. After a substantial amount of work, we observed that performance scales well up to 32 threads, largely confirming previous simulation results of this Go program, although the performance surprisingly deteriorates between 32 and 240 threads. Furthermore, we report (1) unexpected performance anomalies between the Xeon Phi and Xeon CPU for small problem sizes and small numbers of threads, and (2) that performance is sensitive to scheduling choices. Achieving good performance on the Xeon Phi for complex programs is not straightforward; it requires a deep understanding of (1) search patterns, (2) of scheduling, and (3) of the architecture and its many cores and caches. In practice, the Xeon Phi is less straightforward to program for than originally envisioned by Intel.

preprint2014arXiv

SSS* = Alpha-Beta + TT

In 1979 Stockman introduced the SSS* minimax search algorithm that domi- nates Alpha-Beta in the number of leaf nodes expanded. Further investigation of the algorithm showed that it had three serious drawbacks, which prevented its use by practitioners: it is difficult to understand, it has large memory requirements, and it is slow. This paper presents an alternate formulation of SSS*, in which it is implemented as a series of Alpha-Beta calls that use a transposition table (AB- SSS*). The reformulation solves all three perceived drawbacks of SSS*, making it a practical algorithm. Further, because the search is now based on Alpha-Beta, the extensive research on minimax search enhancements can be easily integrated into AB-SSS*. To test AB-SSS* in practise, it has been implemented in three state-of-the- art programs: for checkers, Othello and chess. AB-SSS* is comparable in performance to Alpha-Beta on leaf node count in all three games, making it a viable alternative to Alpha-Beta in practise. Whereas SSS* has usually been regarded as being entirely different from Alpha-Beta, it turns out to be just an Alpha-Beta enhancement, like null-window searching. This runs counter to published simulation results. Our research leads to the surprising result that iterative deepening versions of Alpha-Beta can expand fewer leaf nodes than iterative deepening versions of SSS* due to dynamic move re-ordering.

preprint2014arXiv

Virtual Reflexes

Virtual Reality is used successfully to treat people for regular phobias. A new challenge is to develop Virtual Reality Exposure Training for social skills. Virtual actors in such systems have to show appropriate social behavior including emotions, gaze, and keeping distance. The behavior must be realistic and real-time. Current approaches consist of four steps: 1) trainee social signal detection, 2) cognitive-affective interpretation, 3) determination of the appropriate bodily responses, and 4) actuation. The "cognitive" detour of such approaches does not match the directness of human bodily reflexes and causes unrealistic responses and delay. Instead, we propose virtual reflexes as concurrent sensory-motor processes to control virtual actors. Here we present a virtual reflexes architecture, explain how emotion and cognitive modulation are embedded, detail its workings, and give an example description of an aggression training application.

preprint2014arXiv

Why Local Search Excels in Expression Simplification

Simplifying expressions is important to make numerical integration of large expressions from High Energy Physics tractable. To this end, Horner's method can be used. Finding suitable Horner schemes is assumed to be hard, due to the lack of local heuristics. Recently, MCTS was reported to be able to find near optimal schemes. However, several parameters had to be fine-tuned manually. In this work, we investigate the state space properties of Horner schemes and find that the domain is relatively flat and contains only a few local minima. As a result, the Horner space is appropriate to be explored by Stochastic Local Search (SLS), which has only two parameters: the number of iterations (computation time) and the neighborhood structure. We found a suitable neighborhood structure, leaving only the allowed computation time as a parameter. We performed a range of experiments. The results obtained by SLS are similar or better than those obtained by MCTS. Furthermore, we show that SLS obtains the good results at least 10 times faster. Using SLS, we can speed up numerical integration of many real-world large expressions by at least a factor of 24. For High Energy Physics this means that numerical integrations that took weeks can now be done in hours.

preprint2013arXiv

Combining Simulated Annealing and Monte Carlo Tree Search for Expression Simplification

In many applications of computer algebra large expressions must be simplified to make repeated numerical evaluations tractable. Previous works presented heuristically guided improvements, e.g., for Horner schemes. The remaining expression is then further reduced by common subexpression elimination. A recent approach successfully applied a relatively new algorithm, Monte Carlo Tree Search (MCTS) with UCT as the selection criterion, to find better variable orderings. Yet, this approach is fit for further improvements since it is sensitive to the so-called exploration-exploitation constant $C_p$ and the number of tree updates $N$. In this paper we propose a new selection criterion called Simulated Annealing UCT (SA-UCT) that has a dynamic exploration-exploitation parameter, which decreases with the iteration number $i$ and thus reduces the importance of exploration over time. First, we provide an intuitive explanation in terms of the exploration-exploitation behavior of the algorithm. Then, we test our algorithm on three large expressions of different origins. We observe that SA-UCT widens the interval of good initial values $C_p$ where best results are achieved. The improvement is large (more than a tenfold) and facilitates the selection of an appropriate $C_p$.

Aske Plaat

What is connected

Connect this record

See the researcher in context

Building this map preview

30 published item(s)

Agentic Large Language Models, a survey

First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation

A Unifying Framework for Reinforcement Learning and Planning

Model-based Reinforcement Learning: A Survey

On Credit Assignment in Hierarchical Reinforcement Learning

Reliable validation of Reinforcement Learning Benchmarks

When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation

Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning

Visualizing MuZero Models

A New Challenge: Approaching Tetris Link with AI

Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?

Tackling Morpion Solitaire with AlphaZero-likeRanked Reward Reinforcement Learning

The Second Type of Uncertainty in Monte Carlo Tree Search

Warm-Start AlphaZero Self-Play Search Enhancements

A New Method for Parallel Monte Carlo Tree Search

Best-First and Depth-First Minimax Search in Practice

Data Science and Ebola

Ensemble UCT Needs High Exploitation

Scaling Monte Carlo Tree Search on Intel Xeon Phi

A New Paradigm for Minimax Search

Computer-mediated communication in adults with high-functioning Autism Spectrum Conditions

Crowd-Sourcing Fuzzy and Faceted Classification for Concept Search

HEPGAME and the Simplification of Expressions

MTD(f), A Minimax Algorithm Faster Than NegaScout

Nearly Optimal Minimax Tree Search?

Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi

SSS* = Alpha-Beta + TT

Virtual Reflexes

Why Local Search Excels in Expression Simplification

Combining Simulated Annealing and Monte Carlo Tree Search for Expression Simplification