Researcher profile

Santiago Ontañón

Santiago Ontañón contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
12 works
0 followers
5 topics
4 close collaborators

Published work

12 published items

preprint, 2022, arXiv

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to "mask out" invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking. The source code can be found at https://github.com/vwxyzjn/invalid-action-masking
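The masking step described in this abstract can be illustrated with a minimal sketch. It assumes the common approach of replacing the logits of invalid actions with a large negative number before the softmax; the function names, the fill value, and the example logits are hypothetical and are not taken from the paper's repository.

```python
import math
import random


def masked_softmax(logits, valid_mask, neg_fill=-1e8):
    """Softmax after replacing invalid logits with a large negative number.

    neg_fill is an illustrative constant (an assumption of this sketch).
    """
    masked = [x if ok else neg_fill for x, ok in zip(logits, valid_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def sample_valid_action(logits, valid_mask, rng):
    """Sample an action index from the masked distribution."""
    probs = masked_softmax(logits, valid_mask)
    valid = [i for i, ok in enumerate(valid_mask) if ok]
    weights = [probs[i] for i in valid]
    return rng.choices(valid, weights=weights, k=1)[0]


# Hypothetical example: actions 1 and 3 are invalid in the current state.
logits = [2.0, 1.0, 0.5, -1.0]
valid_mask = [True, False, True, False]
probs = masked_softmax(logits, valid_mask)
action = sample_valid_action(logits, valid_mask, random.Random(0))
```

Because the invalid entries underflow to zero probability, they are never sampled and contribute nothing to the normalization over valid actions.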

preprint, 2022, arXiv

A2C is a special case of PPO

Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different than A2C's objective. In this paper, however, we show A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using Stable-baselines3, showing A2C and PPO produce the exact same models when other settings are controlled.
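One way to see why the two objectives can coincide is the following sketch. It is not quoted from the paper. It uses the standard PPO clipped surrogate and assumes a single gradient step is taken on each batch, so that the gradient is evaluated at the parameters that collected the data; the symbols follow the usual PPO notation.

```latex
\begin{align*}
L^{\mathrm{PPO}}(\theta)
  &= \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
     \mathrm{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
  \qquad
  r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \\[4pt]
\nabla_\theta L^{\mathrm{PPO}}(\theta)\Big|_{\theta=\theta_{\mathrm{old}}}
  &= \mathbb{E}_t\!\left[\hat{A}_t\,\nabla_\theta r_t(\theta)\Big|_{\theta=\theta_{\mathrm{old}}}\right]
   = \mathbb{E}_t\!\left[\hat{A}_t\,\nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big|_{\theta=\theta_{\mathrm{old}}}\right]
\end{align*}
```

At \theta = \theta_{\mathrm{old}} the ratio equals 1, which lies strictly inside the clipping interval, so the clip is inactive and the gradient reduces to the advantage-weighted log-probability gradient used by A2C.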

preprint, 2022, arXiv

Making Transformers Solve Compositional Tasks

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in a diverse set of compositional tasks, and that achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS), and a string edit operation composition benchmark (PCFG).

preprint, 2021, arXiv

Personalization Paradox in Behavior Change Apps: Lessons from a Social Comparison-Based Personalized App for Physical Activity

Social comparison-based features are widely used in social computing apps. However, most existing apps are not grounded in social comparison theories and do not consider individual differences in social comparison preferences and reactions. This paper is among the first to automatically personalize social comparison targets. In the context of an m-health app for physical activity, we use artificial intelligence (AI) techniques of multi-armed bandits. Results from our user study (n=53) indicate that there is some evidence that motivation can be increased using the AI-based personalization of social comparison. The detected effects achieved small-to-moderate effect sizes, illustrating the real-world implications of the intervention for enhancing motivation and physical activity. In addition to design implications for social comparison features in social apps, this paper identified the personalization paradox, the conflict between user modeling and adaptation, as a key design challenge of personalized applications for behavior change. Additionally, we propose research directions to mitigate this Personalization Paradox.

preprint, 2021, arXiv

Player Modeling via Multi-Armed Bandits

This paper focuses on building personalized player models solely from player behavior in the context of adaptive games. We present two main contributions: The first is a novel approach to player modeling based on multi-armed bandits (MABs). This approach addresses, at the same time and in a principled way, both the problem of collecting data to model the characteristics of interest for the current player and the problem of adapting the interactive experience based on this model. Second, we present an approach to evaluating and fine-tuning these algorithms prior to generating data in a user study. This is an important problem, because conducting user studies is an expensive and labor-intensive process; therefore, an ability to evaluate the algorithms beforehand can save a significant amount of resources. We evaluate our approach in the context of modeling players' social comparison orientation (SCO) and present empirical results from both simulations and real players.

preprint, 2021, arXiv

Player-Centered AI for Automatic Game Personalization: Open Problems

Computer games represent an ideal research domain for the next generation of personalized digital applications. This paper presents a player-centered framework of AI for game personalization, complementary to the commonly used system-centered approaches. Built on the Structure of Actions theory, the paper maps out the current landscape of game personalization research and identifies eight open problems that need further investigation. These problems require deep collaboration between technological advancement and player experience design.

preprint, 2021, arXiv

Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits

This paper explores multi-armed bandit (MAB) strategies in very short horizon scenarios, i.e., when the bandit strategy is only allowed very few interactions with the environment. This is an understudied setting in the MAB literature with many applications in the context of games, such as player modeling. Specifically, we pursue three different ideas. First, we explore the use of regression oracles, which replace the simple average used in strategies such as epsilon-greedy with linear regression models. Second, we examine different exploration patterns such as forced exploration phases. Finally, we introduce a new variant of the UCB1 strategy called UCBT that has interesting properties and no tunable parameters. We present experimental results in a domain motivated by exergames, where the goal is to maximize a player's daily steps. Our results show that the combination of epsilon-greedy or epsilon-decreasing with regression oracles outperforms all other tested strategies in the short horizon setting.
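As a rough illustration of the baseline this abstract refers to, the sketch below implements epsilon-greedy with the simple per-arm average as its reward estimate. The pluggable `estimate` argument marks where a regression oracle could be substituted; the function names, the Gaussian reward model, and the arm means are assumptions of this sketch, not details from the paper.

```python
import random


def mean_estimate(rewards):
    """Simple average of the rewards observed for one arm."""
    return sum(rewards) / len(rewards)


def epsilon_greedy_choice(history, epsilon, rng, estimate=mean_estimate):
    """Pick an arm given history, a list holding one reward list per arm.

    A regression oracle would be passed as estimate in place of the
    simple average (hypothetical hook, not the paper's interface).
    """
    untried = [arm for arm, rewards in enumerate(history) if not rewards]
    if untried:
        return untried[0]  # try every arm once before estimating
    if rng.random() < epsilon:
        return rng.randrange(len(history))  # explore uniformly at random
    estimates = [estimate(rewards) for rewards in history]
    return max(range(len(history)), key=lambda arm: estimates[arm])  # exploit


def run_bandit(arm_means, horizon, epsilon=0.1, seed=0):
    """Simulate a short-horizon run with unit-variance Gaussian rewards."""
    rng = random.Random(seed)
    history = [[] for _ in arm_means]
    for _ in range(horizon):
        arm = epsilon_greedy_choice(history, epsilon, rng)
        history[arm].append(rng.gauss(arm_means[arm], 1.0))
    return history


# Hypothetical short-horizon example: three arms, only 20 interactions.
history = run_bandit([0.0, 1.0, 5.0], horizon=20)
```

With so few interactions, the initial pass over all arms already consumes a noticeable share of the horizon, which is the kind of constraint the short-horizon setting puts on exploration.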

preprint, 2021, arXiv

The Personalization Paradox: the Conflict between Accurate User Models and Personalized Adaptive Systems

Personalized adaptation technology has been adopted in a wide range of digital applications such as health, training and education, e-commerce and entertainment. Personalization systems typically build a user model, aiming to characterize the user at hand, and then use this model to personalize the interaction. Personalization and user modeling, however, are often intrinsically at odds with each other (a fact sometimes referred to as the personalization paradox). In this paper, we take a closer look at this personalization paradox, and identify two ways in which it might manifest: feedback loops and moving targets. To illustrate these issues, we report results in the domain of personalized exergames (videogames for physical exercise), and describe our early steps to address some of the issues raised by the personalization paradox.

preprint, 2020, arXiv

An Overview of Distance and Similarity Functions for Structured Data

The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work.

preprint, 2020, arXiv

Comparing Observation and Action Representations for Deep Reinforcement Learning in μRTS

This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games. Specifically, we compare two representations: (1) a global representation where the observation represents the whole game state, and the RL agent needs to choose which unit to issue actions to, and which actions to execute; and (2) a local representation where the observation is represented from the point of view of an individual unit, and the RL agent picks actions for each unit independently. We evaluate these representations in μRTS, showing that the local representation seems to outperform the global representation when training agents with the task of harvesting resources.

preprint, 2020, arXiv

RHOG: A Refinement-Operator Library for Directed Labeled Graphs

This document provides the foundations behind the functionality provided by the ρG library (https://github.com/santiontanon/RHOG), focusing on the basic operations the library provides: subsumption, refinement of directed labeled graphs, and distance/similarity assessment between directed labeled graphs. ρG development was initially supported by the National Science Foundation, by the EAGER grant IIS-1551338.

preprint, 2020, arXiv

Understanding Learners' Problem-Solving Strategies in Concurrent and Parallel Programming: A Game-Based Approach

Concurrent and parallel programming (CPP) is an increasingly important subject in Computer Science Education. However, the conceptual shift from sequential programming is notoriously difficult to make. Currently, relatively little research exists on how people learn CPP core concepts. This paper presents our results of using Parallel, an educational game about CPP, focusing on the learners' self-efficacy and how they learn CPP concepts. Based on a study of 44 undergraduate students, our research shows that (a) self-efficacy increased significantly after playing the game; (b) the problem-solving strategies employed by students playing the game can be classified into three main types: trial and error, single-thread, and multi-threaded strategies; and (c) self-efficacy is correlated with the percentage of time students spend in multi-threaded problem-solving.