Source author record

Christos Dimitrakakis

Christos Dimitrakakis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Cryptography and Security Applications Computer Science and Game Theory Information Retrieval Multiagent Systems Networking and Internet Architecture

Catalog footprint

What is connected

26works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Interactive Inverse Reinforcement Learning for Cooperative Games

We study the problem of designing autonomous agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov decision process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting so as to maximise expected utility given the first agent's policy. How should the first agent act in order to learn the joint reward function as quickly as possible and so that the joint policy is as close to optimal as possible? We analyse how knowledge about the reward function can be gained in this interactive two-agent scenario. We show that when the learning agent's policies have a significant effect on the transition function, the reward function can be learned efficiently.

preprint2022arXiv

On Meritocracy in Optimal Set Selection

Typically, merit is defined with respect to some intrinsic measure of worth. We instead consider a setting where an individual's worth is \emph{relative}: when a Decision Maker (DM) selects a set of individuals from a population to maximise expected utility, it is natural to consider the \emph{Expected Marginal Contribution} (EMC) of each person to the utility. We show that this notion satisfies an axiomatic definition of fairness for this setting. We also show that for certain policy structures, this notion of fairness is aligned with maximising expected utility, while for linear utility functions it is identical to the Shapley value. However, for certain natural policies, such as those that select individuals with a specific set of attributes (e.g. high enough test scores for college admissions), there is a trade-off between meritocracy and utility maximisation. We analyse the effect of constraints on the policy on both utility and fairness in extensive experiments based on college admissions and outcomes in Norwegian universities.

preprint2022arXiv

Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning under Policy Uncertainty

In stochastic games with incomplete information, the uncertainty is evoked by the lack of knowledge about a player's own and the other players' types, i.e. the utility function and the policy space, and also the inherent stochasticity of different players' interactions. In existing literature, the risk in stochastic games has been studied in terms of the inherent uncertainty evoked by the variability of transitions and actions. In this work, we instead focus on the risk associated with the \textit{uncertainty over types}. We contrast this with the multi-agent reinforcement learning framework where the other agents have fixed stationary policies and investigate risk-sensitiveness due to the uncertainty about the other agents' adaptive policies. We propose risk-sensitive versions of existing algorithms proposed for risk-neutral stochastic games, such as Iterated Best Response (IBR), Fictitious Play (FP) and a general multi-objective gradient approach using dual ascent (DAPG). Our experimental analysis shows that risk-sensitive DAPG performs better than competing algorithms for both social welfare and general-sum stochastic games.

preprint2022arXiv

SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning

In this paper, we consider risk-sensitive sequential decision-making in Reinforcement Learning (RL). Our contributions are two-fold. First, we introduce a novel and coherent quantification of risk, namely composite risk, which quantifies the joint effect of aleatory and epistemic risk during the learning process. Existing works considered either aleatory or epistemic risk individually, or as an additive combination. We prove that the additive formulation is a particular case of the composite risk when the epistemic risk measure is replaced with expectation. Thus, the composite risk is more sensitive to both aleatory and epistemic uncertainty than the individual and additive formulations. We also propose an algorithm, SENTINEL-K, based on ensemble bootstrapping and distributional RL for representing epistemic and aleatory uncertainty respectively. The ensemble of K learners uses Follow The Regularised Leader (FTRL) to aggregate the return distributions and obtain the composite risk. We experimentally verify that SENTINEL-K estimates the return distribution better, and while used with composite risk estimates, demonstrates higher risk-sensitive performance than state-of-the-art risk-sensitive and distributional RL algorithms.

preprint2020arXiv

Bayesian Reinforcement Learning via Deep, Sparse Sampling

We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal policy, with a lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward in discrete environments.

preprint2020arXiv

Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?

Based on differential privacy (DP) framework, we introduce and unify privacy definitions for the multi-armed bandit algorithms. We represent the framework with a unified graphical model and use it to connect privacy definitions. We derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions. We leverage a unified proving technique to achieve all the lower bounds. We show that for all of them, the learner's regret is increased by a multiplicative factor dependent on the privacy level $ε$. We observe that the dependency is weaker when we do not require local differential privacy for the rewards.

preprint2020arXiv

Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning

Bayesian reinforcement learning (BRL) offers a decision-theoretic solution for reinforcement learning. While "model-based" BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search, previous Bayesian "model-free" value function distribution approaches implicitly make strong assumptions or approximations. We describe a novel Bayesian framework, Inferential Induction, for correctly inferring value function distributions from data, which leads to the development of a new class of BRL algorithms. We design an algorithm, Bayesian Backwards Induction, with this framework. We experimentally demonstrate that the proposed algorithm is competitive with respect to the state of the art.

preprint2019arXiv

Epistemic Risk-Sensitive Reinforcement Learning

We develop a framework for interacting with uncertain environments in reinforcement learning (RL) by leveraging preferences in the form of utility functions. We claim that there is value in considering different risk measures during learning. In this framework, the preference for risk can be tuned by variation of the parameter $β$ and the resulting behavior can be risk-averse, risk-neutral or risk-taking depending on the parameter choice. We evaluate our framework for learning problems with model uncertainty. We measure and control for \emph{epistemic} risk using dynamic programming (DP) and policy gradient-based algorithms. The risk-averse behavior is then compared with the behavior of the optimal risk-neutral policy in environments with epistemic risk.

preprint2016arXiv

Bayesian Differential Privacy through Posterior Sampling

Differential privacy formalises privacy-preserving mechanisms that provide access to a database. We pose the question of whether Bayesian inference itself can be used directly to provide private access to data, with no modification. The answer is affirmative: under certain conditions on the prior, sampling from the posterior distribution can be used to achieve a desired level of privacy and utility. To do so, we generalise differential privacy to arbitrary dataset metrics, outcome spaces and distribution families. This allows us to also deal with non-i.i.d or non-tabular datasets. We prove bounds on the sensitivity of the posterior to the data, which gives a measure of robustness. We also show how to use posterior sampling to provide differentially private responses to queries, within a decision-theoretic framework. Finally, we provide bounds on the utility and on the distinguishability of datasets. The latter are complemented by a novel use of Le Cam's method to obtain lower bounds. All our general results hold for arbitrary database metrics, including those for the common definition of differential privacy. For specific choices of the metric, we give a number of examples satisfying our assumptions.

preprint2015arXiv

Algorithms for Differentially Private Multi-Armed Bandits

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist $(ε, δ)$ differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(ε^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(ε^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.

preprint2014arXiv

Cover Tree Bayesian Reinforcement Learning

This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.

preprint2014arXiv

Generalised Entropy MDPs and Minimax Regret

Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.

preprint2014arXiv

Personalized News Recommendation with Context Trees

The profusion of online news articles makes it difficult to find interesting articles, a problem that can be assuaged by using a recommender system to bring the most relevant news stories to readers. However, news recommendation is challenging because the most relevant articles are often new content seen by few users. In addition, they are subject to trends and preference changes over time, and in many cases we do not have sufficient information to profile the reader. In this paper, we introduce a class of news recommendation systems based on context trees. They can provide high-quality news recommendation to anonymous visitors based on present browsing behaviour. We show that context-tree recommender systems provide good prediction accuracy and recommendation novelty, and they are sufficiently flexible to capture the unique properties of news articles.

preprint2014arXiv

Probabilistic inverse reinforcement learning in unknown environments

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.

preprint2013arXiv

ABC Reinforcement Learning

This paper introduces a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC). The main advantage is that we only require a prior distribution on a class of simulators (generative models). This is useful in domains where an analytical probabilistic model of the underlying process is too complex to formulate, but where detailed simulation models are available. ABC-RL allows the use of any Bayesian reinforcement learning technique, even in this case. In addition, it can be seen as an extension of rollout algorithms to the case where we do not know what the correct model to draw rollouts from is. We experimentally demonstrate the potential of this approach in a comparison with LSPI. Finally, we introduce a theorem showing that ABC is a sound methodology in principle, even when non-sufficient statistics are used.

preprint2013arXiv

Monte-Carlo utility estimates for Bayesian reinforcement learning

This paper introduces a set of algorithms for Monte-Carlo Bayesian reinforcement learning. Firstly, Monte-Carlo estimation of upper bounds on the Bayes-optimal value function is employed to construct an optimistic policy. Secondly, gradient-based algorithms for approximate upper and lower bounds are introduced. Finally, we introduce a new class of gradient algorithms for Bayesian Bellman error minimisation. We theoretically show that the gradient methods are sound. Experimentally, we demonstrate the superiority of the upper bound method in terms of reward obtained. However, we also show that the Bayesian Bellman error method is a close second, despite its significant computational simplicity.

preprint2013arXiv

Near-Optimal Blacklisting

Many applications involve agents sharing a resource, such as networks or services. When agents are honest, the system functions well and there is a net profit. Unfortunately, some agents may be malicious, but it may be hard to detect them. We consider the intrusion response problem of how to permanently blacklist agents, in order to maximise expected profit. This is not trivial, as blacklisting may erroneously expel honest agents. Conversely, while we gain information by allowing an agent to remain, we may incur a cost due to malicious behaviour. We present an efficient algorithm (HIPER) for making near-optimal decisions for this problem. Additionally, we derive three algorithms by reducing the problem to a Markov decision process (MDP). Theoretically, we show that HIPER is near-optimal. Experimentally, its performance is close to that of the full MDP solution, when the (stronger) requirements of the latter are met.

preprint2013arXiv

Probabilistic inverse reinforcement learning in unknown environments

preprint2012arXiv

Sparse Reward Processes

We introduce a class of learning problems where the agent is presented with a series of tasks. Intuitively, if there is relation among those tasks, then the information gained during execution of one task has value for the execution of another task. Consequently, the agent is intrinsically motivated to explore its environment beyond the degree necessary to solve the current task it has at hand. We develop a decision theoretic setting that generalises standard reinforcement learning tasks and captures this intuition. More precisely, we consider a multi-stage stochastic game between a learning agent and an opponent. We posit that the setting is a good model for the problem of life-long learning in uncertain environments, where while resources must be spent learning about currently important tasks, there is also the need to allocate effort towards learning about aspects of the world which are not relevant at the moment. This is due to the fact that unpredictable future events may lead to a change of priorities for the decision maker. Thus, in some sense, the model "explains" the necessity of curiosity. Apart from introducing the general formalism, the paper provides algorithms. These are evaluated experimentally in some exemplary domains. In addition, performance bounds are proven for some cases of this problem.

preprint2011arXiv

Bayesian multitask inverse reinforcement learning

We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each one may represent one expert trying to solve a different task, or as different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn to efficiently from multiple experts but to also effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers.

preprint2011arXiv

Context models on sequences of covers

We present a class of models that, via a simple construction, enables exact, incremental, non-parametric, polynomial-time, Bayesian inference of conditional measures. The approach relies upon creating a sequence of covers on the conditioning variable and maintaining a different model for each set within a cover. Inference remains tractable by specifying the probabilistic model in terms of a random walk within the sequence of covers. We demonstrate the approach on problems of conditional density estimation, which, to our knowledge is the first closed-form, non-parametric Bayesian approach to this problem.

preprint2011arXiv

Preference elicitation and inverse reinforcement learning

We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent's preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent's policy is sub-optimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent's preferences are obtained, compared to both other methods and to the performance of the demonstrated policy.

preprint2011arXiv

Robust Bayesian reinforcement learning through tight lower bounds

In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them were particularly tight. In this paper, we show how to efficiently calculate a lower bound, which corresponds to the utility of a near-optimal memoryless policy for the decision problem, which is generally different from both the Bayes-optimal policy and the policy which is optimal for the expected MDP under the current belief. We then show how these can be applied to obtain robust exploration policies in a Bayesian reinforcement learning setting.

preprint2011arXiv

Tree Exploration for Bayesian RL Exploration

Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, where optimality improves with increased computational time. This is because the resulting planning task takes the form of a dynamic programming problem on a belief tree with an infinite number of states. The second type employs relatively simple algorithm which are shown to suffer small regret within a distribution-free framework. This paper presents a lower bound and a high probability upper bound on the optimal value function for the nodes in the Bayesian belief tree, which are analogous to similar bounds in POMDPs. The bounds are then used to create more efficient strategies for exploring the tree. The resulting algorithms are compared with the distribution-free algorithm UCB1, as well as a simpler baseline algorithm on multi-armed bandit problems.

preprint2010arXiv

Expected loss analysis of thresholded authentication protocols in noisy conditions

A number of authentication protocols have been proposed recently, where at least some part of the authentication is performed during a phase, lasting $n$ rounds, with no error correction. This requires assigning an acceptable threshold for the number of detected errors. This paper describes a framework enabling an expected loss analysis for all the protocols in this family. Furthermore, computationally simple methods to obtain nearly optimal value of the threshold, as well as for the number of rounds is suggested. Finally, a method to adaptively select both the number of rounds and the threshold is proposed.

preprint2010arXiv

Shedding Light on RFID Distance Bounding Protocols and Terrorist Fraud Attacks

The vast majority of RFID authentication protocols assume the proximity between readers and tags due to the limited range of the radio channel. However, in real scenarios an intruder can be located between the prover (tag) and the verifier (reader) and trick this last one into thinking that the prover is in close proximity. This attack is generally known as a relay attack in which scope distance fraud, mafia fraud and terrorist attacks are included. Distance bounding protocols represent a promising countermeasure to hinder relay attacks. Several protocols have been proposed during the last years but vulnerabilities of major or minor relevance have been identified in most of them. In 2008, Kim et al. [1] proposed a new distance bounding protocol with the objective of being the best in terms of security, privacy, tag computational overhead and fault tolerance. In this paper, we analyze this protocol and we present a passive full disclosure attack, which allows an adversary to discover the long-term secret key of the tag. The presented attack is very relevant, since no security objectives are met in Kim et al.'s protocol. Then, design guidelines are introduced with the aim of facilitating protocol designers the stimulating task of designing secure and efficient schemes against relay attacks. Finally a new protocol, named Hitomi and inspired by [1], is designed conforming the guidelines proposed previously.

Christos Dimitrakakis

What is connected

Connect this record

See the researcher in context

Building this map preview

26 published item(s)

Interactive Inverse Reinforcement Learning for Cooperative Games

On Meritocracy in Optimal Set Selection

Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning under Policy Uncertainty

SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning

Bayesian Reinforcement Learning via Deep, Sparse Sampling

Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?

Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning

Epistemic Risk-Sensitive Reinforcement Learning

Bayesian Differential Privacy through Posterior Sampling

Algorithms for Differentially Private Multi-Armed Bandits

Cover Tree Bayesian Reinforcement Learning

Generalised Entropy MDPs and Minimax Regret

Personalized News Recommendation with Context Trees

Probabilistic inverse reinforcement learning in unknown environments

ABC Reinforcement Learning

Monte-Carlo utility estimates for Bayesian reinforcement learning

Near-Optimal Blacklisting

Probabilistic inverse reinforcement learning in unknown environments

Sparse Reward Processes

Bayesian multitask inverse reinforcement learning

Context models on sequences of covers

Preference elicitation and inverse reinforcement learning

Robust Bayesian reinforcement learning through tight lower bounds

Tree Exploration for Bayesian RL Exploration

Expected loss analysis of thresholded authentication protocols in noisy conditions

Shedding Light on RFID Distance Bounding Protocols and Terrorist Fraud Attacks