Source author record

Djallel Bouneffouf

Djallel Bouneffouf appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Information Retrieval Neurons and Cognition Computation and Language Multiagent Systems Computer Science and Game Theory Computer Vision cs.CY Human-Computer Interaction math.OC Neural and Evolutionary Computing Other Computer Science

Catalog footprint

What is connected

35works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Mitigating Misalignment Contagion by Steering with Implicit Traits

Language models (LMs) are increasingly used in high-stakes, multi-agent settings, where following instructions and maintaining value alignment are critical. Most alignment research focuses on interactions between a single LM and a single user, failing to address the risk of misaligned behavior spreading between multiple LMs in multi-turn interactions. We find evidence of this phenomenon, which we call misalignment contagion, across multiple LMs as they engage multi-turn conversational social dilemma games. Specifically, we find that LMs become more anti-social after gameplay and that this effect is intensified when other players are steered to act maliciously. We explore different steering techniques to mitigate such misalignment contagion and find that reinforcing an LM's system prompt is insufficient and often harmful. Instead, we propose steering with implicit traits: a technique that intermittently injects system prompts with statements that reinforce an LMs initial traits and is more effective than system prompt repetition at keeping models in line with their initial pro-social behaviors. Importantly, this method does not require access to model parameters or internal model states, making it suitable for increasingly common use cases where complex multi-agent workflows are being designed with black box models.

preprint2022arXiv

Deep Annotation of Therapeutic Working Alliance in Psychotherapy

The therapeutic working alliance is an important predictor of the outcome of the psychotherapy treatment. In practice, the working alliance is estimated from a set of scoring questionnaires in an inventory that both the patient and the therapists fill out. In this work, we propose an analytical framework of directly inferring the therapeutic working alliance from the natural language within the psychotherapy sessions in a turn-level resolution with deep embeddings such as the Doc2Vec and SentenceBERT models. The transcript of each psychotherapy session can be transcribed and generated in real-time from the session speech recordings, and these embedded dialogues are compared with the distributed representations of the statements in the working alliance inventory. We demonstrate, in a real-world dataset with over 950 sessions of psychotherapy treatments in anxiety, depression, schizophrenia and suicidal patients, the effectiveness of this method in mapping out trajectories of patient-therapist alignment and the interpretability that can offer insights in clinical psychiatry. We believe such a framework can be provide timely feedback to the therapist regarding the quality of the conversation in interview sessions.

preprint2022arXiv

Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations

Adversarial perturbations are critical for certifying the robustness of deep learning models. A universal adversarial perturbation (UAP) can simultaneously attack multiple images, and thus offers a more unified threat model, obviating an image-wise attack algorithm. However, the existing UAP generator is underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Towards an authentic universality across image sources, we take a novel view of UAP generation as a customized instance of few-shot learning, which leverages bilevel optimization and learning-to-optimize (L2O) techniques for UAP generation with improved attack success rate (ASR). We begin by considering the popular model agnostic meta-learning (MAML) framework to meta-learn a UAP generator. However, we see that the MAML framework does not directly offer the universal attack across image sources, requiring us to integrate it with another meta-learning framework of L2O. The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.

preprint2022arXiv

Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behaviors of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, where we investigate the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them based on a tournament of iterated prisoner's dilemma where multiple agents can compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested independent reward-driven agents, and also allows us study the capacity of these algorithms to fit the human behaviors. Results suggest that considering the current situation to make decision is the worst in this kind of social dilemma game. Multiples discoveries on online learning behaviors and clinical validations are stated, as an effort to connect artificial intelligence algorithms with human behaviors and their abnormal states in neuropsychiatric conditions.

preprint2022arXiv

Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget

In light of the COVID-19 pandemic, it is an open challenge and critical practical problem to find a optimal way to dynamically prescribe the best policies that balance both the governmental resources and epidemic control in different countries and regions. To solve this multi-dimensional tradeoff of exploitation and exploration, we formulate this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function. Given the historical daily cases in a region and the past intervention plans in place, the agent should generate useful intervention plans that policy makers can implement in real time to minimizing both the number of daily COVID-19 cases and the stringency of the recommended interventions. We prove this concept with simulations of multiple realistic policy making scenarios and demonstrate a clear advantage in providing a pareto optimal solution in the epidemic intervention problem.

preprint2022arXiv

Predicting human decision making in psychological tasks with recurrent neural networks

Unlike traditional time series, the action sequences of human decision making usually involve many cognitive processes such as beliefs, desires, intentions, and theory of mind, i.e., what others are thinking. This makes predicting human decision-making challenging to be treated agnostically to the underlying psychological mechanisms. We propose here to use a recurrent neural network architecture based on long short-term memory networks (LSTM) to predict the time series of the actions taken by human subjects engaged in gaming activity, the first application of such methods in this research domain. In this study, we collate the human data from 8 published literature of the Iterated Prisoner's Dilemma comprising 168,386 individual decisions and post-process them into 8,257 behavioral trajectories of 9 actions each for both players. Similarly, we collate 617 trajectories of 95 actions from 10 different published studies of Iowa Gambling Task experiments with healthy human subjects. We train our prediction networks on the behavioral data and demonstrate a clear advantage over the state-of-the-art methods in predicting human decision-making trajectories in both the single-agent scenario of the Iowa Gambling Task and the multi-agent scenario of the Iterated Prisoner's Dilemma. Moreover, we observe that the weights of the LSTM networks modeling the top performers tend to have a wider distribution compared to poor performers, as well as a larger bias, which suggest possible interpretations for the distribution of strategies adopted by each group.

preprint2022arXiv

Reinforcement Learning with Algorithms from Probabilistic Structure Estimation

Reinforcement learning (RL) algorithms aim to learn optimal decisions in unknown environments through experience of taking actions and observing the rewards gained. In some cases, the environment is not influenced by the actions of the RL agent, in which case the problem can be modeled as a contextual multi-armed bandit and lightweight myopic algorithms can be employed. On the other hand, when the RL agent's actions affect the environment, the problem must be modeled as a Markov decision process and more complex RL algorithms are required which take the future effects of actions into account. Moreover, in practice, it is often unknown from the outset whether or not the agent's actions will impact the environment and it is therefore not possible to determine which RL algorithm is most fitting. In this work, we propose to avoid this difficult decision entirely and incorporate a choice mechanism into our RL framework. Rather than assuming a specific problem structure, we use a probabilistic structure estimation procedure based on a likelihood-ratio (LR) test to make a more informed selection of learning algorithm. We derive a sufficient condition under which myopic policies are optimal, present an LR test for this condition, and derive a bound on the regret of our framework. We provide examples of real-world scenarios where our framework is needed and provide extensive simulations to validate our approach.

preprint2021arXiv

Etat de l'art sur l'application des bandits multi-bras

The Multi-armed bandit offer the advantage to learn and exploit the already learnt knowledge at the same time. This capability allows this approach to be applied in different domains, going from clinical trials where the goal is investigating the effects of different experimental treatments while minimizing patient losses, to adaptive routing where the goal is to minimize the delays in a network. This article provides a review of the recent results on applying bandit to real-life scenario and summarize the state of the art for each of these fields. Different techniques has been proposed to solve this problem setting, like epsilon-greedy, Upper confident bound (UCB) and Thompson Sampling (TS). We are showing here how this algorithms were adapted to solve the different problems of exploration exploitation.

preprint2020arXiv

A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry

Drawing an inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows to incorporate a wide range of reward-processing biases -- an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems, as well as various neuropsychiatric conditions associated with disruptions in normal reward processing. From the computational perspective, we observe that the proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA methods, as well as recently proposed Double Q-Learning approaches, on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities.

preprint2020arXiv

Computing the Dirichlet-Multinomial Log-Likelihood Function

Dirichlet-multinomial (DMN) distribution is commonly used to model over-dispersion in count data. Precise and fast numerical computation of the DMN log-likelihood function is important for performing statistical inference using this distribution, and remains a challenge. To address this, we use mathematical properties of the gamma function to derive a closed form expression for the DMN log-likelihood function. Compared to existing methods, calculation of the closed form has a lower computational complexity, hence is much faster without comprimising computational accuracy.

preprint2020arXiv

Contextual Bandit with Adaptive Feature Extraction

We consider an online decision making setting known as contextual bandit problem, and propose an approach for improving contextual bandit performance by using an adaptive feature extraction (representation learning) based on online clustering. Our approach starts with an off-line pre-training on unlabeled history of contexts (which can be exploited by our approach, but not by the standard contextual bandit), followed by an online selection and adaptation of encoders. Specifically, given an input sample (context), the proposed approach selects the most appropriate encoding function to extract a feature vector which becomes an input for a contextual bandit, and updates both the bandit and the encoding function based on the context and on the feedback (reward). Our experiments on a variety of datasets, and both in stationary and non-stationary environments of several kinds demonstrate clear advantages of the proposed adaptive representation learning over the standard contextual bandit based on "raw" input contexts.

preprint2020arXiv

Contextual Bandit with Missing Rewards

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be observed("missing rewards"). This new problem is motivated by certain online settings including clinical trial and ad recommendation applications. In order to address the missing rewards setting, we propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering. Unlike standard contextual bandit methods, by leveraging clustering to estimate missing reward, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on several real-life datasets.

preprint2020arXiv

Hyper-parameter Tuning for the Contextual Bandit

We study here the problem of learning the exploration exploitation trade-off in the contextual bandit problem with linear reward function setting. In the traditional algorithms that solve the contextual bandit problem, the exploration is a parameter that is tuned by the user. However, our proposed algorithm learn to choose the right exploration parameters in an online manner based on the observed context, and the immediate reward received for the chosen action. We have presented here two algorithms that uses a bandit to find the optimal exploration of the contextual bandit algorithm, which we hope is the first step toward the automation of the multi-armed bandit algorithm.

preprint2020arXiv

Online learning with Corrupted context: Corrupted Contextual Bandits

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the context used at each decision may be corrupted ("useless context"). This new problem is motivated by certain on-line settings including clinical trial and ad recommendation applications. In order to address the corrupted-context setting,we propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism. Unlike standard contextual bandit methods, we are able to learn from all iteration, even those with corrupted context, by improving the computing of the expectation for each arm. Promising empirical results are obtained on several real-life datasets.

preprint2020arXiv

Solving Constrained CASH Problems with ADMM

The CASH problem has been widely studied in the context of automated configurations of machine learning (ML) pipelines and various solvers and toolkits are available. However, CASH solvers do not directly handle black-box constraints such as fairness, robustness or other domain-specific custom constraints. We present our recent approach [Liu, et al., 2020] that leverages the ADMM optimization framework to decompose CASH into multiple small problems and demonstrate how ADMM facilitates incorporation of black-box constraints.

preprint2020arXiv

Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling

Spectral clustering has shown a superior performance in analyzing the cluster structure. However, its computational complexity limits its application in analyzing large-scale data. To address this problem, many low-rank matrix approximating algorithms are proposed, including the Nystrom method - an approach with proven approximate error bounds. There are several algorithms that provide recipes to construct Nystrom approximations with variable accuracies and computing times. This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it. Our heuristic depends on the eigen spectrum shape of the dataset, and yields competitive low-rank approximations in test datasets compared to the other state-of-the-art methods

preprint2014arXiv

A Neural Networks Committee for the Contextual Bandit Problem

This paper presents a new contextual bandit algorithm, NeuralBandit, which does not need hypothesis on stationarity of contexts and rewards. Several neural networks are trained to modelize the value of rewards knowing the context. Two variants, based on multi-experts approach, are proposed to choose online the parameters of multi-layer perceptrons. The proposed algorithms are successfully tested on a large dataset with and without stationarity of rewards.

preprint2014arXiv

Context-Based Information Retrieval in Risky Environment

Context-Based Information Retrieval is recently modelled as an exploration/ exploitation trade-off (exr/exp) problem, where the system has to choose between maximizing its expected rewards dealing with its current knowledge (exploitation) and learning more about the unknown user's preferences to improve its knowledge (exploration). This problem has been addressed by the reinforcement learning community but they do not consider the risk level of the current user's situation, where it may be dangerous to explore the non-top-ranked documents the user may not desire in his/her current situation if the risk level is high. We introduce in this paper an algorithm named CBIR-R-greedy that considers the risk level of the user's situation to adaptively balance between exr and exp.

preprint2014arXiv

Étude des dimensions spécifiques du contexte dans un système de filtrage d'informations

In the context of business information systems, e-commerce and access to knowledge, the relevance of the information provided to use is a key fact to the success of information systems. Therefore the quality of access is determined by access to the right information at the right time, at the right place. In this context, it is important to consider the users needs when access to information and his contextual situation in order to provide relevant information, tailored to their needs and context use. In what follows we describe the prelude to a project that tries to combine all of these needs to improve information systems.

preprint2014arXiv

Exponentiated Gradient Exploration for Active Learning

Active learning strategies respond to the costly labelling task in a supervised classification by selecting the most useful unlabelled examples in training a predictive model. Many conventional active learning algorithms focus on refining the decision boundary, rather than exploring new regions that can be more informative. In this setting, we propose a sequential algorithm named EG-Active that can improve any Active learning algorithm by an optimal random exploration. Experimental results show a statistically significant and appreciable improvement in the performance of our new approach over the existing active feedback methods.

preprint2014arXiv

Freshness-Aware Thompson Sampling

To follow the dynamicity of the user's content, researchers have recently started to model interactions between users and the Context-Aware Recommender Systems (CARS) as a bandit problem where the system needs to deal with exploration and exploitation dilemma. In this sense, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce in this paper an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh document according to the user's risk of the situation. The intensive evaluation and the detailed analysis of the experimental results reveals several important discoveries in the exploration/exploitation (exr/exp) behaviour.

preprint2014arXiv

Hybrid Q-Learning Applied to Ubiquitous recommender system

Ubiquitous information access becomes more and more important nowadays and research is aimed at making it adapted to users. Our work consists in applying machine learning techniques in order to bring a solution to some of the problems concerning the acceptance of the system by users. To achieve this, we propose a fundamental shift in terms of how we model the learning of recommender system: inspired by models of human reasoning developed in robotic, we combine reinforcement learning and case-base reasoning to define a recommendation process that uses these two approaches for generating recommendations on different context dimensions (social, temporal, geographic). We describe an implementation of the recommender system based on this framework. We also present preliminary results from experiments with the system and show how our approach increases the recommendation quality.

preprint2014arXiv

Improving adaptation of ubiquitous recommander systems by using reinforcement learning and collaborative filtering

The wide development of mobile applications provides a considerable amount of data of all types (images, texts, sounds, videos, etc.). Thus, two main issues have to be considered: assist users in finding information and reduce search and navigation time. In this sense, context-based recommender systems (CBRS) propose the user the adequate information depending on her/his situation. Our work consists in applying machine learning techniques and reasoning process in order to bring a solution to some of the problems concerning the acceptance of recommender systems by users, namely avoiding the intervention of experts, reducing cold start problem, speeding learning process and adapting to the user's interest. To achieve this goal, we propose a fundamental modification in terms of how we model the learning of the CBRS. Inspired by models of human reasoning developed in robotic, we combine reinforcement learning and case-based reasoning to define a contextual recommendation process based on different context dimensions (cognitive, social, temporal, geographic). This paper describes an ongoing work on the implementation of a CBRS based on a hybrid Q-learning (HyQL) algorithm which combines Q-learning, collaborative filtering and case-based reasoning techniques. It also presents preliminary results by comparing HyQL and the standard Q-Learning w.r.t. solving the cold start problem.

preprint2014arXiv

Optimizing an Utility Function for Exploration / Exploitation Trade-off in Context-Aware Recommender System

In this paper, we develop a dynamic exploration/ exploitation (exr/exp) strategy for contextual recommender systems (CRS). Specifically, our methods can adaptively balance the two aspects of exr/exp by automatically learning the optimal tradeoff. This consists of optimizing a utility function represented by a linearized form of the probability distributions of the rewards of the clicked and the non-clicked documents already recommended. Within an offline simulation framework we apply our algorithms to a CRS and conduct an evaluation with real event log data. The experimental results and detailed analysis demonstrate that our algorithms outperform existing algorithms in terms of click-through-rate (CTR).

preprint2014arXiv

R-UCB: a Contextual Bandit Algorithm for Risk-Aware Recommender Systems

Mobile Context-Aware Recommender Systems can be naturally modelled as an exploration/exploitation trade-off (exr/exp) problem, where the system has to choose between maximizing its expected rewards dealing with its current knowledge (exploitation) and learning more about the unknown user's preferences to improve its knowledge (exploration). This problem has been addressed by the reinforcement learning community but they do not consider the risk level of the current user's situation, where it may be dangerous to recommend items the user may not desire in her current situation if the risk level is high. We introduce in this paper an algorithm named R-UCB that considers the risk level of the user's situation to adaptively balance between exr and exp. The detailed analysis of the experimental results reveals several important discoveries in the exr/exp behaviour.

preprint2014arXiv

Recommandation mobile, sensible au contexte de contenus évolutifs: Contextuel-E-Greedy

We introduce in this paper an algorithm named Contextuel-E-Greedy that tackles the dynamicity of the user's content. It is based on dynamic exploration/exploitation tradeoff and can adaptively balance the two aspects by deciding which situation is most relevant for exploration or exploitation. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.

preprint2014arXiv

Situation-Aware Approach to Improve Context-based Recommender System

In this paper, we introduce a novel situation aware approach to improve a context based recommender system. To build situation aware user profiles, we rely on evidence issued from retrieval situations. A retrieval situation refers to the social spatio temporal context of the user when he interacts with the recommender system. A situation is represented as a combination of social spatio temporal concepts inferred from ontological knowledge given social group, location and time information. User's interests are inferred from past user's interaction with the recommender system related to the identified situations. They are represented using concepts issued from a domain ontology. We also propose a method to dynamically adapt the system to the user's interest's evolution.

preprint2014arXiv

The Impact of Situation Clustering in Contextual-Bandit Algorithm for Context-Aware Recommender Systems

Most existing approaches in Context-Aware Recommender Systems (CRS) focus on recommending relevant items to users taking into account contextual information, such as time, location, or social aspects. However, few of them have considered the problem of user's content dynamicity. We introduce in this paper an algorithm that tackles the user's content dynamicity by modeling the CRS as a contextual bandit algorithm and by including a situation clustering algorithm to improve the precision of the CRS. Within a deliberately designed offline simulation framework, we conduct evaluations with real online event log data. The experimental results and detailed analysis reveal several important discoveries in context aware recommender system.

preprint2013arXiv

Applying machine learning techniques to improve user acceptance on ubiquitous environement

Ubiquitous information access becomes more and more important nowadays and research is aimed at making it adapted to users. Our work consists in applying machine learning techniques in order to adapt the information access provided by ubiquitous systems to users when the system only knows the user social group, without knowing anything about the user interest. The adaptation procedures associate actions to perceived situations of the user. Associations are based on feedback given by the user as a reaction to the behavior of the system. Our method brings a solution to some of the problems concerning the acceptance of the system by users when applying machine learning techniques to systems at the beginning of the interaction between the system and the user.

preprint2013arXiv

Evolution of the user's content: An Overview of the state of the art

The evolution of the user's content still remains a problem for an accurate recommendation.This is why the current research aims to design Recommender Systems (RS) able to continually adapt information that matches the user's interests. This paper aims to explain this problematic point in outlining the proposals that have been made in research with their advantages and disadvantages.

preprint2013arXiv

Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits

We present Exponentiated Gradient LINUCB, an algorithm for con-textual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration of the LINUCB. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.

preprint2013arXiv

Mobile Recommender Systems Methods: An Overview

The information that mobiles can access becomes very wide nowadays, and the user is faced with a dilemma: there is an unlimited pool of information available to him but he is unable to find the exact information he is looking for. This is why the current research aims to design Recommender Systems (RS) able to continually send information that matches the user's interests in order to reduce his navigation time. In this paper, we treat the different approaches to recommend.

preprint2013arXiv

Proposition d'une technique de gestion de projet dans les startups

This project is part of the development of mobile CRM. It aims to develop a management application client named NOMALYS. This application allows the commercial and business leaders to see their CRM Mobile. We have focused in this project on the techniques of projects management, this study allowed to classify different techniques for managing software projects and proposed the most closely technique that match the needs of the studied company.

preprint2013arXiv

Role of temporal inference in the recognition of textual inference

This project is a part of nature language processing and its aims to develop a system of recognition inference text-appointed TIMINF. This type of system can detect, given two portions of text, if a text is semantically deducted from the other. We focused on making the inference time in this type of system. For that we have built and analyzed a body built from questions collected through the web. This study has enabled us to classify different types of times inferences and for designing the architecture of TIMINF which seeks to integrate a module inference time in a detection system inference text. We also assess the performance of sorties TIMINF system on a test corpus with the same strategy adopted in the challenge RTE.

preprint2013arXiv

Towards User Profile Modelling in Recommender System

The notion of profile appeared in the 1970s decade, which was mainly due to the need to create custom applications that could be adapted to the user. In this paper, we treat the different aspects of the user's profile, defining it, profile, its features and its indicators of interest, and then we describe the different approaches of modelling and acquiring the user's interests.

Djallel Bouneffouf

What is connected

Connect this record

See the researcher in context

Building this map preview

35 published item(s)

Mitigating Misalignment Contagion by Steering with Implicit Traits

Deep Annotation of Therapeutic Working Alliance in Psychotherapy

Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations

Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget

Predicting human decision making in psychological tasks with recurrent neural networks

Reinforcement Learning with Algorithms from Probabilistic Structure Estimation

Etat de l'art sur l'application des bandits multi-bras

A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry

Computing the Dirichlet-Multinomial Log-Likelihood Function

Contextual Bandit with Adaptive Feature Extraction

Contextual Bandit with Missing Rewards

Hyper-parameter Tuning for the Contextual Bandit

Online learning with Corrupted context: Corrupted Contextual Bandits

Solving Constrained CASH Problems with ADMM

Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling

A Neural Networks Committee for the Contextual Bandit Problem

Context-Based Information Retrieval in Risky Environment

Étude des dimensions spécifiques du contexte dans un système de filtrage d'informations

Exponentiated Gradient Exploration for Active Learning

Freshness-Aware Thompson Sampling

Hybrid Q-Learning Applied to Ubiquitous recommender system

Improving adaptation of ubiquitous recommander systems by using reinforcement learning and collaborative filtering

Optimizing an Utility Function for Exploration / Exploitation Trade-off in Context-Aware Recommender System

R-UCB: a Contextual Bandit Algorithm for Risk-Aware Recommender Systems

Recommandation mobile, sensible au contexte de contenus évolutifs: Contextuel-E-Greedy

Situation-Aware Approach to Improve Context-based Recommender System

The Impact of Situation Clustering in Contextual-Bandit Algorithm for Context-Aware Recommender Systems

Applying machine learning techniques to improve user acceptance on ubiquitous environement

Evolution of the user's content: An Overview of the state of the art

Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits

Mobile Recommender Systems Methods: An Overview

Proposition d'une technique de gestion de projet dans les startups

Role of temporal inference in the recognition of textual inference

Towards User Profile Modelling in Recommender System