Source author record

Nicolas Gast

Nicolas Gast appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Science and Game Theory math.PR Artificial Intelligence cs.CY Machine Learning Performance Discrete Mathematics Distributed, Parallel, and Cluster Computing math.CO math.OC Multiagent Systems nlin.AO Systems and Control

Catalog footprint

What is connected

9works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

We propose the first model-free algorithm that achieves low regret performance for decentralized learning in two-player zero-sum tabular stochastic games with infinite-horizon average-reward objective. In decentralized learning, the learning agent controls only one player and tries to achieve low regret performances against an arbitrary opponent. This contrasts with centralized learning where the agent tries to approximate the Nash equilibrium by controlling both players. In our infinite-horizon undiscounted setting, additional structure assumptions is needed to provide good behaviors of learning processes : here we assume for every strategy of the opponent, the agent has a way to go from any state to any other. This assumption is the analogous to the "communicating" assumption in the MDP setting. We show that our Decentralized Optimistic Nash Q-Learning (DONQ-learning) algorithm achieves both sublinear high probability regret of order $T^{3/4}$ and sublinear expected regret of order $T^{2/3}$. Moreover, our algorithm enjoys a low computational complexity and low memory space requirement compared to the previous works of (Wei et al. 2017) and (Jafarnia-Jahromi et al. 2021) in the same setting.

preprint2022arXiv

Asymptotic Degradation of Linear Regression Estimates With Strategic Data Sources

We consider the problem of linear regression from strategic data sources with a public good component, i.e., when data is provided by strategic agents who seek to minimize an individual provision cost for increasing their data's precision while benefiting from the model's overall precision. In contrast to previous works, our model tackles the case where there is uncertainty on the attributes characterizing the agents' data -- a critical aspect of the problem when the number of agents is large. We provide a characterization of the game's equilibrium, which reveals an interesting connection with optimal design. Subsequently, we focus on the asymptotic behavior of the covariance of the linear regression parameters estimated via generalized least squares as the number of data sources becomes large. We provide upper and lower bounds for this covariance matrix and we show that, when the agents' provision costs are superlinear, the model's covariance converges to zero but at a slower rate relative to virtually all learning problems with exogenous data. On the other hand, if the agents' provision costs are linear, this covariance fails to converge. This shows that even the basic property of consistency of generalized least squares estimators is compromised when the data sources are strategic.

preprint2022arXiv

Fairness in Selection Problems with Strategic Candidates

To better understand discriminations and the effect of affirmative actions in selection problems (e.g., college admission or hiring), a recent line of research proposed a model based on differential variance. This model assumes that the decision-maker has a noisy estimate of each candidate's quality and puts forward the difference in the noise variances between different demographic groups as a key factor to explain discrimination. The literature on differential variance, however, does not consider the strategic behavior of candidates who can react to the selection procedure to improve their outcome, which is well-known to happen in many domains. In this paper, we study how the strategic aspect affects fairness in selection problems. We propose to model selection problems with strategic candidates as a contest game: A population of rational candidates compete by choosing an effort level to increase their quality. They incur a cost-of-effort but get a (random) quality whose expectation equals the chosen effort. A Bayesian decision-maker observes a noisy estimate of the quality of each candidate (with differential variance) and selects the fraction $α$ of best candidates based on their posterior expected quality; each selected candidate receives a reward $S$. We characterize the (unique) equilibrium of this game in the different parameters' regimes, both when the decision-maker is unconstrained and when they are constrained to respect the fairness notion of demographic parity. Our results reveal important impacts of the strategic behavior on the discrimination observed at equilibrium and allow us to understand the effect of imposing demographic parity in this context. In particular, we find that, in many cases, the results contrast with the non-strategic setting.

preprint2020arXiv

On Fair Selection in the Presence of Implicit Variance

Quota-based fairness mechanisms like the so-called Rooney rule or four-fifths rule are used in selection problems such as hiring or college admission to reduce inequalities based on sensitive demographic attributes. These mechanisms are often viewed as introducing a trade-off between selection fairness and utility. In recent work, however, Kleinberg and Raghavan showed that, in the presence of implicit bias in estimating candidates' quality, the Rooney rule can increase the utility of the selection process. We argue that even in the absence of implicit bias, the estimates of candidates' quality from different groups may differ in another fundamental way, namely, in their variance. We term this phenomenon implicit variance and we ask: can fairness mechanisms be beneficial to the utility of a selection process in the presence of implicit variance (even in the absence of implicit bias)? To answer this question, we propose a simple model in which candidates have a true latent quality that is drawn from a group-independent normal distribution. To make the selection, a decision maker receives an unbiased estimate of the quality of each candidate, with normal noise, but whose variance depends on the candidate's group. We then compare the utility obtained by imposing a fairness mechanism that we term $γ$-rule (it includes demographic parity and the four-fifths rule as special cases), to that of a group-oblivious selection algorithm that picks the candidates with the highest estimated quality independently of their group. Our main result shows that the demographic parity mechanism always increases the selection utility, while any $γ$-rule weakly increases it. We extend our model to a two-stage selection process where the true quality is observed at the second stage. We discuss multiple extensions of our results, in particular to different distributions of the true latent quality.

preprint2020arXiv

Refined Mean Field Analysis of the Gossip Shuffle Protocol -- extended version --

Gossip protocols form the basis of many smart collective adaptive systems. They are a class of fully decentralised, simple but robust protocols for the distribution of information throughout large scale networks with hundreds or thousands of nodes. Mean field analysis methods have made it possible to approximate and analyse performance aspects of such large scale protocols in an efficient way. Taking the gossip shuffle protocol as a benchmark, we evaluate a recently developed refined mean field approach. We illustrate the gain in accuracy this can provide for the analysis of medium size models analysing two key performance measures. We also show that refined mean field analysis requires special attention to correctly capture the coordination aspects of the gossip shuffle protocol.

preprint2014arXiv

Incentives and Redistribution in Homogeneous Bike-Sharing Systems with Stations of Finite Capacity

Bike-sharing systems are becoming important for urban transportation. In such systems, users arrive at a station, take a bike and use it for a while, then return it to another station of their choice. Each station has a finite capacity: it cannot host more bikes than its capacity. We propose a stochastic model of an homogeneous bike-sharing system and study the effect of users random choices on the number of problematic stations, i.e., stations that, at a given time, have no bikes available or no available spots for bikes to be returned to. We quantify the influence of the station capacities, and we compute the fleet size that is optimal in terms of minimizing the proportion of problematic stations. Even in a homogeneous city, the system exhibits a poor performance: the minimal proportion of problematic stations is of the order of (but not lower than) the inverse of the capacity. We show that simple incentives, such as suggesting users to return to the least loaded station among two stations, improve the situation by an exponential factor. We also compute the rate at which bikes have to be redistributed by trucks to insure a given quality of service. This rate is of the order of the inverse of the station capacity. For all cases considered, the fleet size that corresponds to the best performance is half of the total number of spots plus a few more, the value of the few more can be computed in closed-form as a function of the system parameters. It corresponds to the average number of bikes in circulation.

preprint2011arXiv

Computing hitting times via fluid approximation: application to the coupon collector problem

In this paper, we show how to use stochastic approximation to compute hitting time of a stochastic process, based on the study of the time for a fluid approximation of this process to be at distance 1/N of its fixed point. This approach is developed to study a generalized version of the coupon collector problem. The system is composed by N independent identical Markov chains. At each time step, one Markov chain is picked at random and performs one transition. We show that the time at which all chains have hit the same state is bounded by a N log N + b N log log N + O(N) where a and b are two constants depending on eigenvalues of the Markov chain.

preprint2011arXiv

Decentralized List Scheduling

Classical list scheduling is a very popular and efficient technique for scheduling jobs in parallel and distributed platforms. It is inherently centralized. However, with the increasing number of processors, the cost for managing a single centralized list becomes too prohibitive. A suitable approach to reduce the contention is to distribute the list among the computational units: each processor has only a local view of the work to execute. Thus, the scheduler is no longer greedy and standard performance guarantees are lost. The objective of this work is to study the extra cost that must be paid when the list is distributed among the computational units. We first present a general methodology for computing the expected makespan based on the analysis of an adequate potential function which represents the load unbalance between the local lists. We obtain an equation on the evolution of the potential by computing its expected decrease in one step of the schedule. Our main theorem shows how to solve such equations to bound the makespan. Then, we apply this method to several scheduling problems, namely, for unit independent tasks, for weighted independent tasks and for tasks with precendence constraints. More precisely, we prove that the time for scheduling a global workload W composed of independent unit tasks on m processors is equal to W/m plus an additional term proportional to log_2 W. We provide a lower bound which shows that this is optimal up to a constant. This result is extended to the case of weighted independent tasks. In the last setting, precedence task graphs, our analysis leads to an improvement on the bound of Arora et al. We finally provide some experiments using a simulator. The distribution of the makespan is shown to fit existing probability laws. The additive term is shown by simulation to be around 3 \log_2 W confirming the tightness of our analysis.

preprint2011arXiv

Mean field for Markov Decision Processes: from Discrete to Continuous Optimization

We study the convergence of Markov Decision Processes made of a large number of objects to optimization problems on ordinary differential equations (ODE). We show that the optimal reward of such a Markov Decision Process, satisfying a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov Decision Process. We give bounds on the difference of the rewards, and a constructive algorithm for deriving an approximating solution to the Markov Decision Process from a solution of the HJB equations. We illustrate the method on three examples pertaining respectively to investment strategies, population dynamics control and scheduling in queues are developed. They are used to illustrate and justify the construction of the controlled ODE and to show the gain obtained by solving a continuous HJB equation rather than a large discrete Bellman equation.

Nicolas Gast

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

Asymptotic Degradation of Linear Regression Estimates With Strategic Data Sources

Fairness in Selection Problems with Strategic Candidates

On Fair Selection in the Presence of Implicit Variance

Refined Mean Field Analysis of the Gossip Shuffle Protocol -- extended version --

Incentives and Redistribution in Homogeneous Bike-Sharing Systems with Stations of Finite Capacity

Computing hitting times via fluid approximation: application to the coupon collector problem

Decentralized List Scheduling

Mean field for Markov Decision Processes: from Discrete to Continuous Optimization