Researcher profile

Bruno Gaujal

Bruno Gaujal contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified | Verification L1 | Unclaimed author
2 works
0 followers
7 topics
3 close collaborators

Published work

2 published items

Preprint | 2023 | arXiv

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

We propose the first model-free algorithm that achieves low regret for decentralized learning in two-player zero-sum tabular stochastic games with an infinite-horizon average-reward objective. In decentralized learning, the learning agent controls only one player and tries to achieve low regret against an arbitrary opponent. This contrasts with centralized learning, where the agent tries to approximate the Nash equilibrium by controlling both players. In our infinite-horizon undiscounted setting, additional structural assumptions are needed to ensure that the learning process behaves well: here we assume that, for every strategy of the opponent, the agent has a way to go from any state to any other. This assumption is analogous to the "communicating" assumption in the MDP setting. We show that our Decentralized Optimistic Nash Q-Learning (DONQ-learning) algorithm achieves both sublinear high-probability regret of order $T^{3/4}$ and sublinear expected regret of order $T^{2/3}$. Moreover, our algorithm has low computational complexity and low memory requirements compared to the previous works of Wei et al. (2017) and Jafarnia-Jahromi et al. (2021) in the same setting.

Preprint | 2010 | arXiv

Performance bounds in wormhole routing, a network calculus approach

We present a model for computing performance bounds on feedforward networks in which data packets are routed under the wormhole routing discipline. We are interested in determining the maximum end-to-end delays and backlogs of messages or packets going from a source node to a destination node through a given virtual path in the network. Our objective is to give a network calculus approach for calculating these performance bounds. First, we propose a new kind of curve that we call packet curves. These curves make it possible to model constraints on the packet lengths of a given data flow when the lengths are allowed to differ. Second, we use this new concept to propose an approach for calculating residual services for data flows served under non-preemptive service disciplines. Third, we model a binary switch (with two input ports and two output ports) where data is served under the wormhole discipline. We present our approach for computing the residual services and deduce worst-case bounds for flows passing through a wormhole binary switch. Finally, we illustrate this approach with numerical examples and show how to extend it to feedforward networks.