Source author record

Richard Combes

Richard Combes appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Information Theory, math.IT, Networking and Internet Architecture, math.OC, Systems and Control, Logic in Computer Science, Performance

Catalog footprint

18 works

8 topics

4 close collaborators


Preprint, 2022, arXiv

Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not require as input a lower bound on the minimal expected reward of an arm, and its performance does not scale in inverse proportion to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments showing that the proposed algorithm also outperforms state-of-the-art algorithms in practice.

Preprint, 2021, arXiv

A High Performance, Low Complexity Algorithm for Multi-Player Bandits Without Collision Sensing Information

Motivated by applications in cognitive radio networks, we consider the decentralized multi-player multi-armed bandit problem without collision or sensing information. We propose Randomized Selfish KL-UCB, an algorithm with very low computational complexity, inspired by the Selfish KL-UCB algorithm, which had been abandoned because it provably performs sub-optimally in some cases. We subject Randomized Selfish KL-UCB to extensive numerical experiments showing that it far outperforms state-of-the-art algorithms in almost all environments, sometimes by several orders of magnitude, and without the additional knowledge those algorithms require. We also emphasize the potential of this algorithm for the more realistic dynamic setting and support our claims with further experiments. We believe that the low complexity and high performance of Randomized Selfish KL-UCB make it the most suitable of the known algorithms for implementation in practical systems.
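Selfish KL-UCB-type algorithms have each player compute a KL-UCB index from its own reward observations and play the arm with the largest index. A minimal sketch of the standard Bernoulli KL-UCB index, computed by bisection with exploration term log(t), is shown below. The function names and the per-player statistics are hypothetical, and the randomization that distinguishes Randomized Selfish KL-UCB is not described in the abstract, so it is not reproduced here.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean: float, pulls: int, t: int, iters: int = 50) -> float:
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t), found by bisection."""
    if pulls == 0:
        return 1.0  # an unexplored arm gets the most optimistic value
    budget = math.log(max(t, 1)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still plausible, so search higher
        else:
            hi = mid
    return lo


# Hypothetical statistics of one player: (empirical mean, number of pulls) per arm.
# Without collision information, a collision is simply recorded as a zero reward.
stats = [(0.4, 30), (0.6, 25), (0.0, 0)]
t = 56
indices = [kl_ucb_index(m, n, t) for m, n in stats]
choice = max(range(len(indices)), key=indices.__getitem__)
```

Bisection works because KL(mean, q) is increasing in q on [mean, 1], so the set of plausible values is an interval.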

Preprint, 2021, arXiv

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

We consider combinatorial semi-bandits with uncorrelated Gaussian rewards. In this article, we propose the first method, to the best of our knowledge, that makes it possible to compute the solution of the Graves-Lai optimization problem in polynomial time for many combinatorial structures of interest. In turn, this immediately yields the first known approach to implementing asymptotically optimal algorithms in polynomial time for combinatorial semi-bandits.

Preprint, 2021, arXiv

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over \Delta_{\min} }\Big)$, but its computational complexity ${\cal O}(|{\cal X}|)$ is typically exponential in $d$, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = {\cal O} \Big({d (\ln m)^2 (\ln T)\over \Delta_{\min} }\Big)$ and computational complexity ${\cal O}(T\, {\bf poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time ${\cal O}(T\, {\bf poly}(d))$ by repeatedly maximizing a linear function over ${\cal X}$ subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
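To make the ${\cal O}(|{\cal X}|)$ cost concrete, the sketch below maximizes an ESCB-style optimistic index by enumerating every action in ${\cal X}$. The index form (empirical reward of the action plus $\sqrt{(f(t)/2)\sum_{i} 1/n_i}$ over its items, with $f(t) = \log t$) is an assumption made for illustration; the paper's exact exploration function may differ. This is the exhaustive baseline the abstract calls impractical in large dimensions, not the proposed polynomial-time algorithm, and all names and numbers are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Action = Tuple[int, ...]  # a 0/1 vector of length d, i.e. an element of X


def escb_style_index(action: Action, means: Sequence[float], pulls: Sequence[int], t: int) -> float:
    """Empirical reward of the action plus one joint confidence bonus for its items.

    Assumed form: sum_i theta_i + sqrt(f(t)/2 * sum_i 1/n_i) over items i in the
    action, with f(t) = log(t) as a simplification. Every item needs n_i >= 1.
    """
    items = [i for i, bit in enumerate(action) if bit]
    empirical = sum(means[i] for i in items)
    bonus = math.sqrt(0.5 * math.log(t) * sum(1.0 / pulls[i] for i in items))
    return empirical + bonus


def escb_brute_force(actions: Sequence[Action], means, pulls, t: int) -> Action:
    """Exhaustive maximization over X: O(|X| * d) work per round."""
    return max(actions, key=lambda a: escb_style_index(a, means, pulls, t))


# Hypothetical instance: d = 4 items, X = all actions containing exactly 2 items.
d, size = 4, 2
actions = [tuple(1 if i in s else 0 for i in range(d)) for s in combinations(range(d), size)]
means = [0.9, 0.8, 0.3, 0.2]
best_equal = escb_brute_force(actions, means, [100, 100, 100, 100], t=400)
best_unexplored = escb_brute_force(actions, means, [100, 100, 1, 100], t=400)
```

With equal sample counts the bonus is identical for every action and the empirically best pair wins; when item 2 has been observed only once, its large bonus pulls it into the selected action. For ${\cal X}$ equal to all subsets of a given size, $|{\cal X}|$ grows combinatorially with $d$, which is why enumeration does not scale.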

Preprint, 2020, arXiv

Solving Random Parity Games in Polynomial Time

We consider the problem of solving random parity games. We prove that parity games exhibit a phase transition threshold $d_P$: when the degree $d$ of the graph that defines the game satisfies $d > d_P$, there exists a polynomial time algorithm that solves the game with high probability as the number of nodes goes to infinity. We further propose the SWCP (Self-Winning Cycles Propagation) algorithm and show that, when the degree is large enough, SWCP solves the game with high probability. Furthermore, the complexity of SWCP is polynomial, namely $O\Big(|{\cal V}|^2 + |{\cal V}||{\cal E}|\Big)$. The design of SWCP is based on the threshold for the appearance of particular types of cycles in the players' respective subgraphs. We further show that non-sparse games can be solved in time $O(|{\cal V}|)$ with high probability, and state a conjecture concerning the hardness of the $d=2$ case.

Preprint, 2016, arXiv

A Streaming Algorithm for Crowdsourced Data Classification

We propose a streaming algorithm for the binary classification of data based on crowdsourcing. The algorithm learns the competence of each labeller by comparing her labels to those of other labellers on the same tasks and uses this information to minimize the prediction error rate on each task. We provide performance guarantees of our algorithm for a fixed population of independent labellers. In particular, we show that our algorithm is optimal in the sense that the cumulative regret compared to the optimal decision with known labeller error probabilities is finite, independently of the number of tasks to label. The complexity of the algorithm is linear in the number of labellers and the number of tasks, up to some logarithmic factors. Numerical experiments illustrate the performance of our algorithm compared to existing algorithms, including simple majority voting and expectation-maximization algorithms, on both synthetic and real datasets.
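The regret benchmark above is the optimal decision with known labeller error probabilities. For binary labels and independent labellers, and assuming a uniform prior on the true label, that decision is a vote in which each label is weighted by the log-odds of its labeller's accuracy. The sketch below contrasts it with simple majority voting; labels are assumed to take values in {-1, +1}, the accuracies are hypothetical and treated as known, and the paper's streaming estimation of competences is not reproduced.

```python
import math
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Unweighted majority over labels in {-1, +1}; ties are broken towards +1."""
    return 1 if sum(labels) >= 0 else -1


def weighted_vote(labels: Sequence[int], accuracies: Sequence[float]) -> int:
    """MAP decision for independent labellers with known accuracies p_i (uniform prior).

    Each label is weighted by log(p_i / (1 - p_i)), so reliable labellers count
    more and a labeller with p_i < 1/2 effectively has its label flipped.
    """
    score = sum(x * math.log(p / (1 - p)) for x, p in zip(labels, accuracies))
    return 1 if score >= 0 else -1


# Hypothetical task: one very reliable labeller disagrees with two mediocre ones.
labels = [1, -1, -1]
accuracies = [0.95, 0.6, 0.6]
```

Here majority voting follows the two mediocre labellers, while the weighted vote follows the reliable one, since log(19) is larger than 2 log(1.5).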

Preprint, 2016, arXiv

Multipath streaming: fundamental limits and efficient algorithms

We investigate streaming over multiple links. A file is split into small units called chunks that may be requested on the various links according to some policy, and received after some random delay. After a start-up time called pre-buffering time, received chunks are played at a fixed speed. There is starvation if the chunk to be played has not yet arrived. We provide lower bounds (fundamental limits) on the starvation probability of any policy. We further propose simple, order-optimal policies that require no feedback. For general delay distributions, we provide tractable upper bounds on the starvation probability of the proposed policies, which allow the pre-buffering time to be selected appropriately. We specialize our results to (i) links that employ CSMA or opportunistic scheduling at the packet level, (ii) links shared with a primary user, and (iii) links that use fair rate sharing at the flow level. We consider a generic model, so our results give insight into the design and performance of media streaming over (a) wired networks with several paths between the source and destination, (b) wireless networks featuring spectrum aggregation, and (c) multi-homed wireless networks.
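The starvation event described above can be estimated by simulation for a given request policy. The sketch below assumes a hypothetical feedback-free round-robin policy (chunk k goes to link k mod L), exponential per-chunk delivery times, and sequential delivery on each link; none of these choices are taken from the paper, and the estimate is a Monte Carlo value, not one of its bounds.

```python
import random
from typing import Sequence


def starvation_probability(num_chunks: int, link_rates: Sequence[float], prebuffer: float,
                           play_rate: float, runs: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(starvation) under a hypothetical round-robin policy.

    Chunk k is requested on link k % len(link_rates). Each link delivers its
    chunks one after another, each taking an Exp(rate) time. Chunk k is played
    at prebuffer + k / play_rate; starvation occurs if any chunk is late.
    """
    rng = random.Random(seed)
    starved = 0
    for _ in range(runs):
        link_clock = [0.0] * len(link_rates)
        late = False
        for k in range(num_chunks):
            link = k % len(link_rates)
            link_clock[link] += rng.expovariate(link_rates[link])
            if link_clock[link] > prebuffer + k / play_rate:
                late = True  # keep drawing so runs stay aligned across parameters
        if late:
            starved += 1
    return starved / runs


# Two identical links; playback is slower than the aggregate delivery rate.
p_short = starvation_probability(50, [1.0, 1.0], prebuffer=0.5, play_rate=1.5)
p_long = starvation_probability(50, [1.0, 1.0], prebuffer=10.0, play_rate=1.5)
```

Because every run draws the same number of delays, two calls with the same seed see identical delays, so a longer pre-buffering time can only lower the estimate. This makes it easy to scan candidate pre-buffering times for a target starvation probability.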

Preprint, 2015, arXiv

Combinatorial Bandits Revisited

This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem, and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the adversarial setting under bandit feedback, we propose CombEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.

Preprint, 2015, arXiv

Unimodal Bandits without Smoothness

We consider stochastic bandit problems with a continuous set of arms, where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected reward function. For these problems, we propose the Stochastic Pentachotomy (SP) algorithm and derive finite-time upper bounds on its regret and optimization error. In particular, we show that, for any expected reward function $\mu$ that behaves as $\mu(x)=\mu(x^\star)-C|x-x^\star|^\xi$ locally around its maximizer $x^\star$ for some $\xi, C>0$, the SP algorithm is order-optimal. Namely, its regret and optimization error scale as $O(\sqrt{T\log(T)})$ and $O(\sqrt{\log(T)/T})$, respectively, when the time horizon $T$ grows large. These scalings are achieved without knowledge of $\xi$ and $C$. Our algorithm is based on asymptotically optimal sequential statistical tests used to successively trim an interval that contains the best arm with high probability. To our knowledge, the SP algorithm is the first sequential arm selection rule that achieves a regret and optimization error scaling as $O(\sqrt{T})$ and $O(1/\sqrt{T})$, respectively, up to a logarithmic factor, for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness.
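In the noise-free limit, successively trimming an interval that contains the maximizer reduces to ternary search, which relies only on unimodality and never on smoothness. The sketch below is that deterministic analogue, under the assumption that the reward function can be evaluated exactly; SP instead replaces each comparison with a sequential statistical test on noisy rewards and uses its own split points, neither of which is reproduced here. The test function is hypothetical, with $\xi = 0.5$ and $C = 1$.

```python
from typing import Callable


def trim_interval(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-6) -> float:
    """Noise-free interval trimming for a strictly unimodal f on [lo, hi].

    Each step compares f at two interior points and discards the third of
    the interval that cannot contain the maximizer.
    """
    while hi - lo > tol:
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a  # the maximizer lies in [a, hi]
        else:
            hi = b  # the maximizer lies in [lo, b]
    return (lo + hi) / 2


def cusp(x: float) -> float:
    """Hypothetical non-smooth unimodal reward: mu(x*) - C|x - x*|^xi with x* = 0.3."""
    return -abs(x - 0.3) ** 0.5


x_hat = trim_interval(cusp, 0.0, 1.0)
```

The cusp at 0.3 has no derivative there, yet each comparison still removes a third of the interval, which mirrors why smoothness is unnecessary.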

Preprint, 2014, arXiv

Dynamic Rate and Channel Selection in Cognitive Radio Systems

In this paper, we investigate dynamic channel and rate selection in cognitive radio systems that exploit a large number of channels free from primary users. In such systems, transmitters may rapidly change the selected (channel, rate) pair to opportunistically learn and track the pair offering the highest throughput. We formulate the problem of sequential channel and rate selection as an online optimization problem and show its equivalence to a structured Multi-Armed Bandit problem. The structure stems from inherent properties of the achieved throughput as a function of the selected channel and rate. We derive fundamental performance limits satisfied by any channel and rate adaptation algorithm, and propose algorithms that achieve (or approach) these limits. In turn, the proposed algorithms optimally exploit the inherent structure of the throughput. We illustrate the efficiency of our algorithms using both test-bed and simulation experiments, in both stationary and non-stationary radio environments. In stationary environments, the successful packet transmission probabilities at the various channel and rate pairs do not evolve over time, whereas in non-stationary environments, they may evolve. In practical scenarios, the proposed algorithms are able to track the best channel and rate quite accurately without any explicit measurement of, or feedback on, the quality of the various channels.

Preprint, 2014, arXiv

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. This approach is shown, through numerical experiments, to significantly outperform existing algorithms that directly deal with the continuous set of arms. Finally, the results and algorithms are extended to contextual bandits with similarities.

Preprint, 2014, arXiv

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which case arms belong to a bounded interval. For discrete unimodal bandits, we derive asymptotic lower bounds for the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound. Our algorithm optimally exploits the unimodal structure of the problem, and surprisingly, its asymptotic regret does not depend on the number of arms. We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time. The analytical results are supported by numerical experiments showing that OSUB performs significantly better than the state-of-the-art algorithms. For continuous sets of arms, we provide a brief discussion. We show that combining an appropriate discretization of the set of arms with the UCB algorithm yields an order-optimal regret, and in practice, outperforms recently proposed algorithms designed to exploit the unimodal structure.
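The continuous-arm recipe above (discretize the set of arms, then run UCB) can be sketched in a few lines. The sketch assumes an evenly spaced grid on [0, 1], Bernoulli rewards, and the UCB1 bonus $\sqrt{2 \log t / n}$; the grid size is fixed here rather than tuned to the horizon as an "appropriate discretization" would be, and the reward function is hypothetical.

```python
import math
import random
from typing import Callable, List


def discretized_ucb(mu: Callable[[float], float], num_arms: int, horizon: int, seed: int = 0) -> List[int]:
    """UCB1 on an evenly spaced grid of [0, 1] (num_arms >= 2) with Bernoulli(mu(x)) rewards.

    Returns the number of times each grid point was played.
    """
    rng = random.Random(seed)
    grid = [k / (num_arms - 1) for k in range(num_arms)]
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    for t in range(1, horizon + 1):
        if t <= num_arms:
            k = t - 1  # play every grid point once first
        else:
            k = max(range(num_arms),
                    key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < mu(grid[k]) else 0.0
        counts[k] += 1
        sums[k] += reward
    return counts


def mean_reward(x: float) -> float:
    """Hypothetical unimodal mean reward with maximizer x* = 0.5."""
    return 0.9 - 1.5 * abs(x - 0.5)


counts = discretized_ucb(mean_reward, num_arms=9, horizon=10000)
```

This plain UCB ignores the unimodal structure entirely; OSUB-style algorithms instead restrict exploration to the neighbourhood of the empirical leader, which is what removes the dependence on the number of arms in the discrete case.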

Preprint, 2013, arXiv

Distributed coordination of self-organizing mechanisms in communication networks

The fast development of Self-Organizing Network (SON) technology in mobile networks makes the coordination of simultaneously operating SON functionalities a critical problem. SON functionalities can be viewed as control loops that may need to be coordinated to guarantee conflict-free operation, enforce network stability and achieve performance gains. This paper proposes a distributed solution for coordinating SON functionalities. It uses Rosen's concave games framework in conjunction with convex optimization. The SON functionalities are modeled as linear Ordinary Differential Equations (ODEs). The stability of the system is first evaluated using a basic control theory approach. The coordination solution consists of finding a linear map (called the coordination matrix) that stabilizes the system of SON functionalities. Using stochastic approximation, it is proven that the solution remains valid in a noisy environment. A practical example involving three different SON functionalities deployed in Base Stations (BSs) of a Long Term Evolution (LTE) network demonstrates the usefulness of the proposed method.
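The basic control-theory check mentioned above asks whether the linear system $\dot{x} = A x$ formed by the coupled functionalities is asymptotically stable, that is, whether every eigenvalue of $A$ has a negative real part. The sketch below does this for two hypothetical functionalities, where the Routh-Hurwitz criterion reduces to trace(A) < 0 and det(A) > 0. It shows that loops stable in isolation can become unstable when run in parallel; computing the coordination matrix via Rosen's concave games and convex optimization is not reproduced.

```python
from typing import Sequence


def is_hurwitz_2x2(a: Sequence[Sequence[float]]) -> bool:
    """Check asymptotic stability of dx/dt = A x for a 2x2 matrix A.

    Both eigenvalues have negative real part exactly when trace(A) < 0 and
    det(A) > 0 (Routh-Hurwitz criterion for a degree-2 polynomial).
    """
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return trace < 0 and det > 0


# Two hypothetical functionalities, each stable on its own (diagonal entries -1),
# that influence each other through the off-diagonal coupling terms.
strongly_coupled = [[-1.0, 2.0], [2.0, -1.0]]  # eigenvalues 1 and -3: unstable
weakly_coupled = [[-1.0, 0.5], [0.5, -1.0]]    # eigenvalues -0.5 and -1.5: stable
```

For larger systems the same question is answered by computing the eigenvalues of $A$ numerically rather than through this closed-form test.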

Preprint, 2013, arXiv

Flow-level performance of random wireless networks

We study the flow-level performance of random wireless networks. The locations of base stations (BSs) follow a Poisson point process. The number and positions of active users are dynamic. We associate a queue with each BS. The performance and stability of a BS depend on its load. In some cases, the full distribution of the load can be derived; otherwise, we derive formulas for its first and second moments. Networks on the line and on the plane are considered. Our model is generic enough to include features of recent wireless networks such as 4G (LTE) networks. In dense networks, we show that the inter-cell interference power becomes normally distributed, simplifying many computations. Numerical experiments demonstrate that, in cases of practical interest, the load distribution can be well approximated by a gamma distribution with known mean and variance.
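For networks on the line, the geometry of the model is easy to simulate: BS positions from a Poisson point process, and each user served by its nearest BS. The sketch below draws the process from i.i.d. exponential gaps and computes each interior BS's cell length. The load model at the end (uniform traffic density, a common per-BS capacity, interference ignored, so that load is proportional to cell length) and its parameter values are hypothetical simplifications, not the paper's queueing model.

```python
import random
from typing import List


def poisson_points_on_line(intensity: float, length: float, rng: random.Random) -> List[float]:
    """Sorted points of a Poisson point process on [0, length], from i.i.d. exponential gaps."""
    points: List[float] = []
    x = rng.expovariate(intensity)
    while x < length:
        points.append(x)
        x += rng.expovariate(intensity)
    return points


def cell_lengths(points: List[float]) -> List[float]:
    """Nearest-BS cell length of each interior BS on the line.

    BS i serves [(x[i-1] + x[i]) / 2, (x[i] + x[i+1]) / 2], of length (x[i+1] - x[i-1]) / 2.
    The two boundary BSs are skipped because their cells are truncated.
    """
    return [(points[i + 1] - points[i - 1]) / 2 for i in range(1, len(points) - 1)]


rng = random.Random(1)
bs_positions = poisson_points_on_line(intensity=1.0, length=10_000.0, rng=rng)
cells = cell_lengths(bs_positions)
mean_cell = sum(cells) / len(cells)  # close to 1 / intensity

# Hypothetical load model: uniform traffic density and a common per-BS capacity,
# ignoring interference, so each BS's load is proportional to its cell length.
traffic_density, capacity = 0.5, 1.0
loads = [traffic_density * c / capacity for c in cells]
```

A histogram of `loads` gives an empirical load distribution against which moment formulas or a fitted gamma distribution could be compared.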

Preprint, 2013, arXiv

Mixed Polling with Rerouting and Applications

Queueing systems with a single server in which customers wait to be served at a finite number of distinct locations (buffers/queues) are called discrete polling systems. Polling systems in which arrivals of users occur anywhere in a continuum are called continuous polling systems. Often one encounters a combination of the two systems: the users can either arrive in a continuum or wait in a finite set (i.e., wait at a finite number of queues). We call these systems mixed polling systems. Also, in some applications, customers are rerouted to a new location (for another service) after their service is completed. In this work, we study mixed polling systems with rerouting. We obtain their steady state performance by discretization using the known pseudo conservation laws of discrete polling systems. Their stationary expected workload is obtained as a limit of the stationary expected workload of a discrete system. The main tools for our analysis are (a) the fixed point analysis of infinite-dimensional operators and (b) the convergence of Riemann sums to an integral. We analyze two applications using our results on mixed polling systems and discuss the optimal system design. We consider a local area network in which a moving ferry facilitates communication (data transfer) using a wireless link. We also consider a distributed waste collection system and derive the optimal collection point. In both examples, the service requests can arrive anywhere in a subset of the two-dimensional plane. Namely, some users arrive in a continuous set while others wait for their service in a finite set. The only polling systems that can model these applications are mixed systems with rerouting as introduced in this manuscript.

Preprint, 2013, arXiv

Optimal Rate Sampling in 802.11 Systems

In 802.11 systems, Rate Adaptation (RA) is a fundamental mechanism allowing transmitters to adapt the coding and modulation scheme as well as the MIMO transmission mode to the radio channel conditions, and in turn, to learn and track the (mode, rate) pair providing the highest throughput. So far, the design of RA mechanisms has been mainly driven by heuristics. In contrast, in this paper, we rigorously formulate such design as an online stochastic optimisation problem. We solve this problem and present ORS (Optimal Rate Sampling), a family of (mode, rate) pair adaptation algorithms that provably learn the best pair for transmission as fast as possible. We study the performance of ORS algorithms both in stationary radio environments, where the successful packet transmission probabilities at the various (mode, rate) pairs do not vary over time, and in non-stationary environments, where these probabilities evolve. We show that under ORS algorithms, the throughput loss due to the need to explore sub-optimal (mode, rate) pairs does not depend on the number of available pairs, which is a crucial advantage as evolving 802.11 standards offer an increasingly large number of (mode, rate) pairs. We illustrate the efficiency of ORS algorithms (compared to state-of-the-art algorithms) using simulations and traces extracted from 802.11 test-beds.
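The quantity a rate adaptation scheme tracks is the expected throughput of each (mode, rate) pair, that is, its rate multiplied by its successful packet transmission probability. The sketch below computes this for a single mode and checks whether throughput is unimodal in the rate, a kind of structure that structured bandit algorithms can exploit; treating unimodality as the relevant structure here is an assumption, and the success probabilities are hypothetical.

```python
from typing import List, Sequence

# Hypothetical 802.11-style rates (Mbit/s) and successful transmission
# probabilities for one MIMO mode; the numbers are illustrative only.
RATES = [6.0, 12.0, 24.0, 36.0, 48.0, 54.0]
SUCCESS = [0.99, 0.97, 0.90, 0.70, 0.40, 0.20]


def expected_throughputs(rates: Sequence[float], success: Sequence[float]) -> List[float]:
    """Expected throughput of each rate: rate times successful transmission probability."""
    return [r * p for r, p in zip(rates, success)]


def is_unimodal(values: Sequence[float]) -> bool:
    """True if the sequence weakly increases and then weakly decreases."""
    i = 0
    while i + 1 < len(values) and values[i] <= values[i + 1]:
        i += 1
    while i + 1 < len(values) and values[i] >= values[i + 1]:
        i += 1
    return i == len(values) - 1


throughputs = expected_throughputs(RATES, SUCCESS)
best_rate = RATES[max(range(len(throughputs)), key=throughputs.__getitem__)]
```

In practice the success probabilities are unknown and must be learned by sampling; the point of exploiting structure is that a learner need only compare each rate with its neighbours, which is consistent with an exploration cost that does not grow with the number of available pairs.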

Preprint, 2013, arXiv

The association problem in wireless networks: a Policy Gradient Reinforcement Learning approach

The purpose of this paper is to develop a self-optimized association algorithm based on PGRL (Policy Gradient Reinforcement Learning) that is scalable, stable and robust. The term robust means that performance degradation in the learning phase should be forbidden or limited to predefined thresholds. The algorithm is model-free (as opposed to Value Iteration) and robust (as opposed to Q-Learning). The association problem is modeled as a Markov Decision Process (MDP). The policy space is parameterized, and the parameterized family of policies is then used as expert knowledge for the PGRL. The PGRL converges towards a local optimum and the average cost decreases monotonically during the learning process. The properties of the solution make it a good candidate for practical implementation. Furthermore, the robustness property allows the PGRL algorithm to be used in an "always-on" learning mode.

Preprint, 2012, arXiv

Coordination of autonomic functionalities in communications networks

Future communication networks are expected to feature autonomic (or self-organizing) mechanisms to ease deployment (self-configuration), tune parameters automatically (self-optimization) and repair the network (self-healing). Self-organizing mechanisms have been designed as stand-alone entities, even though multiple mechanisms will run in parallel in operational networks. An efficient coordination mechanism will be the major enabler for large scale deployment of self-organizing networks. We model self-organizing mechanisms as control loops, and study the conditions for stability when running control loops in parallel. Based on control theory and Lyapunov stability, we propose a coordination mechanism to stabilize the system, which can be implemented in a distributed fashion. The mechanism remains valid in the presence of measurement noise via stochastic approximation. Instability and coordination in the context of wireless networks are illustrated with two examples and the influence of network geometry is investigated. We are essentially concerned with linear systems, and the applicability of our results for non-linear systems is discussed.
