Source author record

Alexandre Proutiere

Alexandre Proutiere appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Social and Information Networks; Systems and Control; Information Theory; math.IT; math.OC; physics.soc-ph; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Computer Science and Game Theory; eess.SY; math.PR; math.SP; math.ST; Multiagent Systems; quant-ph; Statistics Theory

Catalog footprint

What is connected

32 works

18 topics

4 close collaborators

preprint · 2026 · arXiv

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

Hierarchical reinforcement learning can improve generalization by decomposing long-horizon decision-making into simpler subproblems. However, existing approaches often rely on restrictive design choices, such as fixed temporal abstractions or goal-conditioned objectives, which largely confine them to goal-reaching tasks and limit their applicability to general reward functions. In this paper, we introduce switching successor measures, an extension of successor measures that enables hierarchical control in zero-shot reinforcement learning without additional supervision, fixed horizons, or manually designed subgoals. We show that switching successor measures arise naturally from classical successor measures while preserving their underlying structure. Building on this result, we propose FB $π$-Switch, an algorithm that extracts both a high-level subgoal-selection policy and a low-level control policy directly from forward-backward (FB) representations, allowing hierarchical behavior to emerge from a single learned representation. Experiments on both goal-conditioned and general reward-based tasks show that FB $π$-Switch improves over non-hierarchical baselines and matches state-of-the-art hierarchical methods in goal-conditioned settings. These results demonstrate that structured successor representations provide a flexible foundation for hierarchical zero-shot reinforcement learning beyond goal-reaching tasks. Our project website is available at: https://stestokth.github.io/switching-successors/.

preprint · 2022 · arXiv

Best Policy Identification in Linear MDPs

We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model. We first derive an instance-specific lower bound on the expected number of samples required to identify an $\varepsilon$-optimal policy with probability $1-δ$. The lower bound characterizes the optimal sampling rule as the solution of an intricate non-convex optimization program, but can be used as the starting point to devise simple and near-optimal sampling rules and algorithms. We devise such algorithms. One of these exhibits a sample complexity upper bounded by ${\cal O}({\frac{d}{(\varepsilon+Δ)^2}} (\log(\frac{1}δ)+d))$ where $Δ$ denotes the minimum reward gap of sub-optimal actions and $d$ is the dimension of the feature space. This upper bound holds in the moderate-confidence regime (i.e., for all $δ$), and matches existing minimax and gap-dependent lower bounds. We extend our algorithm to episodic linear MDPs.
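
To get a feel for how the stated upper bound scales, the expression inside the ${\cal O}(\cdot)$ can be evaluated directly. The sketch below is only an illustration: the function name `sample_complexity_scale` is hypothetical, and the constant hidden by the big-O notation is ignored, so the output is a scale rather than an actual sample count.

```python
import math


def sample_complexity_scale(d, eps, gap, delta):
    """Evaluate d / (eps + gap)^2 * (log(1/delta) + d).

    The constant hidden by O(.) is NOT included, so this is a scale only.
    d: feature dimension, eps: accuracy, gap: minimum reward gap,
    delta: allowed failure probability.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if eps + gap <= 0.0:
        raise ValueError("eps + gap must be positive")
    return d / (eps + gap) ** 2 * (math.log(1.0 / delta) + d)


# Halving (eps + gap) multiplies the scale by four.
coarse = sample_complexity_scale(d=10, eps=0.2, gap=0.2, delta=0.05)
fine = sample_complexity_scale(d=10, eps=0.1, gap=0.1, delta=0.05)
print(coarse, fine, fine / coarse)
```

The quadratic dependence on $\varepsilon+Δ$ dominates for small gaps, while the confidence level enters only through $\log(1/δ)$.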

preprint · 2022 · arXiv

Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach

Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms learning optimal tilt control policies from existing data (in the so-called passive learning setting) or from data actively generated by the algorithms (the active learning setting). We formalize the design of such algorithms as a Best Policy Identification (BPI) problem in Contextual Linear Multi-Arm Bandits (CL-MAB). An arm represents an antenna tilt update; the context captures current network conditions; the reward corresponds to an improvement of performance, mixing coverage and capacity; and the objective is to identify, with a given level of confidence, an approximately optimal policy (a function mapping the context to an arm with maximal reward). For CL-MAB in both active and passive learning settings, we derive information-theoretical lower bounds on the number of samples required by any algorithm returning an approximately optimal policy with a given level of certainty, and devise algorithms achieving these fundamental limits. We apply our algorithms to the Remote Electrical Tilt (RET) optimization problem in cellular networks, and show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.

preprint · 2022 · arXiv

Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach

In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, deciding whether a new data flow can be admitted and, in this case, on which slice. The objective is to devise a joint measurement and decision strategy that returns a correct decision (e.g., the least loaded slice) with a certain level of confidence while minimizing the measurement cost (the number of measurements made before committing to the decision). We study the design of such strategies for several natural admission criteria specifying what a correct decision is. For each of these criteria, using tools from best arm identification in bandits, we first derive an explicit information-theoretical lower bound on the cost of any algorithm returning the correct decision with fixed confidence. We then devise a joint measurement and decision strategy achieving this theoretical limit. We compare empirically the measurement costs of these strategies, and compare them to both the lower bounds and a naive measurement scheme. We find that our algorithm significantly outperforms the naive scheme (by a factor of 2 to 8).

preprint · 2021 · arXiv

Distributed Algorithms that Solve Boolean Equations with Local and Differential Privacies

In this paper, we propose distributed algorithms that solve a system of Boolean equations over a network, where each node in the network possesses only one Boolean equation from the system. The Boolean equation assigned at any particular node is a private equation known to this node only, and the nodes aim to compute the exact set of solutions to the system without exchanging their local equations. We show that each private Boolean equation can be locally lifted to a linear algebraic equation under a basis of Boolean vectors, leading to a network linear equation that is distributedly solvable using existing distributed linear equation algorithms as a subroutine. A number of exact or approximate solutions to the induced linear equation are then computed at each node from different initial values. The solutions to the original Boolean equations are eventually computed locally via a Boolean vector search algorithm. We prove that given solvable Boolean equations, when the initial values of the nodes for the distributed linear equation solving step are selected i.i.d. according to a uniform distribution in a high-dimensional cube, our algorithms return the exact solution set of the Boolean equations at each node with high probability. Furthermore, we present an algorithm for distributed verification of the satisfiability of Boolean equations, and prove its correctness. Finally, we show that by utilizing linear equation solvers with differential privacy to replace the in-network computing routines, the overall distributed Boolean equation algorithms can be made differentially private. Under the standard Laplace mechanism, we prove an explicit level of noise that can be injected in the linear equation steps for ensuring a prescribed level of differential privacy.

preprint · 2020 · arXiv

Finite-time Identification of Stable Linear Systems: Optimality of the Least-Squares Estimator

We present a new finite-time analysis of the estimation error of the Ordinary Least Squares (OLS) estimator for stable linear time-invariant systems. We characterize the number of observed samples (the length of the observed trajectory) sufficient for the OLS estimator to be $(\varepsilon,δ)$-PAC, i.e., to yield an estimation error less than $\varepsilon$ with probability at least $1-δ$. We show that this number matches existing sample complexity lower bounds [1,2] up to universal multiplicative factors (independent of ($\varepsilon,δ)$ and of the system). This paper hence establishes the optimality of the OLS estimator for stable systems, a result conjectured in [1]. Our analysis of the performance of the OLS estimator is simpler, sharper, and easier to interpret than existing analyses. It relies on new concentration results for the covariates matrix.
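
As a concrete picture of the estimator being analyzed, the sketch below runs OLS on a single trajectory of a scalar stable system $x_{t+1} = a x_t + w_t$. The scalar case, the Gaussian noise, and the function names are simplifying assumptions made here for illustration; the abstract concerns general stable linear time-invariant systems and does not specify an implementation.

```python
import random


def simulate_trajectory(a, length, noise_std=1.0, seed=0):
    """Simulate x_{t+1} = a * x_t + w_t from x_0 = 0 with Gaussian noise w_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(length):
        x.append(a * x[-1] + rng.gauss(0.0, noise_std))
    return x


def ols_estimate(x):
    """Least-squares estimate of a: argmin over a of sum_t (x_{t+1} - a x_t)^2."""
    num = sum(x[t] * x[t + 1] for t in range(len(x) - 1))
    den = sum(x[t] ** 2 for t in range(len(x) - 1))
    if den == 0.0:
        raise ValueError("trajectory carries no information about a")
    return num / den


# The estimation error shrinks as the observed trajectory gets longer.
for length in (50, 500, 5000):
    a_hat = ols_estimate(simulate_trajectory(a=0.8, length=length))
    print(length, abs(a_hat - 0.8))
```

In the $(\varepsilon,δ)$-PAC language of the abstract, the question is how long the trajectory must be before the printed error falls below $\varepsilon$ with probability at least $1-δ$.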

preprint · 2020 · arXiv

Off-policy Learning for Remote Electrical Tilt Optimization

We address the problem of Remote Electrical Tilt (RET) optimization using off-policy Contextual Multi-Armed-Bandit (CMAB) techniques. The goal in RET optimization is to control the orientation of the vertical tilt angle of the antenna to optimize Key Performance Indicators (KPIs) representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update policy is hard. On the one hand, coming up with a new policy in an online manner in a real network requires exploring tilt updates that have never been used before, and is operationally too risky. On the other hand, devising this policy via simulations suffers from the simulation-to-reality gap. In this paper, we circumvent these issues by learning an improved policy in an offline manner using existing data collected on real networks. We formulate the problem of devising such a policy using the off-policy CMAB framework. We propose CMAB learning algorithms to extract optimal tilt update policies from the data. We train and evaluate these policies on real-world 4G Long Term Evolution (LTE) cellular network data. Our policies show consistent improvements over the rule-based logging policy used to collect the data.

preprint · 2020 · arXiv

Optimal Best-arm Identification in Linear Bandits

We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds, asymptotically almost surely and in expectation. The algorithm relies on an arm sampling rule that tracks an optimal proportion of arm draws, and that remarkably can be updated as rarely as we wish, without compromising its theoretical guarantees. Moreover, unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms. Experimental results suggest that our algorithm significantly outperforms existing algorithms. The paper further provides a first analysis of the best-arm identification problem in linear bandits with a continuous set of arms.

preprint · 2020 · arXiv

Predictive Bandits

We introduce and study a new class of stochastic bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms (so that their rewards in this round can be predicted). These measurements are costly, and may be corrupted by noise. The decision maker then selects an arm to be actually played in the round. Predictive bandits find applications in many areas; e.g., they can be applied to channel selection problems in radio communication systems. In this paper, we provide the first theoretical results about predictive bandits, and focus on scenarios where the decision maker is allowed to measure at most one arm per round. We derive asymptotic instance-specific regret lower bounds for these problems, and develop algorithms whose regret matches these fundamental limits. We illustrate the performance of our algorithms through numerical experiments. In particular, we highlight the gains that can be achieved by using reward predictions, and investigate the impact of the noise in the corresponding measurements.

preprint · 2016 · arXiv

Cluster-Aided Mobility Predictions

Predicting the future location of users in wireless networks has numerous applications, and can help service providers to improve the quality of service perceived by their clients. The location predictors proposed so far estimate the next location of a specific user by inspecting the past individual trajectories of this user. As a consequence, when the training data collected for a given user is limited, the resulting prediction is inaccurate. In this paper, we develop cluster-aided predictors that exploit past trajectories collected from all users to predict the next location of a given user. These predictors rely on clustering techniques and extract from the training data similarities among the mobility patterns of the various users to improve the prediction accuracy. Specifically, we present CAMP (Cluster-Aided Mobility Predictor), a cluster-aided predictor whose design is based on recent non-parametric Bayesian statistical tools. CAMP is robust and adaptive in the sense that it exploits similarities in users' mobility only if such similarities are really present in the training data. We analytically prove the consistency of the predictions provided by CAMP, and investigate its performance using two large-scale datasets. CAMP significantly outperforms existing predictors, and in particular those that only exploit individual past trajectories.

preprint · 2016 · arXiv

Optimal Cluster Recovery in the Labeled Stochastic Block Model

We consider the problem of community detection or clustering in the labeled Stochastic Block Model (LSBM) with a finite number $K$ of clusters of sizes linearly growing with the global population of items $n$. Every pair of items is labeled independently at random, and label $\ell$ appears with probability $p(i,j,\ell)$ between two items in clusters indexed by $i$ and $j$, respectively. The objective is to reconstruct the clusters from the observation of these random labels. Clustering under the SBM and its extensions has attracted much attention recently. Most existing work has aimed at characterizing the set of parameters such that it is possible to infer clusters either positively correlated with the true clusters, or with a vanishing proportion of misclassified items, or exactly matching the true clusters. We find the set of parameters such that there exists a clustering algorithm with at most $s$ misclassified items on average under the general LSBM and for any $s=o(n)$, which solves one open problem raised in \cite{abbe2015community}. We further develop an algorithm, based on simple spectral methods, that achieves this fundamental performance limit within $O(n\,\mathrm{polylog}(n))$ computations and without a priori knowledge of the model parameters.

preprint · 2015 · arXiv

Combinatorial Bandits Revisited

This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem, and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the adversarial setting under bandit feedback, we propose CombEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.

preprint · 2015 · arXiv

Network Synchronization with Convexity

In this paper, we establish a few new synchronization conditions for complex networks with nonlinear and nonidentical self-dynamics with switching directed communication graphs. In light of the recent works on distributed sub-gradient methods, we impose integral convexity for the nonlinear node self-dynamics in the sense that the self-dynamics of a given node is the gradient of some concave function corresponding to that node. The node couplings are assumed to be linear but with switching directed communication graphs. Several sufficient and/or necessary conditions are established for exact or approximate synchronization over the considered complex networks. These results show when and how nonlinear node self-dynamics may cooperate with the linear diffusive coupling, which eventually leads to network synchronization conditions under relaxed connectivity requirements.

preprint · 2015 · arXiv

Streaming, Memory Limited Matrix Completion with Noise

In this paper, we consider the streaming memory-limited matrix completion problem when the observed entries are noisy versions of a small random fraction of the original entries. We are interested in scenarios where the matrix size is very large, so the matrix is very hard to store and manipulate. Here, columns of the observed matrix are presented sequentially and the goal is to complete the missing entries after one pass on the data with limited memory space and limited computational complexity. We propose a streaming algorithm which produces an estimate of the original matrix with a vanishing mean square error, uses memory space scaling linearly with the ambient dimension of the matrix, i.e., the memory required to store the output alone, and performs a number of computations on the order of the number of non-zero entries of the input matrix.

preprint · 2015 · arXiv

The Evolution of Beliefs over Signed Social Networks

We study the evolution of opinions (or beliefs) over a social network modeled as a signed graph. The sign attached to an edge in this graph characterizes whether the corresponding individuals or end nodes are friends (positive links) or enemies (negative links). Pairs of nodes are randomly selected to interact over time, and when two nodes interact, each of them updates its opinion based on the opinion of the other node and the sign of the corresponding link. This model generalizes the DeGroot model to account for negative links: when two enemies interact, their opinions go in opposite directions. We provide conditions for convergence and divergence in expectation, in mean-square, and in the almost sure sense, and exhibit phase transition phenomena for these notions of convergence depending on the parameters of the opinion update model and on the structure of the underlying graph. We establish a no-survivor theorem, stating that the difference in opinions of any two nodes diverges whenever opinions in the network diverge as a whole. We also prove a live-or-die lemma, indicating that almost surely, the opinions either converge to an agreement or diverge. Finally, we extend our analysis to cases where opinions have hard lower and upper limits. In these cases, we study when and how opinions may become asymptotically clustered to the belief boundaries, and highlight the crucial influence of (strong or weak) structural balance of the underlying network on this clustering phenomenon.
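
The pairwise interaction described above can be simulated in a few lines. The abstract does not state the update equations, so the rule below is an assumption chosen to match the description: along a positive link each node moves toward the other by a fraction `alpha`, and along a negative link each node moves away by a fraction `beta`. The names `interact`, `simulate`, `alpha`, and `beta` are hypothetical.

```python
import random


def interact(x, i, j, sign, alpha=0.3, beta=0.3):
    """Update the opinions of nodes i and j in place after one interaction.

    Assumed rule: friends (sign > 0) attract, enemies (sign < 0) repel.
    """
    xi, xj = x[i], x[j]
    if sign > 0:
        x[i] = xi + alpha * (xj - xi)
        x[j] = xj + alpha * (xi - xj)
    else:
        x[i] = xi - beta * (xj - xi)
        x[j] = xj - beta * (xi - xj)


def simulate(x0, signed_edges, steps, seed=0, alpha=0.3, beta=0.3):
    """Pick a uniformly random signed edge (i, j, sign) at each step."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        i, j, sign = rng.choice(signed_edges)
        interact(x, i, j, sign, alpha, beta)
    return x


# All-positive triangle: opinions agree. One negative link can break agreement.
friends = [(0, 1, +1), (1, 2, +1), (0, 2, +1)]
mixed = [(0, 1, +1), (1, 2, +1), (0, 2, -1)]
print(simulate([0.0, 0.5, 1.0], friends, steps=200, alpha=0.5))
print(simulate([0.0, 0.5, 1.0], mixed, steps=200, alpha=0.5, beta=0.5))
```

With only positive links this reduces to randomized pairwise averaging in the spirit of the DeGroot model; whether opinions converge or diverge once negative links are present depends on the parameters and the graph, which is the phase transition the abstract refers to.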

preprint · 2015 · arXiv

Unimodal Bandits without Smoothness

We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected reward function. For these problems, we propose the Stochastic Pentachotomy (SP) algorithm, and derive finite-time upper bounds on its regret and optimization error. In particular, we show that, for any expected reward function $μ$ that behaves as $μ(x)=μ(x^\star)-C|x-x^\star|^ξ$ locally around its maximizer $x^\star$ for some $ξ, C>0$, the SP algorithm is order-optimal. Namely its regret and optimization error scale as $O(\sqrt{T\log(T)})$ and $O(\sqrt{\log(T)/T})$, respectively, when the time horizon $T$ grows large. These scalings are achieved without the knowledge of $ξ$ and $C$. Our algorithm is based on asymptotically optimal sequential statistical tests used to successively trim an interval that contains the best arm with high probability. To our knowledge, the SP algorithm constitutes the first sequential arm selection rule that achieves a regret and optimization error scaling as $O(\sqrt{T})$ and $O(1/\sqrt{T})$, respectively, up to a logarithmic factor for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness.

preprint · 2014 · arXiv

Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms

We consider the problem of community detection in the Stochastic Block Model with a finite number $K$ of communities of sizes linearly growing with the network size $n$. This model consists of a random graph such that each pair of vertices is connected independently with probability $p$ within communities and $q$ across communities. One observes a realization of this random graph, and the objective is to reconstruct the communities from this observation. We show that under spectral algorithms, the number of misclassified vertices does not exceed $s$ with high probability as $n$ grows large, whenever $pn=ω(1)$, $s=o(n)$ and \begin{equation*} \liminf_{n\to\infty} \frac{n\left(α_1 p+α_2 q-(α_1 + α_2)p^{\frac{α_1}{α_1 + α_2}}q^{\frac{α_2}{α_1 + α_2}}\right)}{\log (\frac{n}{s})} >1, \tag{1} \end{equation*} where $α_1$ and $α_2$ denote the (fixed) proportions of vertices in the two smallest communities. In view of recent work by Abbe et al. and Mossel et al., this establishes that the proposed spectral algorithms are able to exactly recover communities whenever this is at all possible in the case of networks with two communities with equal sizes. We conjecture that condition (1) is actually necessary to obtain fewer than $s$ misclassified vertices asymptotically, which would establish the optimality of spectral methods in more general scenarios.
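
The left-hand side of condition (1) is easy to evaluate numerically. The sketch below computes the ratio at a finite $n$, which is only a heuristic stand-in for the limit inferior in the statement; the function name `recovery_ratio` and the parametrization $p = a\log(n)/n$, $q = b\log(n)/n$ in the usage lines are assumptions introduced here.

```python
import math


def recovery_ratio(n, s, p, q, alpha1, alpha2):
    """Finite-n value of the ratio in condition (1).

    n: network size, s: allowed number of misclassified vertices,
    p, q: within- and across-community edge probabilities,
    alpha1, alpha2: proportions of the two smallest communities.
    """
    total = alpha1 + alpha2
    separation = (
        alpha1 * p
        + alpha2 * q
        - total * p ** (alpha1 / total) * q ** (alpha2 / total)
    )
    return n * separation / math.log(n / s)


# Two equal communities, exact recovery regime (s = 1).
n = 10_000
a, b = 9.0, 1.0
p, q = a * math.log(n) / n, b * math.log(n) / n
print(recovery_ratio(n, s=1, p=p, q=q, alpha1=0.5, alpha2=0.5))
```

For two equal communities the separation term simplifies to $(\sqrt{p}-\sqrt{q})^2/2$, so with this parametrization and $s=1$ the ratio equals $(\sqrt{a}-\sqrt{b})^2/2$, which is 2 for the values used here and therefore satisfies the condition.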

preprint · 2014 · arXiv

Community Detection via Random and Adaptive Sampling

In this paper, we consider networks consisting of a finite number of non-overlapping communities. To extract these communities, the interaction between pairs of nodes may be sampled from a large available data set, which allows a given node pair to be sampled several times. When a node pair is sampled, the observed outcome is a binary random variable, equal to 1 if nodes interact and to 0 otherwise. The outcome is more likely to be positive if nodes belong to the same community. For a given budget of node pair samples or observations, we wish to jointly design a sampling strategy (the sequence of sampled node pairs) and a clustering algorithm that recover the hidden communities with the highest possible accuracy. We consider both non-adaptive and adaptive sampling strategies, and for both classes of strategies, we derive fundamental performance limits satisfied by any sampling and clustering algorithm. In particular, we provide necessary conditions for the existence of algorithms recovering the communities accurately as the network size grows large. We also devise simple algorithms that accurately reconstruct the communities when this is at all possible, hence proving that the proposed necessary conditions for accurate community detection are also sufficient. The classical problem of community detection in the stochastic block model can be seen as a particular instance of the problems considered here. But our framework covers more general scenarios where the sequence of sampled node pairs can be designed in an adaptive manner. The paper provides new results for the stochastic block model, and extends the analysis to the case of adaptive sampling.

preprint · 2014 · arXiv

Distributed Load Balancing in Heterogeneous Systems

We consider the problem of distributed load balancing in heterogeneous parallel server systems, where the service rate achieved by a user at a server depends on both the user and the server. Such heterogeneity typically arises in wireless networks (e.g., servers may represent frequency bands, and the service rate of a user varies across bands). Users select servers in a distributed manner. They initially attach to an arbitrary server. However, at random instants of time, they may probe the load at a new server and migrate there to improve their service rate. We analyze the system dynamics under the natural Random Local Search (RLS) migration scheme, introduced in \cite{sig10}. Under this scheme, when a user has the opportunity to switch servers, she does it only if this improves her service rate. The dynamics under RLS may be interpreted as those generated by strategic players updating their strategy in a load balancing game. In closed systems, where the user population is fixed, we show that this game has pure Nash Equilibria (NEs), and we analyze their efficiency. We further prove that when the user population grows large, pure NEs get closer to a Proportionally Fair (PF) allocation of users to servers, and we characterize the gap between equilibria and this ideal allocation depending on user population. Under the RLS algorithm, the system converges to pure NEs: we study the time it takes for the system to reach the PF allocation within a certain margin. In open systems, where users randomly enter the system and leave upon service completion, we establish that the RLS algorithm stabilizes the system whenever this is at all possible, i.e., it is throughput-optimal.
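
The closed-system dynamics can be sketched as follows. The abstract does not say how a server's capacity is shared, so the code assumes equal sharing: a user `u` at server `s` gets `rates[u][s]` divided by the number of users at `s`. That sharing model, the integer rate matrix, and all function names are assumptions made for illustration.

```python
import random


def improves(rates, load, u, s, t):
    """True if user u, currently at server s, gets a strictly higher rate at t.

    Assumed model: rate = rates[u][server] / (number of users at that server).
    The comparison is cross-multiplied so that integer rates stay exact.
    """
    return rates[u][t] * load[s] > rates[u][s] * (load[t] + 1)


def run_rls(rates, assignment, steps, seed=0):
    """At each step a random user probes a random server and moves only if it helps."""
    rng = random.Random(seed)
    assignment = list(assignment)
    num_servers = len(rates[0])
    load = [assignment.count(s) for s in range(num_servers)]
    for _ in range(steps):
        u = rng.randrange(len(assignment))
        t = rng.randrange(num_servers)
        s = assignment[u]
        if t != s and improves(rates, load, u, s, t):
            load[s] -= 1
            load[t] += 1
            assignment[u] = t
    return assignment


def is_pure_nash(rates, assignment):
    """No single user can strictly improve her rate by switching servers."""
    num_servers = len(rates[0])
    load = [assignment.count(s) for s in range(num_servers)]
    return not any(
        improves(rates, load, u, s, t)
        for u, s in enumerate(assignment)
        for t in range(num_servers)
        if t != s
    )


# Six users, three servers, everyone initially attached to server 0.
rates = [[4, 2, 1], [4, 2, 1], [1, 3, 2], [2, 2, 2], [1, 1, 5], [3, 1, 1]]
final = run_rls(rates, [0] * 6, steps=5000)
print(final, is_pure_nash(rates, final))
```

Under this sharing model every accepted move strictly increases the sum of log service rates minus the sum of log-factorials of the server loads, so only finitely many moves can happen and the dynamics settle in a pure Nash equilibrium, in line with the convergence result stated above.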

preprint · 2014 · arXiv

Dynamic Rate and Channel Selection in Cognitive Radio Systems

In this paper, we investigate dynamic channel and rate selection in cognitive radio systems which exploit a large number of channels free from primary users. In such systems, transmitters may rapidly change the selected (channel, rate) pair to opportunistically learn and track the pair offering the highest throughput. We formulate the problem of sequential channel and rate selection as an online optimization problem, and show its equivalence to a structured Multi-Armed Bandit problem. The structure stems from inherent properties of the achieved throughput as a function of the selected channel and rate. We derive fundamental performance limits satisfied by any channel and rate adaptation algorithm, and propose algorithms that achieve (or approach) these limits. In turn, the proposed algorithms optimally exploit the inherent structure of the throughput. We illustrate the efficiency of our algorithms using both test-bed and simulation experiments, in both stationary and non-stationary radio environments. In stationary environments, the packet successful transmission probabilities at the various channel and rate pairs do not evolve over time, whereas in non-stationary environments, they may evolve. In practical scenarios, the proposed algorithms are able to track the best channel and rate quite accurately without the need for any explicit measurement and feedback of the quality of the various channels.

preprint · 2014 · arXiv

Emergent Behaviors over Signed Random Dynamical Networks: Relative-State-Flipping Model

We study asymptotic dynamical patterns that emerge among a set of nodes interacting in a dynamically evolving signed random network, where positive links carry out standard consensus and negative links induce relative-state flipping. A sequence of deterministic signed graphs defines potential node interactions that take place independently. Each node receives a positive recommendation consistent with the standard consensus algorithm from its positive neighbors, and a negative recommendation defined by relative-state flipping from its negative neighbors. After receiving these recommendations, each node assigns a deterministic weight to each recommendation, and then encodes these weighted recommendations in its state update through stochastic attentions defined by two Bernoulli random variables. We establish a number of conditions regarding almost sure convergence and divergence of the node states. We also propose a condition for almost sure state clustering for essentially weakly balanced graphs, with the help of several martingale convergence lemmas. Some fundamental differences in the impact of the deterministic weights and stochastic attentions on the node state evolution are highlighted between the current relative-state-flipping model and the state-flipping model considered in Altafini 2013 and Shi et al. 2014.

preprint · 2014 · arXiv

Emergent Behaviors over Signed Random Dynamical Networks: State-Flipping Model

Recent studies from social, biological, and engineering network systems have drawn attention to the dynamics over signed networks, where each link is associated with a positive/negative sign indicating trustful/mistrustful, activator/inhibitor, or secure/malicious interactions. We study asymptotic dynamical patterns that emerge among a set of nodes that interact in a dynamically evolving signed random network. Node interactions take place at random on a sequence of deterministic signed graphs. Each node receives positive or negative recommendations from its neighbors depending on the sign of the interaction arcs, and updates its state accordingly. Recommendations along a positive arc follow the standard consensus update. As in the work by Altafini, negative recommendations use an update where the sign of the neighbor state is flipped. Nodes may weight positive and negative recommendations differently, and random processes are introduced to model the time-varying attention that nodes pay to these recommendations. Conditions for almost sure convergence and divergence of the node states are established. We show that under this so-called state-flipping model, all links contribute to a consensus of the absolute values of the nodes, even under switching sign patterns and dynamically changing environment. A no-survivor property is established, indicating that every node state diverges almost surely if the maximum network state diverges.

preprint · 2014 · arXiv

Feedback Policies for Measurement-based Quantum State Manipulation

In this paper, we propose feedback designs for manipulating a quantum state to a target state by performing sequential measurements. In light of Belavkin's quantum feedback control theory, for a given set of (projective or non-projective) measurements and a given time horizon, we show that finding the measurement selection policy that maximizes the probability of successful state manipulation is an optimal control problem for a controlled Markovian process. The optimal policy is Markovian and can be computed by dynamic programming. Numerical examples indicate that making use of feedback information significantly improves the success probability compared to classical schemes that do not use feedback. We also consider other objective functionals, including maximizing the expected fidelity to the target state as well as minimizing the expected arrival time. The connections and differences among these objectives are also discussed.

preprint · 2014 · arXiv

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. This approach is shown, through numerical experiments, to significantly outperform existing algorithms that directly deal with the continuous set of arms. Finally, the results and algorithms are extended to contextual bandits with similarities.

preprint2014arXiv

Streaming, Memory Limited Algorithms for Community Detection

In this paper, we consider sparse networks consisting of a finite number of non-overlapping communities, i.e., disjoint clusters, so that there is higher density within clusters than across clusters. Both the intra- and inter-cluster edge densities vanish when the size of the graph grows large, making the cluster reconstruction problem noisier and hence difficult to solve. We are interested in scenarios where the network size is very large, so that the adjacency matrix of the graph is hard to manipulate and store. The data stream model in which columns of the adjacency matrix are revealed sequentially constitutes a natural framework in this setting. For this model, we develop two novel clustering algorithms that extract the clusters asymptotically accurately. The first algorithm is offline, as it needs to store and keep the assignments of nodes to clusters, and requires a memory that scales linearly with the network size. The second algorithm is online, as it may classify a node when the corresponding column is revealed and then discard this information. This algorithm requires a memory growing sub-linearly with the network size. To construct these efficient streaming memory-limited clustering algorithms, we first address the problem of clustering with partial information, where only a small proportion of the columns of the adjacency matrix is observed, and develop, for this setting, a new spectral algorithm which is of independent interest.

preprint2014arXiv

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which case arms belong to a bounded interval. For discrete unimodal bandits, we derive asymptotic lower bounds for the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound. Our algorithm optimally exploits the unimodal structure of the problem, and surprisingly, its asymptotic regret does not depend on the number of arms. We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time. The analytical results are supported by numerical experiments showing that OSUB performs significantly better than the state-of-the-art algorithms. For continuous sets of arms, we provide a brief discussion. We show that combining an appropriate discretization of the set of arms with the UCB algorithm yields an order-optimal regret, and in practice, outperforms recently proposed algorithms designed to exploit the unimodal structure.
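The continuous-arm approach mentioned at the end of this abstract (discretize the interval, then run UCB) can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes a uniform grid of arbitrary size, Bernoulli rewards, and the textbook UCB1 index, and the helper names `discretize` and `ucb1` are hypothetical. It does not implement OSUB.

```python
import math
import random


def discretize(low, high, k):
    """Return k evenly spaced arms covering the interval [low, high]."""
    if k == 1:
        return [(low + high) / 2]
    step = (high - low) / (k - 1)
    return [low + i * step for i in range(k)]


def ucb1(arms, mean_reward, horizon, seed=0):
    """Run UCB1 on the given arms with Bernoulli rewards.

    mean_reward(x) is the (unknown to the learner) success probability of arm x.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            a = t - 1  # initial round: play every arm once
        else:
            # Empirical mean plus an exploration bonus.
            a = max(
                range(len(arms)),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < mean_reward(arms[a]) else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts


# A unimodal mean-reward function on [0, 1], peaked at 0.5 (illustrative).
peak = lambda x: 0.9 - 1.2 * abs(x - 0.5)
grid = discretize(0.0, 1.0, 5)
plays = ucb1(grid, peak, horizon=5000)
print(grid[plays.index(max(plays))])
```

The grid size trades discretization error against the cost of exploring more arms; the value 5 used here is arbitrary.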

preprint2013arXiv

Emergent Behaviors over Signed Random Networks in Dynamical Environments

We study asymptotic dynamical patterns that emerge among a set of nodes that interact in a dynamically evolving signed random network. Node interactions take place at random on a sequence of deterministic signed graphs. Each node receives positive or negative recommendations from its neighbors depending on the sign of the interaction arcs, and updates its state accordingly. Positive recommendations follow the standard consensus update while two types of negative recommendations, each modeling a different type of antagonistic or malicious interaction, are considered. Nodes may weigh positive and negative recommendations differently, and random processes are introduced to model the time-varying attention that nodes pay to the positive and negative recommendations. Various conditions for almost sure convergence, divergence, and clustering of the node states are established. Some fundamental similarities and differences are established for the two notions of negative recommendations.

preprint2013arXiv

Optimal Distributed Scheduling in Wireless Networks under the SINR interference model

Radio resource sharing mechanisms are key to ensuring good performance in wireless networks. In their seminal paper \cite{tassiulas1}, Tassiulas and Ephremides introduced the Maximum Weighted Scheduling algorithm, and proved its throughput-optimality. Since then, there have been extensive research efforts to devise distributed implementations of this algorithm. Recently, distributed adaptive CSMA scheduling schemes \cite{jiang08} have been proposed and shown to be optimal, without the need for message passing among transmitters. However, their analysis relies on the assumption that interference can be accurately modelled by a simple interference graph. In this paper, we consider the more realistic and challenging SINR interference model. We present the first distributed scheduling algorithms that (i) are optimal under the SINR interference model, and (ii) do not require any message passing. They are based on a combination of a simple and efficient power allocation strategy referred to as Power Packing and randomization techniques. We first devise algorithms that are rate-optimal in the sense that they perform as well as the best centralized scheduling schemes in scenarios where each transmitter is aware of the rate at which it should send packets to the corresponding receiver. We then extend these algorithms so that they reach throughput-optimality.

preprint2013arXiv

Optimal Rate Sampling in 802.11 Systems

In 802.11 systems, Rate Adaptation (RA) is a fundamental mechanism allowing transmitters to adapt the coding and modulation scheme as well as the MIMO transmission mode to the radio channel conditions, and in turn, to learn and track the (mode, rate) pair providing the highest throughput. So far, the design of RA mechanisms has been mainly driven by heuristics. In contrast, in this paper, we rigorously formulate such design as an online stochastic optimisation problem. We solve this problem and present ORS (Optimal Rate Sampling), a family of (mode, rate) pair adaptation algorithms that provably learn the best pair for transmission as fast as possible. We study the performance of ORS algorithms both in stationary radio environments, where the successful packet transmission probabilities at the various (mode, rate) pairs do not vary over time, and in non-stationary environments, where these probabilities evolve. We show that under ORS algorithms, the throughput loss due to the need to explore sub-optimal (mode, rate) pairs does not depend on the number of available pairs, which is a crucial advantage as evolving 802.11 standards offer an increasingly large number of (mode, rate) pairs. We illustrate the efficiency of ORS algorithms (compared to the state-of-the-art algorithms) using simulations and traces extracted from 802.11 test-beds.

preprint2013arXiv

Randomized Consensus with Attractive and Repulsive Links

We study convergence properties of a randomized consensus algorithm over a graph with both attractive and repulsive links. At each time instant, a node is randomly selected to interact with a random neighbor. Depending on whether the link between the two nodes belongs to a given subgraph of attractive or repulsive links, the node update follows a standard attractive weighted average or a repulsive weighted average, respectively. The repulsive update has the opposite sign of the standard consensus update. In this way, it counteracts the consensus formation and can be seen as a model of link faults or malicious attacks in a communication network, or the impact of trust and antagonism in a social network. Various probabilistic convergence and divergence conditions are established. A threshold condition for the strength of the repulsive action is given for convergence in expectation: when the repulsive weight crosses this threshold value, the algorithm transitions from convergence to divergence. An explicit value of the threshold is derived for classes of attractive and repulsive graphs. The results show that a single repulsive link can sometimes drastically change the behavior of the consensus algorithm. They also explicitly show how the robustness of the consensus algorithm depends on the size and other properties of the graphs.
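The update rule described in this abstract can be simulated in a few lines. The sketch below is one plausible reading, not the paper's exact model: it assumes scalar node states, uniform random selection of the node and of its neighbor, an attractive step x_i + alpha (x_j - x_i), and a repulsive step with the sign flipped. The link encoding and the parameter names `alpha` and `beta` are hypothetical.

```python
import random


def attractive_update(x_i, x_j, alpha):
    """Standard consensus step: move x_i toward x_j by a fraction alpha."""
    return x_i + alpha * (x_j - x_i)


def repulsive_update(x_i, x_j, beta):
    """Same step with the opposite sign: move x_i away from x_j."""
    return x_i - beta * (x_j - x_i)


def simulate(states, attractive, repulsive, alpha, beta, steps, seed=0):
    """Run the randomized pairwise process and return the final states.

    attractive and repulsive are sets of frozenset({i, j}) links.
    """
    rng = random.Random(seed)
    x = list(states)
    neighbors = {i: [] for i in range(len(x))}
    for link in attractive | repulsive:
        i, j = sorted(link)
        neighbors[i].append(j)
        neighbors[j].append(i)
    for i in neighbors:
        neighbors[i].sort()  # fixed order so runs are reproducible
    for _ in range(steps):
        i = rng.randrange(len(x))  # node selected at random
        if not neighbors[i]:
            continue
        j = rng.choice(neighbors[i])  # random neighbor of that node
        if frozenset((i, j)) in attractive:
            x[i] = attractive_update(x[i], x[j], alpha)
        else:
            x[i] = repulsive_update(x[i], x[j], beta)
    return x


# Triangle with only attractive links: the states contract toward agreement.
triangle = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2)]}
final = simulate([0.0, 1.0, 2.0], triangle, set(), alpha=0.5, beta=0.5, steps=2000)
print(max(final) - min(final))
```

Moving a link from the attractive set to the repulsive set is a quick way to experiment with how repulsion affects the spread of the states.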

preprint2012arXiv

Distributed Optimization: Convergence Conditions from a Dynamical System Perspective

This paper explores the fundamental properties of distributed minimization of a sum of functions with each function only known to one node, and a pre-specified level of node knowledge and computational capacity. We define the optimization information each node receives from its objective function, the neighboring information each node receives from its neighbors, and the computational capacity each node can take advantage of in controlling its state. It is proven that there exist a neighboring information way and a control law that guarantee global optimal consensus if and only if the solution sets of the local objective functions admit a nonempty intersection set for fixed strongly connected graphs. Then we show that for any tolerated error, we can find a control law that guarantees global optimal consensus within this error for fixed, bidirectional, and connected graphs under mild conditions. For time-varying graphs, we show that optimal consensus can always be achieved as long as the graph is uniformly jointly strongly connected and the nonempty intersection condition holds. The results illustrate that nonempty intersection for the local optimal solution sets is a critical condition for successful distributed optimization for a large class of algorithms.

preprint2009arXiv

Convergence and Tradeoff of Utility-Optimal CSMA

It has been recently suggested that in wireless networks, CSMA-based distributed MAC algorithms could achieve optimal utility without any message passing. We present the first proof of convergence of such adaptive CSMA algorithms towards an arbitrarily tight approximation of a utility-optimizing schedule. We also briefly discuss the tradeoff between optimality at equilibrium and short-term fairness practically achieved by such algorithms.

32 published item(s)

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

Best Policy Identification in Linear MDPs

Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach

Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach

Distributed Algorithms that Solve Boolean Equations with Local and Differential Privacies

Finite-time Identification of Stable Linear Systems: Optimality of the Least-Squares Estimator

Off-policy Learning for Remote Electrical Tilt Optimization

Optimal Best-arm Identification in Linear Bandits

Predictive Bandits

Cluster-Aided Mobility Predictions

Optimal Cluster Recovery in the Labeled Stochastic Block Model

Combinatorial Bandits Revisited

Network Synchronization with Convexity

Streaming, Memory Limited Matrix Completion with Noise

The Evolution of Beliefs over Signed Social Networks

Unimodal Bandits without Smoothness

Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms

Community Detection via Random and Adaptive Sampling

Distributed Load Balancing in Heterogeneous Systems

Dynamic Rate and Channel Selection in Cognitive Radio Systems

Emergent Behaviors over Signed Random Dynamical Networks: Relative-State-Flipping Model

Emergent Behaviors over Signed Random Dynamical Networks: State-Flipping Model

Feedback Policies for Measurement-based Quantum State Manipulation

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

Streaming, Memory Limited Algorithms for Community Detection

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Emergent Behaviors over Signed Random Networks in Dynamical Environments

Optimal Distributed Scheduling in Wireless Networks under the SINR interference model

Optimal Rate Sampling in 802.11 Systems

Randomized Consensus with Attractive and Repulsive Links

Distributed Optimization: Convergence Conditions from a Dynamical System Perspective

Convergence and Tradeoff of Utility-Optimal CSMA