Source author record

Alex Olshevsky

Alex Olshevsky appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC · Systems and Control · Machine Learning · Multiagent Systems · Distributed, Parallel, and Cluster Computing · Computational Complexity · math.DS · eess.SY · Human-Computer Interaction · math.CO · math.NA · math.PR · Numerical Analysis · physics.soc-ph · Social and Information Networks

Catalog footprint

What is connected

38 works

15 topics

4 close collaborators

preprint · 2026 · arXiv

Bridging the Gap Between Average and Discounted TD Learning

The analysis of Temporal Difference (TD) learning in the average-reward setting faces notable theoretical difficulties because the Bellman operator is not contractive with respect to any norm. This complicates standard analyses of stochastic updates that are effective in discounted settings. Although a considerable body of literature addresses these challenges, existing theoretical approaches come with limitations. We introduce a novel algorithm designed explicitly for policy evaluation in the average-reward setting, utilizing sampling from two Markovian trajectories. Our proposed method overcomes previous limitations by guaranteeing convergence to the unique solution of a properly defined projected Bellman equation. Notably, and in contrast to earlier work, our convergence analysis is uniformly applicable to both linear function approximation and tabular settings and does not involve explicit dimension-dependent terms in its convergence bounds. These results align with what is known to hold in the discounted setting. Furthermore, our algorithm achieves improved dependence on the problem's condition number, reducing the sample complexity from quartic, as in prior literature, to quadratic scaling, and thus matching the efficiency seen in the discounted setting.

preprint · 2026 · arXiv

Data Deletion Can Help in Adaptive RL

Deploying reinforcement learning policies in the real world requires adapting to time-varying environments. We study this problem in the contextual Markov Decision Process (cMDP) framework, where a family of environments is indexed by a low-dimensional context unknown at test time. The standard approach decomposes the problem: train a so-called "universal policy" which assumes knowledge of the true context, then pair it with a context estimator which approximates context using the observed trajectory. We identify a simple, counterintuitive trick that substantially improves the estimator: randomly delete a fraction of the training buffer after each round. This works because data is collected across multiple rounds using progressively better policies, and older trajectories come from a different distribution than what the estimator will face at deployment time; random deletion creates an implicit exponential decay on older data while preserving diversity without requiring any explicit identification of which samples are stale. This reduces the robustness gap by 30% for MLPs and by 6% on average for recurrent networks. Strikingly, it allows a narrow MLP with 5x fewer parameters to outperform a wide MLP trained without deletion. To understand when and why deletion helps, we analyze regularized empirical risk minimization with a mismatch between the train distribution and the distribution at deployment; in this idealized setting, we prove that removing a single uniformly random training point decreases the expected test loss under mild conditions. For ridge regression we make this quantitative: deletion helps when the regularization coefficient is moderate and the signal-to-noise ratio (SNR) is sufficiently low, and, crucially, this SNR threshold gives a direct measure of how large the distribution mismatch between training and deployment must be for deletion to be beneficial.
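
The "implicit exponential decay" can be made concrete with a short calculation. This is only an illustrative sketch under assumptions that are not in the abstract: each round adds $n$ samples, and after each round every buffered sample survives deletion with the same probability $1-p$ (the symbols $n$, $p$ and $k$ are introduced here for illustration).

$$\mathbb{E}\big[\text{samples surviving from } k \text{ rounds ago}\big] = n\,(1-p)^{k}$$

For example, with $p = 0.2$, data collected 10 rounds earlier is retained at a rate of $0.8^{10} \approx 0.107$, while the most recent round is retained at $0.8$, so older data is down-weighted geometrically without any sample being singled out as stale.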

preprint · 2022 · arXiv

Distributed TD(0) with Almost No Communication

We provide a new non-asymptotic analysis of distributed TD(0) with linear function approximation. Our approach relies on "one-shot averaging," where $N$ agents run local copies of TD(0) and average the outcomes only once at the very end. We consider two models: one in which the agents interact with an environment they can observe and whose transitions depend on all of their actions (which we call the global state model), and one in which each agent can run a local copy of an identical Markov Decision Process, which we call the local state model. In the global state model, we show that the convergence rate of our distributed one-shot averaging method matches the known convergence rate of TD(0). By contrast, the best rate in the previous literature could, according to the worst-case bounds given, underperform the non-distributed version by $O(N^3)$ in terms of the number of agents $N$. In the local state model, we demonstrate a version of the linear time speedup phenomenon, where the convergence time of the distributed process is a factor of $N$ faster than the convergence time of TD(0). As far as we are aware, this is the first result rigorously showing benefits from parallelism for temporal difference methods.
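
The one-shot averaging idea in the local state model can be sketched in a few lines. The example below is a toy illustration, not the paper's setup: it assumes a hypothetical two-state Markov reward process with uniform transitions, tabular TD(0) (the special case of linear function approximation with one-hot features), a constant step size, and per-agent random seeds. The function names `td0_local` and `one_shot_average` are made up for this sketch.

```python
import random


def td0_local(seed, steps, step_size=0.05, gamma=0.5):
    """Run tabular TD(0) on one agent's private copy of a toy 2-state chain."""
    rng = random.Random(seed)          # independent randomness per agent
    rewards = [0.0, 1.0]               # reward collected in each state (assumed)
    values = [0.0, 0.0]                # value estimates, one entry per state
    state = rng.randrange(2)
    for _ in range(steps):
        next_state = rng.randrange(2)  # next state is uniform over {0, 1}
        td_error = rewards[state] + gamma * values[next_state] - values[state]
        values[state] += step_size * td_error
        state = next_state
    return values


def one_shot_average(num_agents, steps):
    """Agents never communicate during learning; estimates are averaged once."""
    runs = [td0_local(seed, steps) for seed in range(num_agents)]
    return [sum(run[s] for run in runs) / num_agents for s in range(2)]


estimate = one_shot_average(num_agents=8, steps=5000)
```

For this toy chain the exact values solve $V = r + \gamma P V$, giving $V(0) = 0.5$ and $V(1) = 1.5$, so `estimate` should land close to those numbers; averaging only reduces the noise of the individual runs and costs a single round of communication.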

preprint · 2022 · arXiv

Optimal Lockdown for Pandemic Control

As a common strategy of contagious disease containment, lockdowns will inevitably weaken the economy. The ongoing COVID-19 pandemic underscores the trade-off arising from public health and economic cost. An optimal lockdown policy to resolve this trade-off is highly desired. Here we propose a mathematical framework of pandemic control through an optimal stabilizing non-uniform lockdown, where our goal is to reduce the economic activity as little as possible while decreasing the number of infected individuals at a prescribed rate. This framework allows us to efficiently compute the optimal stabilizing lockdown policy for general epidemic spread models, including both the classical SIS/SIR/SEIR models and a new model of COVID-19 transmissions. We demonstrate the power of this framework by analyzing publicly available data of inter-county travel frequencies to analyze a model of COVID-19 spread in the 62 counties of New York State. We find that an optimal stabilizing lockdown based on epidemic status in April 2020 would have reduced economic activity more stringently outside of New York City compared to within it, even though the epidemic was much more prevalent in New York City at that point. Such a counterintuitive result highlights the intricacies of pandemic control and sheds light on future lockdown policy design.

preprint · 2021 · arXiv

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves the optimal network independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate, which we show behaves as $K_T=\mathcal{O}\left(\frac{n}{(1-ρ_w)^2}\right)$, where $1-ρ_w$ denotes the spectral gap of the mixing matrix. Moreover, we construct a "hard" optimization problem for which we show the transient time needed for DSGD to approach the asymptotic convergence rate is lower bounded by $Ω\left(\frac{n}{(1-ρ_w)^2} \right)$, implying the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.

preprint · 2020 · arXiv

Asymptotic Convergence Rate of Alternating Minimization for Rank One Matrix Completion

We study alternating minimization for matrix completion in the simplest possible setting: completing a rank-one matrix from a revealed subset of the entries. We bound the asymptotic convergence rate by the variational characterization of eigenvalues of a reversible consensus problem. This leads to a polynomial upper bound on the asymptotic rate in terms of number of nodes as well as the largest degree of the graph of revealed entries.
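
A minimal sketch of alternating minimization for rank-one completion may help fix ideas. It assumes a noiseless positive matrix $M = u v^\top$, a positive initialization, and a connected pattern of revealed entries with at least one entry in every row and column; the function name, the data, and the iteration count are illustrative choices, not taken from the paper.

```python
def alternating_minimization(observed, n_rows, n_cols, iterations=200):
    """Fit M[i][j] ~ u[i] * v[j] on revealed entries by alternating least squares.

    `observed` maps (row, col) -> revealed value.
    """
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # With v fixed, each u[i] has a closed-form least-squares solution.
        for i in range(n_rows):
            num = sum(m * v[j] for (r, j), m in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i)
            u[i] = num / den
        # With u fixed, each v[j] has a closed-form least-squares solution.
        for j in range(n_cols):
            num = sum(m * u[i] for (i, c), m in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            v[j] = num / den
    return u, v


# Hidden truth (assumed for the demo): u = (1, 2, 3), v = (2, 1, 4).
# Six of the nine entries are revealed; they form a cycle in the bipartite graph.
observed = {(0, 0): 2.0, (0, 1): 1.0, (1, 1): 2.0,
            (1, 2): 8.0, (2, 2): 12.0, (2, 0): 6.0}
u, v = alternating_minimization(observed, 3, 3)
```

Each update replaces the ratio between an estimate and its true value by a weighted average over neighbors in the graph of revealed entries, which is why the iteration behaves like a consensus process. In this example the unrevealed entries are recovered as `u[0] * v[2]` (true value 4) and `u[2] * v[1]` (true value 3).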

preprint · 2020 · arXiv

Asymptotic Network Independence and Step-Size for A Distributed Subgradient Method

We consider whether distributed subgradient methods can achieve a linear speedup over a centralized subgradient method. While it might be hoped that a distributed network of $n$ nodes that can compute $n$ times more subgradients in parallel compared to a single node might, as a result, be $n$ times faster, existing bounds for distributed optimization methods are often consistent with a slowdown rather than speedup compared to a single node. We show that a distributed subgradient method has this "linear speedup" property when using a class of square-summable-but-not-summable step-sizes which include $1/t^β$ when $β\in (1/2,1)$; for such step-sizes, we show that after a transient period whose size depends on the spectral gap of the network, the method achieves a performance guarantee that does not depend on the network or the number of nodes. We also show that the same method can fail to have this "asymptotic network independence" property under the optimally decaying step-size $1/\sqrt{t}$ and, as a consequence, can fail to provide a linear speedup compared to a single node with $1/\sqrt{t}$ step-size.

preprint · 2020 · arXiv

Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning

We provide a discussion of several recent results which, in certain scenarios, are able to overcome a barrier in distributed stochastic optimization for machine learning. Our focus is the so-called asymptotic network independence property, which is achieved whenever a distributed method executed over a network of n nodes asymptotically converges to the optimal solution at a comparable rate to a centralized method with the same computational power as the entire network. We explain this property through an example involving the training of ML models and sketch a short mathematical analysis for comparing the performance of distributed stochastic gradient descent (DSGD) with centralized stochastic gradient descent (SGD).

preprint · 2020 · arXiv

Deterministic and Randomized Actuator Scheduling With Guaranteed Performance Bounds

In this paper, we investigate the problem of actuator selection for linear dynamical systems. We develop a framework to design a sparse actuator schedule for a given large-scale linear system with guaranteed performance bounds using deterministic polynomial-time and randomized approximately linear-time algorithms. First, we introduce systemic controllability metrics for linear dynamical systems that are monotone and homogeneous with respect to the controllability Gramian. We show that several popular and widely used optimization criteria in the literature belong to this class of controllability metrics. Our main result is to provide a polynomial-time actuator schedule that on average selects only a constant number of actuators at each time step, independent of the dimension, to furnish a guaranteed approximation of the controllability metrics in comparison to when all actuators are in use. Our results naturally apply to the dual problem of sensor selection, in which we provide a guaranteed approximation to the observability Gramian. We illustrate the effectiveness of our theoretical findings via several numerical simulations using benchmark examples.

preprint · 2020 · arXiv

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers

We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) does not have a bipartite connected component. We then propose a projected gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets.

preprint · 2020 · arXiv

Local SGD With a Communication Overhead Depending Only on the Number of Workers

We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among $n$ workers, who can take SGD steps and coordinate with a central server. Unfortunately, this could require a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed in the earlier literature, suggests machines should make many local steps between such communications. While the initial analysis of Local SGD showed it needs $Ω( \sqrt{T} )$ communications for $T$ local gradient steps in order for the error to scale proportionately to $1/(nT)$, this has been successively improved in a string of papers, with the state-of-the-art requiring $Ω\left( n \left( \mbox{ polynomial in log } (T) \right) \right)$ communications. In this paper, we give a new analysis of Local SGD. A consequence of our analysis is that Local SGD can achieve an error that scales as $1/(nT)$ with only a fixed number of communications independent of $T$: specifically, only $Ω(n)$ communications are required.

preprint · 2020 · arXiv

On A Relaxation of Time-Varying Actuator Placement

We consider the time-varying actuator placement in continuous time, where the goal is to maximize the trace of the controllability Gramian. A natural relaxation of the problem is to allow the binary $\{0,1\}$ variable indicating whether an actuator is used at a given time to take on values in the closed interval $[0,1]$. We show that all optimal solutions of both the original and the relaxed problems can be given via an explicit formula, and that, as long as the input matrix has no zero columns, the solution sets of the original and relaxed problem coincide.

preprint · 2019 · arXiv

Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions

We consider the standard model of distributed optimization of a sum of functions $F(\mathbf{z}) = \sum_{i=1}^n f_i(\mathbf{z})$, where node $i$ in a network holds the function $f_i(\mathbf{z})$. We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node $i$ is capable of generating gradients of its function $f_i(\mathbf{z})$ corrupted by zero-mean bounded-support additive noise at each step, (ii) $F(\mathbf{z})$ is strongly convex, and (iii) each $f_i(\mathbf{z})$ has Lipschitz gradients. We show that our proposed method asymptotically performs as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all the functions $f_1(\mathbf{z}), \ldots, f_n(\mathbf{z})$ at each step.

preprint · 2016 · arXiv

A Tutorial on Distributed (Non-Bayesian) Learning: Problem, Algorithms and Results

We overview some results on distributed learning with a focus on a family of recently proposed algorithms known as non-Bayesian social learning. We consider different approaches to the distributed learning problem and its algorithmic solutions for the case of finitely many hypotheses. The original centralized problem is discussed first, followed by a generalization to the distributed setting. The results on convergence and convergence rate are presented for both asymptotic and finite time regimes. Various extensions are discussed, such as those dealing with directed time-varying networks, Nesterov's acceleration technique and a continuum of hypotheses.

preprint · 2016 · arXiv

Convergence Time of Quantized Metropolis Consensus Over Time-Varying Networks

We consider the quantized consensus problem on undirected time-varying connected graphs with n nodes, and devise a protocol with fast convergence time to the set of consensus points. Specifically, we show that when the edges of each network in a sequence of connected time-varying networks are activated based on Poisson processes with Metropolis rates, the expected convergence time to the set of consensus points is at most O(n^2 log^2 n), where each node performs a constant number of updates per unit time.

preprint · 2016 · arXiv

Distributed Learning with Infinitely Many Hypotheses

We consider a distributed learning setup where a network of agents sequentially access realizations of a set of random variables with unknown distributions. The network objective is to find a parametrized distribution that best describes their joint observations in the sense of the Kullback-Leibler divergence. Apart from recent efforts in the literature, we analyze the case of countably many hypotheses and the case of a continuum of hypotheses. We provide non-asymptotic bounds for the concentration rate of the agents' beliefs around the correct hypothesis in terms of the number of agents, the network parameters, and the learning abilities of the agents. Additionally, we provide a novel motivation for a general set of distributed Non-Bayesian update rules as instances of the distributed stochastic mirror descent algorithm.

preprint · 2016 · arXiv

Eigenvalue Clustering, Control Energy, and Logarithmic Capacity

We prove two bounds showing that if the eigenvalues of a matrix are clustered in a region of the complex plane then the corresponding discrete-time linear system requires significant energy to control. A curious feature of one of our bounds is that the dependence on the region is via its logarithmic capacity, which is a measure of how well a unit of mass may be spread out over the region to minimize a logarithmic potential.

preprint · 2016 · arXiv

Geometrically Convergent Distributed Optimization with Uncoordinated Step-Sizes

A recent algorithmic family for distributed optimization, DIGing, has been shown to converge geometrically over time-varying undirected/directed graphs. However, it requires an identical step-size for all agents. In this paper, we study the convergence rates of the Adapt-Then-Combine (ATC) variation of the DIGing algorithm under uncoordinated step-sizes. We show that the ATC variation of the DIGing algorithm converges geometrically fast even if the step-sizes are different among the agents. In addition, our analysis implies that the ATC structure can accelerate convergence compared to the distributed gradient descent (DGD) structure used in the original DIGing algorithm.

preprint · 2016 · arXiv

On symmetric continuum opinion dynamics

This paper investigates the asymptotic behavior of some common opinion dynamic models in a continuum of agents. We show that as long as the interactions among the agents are symmetric, the distribution of the agents' opinion converges. We also investigate whether convergence occurs in a stronger sense than merely in distribution, namely, whether the opinion of almost every agent converges. We show that while this is not the case in general, it becomes true under plausible assumptions on inter-agent interactions, namely that agents with similar opinions exert a non-negligible pull on each other, or that the interactions are entirely determined by their opinions via a smooth function.

preprint · 2016 · arXiv

On the geometric convergence rate of distributed economic dispatch/demand response in power networks

Motivated by potential applications in power systems, we study a problem of optimizing a sum of $n$ convex functions on dynamic networks of $n$ nodes when each function is known to only a single node. The nodes' variables, while satisfying their local constraints, are coupled through a linear constraint. Our main contribution is to design a fully distributed primal-dual method for this problem. Under some fairly standard assumptions on the objective functions, namely strong convexity and smoothness, we provide an explicit analysis of the convergence rate of our method on different networks. In particular, the nodes' variables converge geometrically to the optimum, with the associated convergence time scaling quartically in the number of nodes on any sequence of time-varying undirected graphs satisfying a long-term connectivity condition. Moreover, this convergence time is constant and independent of the number of nodes when the network is a b-regular simple graph with $b\geq 3$. Finally, to show the effectiveness of our method, we also simulate a number of case studies on economic dispatch problems and demand response problems in power systems.

preprint · 2015 · arXiv

Network Independent Rates in Distributed Learning

We propose a new belief update rule for Distributed Non-Bayesian learning in time-varying directed graphs, where a group of agents tries to collectively identify a hypothesis that best describes a sequence of observed data. We show that the proposed update rule, inspired by the Push-Sum algorithm, is consistent; moreover, we provide an explicit characterization of its convergence rate. Our main result states that, after a transient time, all agents will concentrate their beliefs at a network independent rate. Network independent rates were not available for other consensus based distributed learning algorithms.

preprint · 2015 · arXiv

Nonasymptotic Convergence Rates for Cooperative Learning Over Time-Varying Directed Graphs

We study the problem of distributed hypothesis testing with a network of agents where some agents repeatedly gain access to information about the correct hypothesis. The group objective is to globally agree on a joint hypothesis that best describes the observed data at all the nodes. We assume that the agents can interact with their neighbors in an unknown sequence of time-varying directed graphs. Following the pioneering work of Jadbabaie, Molavi, Sandroni, and Tahbaz-Salehi, we propose local learning dynamics which combine Bayesian updates at each node with a local aggregation rule of private agent signals. We show that these learning dynamics drive all agents to the set of hypotheses which best explain the data collected at all nodes as long as the sequence of interconnection graphs is uniformly strongly connected. Our main result establishes a non-asymptotic, explicit, geometric convergence rate for the learning dynamic.

preprint · 2015 · arXiv

On Primitivity of Sets of Matrices

A nonnegative matrix $A$ is called primitive if $A^k$ is positive for some integer $k>0$. A generalization of this concept to finite sets of matrices is as follows: a set of matrices $\mathcal M = \{A_1, A_2, \ldots, A_m \}$ is primitive if $A_{i_1} A_{i_2} \ldots A_{i_k}$ is positive for some indices $i_1, i_2, ..., i_k$. The concept of primitive sets of matrices comes up in a number of problems within the study of discrete-time switched systems. In this paper, we analyze the computational complexity of deciding if a given set of matrices is primitive and we derive bounds on the length of the shortest positive product. We show that while primitivity is algorithmically decidable, unless $P=NP$ it is not possible to decide primitivity of a matrix set in polynomial time. Moreover, we show that the length of the shortest positive sequence can be superpolynomial in the dimension of the matrices. On the other hand, defining ${\mathcal P}$ to be the set of matrices with no zero rows or columns, we give a simple combinatorial proof of a previously-known characterization of primitivity for matrices in ${\mathcal P}$ which can be tested in polynomial time. This latter observation is related to the well-known 1964 conjecture of Cerny on synchronizing automata; in fact, any bound on the minimal length of a synchronizing word for synchronizing automata immediately translates into a bound on the length of the shortest positive product of a primitive set of matrices in ${\mathcal P}$. In particular, any primitive set of $n \times n$ matrices in ${\mathcal P}$ has a positive product of length $O(n^3)$.
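
The decidability claim can be illustrated with a brute-force search. Whether a product of nonnegative matrices is positive depends only on the zero/nonzero patterns of its factors, and there are finitely many such patterns, so a breadth-first search over reachable patterns always terminates. The sketch below is an assumption-laden illustration (the function names are hypothetical and it is not the paper's algorithm); its worst-case running time is exponential in the matrix dimension, consistent with the hardness result stated above.

```python
from collections import deque


def pattern(matrix):
    """Zero/nonzero pattern of a nonnegative matrix, as nested tuples of bools."""
    return tuple(tuple(entry > 0 for entry in row) for row in matrix)


def bool_product(a, b):
    """Pattern of the product of two nonnegative matrices with patterns a and b."""
    n = len(a)
    return tuple(
        tuple(any(a[i][k] and b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def shortest_positive_product(matrices):
    """Length of the shortest entrywise-positive product, or None if none exists."""
    generators = [pattern(m) for m in matrices]
    queue = deque((g, 1) for g in generators)
    seen = set(generators)
    while queue:
        current, length = queue.popleft()
        if all(all(row) for row in current):
            return length              # BFS order makes this the minimum length
        for g in generators:
            nxt = bool_product(current, g)
            if nxt not in seen:        # finitely many patterns, so this terminates
                seen.add(nxt)
                queue.append((nxt, length + 1))
    return None                        # the set is not primitive


swap = [[0, 1], [1, 0]]                # a permutation matrix, not primitive alone
shear = [[1, 1], [0, 1]]               # upper triangular, not primitive alone
length = shortest_positive_product([swap, shear])
```

Here neither matrix is primitive on its own, but the set is: the product `shear * swap * shear` equals `[[1, 2], [1, 1]]`, so the search returns 3.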

preprint · 2015 · arXiv

Stochastic Gradient-Push for Strongly Convex Functions on Time-Varying Directed Graphs

We investigate the convergence rate of the recently proposed subgradient-push method for distributed optimization over time-varying directed graphs. The subgradient-push method can be implemented in a distributed way without requiring knowledge of either the number of agents or the graph sequence; each node is only required to know its out-degree at each time. Our main result is a convergence rate of $O \left((\ln t)/t \right)$ for strongly convex functions with Lipschitz gradients even if only stochastic gradient samples are available; this is asymptotically faster than the $O \left((\ln t)/\sqrt{t} \right)$ rate previously known for (general) convex functions.

preprint · 2014 · arXiv

Consensus with Ternary Messages

We provide a protocol for real-valued average consensus by networks of agents which exchange only a single message from the ternary alphabet {-1,0,1} between neighbors at each step. Our protocol works on time-varying undirected graphs subject to a connectivity condition, has a worst-case convergence time which is polynomial in the number of agents and the initial values, and requires no global knowledge about the graph topologies on the part of each node to implement except for knowing an upper bound on the degrees of its neighbors.

preprint · 2014 · arXiv

Cooperative learning in multi-agent systems from intermittent measurements

Motivated by the problem of tracking a direction in a decentralized way, we consider the general problem of cooperative learning in multi-agent systems with time-varying connectivity and intermittent measurements. We propose a distributed learning protocol capable of learning an unknown vector $μ$ from noisy measurements made independently by autonomous nodes. Our protocol is completely distributed and able to cope with the time-varying, unpredictable, and noisy nature of inter-agent communication, and intermittent noisy measurements of $μ$. Our main result bounds the learning speed of our protocol in terms of the size and combinatorial features of the (time-varying) networks connecting the nodes.

preprint · 2014 · arXiv

Distributed optimization over time-varying directed graphs

We consider distributed optimization by a collection of nodes, each having access to its own convex function, whose collective goal is to minimize the sum of the functions. The communications between nodes are described by a time-varying sequence of directed graphs, which is uniformly strongly connected. For such communications, assuming that every node knows its out-degree, we develop a broadcast-based algorithm, termed the subgradient-push, which steers every node to an optimal value under a standard assumption of subgradient boundedness. The subgradient-push requires no knowledge of either the number of agents or the graph sequence to implement. Our analysis shows that the subgradient-push algorithm converges at a rate of $O(\ln(t)/\sqrt{t})$, where the constant depends on the initial values at the nodes, the subgradient norms, and, more interestingly, on both the consensus speed and the imbalances of influence among the nodes.
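
A minimal sketch of a push-sum style subgradient iteration shows how knowledge of out-degrees alone can suffice. The simplifications here are assumptions made for illustration and are not the paper's setting: a fixed (not time-varying) strongly connected directed graph with self-loops, scalar quadratic local functions $f_i(z) = (z - a_i)^2/2$ whose sum is minimized at the average of the $a_i$, and step-size $1/t$. The function name and the example graph are hypothetical.

```python
def subgradient_push(out_neighbors, targets, steps):
    """Each node j splits its values equally among its out-neighbors (self included).

    out_neighbors[j] lists the nodes that receive messages from j.
    targets[i] is a_i in the local function f_i(z) = (z - a_i)^2 / 2.
    """
    n = len(targets)
    x = [0.0] * n                      # numerators (perturbed by gradient steps)
    y = [1.0] * n                      # push-sum weights that correct the imbalance
    z = [0.0] * n                      # de-biased estimates, one per node
    for t in range(1, steps + 1):
        w = [0.0] * n
        y_next = [0.0] * n
        for j in range(n):
            share = 1.0 / len(out_neighbors[j])   # node j only needs its out-degree
            for i in out_neighbors[j]:
                w[i] += share * x[j]
                y_next[i] += share * y[j]
        y = y_next
        z = [w[i] / y[i] for i in range(n)]       # ratio removes the graph's bias
        step = 1.0 / t
        # Gradient of f_i at z_i is (z_i - a_i).
        x = [w[i] - step * (z[i] - targets[i]) for i in range(n)]
    return z


# Unbalanced digraph: node 0 sends to {0, 1, 2}, node 1 to {1, 2}, node 2 to {2, 0}.
graph = [[0, 1, 2], [1, 2], [2, 0]]
estimates = subgradient_push(graph, targets=[1.0, 2.0, 6.0], steps=20000)
```

The minimizer of the sum in this toy problem is the average 3.0 of the targets, and every entry of `estimates` should approach it even though the graph is not balanced; the division by `y` is what compensates for nodes having unequal influence.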

preprint · 2014 · arXiv

How to decide consensus? A combinatorial necessary and sufficient condition and a proof that consensus is decidable but NP-hard

A set of stochastic matrices ${\cal P}$ is a consensus set if for every sequence of matrices $P(1), P(2), \ldots$ whose elements belong to ${\cal P}$ and every initial state $x(0)$, the sequence of states defined by $x(t) = P(t) P(t-1) \cdots P(1) x(0)$ converges to a vector whose entries are all identical. In this paper, we introduce an "avoiding set condition" for compact sets of matrices and prove in our main theorem that this explicit combinatorial condition is both necessary and sufficient for consensus. We show that several of the conditions for consensus proposed in the literature can be directly derived from the avoiding set condition. The avoiding set condition is easy to check with an elementary algorithm, and so our result also establishes that consensus is algorithmically decidable. Direct verification of the avoiding set condition may require more than a polynomial time number of operations. This is however likely to be the case for any consensus checking algorithm since we also prove in this paper that unless $P=NP$, consensus cannot be decided in polynomial time.

preprint · 2014 · arXiv

Minimal Controllability Problems

Given a linear system, we consider the problem of finding a small set of variables to affect with an input so that the resulting system is controllable. We show that this problem is NP-hard; indeed, we show that even approximating the minimum number of variables that need to be affected within a multiplicative factor of $c \log n$ is NP-hard for some positive $c$. On the positive side, we show it is possible to find sets of variables matching this inapproximability barrier in polynomial time. This can be done by a simple greedy heuristic which sequentially picks variables to maximize the rank increase of the controllability matrix. Experiments on Erdos-Renyi random graphs demonstrate this heuristic almost always succeeds at finding the minimum number of variables.
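
The greedy heuristic can be sketched as follows. This is an unoptimized illustration under stated assumptions rather than the authors' implementation: the system is $\dot{x} = Ax + Bu$ where the columns of $B$ are unit vectors for the chosen variables, the controllability matrix is $[B, AB, \ldots, A^{n-1}B]$, ranks are computed exactly with rational arithmetic, and ties are broken by lowest index. All function names are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors via exact Gaussian elimination."""
    m = [[Fraction(value) for value in vec] for vec in vectors]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def controllability_rank(A, chosen):
    """Rank of [B, AB, ..., A^(n-1) B] when B has one unit column per chosen variable."""
    n = len(A)
    columns = []
    for s in chosen:
        v = [1 if i == s else 0 for i in range(n)]
        for _ in range(n):
            columns.append(v)
            v = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    return rank(columns) if columns else 0


def greedy_inputs(A):
    """Repeatedly add the variable that increases the controllability rank the most."""
    n = len(A)
    chosen, current = [], 0
    while current < n:
        best, best_rank = None, current
        for s in range(n):
            if s not in chosen:
                r = controllability_rank(A, chosen + [s])
                if r > best_rank:
                    best, best_rank = s, r
        chosen.append(best)            # some unactuated variable always adds rank
        current = best_rank
    return chosen


chain = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]   # variable 0 drives 1, which drives 2
selected = greedy_inputs(chain)
```

For the chain, actuating variable 0 alone already yields full rank, so the heuristic returns `[0]`. For a system with no coupling at all (a zero matrix), every variable must be actuated.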

preprint · 2014 · arXiv

Minimum Input Selection for Structural Controllability

Given a linear system $\dot{x} = Ax$, where $A$ is an $n \times n$ matrix with $m$ nonzero entries, we consider the problem of finding the smallest set of state variables to affect with an input so that the resulting system is structurally controllable. We further assume we are given a set of "forbidden state variables" $F$ which cannot be affected with an input and which we have to avoid in our selection. Our main result is that this problem can be solved deterministically in $O(n+m \sqrt{n})$ operations.

preprint · 2012 · arXiv

Degree Fluctuations and the Convergence Time of Consensus Algorithms

We consider a consensus algorithm in which every node in a sequence of undirected, B-connected graphs assigns equal weight to each of its neighbors. Under the assumption that the degree of each node is fixed (except for times when the node has no connections to other nodes), we show that consensus is achieved within a given accuracy $ε$ on n nodes in time $B+4n^3 B \ln(2n/ε)$. Because there is a direct relation between consensus algorithms in time-varying environments and inhomogeneous random walks, our result also translates into a general statement on such random walks. Moreover, we give a simple proof of a result of Cao, Spielman, and Morse that the worst case convergence time becomes exponentially large in the number of nodes $n$ under slight relaxation of the degree constancy assumption.

preprint2012arXiv

Nonuniform Coverage Control on the Line

This paper investigates control laws allowing mobile, autonomous agents to optimally position themselves on the line for distributed sensing in a nonuniform field. We show that a simple static control law, based only on local measurements of the field by each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/movement/computation rounds that is essentially quadratic in the number of agents. Further, we exhibit a dynamic control law which, under slightly stronger assumptions on the capabilities and knowledge of each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/communication/computation/movement rounds that is essentially linear in the number of agents. Crucially, both algorithms are fully distributed and robust to unpredictable loss and addition of agents.

preprint2011arXiv

Distributed anonymous discrete function computation

We propose a model for deterministic distributed function computation by a network of identical and anonymous nodes. In this model, each node has bounded computation and storage capabilities that do not grow with the network size. Furthermore, each node only knows its neighbors, not the entire graph. Our goal is to characterize the class of functions that can be computed within this model. In our main result, we provide a necessary condition for computability which we show to be nearly sufficient, in the sense that every function that satisfies this condition can at least be approximated. The problem of computing suitably rounded averages in a distributed manner plays a central role in our development; we provide an algorithm that solves it in time that grows quadratically with the size of the network.

preprint2010arXiv

A lower bound for distributed averaging algorithms

We derive lower bounds on the convergence speed of a widely used class of distributed averaging algorithms. In particular, we prove that any distributed averaging algorithm whose state consists of a single real number and whose (possibly nonlinear) update function satisfies a natural smoothness condition has a worst case running time of at least on the order of $n^2$ on a network of $n$ nodes. Our results suggest that increased memory or expansion of the state space is crucial for improving the running times of distributed averaging algorithms.

preprint2010arXiv

Efficient Information Aggregation Strategies for Distributed Control and Signal Processing

This thesis is concerned with distributed control and coordination of networks consisting of multiple, potentially mobile, agents. This is motivated mainly by the emergence of large scale networks characterized by the lack of centralized access to information and time-varying connectivity. Control and optimization algorithms deployed in such networks should be completely distributed, relying only on local observations and information, and robust against unexpected changes in topology such as link failures. We will describe protocols to solve certain control and signal processing problems in this setting. We will demonstrate that a key challenge for such systems is the problem of computing averages in a decentralized way. Namely, we will show that a number of distributed control and signal processing problems can be solved straightforwardly if solutions to the averaging problem are available. The rest of the thesis will be concerned with algorithms for the averaging problem and its generalizations. We will (i) derive the fastest known averaging algorithms in a variety of settings and subject to a variety of communication and storage constraints; (ii) prove a lower bound identifying a fundamental barrier for averaging algorithms; and (iii) propose a new model for distributed function computation which reflects the constraints facing many large-scale networks, and nearly characterize the general class of functions which can be computed in this model.

preprint2010arXiv

Matrix P-norms are NP-hard to approximate if $p \neq 1,2,\infty$

We show that for any rational $p \in [1,\infty)$ except $p = 1, 2$, unless P = NP, there is no polynomial-time algorithm for approximating the matrix $p$-norm to arbitrary relative precision. We also show that for any rational $p \in [1,\infty)$ including $p = 1, 2$, unless P = NP, there is no polynomial-time algorithm that approximates the $\infty, p$ mixed norm to some fixed relative precision.
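For contrast with the hardness result, the excluded cases $p = 1$ and $p = \infty$ have well-known closed forms: the induced 1-norm is the largest absolute column sum and the induced $\infty$-norm is the largest absolute row sum. The sketch below computes both and, for general $p$, evaluates the ratio $\|Ax\|_p / \|x\|_p$ for a given vector, which is only a lower bound on the induced norm. The function names are hypothetical and the code is an illustration, not part of the paper.

```python
def norm_1(A):
    """Induced 1-norm: maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def norm_inf(A):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def vec_p_norm(x, p):
    """Vector p-norm for finite p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def p_norm_ratio(A, x, p):
    """||Ax||_p / ||x||_p for a nonzero x; a lower bound on the induced p-norm."""
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    return vec_p_norm(Ax, p) / vec_p_norm(x, p)


A = [[1, -2], [3, 4]]
print(norm_1(A))    # 6
print(norm_inf(A))  # 7
# For p = 1 the maximum is attained at a unit coordinate vector.
print(p_norm_ratio(A, [0, 1], 1))  # 6.0
```

For other values of $p$, maximizing this ratio over all nonzero vectors is exactly the problem the abstract shows cannot be approximated to arbitrary relative precision in polynomial time unless P = NP.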

preprint2010arXiv

NP-hardness of Deciding Convexity of Quartic Polynomials and Related Problems

We show that unless P=NP, there exists no polynomial time (or even pseudo-polynomial time) algorithm that can decide whether a multivariate polynomial of degree four (or higher even degree) is globally convex. This solves a problem that has been open since 1992 when N. Z. Shor asked for the complexity of deciding convexity for quartic polynomials. We also prove that deciding strict convexity, strong convexity, quasiconvexity, and pseudoconvexity of polynomials of even degree four or higher is strongly NP-hard. By contrast, we show that quasiconvexity and pseudoconvexity of odd degree polynomials can be decided in polynomial time.

preprint2009arXiv

Convergence Speed in Distributed Consensus and Control

We study the convergence speed of distributed iterative algorithms for the consensus and averaging problems, with emphasis on the latter. We first consider the case of a fixed communication topology. We show that a simple adaptation of a consensus algorithm leads to an averaging algorithm. We prove lower bounds on the worst-case convergence time for various classes of linear, time-invariant, distributed consensus methods, and provide an algorithm that essentially matches those lower bounds. We then consider the case of a time-varying topology, and provide a polynomial-time averaging algorithm.

38 published item(s)

Bridging the Gap Between Average and Discounted TD Learning

Data Deletion Can Help in Adaptive RL

Distributed TD(0) with Almost No Communication

Optimal Lockdown for Pandemic Control

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

Asymptotic Convergence Rate of Alternating Minimization for Rank One Matrix Completion

Asymptotic Network Independence and Step-Size for A Distributed Subgradient Method

Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning

Deterministic and Randomized Actuator Scheduling With Guaranteed Performance Bounds

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers

Local SGD With a Communication Overhead Depending Only on the Number of Workers

On A Relaxation of Time-Varying Actuator Placement

Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions

A Tutorial on Distributed (Non-Bayesian) Learning: Problem, Algorithms and Results

Convergence Time of Quantized Metropolis Consensus Over Time-Varying Networks

Distributed Learning with Infinitely Many Hypotheses

Eigenvalue Clustering, Control Energy, and Logarithmic Capacity

Geometrically Convergent Distributed Optimization with Uncoordinated Step-Sizes

On symmetric continuum opinion dynamics

On the geometric convergence rate of distributed economic dispatch/demand response in power networks

Network Independent Rates in Distributed Learning

Nonasymptotic Convergence Rates for Cooperative Learning Over Time-Varying Directed Graphs

On Primitivity of Sets of Matrices

Stochastic Gradient-Push for Strongly Convex Functions on Time-Varying Directed Graphs

Consensus with Ternary Messages

Cooperative learning in multi-agent systems from intermittent measurements

Distributed optimization over time-varying directed graphs

How to decide consensus? A combinatorial necessary and sufficient condition and a proof that consensus is decidable but NP-hard

Minimal Controllability Problems

Minimum Input Selection for Structural Controllability

Degree Fluctuations and the Convergence Time of Consensus Algorithms

Nonuniform Coverage Control on the Line

Distributed anonymous discrete function computation

A lower bound for distributed averaging algorithms

Efficient Information Aggregation Strategies for Distributed Control and Signal Processing

Matrix P-norms are NP-hard to approximate if $p \neq 1,2,\infty$

NP-hardness of Deciding Convexity of Quartic Polynomials and Related Problems

Convergence Speed in Distributed Consensus and Control