Source author record

Mikael Johansson

Mikael Johansson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

math.OC; Systems and Control; Machine Learning; Information Theory; math.IT; Social and Information Networks; Distributed, Parallel, and Cluster Computing; physics.soc-ph; math.DS; Multiagent Systems; Cryptography and Security; eess.SP; math.PR; Mathematical Software; Networking and Internet Architecture

Catalog footprint

What is connected

44 works

15 topics

4 close collaborators

preprint, 2022, arXiv

Delay-adaptive step-sizes for asynchronous learning

In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.

preprint, 2022, arXiv

Distributed Safe Resource Allocation using Barrier Functions

Resource allocation plays a central role in many networked systems such as smart grids, communication networks and urban transportation systems. In these systems, many constraints have physical meaning and having a feasible allocation is often vital to avoid system breakdown. Hence, algorithms with asymptotic feasibility guarantees are often insufficient since it is impractical to run algorithms for an infinite number of rounds. This paper proposes a distributed feasible method (DFM) for safe resource allocation based on barrier functions. In DFM, every iterate is feasible and thus safe to implement. We prove that under mild conditions, DFM converges to an arbitrarily small neighbourhood of the optimal solution. Numerical experiments demonstrate the competitive performance of DFM.

preprint, 2022, arXiv

Inferring origin-destination distribution of agent transfer in a complex network using deep gated recurrent units

Predicting the origin-destination (OD) probability distribution of agent transfer is an important problem for managing complex systems. However, the prediction accuracy of the associated statistical estimators suffers from underdetermination. While specific techniques have been proposed to overcome this deficiency, a general approach is still lacking. Here, we propose a deep neural network framework with gated recurrent units (DNNGRU) to address this gap. Our DNNGRU is network-free, as it is trained by supervised learning with time-series data on the volume of agents passing through edges. We use it to investigate how network topologies affect OD prediction accuracy, where performance enhancement is observed to depend on the degree of overlap between paths taken by different ODs. By comparing against methods that give exact results, we demonstrate the near-optimal performance of our DNNGRU, which we found to consistently outperform existing methods and alternative neural network architectures, under diverse data generation scenarios.

preprint, 2022, arXiv

On Uniform Boundedness Properties of SGD and its Momentum Variants

A theoretical, and potentially also practical, problem with stochastic gradient descent is that trajectories may escape to infinity. In this note, we investigate uniform boundedness properties of iterates and function values along the trajectories of the stochastic gradient descent algorithm and its important momentum variant. Under smoothness and $R$-dissipativity of the loss function, we show that broad families of step-sizes, including the widely used step-decay and cosine with (or without) restart step-sizes, result in uniformly bounded iterates and function values. Several important applications that satisfy these assumptions, including phase retrieval problems, Gaussian mixture models, and some neural network classifiers, are discussed in detail. We further extend the uniform boundedness of SGD and its momentum variant under the generalized dissipativity for the functions whose tails grow slower than quadratic functions. This includes some interesting applications, for example, Bayesian logistic regression and logistic regression with $\ell_1$ regularization.

preprint, 2022, arXiv

Optimal convergence rates of totally asynchronous optimization

Asynchronous optimization algorithms are at the core of modern machine learning and resource allocation systems. However, most convergence results consider bounded information delays and several important algorithms lack guarantees when they operate under total asynchrony. In this paper, we derive explicit convergence rates for the proximal incremental aggregated gradient (PIAG) and the asynchronous block-coordinate descent (Async-BCD) methods under a specific model of total asynchrony, and show that the derived rates are order-optimal. The convergence bounds provide an insightful understanding of how the growth rate of the delays deteriorates the convergence times of the algorithms. Our theoretical findings are demonstrated by a numerical example.

preprint, 2021, arXiv

Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization

Stochastic gradient methods with momentum are widely used in applications and at the core of optimization subroutines in many popular machine learning libraries. However, their sample complexities have not been obtained for problems beyond those that are convex or smooth. This paper establishes the convergence rate of a stochastic subgradient method with a momentum term of Polyak type for a broad class of non-smooth, non-convex, and constrained optimization problems. Our key innovation is the construction of a special Lyapunov function for which the proven complexity can be achieved without any tuning of the momentum parameter. For smooth problems, we extend the known complexity bound to the constrained case and demonstrate how the unconstrained case can be analyzed under weaker assumptions than the state-of-the-art. Numerical results confirm our theoretical developments.

preprint, 2021, arXiv

Efficient Stochastic Programming in Julia

We present StochasticPrograms.jl, a user-friendly and powerful open-source framework for stochastic programming written in the Julia language. The framework includes both modeling tools and structure-exploiting optimization algorithms. Stochastic programming models can be efficiently formulated using expressive syntax and models can be instantiated, inspected, and analyzed interactively. The framework scales seamlessly to distributed environments. Small instances of a model can be run locally to ensure correctness, while larger instances are automatically distributed in a memory-efficient way onto supercomputers or clouds and solved using parallel optimization algorithms. These structure-exploiting solvers are based on variations of the classical L-shaped and progressive-hedging algorithms. We provide a concise mathematical background for the various tools and constructs available in the framework, along with code listings exemplifying their usage. Both software innovations related to the implementation of the framework and algorithmic innovations related to the structured solvers are highlighted. We conclude by demonstrating strong scaling properties of the distributed algorithms on numerical benchmarks in a multi-node setup.

preprint, 2021, arXiv

On the Convergence of Step Decay Step-Size for Stochastic Optimization

The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training. Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide the convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an $\mathcal{O}(\ln T/\sqrt{T})$ rate. We also provide the convergence guarantees for general (possibly non-smooth) convex problems, ensuring an $\mathcal{O}(\ln T/\sqrt{T})$ convergence rate. Finally, in the strongly convex case, we establish an $\mathcal{O}(\ln T/T)$ rate for smooth problems, which we also prove to be tight, and an $\mathcal{O}(\ln^2 T /T)$ rate without the smoothness assumption. We illustrate the practical efficiency of the step decay step-size in several large scale deep neural network training tasks.
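The "constant and then cut" schedule described in this abstract can be illustrated with a minimal sketch. The function names, the initial step size `eta0`, the cut factor `cut`, and the number of stages `num_stages` are all hypothetical choices made for illustration; the abstract does not specify them, and this is not the paper's exact schedule.

```python
import math
import random


def step_decay(t, total_steps, eta0=0.1, cut=0.5, num_stages=5):
    """Step size at iteration t: constant within a stage, multiplied by `cut` at each stage boundary.

    All default values are illustrative assumptions, not values from the paper.
    """
    if not 0 <= t < total_steps:
        raise ValueError("t must satisfy 0 <= t < total_steps")
    stage_len = math.ceil(total_steps / num_stages)  # iterations per stage
    stage = t // stage_len                           # index of the current stage
    return eta0 * cut ** stage


def sgd_with_step_decay(total_steps=2000, seed=0):
    """Run SGD on f(x) = 0.5 * x**2 with additive Gaussian gradient noise."""
    rng = random.Random(seed)
    x = 5.0
    for t in range(total_steps):
        noisy_grad = x + rng.gauss(0.0, 1.0)  # exact gradient x plus noise
        x -= step_decay(t, total_steps) * noisy_grad
    return x
```

With the defaults above, the step size stays at `eta0` for the first fifth of the run and is halved at the start of each later stage, so the noise-induced fluctuation of the iterate shrinks as training proceeds.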

preprint, 2020, arXiv

A flexible framework for communication-efficient machine learning: from HPC to IoT

With the increasing scale of machine learning tasks, it has become essential to reduce the communication between computing nodes. Early work on gradient compression focused on the bottleneck between CPUs and GPUs, but communication-efficiency is now needed in a variety of different system architectures, from high-performance clusters to energy-constrained IoT devices. In the current practice, compression levels are typically chosen before training and settings that work well for one task may be vastly suboptimal for another dataset on another architecture. In this paper, we propose a flexible framework which adapts the compression level to the true gradient at each iteration, maximizing the improvement in the objective function that is achieved per communicated bit. Our framework is easy to adapt from one technology to the next by modeling how the communication cost depends on the compression level for the specific technology. Theoretical results and practical experiments indicate that the automatic tuning strategies significantly increase communication efficiency on several state-of-the-art compression schemes.

preprint, 2020, arXiv

Advances in Asynchronous Parallel and Distributed Optimization

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of computational resources than synchronous methods, and they are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links. Mathematical modeling of asynchronous methods involves proper accounting of information delays, which makes their analysis challenging. This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights as to how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.

preprint, 2020, arXiv

Anderson Acceleration of Proximal Gradient Methods

Anderson acceleration is a well-established and simple technique for speeding up fixed-point computations with countless applications. Previous studies of Anderson acceleration in optimization have only been able to provide convergence guarantees for unconstrained and smooth problems. This work introduces novel methods for adapting Anderson acceleration to (non-smooth and constrained) proximal gradient algorithms. Under some technical conditions, we extend the existing local convergence results of Anderson acceleration for smooth fixed-point mappings to the proposed scheme. We also prove analytically that it is not, in general, possible to guarantee global convergence of native Anderson acceleration. We therefore propose a simple scheme for stabilization that combines the global worst-case guarantees of proximal gradient methods with the local adaptation and practical speed-up of Anderson acceleration.

preprint, 2020, arXiv

Compressed Gradient Methods with Hessian-Aided Error Compensation

The emergence of big data has caused a dramatic shift in the operating regime for optimization algorithms. The performance bottleneck, which used to be computations, is now often communications. Several gradient compression techniques have been proposed to reduce the communication load at the price of a loss in solution accuracy. Recently, it has been shown how compression errors can be compensated for in the optimization algorithm to improve the solution accuracy. Even though convergence guarantees for error-compensated algorithms have been established, there is very limited theoretical support for quantifying the observed improvements in solution accuracy. In this paper, we show that Hessian-aided error compensation, unlike other existing schemes, avoids the accumulation of compression errors on quadratic problems. We also present strong convergence guarantees of Hessian-based error compensation for stochastic gradient descent. Our numerical experiments highlight the benefits of Hessian-based error compensation, and demonstrate that similar convergence improvements are attained when only a diagonal Hessian approximation is used.

preprint, 2016, arXiv

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish a linear convergence rate, give explicit expressions for step-size choices that guarantee convergence to the optimum, and bound the associated convergence factors. The expressions have an explicit dependence on the degree of asynchrony and recover classical results under synchronous operation. Simulations and implementations on commercial compute clouds validate our findings.

preprint, 2016, arXiv

Stability Analysis of Monotone Systems via Max-separable Lyapunov Functions

We analyze stability properties of monotone nonlinear systems via max-separable Lyapunov functions, motivated by the following observations: first, recent results have shown that asymptotic stability of a monotone nonlinear system implies the existence of a max-separable Lyapunov function on a compact set; second, for monotone linear systems, asymptotic stability implies the stronger properties of D-stability and insensitivity to time-delays. This paper establishes that for monotone nonlinear systems, equivalence holds between asymptotic stability, the existence of a max-separable Lyapunov function, D-stability, and insensitivity to bounded and unbounded time-varying delays. In particular, a new and general notion of D-stability for monotone nonlinear systems is discussed and a set of necessary and sufficient conditions for delay-independent stability is derived. Examples show how the results extend the state-of-the-art.

preprint, 2015, arXiv

An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization

Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state-of-the-art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order $O(1/\sqrt{T})$ for general convex regularization functions, and the rate $O(1/T)$ for strongly convex regularization functions, where $T$ is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.

preprint, 2015, arXiv

Convergence analysis of approximate primal solutions in dual first-order methods

Dual first-order methods are powerful techniques for large-scale convex optimization. Although an extensive research effort has been devoted to studying their convergence properties, explicit convergence rates for the primal iterates have only been established under global Lipschitz continuity of the dual gradient. This is a rather restrictive assumption that does not hold for several important classes of problems. In this paper, we demonstrate that primal convergence rate guarantees can also be obtained when the dual gradient is only locally Lipschitz. The class of problems that we analyze admits general convex constraints including nonlinear inequality, linear equality, and set constraints. As an approximate primal solution, we take the minimizer of the Lagrangian, computed when evaluating the dual gradient. We derive error bounds for this approximate primal solution in terms of the errors of the dual variables, and establish convergence rates of the dual variables when the dual problem is solved using a projected gradient or fast gradient method. By combining these results, we show that the suboptimality and infeasibility of the approximate primal solution at iteration $k$ are no worse than $O(1/\sqrt{k})$ when the dual problem is solved using a projected gradient method, and $O(1/k)$ when a fast dual gradient method is used.

preprint, 2015, arXiv

Energy efficient D2D communications in dynamic TDD systems

Network-assisted device-to-device communication is a promising technology for improving the performance of proximity-based services. This paper demonstrates how the integration of device-to-device communications and dynamic time-division duplex can improve the energy efficiency of future cellular networks, leading to a greener system operation and a prolonged battery lifetime of mobile devices. We jointly optimize the mode selection, transmission period and power allocation to minimize the energy consumption (from both a system and a device perspective) while satisfying a certain rate requirement. The radio resource management problems are formulated as mixed-integer nonlinear programming problems. Although they are known to be NP-hard in general, we exploit the problem structure to design efficient algorithms that optimally solve several problem cases. For the remaining cases, a heuristic algorithm that computes near-optimal solutions while respecting practical constraints on execution times and signaling overhead is also proposed. Simulation results confirm that the combination of device-to-device and flexible time-division-duplex technologies can significantly enhance spectrum and energy-efficiency of next generation cellular systems.

preprint, 2015, arXiv

Finite-time Convergent Gossiping

Gossip algorithms are widely used in modern distributed systems, with applications ranging from sensor networks and peer-to-peer networks to mobile vehicle networks and social networks. A tremendous research effort has been devoted to analyzing and improving the asymptotic rate of convergence for gossip algorithms. In this work we study finite-time convergence of deterministic gossiping. We show that there exists a symmetric gossip algorithm that converges in finite time if and only if the number of network nodes is a power of two, while there always exists an asymmetric gossip algorithm with finite-time convergence, independent of the number of nodes. For $n=2^m$ nodes, we prove that a fastest convergence can be reached in $nm=n\log_2 n$ node updates via symmetric gossiping. On the other hand, under asymmetric gossip among $n=2^m+r$ nodes with $0\leq r<2^m$, it takes at least $mn+2r$ node updates for achieving finite-time convergence. It is also shown that the existence of finite-time convergent gossiping often imposes strong structural requirements on the underlying interaction graph. Finally, we apply our results to gossip algorithms in quantum networks, where the goal is to control the state of a quantum system via pairwise interactions. We show that finite-time convergence is never possible for such systems.
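For the power-of-two case, a minimal sketch of one symmetric pairwise-averaging schedule is given below. It is an assumption chosen for illustration (a hypercube pairing in which node `i` averages with node `i XOR 2^j` in round `j`), not necessarily the construction used in the paper. It reaches the exact average after $m$ rounds, using $nm$ node updates, which matches the count stated in the abstract. The function name `hypercube_gossip` is hypothetical.

```python
def hypercube_gossip(values):
    """Symmetric pairwise averaging over a hypercube schedule for n = 2**m nodes.

    Illustrative assumption: in round j, node i averages with node i XOR 2**j.
    Returns the final node states and the number of node updates performed.
    """
    n = len(values)
    m = n.bit_length() - 1
    if n < 1 or (1 << m) != n:
        raise ValueError("number of nodes must be a power of two")
    x = [float(v) for v in values]
    node_updates = 0
    for j in range(m):
        for i in range(n):
            partner = i ^ (1 << j)
            if i < partner:  # handle each pair once per round
                avg = (x[i] + x[partner]) / 2.0
                x[i] = avg
                x[partner] = avg
                node_updates += 2  # both nodes in the pair update their state
    return x, node_updates
```

After round `j`, every node holds the average over the block of `2**(j+1)` nodes that share its higher-order index bits, so the last round leaves all nodes at the global average.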

preprint, 2015, arXiv

On Reconstructability of Quadratic Utility Functions from the Iterations in Gradient Methods

In this paper, we consider a scenario where an eavesdropper can read the content of messages transmitted over a network. The nodes in the network are running a gradient algorithm to optimize a quadratic utility function, where this utility optimization is part of a decision-making process by an administrator. We are interested in understanding the conditions under which the eavesdropper can reconstruct the utility function or a scaled version of it and, as a result, gain insight into the decision-making process. We establish that if the parameter of the gradient algorithm, i.e., the step size, is chosen appropriately, the task of reconstruction becomes practically impossible for a class of Bayesian filters with uniform priors. We establish what step-size rules should be employed to ensure this.

preprint, 2015, arXiv

On the trade-off between control performance and communication cost in event-triggered control

We consider a stochastic system where the communication between the controller and the actuator is triggered by a threshold-based rule. The communication is performed across an unreliable link that stochastically erases transmitted packets. To decrease the communication burden, and as a partial protection against dropped packets, the controller sends a sequence of control commands to the actuator in each packet. These commands are stored in a buffer and applied sequentially until the next control packet arrives. In this context, we study dead-beat control laws and compute the expected linear-quadratic loss of the closed-loop system for any given event-threshold. Furthermore, we provide analytical expressions that quantify the trade-off between the communication cost and the control performance of event-triggered control systems. Numerical examples demonstrate the effectiveness of the proposed framework.

preprint, 2015, arXiv

Optimal Radio Frequency Energy Harvesting with Limited Energy Arrival Knowledge

In this paper, we develop optimal policies for deciding when a wireless node with radio frequency (RF) energy harvesting (EH) capabilities should try and harvest ambient RF energy. While the idea of RF-EH is appealing, it is not always beneficial to attempt to harvest energy; in environments where the ambient energy is low, nodes could consume more energy being awake with their harvesting circuits turned on than what they can extract from the ambient radio signals; it is then better to enter a sleep mode until the ambient RF energy increases. Towards this end, we consider a scenario with intermittent energy arrivals and a wireless node that wakes up for a period of time (herein called the time-slot) and harvests energy. If enough energy is harvested during the time-slot, then the harvesting is successful and excess energy is stored; however, if there does not exist enough energy the harvesting is unsuccessful and energy is lost. We assume that the ambient energy level is constant during the time-slot, and changes at slot boundaries. The energy level dynamics are described by a two-state Gilbert-Elliott Markov chain model, where the state of the Markov chain can only be observed during the harvesting action, and not when in sleep mode. Two scenarios are studied under this model. In the first scenario, we assume that we have knowledge of the transition probabilities of the Markov chain and formulate the problem as a Partially Observable Markov Decision Process (POMDP), where we find a threshold-based optimal policy. In the second scenario, we assume that we don't have any knowledge about these parameters and formulate the problem as a Bayesian adaptive POMDP; to reduce the complexity of the computations we also propose a heuristic posterior sampling algorithm. The performance of our approaches is demonstrated via numerical examples.
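The belief tracking implied by the first scenario can be sketched as follows, assuming the standard belief update for a two-state Markov chain that is observed only while harvesting. The function names, the parameter names `p_gg` and `p_bg`, and the form of the decision rule (a threshold on the belief) are assumptions made for illustration; the abstract states only that the optimal policy is threshold-based.

```python
def belief_after_sleep(belief, p_gg, p_bg):
    """Propagate the probability of the high-energy state through one unobserved slot.

    p_gg: P(high next | high now); p_bg: P(high next | low now). Both names are hypothetical.
    """
    return belief * p_gg + (1.0 - belief) * p_bg


def belief_after_harvest(observed_high, p_gg, p_bg):
    """Harvesting reveals the current state, so the next-slot belief is a transition probability."""
    return p_gg if observed_high else p_bg


def should_harvest(belief, threshold):
    """Illustrative threshold rule: wake up and harvest only if the belief is high enough."""
    return belief >= threshold
```

While the node sleeps, repeated application of `belief_after_sleep` drives the belief toward the stationary probability `p_bg / (1 - p_gg + p_bg)`, so under a rule of this form a node that slept after observing a low-energy slot would wake up again only if that limit lies above the chosen threshold.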

preprint, 2015, arXiv

The Evolution of Beliefs over Signed Social Networks

We study the evolution of opinions (or beliefs) over a social network modeled as a signed graph. The sign attached to an edge in this graph characterizes whether the corresponding individuals or end nodes are friends (positive links) or enemies (negative links). Pairs of nodes are randomly selected to interact over time, and when two nodes interact, each of them updates its opinion based on the opinion of the other node and the sign of the corresponding link. This model generalizes the DeGroot model to account for negative links: when two enemies interact, their opinions go in opposite directions. We provide conditions for convergence and divergence in expectation, in mean-square, and in the almost sure sense, and exhibit phase transition phenomena for these notions of convergence depending on the parameters of the opinion update model and on the structure of the underlying graph. We establish a no-survivor theorem, stating that the difference in opinions of any two nodes diverges whenever opinions in the network diverge as a whole. We also prove a live-or-die lemma, indicating that almost surely, the opinions either converge to an agreement or diverge. Finally, we extend our analysis to cases where opinions have hard lower and upper limits. In these cases, we study when and how opinions may become asymptotically clustered to the belief boundaries, and highlight the crucial influence of (strong or weak) structural balance of the underlying network on this clustering phenomenon.

preprint, 2014, arXiv

A Buffer-aided Successive Opportunistic Relay Selection Scheme with Power Adaptation and Inter-Relay Interference Cancellation for Cooperative Diversity Systems

In this paper we consider a simple cooperative network consisting of a source, a destination and a cluster of decode-and-forward half-duplex relays. At each time-slot, the source and (possibly) one of the relays transmit a packet to another relay and the destination, respectively, resulting in inter-relay interference (IRI). In this work, with the aid of buffers at the relays, we mitigate the detrimental effect of IRI through interference cancellation. More specifically, we propose the min-power scheme that minimizes the total energy expenditure per time slot under an IRI cancellation scheme. Apart from minimizing the energy expenditure, the min-power selection scheme also provides better throughput and lower outage probability than existing works in the literature. This is the first time that interference cancellation is combined with buffer-aided relays and power adaptation to mitigate the IRI and minimize the energy expenditure. The new relay selection policy is analyzed in terms of outage probability and diversity, by modeling the evolution of the relay buffers as a Markov Chain (MC). We construct the state transition matrix of the MC, and hence obtain the steady state with which we can characterize the outage probability. The proposed scheme outperforms relevant state-of-the-art relay selection schemes in terms of throughput, diversity and energy efficiency, as demonstrated via examples.

preprint, 2014, arXiv

Approximation of Markov Processes by Lower Dimensional Processes via Total Variation Metrics

The aim of this paper is to approximate a finite-state Markov process by another process with fewer states, called herein the approximating process. The approximation problem is formulated using two different methods. The first method utilizes the total variation distance to discriminate the transition probabilities of a high dimensional Markov process and a reduced order Markov process. The approximation is obtained by optimizing a linear functional defined in terms of transition probabilities of the reduced order Markov process over a total variation distance constraint. The transition probabilities of the approximated Markov process are given by a water-filling solution. The second method utilizes the total variation distance to discriminate the invariant probability of a Markov process and that of the approximating process. The approximation is obtained via two alternative formulations: (a) maximizing a functional of the occupancy distribution of the Markov process, and (b) maximizing the entropy of the approximating process invariant probability. For both formulations, once the reduced invariant probability is obtained, which does not correspond to a Markov process, a further approximation by a Markov process is proposed which minimizes the Kullback-Leibler divergence. These approximations are given by water-filling solutions. Finally, the theoretical results of both methods are applied to specific examples to illustrate the methodology and the water-filling behavior of the approximations.

preprint, 2014, arXiv

Asymptotic Stability and Decay Rates of Homogeneous Positive Systems With Bounded and Unbounded Delays

There are several results on the stability of nonlinear positive systems in the presence of time delays. However, most of them assume that the delays are constant. This paper considers time-varying, possibly unbounded, delays and establishes asymptotic stability and bounds the decay rate of a significant class of nonlinear positive systems which includes positive linear systems as a special case. Specifically, we present a necessary and sufficient condition for delay-independent stability of continuous-time positive systems whose vector fields are cooperative and homogeneous. We show that global asymptotic stability of such systems is independent of the magnitude and variation of the time delays. For various classes of time delays, we are able to derive explicit expressions that quantify the decay rates of positive systems. We also provide the corresponding counterparts for discrete-time positive systems whose vector fields are non-decreasing and homogeneous.

preprint, 2014, arXiv

Emergent Behaviors over Signed Random Dynamical Networks: Relative-State-Flipping Model

We study asymptotic dynamical patterns that emerge among a set of nodes interacting in a dynamically evolving signed random network, where positive links carry out standard consensus and negative links induce relative-state flipping. A sequence of deterministic signed graphs defines potential node interactions that take place independently. Each node receives a positive recommendation consistent with the standard consensus algorithm from its positive neighbors, and a negative recommendation defined by relative-state flipping from its negative neighbors. After receiving these recommendations, each node assigns a deterministic weight to each recommendation, and then encodes these weighted recommendations in its state update through stochastic attentions defined by two Bernoulli random variables. We establish a number of conditions regarding almost sure convergence and divergence of the node states. We also propose a condition for almost sure state clustering for essentially weakly balanced graphs, with the help of several martingale convergence lemmas. Some fundamental differences in the impact of the deterministic weights and stochastic attentions on the node state evolution are highlighted between the current relative-state-flipping model and the state-flipping model considered in Altafini 2013 and Shi et al. 2014.

preprint2014arXiv

Emergent Behaviors over Signed Random Dynamical Networks: State-Flipping Model

Recent studies from social, biological, and engineering network systems have drawn attention to the dynamics over signed networks, where each link is associated with a positive/negative sign indicating trustful/mistrustful, activator/inhibitor, or secure/malicious interactions. We study asymptotic dynamical patterns that emerge among a set of nodes that interact in a dynamically evolving signed random network. Node interactions take place at random on a sequence of deterministic signed graphs. Each node receives positive or negative recommendations from its neighbors depending on the sign of the interaction arcs, and updates its state accordingly. Recommendations along a positive arc follow the standard consensus update. As in the work by Altafini, negative recommendations use an update where the sign of the neighbor state is flipped. Nodes may weight positive and negative recommendations differently, and random processes are introduced to model the time-varying attention that nodes pay to these recommendations. Conditions for almost sure convergence and divergence of the node states are established. We show that under this so-called state-flipping model, all links contribute to a consensus of the absolute values of the nodes, even under switching sign patterns and a dynamically changing environment. A no-survivor property is established, indicating that every node state diverges almost surely if the maximum network state diverges.

preprint2014arXiv

Global convergence of the Heavy-ball method for convex optimization

This paper establishes global convergence and provides global bounds on the convergence rate of the Heavy-ball method for convex optimization problems. When the objective function has a Lipschitz-continuous gradient, we show that the Cesaro average of the iterates converges to the optimum at a rate of $O(1/k)$, where $k$ is the number of iterations. When the objective function is also strongly convex, we prove that the Heavy-ball iterates converge linearly to the unique optimum.
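
The Heavy-ball update adds a momentum term to the gradient step: x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1}). A minimal Python sketch of the iteration and of the running Cesaro average follows. The toy quadratic objective, the step size alpha = 1/L and the momentum beta = 0.5 are illustrative assumptions, not the parameter ranges analysed in the paper.

```python
def heavy_ball(grad, x0, alpha, beta, iters):
    """Run x_{k+1} = x_k - alpha*grad(x_k) + beta*(x_k - x_{k-1}).

    Returns the last iterate and the running (Cesaro) average of x_0..x_iters.
    """
    x_prev = list(x0)   # x_{-1} = x_0, so the first step is a plain gradient step
    x = list(x0)
    avg = list(x0)
    for k in range(1, iters + 1):
        g = grad(x)
        x_next = [xi - alpha * gi + beta * (xi - pi)
                  for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
        # Incremental mean over the k + 1 iterates seen so far.
        avg = [a + (xi - a) / (k + 1) for a, xi in zip(avg, x)]
    return x, avg


# Hypothetical toy objective f(x) = 0.5 * (1 * x1^2 + 10 * x2^2), so mu = 1 and L = 10.
CURVATURES = [1.0, 10.0]


def grad_quadratic(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]


x_last, x_avg = heavy_ball(grad_quadratic, [1.0, 1.0], alpha=0.1, beta=0.5, iters=200)
print("last iterate:", x_last)
print("Cesaro average:", x_avg)
```

On this strongly convex example the last iterate approaches the optimum (the origin) much faster than the averaged iterate, which is consistent with the linear rate for the iterates and the slower O(1/k) rate for the average.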

preprint2014arXiv

Modular design of jointly optimal controllers and forwarding policies for wireless control

We consider the joint design of packet forwarding policies and controllers for wireless control loops where sensor measurements are sent to the controller over an unreliable and energy-constrained multi-hop wireless network. For a fixed sampling rate of the sensor, the co-design problem separates into two well-defined and independent subproblems: transmission scheduling for maximizing the deadline-constrained reliability and optimal control under packet loss. We develop optimal and implementable solutions for these subproblems and show that the optimally co-designed system can be efficiently found. Numerical examples highlight the many trade-offs involved and demonstrate the power of our approach.

preprint2014arXiv

Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems

The alternating direction method of multipliers (ADMM) has emerged as a powerful technique for large-scale structured optimization. Despite many recent results on the convergence properties of ADMM, a quantitative characterization of the impact of the algorithm parameters on the convergence times of the method is still lacking. In this paper we find the optimal algorithm parameters that minimize the convergence factor of the ADMM iterates in the context of $\ell_2$-regularized minimization and constrained quadratic programming. Numerical examples show that our parameter selection rules significantly outperform existing alternatives in the literature.
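
To make the dependence on the penalty parameter concrete, the following Python sketch runs scaled-form ADMM on a small regularized quadratic with a diagonal Hessian and counts iterations for a few values of rho. The problem data, the diagonal structure, the stopping rule and the tested rho values are all assumptions chosen for illustration; the sketch does not implement the parameter selection rules derived in the paper.

```python
def admm_l2(q_diag, c, delta, rho, tol=1e-10, max_iters=10000):
    """ADMM for: min 0.5*sum(q_i*x_i^2) + sum(c_i*x_i) + (delta/2)*||z||^2  s.t. x = z.

    Uses the scaled dual variable u. Returns (z, number of iterations used).
    """
    n = len(q_diag)
    z = [0.0] * n
    u = [0.0] * n
    for k in range(1, max_iters + 1):
        # x-update: minimize the quadratic plus (rho/2)*(x - z + u)^2, coordinate-wise.
        x = [(rho * (zi - ui) - ci) / (qi + rho)
             for qi, ci, zi, ui in zip(q_diag, c, z, u)]
        # z-update: minimize (delta/2)*z^2 + (rho/2)*(x - z + u)^2.
        z_new = [rho * (xi + ui) / (delta + rho) for xi, ui in zip(x, u)]
        # Scaled dual update.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z_new)]
        step = max(abs(a - b) for a, b in zip(z_new, z))
        z = z_new
        if step < tol:
            return z, k
    return z, max_iters


# Hypothetical problem data.
Q_DIAG = [1.0, 100.0]
C = [-2.0, -50.0]
DELTA = 1.0
closed_form = [-ci / (qi + DELTA) for qi, ci in zip(Q_DIAG, C)]

iterations = {}
solutions = {}
for rho in (0.1, 1.0, 10.0):
    solutions[rho], iterations[rho] = admm_l2(Q_DIAG, C, DELTA, rho)
    print("rho =", rho, "iterations =", iterations[rho])
```

All three runs reach the same minimizer, but the iteration counts differ markedly, which is the sensitivity that motivates choosing rho systematically.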

preprint2014arXiv

Optimal scaling of the ADMM algorithm for distributed quadratic programming

This paper presents optimal scaling of the alternating direction method of multipliers (ADMM) algorithm for a class of distributed quadratic programming problems. The scaling corresponds to the ADMM step-size and relaxation parameter, as well as the edge-weights of the underlying communication graph. We optimize these parameters to yield the smallest convergence factor of the algorithm. Explicit expressions are derived for the step-size and relaxation parameter, as well as for the corresponding convergence factor. Numerical simulations justify our results and highlight the benefits of optimally scaling the ADMM algorithm.

preprint2014arXiv

Sub-homogeneous positive monotone systems are insensitive to heterogeneous time-varying delays

We show that a sub-homogeneous positive monotone system with bounded heterogeneous time-varying delays is globally asymptotically stable if and only if the corresponding delay-free system is globally asymptotically stable. The proof is based on an extension of a delay-independent stability result for monotone systems under constant delays by Smith to systems with bounded heterogeneous time-varying delays. Under the additional assumption of positivity and sub-homogeneous vector fields, we establish the aforementioned delay insensitivity property and derive a novel test for global asymptotic stability. If the system has a unique equilibrium point in the positive orthant, we prove that our stability test is necessary and sufficient. Specialized to positive linear systems, our results extend and sharpen existing results from the literature.

preprint2013arXiv

Benchmarking Practical RRM Algorithms for D2D Communications in LTE Advanced

Device-to-device (D2D) communication integrated into cellular networks is a means to take advantage of the proximity of devices and allow for reusing cellular resources and thereby to increase the user bitrates and the system capacity. However, when D2D (in the 3rd Generation Partnership Project also called Long Term Evolution (LTE) Direct) communication in cellular spectrum is supported, there is a need to revisit and modify the existing radio resource management (RRM) and power control (PC) techniques to realize the potential of the proximity and reuse gains and to limit the interference at the cellular layer. In this paper, we examine the performance of the flexible LTE PC tool box and benchmark it against a utility optimal iterative scheme. We find that the open loop PC scheme of LTE performs well for cellular users both in terms of the used transmit power levels and the achieved signal-to-interference-and-noise-ratio (SINR) distribution. However, the performance of the D2D users as well as the overall system throughput can be boosted by the utility optimal scheme, because the utility maximizing scheme takes better advantage of both the proximity and the reuse gains. Therefore, in this paper we propose a hybrid PC scheme, in which cellular users employ the open loop path compensation method of LTE, while D2D users use the utility optimizing distributed PC scheme. In order to protect the cellular layer, the hybrid scheme allows for limiting the interference caused by the D2D layer at the cost of having a small impact on the performance of the D2D layer. To ensure feasibility, we limit the number of iterations to a practically feasible level. We make the point that the hybrid scheme is not only near optimal, but it also allows for a distributed implementation for the D2D users, while preserving the LTE PC scheme for the cellular users.

preprint2013arXiv

Deterministic and Stochastic Approaches to Supervisory Control Design for Networked Systems with Time-Varying Communication Delays

This paper proposes a supervisory control structure for networked systems with time-varying delays. The control structure, in which a supervisor triggers the most appropriate controller from a multi-controller unit, aims at improving the closed-loop performance relative to what can be obtained using a single robust controller. Our analysis considers average dwell-time switching and is based on a novel multiple Lyapunov-Krasovskii functional. We develop stability conditions that can be verified by semi-definite programming, and show that the associated state feedback synthesis problem can also be solved using convex optimization tools. Extensions of the analysis and synthesis procedures to the case when the evolution of the delay mode is described by a Markov chain are also developed. Simulations on small and large-scale networked control systems are used to illustrate the effectiveness of our approach.

preprint2013arXiv

Distributed Output-Feedback LQG Control with Delayed Information Sharing

This paper develops a controller synthesis method for distributed LQG control problems under output-feedback. We consider a system consisting of three interconnected linear subsystems with a delayed information sharing structure. While the state-feedback case has previously been solved, the extension to output-feedback is nontrivial as the classical separation principle fails. To find the optimal solution, the controller is decomposed into two independent components: a centralized LQG-optimal controller under delayed state observations, and a sum of correction terms based on additional local information available to decision makers. Explicit discrete-time equations are derived whose solutions are the gains of the optimal controller.

preprint2013arXiv

Emergent Behaviors over Signed Random Networks in Dynamical Environments

We study asymptotic dynamical patterns that emerge among a set of nodes that interact in a dynamically evolving signed random network. Node interactions take place at random on a sequence of deterministic signed graphs. Each node receives positive or negative recommendations from its neighbors depending on the sign of the interaction arcs, and updates its state accordingly. Positive recommendations follow the standard consensus update while two types of negative recommendations, each modeling a different type of antagonistic or malicious interaction, are considered. Nodes may weigh positive and negative recommendations differently, and random processes are introduced to model the time-varying attention that nodes pay to the positive and negative recommendations. Various conditions for almost sure convergence, divergence, and clustering of the node states are established. Some fundamental similarities and differences are established for the two notions of negative recommendations.

preprint2013arXiv

Exponential Stability of Homogeneous Positive Systems of Degree One With Time-Varying Delays

While the asymptotic stability of positive linear systems in the presence of bounded time delays has been thoroughly investigated, the theory for nonlinear positive systems is considerably less well-developed. This paper presents a set of conditions for establishing delay-independent stability and bounding the decay rate of a significant class of nonlinear positive systems which includes positive linear systems as a special case. Specifically, when the time delays have a known upper bound, we derive necessary and sufficient conditions for exponential stability of (a) continuous-time positive systems whose vector fields are homogeneous and cooperative, and (b) discrete-time positive systems whose vector fields are homogeneous and order preserving. We then present explicit expressions that allow us to quantify the impact of delays on the decay rate and show that the best decay rate of positive linear systems that our bounds provide can be found via convex optimization. Finally, we extend the results to general linear systems with time-varying delays.
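
For the positive linear special case in discrete time, delay-independent stability can be illustrated with a short simulation of x(k+1) = A x(k) + B x(k - d(k)) with nonnegative A and B. The Python sketch below is only an illustration: the matrices, the delay pattern d(k) = k mod 4 and the initial state are assumptions, and stability is checked through the simple sufficient condition that every row sum of A + B is below one (a positive vector v = (1, 1) with (A + B)v < v).

```python
def simulate_delayed_positive_system(A, B, delay_of, x0, steps):
    """Simulate x(k+1) = A x(k) + B x(k - d(k)) with nonnegative A, B and x0.

    delay_of(k) returns the delay d(k) >= 0; the state is held at x0 for k <= 0.
    """
    n = len(x0)
    history = [list(x0)]                      # history[k] = x(k)
    for k in range(steps):
        x_now = history[k]
        x_old = history[max(k - delay_of(k), 0)]
        x_next = [sum(A[i][j] * x_now[j] + B[i][j] * x_old[j] for j in range(n))
                  for i in range(n)]
        history.append(x_next)
    return history


# Hypothetical nonnegative system matrices; row sums of A + B are 0.6 and 0.7.
A = [[0.2, 0.1], [0.0, 0.3]]
B = [[0.1, 0.2], [0.3, 0.1]]
row_sums = [sum(A[i]) + sum(B[i]) for i in range(2)]

no_delay = simulate_delayed_positive_system(A, B, lambda k: 0, [1.0, 2.0], 200)
varying = simulate_delayed_positive_system(A, B, lambda k: k % 4, [1.0, 2.0], 200)
print("row sums of A + B:", row_sums)
print("final state without delay:", no_delay[-1])
print("final state with time-varying delay:", varying[-1])
```

Both trajectories decay to zero; the delayed one decays more slowly, which matches the abstract's point that delays affect the decay rate but not stability itself.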

preprint2013arXiv

Randomized Consensus with Attractive and Repulsive Links

We study convergence properties of a randomized consensus algorithm over a graph with both attractive and repulsive links. At each time instant, a node is randomly selected to interact with a random neighbor. Depending on whether the link between the two nodes belongs to a given subgraph of attractive or repulsive links, the node update follows a standard attractive weighted average or a repulsive weighted average, respectively. The repulsive update has the opposite sign of the standard consensus update. In this way, it counteracts the consensus formation and can be seen as a model of link faults or malicious attacks in a communication network, or the impact of trust and antagonism in a social network. Various probabilistic convergence and divergence conditions are established. A threshold condition for the strength of the repulsive action is given for convergence in expectation: when the repulsive weight crosses this threshold value, the algorithm transitions from convergence to divergence. An explicit value of the threshold is derived for classes of attractive and repulsive graphs. The results show that a single repulsive link can sometimes drastically change the behavior of the consensus algorithm. They also explicitly show how the robustness of the consensus algorithm depends on the size and other properties of the graphs.
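
The update rule described above can be sketched in a few lines of Python. The complete graph, the weights alpha and beta, the uniform node and neighbor selection and the fixed random seed are assumptions made for illustration; the attractive and repulsive edge sets are assumed to be disjoint.

```python
import random


def randomized_consensus(x0, attractive, repulsive, alpha, beta, steps, seed=0):
    """Asymmetric randomized updates over attractive and repulsive links.

    attractive / repulsive: disjoint sets of undirected edges, each a frozenset({i, j}).
    Attractive step: x_i <- x_i + alpha * (x_j - x_i)
    Repulsive step:  x_i <- x_i - beta  * (x_j - x_i)
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    neighbors = {i: [] for i in range(n)}
    for edge in attractive | repulsive:
        i, j = tuple(edge)
        neighbors[i].append(j)
        neighbors[j].append(i)
    for i in neighbors:
        neighbors[i].sort()                   # fixed order for reproducibility
    for _ in range(steps):
        i = rng.randrange(n)                  # node that updates
        if not neighbors[i]:
            continue
        j = rng.choice(neighbors[i])          # random neighbor it listens to
        weight = alpha if frozenset((i, j)) in attractive else -beta
        x[i] += weight * (x[j] - x[i])
    return x


def spread(x):
    return max(x) - min(x)


n = 5
all_edges = {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)}
x0 = [0.0, 1.0, 2.0, 3.0, 4.0]

x_attr = randomized_consensus(x0, all_edges, set(), alpha=0.5, beta=0.0, steps=2000)
print("attractive only, spread:", spread(x_attr))

bad = {frozenset((0, 1))}
x_mixed = randomized_consensus(x0, all_edges - bad, bad, alpha=0.5, beta=0.1, steps=200)
print("one repulsive link, spread:", spread(x_mixed))
```

With attractive links only, every update is a convex combination, so the spread can never grow. Whether a repulsive link destroys convergence depends on beta and on the graph, which is the threshold question the abstract addresses.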

preprint2012arXiv

A Regularized Saddle-Point Algorithm for Networked Optimization with Resource Allocation Constraints

We propose a regularized saddle-point algorithm for convex networked optimization problems with resource allocation constraints. Standard distributed gradient methods suffer from slow convergence and require excessive communication when applied to problems of this type. Our approach offers an alternative way to address these problems, and ensures that each iterative update step satisfies the resource allocation constraints. We derive step-size conditions under which the distributed algorithm converges geometrically to the regularized optimal value, and show how these conditions are affected by the underlying network topology. We illustrate our method on a robotic network application example where a group of mobile agents strive to maintain a moving target in the barycenter of their positions.

preprint2012arXiv

Accelerated Gradient Methods for Networked Optimization

We develop multi-step gradient methods for network-constrained optimization of strongly convex functions with Lipschitz-continuous gradients. Given the topology of the underlying network and bounds on the Hessian of the objective function, we determine the algorithm parameters that guarantee the fastest convergence and characterize situations when significant speed-ups can be obtained over the standard gradient method. Furthermore, we quantify how the performance of the gradient method and its accelerated counterpart are affected by uncertainty in the problem data, and conclude that in most cases our proposed method outperforms gradient descent. Finally, we apply the proposed technique to three engineering problems: resource allocation under network-wide budget constraints, distributed averaging, and Internet congestion control. In all cases, we demonstrate that our algorithm converges more rapidly than alternative algorithms reported in the literature.

preprint2012arXiv

Contractive Interference Functions and Rates of Convergence of Distributed Power Control Laws

The standard interference functions introduced by Yates have been very influential on the analysis and design of distributed power control laws. While powerful and versatile, the framework has some drawbacks: the existence of fixed-points has to be established separately, and no guarantees are given on the rate of convergence of the iterates. This paper introduces contractive interference functions, a slight reformulation of the standard interference functions that guarantees the existence and uniqueness of fixed-points along with linear convergence of iterates. We show that many power control laws from the literature are contractive and derive, sometimes for the first time, analytical convergence rate estimates for these algorithms. We also prove that contractive interference functions converge when executed totally asynchronously and, under the assumption that the communication delay is bounded, derive an explicit bound on the convergence time penalty due to increased delay. Finally, we demonstrate that although standard interference functions are, in general, not contractive, they are all para-contractions with respect to a certain metric. Similar results for two-sided scalable interference functions are also derived.
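
As a concrete instance of a fixed-point power control law p <- I(p), the Python sketch below iterates the classical linear target-SINR update. The gain matrix, target SINR and noise level are hypothetical values, chosen so that the map is a contraction in the max norm with factor 0.6 (the largest row sum of target_sinr[i] * gain[i][j] / gain[i][i] over j != i). It illustrates linear convergence of the iterates and does not reproduce the paper's definitions or rate estimates.

```python
def interference(p, gain, target_sinr, noise):
    """Linear target-SINR interference function I(p).

    I_i(p) = target_sinr[i] * (sum_{j != i} gain[i][j] * p[j] + noise) / gain[i][i]
    """
    n = len(p)
    return [target_sinr[i]
            * (sum(gain[i][j] * p[j] for j in range(n) if j != i) + noise)
            / gain[i][i]
            for i in range(n)]


def iterate_power_control(gain, target_sinr, noise, iters):
    """Iterate p <- I(p) from p = 0 and record the max-norm gap between iterates."""
    p = [0.0] * len(gain)
    gaps = []
    for _ in range(iters):
        p_next = interference(p, gain, target_sinr, noise)
        gaps.append(max(abs(a - b) for a, b in zip(p_next, p)))
        p = p_next
    return p, gaps


def sinr(p, gain, noise):
    n = len(p)
    return [gain[i][i] * p[i]
            / (sum(gain[i][j] * p[j] for j in range(n) if j != i) + noise)
            for i in range(n)]


# Hypothetical link gains, SINR targets and noise power.
GAIN = [[1.0, 0.1, 0.2],
        [0.2, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
TARGET = [2.0, 2.0, 2.0]
NOISE = 0.1

p_star, gaps = iterate_power_control(GAIN, TARGET, NOISE, iters=100)
print("powers:", p_star)
print("achieved SINR:", sinr(p_star, GAIN, NOISE))
print("first gaps:", gaps[:5])
```

The recorded gaps shrink by at least the contraction factor at every step, so the number of iterations needed for a given accuracy can be bounded in advance.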

preprint2012arXiv

Ergodic Mirror Descent

We generalize stochastic subgradient descent methods to situations in which we do not receive independent samples from the distribution over which we optimize, but instead receive samples that are coupled over time. We show that as long as the source of randomness is suitably ergodic---it converges quickly enough to a stationary distribution---the method enjoys strong convergence guarantees, both in expectation and with high probability. This result has implications for stochastic optimization in high-dimensional spaces, peer-to-peer distributed optimization schemes, decision problems with dependent data, and stochastic optimization problems over combinatorial spaces.

preprint2012arXiv

How Agreement and Disagreement Evolve over Random Dynamic Networks

The dynamics of an agreement protocol interacting with a disagreement process over a common random network is considered. The model can represent the spreading of true and false information over a communication network, the propagation of faults in a large-scale control system, or the development of trust and mistrust in a society. At each time instant and with a given probability, a pair of network nodes is selected to interact. Each of the nodes then randomly updates its state towards the state of the other node (attraction), away from the other node (repulsion), or sticks to its current state (neglect). Agreement convergence and disagreement divergence results are obtained for various strengths of the updates for both symmetric and asymmetric update rules. Impossibility theorems show that a specific level of attraction is required for almost sure asymptotic agreement and a specific level of repulsion is required for almost sure asymptotic disagreement. A series of sufficient and/or necessary conditions are then established for agreement convergence or disagreement divergence. In particular, under symmetric updates, a critical convergence measure in the attraction and repulsion update strength is found, in the sense that the asymptotic property of the network state evolution transitions from agreement convergence to disagreement divergence when this measure goes from negative to positive. The result can be interpreted as a tight bound on how much bad action needs to be injected in a dynamic network in order to consistently steer its overall behavior away from consensus.

preprint2012arXiv

Randomized Gossip Algorithm with Unreliable Communication

In this paper, we study an asynchronous randomized gossip algorithm under unreliable communication. At each time instant, two nodes are selected to meet with a given probability. When nodes meet, two unreliable communication links are established, with communication in each direction succeeding with a time-varying probability. It is shown that two particularly interesting cases arise when these communication processes are either perfectly dependent or independent. Necessary and sufficient conditions on the success probability sequence are proposed to ensure almost sure consensus or $ε$-consensus. Weak connectivity is required when the communication is perfectly dependent, while double connectivity is required when the communication is independent. Moreover, it is proven that with an odd number of nodes, average preservation turns from almost forever (with probability one for all initial conditions) for perfectly dependent communication, to almost never (with probability zero for almost all initial conditions) for the independent case. This average preserving property does not hold true for a general number of nodes. These results indicate the fundamental role the node interactions have in randomized gossip algorithms.
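
The difference between perfectly dependent and independent link failures can be illustrated with a small Python simulation. The sketch makes several simplifying assumptions that are not in the paper: any pair of nodes may meet, the success probability is constant rather than time-varying, and a node that receives its peer's value moves to the midpoint.

```python
import random


def gossip(x0, steps, p_success, dependent, seed=0):
    """Pairwise gossip where each direction of communication may fail.

    dependent=True:  both directions succeed or fail together.
    dependent=False: the two directions succeed independently.
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)          # the two nodes that meet
        if dependent:
            ok_ij = ok_ji = rng.random() < p_success
        else:
            ok_ij = rng.random() < p_success    # j's value reaches i
            ok_ji = rng.random() < p_success    # i's value reaches j
        xi, xj = x[i], x[j]
        if ok_ij:
            x[i] = 0.5 * (xi + xj)
        if ok_ji:
            x[j] = 0.5 * (xi + xj)
    return x


x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mean0 = sum(x0) / len(x0)

x_dep = gossip(x0, steps=3000, p_success=0.5, dependent=True)
x_ind = gossip(x0, steps=3000, p_success=0.5, dependent=False)
print("dependent:   consensus value", x_dep[0], "initial average", mean0)
print("independent: consensus value", x_ind[0], "initial average", mean0)
```

In the dependent case every successful meeting is a symmetric averaging step, so the network average is preserved and the nodes agree on it. In the independent case one-sided updates can occur, so the nodes still agree but generally not on the initial average.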

44 published item(s)

Delay-adaptive step-sizes for asynchronous learning

Distributed Safe Resource Allocation using Barrier Functions

Inferring origin-destination distribution of agent transfer in a complex network using deep gated recurrent units

On Uniform Boundedness Properties of SGD and its Momentum Variants

Optimal convergence rates of totally asynchronous optimization

Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization

Efficient Stochastic Programming in Julia

On the Convergence of Step Decay Step-Size for Stochastic Optimization

A flexible framework for communication-efficient machine learning: from HPC to IoT

Advances in Asynchronous Parallel and Distributed Optimization

Anderson Acceleration of Proximal Gradient Methods

Compressed Gradient Methods with Hessian-Aided Error Compensation

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

Stability Analysis of Monotone Systems via Max-separable Lyapunov Functions

An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization

Convergence analysis of approximate primal solutions in dual first-order methods

Energy efficient D2D communications in dynamic TDD systems

Finite-time Convergent Gossiping

On Reconstructability of Quadratic Utility Functions from the Iterations in Gradient Methods

On the trade-off between control performance and communication cost in event-triggered control

Optimal Radio Frequency Energy Harvesting with Limited Energy Arrival Knowledge

The Evolution of Beliefs over Signed Social Networks

A Buffer-aided Successive Opportunistic Relay Selection Scheme with Power Adaptation and Inter-Relay Interference Cancellation for Cooperative Diversity Systems

Approximation of Markov Processes by Lower Dimensional Processes via Total Variation Metrics

Asymptotic Stability and Decay Rates of Homogeneous Positive Systems With Bounded and Unbounded Delays

Emergent Behaviors over Signed Random Dynamical Networks: Relative-State-Flipping Model

Emergent Behaviors over Signed Random Dynamical Networks: State-Flipping Model

Global convergence of the Heavy-ball method for convex optimization

Modular design of jointly optimal controllers and forwarding policies for wireless control

Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems

Optimal scaling of the ADMM algorithm for distributed quadratic programming

Sub-homogeneous positive monotone systems are insensitive to heterogeneous time-varying delays

Benchmarking Practical RRM Algorithms for D2D Communications in LTE Advanced

Deterministic and Stochastic Approaches to Supervisory Control Design for Networked Systems with Time-Varying Communication Delays

Distributed Output-Feedback LQG Control with Delayed Information Sharing

Emergent Behaviors over Signed Random Networks in Dynamical Environments

Exponential Stability of Homogeneous Positive Systems of Degree One With Time-Varying Delays

Randomized Consensus with Attractive and Repulsive Links

A Regularized Saddle-Point Algorithm for Networked Optimization with Resource Allocation Constraints

Accelerated Gradient Methods for Networked Optimization

Contractive Interference Functions and Rates of Convergence of Distributed Power Control Laws

Ergodic Mirror Descent

How Agreement and Disagreement Evolve over Random Dynamic Networks

Randomized Gossip Algorithm with Unreliable Communication