Source author record

Ali H. Sayed

Ali H. Sayed appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Multiagent Systems; Machine Learning; math.OC; Information Theory; math.IT; Systems and Control; Distributed, Parallel, and Cluster Computing; eess.SP; Social and Information Networks; physics.soc-ph; Artificial Intelligence; Cryptography and Security; eess.SY; math.PR; math.ST; Statistics Theory; Computation

Catalog footprint

What is connected

55 works

17 topics

4 close collaborators


preprint 2026 arXiv

High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent's cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/δ)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/δ)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $δ\in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
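As a rough illustration of the gradient tracking idea named in the abstract above, the following Python sketch runs a textbook gradient-tracking recursion on scalar quadratic costs over a four-agent ring. The cost parameters `A_COEF` and `B_COEF`, the weight matrix `W_RING`, the step-size, and the function name `gt_dsgd` are illustrative assumptions, and the recursion is one common way of writing gradient tracking, not necessarily the exact form analyzed in the paper.

```python
import random

# Hypothetical local costs f_k(x) = 0.5 * a_k * (x - b_k)^2, one per agent.
A_COEF = [1.0, 2.0, 3.0, 4.0]
B_COEF = [4.0, -1.0, 2.0, 0.0]

# Doubly stochastic weights for a 4-agent ring (assumed topology).
W_RING = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]


def gt_dsgd(a, b, W, step, iters, noise_std=0.0, seed=0):
    """Gradient tracking with (optionally noisy) local gradients."""
    rng = random.Random(seed)
    n = len(a)

    def grad(k, x):
        # Local gradient plus additive Gaussian noise (zero when noise_std == 0).
        return a[k] * (x - b[k]) + noise_std * rng.gauss(0.0, 1.0)

    x = [0.0] * n
    g = [grad(k, x[k]) for k in range(n)]
    y = g[:]  # the tracker starts at the local gradients
    for _ in range(iters):
        # Mix with neighbors, then step along the tracked direction.
        x_new = [sum(W[k][l] * x[l] for l in range(n)) - step * y[k]
                 for k in range(n)]
        g_new = [grad(k, x_new[k]) for k in range(n)]
        # Tracker update: mix, then add the change in the local gradient.
        y = [sum(W[k][l] * y[l] for l in range(n)) + g_new[k] - g[k]
             for k in range(n)]
        x, g = x_new, g_new
    return x
```

With `noise_std=0.0` every agent should approach the minimizer of the summed cost, `sum(a*b)/sum(a)`, even though the local minimizers `b_k` differ; this is the bias-correction effect that plain decentralized gradient descent with a constant step-size lacks.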

preprint 2023 arXiv

Enforcing Privacy in Distributed Learning with Performance Guarantees

We study the privatization of distributed learning and optimization strategies. We focus on differential privacy schemes and study their effect on performance. We show that the popular additive random perturbation scheme degrades performance because it is not well-tuned to the graph structure. For this reason, we exploit two alternative graph-homomorphic constructions and show that they improve performance while guaranteeing privacy. Moreover, contrary to most earlier studies, the gradient of the risks is not assumed to be bounded (a condition that rarely holds in practice; e.g., quadratic risk). We avoid this condition and still devise a differentially private scheme with high probability. We examine optimization and learning scenarios and illustrate the theoretical findings through simulations.

preprint 2023 arXiv

Privatized Graph Federated Learning

Federated learning is a semi-distributed algorithm, where a server communicates with multiple dispersed clients to learn a global model. The federated architecture is not robust and is sensitive to communication and computational overloads due to its one-master multi-client structure. It can also be subject to privacy attacks targeting personal information on the communication links. In this work, we introduce graph federated learning (GFL), which consists of multiple federated units connected by a graph. We then show how graph homomorphic perturbations can be used to ensure the algorithm is differentially private. We conduct both convergence and privacy theoretical analyses and illustrate performance by means of computer simulations.

preprint 2022 arXiv

A Fundamental Limit of Distributed Hypothesis Testing Under Memoryless Quantization

We study a distributed hypothesis testing setup where peripheral nodes send quantized data to the fusion center in a memoryless fashion. The expected number of bits sent by each node under the null hypothesis is kept limited. We characterize the optimal decay rate of the mis-detection (type-II error) probability provided that false alarms (type-I error) are rare, and study the tradeoff between the communication rate and maximal type-II error decay rate. We resort to rate-distortion methods to provide upper bounds to the tradeoff curve and show that at high rates lattice quantization achieves near-optimal performance. We also characterize the tradeoff for the case where nodes are allowed to record and quantize a fixed number of samples. Moreover, under sum-rate constraints, we show that an upper bound to the tradeoff curve is obtained with a water-filling solution.

preprint 2022 arXiv

Decentralized learning in the presence of low-rank noise

Observations collected by agents in a network may be unreliable due to observation noise or interference. This paper proposes a distributed algorithm that allows each node to improve the reliability of its own observation by relying solely on local computations and interactions with immediate neighbors, assuming that the field (graph signal) monitored by the network lies in a low-dimensional subspace and that a low-rank noise is present in addition to the usual full-rank noise. While oblique projections can be used to project measurements onto a low-rank subspace along a direction that is oblique to the subspace, the resulting solution is not distributed. Starting from the centralized solution, we propose an algorithm that performs the oblique projection of the overall set of observations onto the signal subspace in an iterative and distributed manner. We then show how the oblique projection framework can be extended to handle distributed learning and adaptation problems over networks.

preprint 2022 arXiv

Hidden Markov Modeling over Graphs

This work proposes a multi-agent filtering algorithm over graphs for finite-state hidden Markov models (HMMs), which can be used for sequential state estimation or for tracking opinion formation over dynamic social networks. We show that the difference from the optimal centralized Bayesian solution is asymptotically bounded for geometrically ergodic transition models. Experiments illustrate the theoretical findings and in particular, demonstrate the superior performance of the proposed algorithm compared to a state-of-the-art social learning algorithm.

preprint 2022 arXiv

Social Learning under Randomized Collaborations

We study a social learning scheme where at every time instant, each agent chooses to receive information from one of its neighbors at random. We show that under this sparser communication scheme, the agents learn the truth eventually and the asymptotic convergence rate remains the same as the standard algorithms which use more communication resources. We also derive large deviation estimates of the log-belief ratios for a special case where each agent replaces its belief with that of the chosen neighbor.

preprint 2020 arXiv

A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features

This work studies multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents. This scenario arises in many machine learning and engineering applications, such as regression over distributed features and resource allocation. We reformulate this problem into an equivalent saddle-point problem, which is amenable to decentralized solutions. We then propose a proximal primal-dual algorithm and establish its linear convergence to the optimal solution when the local functions are strongly-convex. To our knowledge, this is the first linearly convergent decentralized algorithm for multi-agent sharing problems with a general convex (possibly non-smooth) coupling function.

preprint 2020 arXiv

Adaptation in Online Social Learning

This work studies social learning under non-stationary conditions. Although designed for online inference, classic social learning algorithms perform poorly under drifting conditions. To mitigate this drawback, we propose the Adaptive Social Learning (ASL) strategy. This strategy leverages an adaptive Bayesian update, where the adaptation degree can be modulated by tuning a suitable step-size parameter. The learning performance of the ASL algorithm is examined by means of a steady-state analysis. It is shown that, under the regime of small step-sizes: i) consistent learning is possible; ii) an accurate prediction of the performance can be furnished in terms of a Gaussian approximation.

preprint 2020 arXiv

Affine Combination of Diffusion Strategies over Networks

Diffusion adaptation is a powerful strategy for distributed estimation and learning over networks. Motivated by the concept of combining adaptive filters, this work proposes a combination framework that aggregates the operation of multiple diffusion strategies for enhanced performance. By assigning a combination coefficient to each node, and using an adaptation mechanism to minimize the network error, we obtain a combined diffusion strategy that benefits from the best characteristics of all component strategies simultaneously in terms of excess-mean-square error (EMSE). Analyses of the universality are provided to show the superior performance of affine combination scheme and to characterize its behavior in the mean and mean-square sense. Simulation results are presented to demonstrate the effectiveness of the proposed strategies, as well as the accuracy of theoretical findings.

preprint 2020 arXiv

Decentralized Proximal Gradient Algorithms with Linear Convergence Rates

This work studies a class of non-smooth decentralized multi-agent optimization problems where the agents aim at minimizing a sum of local strongly-convex smooth components plus a common non-smooth term. We propose a general primal-dual algorithmic framework that unifies many existing state-of-the-art algorithms. We establish linear convergence of the proposed method to the exact solution in the presence of the non-smooth term. Moreover, for the more general class of problems with agent specific non-smooth terms, we show that linear convergence cannot be achieved (in the worst case) for the class of algorithms that uses the gradients and the proximal mappings of the smooth and non-smooth parts, respectively. We further provide a numerical counterexample that shows how some state-of-the-art algorithms fail to converge linearly for strongly-convex objectives and different local non-smooth terms.

preprint 2020 arXiv

Diffusion LMS with Communication Delays: Stability and Performance Analysis

We study the problem of distributed estimation over adaptive networks where communication delays exist between nodes. In particular, we investigate the diffusion Least-Mean-Square (LMS) strategy where delayed intermediate estimates (due to the communication channels) are employed during the combination step. One important question is: Do the delays affect the stability condition and performance? To answer this question, we conduct a detailed performance analysis in the mean and in the mean-square-error sense of the diffusion LMS with delayed estimates. Stability conditions, transient and steady-state mean-square-deviation (MSD) expressions are provided. One of the main findings is that diffusion LMS with delays can still converge under the same step-size condition as diffusion LMS without delays. Finally, simulation results illustrate the theoretical findings.
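To make the role of delayed estimates in the combination step concrete, here is a minimal Python sketch of an adapt-then-combine diffusion LMS recursion in which each agent combines its own fresh intermediate estimate with its neighbors' estimates from the previous instant. The uniform one-step delay, the scalar unknown, the combination matrix `A_RING`, and the function name `diffusion_lms_delayed` are simplifying assumptions made for illustration; the paper's delay model may be more general.

```python
import random

# Left-stochastic combination weights for a 4-agent ring (assumed topology);
# entry A[l][k] is the weight agent k gives to agent l.
A_RING = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]


def diffusion_lms_delayed(w_true, A, step, iters, noise_std=0.1, seed=0):
    """ATC diffusion LMS for a scalar unknown with one-step-delayed neighbors."""
    rng = random.Random(seed)
    n = len(A)
    w = [0.0] * n
    psi_prev = [0.0] * n  # neighbors' intermediate estimates, one step old
    for _ in range(iters):
        psi = []
        for k in range(n):
            u = rng.gauss(0.0, 1.0)                           # regressor
            d = u * w_true + noise_std * rng.gauss(0.0, 1.0)  # noisy measurement
            # Adaptation step: local LMS update.
            psi.append(w[k] + step * u * (d - u * w[k]))
        # Combination step: own estimate is current, neighbors' are delayed.
        w = [A[k][k] * psi[k]
             + sum(A[l][k] * psi_prev[l] for l in range(n) if l != k)
             for k in range(n)]
        psi_prev = psi
    return w
```

Calling `diffusion_lms_delayed(2.0, A_RING, step=0.05, iters=2000)` should return four estimates close to the true value 2.0, consistent with the abstract's claim that a small step-size still yields convergence when delays are present.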

preprint 2020 arXiv

Dynamic Federated Learning

Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. While many federated learning architectures process data in an online manner, and are hence adaptive by nature, most performance analyses assume static optimization problems and offer no guarantees in the presence of drifts in the problem solution or data characteristics. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm. The results clarify the trade-off between convergence and tracking performance.

preprint 2020 arXiv

Graph Learning over Partially Observed Diffusion Networks: Role of Degree Concentration

This work examines the problem of graph learning over a diffusion network when data can be collected from a limited portion of the network (partial observability). The main question is to establish technical guarantees of consistent recovery of the subgraph of probed network nodes, i) despite the presence of unobserved nodes; and ii) under different connectivity regimes, including the dense regime where the probed nodes are influenced by many connections coming from the unobserved ones. We ascertain that suitable estimators of the combination matrix (i.e., the matrix that quantifies the pairwise interaction between nodes) possess an identifiability gap that enables the discrimination between connected and disconnected nodes. Fundamental conditions are established under which the subgraph of monitored nodes can be recovered, with high probability as the network size increases, through universal clustering algorithms. This claim is proved for three matrix estimators: i) the Granger estimator that adapts to the partial observability setting the solution that is exact under full observability; ii) the one-lag correlation matrix; and iii) the residual estimator based on the difference between two consecutive time samples. A detailed characterization of the asymptotic behavior of these estimators is established in terms of an error bias and of the identifiability gap, and a sample complexity analysis is performed to establish how the number of samples scales with the network size to achieve consistent learning. Comparison among the estimators is performed through illustrative examples that show how estimators that are not optimal in the full observability regime can outperform the Granger estimator in the partial observability regime. The analysis reveals that the fundamental property enabling consistent graph learning is the statistical concentration of node degrees.

preprint 2020 arXiv

Graph Learning Under Partial Observability

Many optimization, inference and learning tasks can be accomplished efficiently by means of decentralized processing algorithms where the network topology (i.e., the graph) plays a critical role in enabling the interactions among neighboring nodes. There is a large body of literature examining the effect of the graph structure on the performance of decentralized processing strategies. In this article, we examine the inverse problem and consider the reverse question: How much information does observing the behavior at the nodes of a graph convey about the underlying topology? For large-scale networks, the difficulty in addressing such inverse problems is compounded by the fact that usually only a limited fraction of the nodes can be probed, giving rise to a second important question: Despite the presence of unobserved nodes, can partial observations still be sufficient to discover the graph linking the probed nodes? The article surveys recent advances on this challenging learning problem and related questions.

preprint 2020 arXiv

ISL: A novel approach for deep exploration

In this article we explore an alternative approach to address deep exploration and we introduce the ISL algorithm, which is efficient at performing deep exploration. Similarly to maximum entropy RL, we derive the algorithm by augmenting the traditional RL objective with a novel regularization term. A distinctive feature of our approach is that, as opposed to other works that tackle the problem of deep exploration, in our derivation both the learning equations and the exploration-exploitation strategy are derived in tandem as the solution to a well-posed optimization problem whose minimization leads to the optimal value function. Empirically we show that our method exhibits state-of-the-art performance on a range of challenging deep-exploration benchmarks.

preprint 2020 arXiv

Learning Graph Influence from Social Interactions

In social learning, agents form their opinions or beliefs about certain hypotheses by exchanging local information. This work considers the recent paradigm of weak graphs, where the network is partitioned into sending and receiving components, with the former having the possibility of exerting a domineering effect on the latter. Such graph structures are prevalent over social platforms. We will not be focusing on the direct social learning problem (which examines what agents learn), but rather on the dual or reverse learning problem (which examines how agents learned). Specifically, from observations of the stream of beliefs at certain agents, we would like to examine whether it is possible to learn the strength of the connections (influences) from sending components in the network to these receiving agents.

preprint 2020 arXiv

Linear Convergence of Primal-Dual Gradient Methods and their Performance in Distributed Optimization

In this work, we revisit a classical incremental implementation of the primal-descent dual-ascent gradient method used for the solution of equality constrained optimization problems. We provide a short proof that establishes the linear (exponential) convergence of the algorithm for smooth strongly-convex cost functions and study its relation to the non-incremental implementation. We also study the effect of the augmented Lagrangian penalty term on the performance of distributed optimization algorithms for the minimization of aggregate cost functions over multi-agent networks.
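The primal-descent dual-ascent recursion discussed in the abstract above can be sketched in a few lines of Python for a problem of the form "minimize J(w) subject to Bw = b". Reading "incremental" as "the dual step uses the freshly updated primal iterate" is an assumption here, as are the function name `primal_dual`, the toy cost `grad_J_example`, and the constraint data `B_EX` and `b_EX`.

```python
def primal_dual(grad_J, B, b, dim, step_w, step_lam, iters, incremental=True):
    """Primal-descent dual-ascent on L(w, lam) = J(w) + lam^T (B w - b)."""
    m = len(b)
    w = [0.0] * dim
    lam = [0.0] * m
    for _ in range(iters):
        g = grad_J(w)
        # Primal descent on the Lagrangian: gradient of J plus B^T lam.
        w_new = [w[j] - step_w * (g[j] + sum(B[r][j] * lam[r] for r in range(m)))
                 for j in range(dim)]
        # Incremental variant uses the new primal iterate in the dual ascent;
        # the non-incremental variant uses the old one.
        w_dual = w_new if incremental else w
        lam = [lam[r] + step_lam * (sum(B[r][j] * w_dual[j] for j in range(dim)) - b[r])
               for r in range(m)]
        w = w_new
    return w, lam


# Toy problem (hypothetical): J(w) = 0.5*(w1 - 1)^2 + 0.5*(w2 - 2)^2
# subject to w1 + w2 = 1. The saddle point is w = (0, 1) with lam = 1.
def grad_J_example(w):
    return [w[0] - 1.0, w[1] - 2.0]


B_EX = [[1.0, 1.0]]
b_EX = [1.0]
```

Running `primal_dual(grad_J_example, B_EX, b_EX, dim=2, step_w=0.1, step_lam=0.1, iters=500)` drives the iterates toward the saddle point of this strongly convex toy problem; switching `incremental=False` allows a side-by-side comparison of the two implementations.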

preprint 2020 arXiv

Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization

Rapid advances in data collection and processing capabilities have allowed for the use of increasingly complex models that give rise to nonconvex optimization problems. These formulations, however, can be arbitrarily difficult to solve in general, in the sense that even simply verifying that a given point is a local minimum can be NP-hard [1]. Still, some relatively simple algorithms have been shown to lead to surprisingly good empirical results in many contexts of interest. Perhaps the most prominent example is the success of the backpropagation algorithm for training neural networks. Several recent works have pursued rigorous analytical justification for this phenomenon by studying the structure of the nonconvex optimization problems and establishing that simple algorithms, such as gradient descent and its variations, perform well in converging towards local minima and avoiding saddle-points. A key insight in these analyses is that gradient perturbations play a critical role in allowing local descent algorithms to efficiently distinguish desirable from undesirable stationary points and escape from the latter. In this article, we cover recent results on second-order guarantees for stochastic first-order optimization algorithms in centralized, federated, and decentralized architectures.

preprint 2020 arXiv

Supervised Learning Under Distributed Features

This work studies the problem of learning under both large datasets and large-dimensional feature space scenarios. The feature information is assumed to be spread across agents in a network, where each agent observes some of the features. Through local cooperation, the agents are supposed to interact with each other to solve an inference problem and converge towards the global minimizer of an empirical risk. We study this problem exclusively in the primal domain, and propose new and effective distributed solutions with guaranteed convergence to the minimizer with linear rate under strong convexity. This is achieved by combining a dynamic diffusion construction, a pipeline strategy, and variance-reduced techniques. Simulation results illustrate the conclusions.

preprint 2019 arXiv

Adaptation and learning over networks under subspace constraints -- Part I: Stability Analysis

This paper considers optimization problems over networks where agents have individual objectives to meet, or individual parameter vectors to estimate, subject to subspace constraints that require the objectives across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus optimization as a special case, and allows for more general task relatedness models such as smoothness. While such formulations can be solved via projected gradient descent, the resulting algorithm is not distributed. Starting from the centralized solution, we propose an iterative and distributed implementation of the projection step, which runs in parallel with the stochastic gradient descent update. We establish in this Part I of the work that, for small step-sizes $μ$, the proposed distributed adaptive strategy leads to small estimation errors on the order of $μ$. We examine in the accompanying Part II [2] the steady-state performance. The results will reveal explicitly the influence of the gradient noise, data characteristics, and subspace constraints, on the network performance. The results will also show that in the small step-size regime, the iterates generated by the distributed algorithm achieve the centralized steady-state performance.

preprint 2016 arXiv

Diffusion Estimation Over Cooperative Multi-Agent Networks With Missing Data

In many fields, and especially in the medical and social sciences and in recommender systems, data are gathered through clinical studies or targeted surveys. Participants are generally reluctant to respond to all questions in a survey or they may lack information to respond adequately to some questions. The data collected from these studies tend to lead to linear regression models where the regression vectors are only known partially: some of their entries are either missing completely or replaced randomly by noisy values. In this work, assuming missing positions are replaced by noisy values, we examine how a connected network of agents, with each one of them subjected to a stream of data with incomplete regression information, can cooperate with each other through local interactions to estimate the underlying model parameters in the presence of missing data. We explain how to adjust the distributed diffusion through (de)regularization in order to eliminate the bias introduced by the incomplete model. We also propose a technique to recursively estimate the (de)regularization parameter and examine the performance of the resulting strategy. We illustrate the results by considering two applications: one dealing with a mental health survey and the other dealing with a household consumption survey.

preprint 2016 arXiv

Diffusion-Based Adaptive Distributed Detection: Steady-State Performance in the Slow Adaptation Regime

This work examines the close interplay between cooperation and adaptation for distributed detection schemes over fully decentralized networks. The combined attributes of cooperation and adaptation are necessary to enable networks of detectors to continually learn from streaming data and to continually track drifts in the state of nature when deciding in favor of one hypothesis or another. The results in the paper establish a fundamental scaling law for the steady-state probabilities of miss-detection and false-alarm in the slow adaptation regime, when the agents interact with each other according to distributed strategies that employ small constant step-sizes. The latter are critical to enable continuous adaptation and learning. The work establishes three key results. First, it is shown that the output of the collaborative process at each agent has a steady-state distribution. Second, it is shown that this distribution is asymptotically Gaussian in the slow adaptation regime of small step-sizes. And third, by carrying out a detailed large deviations analysis, closed-form expressions are derived for the decaying rates of the false-alarm and miss-detection probabilities. Interesting insights are gained. In particular, it is verified that as the step-size $μ$ decreases, the error probabilities are driven to zero exponentially fast as functions of $1/μ$, and that the error exponents increase linearly in the number of agents. It is also verified that the scaling laws governing errors of detection and errors of estimation over networks behave very differently, with the former having an exponential decay proportional to $1/μ$, while the latter scales linearly with decay proportional to $μ$. It is shown that the cooperative strategy allows each agent to reach the same detection performance, in terms of detection error exponents, of a centralized stochastic-gradient solution.

preprint 2016 arXiv

Distributed Detection over Adaptive Networks: Refined Asymptotics and the Role of Connectivity

We consider distributed detection problems over adaptive networks, where dispersed agents learn continually from streaming data by means of local interactions. The simultaneous requirements of adaptation and cooperation are achieved by employing diffusion algorithms with constant step-size μ. In [1], [2] some main features of adaptive distributed detection were revealed. By resorting to large deviations analysis, it was established that the Type-I and Type-II error probabilities of all agents vanish exponentially as functions of 1/μ, and that all agents share the same Type-I and Type-II error exponents. However, numerical evidence presented in [1], [2] showed that the theory of large deviations does not capture the fundamental impact of network connectivity on performance, and that additional tools and efforts are required to obtain accurate predictions for the error probabilities. This work addresses these open issues and extends the results of [1], [2] in several directions. By conducting a refined asymptotic analysis based on the mathematical framework of exact asymptotics, we arrive at a revealing and powerful understanding of the universal behavior of distributed detection over adaptive networks: as functions of 1/μ, the error (log-)probability curves corresponding to different agents stay nearly parallel to each other (as already discovered in [1], [2]); however, these curves are ordered following a criterion reflecting the degree of connectivity of each agent. Depending on the combination weights, the more connected an agent is, the lower its error probability curve will be. Interesting and somewhat unexpected behaviors emerge, in terms of the interplay between the network topology, the combination weights, and the inference performance. The lesson learned is that connectivity matters.

preprint 2016 arXiv

Excess-Risk of Distributed Stochastic Learners

This work studies the learning ability of consensus and diffusion distributed learners from continuous streams of data arising from different but related statistical distributions. Four distinctive features for diffusion learners are revealed in relation to other decentralized schemes even under left-stochastic combination policies. First, closed-form expressions for the evolution of their excess-risk are derived for strongly-convex risk functions under a diminishing step-size rule. Second, using these results, it is shown that the diffusion strategy improves the asymptotic convergence rate of the excess-risk relative to non-cooperative schemes. Third, it is shown that when the in-network cooperation rules are designed optimally, the performance of the diffusion implementation can outperform that of naive centralized processing. Finally, the arguments further show that diffusion outperforms consensus strategies asymptotically, and that the asymptotic excess-risk expression is invariant to the particular network topology. The framework adopted in this work studies convergence in the stronger mean-square-error sense, rather than in distribution, and develops tools that enable a close examination of the differences between distributed strategies in terms of asymptotic behavior, as well as in terms of convergence rates.

preprint 2016 arXiv

Multitask diffusion adaptation over asynchronous networks

The multitask diffusion LMS is an efficient strategy to simultaneously infer, in a collaborative manner, multiple parameter vectors. Existing works on multitask problems assume that all agents respond to data synchronously. In several applications, agents may not be able to act synchronously because networks can be subject to several sources of uncertainties such as changing topology, random link failures, or agents turning on and off for energy conservation. In this work, we describe a model for the solution of multitask problems over asynchronous networks and carry out a detailed mean and mean-square error analysis. Results show that sufficiently small step-sizes can still ensure both stability and performance. Simulations and illustrative examples are provided to verify the theoretical findings. The framework is applied to a particular application involving spectral sensing.

preprint 2016 arXiv

On the Influence of Momentum Acceleration on Online Learning

The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems.
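The equivalence claim above can be probed numerically with a small Python sketch. It assumes a heavy-ball momentum recursion with step-size `step` and momentum parameter `beta`, and it assumes the re-scaled step-size takes the form `step / (1 - beta)`; neither detail is spelled out in the abstract, and the scalar quadratic risk, noise level, and function name `steady_state_mse` are likewise chosen only for illustration.

```python
import random


def steady_state_mse(step, beta, iters, burn_in, w_star=1.0, noise_std=1.0, seed=0):
    """Empirical mean-square error of heavy-ball SGD on J(w) = 0.5*(w - w_star)^2.

    Setting beta = 0 recovers the standard stochastic gradient recursion.
    """
    rng = random.Random(seed)
    w_prev = 0.0
    w = 0.0
    acc = 0.0
    for i in range(iters):
        # Noisy gradient: true gradient plus persistent Gaussian noise.
        grad = (w - w_star) + noise_std * rng.gauss(0.0, 1.0)
        # Gradient step plus momentum term beta * (w_i - w_{i-1}).
        w_next = w - step * grad + beta * (w - w_prev)
        w_prev, w = w, w_next
        if i >= burn_in:
            acc += (w - w_star) ** 2
    return acc / (iters - burn_in)


# Momentum run versus a plain run with the (assumed) re-scaled step-size.
# Both runs share the same seed, hence the same noise sequence.
mse_momentum = steady_state_mse(step=0.01, beta=0.5, iters=22000, burn_in=2000)
mse_plain = steady_state_mse(step=0.01 / (1 - 0.5), beta=0.0, iters=22000, burn_in=2000)
```

Under these assumptions the two empirical errors should come out close to each other, which is the behavior the abstract describes: in the small constant step-size regime, momentum acts like a larger step-size rather than providing a genuine acceleration.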

preprint 2016 arXiv

Online Dual Coordinate Ascent Learning

The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees. However, the available S-DCA formulation is limited to finite sample sizes and relies on performing multiple passes over the same data. This formulation is not well-suited for online implementations where data keep streaming in. In this work, we develop an online dual coordinate-ascent (O-DCA) algorithm that is able to respond to streaming data and does not need to revisit the past data. This feature embeds the resulting construction with continuous adaptation, learning, and tracking abilities, which are particularly attractive for online learning scenarios.

preprint2016arXiv

Proximal Multitask Learning over Networks with Sparsity-inducing Coregularization

In this work, we consider multitask learning problems where clusters of nodes are interested in estimating their own parameter vector. Cooperation among clusters is beneficial when the optimal models of adjacent clusters have a good number of similar entries. We propose a fully distributed algorithm for solving this problem. The approach relies on minimizing a global mean-square error criterion regularized by non-differentiable terms to promote cooperation among neighboring clusters. A general diffusion forward-backward splitting strategy is introduced. Then, it is specialized to the case of sparsity promoting regularizers. A closed-form expression for the proximal operator of a weighted sum of $\ell_1$-norms is derived to achieve higher efficiency. We also provide conditions on the step-sizes that ensure convergence of the algorithm in the mean and mean-square error sense. Simulations are conducted to illustrate the effectiveness of the strategy.

preprint2015arXiv

Diffusion Adaptation over Multi-Agent Networks with Wireless Link Impairments

We study the performance of diffusion least-mean-square algorithms for distributed parameter estimation in multi-agent networks when nodes exchange information over wireless communication links. Wireless channel impairments, such as fading and path-loss, adversely affect the exchanged data and cause instability and performance degradation if left unattended. To mitigate these effects, we incorporate equalization coefficients into the diffusion combination step and update the combination weights dynamically in the face of randomly changing neighborhoods due to fading conditions. When channel state information (CSI) is unavailable, we determine the equalization factors from pilot-aided channel coefficient estimates. The analysis reveals that by properly monitoring the CSI over the network and choosing sufficiently small adaptation step-sizes, the diffusion strategies are able to deliver satisfactory performance in the presence of fading and path loss.

preprint2015arXiv

Diffusion LMS over Multitask Networks

The diffusion LMS algorithm has been extensively studied in recent years. This efficient strategy makes it possible to address distributed optimization problems over networks in the case where nodes have to collaboratively estimate a single parameter vector. Problems of this type are referred to as single-task problems. Nevertheless, there are several problems in practice that are multitask-oriented in the sense that the optimum parameter vector may not be the same for every node. This brings up the issue of studying the performance of the diffusion LMS algorithm when it is run, either intentionally or unintentionally, in a multitask environment. In this paper, we conduct a theoretical analysis on the stochastic behavior of diffusion LMS in the case where the so-called single-task hypothesis is violated. We explain under what conditions diffusion LMS continues to deliver performance superior to non-cooperative strategies in the multitask environment. When the conditions are violated, we explain how to endow the nodes with the ability to cluster with other similar nodes to remove bias. We propose an unsupervised clustering strategy that allows each node to select, via adaptive adjustments of combination weights, the neighboring nodes with which it can collaborate to estimate a common parameter vector. Simulations are presented to illustrate the theoretical results, and to demonstrate the efficiency of the proposed clustering strategy. The framework is applied to a useful problem involving a multi-target tracking task.
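For readers unfamiliar with the single-task baseline that this abstract starts from, the following is a minimal sketch of an adapt-then-combine diffusion LMS recursion. It is not the paper's multitask analysis or clustering scheme. The function names, the ring topology, the uniform averaging weights, and all parameter values are hypothetical choices made only for illustration.

```python
import random


def atc_diffusion_lms(neighbors, weights, sample, mu, dim, num_iters):
    """Adapt-then-combine diffusion LMS (illustrative sketch).

    neighbors[k]  : nodes whose intermediate estimates node k combines (includes k)
    weights[k][l] : combination weight node k gives to node l; sums to 1 over l
    sample(k)     : returns (u, d), a regressor list and a scalar measurement
    """
    n = len(neighbors)
    w = [[0.0] * dim for _ in range(n)]
    for _ in range(num_iters):
        # Adaptation step: each node runs one local LMS update.
        psi = []
        for k in range(n):
            u, d = sample(k)
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi.append([wi + mu * err * ui for ui, wi in zip(u, w[k])])
        # Combination step: each node averages its neighbors' intermediate estimates.
        w = [
            [sum(weights[k][l] * psi[l][j] for l in neighbors[k]) for j in range(dim)]
            for k in range(n)
        ]
    return w


def demo(seed=0):
    """Single-task example: every node observes the same (assumed) model w_true."""
    rng = random.Random(seed)
    w_true = [1.0, -0.5]  # hypothetical target vector
    n = 4                 # hypothetical ring of four nodes
    neighbors = [[(k - 1) % n, k, (k + 1) % n] for k in range(n)]
    weights = [{l: 1.0 / 3.0 for l in neighbors[k]} for k in range(n)]

    def sample(k):
        u = [rng.gauss(0.0, 1.0) for _ in w_true]
        d = sum(ui * wi for ui, wi in zip(u, w_true)) + rng.gauss(0.0, 0.1)
        return u, d

    estimates = atc_diffusion_lms(neighbors, weights, sample, mu=0.05, dim=2, num_iters=2000)
    return w_true, estimates


if __name__ == "__main__":
    target, estimates = demo()
    print("target:", target)
    for k, wk in enumerate(estimates):
        print("node", k, "estimate:", wk)
```

In a multitask setting, `sample(k)` would draw from a different target vector at different nodes, which is the situation where the abstract reports that fixed combination weights can introduce bias.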

preprint2015arXiv

Estimation of Space-Time Varying Parameters Using a Diffusion LMS Algorithm

We study the problem of distributed adaptive estimation over networks where nodes cooperate to estimate physical parameters that can vary over both space and time domains. We use a set of basis functions to characterize the space-varying nature of the parameters and propose a diffusion least mean-squares (LMS) strategy to recover these parameters from successive time measurements. We analyze the stability and convergence of the proposed algorithm, and derive closed-form expressions to predict its learning behavior and steady-state performance in terms of mean-square error. We find that in the estimation of the space-varying parameters using distributed approaches, the covariance matrix of the regression data at each node becomes rank-deficient. Our analysis reveals that the proposed algorithm can overcome this difficulty to a large extent by benefiting from the network stochastic matrices that are used to combine exchanged information between nodes. We provide computer experiments to illustrate and support the theoretical findings.

preprint2015arXiv

Information Exchange and Learning Dynamics over Weakly-Connected Adaptive Networks

The paper examines the learning mechanism of adaptive agents over weakly-connected graphs and reveals an interesting behavior on how information flows through such topologies. The results clarify how asymmetries in the exchange of data can mask local information at certain agents and make them totally dependent on other agents. A leader-follower relationship develops with the performance of some agents being fully determined by the performance of other agents that are outside their domain of influence. This scenario can arise, for example, due to intruder attacks by malicious agents or as the result of failures by some critical links. The findings in this work help explain why strong-connectivity of the network topology, adaptation of the combination weights, and clustering of agents are important ingredients to equalize the learning abilities of all agents against such disturbances. The results also clarify how weak-connectivity can be helpful in reducing the effect of outlier data on learning performance.

preprint2015arXiv

Information-Sharing over Adaptive Networks with Self-interested Agents

We examine the behavior of multi-agent networks where information-sharing is subject to a positive communications cost over the edges linking the agents. We consider a general mean-square-error formulation where all agents are interested in estimating the same target vector. We first show that, in the absence of any incentives to cooperate, the optimal strategy for the agents is to behave in a selfish manner with each agent seeking the optimal solution independently of the other agents. Pareto inefficiency arises as a result of the fact that agents are not using historical data to predict the behavior of their neighbors and to know whether they will reciprocate and participate in sharing information. Motivated by this observation, we develop a reputation protocol to summarize the opponent's past actions into a reputation score, which can then be used to form a belief about the opponent's subsequent actions. The reputation protocol entices agents to cooperate and turns their optimal strategy into an action-choosing strategy that enhances the overall social benefit of the network. In particular, we show that when the communications cost becomes large, the expected social benefit of the proposed protocol outperforms the social benefit that is obtained by cooperative agents that always share data. We perform a detailed mean-square-error analysis of the evolution of the network over three domains: far field, near-field, and middle-field, and show that the network behavior is stable for sufficiently small step-sizes. The various theoretical results are illustrated by numerical simulations.

preprint2015arXiv

On the Learning Behavior of Adaptive Networks - Part I: Transient Analysis

This work carries out a detailed transient analysis of the learning behavior of multi-agent networks, and reveals interesting results about the learning abilities of distributed strategies. Among other results, the analysis reveals how combination policies influence the learning process of networked agents, and how these policies can steer the convergence point towards any of many possible Pareto optimal solutions. The results also establish that the learning process of an adaptive network undergoes three (rather than two) well-defined stages of evolution with distinctive convergence rates during the first two stages, while attaining a finite mean-square-error (MSE) level in the last stage. The analysis reveals what aspects of the network topology influence performance directly and suggests design procedures that can optimize performance by adjusting the relevant topology parameters. Interestingly, it is further shown that, in the adaptation regime, each agent in a sparsely connected network is able to achieve the same performance level as that of a centralized stochastic-gradient strategy even for left-stochastic combination strategies. These results lead to a deeper understanding and useful insights on the convergence behavior of coupled distributed learners. The results also lead to effective design mechanisms to help diffuse information more thoroughly over networks.

preprint2015arXiv

On the Learning Behavior of Adaptive Networks - Part II: Performance Analysis

Part I of this work examined the mean-square stability and convergence of the learning process of distributed strategies over graphs. The results identified conditions on the network topology, utilities, and data in order to ensure stability; the results also identified three distinct stages in the learning behavior of multi-agent networks related to transient phases I and II and the steady-state phase. This Part II examines the steady-state phase of distributed learning by networked agents. Apart from characterizing the performance of the individual agents, it is shown that the network induces a useful equalization effect across all agents. In this way, the performance of noisier agents is enhanced to the same level as the performance of agents with less noisy data. It is further shown that in the small step-size regime, each agent in the network is able to achieve the same performance level as that of a centralized strategy corresponding to a fully connected network. The results in this part reveal explicitly which aspects of the network topology and operation influence performance and provide important insights into the design of effective mechanisms for the processing and diffusion of information over networks.

preprint2015arXiv

Stability and Performance Limits of Adaptive Primal-Dual Networks

This work studies distributed primal-dual strategies for adaptation and learning over networks from streaming data. Two first-order methods are considered based on the Arrow-Hurwicz (AH) and augmented Lagrangian (AL) techniques. Several revealing results are discovered in relation to the performance and stability of these strategies when employed over adaptive networks. The conclusions establish that the advantages that these methods have for deterministic optimization problems do not necessarily carry over to stochastic optimization problems. It is found that they have narrower stability ranges and worse steady-state mean-square-error performance than primal methods of the consensus and diffusion type. It is also found that the AH technique can become unstable under a partial observation model, while the other techniques are able to recover the unknown under this scenario. A method to enhance the performance of AL strategies is proposed by tying the selection of the step-size to their regularization parameter. It is shown that this method allows the AL algorithm to approach the performance of consensus and diffusion strategies but that it remains less stable than these other strategies.

preprint2014arXiv

Asynchronous Adaptation and Learning over Networks - Part II: Performance Analysis

In Part I, we introduced a fairly general model for asynchronous events over adaptive networks including random topologies, random link failures, random data arrival times, and agents turning on and off randomly. We performed a stability analysis and established the notable fact that the network is still able to converge in the mean-square-error sense to the desired solution. Once stable behavior is guaranteed, it becomes important to evaluate how fast the iterates converge and how close they get to the optimal solution. This is a demanding task due to the various asynchronous events and due to the fact that agents influence each other. In this Part II, we carry out a detailed analysis of the mean-square-error performance of asynchronous strategies for solving distributed optimization and adaptation problems over networks. We derive analytical expressions for the mean-square convergence rate and the steady-state mean-square-deviation. The expressions reveal how the various parameters of the asynchronous behavior influence network performance. In the process, we establish the interesting conclusion that even under the influence of asynchronous events, all agents in the adaptive network can still reach an $O(ν^{1 + γ_o'})$ near-agreement with some $γ_o' > 0$ while approaching the desired solution within $O(ν)$ accuracy, where $ν$ is proportional to the small step-size parameter for adaptation.

preprint2014arXiv

Asynchronous Adaptation and Learning over Networks - Part III: Comparison Analysis

In Part II [3] we carried out a detailed mean-square-error analysis of the performance of asynchronous adaptation and learning over networks under a fairly general model for asynchronous events including random topologies, random link failures, random data arrival times, and agents turning on and off randomly. In this Part III, we compare the performance of synchronous and asynchronous networks. We also compare the performance of decentralized adaptation against centralized stochastic-gradient (batch) solutions. Two interesting conclusions stand out. First, the results establish that the performance of adaptive networks is largely immune to the effect of asynchronous events: the mean and mean-square convergence rates and the asymptotic bias values are not degraded relative to synchronous or centralized implementations. Only the steady-state mean-square-deviation suffers a degradation in the order of $ν$, which represents the small step-size parameters used for adaptation. Second, the results show that the adaptive distributed network matches the performance of the centralized solution. These conclusions highlight another critical benefit of cooperation by networked agents: cooperation does not only enhance performance in comparison to stand-alone single-agent processing, but it also endows the network with remarkable resilience to various forms of random failure events and is able to deliver performance that is as powerful as batch solutions.

preprint2014arXiv

Asynchronous Adaptation and Learning over Networks - Part I: Modeling and Stability Analysis

In this work and the supporting Parts II [2] and III [3], we provide a rather detailed analysis of the stability and performance of asynchronous strategies for solving distributed optimization and adaptation problems over networks. We examine asynchronous networks that are subject to fairly general sources of uncertainties, such as changing topologies, random link failures, random data arrival times, and agents turning on and off randomly. Under this model, agents in the network may stop updating their solutions or may stop sending or receiving information in a random manner and without coordination with other agents. We establish in Part I conditions on the first and second-order moments of the relevant parameter distributions to ensure mean-square stable behavior. We derive in Part II expressions that reveal how the various parameters of the asynchronous behavior influence network performance. We compare in Part III the performance of asynchronous networks to the performance of both centralized solutions and synchronous networks. One notable conclusion is that the mean-square-error performance of asynchronous networks shows a degradation only of the order of $O(ν)$, where $ν$ is a small step-size parameter, while the convergence rate remains largely unaltered. The results provide a solid justification for the remarkable resilience of cooperative networks in the face of random failures at multiple levels: agents, links, data arrivals, and topology.

preprint2014arXiv

Dictionary Learning over Distributed Models

In this paper, we consider learning dictionary models over a network of agents, where each agent is only in charge of a portion of the dictionary elements. This formulation is relevant in Big Data scenarios where large dictionary models may be spread over different spatial locations and it is not feasible to aggregate all dictionaries in one location due to communication and privacy considerations. We first show that the dual function of the inference problem is an aggregation of individual cost functions associated with different agents, which can then be minimized efficiently by means of diffusion strategies. The collaborative inference step generates dual variables that are used by the agents to update their dictionaries without the need to share these dictionaries or even the coefficient models for the training data. This is a powerful property that leads to an effective distributed procedure for learning dictionaries over large networks (e.g., hundreds of agents in our experiments). Furthermore, the proposed learning strategy operates in an online manner and is able to respond to streaming data, where each data sample is presented to the network once.

preprint2014arXiv

Distributed Policy Evaluation Under Multiple Behavior Strategies

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).

preprint2013arXiv

Adaptive Penalty-Based Distributed Stochastic Convex Optimization

In this work, we study the task of distributed optimization over a network of learners in which each learner possesses a convex cost function, a set of affine equality constraints, and a set of convex inequality constraints. We propose a fully-distributed adaptive diffusion algorithm based on penalty methods that allows the network to cooperatively optimize the global cost function, which is defined as the sum of the individual costs over the network, subject to all constraints. We show that when small constant step-sizes are employed, the expected distance between the optimal solution vector and that obtained at each node in the network can be made arbitrarily small. Two distinguishing features of the proposed solution relative to other related approaches are that the developed strategy does not require the use of projections and is able to adapt to and track drifts in the location of the minimizer due to changes in the constraints or in the aggregate cost itself. The proposed strategy is also able to cope with changing network topology, is robust to network disruptions, and does not require global information or rely on central processors.

preprint2013arXiv

Diffusion Adaptation over Networks

Adaptive networks are well-suited to perform decentralized information processing and optimization tasks and to model various types of self-organized and complex behavior encountered in nature. Adaptive networks consist of a collection of agents with processing and learning abilities. The agents are linked together through a connection topology, and they cooperate with each other through local interactions to solve distributed optimization, estimation, and inference problems in real-time. The continuous diffusion of information across the network enables agents to adapt their performance in relation to streaming data and network conditions; it also results in improved adaptation and learning performance relative to non-cooperative agents. This article provides an overview of diffusion strategies for adaptation and learning over networks. The article is divided into several sections: 1. Motivation; 2. Mean-Square-Error Estimation; 3. Distributed Optimization via Diffusion Strategies; 4. Adaptive Diffusion Strategies; 5. Performance of Steepest-Descent Diffusion Strategies; 6. Performance of Adaptive Diffusion Strategies; 7. Comparing the Performance of Cooperative Strategies; 8. Selecting the Combination Weights; 9. Diffusion with Noisy Information Exchanges; 10. Extensions and Further Considerations; Appendix A: Properties of Kronecker Products; Appendix B: Graph Laplacian and Network Connectivity; Appendix C: Stochastic Matrices; Appendix D: Block Maximum Norm; Appendix E: Comparison with Consensus Strategies; References.

preprint2013arXiv

Distributed Decision-Making over Adaptive Networks

In distributed processing, agents generally collect data generated by the same underlying unknown model (represented by a vector of parameters) and then solve an estimation or inference task cooperatively. In this paper, we consider the situation in which the data observed by the agents may have arisen from two different models. Agents do not know beforehand which model accounts for their data and the data of their neighbors. The objective for the network is for all agents to reach agreement on which model to track and to estimate this model cooperatively. In these situations, where agents are subject to data from unknown different sources, conventional distributed estimation strategies would lead to biased estimates relative to any of the underlying models. We first show how to modify existing strategies to guarantee unbiasedness. We then develop a classification scheme for the agents to identify the models that generated the data, and propose a procedure by which the entire network can be made to converge towards the same model through a collaborative decision-making process. The resulting algorithm is applied to model fish foraging behavior in the presence of two food sources.

preprint2013arXiv

On Distributed Online Classification in the Midst of Concept Drifts

In this work, we analyze the generalization ability of distributed online learning algorithms under stationary and non-stationary environments. We derive bounds for the excess-risk attained by each node in a connected network of learners and study the performance advantage that diffusion strategies have over individual non-cooperative processing. We conduct extensive simulations to illustrate the results.

preprint2012arXiv

Diffusion Adaptation over Networks under Imperfect Information Exchange and Non-stationary Data

Adaptive networks rely on in-network and collaborative processing among distributed agents to deliver enhanced performance in estimation and inference tasks. Information is exchanged among the nodes, usually over noisy links. The combination weights that are used by the nodes to fuse information from their neighbors play a critical role in influencing the adaptation and tracking abilities of the network. This paper first investigates the mean-square performance of general adaptive diffusion algorithms in the presence of various sources of imperfect information exchanges, quantization errors, and model non-stationarities. Among other results, the analysis reveals that link noise over the regression data modifies the dynamics of the network evolution in a distinct way, and leads to biased estimates in steady-state. The analysis also reveals how the network mean-square performance is dependent on the combination weights. We use these observations to show how the combination weights can be optimized and adapted. Simulation results illustrate the theoretical findings and match well with theory.

preprint2012arXiv

Diffusion Adaptation Strategies for Distributed Optimization and Learning over Networks

We propose an adaptive diffusion mechanism to optimize a global cost function in a distributed manner over a network of nodes. The cost function is assumed to consist of a collection of individual components. Diffusion adaptation allows the nodes to cooperate and diffuse information in real-time; it also helps alleviate the effects of stochastic gradient noise and measurement noise through a continuous learning process. We analyze the mean-square-error performance of the algorithm in some detail, including its transient and steady-state behavior. We also apply the diffusion algorithm to two problems: distributed estimation with sparse parameters and distributed localization. Compared to well-studied incremental methods, diffusion methods do not require the use of a cyclic path over the nodes and are robust to node and link failure. Diffusion methods also endow networks with adaptation abilities that enable the individual nodes to continue learning even when the cost function changes with time. Examples involving such dynamic cost functions with moving targets are common in the context of biological networks.

preprint2012arXiv

Diffusion Strategies Outperform Consensus Strategies for Distributed Estimation over Adaptive Networks

Adaptive networks consist of a collection of nodes with adaptation and learning abilities. The nodes interact with each other on a local level and diffuse information across the network to solve estimation and inference tasks in a distributed manner. In this work, we compare the mean-square performance of two main strategies for distributed estimation over networks: consensus strategies and diffusion strategies. The analysis in the paper confirms that under constant step-sizes, diffusion strategies allow information to diffuse more thoroughly through the network and this property has a favorable effect on the evolution of the network: diffusion networks are shown to converge faster and reach lower mean-square deviation than consensus networks, and their mean-square stability is insensitive to the choice of the combination weights. In contrast, and surprisingly, it is shown that consensus networks can become unstable even if all the individual nodes are stable and able to solve the estimation task on their own. When this occurs, cooperation over the network leads to a catastrophic failure of the estimation task. This phenomenon does not occur for diffusion networks: we show that stability of the individual nodes always ensures stability of the diffusion network irrespective of the combination topology. Simulation results support the theoretical findings.

preprint2012arXiv

Distributed Pareto Optimization via Diffusion Strategies

We consider solving multi-objective optimization problems in a distributed manner by a network of cooperating and learning agents. The problem is equivalent to optimizing a global cost that is the sum of individual components. The optimizers of the individual components do not necessarily coincide and the network therefore needs to seek Pareto optimal solutions. We develop a distributed solution that relies on a general class of adaptive diffusion strategies. We show how the diffusion process can be represented as the cascade composition of three operators: two combination operators and a gradient descent operator. Using the Banach fixed-point theorem, we establish the existence of a unique fixed point for the composite cascade. We then study how close each agent converges towards this fixed point, and also examine how close the Pareto solution is to the fixed point. We perform a detailed mean-square error analysis and establish that all agents are able to converge to the same Pareto optimal solution within a sufficiently small mean-square-error (MSE) bound even for constant step-sizes. We illustrate one application of the theory to collaborative decision making in finance by a network of agents.

preprint2012arXiv

On the Influence of Informed Agents on Learning and Adaptation over Networks

Adaptive networks consist of a collection of agents with adaptation and learning abilities. The agents interact with each other on a local level and diffuse information across the network through their collaborations. In this work, we consider two types of agents: informed agents and uninformed agents. The former receive new data regularly and perform consultation and in-network tasks, while the latter do not collect data and only participate in the consultation tasks. We examine the performance of adaptive networks as a function of the proportion of informed agents and their distribution in space. The results reveal some interesting and surprising trade-offs between convergence rate and mean-square performance. In particular, among other results, it is shown that the performance of adaptive networks does not necessarily improve with a larger proportion of informed agents. Instead, it is established that the larger the proportion of informed agents is, the faster the convergence rate of the network becomes albeit at the expense of some deterioration in mean-square performance. The results further establish that uninformed agents play an important role in determining the steady-state performance of the network, and that it is preferable to keep some of the highly connected agents uninformed. The arguments reveal an important interplay among three factors: the number and distribution of informed agents in the network, the convergence rate of the learning process, and the estimation accuracy in steady-state. Expressions that quantify these relations are derived, and simulations are included to support the theoretical findings. We further apply the results to two models that are widely used to represent behavior over complex networks, namely, the Erdos-Renyi and scale-free models.

preprint2012arXiv

Performance Limits for Distributed Estimation Over LMS Adaptive Networks

In this work we analyze the mean-square performance of different strategies for distributed estimation over least-mean-squares (LMS) adaptive networks. The results highlight some useful properties for distributed adaptation in comparison to fusion-based centralized solutions. The analysis establishes that, by optimizing over the combination weights, diffusion strategies can deliver lower excess-mean-square-error than centralized solutions employing traditional block or incremental LMS strategies. We first study in some detail the situation involving combinations of two adaptive agents and then extend the results to generic N-node ad-hoc networks. In the latter case, we establish that, for sufficiently small step-sizes, diffusion strategies can outperform centralized block or incremental LMS strategies by optimizing over left-stochastic combination weighting matrices. The results suggest more efficient ways for organizing and processing data at fusion centers, and present useful adaptive strategies that are able to enhance performance when implemented in a distributed manner.

preprint2012arXiv

Sparse Distributed Learning Based on Diffusion Adaptation

This article proposes diffusion LMS strategies for distributed estimation over adaptive networks that are able to exploit sparsity in the underlying system model. The approach relies on convex regularization, common in compressive sensing, to enhance the detection of sparsity via a diffusive process over the network. The resulting algorithms endow networks with learning abilities and allow them to learn the sparse structure from the incoming data in real-time, and also to track variations in the sparsity of the model. We provide convergence and mean-square performance analysis of the proposed method and show under what conditions it outperforms the unregularized diffusion version. We also show how to adaptively select the regularization parameter. Simulation results illustrate the advantage of the proposed filters for sparse data recovery.
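The idea in the abstract above can be illustrated with a minimal sketch, not the authors' algorithm. It assumes an adapt-then-combine diffusion LMS update in which the convex regularizer is the l1 norm, so its subgradient adds a small "zero-attracting" term to each agent's adaptation step. The ring topology, the uniform combination weights, the step-size `mu`, the regularization weight `rho`, and the synthetic data model are all illustrative assumptions.

```python
import numpy as np

def sparse_diffusion_lms(n_agents=10, dim=20, n_iter=2000,
                         mu=0.02, rho=1e-3, noise_std=0.1, seed=0):
    """Adapt-then-combine diffusion LMS with an l1 (zero-attracting) term.

    All parameter values are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)

    # Sparse model to be estimated: only two nonzero entries (assumed).
    w_true = np.zeros(dim)
    w_true[[3, 11]] = 1.0

    # Ring network with self-loops; each agent averages itself and its
    # two neighbors with weight 1/3 (a doubly stochastic matrix).
    shift = np.roll(np.eye(n_agents), 1, axis=0)
    A = (np.eye(n_agents) + shift + shift.T) / 3.0

    W = np.zeros((n_agents, dim))  # row k holds agent k's estimate
    for _ in range(n_iter):
        # Streaming data: one regressor and one noisy measurement per agent.
        U = rng.standard_normal((n_agents, dim))
        d = U @ w_true + noise_std * rng.standard_normal(n_agents)

        # Adaptation: LMS step plus the subgradient of rho * ||w||_1.
        err = d - np.sum(U * W, axis=1)
        psi = W + mu * err[:, None] * U - mu * rho * np.sign(W)

        # Combination: w_k = sum_l A[l, k] * psi_l.
        W = A.T @ psi

    # Network mean-square deviation at the final iteration.
    msd = float(np.mean(np.sum((W - w_true) ** 2, axis=1)))
    return W, msd

W, msd = sparse_diffusion_lms()
print(f"network MSD after adaptation: {msd:.2e}")
```

Setting `rho=0` recovers the unregularized diffusion version, which makes it easy to compare the two on the same data by reusing the seed.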

preprint · 2009 · arXiv

Spectrum sensing by cognitive radios at very low SNR

Spectrum sensing is one of the enabling functionalities for cognitive radio (CR) systems to operate in the spectrum white space. To protect the primary incumbent users from interference, the CR is required to detect incumbent signals at very low signal-to-noise ratio (SNR). In this paper, we present a spectrum sensing technique based on correlating spectra for detection of television (TV) broadcasting signals. The basic strategy is to correlate the periodogram of the received signal with the a priori known spectral features of the primary signal. We show that according to the Neyman-Pearson criterion, this spectral correlation-based sensing technique is asymptotically optimal at very low SNR and with a large sensing time. From the system design perspective, we analyze the effect of the spectral features on the spectrum sensing performance. Through the optimization analysis, we obtain useful insights on how to choose effective spectral features to achieve reliable sensing. Simulation results show that the proposed sensing technique can reliably detect analog and digital TV signals at SNR as low as -20 dB.
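The basic strategy described above, correlating the periodogram of the received signal with a known spectral feature, can be sketched as follows. This is a toy illustration under stated assumptions, not the detector from the paper: the "primary signal" is a single complex pilot tone, the spectral feature is an indicator on the pilot's frequency bin, the noise is white complex Gaussian with unit variance, and the names `periodogram`, `correlation_statistic`, and `simulate` are hypothetical.

```python
import numpy as np

def periodogram(x):
    """Periodogram of x: squared FFT magnitude divided by the length."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

def correlation_statistic(x, feature):
    """Inner product of the periodogram of x with the known spectral feature."""
    return float(np.dot(periodogram(x), feature))

def simulate(n=4096, pilot_bin=300, snr_db=-20.0, seed=0):
    """Return the statistic under noise only and under pilot plus noise."""
    rng = np.random.default_rng(seed)

    # Assumed a priori feature: all spectral weight on the pilot bin.
    feature = np.zeros(n)
    feature[pilot_bin] = 1.0

    def noise():
        # Complex white Gaussian noise with unit total variance.
        return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

    # Pilot tone whose power sets the SNR relative to unit-variance noise.
    amplitude = 10 ** (snr_db / 20)
    t = np.arange(n)
    pilot = amplitude * np.exp(2j * np.pi * pilot_bin * t / n)

    t_noise = correlation_statistic(noise(), feature)
    t_signal = correlation_statistic(pilot + noise(), feature)
    return t_noise, t_signal

t_noise, t_signal = simulate()
print(f"statistic, noise only:    {t_noise:.2f}")
print(f"statistic, pilot present: {t_signal:.2f}")
```

In this toy setup the pilot's contribution to the statistic grows in proportion to the number of samples while the noise contribution stays near one on average, which is why a long sensing time helps at very low SNR. A detector would compare the statistic with a threshold chosen for a target false-alarm rate.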

preprint · 2008 · arXiv

Wideband Spectrum Sensing in Cognitive Radio Networks

Spectrum sensing is an essential enabling functionality for cognitive radio networks to detect spectrum holes and opportunistically use the under-utilized frequency bands without causing harmful interference to legacy networks. This paper introduces a novel wideband spectrum sensing technique, called multiband joint detection, which jointly detects the signal energy levels over multiple frequency bands rather than consider one band at a time. The proposed strategy is efficient in improving the dynamic spectrum utilization and reducing interference to the primary users. The spectrum sensing problem is formulated as a class of optimization problems in interference limited cognitive radio networks. By exploiting the hidden convexity in the seemingly non-convex problem formulations, optimal solutions for multiband joint detection are obtained under practical conditions. Simulation results show that the proposed spectrum sensing schemes can considerably improve the system performance. This paper establishes important principles for the design of wideband spectrum sensing algorithms in cognitive radio networks.

55 published item(s)

High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

Enforcing Privacy in Distributed Learning with Performance Guarantees

Privatized Graph Federated Learning

A Fundamental Limit of Distributed Hypothesis Testing Under Memoryless Quantization

Decentralized learning in the presence of low-rank noise

Hidden Markov Modeling over Graphs

Social Learning under Randomized Collaborations

A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features

Adaptation in Online Social Learning

Affine Combination of Diffusion Strategies over Networks

Decentralized Proximal Gradient Algorithms with Linear Convergence Rates

Diffusion LMS with Communication Delays: Stability and Performance Analysis

Dynamic Federated Learning

Graph Learning over Partially Observed Diffusion Networks: Role of Degree Concentration

Graph Learning Under Partial Observability

ISL: A novel approach for deep exploration

Learning Graph Influence from Social Interactions

Linear Convergence of Primal-Dual Gradient Methods and their Performance in Distributed Optimization

Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization

Supervised Learning Under Distributed Features

Adaptation and learning over networks under subspace constraints - Part I: Stability Analysis

Diffusion Estimation Over Cooperative Multi-Agent Networks With Missing Data

Diffusion-Based Adaptive Distributed Detection: Steady-State Performance in the Slow Adaptation Regime

Distributed Detection over Adaptive Networks: Refined Asymptotics and the Role of Connectivity

Excess-Risk of Distributed Stochastic Learners

Multitask diffusion adaptation over asynchronous networks

On the Influence of Momentum Acceleration on Online Learning

Online Dual Coordinate Ascent Learning

Proximal Multitask Learning over Networks with Sparsity-inducing Coregularization

Diffusion Adaptation over Multi-Agent Networks with Wireless Link Impairments

Diffusion LMS over Multitask Networks

Estimation of Space-Time Varying Parameters Using a Diffusion LMS Algorithm

Information Exchange and Learning Dynamics over Weakly-Connected Adaptive Networks

Information-Sharing over Adaptive Networks with Self-interested Agents

On the Learning Behavior of Adaptive Networks - Part I: Transient Analysis

On the Learning Behavior of Adaptive Networks - Part II: Performance Analysis

Stability and Performance Limits of Adaptive Primal-Dual Networks

Asynchronous Adaptation and Learning over Networks - Part II: Performance Analysis

Asynchronous Adaptation and Learning over Networks - Part III: Comparison Analysis

Asynchronous Adaptation and Learning over Networks - Part I: Modeling and Stability Analysis

Dictionary Learning over Distributed Models

Distributed Policy Evaluation Under Multiple Behavior Strategies

Adaptive Penalty-Based Distributed Stochastic Convex Optimization

Diffusion Adaptation over Networks

Distributed Decision-Making over Adaptive Networks

On Distributed Online Classification in the Midst of Concept Drifts

Diffusion Adaptation over Networks under Imperfect Information Exchange and Non-stationary Data

Diffusion Adaptation Strategies for Distributed Optimization and Learning over Networks

Diffusion Strategies Outperform Consensus Strategies for Distributed Estimation over Adaptive Networks

Distributed Pareto Optimization via Diffusion Strategies

On the Influence of Informed Agents on Learning and Adaptation over Networks

Performance Limits for Distributed Estimation Over LMS Adaptive Networks

Sparse Distributed Learning Based on Diffusion Adaptation

Spectrum sensing by cognitive radios at very low SNR

Wideband Spectrum Sensing in Cognitive Radio Networks