Researcher profile

Ali H. Sayed

Ali H. Sayed contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
21 works
0 followers
13 topics
4 close collaborators

Published work

21 published item(s)

preprint 2026 arXiv

High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent's cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/\delta)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/\delta)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $\delta \in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
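As a rough illustration of the gradient tracking idea named in this abstract, the sketch below runs one common form of the recursion, in which each agent keeps an auxiliary tracker of the network-average gradient alongside its model. The exact update order, the ring topology, the quadratic costs and the names `ring_weights` and `gradient_tracking` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def ring_weights(n):
    """Symmetric, doubly stochastic weights for a ring of n agents (illustrative choice)."""
    shift = np.roll(np.eye(n), 1, axis=1)
    return 0.5 * np.eye(n) + 0.25 * (shift + shift.T)

def gradient_tracking(W, grads, x0, step, num_iters):
    """Run a standard gradient tracking recursion (one of several known variants).

    W      : (n, n) doubly stochastic combination matrix
    grads  : list of n callables, grads[i](x) returns agent i's (possibly noisy) gradient
    x0     : (n, d) array, one row per agent
    """
    x = x0.copy()
    g_old = np.stack([g(x[i]) for i, g in enumerate(grads)])
    y = g_old.copy()  # trackers start at the local gradients
    for _ in range(num_iters):
        # Each agent steps along its tracker, then averages with its neighbors.
        x = W @ (x - step * y)
        g_new = np.stack([g(x[i]) for i, g in enumerate(grads)])
        # Tracker update: mix with neighbors and add the local gradient increment.
        y = W @ y + g_new - g_old
        g_old = g_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.standard_normal((4, 2))  # agent i minimizes 0.5 * ||x - targets[i]||^2
    noisy = [lambda x, b=b: x - b + 0.01 * rng.standard_normal(x.shape) for b in targets]
    x = gradient_tracking(ring_weights(4), noisy, np.zeros((4, 2)), step=0.1, num_iters=500)
    print("agent models:\n", x)
    print("network minimizer:", targets.mean(axis=0))
```

With heterogeneous local costs, the tracker `y` is what lets every agent approach the minimizer of the aggregate cost rather than a point biased toward its own data.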

preprint 2023 arXiv

Enforcing Privacy in Distributed Learning with Performance Guarantees

We study the privatization of distributed learning and optimization strategies. We focus on differential privacy schemes and study their effect on performance. We show that the popular additive random perturbation scheme degrades performance because it is not well-tuned to the graph structure. For this reason, we exploit two alternative graph-homomorphic constructions and show that they improve performance while guaranteeing privacy. Moreover, contrary to most earlier studies, the gradient of the risks is not assumed to be bounded (a condition that rarely holds in practice; e.g., quadratic risk). We avoid this condition and still devise a differentially private scheme with high probability. We examine optimization and learning scenarios and illustrate the theoretical findings through simulations.

preprint 2023 arXiv

Privatized Graph Federated Learning

Federated learning is a semi-distributed algorithm, where a server communicates with multiple dispersed clients to learn a global model. The federated architecture is not robust and is sensitive to communication and computational overloads due to its one-master multi-client structure. It can also be subject to privacy attacks targeting personal information on the communication links. In this work, we introduce graph federated learning (GFL), which consists of multiple federated units connected by a graph. We then show how graph homomorphic perturbations can be used to ensure the algorithm is differentially private. We conduct both convergence and privacy theoretical analyses and illustrate performance by means of computer simulations.

preprint 2022 arXiv

A Fundamental Limit of Distributed Hypothesis Testing Under Memoryless Quantization

We study a distributed hypothesis testing setup where peripheral nodes send quantized data to the fusion center in a memoryless fashion. The \emph{expected} number of bits sent by each node under the null hypothesis is kept limited. We characterize the optimal decay rate of the mis-detection (type-II error) probability provided that false alarms (type-I error) are rare, and study the tradeoff between the communication rate and maximal type-II error decay rate. We resort to rate-distortion methods to provide upper bounds to the tradeoff curve and show that at high rates lattice quantization achieves near-optimal performance. We also characterize the tradeoff for the case where nodes are allowed to record and quantize a fixed number of samples. Moreover, under sum-rate constraints, we show that an upper bound to the tradeoff curve is obtained with a water-filling solution.

preprint 2022 arXiv

Decentralized learning in the presence of low-rank noise

Observations collected by agents in a network may be unreliable due to observation noise or interference. This paper proposes a distributed algorithm that allows each node to improve the reliability of its own observation by relying solely on local computations and interactions with immediate neighbors, assuming that the field (graph signal) monitored by the network lies in a low-dimensional subspace and that a low-rank noise is present in addition to the usual full-rank noise. While oblique projections can be used to project measurements onto a low-rank subspace along a direction that is oblique to the subspace, the resulting solution is not distributed. Starting from the centralized solution, we propose an algorithm that performs the oblique projection of the overall set of observations onto the signal subspace in an iterative and distributed manner. We then show how the oblique projection framework can be extended to handle distributed learning and adaptation problems over networks.

preprint 2022 arXiv

Hidden Markov Modeling over Graphs

This work proposes a multi-agent filtering algorithm over graphs for finite-state hidden Markov models (HMMs), which can be used for sequential state estimation or for tracking opinion formation over dynamic social networks. We show that the difference from the optimal centralized Bayesian solution is asymptotically bounded for geometrically ergodic transition models. Experiments illustrate the theoretical findings and in particular, demonstrate the superior performance of the proposed algorithm compared to a state-of-the-art social learning algorithm.

preprint 2022 arXiv

Social Learning under Randomized Collaborations

We study a social learning scheme where at every time instant, each agent chooses to receive information from one of its neighbors at random. We show that under this sparser communication scheme, the agents learn the truth eventually and the asymptotic convergence rate remains the same as the standard algorithms which use more communication resources. We also derive large deviation estimates of the log-belief ratios for a special case where each agent replaces its belief with that of the chosen neighbor.

preprint 2020 arXiv

A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features

This work studies multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents. This scenario arises in many machine learning and engineering applications, such as regression over distributed features and resource allocation. We reformulate this problem into an equivalent saddle-point problem, which is amenable to decentralized solutions. We then propose a proximal primal-dual algorithm and establish its linear convergence to the optimal solution when the local functions are strongly-convex. To our knowledge, this is the first linearly convergent decentralized algorithm for multi-agent sharing problems with a general convex (possibly non-smooth) coupling function.

preprint 2020 arXiv

Adaptation in Online Social Learning

This work studies social learning under non-stationary conditions. Although designed for online inference, classic social learning algorithms perform poorly under drifting conditions. To mitigate this drawback, we propose the Adaptive Social Learning (ASL) strategy. This strategy leverages an adaptive Bayesian update, where the adaptation degree can be modulated by tuning a suitable step-size parameter. The learning performance of the ASL algorithm is examined by means of a steady-state analysis. It is shown that, under the regime of small step-sizes: i) consistent learning is possible; ii) an accurate prediction of the performance can be furnished in terms of a Gaussian approximation.

preprint 2020 arXiv

Affine Combination of Diffusion Strategies over Networks

Diffusion adaptation is a powerful strategy for distributed estimation and learning over networks. Motivated by the concept of combining adaptive filters, this work proposes a combination framework that aggregates the operation of multiple diffusion strategies for enhanced performance. By assigning a combination coefficient to each node, and using an adaptation mechanism to minimize the network error, we obtain a combined diffusion strategy that benefits from the best characteristics of all component strategies simultaneously in terms of excess-mean-square error (EMSE). Analyses of the universality are provided to show the superior performance of affine combination scheme and to characterize its behavior in the mean and mean-square sense. Simulation results are presented to demonstrate the effectiveness of the proposed strategies, as well as the accuracy of theoretical findings.

preprint 2020 arXiv

Decentralized Proximal Gradient Algorithms with Linear Convergence Rates

This work studies a class of non-smooth decentralized multi-agent optimization problems where the agents aim at minimizing a sum of local strongly-convex smooth components plus a common non-smooth term. We propose a general primal-dual algorithmic framework that unifies many existing state-of-the-art algorithms. We establish linear convergence of the proposed method to the exact solution in the presence of the non-smooth term. Moreover, for the more general class of problems with agent specific non-smooth terms, we show that linear convergence cannot be achieved (in the worst case) for the class of algorithms that uses the gradients and the proximal mappings of the smooth and non-smooth parts, respectively. We further provide a numerical counterexample that shows how some state-of-the-art algorithms fail to converge linearly for strongly-convex objectives and different local non-smooth terms.

preprint 2020 arXiv

Diffusion LMS with Communication Delays: Stability and Performance Analysis

We study the problem of distributed estimation over adaptive networks where communication delays exist between nodes. In particular, we investigate the diffusion Least-Mean-Square (LMS) strategy where delayed intermediate estimates (due to the communication channels) are employed during the combination step. One important question is: Do the delays affect the stability condition and performance? To answer this question, we conduct a detailed performance analysis in the mean and in the mean-square-error sense of the diffusion LMS with delayed estimates. Stability conditions, transient and steady-state mean-square-deviation (MSD) expressions are provided. One of the main findings is that diffusion LMS with delays can still converge under the same step-size condition as the diffusion LMS without delays. Finally, simulation results illustrate the theoretical findings.
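To make the setting of this abstract concrete, here is a minimal sketch of adapt-then-combine diffusion LMS in which each agent combines its own fresh intermediate estimate with neighbor estimates that arrive late. The uniform delay shared by all links, the white Gaussian regressors, and the function name `diffusion_lms_delayed` are assumptions for illustration only; the paper's delay model may differ.

```python
import numpy as np
from collections import deque

def diffusion_lms_delayed(A, w_true, step, delay, num_iters, noise_std=0.01, seed=0):
    """Adapt-then-combine diffusion LMS with a uniform neighbor delay (assumed model).

    A[l, k] is the weight agent k gives to agent l; each column of A sums to one.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], w_true.shape[0]
    w = np.zeros((n, m))
    # Buffer of past intermediate estimates; history[0] is `delay` steps old.
    history = deque([np.zeros((n, m))] * (delay + 1), maxlen=delay + 1)
    self_weights = np.diag(A)
    neighbor_weights = A - np.diag(self_weights)
    for _ in range(num_iters):
        U = rng.standard_normal((n, m))                      # one regressor per agent
        d = U @ w_true + noise_std * rng.standard_normal(n)  # noisy measurements
        err = d - np.sum(U * w, axis=1)
        psi = w + step * err[:, None] * U                    # adaptation step
        history.append(psi)
        psi_delayed = history[0]
        # Combination step: own estimate is current, neighbor estimates are delayed.
        w = self_weights[:, None] * psi + neighbor_weights.T @ psi_delayed
    return w

if __name__ == "__main__":
    n = 4
    shift = np.roll(np.eye(n), 1, axis=1)
    A = 0.5 * np.eye(n) + 0.25 * (shift + shift.T)  # illustrative ring weights
    w_true = np.array([1.0, -0.5, 0.25])
    w = diffusion_lms_delayed(A, w_true, step=0.05, delay=2, num_iters=3000)
    print("max deviation from w_true:", np.abs(w - w_true).max())
```

Setting `delay=0` recovers the usual diffusion LMS recursion, which makes it easy to compare the two cases under the same step-size.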

preprint 2020 arXiv

Dynamic Federated Learning

Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. While many federated learning architectures process data in an online manner, and are hence adaptive by nature, most performance analyses assume static optimization problems and offer no guarantees in the presence of drifts in the problem solution or data characteristics. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm. The results clarify the trade-off between convergence and tracking performance.

preprint 2020 arXiv

Graph Learning over Partially Observed Diffusion Networks: Role of Degree Concentration

This work examines the problem of graph learning over a diffusion network when data can be collected from a limited portion of the network (partial observability). The main question is to establish technical guarantees of consistent recovery of the subgraph of probed network nodes, i) despite the presence of unobserved nodes; and ii) under different connectivity regimes, including the dense regime where the probed nodes are influenced by many connections coming from the unobserved ones. We ascertain that suitable estimators of the combination matrix (i.e., the matrix that quantifies the pairwise interaction between nodes) possess an identifiability gap that enables the discrimination between connected and disconnected nodes. Fundamental conditions are established under which the subgraph of monitored nodes can be recovered, with high probability as the network size increases, through universal clustering algorithms. This claim is proved for three matrix estimators: i) the Granger estimator that adapts to the partial observability setting the solution that is exact under full observability; ii) the one-lag correlation matrix; and iii) the residual estimator based on the difference between two consecutive time samples. A detailed characterization of the asymptotic behavior of these estimators is established in terms of an error bias and of the identifiability gap, and a sample complexity analysis is performed to establish how the number of samples scales with the network size to achieve consistent learning. Comparison among the estimators is performed through illustrative examples that show how estimators that are not optimal in the full observability regime can outperform the Granger estimator in the partial observability regime. The analysis reveals that the fundamental property enabling consistent graph learning is the statistical concentration of node degrees.

preprint 2020 arXiv

Graph Learning Under Partial Observability

Many optimization, inference and learning tasks can be accomplished efficiently by means of decentralized processing algorithms where the network topology (i.e., the graph) plays a critical role in enabling the interactions among neighboring nodes. There is a large body of literature examining the effect of the graph structure on the performance of decentralized processing strategies. In this article, we examine the inverse problem and consider the reverse question: How much information does observing the behavior at the nodes of a graph convey about the underlying topology? For large-scale networks, the difficulty in addressing such inverse problems is compounded by the fact that usually only a limited fraction of the nodes can be probed, giving rise to a second important question: Despite the presence of unobserved nodes, can partial observations still be sufficient to discover the graph linking the probed nodes? The article surveys recent advances on this challenging learning problem and related questions.

preprint 2020 arXiv

ISL: A novel approach for deep exploration

In this article we explore an alternative approach to address deep exploration and we introduce the ISL algorithm, which is efficient at performing deep exploration. Similarly to maximum entropy RL, we derive the algorithm by augmenting the traditional RL objective with a novel regularization term. A distinctive feature of our approach is that, as opposed to other works that tackle the problem of deep exploration, in our derivation both the learning equations and the exploration-exploitation strategy are derived in tandem as the solution to a well-posed optimization problem whose minimization leads to the optimal value function. Empirically we show that our method exhibits state of the art performance on a range of challenging deep-exploration benchmarks.

preprint 2020 arXiv

Learning Graph Influence from Social Interactions

In social learning, agents form their opinions or beliefs about certain hypotheses by exchanging local information. This work considers the recent paradigm of weak graphs, where the network is partitioned into sending and receiving components, with the former having the possibility of exerting a domineering effect on the latter. Such graph structures are prevalent over social platforms. We will not be focusing on the direct social learning problem (which examines what agents learn), but rather on the dual or reverse learning problem (which examines how agents learned). Specifically, from observations of the stream of beliefs at certain agents, we would like to examine whether it is possible to learn the strength of the connections (influences) from sending components in the network to these receiving agents.

preprint 2020 arXiv

Linear Convergence of Primal-Dual Gradient Methods and their Performance in Distributed Optimization

In this work, we revisit a classical incremental implementation of the primal-descent dual-ascent gradient method used for the solution of equality constrained optimization problems. We provide a short proof that establishes the linear (exponential) convergence of the algorithm for smooth strongly-convex cost functions and study its relation to the non-incremental implementation. We also study the effect of the augmented Lagrangian penalty term on the performance of distributed optimization algorithms for the minimization of aggregate cost functions over multi-agent networks.
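The incremental primal-descent dual-ascent recursion mentioned in this abstract can be sketched as follows for a problem of the form: minimize f(x) subject to Ax = b. Here "incremental" is read as the dual step using the freshly updated primal iterate. The single shared step-size, the quadratic test cost and the name `incremental_primal_dual` are assumptions for this example rather than details confirmed by the paper.

```python
import numpy as np

def incremental_primal_dual(grad_f, A, b, x0, step, num_iters):
    """Primal gradient descent followed by dual gradient ascent on the Lagrangian.

    The dual update uses the new primal iterate, which is what makes the
    scheme incremental rather than simultaneous.
    """
    x = np.asarray(x0, dtype=float).copy()
    lam = np.zeros(A.shape[0])
    for _ in range(num_iters):
        x = x - step * (grad_f(x) + A.T @ lam)  # primal descent
        lam = lam + step * (A @ x - b)          # dual ascent with updated x
    return x, lam

if __name__ == "__main__":
    # Illustrative problem: project c onto the line x1 + x2 = 1.
    c = np.array([2.0, 0.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])
    x, lam = incremental_primal_dual(lambda x: x - c, A, b, np.zeros(2), step=0.1, num_iters=2000)
    print("x:", x, "lambda:", lam)
```

For this strongly-convex quadratic the iterates settle at the constrained minimizer, consistent with the linear convergence the abstract describes for smooth strongly-convex costs.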

preprint 2020 arXiv

Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization

Rapid advances in data collection and processing capabilities have allowed for the use of increasingly complex models that give rise to nonconvex optimization problems. These formulations, however, can be arbitrarily difficult to solve in general, in the sense that even simply verifying that a given point is a local minimum can be NP-hard [1]. Still, some relatively simple algorithms have been shown to lead to surprisingly good empirical results in many contexts of interest. Perhaps the most prominent example is the success of the backpropagation algorithm for training neural networks. Several recent works have pursued rigorous analytical justification for this phenomenon by studying the structure of the nonconvex optimization problems and establishing that simple algorithms, such as gradient descent and its variations, perform well in converging towards local minima and avoiding saddle-points. A key insight in these analyses is that gradient perturbations play a critical role in allowing local descent algorithms to efficiently distinguish desirable from undesirable stationary points and escape from the latter. In this article, we cover recent results on second-order guarantees for stochastic first-order optimization algorithms in centralized, federated, and decentralized architectures.

preprint 2020 arXiv

Supervised Learning Under Distributed Features

This work studies the problem of learning under both large datasets and large-dimensional feature space scenarios. The feature information is assumed to be spread across agents in a network, where each agent observes some of the features. Through local cooperation, the agents are supposed to interact with each other to solve an inference problem and converge towards the global minimizer of an empirical risk. We study this problem exclusively in the primal domain, and propose new and effective distributed solutions with guaranteed convergence to the minimizer with linear rate under strong convexity. This is achieved by combining a dynamic diffusion construction, a pipeline strategy, and variance-reduced techniques. Simulation results illustrate the conclusions.

preprint 2019 arXiv

Adaptation and learning over networks under subspace constraints -- Part I: Stability Analysis

This paper considers optimization problems over networks where agents have individual objectives to meet, or individual parameter vectors to estimate, subject to subspace constraints that require the objectives across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus optimization as a special case, and allows for more general task relatedness models such as smoothness. While such formulations can be solved via projected gradient descent, the resulting algorithm is not distributed. Starting from the centralized solution, we propose an iterative and distributed implementation of the projection step, which runs in parallel with the stochastic gradient descent update. We establish in this Part I of the work that, for small step-sizes $\mu$, the proposed distributed adaptive strategy leads to small estimation errors on the order of $\mu$. We examine in the accompanying Part II [2] the steady-state performance. The results will reveal explicitly the influence of the gradient noise, data characteristics, and subspace constraints, on the network performance. The results will also show that in the small step-size regime, the iterates generated by the distributed algorithm achieve the centralized steady-state performance.