Source author record

Sulaiman A. Alghunaim

Sulaiman A. Alghunaim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: math.OC; Distributed, Parallel, and Cluster Computing; Machine Learning

Catalog footprint

What is connected

6 works

3 topics

4 close collaborators

Preprint, 2023, arXiv

An Enhanced Gradient-Tracking Bound for Distributed Online Stochastic Convex Optimization

Gradient-tracking (GT) based decentralized methods have emerged as an effective and viable alternative to decentralized (stochastic) gradient descent (DSGD) for solving distributed online stochastic optimization problems. Initial studies of GT methods implied that they have a worse network-dependent rate than DSGD, contradicting experimental results. This dilemma has recently been resolved, and tighter rates for GT methods have been established that improve upon those of DSGD. In this work, we establish further enhanced rates for GT methods in the online stochastic convex setting. We present an alternative approach for analyzing GT methods for convex problems over static graphs. Compared with previous analyses, this approach allows us to establish enhanced network-dependent rates.

Preprint, 2022, arXiv

A Unified and Refined Convergence Analysis for Non-Convex Decentralized Learning

We study the consensus decentralized optimization problem where the objective function is the average of $n$ agents' private non-convex cost functions; moreover, the agents can only communicate with their neighbors on a given network topology. The stochastic learning setting is considered in this paper, where each agent can only access a noisy estimate of its gradient. Many decentralized methods can solve this problem, including EXTRA, Exact-Diffusion/D$^2$, and gradient-tracking. Unlike the famed DSGD algorithm, these methods have been shown to be robust to the heterogeneity across the local cost functions. However, the established convergence rates for these methods indicate that their sensitivity to the network topology is worse than that of DSGD. Such theoretical results imply that these methods can perform much worse than DSGD over sparse networks, which contradicts empirical experiments where DSGD is observed to be more sensitive to the network topology. In this work, we study a general stochastic unified decentralized algorithm (SUDA) that includes the above methods as special cases. We establish the convergence of SUDA in the non-convex setting and under the Polyak-Lojasiewicz condition. Our results provide improved network-topology-dependent bounds for these methods (such as Exact-Diffusion/D$^2$ and gradient-tracking) compared with the existing literature. Moreover, our results show that these methods are often less sensitive to the network topology than DSGD, which agrees with numerical experiments.
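The robustness to heterogeneity mentioned in this abstract can be illustrated with a small sketch. It is not the paper's SUDA algorithm or its experiments: it assumes exact (noise-free) gradients, scalar quadratic local costs $f_i(x) = \frac{1}{2}(x - b_i)^2$, a 4-agent ring with a hand-picked mixing matrix, and one standard form of the gradient-tracking recursion. All function names, the step size, and the data are hypothetical choices made for illustration.

```python
# Illustrative sketch only: deterministic gradients, scalar variable,
# ring of 4 agents. Names, step size, and data are assumptions.

def mix(W, v):
    """One round of neighbor averaging: returns the product W @ v."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]

def grad(x, b):
    """Per-agent gradients of f_i(x) = 0.5 * (x - b_i)**2."""
    return [xi - bi for xi, bi in zip(x, b)]

def dgd(W, b, step, iters):
    """Decentralized gradient descent with a constant step size."""
    x = [0.0] * len(b)
    for _ in range(iters):
        g = grad(x, b)
        wx = mix(W, x)
        x = [wx[i] - step * g[i] for i in range(len(b))]
    return x

def gradient_tracking(W, b, step, iters):
    """Gradient tracking: y estimates the network-average gradient."""
    x = [0.0] * len(b)
    y = grad(x, b)  # standard initialization: y_0 = local gradients at x_0
    for _ in range(iters):
        wx = mix(W, x)
        x_new = [wx[i] - step * y[i] for i in range(len(b))]
        g_old, g_new = grad(x, b), grad(x_new, b)
        wy = mix(W, y)
        y = [wy[i] + g_new[i] - g_old[i] for i in range(len(b))]
        x = x_new
    return x

# Symmetric, doubly stochastic mixing matrix for a ring of 4 agents.
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
b = [1.0, 2.0, 3.0, 6.0]      # heterogeneous local minimizers
x_star = sum(b) / len(b)      # minimizer of the average cost

dgd_err = max(abs(xi - x_star) for xi in dgd(W, b, 0.1, 500))
gt_err = max(abs(xi - x_star) for xi in gradient_tracking(W, b, 0.1, 500))
print("worst-agent error, DGD:", dgd_err)
print("worst-agent error, gradient tracking:", gt_err)
```

In this toy setting, the constant-step DGD iterates settle at a point where the agents disagree because their local minimizers differ, while the gradient-tracking iterates reach the common minimizer. The sketch says nothing about the stochastic rates or network-dependent bounds analyzed in the paper.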

Preprint, 2022, arXiv

Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

We consider decentralized stochastic optimization problems, where a network of $n$ nodes, each owning a local cost function, cooperates to find a minimizer of the globally-averaged cost. A widely studied decentralized algorithm for this problem is decentralized SGD (D-SGD), in which each node averages only with its neighbors. D-SGD is efficient in single-iteration communication, but it is very sensitive to the network topology. For smooth objective functions, the transient stage (which measures the number of iterations the algorithm has to experience before achieving the linear speedup stage) of D-SGD is on the order of $\Omega(n/(1-\beta)^2)$ and $\Omega(n^3/(1-\beta)^4)$ for strongly and generally convex cost functions, respectively, where $1-\beta \in (0,1)$ is a topology-dependent quantity that approaches $0$ for a large and sparse network. Hence, D-SGD suffers from slow convergence for large and sparse networks. In this work, we study the non-asymptotic convergence property of the D$^2$/Exact-Diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D$^2$/Exact-Diffusion is shown to have an enhanced transient stage that is on the order of $\tilde{\Omega}(n/(1-\beta))$ and $\Omega(n^3/(1-\beta)^2)$ for strongly and generally convex cost functions, respectively. Moreover, when D$^2$/Exact-Diffusion is implemented with gradient accumulation and multi-round gossip communications, its transient stage can be further improved to $\tilde{\Omega}(1/(1-\beta)^{\frac{1}{2}})$ and $\tilde{\Omega}(n/(1-\beta))$ for strongly and generally convex cost functions, respectively. To our knowledge, these established results for D$^2$/Exact-Diffusion have the best (i.e., weakest) dependence on network topology among existing decentralized algorithms. We also conduct numerical simulations to validate our theories.

Preprint, 2020, arXiv

A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features

This work studies multi-agent sharing optimization problems in which the objective function is the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents. This scenario arises in many machine learning and engineering applications, such as regression over distributed features and resource allocation. We reformulate this problem into an equivalent saddle-point problem, which is amenable to decentralized solutions. We then propose a proximal primal-dual algorithm and establish its linear convergence to the optimal solution when the local functions are strongly convex. To our knowledge, this is the first linearly convergent decentralized algorithm for multi-agent sharing problems with a general convex (possibly non-smooth) coupling function.

Preprint, 2020, arXiv

Decentralized Proximal Gradient Algorithms with Linear Convergence Rates

This work studies a class of non-smooth decentralized multi-agent optimization problems where the agents aim to minimize a sum of local strongly convex smooth components plus a common non-smooth term. We propose a general primal-dual algorithmic framework that unifies many existing state-of-the-art algorithms. We establish linear convergence of the proposed method to the exact solution in the presence of the non-smooth term. Moreover, for the more general class of problems with agent-specific non-smooth terms, we show that linear convergence cannot be achieved (in the worst case) for the class of algorithms that uses the gradients and the proximal mappings of the smooth and non-smooth parts, respectively. We further provide a numerical counterexample that shows how some state-of-the-art algorithms fail to converge linearly for strongly convex objectives and different local non-smooth terms.

Preprint, 2020, arXiv

Linear Convergence of Primal-Dual Gradient Methods and their Performance in Distributed Optimization

In this work, we revisit a classical incremental implementation of the primal-descent dual-ascent gradient method used for the solution of equality constrained optimization problems. We provide a short proof that establishes the linear (exponential) convergence of the algorithm for smooth strongly-convex cost functions and study its relation to the non-incremental implementation. We also study the effect of the augmented Lagrangian penalty term on the performance of distributed optimization algorithms for the minimization of aggregate cost functions over multi-agent networks.
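As a rough illustration of the primal-descent dual-ascent idea named in this abstract, the sketch below minimizes $\frac{1}{2}\|x - c\|^2$ subject to a single equality constraint $a^\top x = b$. It assumes that "incremental" means the dual step uses the freshly updated primal iterate; that reading, the toy problem, the step size, and the function name `primal_dual` are all assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch only: toy equality-constrained quadratic problem.
# The problem data, step size, and function name are assumptions.

def primal_dual(c, a, b, step, iters):
    """Minimize 0.5 * ||x - c||^2 subject to sum(a_i * x_i) = b."""
    x = [0.0] * len(c)
    lam = 0.0  # dual variable for the single equality constraint
    for _ in range(iters):
        # Primal descent on the Lagrangian f(x) + lam * (a.x - b).
        x = [xi - step * ((xi - ci) + ai * lam)
             for xi, ci, ai in zip(x, c, a)]
        # Dual ascent using the updated primal iterate (incremental form).
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        lam = lam + step * residual
    return x, lam

# Toy instance: c = (1, 3), constraint x1 + x2 = 2.
# Solving the optimality conditions by hand gives x = (0, 2), lam = 1.
x, lam = primal_dual([1.0, 3.0], [1.0, 1.0], 2.0, 0.1, 500)
print("primal iterate:", x)
print("dual iterate:", lam)
```

For this strongly convex quadratic, the iterates contract geometrically toward the saddle point, which matches the linear convergence described above in this simple case only. The sketch does not cover the augmented Lagrangian penalty term or the multi-agent network setting studied in the paper.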
