Source author record

Xinmeng Huang

Xinmeng Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC; Distributed, Parallel, and Cluster Computing; Machine Learning

Catalog footprint

What is connected

2 works

3 topics

4 close collaborators

Preprint, 2022, arXiv

Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

We consider decentralized stochastic optimization problems, in which a network of $n$ nodes, each owning a local cost function, cooperates to find a minimizer of the globally averaged cost. A widely studied decentralized algorithm for this problem is decentralized SGD (D-SGD), in which each node averages only with its neighbors. D-SGD is efficient in per-iteration communication, but it is very sensitive to the network topology. For smooth objective functions, the transient stage of D-SGD (the number of iterations the algorithm must run before reaching the linear speedup stage) is on the order of $\Omega(n/(1-\beta)^2)$ and $\Omega(n^3/(1-\beta)^4)$ for strongly and generally convex cost functions, respectively, where $1-\beta \in (0,1)$ is a topology-dependent quantity that approaches $0$ for a large and sparse network. Hence, D-SGD converges slowly on large and sparse networks. In this work, we study the non-asymptotic convergence property of the D$^2$/Exact-diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D$^2$/Exact-diffusion is shown to have an enhanced transient stage that is on the order of $\tilde{\Omega}(n/(1-\beta))$ and $\Omega(n^3/(1-\beta)^2)$ for strongly and generally convex cost functions, respectively. Moreover, when D$^2$/Exact-diffusion is implemented with gradient accumulation and multi-round gossip communications, its transient stage can be further improved to $\tilde{\Omega}(1/(1-\beta)^{\frac{1}{2}})$ and $\tilde{\Omega}(n/(1-\beta))$ for strongly and generally convex cost functions, respectively. To our knowledge, these results for D$^2$/Exact-diffusion have the best (i.e., weakest) dependence on network topology among existing decentralized algorithms. We also conduct numerical simulations to validate our theories.
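To make the abstract's setting concrete, the following is a minimal sketch of D-SGD, not the paper's code or experiments. It rests on several assumptions that do not come from this record: a ring network with uniform 1/3 weights, quadratic local costs $f_i(x) = \frac{1}{2}\|x - b_i\|^2$, the adapt-then-combine form of the update, and the hypothetical helper names `ring_mixing_matrix`, `spectral_gap`, and `dsgd`. It takes $\beta$ to be the second-largest eigenvalue magnitude of the mixing matrix, which is one common definition.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring (assumed topology).
    Each node puts weight 1/3 on itself and on each of its two ring neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W

def spectral_gap(W):
    """Return 1 - beta, with beta the second-largest eigenvalue magnitude of W."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - magnitudes[1]

def dsgd(W, b, step, iters, noise_std, seed=0):
    """Adapt-then-combine D-SGD on f_i(x) = 0.5 * ||x - b_i||^2 (assumed costs).
    Row i of the returned array is node i's final iterate."""
    rng = np.random.default_rng(seed)
    n, d = b.shape
    x = np.zeros((n, d))
    for _ in range(iters):
        # Stochastic gradient of each local cost: exact gradient plus Gaussian noise.
        grads = (x - b) + noise_std * rng.standard_normal((n, d))
        # Local gradient step, then averaging with ring neighbors only.
        x = W @ (x - step * grads)
    return x

if __name__ == "__main__":
    for n in (8, 32):
        print(n, spectral_gap(ring_mixing_matrix(n)))
    b = np.arange(16, dtype=float).reshape(8, 2)  # heterogeneous local minimizers
    x = dsgd(ring_mixing_matrix(8), b, step=0.1, iters=500, noise_std=0.0)
    print("network average:", x.mean(axis=0), "target:", b.mean(axis=0))
    print("largest node deviation:", np.abs(x - b.mean(axis=0)).max())
```

In this toy setup the gap $1-\beta$ shrinks as the ring grows, which is the regime the abstract calls large and sparse. With noise switched off, the network average of the iterates reaches the global minimizer, while individual nodes can remain offset from it because their local minimizers $b_i$ differ.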

Preprint, 2020, arXiv

Tight Coefficients of Averaged Operators via Scaled Relative Graph

Many iterative methods in optimization are fixed-point iterations with averaged operators. As such methods converge at an $\mathcal{O}(1/k)$ rate with the constant determined by the averagedness coefficient, establishing small averagedness coefficients for operators is of broad interest. In this paper, we show that the averagedness coefficients of the composition of averaged operators by Ogura and Yamada (Numer Func Anal Opt 32(1--2):113--137, 2002) and the three-operator splitting by Davis and Yin (Set-Valued Var Anal 25(4):829--858, 2017) are tight. The analysis relies on the scaled relative graph, a geometric tool recently proposed by Ryu, Hannah, and Yin (arXiv:1902.09788, 2019).
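As a rough illustration of why the averagedness coefficient matters, the sketch below runs a fixed-point iteration with a $\theta$-averaged operator $T = (1-\theta)I + \theta N$, where $N$ is nonexpansive, and compares the squared residual $\|x^k - Tx^k\|^2$ with one standard form of the $\mathcal{O}(1/k)$ bound, $\frac{\theta}{1-\theta}\cdot\frac{\|x^0 - x^\star\|^2}{k+1}$. The bound's exact form, the choice of a 90-degree rotation for $N$, and all function names are assumptions made for this example; none of them come from the paper or this record.

```python
import numpy as np

def averaged_operator(N, theta):
    """Matrix of T = (1 - theta) * I + theta * N for a linear nonexpansive N."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return (1.0 - theta) * np.eye(N.shape[0]) + theta * N

def fixed_point_residuals(T, x0, iters):
    """Squared residuals ||x_k - T x_k||^2 along the iteration x_{k+1} = T x_k."""
    x = np.asarray(x0, dtype=float)
    residuals = []
    for _ in range(iters):
        Tx = T @ x
        residuals.append(float(np.sum((x - Tx) ** 2)))
        x = Tx
    return np.array(residuals)

def residual_bound(theta, dist0_sq, iters):
    """Assumed bound: theta / (1 - theta) * ||x_0 - x*||^2 / (k + 1)."""
    k = np.arange(iters)
    return theta / (1.0 - theta) * dist0_sq / (k + 1)

if __name__ == "__main__":
    N = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees, fixed point 0
    theta = 0.5
    x0 = np.array([1.0, 2.0])
    res = fixed_point_residuals(averaged_operator(N, theta), x0, 50)
    bound = residual_bound(theta, float(x0 @ x0), 50)
    print("bound holds at every step:", bool(np.all(res <= bound + 1e-12)))
```

The factor $\theta/(1-\theta)$ grows with $\theta$, so under this bound a smaller averagedness coefficient gives a smaller constant. That is one way to see why tight coefficients are of interest.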