Source author record

Qing Ling

Qing Ling appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Distributed, Parallel, and Cluster Computing Computer Vision Artificial Intelligence eess.SP Information Theory math.IT Computation and Language Cryptography and Security Multiagent Systems Networking and Internet Architecture Systems and Control

Catalog footprint

What is connected

26works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Topology-Independent Robustness of the Weighted Mean under Label Poisoning Attacks in Heterogeneous Decentralized Learning

Robustness to malicious attacks is crucial for practical decentralized signal processing and machine learning systems. A typical example of such attacks is label poisoning, meaning that some agents possess corrupted local labels and share models trained on these poisoned data. To defend against malicious attacks, existing works often focus on designing robust aggregators; meanwhile, the weighted mean aggregator is typically considered a simple, vulnerable baseline. This paper analyzes the robustness of decentralized gradient descent under label poisoning attacks, considering both robust and weighted mean aggregators. Theoretical results reveal that the learning errors of robust aggregators depend on the network topology, whereas the performance of weighted mean aggregator is topology-independent. Remarkably, the weighted mean aggregator, although often considered vulnerable, can outperform robust aggregators under sufficient heterogeneity, particularly when: (i) the global contamination rate (i.e., the fraction of poisoned agents for the entire network) is smaller than the local contamination rate (i.e., the maximal fraction of poisoned neighbors for the regular agents); (ii) the network of regular agents is disconnected; or (iii) the network of regular agents is sparse and the local contamination rate is high. Empirical results support our theoretical findings, highlighting the important role of network topology in the robustness to label poisoning attacks.

preprint2022arXiv

Bridging Differential Privacy and Byzantine-Robustness via Model Aggregation

This paper aims at jointly addressing two seemly conflicting issues in federated learning: differential privacy (DP) and Byzantine-robustness, which are particularly challenging when the distributed data are non-i.i.d. (independent and identically distributed). The standard DP mechanisms add noise to the transmitted messages, and entangles with robust stochastic gradient aggregation to defend against Byzantine attacks. In this paper, we decouple the two issues via robust stochastic model aggregation, in the sense that our proposed DP mechanisms and the defense against Byzantine attacks have separated influence on the learning performance. Leveraging robust stochastic model aggregation, at each iteration, each worker calculates the difference between the local model and the global one, followed by sending the element-wise signs to the master node, which enables robustness to Byzantine attacks. Further, we design two DP mechanisms to perturb the uploaded signs for the purpose of privacy preservation, and prove that they are $(ε,0)$-DP by exploiting the properties of noise distributions. With the tools of Moreau envelop and proximal point projection, we establish the convergence of the proposed algorithm when the cost function is nonconvex. We analyze the trade-off between privacy preservation and learning performance, and show that the influence of our proposed DP mechanisms is decoupled with that of robust stochastic model aggregation. Numerical experiments demonstrate the effectiveness of the proposed algorithm.

preprint2022arXiv

Lazy Queries Can Reduce Variance in Zeroth-order Optimization

A major challenge of applying zeroth-order (ZO) methods is the high query complexity, especially when queries are costly. We propose a novel gradient estimation technique for ZO methods based on adaptive lazy queries that we term as LAZO. Different from the classic one-point or two-point gradient estimation methods, LAZO develops two alternative ways to check the usefulness of old queries from previous iterations, and then adaptively reuses them to construct the low-variance gradient estimates. We rigorously establish that through judiciously reusing the old queries, LAZO can reduce the variance of stochastic gradient estimates so that it not only saves queries per iteration but also achieves the regret bound for the symmetric two-point method. We evaluate the numerical performance of LAZO, and demonstrate the low-variance property and the performance gain of LAZO in both regret and query complexity relative to several existing ZO methods. The idea of LAZO is general, and can be applied to other variants of ZO methods.

preprint2022arXiv

Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II

In Part I of this work, we have proposed a general framework of decentralized stochastic quasi-Newton methods, which converge linearly to the optimal solution under the assumption that the local Hessian inverse approximations have bounded positive eigenvalues. In Part II, we specify two fully decentralized stochastic quasi-Newton methods, damped regularized limited-memory DFP (Davidon-Fletcher-Powell) and damped limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno), to locally construct such Hessian inverse approximations without extra sampling or communication. Both of the methods use a fixed moving window of $M$ past local gradient approximations and local decision variables to adaptively construct positive definite Hessian inverse approximations with bounded eigenvalues, satisfying the assumption in Part I for the linear convergence. For the proposed damped regularized limited-memory DFP, a regularization term is added to improve the performance. For the proposed damped limited-memory BFGS, a two-loop recursion is applied, leading to low storage and computation complexity. Numerical experiments demonstrate that the proposed quasi-Newton methods are much faster than the existing decentralized stochastic first-order algorithms.

preprint2020arXiv

A Newton Tracking Algorithm with Exact Linear Convergence Rate for Decentralized Consensus Optimization

This paper considers the decentralized consensus optimization problem defined over a network where each node holds a second-order differentiable local objective function. Our goal is to minimize the summation of local objective functions and find the exact optimal solution using only local computation and neighboring communication. We propose a novel Newton tracking algorithm, where each node updates its local variable along a local Newton direction modified with neighboring and historical information. We investigate the connections between the proposed Newton tracking algorithm and several existing methods, including gradient tracking and second-order algorithms. Under the strong convexity assumption, we prove that it converges to the exact optimal solution at a linear rate. Numerical experiments demonstrate the efficacy of Newton tracking and validate the theoretical findings.

preprint2020arXiv

Communication-Censored Distributed Stochastic Gradient Descent

This paper develops a communication-efficient algorithm to solve the stochastic optimization problem defined over a distributed network, aiming at reducing the burdensome communication in applications such as distributed machine learning.Different from the existing works based on quantization and sparsification, we introduce a communication-censoring technique to reduce the transmissions of variables, which leads to our communication-Censored distributed Stochastic Gradient Descent (CSGD) algorithm. Specifically, in CSGD, the latest mini-batch stochastic gradient at a worker will be transmitted to the server if and only if it is sufficiently informative. When the latest gradient is not available, the stale one will be reused at the server. To implement this communication-censoring strategy, the batch-size is increasing in order to alleviate the effect of stochastic gradient noise. Theoretically, CSGD enjoys the same order of convergence rate as that of SGD, but effectively reduces communication. Numerical experiments demonstrate the sizable communication saving of CSGD.

preprint2020arXiv

Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation

Aspect term extraction aims to extract aspect terms from review texts as opinion targets for sentiment analysis. One of the big challenges with this task is the lack of sufficient annotated data. While data augmentation is potentially an effective technique to address the above issue, it is uncontrollable as it may change aspect words and aspect labels unexpectedly. In this paper, we formulate the data augmentation as a conditional generation task: generating a new sentence while preserving the original opinion targets and labels. We propose a masked sequence-to-sequence method for conditional augmentation of aspect term extraction. Unlike existing augmentation approaches, ours is controllable and allows us to generate more diversified sentences. Experimental results confirm that our method alleviates the data scarcity problem significantly. It also effectively boosts the performances of several current models for aspect term extraction.

preprint2020arXiv

Convolutional Neural Networks for Space-Time Block Coding Recognition

We apply the latest advances in machine learning with deep neural networks to the tasks of radio modulation recognition, channel coding recognition, and spectrum monitoring. This paper first proposes an identification algorithm for space-time block coding of a signal. The feature between spatial multiplexing and Alamouti signals is extracted by adapting convolutional neural networks after preprocessing the received sequence. Unlike other algorithms, this method requires no prior information of channel coefficients and noise power, and consequently is well-suited for noncooperative contexts. Results show that the proposed algorithm performs well even at a low signal-to-noise ratio

preprint2020arXiv

HierTrain: Fast Hierarchical Edge AI Learning with Hybrid Parallelism in Mobile-Edge-Cloud Computing

Nowadays, deep neural networks (DNNs) are the core enablers for many emerging edge AI applications. Conventional approaches to training DNNs are generally implemented at central servers or cloud centers for centralized learning, which is typically time-consuming and resource-demanding due to the transmission of a large amount of data samples from the device to the remote cloud. To overcome these disadvantages, we consider accelerating the learning process of DNNs on the Mobile-Edge-Cloud Computing (MECC) paradigm. In this paper, we propose HierTrain, a hierarchical edge AI learning framework, which efficiently deploys the DNN training task over the hierarchical MECC architecture. We develop a novel \textit{hybrid parallelism} method, which is the key to HierTrain, to adaptively assign the DNN model layers and the data samples across the three levels of edge device, edge server and cloud center. We then formulate the problem of scheduling the DNN training tasks at both layer-granularity and sample-granularity. Solving this optimization problem enables us to achieve the minimum training time. We further implement a hardware prototype consisting of an edge device, an edge server and a cloud server, and conduct extensive experiments on it. Experimental results demonstrate that HierTrain can achieve up to 6.9x speedup compared to the cloud-based hierarchical training approach.

preprint2019arXiv

Communication-Censored Linearized ADMM for Decentralized Consensus Optimization

In this paper, we propose a communication- and computation-efficient algorithm to solve a convex consensus optimization problem defined over a decentralized network. A remarkable existing algorithm to solve this problem is the alternating direction method of multipliers (ADMM), in which at every iteration every node updates its local variable through combining neighboring variables and solving an optimization subproblem. The proposed algorithm, called as COmmunication-censored Linearized ADMM (COLA), leverages a linearization technique to reduce the iteration-wise computation cost of ADMM and uses a communication-censoring strategy to alleviate the communication cost. To be specific, COLA introduces successive linearization approximations to the local cost functions such that the resultant computation is first-order and light-weight. Since the linearization technique slows down the convergence speed, COLA further adopts the communication-censoring strategy to avoid transmissions of less informative messages. A node is allowed to transmit only if the distance between the current local variable and its previously transmitted one is larger than a censoring threshold. COLA is proven to be convergent when the local cost functions have Lipschitz continuous gradients and the censoring threshold is summable. When the local cost functions are further strongly convex, we establish the linear (sublinear) convergence rate of COLA, given that the censoring threshold linearly (sublinearly) decays to 0. Numerical experiments corroborate with the theoretical findings and demonstrate the satisfactory communication-computation tradeoff of COLA.

preprint2016arXiv

$\mathbf{D^3}$: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images

In this paper, we design a Deep Dual-Domain ($\mathbf{D^3}$) based fast restoration model to remove artifacts of JPEG compressed images. It leverages the large learning capacity of deep networks, as well as the problem-specific expertise that was hardly incorporated in the past design of deep architectures. For the latter, we take into consideration both the prior knowledge of the JPEG compression scheme, and the successful practice of the sparsity-based dual-domain approach. We further design the One-Step Sparse Inference (1-SI) module, as an efficient and light-weighted feed-forward approximation of sparse coding. Extensive experiments verify the superiority of the proposed $D^3$ model over several state-of-the-art methods. Specifically, our best model is capable of outperforming the latest deep model for around 1 dB in PSNR, and is 30 times faster.

preprint2016arXiv

A Decentralized Second-Order Method for Dynamic Optimization

This paper considers decentralized dynamic optimization problems where nodes of a network try to minimize a sequence of time-varying objective functions in a real-time scheme. At each time slot, nodes have access to different summands of an instantaneous global objective function and they are allowed to exchange information only with their neighbors. This paper develops the application of the Exact Second-Order Method (ESOM) to solve the dynamic optimization problem in a decentralized manner. The proposed dynamic ESOM algorithm operates by primal descending and dual ascending on a quadratic approximation of an augmented Lagrangian of the instantaneous consensus optimization problem. The convergence analysis of dynamic ESOM indicates that a Lyapunov function of the sequence of primal and dual errors converges linearly to an error bound when the local functions are strongly convex and have Lipschitz continuous gradients. Numerical results demonstrate the claim that the sequence of iterates generated by the proposed method is able to track the sequence of optimal arguments.

preprint2016arXiv

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization

This paper considers decentralized consensus optimization problems where different summands of a global objective function are available at nodes of a network that can communicate with neighbors only. The proximal method of multipliers is considered as a powerful tool that relies on proximal primal descent and dual ascent updates on a suitably defined augmented Lagrangian. The structure of the augmented Lagrangian makes this problem non-decomposable, which precludes distributed implementations. This problem is regularly addressed by the use of the alternating direction method of multipliers. The exact second order method (ESOM) is introduced here as an alternative that relies on: (i) The use of a separable quadratic approximation of the augmented Lagrangian. (ii) A truncated Taylor's series to estimate the solution of the first order condition imposed on the minimization of the quadratic approximation of the augmented Lagrangian. The sequences of primal and dual variables generated by ESOM are shown to converge linearly to their optimal arguments when the aggregate cost function is strongly convex and its gradients are Lipschitz continuous. Numerical results demonstrate advantages of ESOM relative to decentralized alternatives in solving least squares and logistic regression problems.

preprint2016arXiv

Learning A Deep $\ell_\infty$ Encoder for Hashing

We investigate the $\ell_\infty$-constrained representation which demonstrates robustness to quantization errors, utilizing the tool of deep learning. Based on the Alternating Direction Method of Multipliers (ADMM), we formulate the original convex minimization problem as a feed-forward neural network, named \textit{Deep $\ell_\infty$ Encoder}, by introducing the novel Bounded Linear Unit (BLU) neuron and modeling the Lagrange multipliers as network biases. Such a structural prior acts as an effective network regularization, and facilitates the model initialization. We then investigate the effective use of the proposed model in the application of hashing, by coupling the proposed encoders under a supervised pairwise loss, to develop a \textit{Deep Siamese $\ell_\infty$ Network}, which can be optimized from end to end. Extensive experiments demonstrate the impressive performances of the proposed model. We also provide an in-depth analysis of its behaviors against the competitors.

preprint2016arXiv

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD), to minimize an objective function that is the composite of the average of multiple empirical losses and a regularization term. Unlike the traditional asynchronous proximal stochastic gradient descent (TAP-SGD) in which the master carries much of the computation load, the proposed algorithm off-loads the majority of computation tasks from the master to workers, and leaves the master to conduct simple addition operations. This strategy yields an easy-to-parallelize algorithm, whose performance is justified by theoretical convergence analyses. To be specific, DAP-SGD achieves an $O(\log T/T)$ rate when the step-size is diminishing and an ergodic $O(1/\sqrt{T})$ rate when the step-size is constant, where $T$ is the number of total iterations.

preprint2016arXiv

Stacked Approximated Regression Machine: A Simple Deep Learning Approach

With the agreement of my coauthors, I Zhangyang Wang would like to withdraw the manuscript "Stacked Approximated Regression Machine: A Simple Deep Learning Approach". Some experimental procedures were not included in the manuscript, which makes a part of important claims not meaningful. In the relevant research, I was solely responsible for carrying out the experiments; the other coauthors joined in the discussions leading to the main algorithm. Please see the updated text for more details.

preprint2015arXiv

Decentralized learning for wireless communications and networking

This chapter deals with decentralized learning algorithms for in-network processing of graph-valued data. A generic learning problem is formulated and recast into a separable form, which is iteratively minimized using the alternating-direction method of multipliers (ADMM) so as to gain the desired degree of parallelization. Without exchanging elements from the distributed training sets and keeping inter-node communications at affordable levels, the local (per-node) learners consent to the desired quantity inferred globally, meaning the one obtained if the entire training data set were centrally available. Impact of the decentralized learning framework to contemporary wireless communications and networking tasks is illustrated through case studies including target tracking using wireless sensor networks, unveiling Internet traffic anomalies, power system state estimation, as well as spectrum cartography for wireless cognitive radio networks.

preprint2015arXiv

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

This paper considers an optimization problem that components of the objective function are available at different nodes of a network and nodes are allowed to only exchange information with their neighbors. The decentralized alternating method of multipliers (DADMM) is a well-established iterative method for solving this category of problems; however, implementation of DADMM requires solving an optimization subproblem at each iteration for each node. This procedure is often computationally costly for the nodes. We introduce a decentralized quadratic approximation of ADMM (DQM) that reduces computational complexity of DADMM by minimizing a quadratic approximation of the objective function. Notwithstanding that DQM successively minimizes approximations of the cost, it converges to the optimal arguments at a linear rate which is identical to the convergence rate of DADMM. Further, we show that as time passes the coefficient of linear convergence for DQM approaches the one for DADMM. Numerical results demonstrate the effectiveness of DQM.

preprint2015arXiv

DQM: Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

This paper considers decentralized consensus optimization problems where nodes of a network have access to different summands of a global objective function. Nodes cooperate to minimize the global objective by exchanging information with neighbors only. A decentralized version of the alternating directions method of multipliers (DADMM) is a common method for solving this category of problems. DADMM exhibits linear convergence rate to the optimal objective but its implementation requires solving a convex optimization problem at each iteration. This can be computationally costly and may result in large overall convergence times. The decentralized quadratically approximated ADMM algorithm (DQM), which minimizes a quadratic approximation of the objective function that DADMM minimizes at each iteration, is proposed here. The consequent reduction in computational time is shown to have minimal effect on convergence properties. Convergence still proceeds at a linear rate with a guaranteed constant that is asymptotically equivalent to the DADMM linear convergence rate constant. Numerical results demonstrate advantages of DQM relative to DADMM and other alternatives in a logistic regression problem.

preprint2015arXiv

Learning Deep $\ell_0$ Encoders

Despite its nonconvex nature, $\ell_0$ sparse approximation is desirable in many theoretical and application cases. We study the $\ell_0$ sparse approximation problem with the tool of deep learning, by proposing Deep $\ell_0$ Encoders. Two typical forms, the $\ell_0$ regularized problem and the $M$-sparse problem, are investigated. Based on solid iterative algorithms, we model them as feed-forward neural networks, through introducing novel neurons and pooling functions. Enforcing such structural priors acts as an effective network regularization. The deep encoders also enjoy faster inference, larger learning capacity, and better scalability compared to conventional sparse coding solutions. Furthermore, under task-driven losses, the models can be conveniently optimized from end to end. Numerical results demonstrate the impressive performances of the proposed encoders.

preprint2015arXiv

Network Newton-Part I: Algorithm and Convergence

We study the problem of minimizing a sum of convex objective functions where the components of the objective are available at different nodes of a network and nodes are allowed to only communicate with their neighbors. The use of distributed gradient methods is a common approach to solve this problem. Their popularity notwithstanding, these methods exhibit slow convergence and a consequent large number of communications between nodes to approach the optimal argument because they rely on first order information only. This paper proposes the network Newton (NN) method as a distributed algorithm that incorporates second order information. This is done via distributed implementation of approximations of a suitably chosen Newton step. The approximations are obtained by truncation of the Newton step's Taylor expansion. This leads to a family of methods defined by the number $K$ of Taylor series terms kept in the approximation. When keeping $K$ terms of the Taylor series, the method is called NN-$K$ and can be implemented through the aggregation of information in $K$-hop neighborhoods. Convergence to a point close to the optimal argument at a rate that is at least linear is proven and the existence of a tradeoff between convergence time and the distance to the optimal argument is shown. Convergence rate, several practical implementation matters, and numerical analyses are presented in a companion paper [3].

preprint2015arXiv

Network Newton-Part II: Convergence Rate and Implementation

The use of network Newton methods for the decentralized optimization of a sum cost distributed through agents of a network is considered. Network Newton methods reinterpret distributed gradient descent as a penalty method, observe that the corresponding Hessian is sparse, and approximate the Newton step by truncating a Taylor expansion of the inverse Hessian. Truncating the series at $K$ terms yields the NN-$K$ that requires aggregating information from $K$ hops away. Network Newton is introduced and shown to converge to the solution of the penalized objective function at a rate that is at least linear in a companion paper [3]. The contributions of this work are: (i) To complement the convergence analysis by studying the methods' rate of convergence. (ii) To introduce adaptive formulations that converge to the optimal argument of the original objective. (iii) To perform numerical evaluations of NN-$K$ methods. The convergence analysis relates the behavior of NN-$K$ with the behavior of (regular) Newton's method and shows that the method goes through a quadratic convergence phase in a specific interval. The length of this quadratic phase grows with $K$ and can be made arbitrarily large. The numerical experiments corroborate reductions in the number of iterations and the communication cost that are necessary to achieve convergence relative to distributed gradient descent.

preprint2015arXiv

On the Convergence of Decentralized Gradient Descent

Consider the consensus problem of minimizing $f(x)=\sum_{i=1}^n f_i(x)$ where each $f_i$ is only known to one individual agent $i$ out of a connected network of $n$ agents. All the agents shall collaboratively solve this problem and obtain the solution subject to data exchanges restricted to between neighboring agents. Such algorithms avoid the need of a fusion center, offer better network load balance, and improve data privacy. We study the decentralized gradient descent method in which each agent $i$ updates its variable $x_{(i)}$, which is a local approximate to the unknown variable $x$, by combining the average of its neighbors' with the negative gradient step $-α\nabla f_i(x_{(i)})$. The iteration is $$x_{(i)}(k+1) \gets \sum_{\text{neighbor} j \text{of} i} w_{ij} x_{(j)}(k) - α\nabla f_i(x_{(i)}(k)),\quad\text{for each agent} i,$$ where the averaging coefficients form a symmetric doubly stochastic matrix $W=[w_{ij}] \in \mathbb{R}^{n \times n}$. We analyze the convergence of this iteration and derive its converge rate, assuming that each $f_i$ is proper closed convex and lower bounded, $\nabla f_i$ is Lipschitz continuous with constant $L_{f_i}$, and stepsize $α$ is fixed. Provided that $α< O(1/L_h)$ where $L_h=\max_i\{L_{f_i}\}$, the objective error at the averaged solution, $f(\frac{1}{n}\sum_i x_{(i)}(k))-f^*$, reduces at a speed of $O(1/k)$ until it reaches $O(α)$. If $f_i$ are further (restricted) strongly convex, then both $\frac{1}{n}\sum_i x_{(i)}(k)$ and each $x_{(i)}(k)$ converge to the global minimizer $x^*$ at a linear rate until reaching an $O(α)$-neighborhood of $x^*$. We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an $O(α)$-neighborhood of the true unknown sparse signal.

preprint2014arXiv

EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization

Recently, there have been growing interests in solving consensus optimization problems in a multi-agent network. In this paper, we develop a decentralized algorithm for the consensus optimization problem $$\min\limits_{x\in\mathbb{R}^p}~\bar{f}(x)=\frac{1}{n}\sum\limits_{i=1}^n f_i(x),$$ which is defined over a connected network of $n$ agents, where each function $f_i$ is held privately by agent $i$ and encodes the agent's data and objective. All the agents shall collaboratively find the minimizer while each agent can only communicate with its neighbors. Such a computation scheme avoids a data fusion center or long-distance communication and offers better load balance to the network. This paper proposes a novel decentralized EXact firsT-ordeR Algorithm (abbreviated as EXTRA) to solve the consensus optimization problem. "exact" means that it can converge to the exact solution. EXTRA can use a fixed large step size, {which is independent of the network size}, and has synchronized iterations. The local variable of every agent $i$ converges uniformly and consensually to an exact minimizer of $\bar{f}$. In contrast, the well-known decentralized gradient descent (DGD) method must use diminishing step sizes in order to converge to an exact minimizer. EXTRA and DGD have the same choice of mixing matrices and similar per-iteration complexity. EXTRA, however, uses the gradients of last two iterates, unlike DGD which uses just that of last iterate. EXTRA has the best known convergence rates among the existing first-order decentralized algorithms. Specifically, if $f_i$'s are convex and have Lipschitz continuous gradients, EXTRA has an ergodic convergence rate $O(\frac{1}{k})$ in terms of the first-order optimality residual. If $\bar{f}$ is also restricted strongly convex, EXTRA converges to an optimal solution at a linear rate $O(C^{-k})$ for some constant $C>1$.

preprint2014arXiv

Network Newton

We consider minimization of a sum of convex objective functions where the components of the objective are available at different nodes of a network and nodes are allowed to only communicate with their neighbors. The use of distributed subgradient or gradient methods is widespread but they often suffer from slow convergence since they rely on first order information, which leads to a large number of local communications between nodes in the network. In this paper we propose the Network Newton (NN) method as a distributed algorithm that incorporates second order information via distributed evaluation of approximations to Newton steps. We also introduce adaptive (A)NN in order to establish exact convergence. Numerical analyses show significant improvement in both convergence time and number of communications for NN relative to existing (first order) alternatives.

preprint2014arXiv

On the Linear Convergence of the ADMM in Decentralized Consensus Optimization

In decentralized consensus optimization, a connected network of agents collaboratively minimize the sum of their local objective functions over a common decision variable, where their information exchange is restricted between the neighbors. To this end, one can first obtain a problem reformulation and then apply the alternating direction method of multipliers (ADMM). The method applies iterative computation at the individual agents and information exchange between the neighbors. This approach has been observed to converge quickly and deemed powerful. This paper establishes its linear convergence rate for decentralized consensus optimization problem with strongly convex local objective functions. The theoretical convergence rate is explicitly given in terms of the network topology, the properties of local objective functions, and the algorithm parameter. This result is not only a performance guarantee but also a guideline toward accelerating the ADMM convergence.

Qing Ling

What is connected

Connect this record

See the researcher in context

Building this map preview

26 published item(s)

Topology-Independent Robustness of the Weighted Mean under Label Poisoning Attacks in Heterogeneous Decentralized Learning

Bridging Differential Privacy and Byzantine-Robustness via Model Aggregation

Lazy Queries Can Reduce Variance in Zeroth-order Optimization

Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II

A Newton Tracking Algorithm with Exact Linear Convergence Rate for Decentralized Consensus Optimization

Communication-Censored Distributed Stochastic Gradient Descent

Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation

Convolutional Neural Networks for Space-Time Block Coding Recognition

HierTrain: Fast Hierarchical Edge AI Learning with Hybrid Parallelism in Mobile-Edge-Cloud Computing

Communication-Censored Linearized ADMM for Decentralized Consensus Optimization

$\mathbf{D^3}$: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images

A Decentralized Second-Order Method for Dynamic Optimization

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization

Learning A Deep $\ell_\infty$ Encoder for Hashing

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Stacked Approximated Regression Machine: A Simple Deep Learning Approach

Decentralized learning for wireless communications and networking

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

DQM: Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

Learning Deep $\ell_0$ Encoders

Network Newton-Part I: Algorithm and Convergence

Network Newton-Part II: Convergence Rate and Implementation

On the Convergence of Decentralized Gradient Descent

EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization

Network Newton

On the Linear Convergence of the ADMM in Decentralized Consensus Optimization