Source author record

Wotao Yin

Wotao Yin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC; Machine Learning; Information Theory; math.IT; math.NA; Distributed, Parallel, and Cluster Computing; Numerical Analysis; Computer Vision; Artificial Intelligence; Computation; Computational Engineering, Finance, and Science; Computational Geometry; math.ST; Multiagent Systems; physics.med-ph; Statistics Theory

Catalog footprint

What is connected

49 works

16 topics

4 close collaborators

preprint · 2026 · arXiv

Subsampled Ensemble Can Improve Generalization Tail Exponentially

Ensemble learning is a popular technique to improve the accuracy of machine learning models. It traditionally hinges on the rationale that aggregating multiple weak models can lead to better models with lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on ensembling. By selecting the most frequently generated model from the base learner when repeatedly applied to subsamples, we can attain exponentially decaying tails for the excess risk, even if the base learner suffers from slow (i.e., polynomial) decay rates. This tail enhancement power of ensembling applies to base learners that have reasonable predictive power to begin with and is stronger than variance reduction in the sense of exhibiting rate improvement. We demonstrate how our ensemble methods can substantially improve out-of-sample performances in a range of numerical examples involving heavy-tailed data or intrinsically slow rates.
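The selection rule described in this abstract can be illustrated with a minimal sketch. It assumes the base learner returns a hashable model from a finite candidate set; the names `majority_vote_ensemble` and `best_candidate`, the toy data, and all parameter values are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def majority_vote_ensemble(data, base_learner, num_subsamples, subsample_size, seed=0):
    """Return the model the base learner outputs most often across subsamples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_subsamples):
        subsample = rng.sample(data, subsample_size)  # drawn without replacement
        votes[base_learner(subsample)] += 1
    return votes.most_common(1)[0][0]


def best_candidate(sample, candidates=(0, 1, 2)):
    """Toy base learner: pick the candidate c with the smallest mean |x - c|."""
    return min(candidates, key=lambda c: sum(abs(x - c) for x in sample) / len(sample))


# Heavy-tailed toy data (an assumption): mostly 1.0 with two extreme outliers.
data = [1.0] * 40 + [1000.0] * 2
chosen = majority_vote_ensemble(data, best_candidate, num_subsamples=25, subsample_size=10)
```

Voting over models, rather than averaging predictions, is what distinguishes this rule from bagging-style variance reduction.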

preprint · 2024 · arXiv

Decomposition Methods for Global Solutions of Mixed-Integer Linear Programs

This paper introduces two decomposition-based methods for two-block mixed-integer linear programs (MILPs), which aim to take advantage of separable structures of the original problem by solving a sequence of lower-dimensional MILPs. The first method is based on the $\ell_1$-augmented Lagrangian method (ALM), and the second one is based on a modified alternating direction method of multipliers (ADMM). In the presence of certain block-angular structures, both methods create parallel subproblems in one block of variables, and add nonconvex cuts to update the other block; they converge to globally optimal solutions of the original MILP under proper conditions. Numerical experiments on three classes of MILPs demonstrate the advantages of the proposed methods on structured problems over the state-of-the-art MILP solvers.

preprint · 2022 · arXiv

A One-bit, Comparison-Based Gradient Estimator

We study zeroth-order optimization for convex functions where we further assume that function evaluations are unavailable. Instead, one only has access to a comparison oracle, which given two points $x$ and $y$ returns a single bit of information indicating which point has larger function value, $f(x)$ or $f(y)$. By treating the gradient as an unknown signal to be recovered, we show how one can use tools from one-bit compressed sensing to construct a robust and reliable estimator of the normalized gradient. We then propose an algorithm, coined SCOBO, that uses this estimator within a gradient descent scheme. We show that when $f(x)$ has some low dimensional structure that can be exploited, SCOBO outperforms the state-of-the-art in terms of query complexity. Our theoretical claims are verified by extensive numerical experiments.
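To make the comparison-oracle setting concrete, here is a rough sketch of a one-bit gradient-direction estimate. It uses the simplest one-bit compressed sensing estimator (averaging sign-weighted Gaussian probes), which is an assumption and not necessarily the estimator used in SCOBO; the function names, probe radius, and query count are hypothetical.

```python
import math
import random


def comparison_oracle(f, x, y):
    """Return +1 if f(y) > f(x), else -1: the single bit the optimizer may observe."""
    return 1 if f(y) > f(x) else -1


def estimate_gradient_direction(f, x, num_queries, radius=1e-3, seed=0):
    """Estimate the normalized gradient at x from comparison bits only."""
    rng = random.Random(seed)
    d = len(x)
    acc = [0.0] * d
    for _ in range(num_queries):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]       # random probe direction
        y = [xi + radius * zi for xi, zi in zip(x, z)]    # nearby query point
        bit = comparison_oracle(f, x, y)                  # roughly sign(<grad f(x), z>)
        for j in range(d):
            acc[j] += bit * z[j]
    norm = math.sqrt(sum(a * a for a in acc))
    return [a / norm for a in acc]


# Toy objective (an assumption): f(v) = ||v||^2, whose gradient at (1, 0, 0) points along e_1.
g = estimate_gradient_direction(lambda v: sum(t * t for t in v), [1.0, 0.0, 0.0], 2000)
```

The estimate could then be plugged into a normalized gradient descent step, since only the direction of the gradient is recovered.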

preprint · 2022 · arXiv

A Single-Timescale Method for Stochastic Bilevel Optimization

Stochastic bilevel optimization generalizes the classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization is regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single loop fashion, and uses a single-timescale update with a fixed batch size. To achieve an $ε$-stationary point of the bilevel problem, STABLE requires ${\cal O}(ε^{-2})$ samples in total; and to achieve an $ε$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(ε^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as the stochastic gradient descent method for the single-level stochastic optimization.

preprint · 2022 · arXiv

From the simplex to the sphere: Faster constrained optimization using the Hadamard parametrization

The standard simplex in $\mathbb{R}^n$, also known as the probability simplex, is the set of nonnegative vectors whose entries sum up to 1. It frequently appears as a constraint in optimization problems that arise in machine learning, statistics, data science, operations research, and beyond. We convert the standard simplex to the unit sphere and thus transform the corresponding constrained optimization problem into an optimization problem on a simple, smooth manifold. We show that KKT points and strict-saddle points of the minimization problem on the standard simplex all correspond to those of the transformed problem, and vice versa. So, solving one problem is equivalent to solving the other problem. Then, we propose several simple, efficient, and projection-free algorithms using the manifold structure. The equivalence and the proposed algorithm can be extended to optimization problems with unit simplex, weighted probability simplex, or $\ell_1$-norm sphere constraints. Numerical experiments between the new algorithms and existing ones show the advantages of the new approach.
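The change of variables behind this abstract is $x = y \circ y$ (entrywise square), which sends the unit sphere onto the standard simplex. The sketch below applies plain Riemannian gradient descent with a normalization retraction to a linear objective. This is one simple projection-free scheme chosen for illustration, not necessarily one of the paper's algorithms, and the function name, step size, and cost vector are assumptions.

```python
import math


def sphere_descent_on_simplex(grad_f, n, step=0.1, iters=300):
    """Minimize f over the simplex by descending in y on the unit sphere, with x = y * y."""
    y = [1.0 / math.sqrt(n)] * n  # corresponds to the uniform point of the simplex
    for _ in range(iters):
        x = [yi * yi for yi in y]
        g = grad_f(x)
        grad_y = [2.0 * yi * gi for yi, gi in zip(y, g)]            # chain rule through x = y * y
        radial = sum(yi * gy for yi, gy in zip(y, grad_y))
        tangent = [gy - radial * yi for yi, gy in zip(y, grad_y)]   # project onto the tangent space
        y = [yi - step * ti for yi, ti in zip(y, tangent)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        y = [yi / norm for yi in y]                                 # retract back to the sphere
    return [yi * yi for yi in y]


# Toy problem (an assumption): minimize <cost, x> over the simplex; the minimizer is the vertex e_2.
cost = [3.0, 1.0, 2.0]
x = sphere_descent_on_simplex(lambda point: cost, n=3)
```

No projection onto the simplex is ever computed: feasibility of `x` follows from `y` having unit norm.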

preprint · 2021 · arXiv

Hybrid Federated Learning: Algorithms and Implementation

Federated learning (FL) is a recently proposed distributed machine learning paradigm dealing with distributed and private data sets. Based on the data partition pattern, FL is often categorized into horizontal, vertical, and hybrid settings. Despite the fact that many works have been developed for the first two approaches, the hybrid FL setting (which deals with partially overlapped feature space and sample space) remains less explored, though this setting is extremely important in practice. In this paper, we first set up a new model-matching-based problem formulation for hybrid FL, then propose an efficient algorithm that can collaboratively train the global and local models to deal with full and partial featured data. We conduct numerical experiments on the multi-view ModelNet40 data set to validate the performance of the proposed algorithm. To the best of our knowledge, this is the first formulation and algorithm developed for the hybrid FL.

preprint · 2020 · arXiv

A mean field game inverse problem

Mean-field games arise in various fields including economics, engineering, and machine learning. They study strategic decision making in large populations where the individuals interact via certain mean-field quantities. The ground metrics and running costs of the games are of essential importance but are often unknown or only partially known. In this paper, we propose mean-field game inverse-problem models to reconstruct the ground metrics and interaction kernels in the running costs. The observations are the macro motions, to be specific, the density distribution, and the velocity field of the agents. They can be corrupted by noise to some extent. Our models are PDE constrained optimization problems, which are solvable by first-order primal-dual methods. Besides, we apply Bregman iterations to find the optimal model parameters. We numerically demonstrate that our model is both efficient and robust to noise.

preprint · 2020 · arXiv

An Improved Analysis of Stochastic Gradient Descent with Momentum

SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often applied with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general since previous analyses on SGDM either provide worse convergence bounds than those of SGD, or assume Lipschitz or quadratic objectives, which fail to hold in practice. Furthermore, the role of dynamic parameters has not been addressed. In this work, we show that SGDM converges as fast as SGD for smooth objectives under both strongly convex and nonconvex settings. We also establish the first convergence guarantee for the multistage setting, and show that the multistage strategy is beneficial for SGDM compared to using fixed parameters. Finally, we verify these theoretical claims by numerical experiments.
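A minimal sketch of multistage SGDM on a one-dimensional toy problem follows. The update form (a moving average of stochastic gradients), the stage schedule, and the noisy quadratic objective are assumptions made for illustration; the abstract does not specify them.

```python
import random


def multistage_sgdm(stoch_grad, x0, stages, seed=0):
    """Run SGD with momentum; each stage is (stepsize, momentum_weight, num_iters)."""
    rng = random.Random(seed)
    x, m = x0, 0.0
    for alpha, beta, num_iters in stages:
        for _ in range(num_iters):
            g = stoch_grad(x, rng)
            m = beta * m + (1.0 - beta) * g   # moving average of stochastic gradients
            x = x - alpha * m                 # step along the momentum buffer
    return x


def noisy_quadratic_grad(x, rng):
    """Stochastic gradient of 0.5 * (x - 3)^2 with unit-variance Gaussian noise."""
    return (x - 3.0) + rng.gauss(0.0, 1.0)


# Stepsize shrinks stage by stage (hypothetical schedule); momentum weight stays at 0.9.
x_final = multistage_sgdm(
    noisy_quadratic_grad,
    x0=0.0,
    stages=[(0.5, 0.9, 200), (0.1, 0.9, 200), (0.02, 0.9, 200)],
)
```

Large early stepsizes move the iterate quickly toward the minimizer, and the smaller later stepsizes reduce the noise floor around it.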

preprint · 2020 · arXiv

AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity

In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transition and reward can only be sampled through a generative model. Given such a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $γ\in(0,1)$, AsyncQVI uses memory of size $\mathcal{O}(|\mathcal{S}|)$ and returns an $\varepsilon$-optimal policy with probability at least $1-δ$ using $$\tilde{\mathcal{O}}\big(\frac{|\mathcal{S}||\mathcal{A}|}{(1-γ)^5\varepsilon^2}\log(\frac{1}δ)\big)$$ samples. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes whose sample complexity nearly matches the theoretical lower bound. The relatively low memory footprint and parallel ability make AsyncQVI suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods. The results show that our algorithm is highly efficient and achieves linear parallel speedup.

preprint · 2020 · arXiv

CADA: Communication-Adaptive Distributed Adam

Stochastic gradient descent (SGD) has taken the stage as the primary workhorse for large-scale machine learning. It is often used with its adaptive variants such as AdaGrad, Adam, and AMSGrad. This paper proposes an adaptive stochastic gradient descent method for distributed machine learning, which can be viewed as the communication-adaptive counterpart of the celebrated Adam method - justifying its name CADA. The key components of CADA are a set of new rules tailored for adaptive stochastic gradients that can be implemented to save communication upload. The new algorithms adaptively reuse the stale Adam gradients, thus saving communication, and still have convergence rates comparable to original Adam. In numerical experiments, CADA achieves impressive empirical performance in terms of total communication round reduction.

preprint · 2020 · arXiv

Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters

In this paper, we study the communication and (sub)gradient computation costs in distributed optimization and give a sharp complexity analysis for the proposed distributed accelerated gradient methods. We present two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters. Our first algorithm is for smooth distributed optimization and it obtains the near optimal $O\left(\sqrt{\frac{L}{ε(1-σ_2(W))}}\log\frac{1}ε\right)$ communication complexity and the optimal $O\left(\sqrt{\frac{L}ε}\right)$ gradient computation complexity for $L$-smooth convex problems, where $σ_2(W)$ denotes the second largest singular value of the weight matrix $W$ associated to the network and $ε$ is the target accuracy. When the problem is $μ$-strongly convex and $L$-smooth, our algorithm has the near optimal $O\left(\sqrt{\frac{L}{μ(1-σ_2(W))}}\log^2\frac{1}ε\right)$ complexity for communications and the optimal $O\left(\sqrt{\frac{L}μ}\log\frac{1}ε\right)$ complexity for gradient computations. Our communication complexities are only worse by a factor of $\left(\log\frac{1}ε\right)$ than the lower bounds for the smooth distributed optimization. Our second algorithm is designed for non-smooth distributed optimization and it achieves both the optimal $O\left(\frac{1}{ε\sqrt{1-σ_2(W)}}\right)$ communication complexity and $O\left(\frac{1}{ε^2}\right)$ subgradient computation complexity, which match the communication and subgradient computation complexity lower bounds for non-smooth distributed optimization.

preprint · 2020 · arXiv

Decentralized Learning with Lazy and Approximate Dual Gradients

This paper develops algorithms for decentralized machine learning over a network, where data are distributed, computation is localized, and communication is restricted between neighbors. A line of recent research in this area focuses on improving both computation and communication complexities. The methods SSDA and MSDA (Scaman et al., 2017) have optimal communication complexity when the objective is smooth and strongly convex, and are simple to derive. However, they require solving a subproblem at each step. We propose new algorithms that save computation by using (stochastic) gradients and save communication when previous information is sufficiently useful. Our methods remain relatively simple -- rather than solving a subproblem, they run Katyusha for a small, fixed number of steps from the latest point. An easy-to-compute, local rule is used to decide if a worker can skip a round of communication. Furthermore, our methods provably reduce communication and computation complexities of SSDA and MSDA. In numerical experiments, our algorithms achieve significant computation and communication reduction compared with the state-of-the-art.

preprint · 2020 · arXiv

How Does an Approximate Model Help in Reinforcement Learning?

One of the key approaches to save samples in reinforcement learning (RL) is to use knowledge from an approximate model such as its simulator. However, how much does an approximate model help to learn a near-optimal policy of the true unknown model? Despite numerous empirical studies of transfer reinforcement learning, an answer to this question is still elusive. In this paper, we study the sample complexity of RL while an approximate model of the environment is provided. For an unknown Markov decision process (MDP), we show that the approximate model can effectively reduce the complexity by eliminating sub-optimal actions from the policy searching space. In particular, we provide an algorithm that uses $\widetilde{O}(N/(1-γ)^3/\varepsilon^2)$ samples in a generative model to learn an $\varepsilon$-optimal policy, where $γ$ is the discount factor and $N$ is the number of near-optimal actions in the approximate model. This can be much smaller than the learning-from-scratch complexity $\widetilde{Θ}(SA/(1-γ)^3/\varepsilon^2)$, where $S$ and $A$ are the sizes of state and action spaces respectively. We also provide a lower bound showing that the above upper bound is nearly-tight if the value gap between near-optimal actions and sub-optimal actions in the approximate model is sufficiently large. Our results provide a very precise characterization of how an approximate model helps reinforcement learning when no additional assumption on the model is posed.

preprint · 2020 · arXiv

LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning

This paper targets solving distributed machine learning problems such as federated learning in a communication-efficient fashion. A class of new stochastic gradient descent (SGD) approaches have been developed, which can be viewed as the stochastic generalization to the recently developed lazily aggregated gradient (LAG) method --- justifying the name LASG. LAG adaptively predicts the contribution of each round of communication and chooses only the significant ones to perform. It saves communication while also maintaining the rate of convergence. However, LAG only works with deterministic gradients, and applying it to stochastic gradients yields poor performance. The key components of LASG are a set of new rules tailored for stochastic gradients that can be implemented either to save download, upload, or both. The new algorithms adaptively choose between fresh and stale stochastic gradients and have convergence rates comparable to the original SGD. LASG achieves impressive empirical performance --- it typically saves total communication by an order of magnitude.

preprint · 2020 · arXiv

Tight Coefficients of Averaged Operators via Scaled Relative Graph

Many iterative methods in optimization are fixed-point iterations with averaged operators. As such methods converge at an $\mathcal{O}(1/k)$ rate with the constant determined by the averagedness coefficient, establishing small averagedness coefficients for operators is of broad interest. In this paper, we show that the averagedness coefficients of the composition of averaged operators by Ogura and Yamada (Numer Func Anal Opt 32(1--2):113--137, 2002) and the three-operator splitting by Davis and Yin (Set-Valued Var Anal 25(4):829--858, 2017) are tight. The analysis relies on the scaled relative graph, a geometric tool recently proposed by Ryu, Hannah, and Yin (arXiv:1902.09788, 2019).

preprint · 2020 · arXiv

VAFL: a Method of Vertical Asynchronous Federated Learning

Horizontal federated learning (FL) handles multi-client data that share the same set of features, and vertical FL trains a better predictor that combines all the features from different clients. This paper targets solving vertical FL in an asynchronous fashion, and develops a simple FL method. The new method allows each client to run stochastic gradient algorithms without coordination with other clients, so it is suitable for intermittent connectivity of clients. This method further uses a new technique of perturbed local embedding to ensure data privacy and improve communication efficiency. Theoretically, we present the convergence rate and privacy level of our method for strongly convex, nonconvex and even nonsmooth objectives separately. Empirically, we apply our method to FL on various image and healthcare datasets. The results compare favorably to centralized and synchronous FL methods.

preprint · 2017 · arXiv

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for large unbounded delays. An explicit formula of stepsize that guarantees convergence is given depending on delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing maximum-delay induced stepsize is too conservative, often slowing down the convergence of the algorithm.

preprint · 2016 · arXiv

ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates

Finding a fixed point to a nonexpansive operator, i.e., $x^*=Tx^*$, abstracts many problems in numerical linear algebra, optimization, and other areas of scientific computing. To solve fixed-point problems, we propose ARock, an algorithmic framework in which multiple agents (machines, processors, or cores) update $x$ in an asynchronous parallel fashion. Asynchrony is crucial to parallel computing since it reduces synchronization wait, relaxes communication bottleneck, and thus speeds up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate $x_i$ based on possibly out-of-date information on $x$. The agents share $x$ through either global memory or communication. If writing $x_i$ is atomic, the agents can read and write $x$ without memory locks. Theoretically, we show that if the nonexpansive operator $T$ has a fixed point, then with probability one, ARock generates a sequence that converges to a fixed point of $T$. Our conditions on $T$ and step sizes are weaker than comparable work. Linear convergence is also obtained. We propose special cases of ARock for linear systems, convex optimization, machine learning, as well as distributed and decentralized consensus problems. Numerical experiments of solving sparse logistic regression problems are presented.
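The coordinate update described here can be simulated sequentially. The sketch below is not the ARock implementation: it imitates asynchrony in a single thread by letting each update read a stale snapshot of $x$, and the operator (a gradient step on a small quadratic), the step size `eta`, and the delay bound are all assumptions chosen for illustration.

```python
import random


def arock_simulated(apply_T, x0, num_updates, eta=0.5, max_delay=2, seed=0):
    """Sequentially simulate async coordinate updates x_i -= eta * (x_hat - T(x_hat))_i."""
    rng = random.Random(seed)
    x = list(x0)
    history = [list(x)]  # recent snapshots, used to model out-of-date reads
    for _ in range(num_updates):
        delay = rng.randint(0, min(max_delay, len(history) - 1))
        x_hat = history[-1 - delay]          # possibly stale copy of x
        i = rng.randrange(len(x))            # uniformly random coordinate
        x[i] -= eta * (x_hat[i] - apply_T(x_hat)[i])
        history.append(list(x))
        if len(history) > max_delay + 1:
            history.pop(0)
    return x


def gradient_step_operator(x, gamma=0.1):
    """T(x) = x - gamma * (A x - b); its fixed point solves A x = b (here x* = [1, 1])."""
    A = [[2.0, 1.0], [1.0, 2.0]]
    b = [3.0, 3.0]
    grad = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
    return [x[i] - gamma * grad[i] for i in range(2)]


x = arock_simulated(gradient_step_operator, [0.0, 0.0], num_updates=3000)
```

A real multi-agent version would replace the snapshot list with shared memory and atomic coordinate writes, as the abstract describes.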

preprint · 2016 · arXiv

Coordinate Friendly Structures, Algorithms and Applications

This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, where each updates one, or a small block of, variables while fixing others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize. The great performance of coordinate update methods depends on solving simple sub-problems. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates. Based on the discovered coordinate friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. Several problems are treated with coordinate update for the first time in history. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.

preprint · 2016 · arXiv

Expander Graph and Communication-Efficient Decentralized Optimization

In this paper, we discuss how to design the graph topology to reduce the communication complexity of certain algorithms for decentralized optimization. Our goal is to minimize the total communication needed to achieve a prescribed accuracy. We discover that the so-called expander graphs are near-optimal choices. We propose three approaches to construct expander graphs for different numbers of nodes and node degrees. Our numerical results show that the performance of decentralized optimization is significantly better on expander graphs than other regular graphs.

preprint · 2016 · arXiv

Sparse Recovery via Differential Inclusions

In this paper, we recover sparse signals from their noisy linear measurements by solving nonlinear differential inclusions, which is based on the notion of inverse scale space (ISS) developed in applied mathematics. Our goal here is to bring this idea to address a challenging problem in statistics, i.e., finding the oracle estimator which is unbiased and sign-consistent using dynamics. We call our dynamics Bregman ISS and Linearized Bregman ISS. A well-known shortcoming of LASSO and any convex regularization approaches lies in the bias of estimators. However, we show that under proper conditions, there exists a bias-free and sign-consistent point on the solution paths of such dynamics, which corresponds to a signal that is the unbiased estimate of the true signal and whose entries have the same signs as those of the true signal, i.e., the oracle estimator. Therefore, their solution paths are regularization paths better than the LASSO regularization path, since the points on the latter path are biased when sign-consistency is reached. We also show how to efficiently compute their solution paths in both continuous and discretized settings: the full solution paths can be exactly computed piece by piece, and a discretization leads to Linearized Bregman iteration, which is a simple iterative thresholding rule and easy to parallelize. Theoretical guarantees such as sign-consistency and minimax optimal $l_2$-error bounds are established in both continuous and discrete settings for specific points on the paths. Early-stopping rules for identifying these points are given. The key treatment relies on the development of differential inequalities for differential inclusions and their discretizations, which extends the previous results and leads to exponentially fast recovery of sparse signals before selecting wrong ones.
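The iterative thresholding rule mentioned in this abstract can be sketched as the standard linearized Bregman iteration: accumulate $A^\top(b - Ax)$ into an auxiliary variable $z$, then set $x = \kappa \cdot \mathrm{shrink}(z, 1)$. The parameter names `kappa` and `alpha`, their values, and the tiny noiseless test system are assumptions for illustration; the paper's early-stopping rules are not reproduced here.

```python
def shrink(v, threshold):
    """Entrywise soft-thresholding."""
    out = []
    for t in v:
        magnitude = max(abs(t) - threshold, 0.0)
        out.append(magnitude if t >= 0 else -magnitude)
    return out


def linearized_bregman(A, b, kappa=5.0, alpha=0.1, iters=50):
    """Linearized Bregman iteration for sparse x with A x = b (noiseless toy setting)."""
    rows, cols = len(A), len(A[0])
    z = [0.0] * cols
    x = [0.0] * cols
    for _ in range(iters):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        for j in range(cols):
            z[j] += alpha * sum(A[i][j] * residual[i] for i in range(rows))
        x = [kappa * s for s in shrink(z, 1.0)]   # entries enter the support only once |z_j| > 1
    return x


# Underdetermined toy system (an assumption): the sparsest solution is x = [0, 0, 1].
A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
x = linearized_bregman(A, b)
```

The third coordinate crosses the threshold first because it is the most correlated with the residual, which is the inverse-scale-space behavior the abstract refers to.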

preprint · 2016 · arXiv

TMAC: A Toolbox of Modern Async-Parallel, Coordinate, Splitting, and Stochastic Methods

TMAC is a toolbox written in C++11 that implements algorithms based on a set of modern methods for large-scale optimization. It covers a variety of optimization problems, which can be both smooth and nonsmooth, convex and nonconvex, as well as constrained and unconstrained. The algorithms implemented in TMAC, such as the coordinate update method and operator splitting method, are scalable as they decompose a problem into simple subproblems. These algorithms can run in a multi-threaded fashion, either synchronously or asynchronously, to take advantage of all the cores available. TMAC architecture mimics how a scientist writes down an optimization algorithm. Therefore, it is easy for one to obtain a new algorithm by making simple modifications such as adding a new operator and adding a new splitting, while maintaining the multicore parallelism and other features. The package is available at https://github.com/uclaopt/TMAC.

preprint · 2015 · arXiv

A globally convergent algorithm for nonconvex optimization based on block coordinate update

Nonconvex optimization problems arise in many areas of computational science and engineering and are (approximately) solved by a variety of algorithms. Existing algorithms usually only have local convergence or subsequence convergence of their iterates. We propose an algorithm for a generic nonconvex optimization formulation, establish the convergence of its whole iterate sequence to a critical point along with a rate of convergence, and numerically demonstrate its efficiency. Specifically, we consider the problem of minimizing a nonconvex objective function. Its variables can be treated as one block or be partitioned into multiple disjoint blocks. It is assumed that each non-differentiable component of the objective function or each constraint applies to one block of variables. The differentiable components of the objective function, however, can apply to one or multiple blocks of variables together. Our algorithm updates one block of variables at a time by minimizing a certain prox-linear surrogate. The order of update can be either deterministic or randomly shuffled in each round. We obtain the convergence of the whole iterate sequence under fairly loose conditions including, in particular, the Kurdyka-Łojasiewicz (KL) condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. We apply our convergence result to the coordinate descent method for non-convex regularized linear regression and also a modified rank-one residue iteration method for nonnegative matrix factorization. We show that both the methods have global convergence. Numerically, we test our algorithm on nonnegative matrix and tensor factorization problems, with random shuffling enabled to avoid local solutions.

preprint · 2015 · arXiv

A Three-Operator Splitting Scheme and its Optimization Applications

Operator splitting schemes have been successfully used in computational sciences to reduce complex problems into a series of simpler subproblems. Since the 1950s, these schemes have been widely used to solve problems in PDE and control. Recently, large-scale optimization problems in machine learning, signal processing, and imaging have created a resurgence of interest in operator-splitting based algorithms because they often have simple descriptions, are easy to code, and have (nearly) state-of-the-art performance for large-scale optimization problems. Although operator splitting techniques were introduced over 60 years ago, their importance has significantly increased in the past decade. This paper introduces a new operator-splitting scheme for solving a variety of problems that are reduced to a monotone inclusion of three operators, one of which is cocoercive. Our scheme is very simple, and it does not reduce to any existing splitting schemes. Our scheme recovers the existing forward-backward, Douglas-Rachford, and forward-Douglas-Rachford splitting schemes as special cases. Our new splitting scheme leads to a set of new and simple algorithms for a variety of other problems, including the 3-set split feasibility problems, 3-objective minimization problems, and doubly and multiple regularization problems, as well as the simplest extension of the classic ADMM from 2 to 3 blocks of variables. In addition to the basic scheme, we introduce several modifications and enhancements that can improve the convergence rate in practice, including an acceleration that achieves the optimal rate of convergence for strongly monotone inclusions. Finally, we evaluate the algorithm on several applications.
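For the special case of minimizing $f + g + h$ with $h$ smooth, the three-operator iteration can be sketched as: $x_g = \mathrm{prox}_{\gamma g}(z)$, $x_f = \mathrm{prox}_{\gamma f}(2x_g - z - \gamma \nabla h(x_g))$, $z \leftarrow z + \lambda (x_f - x_g)$. The code below is an illustrative instance, not the paper's implementation: the function names, the choice $\gamma = \lambda = 1$, and the toy problem (projecting a point onto the intersection of a box and a hyperplane) are assumptions.

```python
def davis_yin(prox_f, prox_g, grad_h, z0, gamma=1.0, relax=1.0, iters=100):
    """Three-operator splitting for min f(x) + g(x) + h(x), with h smooth."""
    z = list(z0)
    for _ in range(iters):
        x_g = prox_g(z, gamma)
        gh = grad_h(x_g)
        x_f = prox_f([2.0 * p - q - gamma * d for p, q, d in zip(x_g, z, gh)], gamma)
        z = [q + relax * (a - c) for q, a, c in zip(z, x_f, x_g)]
    return prox_g(z, gamma)


target = [0.9, 0.6, -0.2]


def project_box(v, gamma):
    """prox of the indicator of [0, 1]^n (gamma is unused for indicator functions)."""
    return [min(max(t, 0.0), 1.0) for t in v]


def project_hyperplane(v, gamma):
    """prox of the indicator of {x : sum(x) = 1}."""
    shift = (sum(v) - 1.0) / len(v)
    return [t - shift for t in v]


def grad_distance(x):
    """Gradient of h(x) = 0.5 * ||x - target||^2, which is 1-Lipschitz."""
    return [xi - ti for xi, ti in zip(x, target)]


x = davis_yin(project_hyperplane, project_box, grad_distance, [0.0, 0.0, 0.0])
```

With the smooth term removed the same loop reduces to Douglas-Rachford splitting, and with one of the proximal terms removed it reduces to forward-backward splitting, consistent with the special cases listed in the abstract.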

preprint · 2015 · arXiv

Block stochastic gradient iteration for convex and nonconvex optimization

The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions, or solve a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are easier to update individually than together, BCD has a lower per-iteration cost. This paper introduces a method that combines the features of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables. Specifically, a block stochastic gradient (BSG) method is proposed for solving both convex and nonconvex programs. At each iteration, BSG approximates the gradient of the differentiable part of the objective by randomly sampling a small set of data or sampling a few functions from the sum term in the objective, and then, using those samples, it updates all the blocks of variables in either a deterministic or a randomly shuffled order. Its convergence for both convex and nonconvex cases is established in different senses. In the convex case, the proposed method has the same order of convergence rate as the SG method. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. The proposed method was numerically tested on problems including stochastic least squares and logistic regression, which are convex, as well as low-rank tensor recovery and bilinear logistic regression, which are nonconvex.
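The per-iteration structure described above (sample a minibatch, then sweep over the blocks in shuffled order) can be sketched on a toy least-squares fit with two scalar blocks. The constant step size, the batch size, and the noiseless data are assumptions made to keep the example small; they are not the paper's settings.

```python
import random


def block_stochastic_gradient(samples, step=0.1, batch_size=8, iters=2000, seed=0):
    """Fit y ~ w * x + c, treating w and c as two blocks updated in shuffled order."""
    rng = random.Random(seed)
    params = {"w": 0.0, "c": 0.0}
    for _ in range(iters):
        batch = rng.sample(samples, batch_size)   # one minibatch per iteration
        order = ["w", "c"]
        rng.shuffle(order)                        # randomly shuffled block order
        for block in order:
            # Minibatch gradient of 0.5 * (w*x + c - y)^2 with respect to this block,
            # evaluated at the most recent value of the other block.
            grad = 0.0
            for x, y in batch:
                r = params["w"] * x + params["c"] - y
                grad += r * x if block == "w" else r
            params[block] -= step * grad / batch_size
    return params["w"], params["c"]


xs = [i / 50.0 - 1.0 for i in range(101)]       # grid on [-1, 1]
samples = [(x, 2.0 * x + 1.0) for x in xs]      # noiseless toy data with w = 2, c = 1
w, c = block_stochastic_gradient(samples)
```

Because the second block is updated with the first block's new value, each sweep behaves like a Gauss-Seidel pass driven by stochastic gradients.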

preprint2015arXiv

Convergence rate analysis of several splitting schemes

Splitting schemes are a class of powerful algorithms that solve complicated monotone inclusions and convex optimization problems that are built from many simpler pieces. They give rise to algorithms in which the simple pieces of the decomposition are processed individually. This leads to easily implementable and highly parallelizable algorithms, which often obtain nearly state-of-the-art performance. In the first part of this paper, we analyze the convergence rates of several general splitting algorithms and provide examples to prove the tightness of our results. The most general rates are proved for the \emph{fixed-point residual} (FPR) of the Krasnosel'skiĭ-Mann (KM) iteration of nonexpansive operators, where we improve the known big-$O$ rate to little-$o$. We show the tightness of this result and improve it in several special cases. In the second part of this paper, we use the convergence rates derived for the KM iteration to analyze the \emph{objective error} convergence rates for the Douglas-Rachford (DRS), Peaceman-Rachford (PRS), and ADMM splitting algorithms under general convexity assumptions. We show, by way of example, that the rates obtained for these algorithms are tight in all cases and obtain the surprising statement: The DRS algorithm is nearly as fast as the proximal point algorithm (PPA) in the ergodic sense and nearly as slow as the subgradient method in the nonergodic sense. Finally, we provide several applications of our result to feasibility problems, model fitting, and distributed optimization. Our analysis is self-contained, and most results are deduced from a basic lemma that derives convergence rates for summable sequences, a simple diagram that decomposes each relaxed PRS iteration, and fundamental inequalities that relate the FPR to objective error.

preprint2015arXiv

Democratic Representations

Minimization of the $\ell_{\infty}$ (or maximum) norm subject to a constraint that imposes consistency to an underdetermined system of linear equations finds use in a large number of practical applications, including vector quantization, approximate nearest neighbor search, peak-to-average power ratio (or "crest factor") reduction in communication systems, and peak force minimization in robotics and control. This paper analyzes the fundamental properties of signal representations obtained by solving such a convex optimization problem. We develop bounds on the maximum magnitude of such representations using the uncertainty principle (UP) introduced by Lyubarskii and Vershynin, and study the efficacy of $\ell_{\infty}$-norm-based dynamic range reduction. Our analysis shows that matrices satisfying the UP, such as randomly subsampled Fourier or i.i.d. Gaussian matrices, enable the computation of what we call democratic representations, whose entries all have small and similar magnitude, as well as low dynamic range. To compute democratic representations at low computational complexity, we present two new, efficient convex optimization algorithms. We finally demonstrate the efficacy of democratic representations for dynamic range reduction in a DVB-T2-based broadcast system.

preprint2015arXiv

Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions

Splitting schemes are a class of powerful algorithms that solve complicated monotone inclusion and convex optimization problems that are built from many simpler pieces. They give rise to algorithms in which the simple pieces of the decomposition are processed individually. This leads to easily implementable and highly parallelizable algorithms, which often obtain nearly state-of-the-art performance. In this paper, we provide a comprehensive convergence rate analysis of the Douglas-Rachford splitting (DRS), Peaceman-Rachford splitting (PRS), and alternating direction method of multipliers (ADMM) algorithms under various regularity assumptions including strong convexity, Lipschitz differentiability, and bounded linear regularity. The main consequence of this work is that relaxed PRS and ADMM automatically adapt to the regularity of the problem and achieve convergence rates that improve upon the (tight) worst-case rates that hold in the absence of such regularity. All of the results are obtained using simple techniques.

preprint2015arXiv

On the Convergence of Decentralized Gradient Descent

Consider the consensus problem of minimizing $f(x)=\sum_{i=1}^n f_i(x)$ where each $f_i$ is only known to one individual agent $i$ out of a connected network of $n$ agents. All the agents shall collaboratively solve this problem and obtain the solution subject to data exchanges restricted to between neighboring agents. Such algorithms avoid the need of a fusion center, offer better network load balance, and improve data privacy. We study the decentralized gradient descent method in which each agent $i$ updates its variable $x_{(i)}$, which is a local approximation to the unknown variable $x$, by combining the average of its neighbors' variables with the negative gradient step $-α\nabla f_i(x_{(i)})$. The iteration is $$x_{(i)}(k+1) \gets \sum_{\text{neighbor } j \text{ of } i} w_{ij} x_{(j)}(k) - α\nabla f_i(x_{(i)}(k)),\quad\text{for each agent } i,$$ where the averaging coefficients form a symmetric doubly stochastic matrix $W=[w_{ij}] \in \mathbb{R}^{n \times n}$. We analyze the convergence of this iteration and derive its convergence rate, assuming that each $f_i$ is proper closed convex and lower bounded, $\nabla f_i$ is Lipschitz continuous with constant $L_{f_i}$, and stepsize $α$ is fixed. Provided that $α< O(1/L_h)$ where $L_h=\max_i\{L_{f_i}\}$, the objective error at the averaged solution, $f(\frac{1}{n}\sum_i x_{(i)}(k))-f^*$, reduces at a speed of $O(1/k)$ until it reaches $O(α)$. If $f_i$ are further (restricted) strongly convex, then both $\frac{1}{n}\sum_i x_{(i)}(k)$ and each $x_{(i)}(k)$ converge to the global minimizer $x^*$ at a linear rate until reaching an $O(α)$-neighborhood of $x^*$. We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an $O(α)$-neighborhood of the true unknown sparse signal.
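The iteration in this abstract can be sketched on a toy problem. The example below is illustrative only: the four-agent ring, the mixing weights, the scalar objectives $f_i(x)=\frac{1}{2}(x-b_i)^2$, the data `b`, and the function name `dgd` are assumptions, not taken from the paper. With these objectives the global minimizer is the mean of `b`.

```python
def dgd(b, W, alpha, iters):
    """Decentralized gradient descent for f_i(x) = 0.5 * (x - b[i])**2.

    Each agent i keeps a scalar x[i] and applies
        x[i] <- sum_j W[i][j] * x[j] - alpha * (x[i] - b[i]).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [mixed[i] - alpha * (x[i] - b[i]) for i in range(n)]
    return x


# Ring of 4 agents: weight 0.5 on itself, 0.25 on each of its two neighbors.
# The matrix is symmetric and doubly stochastic.
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
b = [1.0, 2.0, 3.0, 6.0]  # minimizer of sum_i f_i is mean(b) = 3.0

x_dgd = dgd(b, W, alpha=0.05, iters=500)
```

In this toy setting the agents do not reach exact consensus with a fixed step size: each `x_dgd[i]` settles at a small, step-size-dependent distance from 3.0, which is consistent with the $O(α)$-neighborhood described above. Shrinking `alpha` tightens the neighborhood at the cost of slower progress.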

preprint2015arXiv

Optimal Sparse Kernel Learning for Hyperspectral Anomaly Detection

In this paper, a novel framework of sparse kernel learning for Support Vector Data Description (SVDD) based anomaly detection is presented. In this work, optimal sparse feature selection for anomaly detection is first modeled as a Mixed Integer Programming (MIP) problem. Due to the prohibitively high computational complexity of the MIP, it is relaxed into a Quadratically Constrained Linear Programming (QCLP) problem. The QCLP problem can then be practically solved by using an iterative optimization method, in which multiple subsets of features are iteratively found as opposed to a single subset. The QCLP-based iterative optimization problem is solved in a finite space called the \emph{Empirical Kernel Feature Space} (EKFS) instead of in the input space or \emph{Reproducing Kernel Hilbert Space} (RKHS). This is possible because of the fact that the geometrical properties of the EKFS and the corresponding RKHS remain the same. Now, an explicit nonlinear exploitation of the data in a finite EKFS is achievable, which results in optimal feature ranking. Experimental results based on a hyperspectral image show that the proposed method can provide improved performance over the current state-of-the-art techniques.

preprint2015arXiv

Parallel matrix factorization for low-rank tensor completion

Higher-order low-rank tensors naturally arise in many applications including hyperspectral data recovery, video inpainting, seismic data reconstruction, and so on. We propose a new model to recover a low-rank tensor by simultaneously performing low-rank matrix factorizations of the all-mode matricizations of the underlying tensor. An alternating minimization algorithm is applied to solve the model, along with two adaptive rank-adjusting strategies when the exact rank is not known. Phase transition plots reveal that our algorithm can recover a variety of synthetic low-rank tensors from significantly fewer samples than the compared methods, which include a matrix completion method applied to tensor recovery and two state-of-the-art tensor completion methods. Further tests on real-world data show similar advantages. Although our model is non-convex, our algorithm performs consistently throughout the tests and gives better results than the compared methods, some of which are based on convex models. In addition, the global convergence of our algorithm can be established in the sense that the gradient of the Lagrangian function converges to zero.

preprint2015arXiv

Self Equivalence of the Alternating Direction Method of Multipliers

The alternating direction method of multipliers (ADM or ADMM) breaks a complex optimization problem into much simpler subproblems. The ADM algorithms are typically short and easy to implement yet exhibit (nearly) state-of-the-art performance for large-scale optimization problems. To apply ADM, we first formulate a given problem into the "ADM-ready" form, so the final algorithm depends on the formulation. A problem like $\mbox{minimize}_\mathbf{x} u(\mathbf{x}) + v(\mathbf{C}\mathbf{x})$ has six different "ADM-ready" formulations. They can be in the primal or dual forms, and they differ by how dummy variables are introduced. To each "ADM-ready" formulation, ADM can be applied in two different orders depending on how the primal variables are updated. Finally, we get twelve different ADM algorithms! How do they compare to each other? Which algorithm should one choose? In this paper, we show that many of the different ways of applying ADM are equivalent. Specifically, we show that ADM applied to a primal formulation is equivalent to ADM applied to its Lagrange dual; ADM is equivalent to a primal-dual algorithm applied to the saddle-point formulation of the same problem. These results are surprising since the primal and dual variables in ADM are seemingly treated very differently, and some previous works exhibit preferences for one over the other on specific problems. In addition, when one of the two objective functions is quadratic, possibly subject to an affine constraint, we show that swapping the update order of the two primal variables in ADM gives the same algorithm. These results identify the few truly different ADM algorithms for a problem, which generally have different forms of subproblems from which it is easy to pick one with the most computationally friendly subproblems.

preprint2014arXiv

A fast patch-dictionary method for whole image recovery

Various algorithms have been proposed for dictionary learning. Among those for image processing, many use image patches to form dictionaries. This paper focuses on whole-image recovery from corrupted linear measurements. We address the open issue of representing an image by overlapping patches: the overlapping leads to an excessive number of dictionary coefficients to determine. With very few exceptions, this issue has limited the applications of image-patch methods to the local kind of tasks such as denoising, inpainting, cartoon-texture decomposition, super-resolution, and image deblurring, for which one can process a few patches at a time. Our focus is global imaging tasks such as compressive sensing and medical image recovery, where the whole image is encoded together, making it either impossible or very ineffective to update a few patches at a time. Our strategy is to divide the sparse recovery into multiple subproblems, each of which handles a subset of non-overlapping patches, and then the results of the subproblems are averaged to yield the final recovery. This simple strategy is surprisingly effective in terms of both quality and speed. In addition, we accelerate computation of the learned dictionary by applying a recent block proximal-gradient method, which not only has a lower per-iteration complexity but also takes fewer iterations to converge, compared to the current state-of-the-art. We also establish that our algorithm globally converges to a stationary point. Numerical results on synthetic data demonstrate that our algorithm can recover a more faithful dictionary than two state-of-the-art methods. Combining our whole-image recovery and dictionary-learning methods, we numerically simulate image inpainting, compressive sensing recovery, and deblurring. Our recovery is more faithful than those of a total variation method and a method based on overlapping patches.

preprint2014arXiv

EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization

Recently, there has been growing interest in solving consensus optimization problems in a multi-agent network. In this paper, we develop a decentralized algorithm for the consensus optimization problem $$\min\limits_{x\in\mathbb{R}^p}~\bar{f}(x)=\frac{1}{n}\sum\limits_{i=1}^n f_i(x),$$ which is defined over a connected network of $n$ agents, where each function $f_i$ is held privately by agent $i$ and encodes the agent's data and objective. All the agents shall collaboratively find the minimizer while each agent can only communicate with its neighbors. Such a computation scheme avoids a data fusion center or long-distance communication and offers better load balance to the network. This paper proposes a novel decentralized EXact firsT-ordeR Algorithm (abbreviated as EXTRA) to solve the consensus optimization problem. Here "exact" means that it can converge to the exact solution. EXTRA can use a fixed large step size, which is independent of the network size, and has synchronized iterations. The local variable of every agent $i$ converges uniformly and consensually to an exact minimizer of $\bar{f}$. In contrast, the well-known decentralized gradient descent (DGD) method must use diminishing step sizes in order to converge to an exact minimizer. EXTRA and DGD have the same choice of mixing matrices and similar per-iteration complexity. EXTRA, however, uses the gradients of the last two iterates, unlike DGD, which uses just that of the last iterate. EXTRA has the best known convergence rates among the existing first-order decentralized algorithms. Specifically, if the $f_i$'s are convex and have Lipschitz continuous gradients, EXTRA has an ergodic convergence rate $O(\frac{1}{k})$ in terms of the first-order optimality residual. If $\bar{f}$ is also restricted strongly convex, EXTRA converges to an optimal solution at a linear rate $O(C^{-k})$ for some constant $C>1$.
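A small sketch of how an update that uses the gradients of the last two iterates can reach the exact minimizer with a fixed step size. The abstract does not spell out the recursion, so the two-step form below, including the second mixing matrix $\tilde{W}=(I+W)/2$, is an assumption based on how EXTRA is commonly written. The ring network, the scalar objectives $f_i(x)=\frac{1}{2}(x-b_i)^2$, the data, and the name `extra` are hypothetical.

```python
def extra(b, W, alpha, iters):
    """EXTRA-style iteration for f_i(x) = 0.5 * (x - b[i])**2 (scalar agents).

    Assumed recursion (not quoted from the abstract):
        x1     = W x0 - alpha * grad(x0)
        x_next = (I + W) x - W_tilde x_prev - alpha * (grad(x) - grad(x_prev))
    with W_tilde = (I + W) / 2.
    """
    n = len(b)

    def grad(x):
        return [x[i] - b[i] for i in range(n)]

    def mix(M, x):
        return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

    W_tilde = [
        [(W[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)]
        for i in range(n)
    ]

    x_prev = [0.0] * n
    g_prev = grad(x_prev)
    x = [m - alpha * g for m, g in zip(mix(W, x_prev), g_prev)]
    for _ in range(iters):
        g = grad(x)
        wx = mix(W, x)
        wtx = mix(W_tilde, x_prev)
        x_next = [
            x[i] + wx[i] - wtx[i] - alpha * (g[i] - g_prev[i]) for i in range(n)
        ]
        x_prev, g_prev, x = x, g, x_next
    return x


# Ring of 4 agents with a symmetric doubly stochastic mixing matrix.
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
b = [1.0, 2.0, 3.0, 6.0]  # exact minimizer is mean(b) = 3.0

x_extra = extra(b, W, alpha=0.5, iters=100)
```

In this toy problem every agent's value approaches 3.0 itself rather than a step-size-dependent neighborhood of it, even though `alpha` is held fixed, which is the contrast with DGD that the abstract draws. The step size here was picked by hand for this particular network and curvature.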

preprint2014arXiv

On the Linear Convergence of the ADMM in Decentralized Consensus Optimization

In decentralized consensus optimization, a connected network of agents collaboratively minimizes the sum of their local objective functions over a common decision variable, where their information exchange is restricted to neighbors. To this end, one can first obtain a problem reformulation and then apply the alternating direction method of multipliers (ADMM). The method applies iterative computation at the individual agents and information exchange between the neighbors. This approach has been observed to converge quickly and deemed powerful. This paper establishes its linear convergence rate for decentralized consensus optimization problems with strongly convex local objective functions. The theoretical convergence rate is explicitly given in terms of the network topology, the properties of local objective functions, and the algorithm parameter. This result is not only a performance guarantee but also a guideline toward accelerating the ADMM convergence.

preprint2014arXiv

One condition for solution uniqueness and robustness of both l1-synthesis and l1-analysis minimizations

The $\ell_1$-synthesis model and the $\ell_1$-analysis model recover structured signals from their undersampled measurements. The solution of the former is a sparse sum of dictionary atoms, and that of the latter makes sparse correlations with dictionary atoms. This paper addresses the question: when can we trust these models to recover specific signals? We answer the question with a condition that is both necessary and sufficient to guarantee the recovery to be unique and exact and, in the presence of measurement noise, to be robust. The condition is one-for-all in the sense that it applies to both the $\ell_1$-synthesis and $\ell_1$-analysis models, to both their constrained and unconstrained formulations, and to both the exact recovery and robust recovery cases. Furthermore, a convex infinity-norm program is introduced for numerically verifying the condition. A comprehensive comparison with related existing conditions is included.

preprint2014arXiv

Parallel Multi-Block ADMM with o(1/k) Convergence

This paper introduces a parallel and distributed extension to the alternating direction method of multipliers (ADMM) for solving the convex problem: minimize $\sum_{i=1}^N f_i(x_i)$ subject to $\sum_{i=1}^N A_i x_i=c, x_i\in \mathcal{X}_i$. The algorithm decomposes the original problem into $N$ smaller subproblems and solves them in parallel at each iteration. This Jacobian-type algorithm is well suited for distributed computing and is particularly attractive for solving certain large-scale problems. This paper introduces a few novel results. Firstly, it shows that extending ADMM straightforwardly from the classic Gauss-Seidel setting to the Jacobian setting, from 2 blocks to $N$ blocks, will preserve convergence if matrices $A_i$ are mutually near-orthogonal and have full column-rank. Secondly, for general matrices $A_i$, this paper proposes to add proximal terms of different kinds to the $N$ subproblems so that the subproblems can be solved in flexible and efficient ways and the algorithm converges globally at a rate of $o(1/k)$. Thirdly, a simple technique is introduced to improve some existing convergence rates from $O(1/k)$ to $o(1/k)$. In practice, some conditions in our convergence theorems are conservative. Therefore, we introduce a strategy for dynamically tuning the parameters in the algorithm, leading to substantial acceleration of the convergence in practice. Numerical results are presented to demonstrate the efficiency of the proposed method in comparison with several existing parallel algorithms. We implemented our algorithm on Amazon EC2, an on-demand public computing cloud, and report its performance on very large-scale basis pursuit problems with distributed data.

preprint2014arXiv

Video Compressive Sensing for Dynamic MRI

We present a video compressive sensing framework, termed kt-CSLDS, to accelerate the image acquisition process of dynamic magnetic resonance imaging (MRI). We are inspired by a state-of-the-art model for video compressive sensing that utilizes a linear dynamical system (LDS) to model the motion manifold. Given compressive measurements, the state sequence of an LDS can be first estimated using system identification techniques. We then reconstruct the observation matrix using a joint structured sparsity assumption. In particular, we minimize an objective function with a mixture of wavelet sparsity and joint sparsity within the observation matrix. We derive an efficient convex optimization algorithm through alternating direction method of multipliers (ADMM), and provide a theoretical guarantee for global convergence. We demonstrate the performance of our approach for video compressive sensing, in terms of reconstruction accuracy. We also investigate the impact of various sampling strategies. We apply this framework to accelerate the acquisition process of dynamic MRI and show it achieves the best reconstruction accuracy with the least computational time compared with existing algorithms in the literature.

preprint2013arXiv

A dual algorithm for a class of augmented convex models

Convex optimization models find interesting applications, especially in signal/image processing and compressive sensing. We study some augmented convex models, which are perturbed by strongly convex functions, and propose a dual gradient algorithm. The proposed algorithm includes the linearized Bregman algorithm and the singular value thresholding algorithm as special cases. Based on fundamental properties of proximal operators, we present a concise approach to establish the convergence of both primal and dual sequences, improving the results in the existing literature.

preprint2013arXiv

Augmented L1 and Nuclear-Norm Models with a Globally Linearly Convergent Algorithm

This paper studies the long-existing idea of adding a nice smooth function to "smooth" a non-differentiable objective function in the context of sparse optimization, in particular, the minimization of $\|x\|_1+1/(2α)\|x\|_2^2$, where $x$ is a vector, as well as the minimization of $\|X\|_*+1/(2α)\|X\|_F^2$, where $X$ is a matrix and $\|X\|_*$ and $\|X\|_F$ are the nuclear and Frobenius norms of $X$, respectively. We show that they can efficiently recover sparse vectors and low-rank matrices. In particular, they enjoy exact and stable recovery guarantees similar to those known for minimizing $\|x\|_1$ and $\|X\|_*$ under the conditions on the sensing operator such as its null-space property, restricted isometry property, spherical section property, or RIPless property. To recover a (nearly) sparse vector $x^0$, minimizing $\|x\|_1+1/(2α)\|x\|_2^2$ returns (nearly) the same solution as minimizing $\|x\|_1$ almost whenever $α\ge 10\|x^0\|_\infty$. The same relation also holds between minimizing $\|X\|_*+1/(2α)\|X\|_F^2$ and minimizing $\|X\|_*$ for recovering a (nearly) low-rank matrix $X^0$, if $α\ge 10\|X^0\|_2$. Furthermore, we show that the linearized Bregman algorithm for minimizing $\|x\|_1+1/(2α)\|x\|_2^2$ subject to $Ax=b$ enjoys global linear convergence as long as a nonzero solution exists, and we give an explicit rate of convergence. The convergence property does not require a sparse solution or any properties on $A$. To our knowledge, this is the best known global convergence result for first-order sparse optimization algorithms.
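One way to see why the augmented model is convenient is that its Lagrange dual is smooth, so plain gradient ascent on the dual applies. The sketch below takes that view of the linearized Bregman iteration for minimizing $\|x\|_1+1/(2α)\|x\|_2^2$ subject to $Ax=b$. It is a hedged illustration: the step size `1 / (alpha * ||A||_2^2)`, the random test matrix, the support and values of `x0`, and the function names are assumptions made for the example, not details taken from the paper.

```python
import numpy as np


def shrink(v, t=1.0):
    """Soft-thresholding: sign(v) * max(|v| - t, 0), applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def linearized_bregman(A, b, alpha, iters=10000):
    """Dual gradient ascent for min ||x||_1 + ||x||_2^2 / (2*alpha) s.t. Ax = b.

    For a dual variable y, the Lagrangian minimizer is x = alpha * shrink(A^T y),
    and the dual gradient at y is the residual b - A x.
    """
    step = 1.0 / (alpha * np.linalg.norm(A, 2) ** 2)  # assumed step size
    y = np.zeros(A.shape[0])
    x = np.zeros(A.shape[1])
    residuals = []
    for _ in range(iters):
        x = alpha * shrink(A.T @ y)
        r = b - A @ x
        residuals.append(float(np.linalg.norm(r)))
        y = y + step * r
    return x, residuals


# Hypothetical test instance: 20 Gaussian measurements of a 3-sparse vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50)) / np.sqrt(20)
x0 = np.zeros(50)
x0[[3, 17, 41]] = [1.0, -2.0, 1.5]
b = A @ x0

alpha = 10.0 * np.max(np.abs(x0))  # the threshold suggested in the abstract
x_hat, residuals = linearized_bregman(A, b, alpha)
```

The list `residuals` tracks $\|b-Ax\|_2$ per iteration, so plotting it on a log scale is a quick way to inspect the convergence behavior on a given instance. Whether `x_hat` matches `x0` depends on the instance, so the sketch makes no claim about recovery.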

preprint2013arXiv

Gradient methods for convex minimization: better rates under weaker conditions

The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly accepted ones. We relax the common gradient Lipschitz-continuity condition and strong convexity condition to ones that hold only over certain line segments. Specifically, we establish complexities $O(\frac{R}{ε})$ and $O(\sqrt{\frac{R}{ε}})$ for the ordinary and accelerated gradient methods, respectively, assuming that $\nabla f$ is Lipschitz continuous with constant $R$ over the line segment joining $x$ and $x-\frac{1}{R}\nabla f$ for each $x\in\operatorname{dom} f$. Then we improve them to $O(\frac{R}{ν}\log(\frac{1}{ε}))$ and $O(\sqrt{\frac{R}{ν}}\log(\frac{1}{ε}))$ for function $f$ that also satisfies the secant inequality $\langle \nabla f(x), x- x^*\rangle \ge ν\|x-x^*\|^2$ for each $x\in \operatorname{dom} f$ and its projection $x^*$ to the minimizer set of $f$. The secant condition is also shown to be necessary for the geometric decay of solution error. Not only are the relaxed conditions met by more functions, the restrictions give smaller $R$ and larger $ν$ than they are without the restrictions and thus lead to better complexity bounds. We apply these results to sparse optimization and demonstrate a faster algorithm.

preprint2012arXiv

Extracting respiratory signals from thoracic cone beam CT projections

Patient respiratory signal associated with the cone beam CT (CBCT) projections is important for lung cancer radiotherapy. In contrast to monitoring an external surrogate of respiration, such signal can be extracted directly from the CBCT projections. In this paper, we propose a novel local principal component analysis (LPCA) method to extract the respiratory signal by distinguishing the respiration motion-induced content change from the gantry rotation-induced content change in the CBCT projections. The LPCA method is evaluated by comparing with three state-of-the-art projection-based methods, namely, the Amsterdam Shroud (AS) method, the intensity analysis (IA) method, and the Fourier-transform based phase analysis (FT-p) method. The clinical CBCT projection data of eight patients, acquired under various clinical scenarios, were used to investigate the performance of each method. We found that the proposed LPCA method has demonstrated the best overall performance for cases tested and thus is a promising technique for extracting respiratory signal. We also identified the applicability of each existing method.

preprint2012arXiv

Necessary and sufficient conditions of solution uniqueness in $\ell_1$ minimization

This paper shows that the solutions to various convex $\ell_1$ minimization problems are \emph{unique} if and only if a common set of conditions is satisfied. This result applies broadly to the basis pursuit model, basis pursuit denoising model, Lasso model, as well as other $\ell_1$ models that either minimize $f(Ax-b)$ or impose the constraint $f(Ax-b)\leq σ$, where $f$ is a strictly convex function. For these models, this paper proves that, given a solution $x^*$ and defining $I=\operatorname{supp}(x^*)$ and $s=\operatorname{sign}(x^*_I)$, $x^*$ is the unique solution if and only if $A_I$ has full column rank and there exists $y$ such that $A_I^Ty=s$ and $|a_i^Ty|<1$ for $i\not\in I$. This condition was previously known to be sufficient for the basis pursuit model to have a unique solution supported on $I$. Indeed, it is also necessary, and applies to a variety of other $\ell_1$ models. The paper also discusses ways to recognize unique solutions and verify the uniqueness conditions numerically.
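The condition in this abstract can be probed numerically. The sketch below is a partial check under stated assumptions, not the verification procedure from the paper: it tests full column rank of $A_I$ and then tries only one candidate certificate, the least-norm $y$ satisfying $A_I^Ty=s$. If that candidate also satisfies $|a_i^Ty|<1$ off the support, uniqueness is certified; if it does not, the result is inconclusive, because some other $y$ might still work. The function name, the tolerance, and the example matrix are hypothetical, and `x_star` is assumed to be a nonzero solution of the model.

```python
import numpy as np


def certifies_unique(A, x_star, tol=1e-10):
    """Return True if the least-norm dual candidate certifies uniqueness.

    False means "not certified by this candidate" (or A_I is rank deficient),
    not necessarily "not unique".
    """
    support = np.flatnonzero(np.abs(x_star) > tol)
    off_support = np.setdiff1d(np.arange(A.shape[1]), support)
    A_I = A[:, support]

    # Necessary part of the condition: A_I must have full column rank.
    if np.linalg.matrix_rank(A_I) < len(support):
        return False

    s = np.sign(x_star[support])
    # Least-norm y with A_I^T y = s, i.e. y = A_I (A_I^T A_I)^{-1} s.
    y = A_I @ np.linalg.solve(A_I.T @ A_I, s)

    # Strict inequality |a_i^T y| < 1 for every column off the support.
    return bool(np.all(np.abs(A[:, off_support].T @ y) < 1.0 - tol))


# Hypothetical example: x_star solves min ||x||_1 subject to A x = [1, 0].
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
x_star = np.array([1.0, 0.0, 0.0])
is_certified = certifies_unique(A, x_star)
```

For this small instance the candidate is $y=(1,0)$, which gives off-support values 0 and 0.5, so the certificate holds. A complete check would search over all $y$ with $A_I^Ty=s$, for example by solving a small convex program.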

preprint2011arXiv

An Alternating Direction Algorithm for Matrix Completion with Nonnegative Factors

This paper introduces an algorithm for the nonnegative matrix factorization-and-completion problem, which aims to find nonnegative low-rank matrices X and Y so that the product XY approximates a nonnegative data matrix M whose elements are partially known (to a certain accuracy). This problem aggregates two existing problems: (i) nonnegative matrix factorization where all entries of M are given, and (ii) low-rank matrix completion where nonnegativity is not required. By taking advantage of both nonnegativity and low-rankness, one can generally obtain better results than by using just one of the two properties. We propose to solve the non-convex constrained least-squares problem using an algorithm based on the classic alternating direction augmented Lagrangian method. Preliminary convergence properties of the algorithm and numerical simulation results are presented. Compared to a recent algorithm for nonnegative matrix factorization, the proposed algorithm produces factorizations of similar quality using only about half of the matrix entries. On tasks of recovering incomplete grayscale and hyperspectral images, the proposed algorithm yields overall better quality than that produced by two recent matrix-completion algorithms that do not exploit nonnegativity.

preprint2011arXiv

Fast Linearized Bregman Iteration for Compressive Sensing and Sparse Denoising

We propose and analyze an extremely fast, efficient, and simple method for solving the problem: $\min\{\|u\|_1 : Au = f,\ u \in \mathbb{R}^n\}$. This method was first described in [J. Darbon and S. Osher, preprint, 2007], with more details in [W. Yin, S. Osher, D. Goldfarb and J. Darbon, SIAM J. Imaging Sciences, 1(1), 143-168, 2008] and rigorous theory given in [J. Cai, S. Osher and Z. Shen, Math. Comp., to appear, 2008, see also UCLA CAM Report 08-06] and [J. Cai, S. Osher and Z. Shen, UCLA CAM Report, 08-52, 2008]. The motivation was compressive sensing, which now has a vast and exciting history, which seems to have started with Candes et al. [E. Candes, J. Romberg and T. Tao, 52(2), 489-509, 2006] and Donoho [D. L. Donoho, IEEE Trans. Inform. Theory, 52, 1289-1306, 2006]. See [W. Yin, S. Osher, D. Goldfarb and J. Darbon, SIAM J. Imaging Sciences 1(1), 143-168, 2008] and [J. Cai, S. Osher and Z. Shen, Math. Comp., to appear, 2008, see also UCLA CAM Report, 08-06] and [J. Cai, S. Osher and Z. Shen, UCLA CAM Report, 08-52, 2008] for a large set of references. Our method introduces an improvement called "kicking" of the very efficient method of [J. Darbon and S. Osher, preprint, 2007] and [W. Yin, S. Osher, D. Goldfarb and J. Darbon, SIAM J. Imaging Sciences, 1(1), 143-168, 2008] and also applies it to the problem of denoising of undersampled signals. The use of Bregman iteration for denoising of images began in [S. Osher, M. Burger, D. Goldfarb, J. Xu and W. Yin, Multiscale Model. Simul, 4(2), 460-489, 2005] and led to improved results for total variation based methods. Here we apply it to denoise signals, especially essentially sparse signals, which might even be undersampled.

preprint2010arXiv

Collaborative Spectrum Sensing from Sparse Observations in Cognitive Radio Networks

Spectrum sensing, which aims at detecting spectrum holes, is the precondition for the implementation of cognitive radio (CR). Collaborative spectrum sensing among the cognitive radio nodes is expected to improve the ability of checking complete spectrum usage. Due to hardware limitations, each cognitive radio node can only sense a relatively narrow band of radio spectrum. Consequently, the available channel sensing information is far from being sufficient for precisely recognizing the wide range of unoccupied channels. Aiming at breaking this bottleneck, we propose to apply matrix completion and joint sparsity recovery to reduce sensing and transmitting requirements and improve sensing results. Specifically, equipped with a frequency selective filter, each cognitive radio node senses linear combinations of multiple channel information and reports them to the fusion center, where occupied channels are then decoded from the reports by using novel matrix completion and joint sparsity recovery algorithms. As a result, the number of reports sent from the CRs to the fusion center is significantly reduced. We propose two decoding approaches, one based on matrix completion and the other based on joint sparsity recovery, both of which allow exact recovery from incomplete reports. The numerical results validate the effectiveness and robustness of our approaches. In particular, in small-scale networks, the matrix completion approach achieves exact channel detection with a number of samples no more than 50% of the number of channels in the network, while joint sparsity recovery achieves similar performance in large-scale networks.

preprint2010arXiv

Collaborative Spectrum Sensing from Sparse Observations Using Matrix Completion for Cognitive Radio Networks

In cognitive radio, spectrum sensing is a key component to detect spectrum holes (i.e., channels not used by any primary users). Collaborative spectrum sensing among the cognitive radio nodes is expected to improve the ability of checking complete spectrum usage states. Unfortunately, due to power limitation and channel fading, available channel sensing information is far from being sufficient to tell the unoccupied channels directly. Aiming at breaking this bottleneck, we apply recent matrix completion techniques to greatly reduce the sensing information needed. We formulate the collaborative sensing problem as a matrix completion subproblem and a joint-sparsity reconstruction subproblem. Results of numerical simulations that validate the effectiveness and robustness of the proposed approach are presented. In particular, in noiseless cases, when the number of primary users is small, exact detection was obtained with no more than 8% of the complete sensing information, whilst as the number of primary users increases, to achieve a detection rate of 95.55%, the required information percentage was merely 16.8%.

Preprint, 2010, arXiv

Sparse Signal Reconstruction via Iterative Support Detection

We present a novel sparse signal reconstruction method, "ISD", aiming to achieve fast reconstruction and a reduced requirement on the number of measurements compared to the classical $\ell_1$ minimization approach. ISD addresses failed reconstructions of $\ell_1$ minimization due to insufficient measurements. It estimates a support set $I$ from a current reconstruction and obtains a new reconstruction by solving the minimization problem $\min_x \{\sum_{i \notin I} |x_i| : Ax = b\}$, and it iterates these two steps a small number of times. ISD differs from the orthogonal matching pursuit (OMP) method, as well as its variants, because (i) the index set $I$ in ISD is not necessarily nested or increasing and (ii) the minimization problem above updates all the components of $x$ at the same time. We generalize the Null Space Property to the Truncated Null Space Property and present our analysis of ISD based on the latter. We introduce an efficient implementation of ISD, called threshold-ISD, for recovering signals with fast-decaying distributions of nonzeros from compressive sensing measurements. Numerical experiments show that threshold-ISD has significant advantages over the classical $\ell_1$ minimization approach, as well as two state-of-the-art algorithms: the iterative reweighted $\ell_1$ minimization algorithm (IRL1) and the iterative reweighted least-squares algorithm (IRLS). MATLAB code is available for download from http://www.caam.rice.edu/~optimization/L1/ISD/.
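
The two alternating steps described in this abstract can be written out as a short iteration. This is only a sketch under stated assumptions: the iteration counter $s$, the empty initial support set, and the thresholds $\epsilon^{(s)}$ are illustrative notation that the abstract does not give. The abstract names a threshold-based variant but does not state its rule, so the support-detection step below is one plausible form rather than the authors' exact choice.

```latex
% Assumed notation (not from the abstract): s is the iteration counter,
% \epsilon^{(s)} > 0 is a threshold, and the support set starts empty.
\begin{align*}
I^{(0)} &= \emptyset, \\
x^{(s)} &\in \arg\min_{x} \Big\{ \sum_{i \notin I^{(s)}} |x_i| \;:\; Ax = b \Big\}, \\
I^{(s+1)} &= \big\{\, i : |x^{(s)}_i| > \epsilon^{(s)} \,\big\}, \qquad s = 0, 1, 2, \dots
\end{align*}
```

Under the assumed empty initialization, the first solve is ordinary $\ell_1$ minimization. Because each $I^{(s+1)}$ is recomputed from the latest reconstruction, it need not contain $I^{(s)}$, which matches point (i) of the abstract.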

Preprint, 2009, arXiv

A Matlab Implementation of a Flat Norm Motivated Polygonal Edge Matching Method using a Decomposition of Boundary into Four 1-Dimensional Currents

We describe and provide code and examples for a polygonal edge matching method.

49 published items

Subsampled Ensemble Can Improve Generalization Tail Exponentially

Decomposition Methods for Global Solutions of Mixed-Integer Linear Programs

A One-bit, Comparison-Based Gradient Estimator

A Single-Timescale Method for Stochastic Bilevel Optimization

From the simplex to the sphere: Faster constrained optimization using the Hadamard parametrization

Hybrid Federated Learning: Algorithms and Implementation

A mean field game inverse problem

An Improved Analysis of Stochastic Gradient Descent with Momentum

AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity

CADA: Communication-Adaptive Distributed Adam

Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters

Decentralized Learning with Lazy and Approximate Dual Gradients

How Does an Approximate Model Help in Reinforcement Learning?

LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning

Tight Coefficients of Averaged Operators via Scaled Relative Graph

VAFL: a Method of Vertical Asynchronous Federated Learning

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates

Coordinate Friendly Structures, Algorithms and Applications

Expander Graph and Communication-Efficient Decentralized Optimization

Sparse Recovery via Differential Inclusions

TMAC: A Toolbox of Modern Async-Parallel, Coordinate, Splitting, and Stochastic Methods

A globally convergent algorithm for nonconvex optimization based on block coordinate update

A Three-Operator Splitting Scheme and its Optimization Applications

Block stochastic gradient iteration for convex and nonconvex optimization

Convergence rate analysis of several splitting schemes

Democratic Representations

Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions

On the Convergence of Decentralized Gradient Descent

Optimal Sparse Kernel Learning for Hyperspectral Anomaly Detection

Parallel matrix factorization for low-rank tensor completion

Self Equivalence of the Alternating Direction Method of Multipliers

A fast patch-dictionary method for whole image recovery

EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization

On the Linear Convergence of the ADMM in Decentralized Consensus Optimization

One condition for solution uniqueness and robustness of both l1-synthesis and l1-analysis minimizations

Parallel Multi-Block ADMM with o(1/k) Convergence

Video Compressive Sensing for Dynamic MRI

A dual algorithm for a class of augmented convex models

Augmented L1 and Nuclear-Norm Models with a Globally Linearly Convergent Algorithm

Gradient methods for convex minimization: better rates under weaker conditions

Extracting respiratory signals from thoracic cone beam CT projections

Necessary and sufficient conditions of solution uniqueness in $\ell_1$ minimization

An Alternating Direction Algorithm for Matrix Completion with Nonnegative Factors

Fast Linearized Bregman Iteration for Compressive Sensing and Sparse Denoising

Collaborative Spectrum Sensing from Sparse Observations in Cognitive Radio Networks

Collaborative Spectrum Sensing from Sparse Observations Using Matrix Completion for Cognitive Radio Networks

Sparse Signal Reconstruction via Iterative Support Detection

A Matlab Implementation of a Flat Norm Motivated Polygonal Edge Matching Method using a Decomposition of Boundary into Four 1-Dimensional Currents