Source author record

Quoc Tran-Dinh

Quoc Tran-Dinh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC, Machine Learning, Information Theory, math.IT, Applications, Cryptography and Security

Catalog footprint

What is connected

24 works

6 topics

4 close collaborators


preprint 2021 arXiv

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way, and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start with small mini-batch sizes which increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). We improve on the state-of-the-art literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a tight and novel non-trivial convergence analysis for strongly convex problems for heterogeneous data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.
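A rough way to see where the $\sqrt{K}$ round count can come from: if round $i$ processes a mini-batch of size $a \cdot i$, then $T$ rounds cover $a\,T(T+1)/2$ gradient computations, so reaching $K$ takes about $\sqrt{2K/a}$ rounds. The sketch below just counts rounds under that schedule. The constant `a`, the function name and the single-node view are assumptions made for illustration and are not taken from the paper.

```python
import math


def rounds_needed(total_gradients: int, a: int = 1) -> int:
    """Count communication rounds until the cumulative number of gradient
    computations reaches total_gradients, when round i uses a mini-batch
    of size a * i (a hypothetical linearly increasing schedule)."""
    done = 0
    rounds = 0
    while done < total_gradients:
        rounds += 1
        done += a * rounds  # mini-batch size grows linearly with the round index
    return rounds


if __name__ == "__main__":
    for K in (10**2, 10**4, 10**6):
        T = rounds_needed(K)
        # The ratio T / sqrt(K) settles near sqrt(2) for a = 1.
        print(K, T, round(T / math.sqrt(K), 3))
```

With a constant mini-batch size the same loop would need a number of rounds proportional to $K$, which is the communication cost the increasing schedule is meant to avoid.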

preprint 2021 arXiv

Sieve-SDP: a simple facial reduction algorithm to preprocess semidefinite programs

We introduce Sieve-SDP, a simple facial reduction algorithm to preprocess semidefinite programs (SDPs). Sieve-SDP inspects the constraints of the problem to detect lack of strict feasibility, deletes redundant rows and columns, and reduces the size of the variable matrix. It often detects infeasibility. It does not rely on any optimization solver: the only subroutine it needs is Cholesky factorization, hence it can be implemented in a few lines of code in machine precision. We present extensive computational results on several problem collections from the literature, with many SDPs coming from polynomial optimization.
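The abstract names Cholesky factorization as the only subroutine needed. As a minimal sketch of that building block only, the function below tests whether a symmetric matrix is positive definite by attempting a Cholesky factorization. The function name, the tolerance `tol` and the list-of-lists representation are assumptions; the actual Sieve-SDP reduction rules are not described in the abstract and are not reproduced here.

```python
import math


def cholesky_is_pd(A, tol=1e-12):
    """Return True if the symmetric matrix A (list of lists) admits a
    Cholesky factorization with every pivot larger than tol."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: diagonal entry minus the squared entries already in row j of L.
        pivot = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            return False  # not (numerically) positive definite
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True


if __name__ == "__main__":
    print(cholesky_is_pd([[2.0, 1.0], [1.0, 2.0]]))  # positive definite
    print(cholesky_is_pd([[1.0, 2.0], [2.0, 1.0]]))  # indefinite
```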

preprint 2020 arXiv

A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization

We introduce a new approach to develop stochastic optimization algorithms for a class of stochastic composite and possibly nonconvex optimization problems. The main idea is to combine two stochastic estimators to create a new hybrid one. We first introduce our hybrid estimator and then investigate its fundamental properties to form a foundational theory for algorithmic development. Next, we apply our theory to develop several variants of stochastic gradient methods to solve both expectation and finite-sum composite optimization problems. Our first algorithm can be viewed as a variant of proximal stochastic gradient methods with a single loop, but can achieve an $\mathcal{O}(\sigma^3\varepsilon^{-1} + \sigma\varepsilon^{-3})$ oracle complexity bound, matching the best-known ones from state-of-the-art double-loop algorithms in the literature, where $\sigma > 0$ is the variance and $\varepsilon$ is a desired accuracy. Then, we consider two different variants of our method: adaptive step-size and restarting schemes that have similar theoretical guarantees as in our first algorithm. We also study two mini-batch variants of the proposed methods. In all cases, we achieve the best-known complexity bounds under standard assumptions. We test our methods on several numerical examples with real datasets and compare them with state-of-the-art methods. Our numerical experiments show that the new methods are comparable to, and in many cases outperform, their competitors.

preprint 2020 arXiv

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has a variance-reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-loop algorithm and then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity $\mathcal{O}\left(\varepsilon^{-3}\right)$ to attain a first-order stationary point for the composite problem, which is better than the existing REINFORCE/GPOMDP $\mathcal{O}\left(\varepsilon^{-4}\right)$ and SVRPG $\mathcal{O}\left(\varepsilon^{-10/3}\right)$ complexities in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems.

preprint 2020 arXiv

A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization

We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set. We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case. Specifically, our Newton Frank-Wolfe method uses $\mathcal{O}(\varepsilon^{-\nu})$ LMO calls, where $\varepsilon$ is the desired accuracy and $\nu := 1 + o(1)$. In addition, we demonstrate how our algorithm can exploit the improved variants of the LMO-based schemes, including away-steps, to attain linear convergence rates. We also provide numerical evidence on portfolio design with the competitive ratio, D-optimal experimental design, and logistic regression with the elastic net, where Newton Frank-Wolfe outperforms the state-of-the-art.
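For readers unfamiliar with linear minimization oracles, here is a minimal sketch of the plain Frank-Wolfe method (not the Newton Frank-Wolfe method of this paper) over the probability simplex, where the LMO simply picks the vertex with the smallest gradient entry. The function names, the $2/(t+2)$ step size and the quadratic test objective are assumptions chosen for illustration.

```python
def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex:
    the minimizer of <grad, s> is the vertex at the smallest gradient entry."""
    i = min(range(len(grad)), key=lambda k: grad[k])
    s = [0.0] * len(grad)
    s[i] = 1.0
    return s


def frank_wolfe(grad_f, x0, iters=200):
    """Plain Frank-Wolfe with the classical step size 2 / (t + 2)."""
    x = list(x0)
    for t in range(iters):
        s = lmo_simplex(grad_f(x))  # one LMO call per iteration
        gamma = 2.0 / (t + 2.0)
        # Convex combination keeps the iterate inside the simplex.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


if __name__ == "__main__":
    target = [0.2, 0.5, 0.3]  # hypothetical point inside the simplex

    def grad(x):
        # Gradient of f(x) = ||x - target||^2.
        return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

    print(frank_wolfe(grad, [1.0, 0.0, 0.0]))
```

Each iteration needs only a gradient and one LMO call, with no projection, which is why LMO counts are the natural complexity measure in the abstract above.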

preprint 2020 arXiv

An Optimal Hybrid Variance-Reduced Algorithm for Stochastic Composite Nonconvex Optimization

In this note we propose a new variant of the hybrid variance-reduced proximal gradient method in [7] to solve a common stochastic composite nonconvex optimization problem under standard assumptions. We simply replace the independent unbiased estimator in our hybrid-SARAH estimator introduced in [7] by the stochastic gradient evaluated at the same sample, leading to the identical momentum-SARAH estimator introduced in [2]. This allows us to save one stochastic gradient per iteration compared to [7], and only requires two samples per iteration. Our algorithm is very simple and achieves optimal stochastic oracle complexity bound in terms of stochastic gradient evaluations (up to a constant factor). Our analysis is essentially inspired by [7], but we do not use two different step-sizes.
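The substitution described above can be written out as follows. This is a sketch in notation assumed here, since none of these symbols appear in the abstract: $v_t$ is the gradient estimator, $\beta \in [0,1]$ the mixing weight, $\xi_t$ the sample used in the SARAH-type difference, and $\zeta_t$ an independent sample. It shows a hybrid estimator of the kind the note refers to, and what it becomes when $\zeta_t$ is replaced by $\xi_t$.

```latex
% Hybrid estimator with an independent unbiased term (three gradient evaluations):
v_t = \beta \bigl[ v_{t-1} + \nabla f(x_t;\xi_t) - \nabla f(x_{t-1};\xi_t) \bigr]
      + (1-\beta)\, \nabla f(x_t;\zeta_t)

% Replacing \zeta_t by \xi_t and collecting terms (two gradient evaluations):
v_t = \nabla f(x_t;\xi_t) + \beta \bigl[ v_{t-1} - \nabla f(x_{t-1};\xi_t) \bigr]
```

In this form the second estimator reuses $\nabla f(x_t;\xi_t)$, which is consistent with the stated saving of one stochastic gradient per iteration.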

preprint 2020 arXiv

Asynchronous Federated Learning with Reduced Number of Rounds and with Differential Privacy from Less Aggregated Gaussian Noise

The feasibility of federated learning is highly constrained by the server-clients infrastructure in terms of network communication. Most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. However, in the case of the original synchronous federated learning, client devices suffer waiting times and regular communication between clients and server is required. This implies more sensitivity to local model training times and to irregular or missed updates; hence, scalability to large numbers of clients is limited and convergence rates measured in real time suffer. We propose a new algorithm for asynchronous federated learning which eliminates waiting times and reduces overall network communication; we provide rigorous theoretical analysis for strongly convex objective functions and provide simulation results. By adding Gaussian noise, we show how our algorithm can be made differentially private; new theorems show how the aggregated added Gaussian noise is significantly reduced.

preprint 2020 arXiv

Composite Convex Optimization with Global and Local Inexact Oracles

We introduce new global and local inexact oracle concepts for a wide class of convex functions in composite convex minimization. Such inexact oracles naturally come from primal-dual frameworks, barrier smoothing, inexact computations of gradients and Hessians, and many other situations. We also provide examples showing that the class of convex functions equipped with the new inexact second-order oracles is larger than the standard self-concordant as well as Lipschitz gradient function classes. Further, we investigate several properties of convex and/or self-concordant functions under the inexact second-order oracles which are useful for algorithm development. Next, we apply our theory to develop inexact proximal Newton-type schemes for minimizing general composite convex minimization problems equipped with such inexact oracles. Our theoretical results consist of new optimization algorithms, accompanied by global convergence guarantees, to solve a wide class of composite convex optimization problems. When the first objective term is additionally self-concordant, we establish different local convergence results for our method. In particular, we prove that depending on the choice of accuracy levels of the inexact second-order oracles, we obtain different local convergence rates ranging from $R$-linear and $R$-superlinear to $R$-quadratic. In special cases, where convergence bounds are known, our theory recovers the best known rates. We also apply our settings to derive a new primal-dual method for composite convex minimization problems. Finally, we present some representative numerical examples to illustrate the benefit of our new algorithms.

preprint 2020 arXiv

Construction and Iteration-Complexity of Primal Sequences in Alternating Minimization Algorithms

We introduce a new weighted averaging scheme using "Fenchel-type" operators to recover primal solutions in the alternating minimization-type algorithm (AMA) for prototype constrained convex optimization. Our approach combines the classical AMA idea in \cite{Tseng1991} and Nesterov's prox-function smoothing technique without requiring the strong convexity of the objective function. We develop a new non-accelerated primal-dual AMA method and estimate its primal convergence rate both on the objective residual and on the feasibility gap. Then, we incorporate Nesterov's accelerated step into this algorithm and obtain a new accelerated primal-dual AMA variant endowed with a rigorous convergence rate guarantee. We show that the worst-case iteration-complexity of this algorithm is optimal (in the sense of first-order black-box models), without imposing the full strong convexity assumption on the objective.

preprint 2020 arXiv

Non-Stationary First-Order Primal-Dual Algorithms with Faster Convergence Rates

In this paper, we propose two novel non-stationary first-order primal-dual algorithms to solve nonsmooth composite convex optimization problems. Unlike existing primal-dual schemes where the parameters are often fixed, our methods use pre-defined and dynamic sequences for parameters. We prove that our first algorithm can achieve $\mathcal{O}(1/k)$ convergence rate on the primal-dual gap, and primal and dual objective residuals, where $k$ is the iteration counter. Our rate is on the non-ergodic (i.e., the last iterate) sequence of the primal problem and on the ergodic (i.e., the averaging) sequence of the dual problem, which we call semi-ergodic rate. By modifying the step-size update rule, this rate can be boosted even faster on the primal objective residual. When the problem is strongly convex, we develop a second primal-dual algorithm that exhibits $\mathcal{O}(1/k^2)$ convergence rate on the same three types of guarantees. Again by modifying the step-size update rule, this rate becomes faster on the primal objective residual. Our primal-dual algorithms are the first ones to achieve such fast convergence rate guarantees under mild assumptions compared to existing works, to the best of our knowledge. As byproducts, we apply our algorithms to solve constrained convex optimization problems and prove the same convergence rates on both the objective residuals and the feasibility violation. We still obtain at least $\mathcal{O}(1/k^2)$ rates even when the problem is "semi-strongly" convex. We verify our theoretical results via two well-known numerical examples.

preprint 2020 arXiv

Stability Analysis of Real-Time Methods for Equality Constrained NMPC

In this paper, a proof of asymptotic stability for the combined system-optimizer dynamics associated with a class of real-time methods for equality constrained nonlinear model predictive control is presented. General Q-linearly convergent online optimization methods are considered and asymptotic stability results are derived for the case where a single iteration of the optimizer is carried out per sampling time. In particular, it is shown that, if the underlying sampling time is sufficiently short, asymptotic stability can be guaranteed. The results constitute an extension to existing attractivity results for the well-known real-time iteration strategy.

preprint 2020 arXiv

Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization

We develop two new stochastic Gauss-Newton algorithms for solving a class of non-convex stochastic compositional optimization problems frequently arising in practice. We consider both the expectation and finite-sum settings under standard assumptions, and use both classical stochastic and SARAH estimators for approximating function values and Jacobians. In the expectation case, we establish $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity to achieve a stationary point in expectation and estimate the total number of stochastic oracle calls for both the function value and its Jacobian, where $\varepsilon$ is a desired accuracy. In the finite-sum case, we also estimate $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity and the total oracle calls with high probability. To the best of our knowledge, this is the first time such global stochastic oracle complexity is established for stochastic Gauss-Newton methods. Finally, we illustrate our theoretical results via two numerical examples on both synthetic and real datasets.

preprint 2016 arXiv

A simple preprocessing algorithm for semidefinite programming

We propose a very simple preprocessing algorithm for semidefinite programming. Our algorithm inspects the constraints of the problem, deletes redundant rows and columns in the constraints, and reduces the size of the variable matrix. It often detects infeasibility. Our algorithm does not rely on any optimization solver: the only subroutine it needs is Cholesky factorization, hence it can be implemented with a few lines of code in machine precision. We present computational results on a set of problems arising mostly from polynomial optimization.

preprint 2016 arXiv

A single-phase, proximal path-following framework

We propose a new proximal, path-following framework for a class of constrained convex problems. We consider settings where the nonlinear (and possibly non-smooth) objective part is endowed with a proximity operator, and the constraint set is equipped with a self-concordant barrier. Our approach relies on the following two main ideas. First, we re-parameterize the optimality condition as an auxiliary problem, such that a good initial point is available; by doing so, a family of alternative paths towards the optimum is generated. Second, we combine the proximal operator with path-following ideas to design a single-phase, proximal, path-following algorithm. Our method has several advantages. First, it allows handling non-smooth objectives via proximal operators; this avoids lifting the problem dimension in order to accommodate non-smooth components in optimization. Second, it consists of only a single phase: while the overall convergence rate of classical path-following schemes for self-concordant objectives does not suffer from the initialization phase, proximal path-following schemes undergo slow convergence in order to obtain a good starting point \cite{TranDinh2013e}. In this work, we show how to overcome this limitation in the proximal setting and prove that our scheme has the same $\mathcal{O}(\sqrt{\nu}\log(1/\varepsilon))$ worst-case iteration-complexity as standard approaches \cite{Nesterov2004,Nesterov1994} without requiring an initial phase, where $\nu$ is the barrier parameter and $\varepsilon$ is a desired accuracy. Finally, our framework allows errors in the calculation of proximal-Newton directions, without sacrificing the worst-case iteration complexity. We demonstrate the merits of our algorithm via three numerical examples, where proximal operators play a key role.

preprint 2016 arXiv

Adaptive Smoothing Algorithms for Nonsmooth Composite Convex Minimization

We propose an adaptive smoothing algorithm based on Nesterov's smoothing technique in \cite{Nesterov2005c} for solving "fully" nonsmooth composite convex optimization problems. Our method combines both Nesterov's accelerated proximal gradient scheme and a new homotopy strategy for the smoothness parameter. By an appropriate choice of smoothing functions, we develop a new algorithm that has the $\mathcal{O}\left(\frac{1}{\varepsilon}\right)$ worst-case iteration-complexity while preserving the same complexity-per-iteration as in Nesterov's method and allowing one to automatically update the smoothness parameter at each iteration. Then, we customize our algorithm to solve four special cases that cover various applications. We also specify our algorithm to solve constrained convex optimization problems and show its convergence guarantee on a primal sequence of iterates. We demonstrate our algorithm through three numerical examples and compare it with other related algorithms.

preprint 2016 arXiv

Convex block-sparse linear regression with expanders -- provably

Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used in place of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of space and run-time. Prompted by this observation, we study a convex optimization scheme for block-sparse recovery from linear measurements. To obtain linear sketches, we use expander matrices, i.e., sparse matrices containing only a few non-zeros per column. Hitherto, to the best of our knowledge, such algorithmic solutions have only been studied from a non-convex perspective. Our aim here is to theoretically characterize the performance of convex approaches under such a setting. Our key novelty is the expression of the recovery error in terms of the model-based norm, while ensuring that the solution lives in the model. To achieve this, we show that sparse model-based matrices satisfy a group version of the null-space property. Our experimental findings on synthetic and real applications support our claims for faster recovery in the convex setting, as opposed to using dense sensing matrices, while showing a competitive recovery performance.

preprint 2016 arXiv

Frank-Wolfe Works for Non-Lipschitz Continuous Gradient Objectives: Scalable Poisson Phase Retrieval

We study a phase retrieval problem in the Poisson noise model. Motivated by the PhaseLift approach, we approximate the maximum-likelihood estimator by solving a convex program with a nuclear norm constraint. While the Frank-Wolfe algorithm, together with the Lanczos method, can efficiently deal with nuclear norm constraints, our objective function does not have a Lipschitz continuous gradient, and hence existing convergence guarantees for the Frank-Wolfe algorithm do not apply. In this paper, we show that the Frank-Wolfe algorithm works for the Poisson phase retrieval problem, and has a global convergence rate of $O(1/t)$, where $t$ is the iteration counter. We provide rigorous theoretical guarantees and illustrative numerical results.

preprint 2015 arXiv

A new splitting method for solving composite monotone inclusions involving parallel-sum operators

We propose a new primal-dual splitting method for solving composite inclusions involving Lipschitzian and parallel-sum-type monotone operators. Our approach extends the framework in \cite{Siopt4} to a more general class of monotone inclusions in a nontrivial fashion. The main idea is to represent the solution set of both the primal and dual problems using their associated Kuhn-Tucker set, and then develop a projected method to successively approximate a feasible point of the Kuhn-Tucker set. We propose a splitting algorithm based on the resolvent of each maximally monotone operator to construct a primal-dual sequence that weakly converges to a solution of the original problem. The key feature of our method is that it only employs the resolvent of each monotone operator separately, which is different from existing methods in the literature. As a byproduct, our algorithm can be specialized to solve composite convex minimization problems using the proximal operator of each objective component independently, and is equipped with a weak convergence guarantee.

preprint 2015 arXiv

A Primal-Dual Algorithmic Framework for Constrained Convex Minimization

We present a primal-dual algorithmic framework to obtain approximate solutions to a prototypical constrained convex optimization problem, and rigorously characterize how common structural assumptions affect the numerical efficiency. Our main analysis technique provides a fresh perspective on Nesterov's excessive gap technique in a structured fashion and unifies it with smoothing and primal-dual methods. For instance, through the choices of a dual smoothing strategy and a center point, our framework subsumes decomposition algorithms, augmented Lagrangian as well as the alternating direction method-of-multipliers methods as its special cases, and provides optimal convergence rates on the primal objective residual as well as the primal feasibility gap of the iterates for all.

preprint 2015 arXiv

A Universal Primal-Dual Convex Optimization Framework

We propose a new primal-dual algorithmic framework for a prototypical constrained convex optimization template. The algorithmic instances of our framework are universal since they can automatically adapt to the unknown Hölder continuity degree and constant within the dual formulation. They are also guaranteed to have optimal convergence rates in the objective residual and the feasibility gap for each Hölder smoothness degree. In contrast to existing primal-dual algorithms, our framework avoids the proximity operator of the objective function. We instead leverage computationally cheaper, Fenchel-type operators, which are the main workhorses of the generalized conditional gradient (GCG)-type methods. In contrast to the GCG-type methods, our framework does not require the objective function to be differentiable, and can also process additional general linear inclusion constraints, while guaranteeing the convergence rate on the primal problem.

preprint 2015 arXiv

Adaptive inexact fast augmented Lagrangian methods for constrained convex optimization

In this paper we analyze several inexact fast augmented Lagrangian methods for solving linearly constrained convex optimization problems. Mainly, our methods rely on the combination of the excessive-gap-like smoothing technique developed in [15] and the newly introduced inexact oracle framework from [4]. We analyze several algorithmic instances with constant and adaptive smoothing parameters and derive total computational complexity results in terms of projections onto a simple primal set. For the basic inexact fast augmented Lagrangian algorithm we obtain an overall computational complexity of order $\mathcal{O}\left(\frac{1}{\varepsilon^{5/4}}\right)$, while for the adaptive variant we get $\mathcal{O}\left(\frac{1}{\varepsilon}\right)$ projections onto a primal set in order to obtain an $\varepsilon$-optimal solution for our original problem.

preprint 2015 arXiv

Structured Sparsity: Discrete and Convex approaches

Compressive sensing (CS) exploits sparsity to recover sparse or compressible signals from dimensionality reducing, non-adaptive sensing mechanisms. Sparsity is also used to enhance interpretability in machine learning and statistics applications: while the ambient dimension is vast in modern data analysis problems, the relevant information therein typically resides in a much lower dimensional space. However, many solutions proposed nowadays do not leverage the true underlying structure. Recent results in CS extend the simple sparsity idea to more sophisticated structured sparsity models, which describe the interdependency between the nonzero components of a signal, increasing the interpretability of the results and leading to better recovery performance. In order to better understand the impact of structured sparsity, in this chapter we analyze the connections between the discrete models and their convex relaxations, highlighting their relative advantages. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and the hierarchical models. For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations. We also consider more general structures as defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications.

preprint 2014 arXiv

Composite Self-Concordant Minimization

We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic step-size selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting applications and demonstrate them numerically on both synthetic and real data.
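As a concrete instance of an "easily computable proximal operator", the sketch below implements the proximal operator of $\lambda \|x\|_1$, which is coordinate-wise soft-thresholding. The choice of the $\ell_1$ norm, the function name and the parameter `lam` are assumptions for illustration; the abstract does not say which non-smooth terms the paper uses.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_1: shrink each entry toward zero
    by lam and set entries with magnitude at most lam to zero."""
    out = []
    for v in x:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

The operator costs one pass over the vector, which is the sense in which such terms can be handled cheaply inside a proximal scheme.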

preprint 2014 arXiv

Scalable sparse covariance estimation via self-concordance

We consider the class of convex minimization problems composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$, and a regularizing (possibly non-smooth) function $g(\cdot)$. This type of problem has recently attracted a great deal of interest, mainly due to its omnipresence in top-notch applications. Under this locally Lipschitz continuous gradient setting, we analyze the convergence behavior of proximal Newton schemes with the added twist of a probable presence of inexact evaluations. We prove attractive convergence rate guarantees and enhance state-of-the-art optimization schemes to accommodate such developments. Experimental results on sparse covariance estimation show the merits of our algorithm, both in terms of recovery efficiency and complexity.
