Source author record

Dominique Orban

Dominique Orban appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC math.NA Numerical Analysis Data Structures and Algorithms Machine Learning Mathematical Software

Catalog footprint

What is connected

9works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

A Levenberg-Marquardt Method for Nonsmooth Regularized Least Squares

We develop a Levenberg-Marquardt method for minimizing the sum of a smooth nonlinear least-squar es term $f(x) = \tfrac{1}{2} \|F(x)\|_2^2$ and a nonsmooth term $h$. Both $f$ and $h$ may be nonconvex. Steps are computed by minimizing the sum of a regularized linear least-squares model and a model of $h$ using a first-order method such as the proximal gradient method. We establish global convergence to a first-order stationary point of both a trust-region and a regularization variant of the Levenberg-Marquardt method under the assumptions that $F$ and its Jacobian are Lipschitz continuous and $h$ is proper and lower semi-continuous. In the worst case, both methods perform $O(ε^{-2})$ iterations to bring a measure of stationarity below $ε\in (0, 1)$. We report numerical results on three examples: a group-lasso basis-pursuit denoise example, a nonlinear support vector machine, and parameter estimation in neuron firing. For those examples to be implementable, we describe in detail how to evaluate proximal operators for separable $h$ and for the group lasso with trust-region constraint. In all cases, the Levenberg-Marquardt methods perform fewer outer iterations than a proximal-gradient method with adaptive step length and a quasi-Newton trust-region method, neither of which exploit the least-squares structure of the problem. Our results also highlight the need for more sophisticated subproblem solvers than simple first-order methods.

preprint2022arXiv

A semi-conjugate gradient method for solving unsymmetric positive definite linear systems

The conjugate gradient (CG) method is a classic Krylov subspace method for solving symmetric positive definite linear systems. We introduce an analogous semi-conjugate gradient (SCG) method for unsymmetric positive definite linear systems. Unlike CG, SCG requires the solution of a lower triangular linear system to produce each semi-conjugate direction. We prove that SCG is theoretically equivalent to the full orthogonalization method (FOM), which is based on the Arnoldi process and converges in a finite number of steps. Because SCG's triangular system increases in size each iteration, we study a sliding window implementation (SWI) to improve efficiency, and show that the directions produced are still locally semi-conjugate. A counterexample illustrates that SWI is different from the direct incomplete orthogonalization method (DIOM), which is FOM with a sliding window. Numerical experiments from the convection-diffusion equation and other applications show that SCG is robust and that the sliding window implementation SWI allows SCG to solve large systems efficiently.

preprint2022arXiv

A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization

We consider the problem of training a deep neural network with nonsmooth regularization to retrieve a sparse and efficient sub-structure. Our regularizer is only assumed to be lower semi-continuous and prox-bounded. We combine an adaptive quadratic regularization approach with proximal stochastic gradient principles to derive a new solver, called SR2, whose convergence and worst-case complexity are established without knowledge or approximation of the gradient's Lipschitz constant. We formulate a stopping criteria that ensures an appropriate first-order stationarity measure converges to zero under certain conditions. We establish a worst-case iteration complexity of $\mathcal{O}(ε^{-2})$ that matches those of related methods like ProxGEN, where the learning rate is assumed to be related to the Lipschitz constant. Our experiments on network instances trained on CIFAR-10 and CIFAR-100 with $\ell_1$ and $\ell_0$ regularizations show that SR2 consistently achieves higher sparsity and accuracy than related methods such as ProxGEN and ProxSGD.

preprint2022arXiv

Computing a Sparse Projection into a Box

We describe a procedure to compute a projection of $w \in \mathbb{R}^n$ into the intersection of the so-called \emph{zero-norm} ball $k \mathbb{B}_0$ of radius $k$, i.e., the set of $k$-sparse vectors, with a box centered at a point of $k \mathbb{B}_0$. The need for such projection arises in the context of certain trust-region methods for nonsmooth regularized optimization. Although the set into which we wish to project is nonconvex, we show that a solution may be found in $O(n \log(n))$ operations. We describe our Julia implementation and illustrate our procedure in the context of two trust-region methods for nonsmooth regularized optimization.

preprint2021arXiv

A Julia implementation of Algorithm NCL for constrained optimization

Algorithm NCL is designed for general smooth optimization problems where first and second derivatives are available, including problems whose constraints may not be linearly independent at a solution (i.e., do not satisfy the LICQ). It is equivalent to the LANCELOT augmented Lagrangian method, reformulated as a short sequence of nonlinearly constrained subproblems that can be solved efficiently by IPOPT and KNITRO, with warm starts on each subproblem. We give numerical results from a Julia implementation of Algorithm NCL on tax policy models that do not satisfy the LICQ, and on nonlinear least-squares problems and general problems from the CUTEst test set.

preprint2021arXiv

Constraint-Preconditioned Krylov Solvers for Regularized Saddle-Point Systems

We consider the iterative solution of regularized saddle-point systems. When the leading block is symmetric and positive semi-definite on an appropriate subspace, Dollar, Gould, Schilders, and Wathen (2006) describe how to apply the conjugate gradient (CG) method coupled with a constraint preconditioner, a choice that has proved to be effective in optimization applications. We investigate the design of constraint-preconditioned variants of other Krylov methods for regularized systems by focusing on the underlying basis-generation process. We build upon principles laid out by Gould, Orban, and Rees (2014) to provide general guidelines that allow us to specialize any Krylov method to regularized saddle-point systems. In particular, we obtain constraint-preconditioned variants of Lanczos and Arnoldi-based methods, including the Lanczos version of CG, MINRES, SYMMLQ, GMRES(m) and DQGMRES. We also provide MATLAB implementations in hopes that they are useful as a basis for the development of more sophisticated software. Finally, we illustrate the numerical behavior of constraint-preconditioned Krylov solvers using symmetric and nonsymmetric systems arising from constrained optimization.

preprint2020arXiv

Scaled Projected-Directions Methods with Application to Transmission Tomography

Statistical image reconstruction in X-Ray computed tomography yields large-scale regularized linear least-squares problems with nonnegativity bounds, where the memory footprint of the operator is a concern. Discretizing images in cylindrical coordinates results in significant memory savings, and allows parallel operator-vector products without on-the-fly computation of the operator, without necessarily decreasing image quality. However, it deteriorates the conditioning of the operator. We improve the Hessian conditioning by way of a block-circulant scaling operator and we propose a strategy to handle nondiagonal scaling in the context of projected-directions methods for bound-constrained problems. We describe our implementation of the scaling strategy using two algorithms: TRON, a trust-region method with exact second derivatives, and L-BFGS-B, a linesearch method with a limited-memory quasi-Newton Hessian approximation. We compare our approach with one where a change of variable is made in the problem. On two reconstruction problems, our approach converges faster than the change of variable approach, and achieves much tighter accuracy in terms of optimality residual than a first-order method.

preprint2019arXiv

Implementing a smooth exact penalty function for equality-constrained nonlinear optimization

We develop a general equality-constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher (1970). Although it was historically considered to be computationally prohibitive in practice, we demonstrate that the computational kernels required are no more expensive than other widely accepted methods for nonlinear optimization. The main kernel required to evaluate the penalty function and its derivatives is solving a structured linear system. We show how to solve this system efficiently by storing a single factorization each iteration when the matrices are available explicitly. We further show how to adapt the penalty function to the class of factorization-free algorithms by solving the linear system iteratively. The penalty function therefore has promise when the linear system can be solved efficiently, e.g., for PDE-constrained optimization problems where efficient preconditioners exist. We discuss extensions including handling simple constraints explicitly, regularizing the penalty function, and inexact evaluation of the penalty function and its gradients. We demonstrate the merits of the approach and its various features on some nonlinear programs from a standard test set, and some PDE-constrained optimization problems.

preprint2019arXiv

Implementing a smooth exact penalty function for general constrained nonlinear optimization

We build upon Estrin et al. (2019) to develop a general constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher (1970, 1973b). Although Fletcher's approach has historically been considered impractical, we show that the computational kernels required are no more expensive than those in other widely accepted methods for nonlinear optimization. The main kernel for evaluating the penalty function and its derivatives solves structured linear systems. When the matrices are available explicitly, we store a single factorization each iteration. Otherwise, we obtain a factorization-free optimization algorithm by solving each linear system iteratively. The penalty function shows promise in cases where the linear systems can be solved efficiently, e.g., PDE-constrained optimization problems when efficient preconditioners exist. We demonstrate the merits of the approach, and give numerical results on several PDE-constrained and standard test problems.

Dominique Orban

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

A Levenberg-Marquardt Method for Nonsmooth Regularized Least Squares

A semi-conjugate gradient method for solving unsymmetric positive definite linear systems

A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization

Computing a Sparse Projection into a Box

A Julia implementation of Algorithm NCL for constrained optimization

Constraint-Preconditioned Krylov Solvers for Regularized Saddle-Point Systems

Scaled Projected-Directions Methods with Application to Transmission Tomography

Implementing a smooth exact penalty function for equality-constrained nonlinear optimization

Implementing a smooth exact penalty function for general constrained nonlinear optimization