Source author record

Baoyu Zhou

Baoyu Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning

Catalog footprint

What is connected

4works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method

The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients. Because this sum on the denominator is increasing, the method can only decrease step sizes over time, and requires a learning rate scaling hyper-parameter to be carefully tuned. To overcome this restriction, we introduce GradaGrad, a method in the same family that naturally grows or shrinks the learning rate based on a different accumulation in the denominator, one that can both increase and decrease. We show that it obeys a similar convergence rate as AdaGrad and demonstrate its non-monotone adaptation capability with experiments.

preprint2022arXiv

Manifold Sampling for Optimizing Nonsmooth Nonconvex Compositions

We propose a manifold sampling algorithm for minimizing a nonsmooth composition $f= h\circ F$, where we assume $h$ is nonsmooth and may be inexpensively computed in closed form and $F$ is smooth but its Jacobian may not be available. We additionally assume that the composition $h\circ F$ defines a continuous selection. Manifold sampling algorithms can be classified as model-based derivative-free methods, in that models of $F$ are combined with particularly sampled information about $h$ to yield local models for use within a trust-region framework. We demonstrate that cluster points of the sequence of iterates generated by the manifold sampling algorithm are Clarke stationary. We consider the tractability of three particular subproblems generated by the manifold sampling algorithm and the extent to which inexact solutions to these subproblems may be tolerated. Numerical results demonstrate that manifold sampling as a derivative-free algorithm is competitive with state-of-the-art algorithms for nonsmooth optimization that utilize first-order information about $f$.

preprint2020arXiv

Limited-Memory BFGS with Displacement Aggregation

A displacement aggregation strategy is proposed for the curvature pairs stored in a limited-memory BFGS (a.k.a. L-BFGS) method such that the resulting (inverse) Hessian approximations are equal to those that would be derived from a full-memory BFGS method. This means that, if a sufficiently large number of pairs are stored, then an optimization algorithm employing the limited-memory method can achieve the same theoretical convergence properties as when full-memory (inverse) Hessian approximations are stored and employed, such as a local superlinear rate of convergence under assumptions that are common for attaining such guarantees. To the best of our knowledge, this is the first work in which a local superlinear convergence rate guarantee is offered by a quasi-Newton scheme that does not either store all curvature pairs throughout the entire run of the optimization algorithm or store an explicit (inverse) Hessian approximation. Numerical results are presented to show that displacement aggregation within an adaptive L-BFGS scheme can lead to better performance than standard L-BFGS.

preprint2020arXiv

Sequential Quadratic Optimization for Nonlinear Equality Constrained Stochastic Optimization

Sequential quadratic optimization algorithms are proposed for solving smooth nonlinear optimization problems with equality constraints. The main focus is an algorithm proposed for the case when the constraint functions are deterministic, and constraint function and derivative values can be computed explicitly, but the objective function is stochastic. It is assumed in this setting that it is intractable to compute objective function and derivative values explicitly, although one can compute stochastic function and gradient estimates. As a starting point for this stochastic setting, an algorithm is proposed for the deterministic setting that is modeled after a state-of-the-art line-search SQP algorithm, but uses a stepsize selection scheme based on Lipschitz constants (or adaptively estimated Lipschitz constants) in place of the line search. This sets the stage for the proposed algorithm for the stochastic setting, for which it is assumed that line searches would be intractable. Under reasonable assumptions, convergence (resp.,~convergence in expectation) from remote starting points is proved for the proposed deterministic (resp.,~stochastic) algorithm. The results of numerical experiments demonstrate the practical performance of our proposed techniques.

Baoyu Zhou

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method

Manifold Sampling for Optimizing Nonsmooth Nonconvex Compositions

Limited-Memory BFGS with Displacement Aggregation

Sequential Quadratic Optimization for Nonlinear Equality Constrained Stochastic Optimization