Source author record

Hideaki Iiduka

Hideaki Iiduka appears in the imported research catalog. Authorship, coauthor, and topic links are available even though the profile has not yet been claimed.

Researcher · Unclaimed source record

Topics: math.OC, Machine Learning

Catalog footprint

What is connected

10 works

2 topics

4 close collaborators

Preprint, 2022, arXiv

Critical Batch Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One

Empirical results have shown that deep learning optimizers using small constant learning rates, hyperparameters close to one, and large batch sizes can find model parameters of deep neural networks that minimize the loss functions. We first give theoretical evidence that the momentum method (Momentum) and adaptive moment estimation (Adam) perform well, in the sense that the upper bound of the theoretical performance measure is small, with a small constant learning rate, hyperparameters close to one, and a large batch size. Next, we show that there exists a batch size, called the critical batch size, that minimizes the stochastic first-order oracle (SFO) complexity, which is the stochastic gradient computation cost, and that SFO complexity increases once the batch size exceeds the critical batch size. Finally, we provide numerical results that support our theoretical results: Adam using a small constant learning rate, hyperparameters close to one, and the critical batch size minimizing SFO complexity converges faster than Momentum and stochastic gradient descent (SGD).
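
The critical batch size can be illustrated with a toy cost model. The sketch below assumes, purely hypothetically, that reaching a target accuracy takes K(b) = a·b/(b - c) iterations for batch size b > c, where a and c are made-up positive constants (this is not the bound derived in the paper). The SFO complexity is then N(b) = b·K(b) = a·b²/(b - c), which first falls and then rises, with its minimum at b* = 2c.

```python
import math

def iterations_needed(b: float, a: float, c: float) -> float:
    """Hypothetical iteration count K(b) = a*b/(b - c), defined only for b > c."""
    if b <= c:
        return math.inf  # in this toy model, too-small batches never reach the target
    return a * b / (b - c)

def sfo_complexity(b: float, a: float, c: float) -> float:
    """SFO complexity N(b) = b * K(b): total stochastic gradients computed."""
    return b * iterations_needed(b, a, c)

def critical_batch_size(a: float, c: float, candidates) -> int:
    """Return the candidate batch size with the smallest SFO complexity."""
    return min(candidates, key=lambda b: sfo_complexity(b, a, c))

if __name__ == "__main__":
    a, c = 1000.0, 64.0  # hypothetical constants
    print(critical_batch_size(a, c, range(1, 1025)))  # 128 = 2c in this toy model
```

In this toy model, batch sizes above 128 still reduce the number of iterations, but they increase the total number of stochastic gradient computations.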

Preprint, 2022, arXiv

Global Convergence of Hager-Zhang type Riemannian Conjugate Gradient Method

This paper presents the Hager-Zhang (HZ)-type Riemannian conjugate gradient method that uses the exponential retraction. We also present global convergence analyses of the proposed method under two kinds of assumptions. Moreover, we numerically compare the proposed method with existing methods by solving two kinds of Riemannian optimization problems on the unit sphere. The numerical results show that the proposed method performs much better than the existing FR, DY, PRP, and HS methods. In particular, it substantially outperforms existing methods, including the hybrid ones, on the problem of computing the stability number of a graph.
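
A minimal sketch of an HZ-type Riemannian conjugate gradient loop on the unit sphere follows, assuming a Rayleigh quotient f(x) = xᵀAx as a stand-in test problem. The exponential map is used as the retraction, as in the abstract; the vector transport by tangent-space projection, the Armijo backtracking (in place of the Wolfe-type conditions analyzed in the paper), and the periodic restart are assumptions made for illustration.

```python
import numpy as np

def proj_tangent(x, v):
    """Orthogonal projection of v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def exp_map(x, v):
    """Exponential map on the unit sphere: move from x along the great circle in direction v."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x
    y = np.cos(nv) * x + np.sin(nv) * (v / nv)
    return y / np.linalg.norm(y)  # renormalize to limit round-off drift

def rayleigh(A, x):
    """Objective f(x) = x^T A x restricted to the sphere."""
    return float(x @ A @ x)

def rgrad(A, x):
    """Riemannian gradient: tangent projection of the Euclidean gradient 2Ax."""
    return proj_tangent(x, 2.0 * (A @ x))

def hz_beta(g_new, d_tr, y):
    """Hager-Zhang coefficient from the transported d_k (d_tr) and y_k = g_{k+1} - T(g_k)."""
    dy = float(np.dot(d_tr, y))
    if abs(dy) < 1e-16:
        return 0.0
    return (np.dot(y, g_new) - 2.0 * np.dot(y, y) * np.dot(d_tr, g_new) / dy) / dy

def armijo(A, x, d, g, c1=1e-4, shrink=0.5, max_halvings=60):
    """Backtracking step size satisfying the Armijo condition along exp_x(t d)."""
    f0, slope, t = rayleigh(A, x), float(np.dot(g, d)), 1.0
    for _ in range(max_halvings):
        if rayleigh(A, exp_map(x, t * d)) <= f0 + c1 * t * slope:
            break
        t *= shrink
    return t

def hz_rcg(A, x0, max_iter=1000, tol=1e-8, restart_every=20):
    x = x0 / np.linalg.norm(x0)
    g = rgrad(A, x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if k % restart_every == 0 or np.dot(g, d) >= 0.0:
            d = -g  # restart with steepest descent (safeguard, assumed)
        t = armijo(A, x, d, g)
        x_new = exp_map(x, t * d)
        g_new = rgrad(A, x_new)
        d_tr = proj_tangent(x_new, d)  # vector transport by projection (assumed)
        y = g_new - proj_tangent(x_new, g)
        d = -g_new + hz_beta(g_new, d_tr, y) * d_tr
        x, g = x_new, g_new
    return x
```

On the sphere, the minimum of xᵀAx is the smallest eigenvalue of A, which gives an easy correctness check.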

Preprint, 2022, arXiv

Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness

Convergence and convergence rate analyses of adaptive methods, such as Adaptive Moment Estimation (Adam) and its variants, have been widely studied for nonconvex optimization. These analyses assume that the expected or empirical average loss function is Lipschitz smooth (i.e., its gradient is Lipschitz continuous) and that the learning rates depend on the Lipschitz constant of the gradient. Meanwhile, numerical evaluations of Adam and its variants have shown that using small constant learning rates that do not depend on the Lipschitz constant, together with hyperparameters ($β_1$ and $β_2$) close to one, is advantageous for training deep neural networks. Since computing the Lipschitz constant is NP-hard, the Lipschitz smoothness condition would be unrealistic. This paper provides theoretical analyses of Adam without assuming the Lipschitz smoothness condition, in order to bridge the gap between theory and practice. The main contribution is theoretical evidence that Adam using small learning rates and hyperparameters close to one performs well, whereas previous theoretical results all considered hyperparameters close to zero. Our analysis also leads to the finding that Adam performs well with large batch sizes. Moreover, we show that Adam performs well when it uses diminishing learning rates and hyperparameters close to one.
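
To make "small constant learning rate, hyperparameters close to one" concrete, here is plain Adam with bias correction, using β1 = 0.9 and β2 = 0.999. These are common defaults and are assumed here only as examples of values close to one. It runs on a hypothetical noisy quadratic with mini-batch gradients. The sketch shows the update rule only; it is not the paper's analysis or experimental setup.

```python
import numpy as np

class Adam:
    """Plain Adam with a constant learning rate and bias-corrected moment estimates."""

    def __init__(self, dim, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(dim)  # first-moment (mean) estimate
        self.v = np.zeros(dim)  # second-moment (uncentered variance) estimate
        self.t = 0

    def step(self, x, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        return x - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

def minibatch_grad(x, targets, batch_size, rng):
    """Mini-batch gradient of f(x) = mean_i 0.5 * ||x - targets[i]||^2."""
    idx = rng.integers(0, len(targets), size=batch_size)
    return x - targets[idx].mean(axis=0)

def train(batch_size=64, steps=3000, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    targets = rng.normal(loc=3.0, scale=1.0, size=(1000, 5))  # synthetic data (hypothetical)
    x = np.zeros(5)
    opt = Adam(dim=5, lr=lr, beta1=0.9, beta2=0.999)
    for _ in range(steps):
        x = opt.step(x, minibatch_grad(x, targets, batch_size, rng))
    return x, targets.mean(axis=0)  # final iterate and the exact minimizer
```

Because of bias correction, the first update has magnitude close to `lr` in each coordinate, whatever the gradient's scale.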

Preprint, 2020, arXiv

Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning

This paper proposes a conjugate-gradient-based Adam algorithm that blends Adam with nonlinear conjugate gradient methods and presents its convergence analysis. Numerical experiments on text classification and image classification show that the proposed algorithm can train deep neural network models in fewer epochs than existing adaptive stochastic optimization algorithms.
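
One plausible way to blend Adam with a nonlinear conjugate gradient method is sketched below. The first-moment estimate averages a CG-style direction d_t = -g_t + γ_t d_{t-1} instead of the raw gradient. The Fletcher-Reeves coefficient, the damping factor M/t^a, and the cap `gamma_max` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def fletcher_reeves(g, g_prev):
    """Fletcher-Reeves coefficient ||g||^2 / ||g_prev||^2 (one of several CG choices)."""
    denom = float(np.dot(g_prev, g_prev))
    return float(np.dot(g, g)) / denom if denom > 0.0 else 0.0

class CGAdam:
    """Hypothetical sketch: Adam whose first moment averages a CG-style direction."""

    def __init__(self, dim, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8,
                 M=1.0, a=1.0, gamma_max=0.5):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.M, self.a, self.gamma_max = M, a, gamma_max  # assumed damping constants
        self.m = np.zeros(dim)
        self.v = np.zeros(dim)
        self.d = np.zeros(dim)
        self.g_prev = None
        self.t = 0

    def step(self, x, g):
        self.t += 1
        gamma = 0.0
        if self.g_prev is not None:
            # Damped, capped CG coefficient: it tends to zero, so d_t approaches -g_t.
            gamma = min(self.M / self.t**self.a * fletcher_reeves(g, self.g_prev),
                        self.gamma_max)
        self.d = -g + gamma * self.d  # CG-style search direction
        self.g_prev = g.copy()
        self.m = self.beta1 * self.m + (1 - self.beta1) * self.d
        self.v = self.beta2 * self.v + (1 - self.beta2) * g**2  # second moment of the raw gradient
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        return x + self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # "+" since d is a descent direction

def run_quadratic(steps=2000):
    """Deterministic toy run on f(x) = 0.5 * (x_1^2 + 10 x_2^2)."""
    D = np.array([1.0, 10.0])
    x = np.array([5.0, 5.0])
    opt = CGAdam(dim=2, lr=1e-2)
    for _ in range(steps):
        x = opt.step(x, D * x)  # exact gradient of the quadratic
    return x
```

With M = 0, the coefficient vanishes and the update reduces to plain Adam. The CG term is therefore a controllable perturbation of Adam's first moment.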

Preprint, 2020, arXiv

Hybrid Riemannian Conjugate Gradient Methods with Global Convergence Properties

This paper presents new Riemannian conjugate gradient methods and global convergence analyses under the strong Wolfe conditions. The main idea of the new methods is to combine the good global convergence properties of the Dai-Yuan method with the efficient numerical performance of the Hestenes-Stiefel method. Numerical comparisons on the Rayleigh quotient minimization problem on the unit sphere show that the proposed methods perform better than the existing ones.
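
The hybrid idea can be written as a one-line coefficient rule. The sketch below uses the classical Euclidean hybrid β = max{0, min{β_HS, β_DY}}, applied to vectors that have already been transported into the same tangent space. The abstract does not say whether the paper uses exactly this rule or which vector transport it uses, so treat both as assumptions.

```python
import numpy as np

def hybrid_beta(g_new, g_old_tr, d_old_tr):
    """beta = max(0, min(beta_HS, beta_DY)) with y = g_new - g_old_tr.

    g_old_tr and d_old_tr are the previous gradient and search direction,
    assumed already transported into the tangent space of g_new.
    """
    y = g_new - g_old_tr
    denom = float(np.dot(d_old_tr, y))
    if denom == 0.0:
        return 0.0  # degenerate case: fall back to steepest descent
    beta_hs = float(np.dot(g_new, y)) / denom   # Hestenes-Stiefel
    beta_dy = float(np.dot(g_new, g_new)) / denom  # Dai-Yuan
    return max(0.0, min(beta_hs, beta_dy))
```

The function can replace the coefficient computation inside any Riemannian CG loop, with the search direction updated as d = -g_new + beta * d_old_tr.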

Preprint, 2016, arXiv

Fixed Point Algorithm for Solving Nonmonotone Variational Inequalities in Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF), the approximation of a data matrix as the product of two nonnegative matrices, is a key issue in machine learning and data analysis. One approach to NMF is to formulate it as a nonconvex optimization problem of minimizing the distance between a data matrix and the product of two nonnegative matrices subject to nonnegativity constraints, and then to solve that problem with an iterative algorithm. The commonly used algorithms are the multiplicative update algorithm and the alternating least-squares algorithm. Although both converge quickly, they may not converge to a stationary point of the problem, that is, a solution to a nonmonotone variational inequality for the gradient of the distance function. This paper presents an iterative algorithm for solving the problem based on the Krasnosel'skiĭ-Mann fixed point algorithm. The convergence analysis shows that, under certain assumptions, any accumulation point of the sequence generated by the proposed algorithm belongs to the solution set of the variational inequality. Applying the MATLAB 'mult' and 'als' algorithms and the proposed algorithm to various NMF problems showed that the proposed algorithm converged quickly and was effective.
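
The fixed points of the projected gradient map T(x) = P₊(x - λ∇f(x)) over the nonnegative orthant are exactly the solutions of the variational inequality for ∇f. A Krasnosel'skiĭ-Mann iteration on T is therefore one way to target stationary points. The sketch below assumes a constant averaging weight `alpha` and a step `lam` that is small relative to the local Lipschitz constant. For nonconvex f, T need not be nonexpansive, so this is only an illustration under those assumptions and not the paper's exact algorithm.

```python
import numpy as np

def nmf_loss(V, W, H):
    """f(W, H) = 0.5 * ||V - W H||_F^2."""
    return 0.5 * np.linalg.norm(V - W @ H, "fro") ** 2

def projected_gradient_map(V, W, H, lam):
    """T(W, H) = P_+((W, H) - lam * grad f(W, H)); Fix(T) is the VI solution set."""
    R = W @ H - V  # residual of the current factorization
    grad_W, grad_H = R @ H.T, W.T @ R
    return np.maximum(W - lam * grad_W, 0.0), np.maximum(H - lam * grad_H, 0.0)

def km_nmf(V, rank, lam=1e-2, alpha=0.5, iters=2000, seed=0):
    """Krasnosel'skii-Mann iteration x <- alpha * x + (1 - alpha) * T(x) with constant alpha."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(iters):
        TW, TH = projected_gradient_map(V, W, H, lam)
        W = alpha * W + (1.0 - alpha) * TW
        H = alpha * H + (1.0 - alpha) * TH
    return W, H
```

Since P₊ clips at zero and the average of nonnegative matrices is nonnegative, both factors stay feasible at every iteration.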

Preprint, 2016, arXiv

Proximal Point Algorithms for Nonsmooth Convex Optimization with Fixed Point Constraints

This paper considers the problem of minimizing the sum of nonsmooth convex objective functions defined on a real Hilbert space over the intersection of fixed point sets of nonexpansive mappings, onto which the projections cannot be computed efficiently. Proximal point algorithms that use the proximity operators of the objective functions, combined with incremental optimization techniques, are proposed for solving the problem. Focusing on fixed point approximation techniques, two algorithms are devised. One blends an incremental subgradient method, which is useful for nonsmooth convex optimization, with a Halpern-type fixed point iteration algorithm. The other is based on an incremental subgradient method and the Krasnosel'skiĭ-Mann fixed point algorithm. It is shown that any weak sequential cluster point of the sequence generated by the Halpern-type algorithm belongs to the solution set of the problem, and that the sequence generated by the Krasnosel'skiĭ-Mann-type algorithm has a weak sequential cluster point that also belongs to the solution set. Numerical comparisons of the two proposed algorithms with existing subgradient methods on concrete nonsmooth convex optimization problems show that the proposed algorithms converge faster.
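
A minimal sketch of the Halpern-type scheme in ℝ² follows. It runs one incremental pass of proximal steps over the objectives f_i(x) = ||x - a_i||₁ and then applies a Halpern step x_{n+1} = α_n x₀ + (1 - α_n) T(y_n). The mapping T is taken to be the projection onto the unit ball only because its fixed point set, the ball, is easy to check. In the paper's setting, such projections are exactly what cannot be computed efficiently. The schedules α_n = 1/(n+2) and λ_n = 1/√(n+1) are assumptions.

```python
import numpy as np

def prox_l1_shift(x, a, lam):
    """Proximity operator of lam * ||. - a||_1 (soft-thresholding around a)."""
    z = x - a
    return a + np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def project_unit_ball(x):
    """Metric projection onto the closed unit ball; nonexpansive with Fix = ball."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def halpern_incremental_prox(anchors, T, x0, iters=5000):
    """x_{n+1} = alpha_n * x0 + (1 - alpha_n) * T(y_n), where y_n comes from one
    incremental pass of proximal steps over f_i(x) = ||x - a_i||_1."""
    x = x0.copy()
    for n in range(iters):
        alpha = 1.0 / (n + 2)       # Halpern anchor weight, alpha_n -> 0 (assumed schedule)
        lam = 1.0 / np.sqrt(n + 1)  # proximal step size (assumed schedule)
        y = x
        for a in anchors:
            y = prox_l1_shift(y, a, lam)
        x = alpha * x0 + (1 - alpha) * T(y)
    return x
```

With anchors (2, 0), (3, 0), and (4, 0), the minimizer of the sum over the unit ball is (1, 0), and the iterate approaches it as the anchor weight α_n shrinks.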

Preprint, 2015, arXiv

Almost Sure Convergence of Random Projected Proximal and Subgradient Algorithms for Distributed Nonsmooth Convex Optimization

Two distributed algorithms are described that enable all users connected over a network to cooperatively solve the problem of minimizing the sum of all users' objective functions over the intersection of all users' constraint sets, where each user has its own private nonsmooth convex objective function and closed convex constraint set, which is the intersection of a number of simple, closed convex sets. One algorithm enables each user to adjust its estimate by using a proximity operator of its objective function and the metric projection onto one set randomly selected from the simple, closed convex sets. The other is a distributed random projection algorithm that determines each user's estimate by using a subgradient of its objective function instead of the proximity operator. Investigation of the two algorithms' convergence properties for a diminishing step-size rule revealed that, under certain assumptions, the sequences of all users generated by each of the two algorithms converge almost surely to the same solution. Moreover, a convergence rate analysis of the two algorithms is provided, and choices of step-size sequences that make the two algorithms converge fast are discussed. Numerical comparisons on concrete nonsmooth convex optimization problems support the convergence analysis and demonstrate the effectiveness of the two algorithms.
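
A scalar toy instance of the subgradient-based distributed random projection algorithm is sketched below. Each user averages the network's estimates, takes a subgradient step on its own objective, and projects onto one randomly sampled simple set from its constraint list. The complete communication graph with uniform weights, the objectives |x - aᵢ|, the half-line constraints, and the step-size exponent 0.6 are all hypothetical choices.

```python
import numpy as np

def project_halfline(x, bound, kind):
    """Metric projection of a scalar onto {x <= bound} ("upper") or {x >= bound} ("lower")."""
    return min(x, bound) if kind == "upper" else max(x, bound)

def distributed_random_projection(targets, constraint_sets, iters=20000, seed=0):
    """User i minimizes |x - targets[i]| over its own set, an intersection of
    half-lines listed in constraint_sets[i]; one half-line is sampled per step."""
    rng = np.random.default_rng(seed)
    m = len(targets)
    x = np.zeros(m)  # one scalar estimate per user
    for k in range(iters):
        step = 1.0 / (k + 1) ** 0.6  # diminishing step size (assumed schedule)
        v = x.mean()                 # consensus: complete graph, uniform weights
        for i in range(m):
            g = np.sign(v - targets[i])  # a subgradient of |x - targets[i]| at v
            bound, kind = constraint_sets[i][rng.integers(len(constraint_sets[i]))]
            x[i] = project_halfline(v - step * g, bound, kind)
    return x
```

For targets 0, 5, and 10 and a common feasible interval [0, 3], the constrained minimizer is 3. All users' estimates should end up close to it, though at any finite iteration the agreement is only approximate.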

Preprint, 2015, arXiv

Convergence Analysis of Iterative Methods for Nonsmooth Convex Optimization over Fixed Point Sets of Quasi-Nonexpansive Mappings

This paper considers a networked system with a finite number of users and supposes that each user tries to minimize its own private objective function over its own private constraint set. It is assumed that each user's constraint set can be expressed as a fixed point set of a certain quasi-nonexpansive mapping. This enables us to consider the case in which the projection onto the constraint set cannot be computed efficiently. This paper proposes two methods for solving the problem of minimizing the sum of their nondifferentiable, convex objective functions over the intersection of their fixed point sets of quasi-nonexpansive mappings in a real Hilbert space. One method is a parallel subgradient method that can be implemented under the assumption that each user can communicate with other users. The other is an incremental subgradient method that can be implemented under the assumption that each user can communicate with its neighbors. Investigation of the two methods' convergence properties for a constant step size reveals that, with a small constant step size, they approximate a solution to the problem. Consideration of the case in which the step-size sequence is diminishing demonstrates that the sequence generated by each of the two methods strongly converges to the solution to the problem under certain assumptions. Convergence rate analysis of the two methods under certain situations is provided to illustrate the two methods' efficiency. This paper also discusses nonsmooth convex optimization over sublevel sets of convex functions and provides numerical comparisons that demonstrate the effectiveness of the proposed methods.
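
A standard quasi-nonexpansive mapping whose fixed point set is a sublevel set {x : c(x) ≤ 0} of a convex function c is the subgradient projection. The sketch below uses it inside a parallel subgradient method with a constant step size: each user applies its own mapping to a subgradient step from the shared iterate, and the results are averaged. This averaging form, the toy objectives, and the constraints are assumptions. The abstract states only that, with a small constant step size, the methods approximate a solution.

```python
import numpy as np

def subgradient_projection(c, grad_c):
    """T(x) = x - max(c(x), 0) / ||grad_c(x)||^2 * grad_c(x).
    For convex c with a nonzero gradient wherever c(x) > 0, T is quasi-nonexpansive
    and Fix(T) = {x : c(x) <= 0}."""
    def T(x):
        cx = c(x)
        if cx <= 0.0:
            return x
        g = grad_c(x)
        return x - (cx / np.dot(g, g)) * g
    return T

def parallel_subgradient(mappings, subgrads, x0, step=1e-3, iters=5000):
    """x_{n+1} = mean_i T_i(x_n - step * s_i(x_n)) with a constant step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = np.mean([T(x - step * s(x)) for T, s in zip(mappings, subgrads)], axis=0)
    return x

# Toy problem (hypothetical): user 1 has f_1(x) = ||x - (2, 0)||_1 and the unit disk;
# user 2 has f_2(x) = ||x - (3, 0)||_1 and the half-plane where the first coordinate is >= 0.
a1, a2 = np.array([2.0, 0.0]), np.array([3.0, 0.0])
T1 = subgradient_projection(lambda x: np.dot(x, x) - 1.0, lambda x: 2.0 * x)
T2 = subgradient_projection(lambda x: -x[0], lambda x: np.array([-1.0, 0.0]))
x = parallel_subgradient([T1, T2], [lambda x: np.sign(x - a1), lambda x: np.sign(x - a2)],
                         np.zeros(2))
```

The exact solution of the toy problem is (1, 0). With a constant step of 1e-3, the iterate settles close to it but not exactly on it, consistent with the approximate behavior described for constant step sizes.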

Preprint, 2015, arXiv

Line Search Fixed Point Algorithms Based on Nonlinear Conjugate Gradient Directions: Application to Constrained Smooth Convex Optimization

This paper considers the fixed point problem for a nonexpansive mapping on a real Hilbert space and proposes novel line search fixed point algorithms to accelerate the search. The termination conditions for the line search are based on the well-known Wolfe conditions, which are used to ensure the convergence and stability of unconstrained optimization algorithms. The directions for searching for fixed points are generated using the ideas of the steepest descent direction and conventional nonlinear conjugate gradient directions for unconstrained optimization. We perform convergence and convergence rate analyses of the algorithms for solving the fixed point problem under certain assumptions. The main contribution of this paper is a concrete answer to a question in constrained smooth convex optimization, namely, whether nonlinear conjugate gradient algorithms can be devised to solve constrained smooth convex optimization problems. We show that the proposed fixed point algorithms include ones with nonlinear conjugate gradient directions that can solve such problems. To illustrate the practicality of the algorithms, we apply them to concrete constrained smooth convex optimization problems, such as constrained quadratic programming problems and generalized convex feasibility problems, and numerically compare them with previous algorithms based on the Krasnosel'skiĭ-Mann fixed point algorithm. The results show that the proposed algorithms dramatically reduce the running time and iterations needed to find optimal solutions to these problems compared with the previous algorithms.
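
The sketch below illustrates the idea on a box-constrained quadratic program. T is the projected gradient mapping, a nonexpansive map whose fixed points are the minimizers. The search runs along d_{n+1} = r_{n+1} + β_n d_n, with residual r = T(x) - x and a Fletcher-Reeves-type β_n. A simple backtracking rule, safeguarded by the plain step x ← T(x), stands in for the Wolfe-type conditions used in the paper. All of these choices are assumptions.

```python
import numpy as np

def make_T(Q, b, lo, hi, mu):
    """Projected gradient map T(x) = P_C(x - mu * (Qx - b)) for the box C = [lo, hi]^n.
    For convex f(x) = 0.5 x^T Q x - b^T x and 0 < mu < 2 / ||Q||, T is nonexpansive
    and its fixed points are exactly the minimizers of f over C."""
    return lambda x: np.clip(x - mu * (Q @ x - b), lo, hi)

def line_search_fixed_point(T, x0, iters=200, tol=1e-12):
    x = np.asarray(x0, dtype=float)
    r = T(x) - x    # residual, zero exactly at fixed points
    d = r.copy()    # first direction: the residual itself
    for _ in range(iters):
        rnorm = np.linalg.norm(r)
        if rnorm < tol:
            break
        # Safeguard candidate: the plain fixed point step x <- T(x).
        x_new = T(x)
        r_new = T(x_new) - x_new
        alpha = 1.0
        while alpha >= 1e-3:  # backtrack along the CG-type direction
            x_try = x + alpha * d
            r_try = T(x_try) - x_try
            if np.linalg.norm(r_try) <= np.linalg.norm(r_new):
                x_new, r_new = x_try, r_try
                break
            alpha *= 0.5
        beta = np.dot(r_new, r_new) / rnorm**2  # Fletcher-Reeves-type coefficient (assumed)
        d = r_new + beta * d
        x, r = x_new, r_new
    return x
```

For the test problem below, T is a contraction, so the safeguard alone guarantees linear convergence. The CG-type direction is accepted only when it does at least as well as the plain step in residual.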
