Source author record

Haishan Ye

Haishan Ye appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: math.OC, Numerical Analysis, Machine Learning, Computer Vision

Catalog footprint: 11 works, 4 topics, 4 close collaborators

preprint, 2022, arXiv

Decentralized Stochastic Variance Reduced Extragradient Method

This paper studies decentralized convex-concave minimax optimization problems of the form $\min_x\max_y f(x,y) \triangleq\frac{1}{m}\sum_{i=1}^m f_i(x,y)$, where $m$ is the number of agents and each local function can be written as $f_i(x,y)=\frac{1}{n}\sum_{j=1}^n f_{i,j}(x,y)$. We propose a novel decentralized optimization algorithm, called multi-consensus stochastic variance reduced extragradient, which achieves the best known stochastic first-order oracle (SFO) complexity for this problem. Specifically, each agent requires $\mathcal O((n+\kappa\sqrt{n})\log(1/\varepsilon))$ SFO calls for strongly-convex-strongly-concave problems and $\mathcal O((n+\sqrt{n}L/\varepsilon)\log(1/\varepsilon))$ SFO calls for general convex-concave problems to achieve an $\varepsilon$-accurate solution in expectation, where $\kappa$ is the condition number and $L$ is the smoothness parameter. The numerical experiments show that the proposed method performs better than the baselines.
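As a rough illustration of the extragradient idea that the method above builds on, the sketch below compares plain gradient descent-ascent with a deterministic, single-agent extragradient step. It is not the paper's decentralized variance-reduced algorithm. The toy objective $f(x,y)=xy$, the step size, and all function names are assumptions chosen for illustration.

```python
import math

def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on the toy objective f(x, y) = x * y."""
    # grad_x f = y and grad_y f = x for this bilinear objective.
    return x - eta * y, y + eta * x

def extragradient_step(x, y, eta):
    """One extragradient step on f(x, y) = x * y."""
    # Extrapolation: take a trial step using gradients at the current point.
    x_half, y_half = x - eta * y, y + eta * x
    # Update: step from the current point using gradients at the trial point.
    return x - eta * y_half, y + eta * x_half

def run(step, iters=200, eta=0.5):
    """Iterate from (1, 1) and return the distance to the saddle point (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, eta)
    return math.hypot(x, y)

gda_dist = run(gda_step)            # grows: each step scales the norm by sqrt(1 + eta^2)
eg_dist = run(extragradient_step)   # shrinks: each step scales the norm by sqrt(1 - eta^2 + eta^4)
```

On this bilinear problem the plain descent-ascent iterates spiral outward, while the extrapolation step makes the iteration contract toward the saddle point for any step size below 1.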

preprint, 2022, arXiv

Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums

Learning rate schedulers have been widely adopted in training deep neural networks. Despite their practical importance, there is a discrepancy between their practice and their theoretical analysis. For instance, it is not known which schedules of SGD achieve the best convergence, even for simple problems such as optimizing quadratic objectives. In this paper, we propose Eigencurve, the first family of learning rate schedules that can achieve minimax optimal convergence rates (up to a constant) for SGD on quadratic objectives when the eigenvalue distribution of the underlying Hessian matrix is skewed. This condition is quite common in practice. Experimental results show that Eigencurve can significantly outperform step decay in image classification tasks on CIFAR-10, especially when the number of epochs is small. Moreover, the theory inspires two simple learning rate schedulers for practical applications that can approximate Eigencurve. For some problems, the optimal shape of the proposed schedulers resembles that of cosine decay, which sheds light on the success of cosine decay in such situations. For other situations, the proposed schedulers are superior to cosine decay.
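For reference, the two baseline schedules named in the abstract can be written down in a few lines. This is a minimal sketch of standard step decay and cosine decay, not of Eigencurve itself. The function names, default decay factor, and epoch counts are assumptions for illustration.

```python
import math

def step_decay(base_lr, epoch, step_size=30, gamma=0.1):
    """Multiply the learning rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def cosine_decay(base_lr, epoch, total_epochs, min_lr=0.0):
    """Anneal from base_lr to min_lr along half a cosine period."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Compare the two schedules at a few checkpoints of a hypothetical 90-epoch run.
schedule = [(e, step_decay(0.1, e), cosine_decay(0.1, e, 90)) for e in range(0, 91, 30)]
```

Step decay holds the rate constant between drops, whereas cosine decay lowers it smoothly at every epoch, which is the shape the abstract says the proposed schedulers resemble on some problems.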

preprint, 2022, arXiv

Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods

Optimization is important in machine learning problems, and quasi-Newton methods have a reputation as the most efficient numerical schemes for smooth unconstrained optimization. In this paper, we consider the explicit superlinear convergence rates of quasi-Newton methods and address two open problems mentioned by Rodomanov and Nesterov. First, we extend Rodomanov and Nesterov's results to random quasi-Newton methods, which include the common DFP, BFGS, and SR1 methods. Such random methods adopt a random direction for updating the approximate Hessian matrix in each iteration. Second, we focus on two specific quasi-Newton methods: SR1 and BFGS. We provide improved versions of greedy and random methods with provably better explicit (local) superlinear convergence rates. Our analysis is closely related to the approximation of a given Hessian matrix, unconstrained quadratic objectives, as well as general strongly convex, smooth, and strongly self-concordant functions.

preprint, 2022, arXiv

Explicit Superlinear Convergence Rates of Broyden's Methods in Nonlinear Equations

In this paper, we study the explicit superlinear convergence rates of quasi-Newton methods. We particularly focus on the classical Broyden's method for solving nonlinear equations. We establish its explicit (local) superlinear convergence rate when the initial point is close enough to a solution and the initial Jacobian approximation is also close enough to the exact Jacobian related to the solution. Our results present the explicit superlinear convergence rates of Broyden's "good" and "bad" update schemes. These explicit convergence rates in turn provide some important insights on the performance difference between the "good" and "bad" schemes, which are also validated empirically.
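The two update schemes compared above differ in which matrix receives the rank-one correction. The sketch below writes out the textbook forms: the "good" update corrects the Jacobian approximation, and the "bad" update corrects the inverse-Jacobian approximation. Helper names and the sample vectors are hypothetical, and no claim is made about the paper's experimental setup.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_one_add(M, u, v, scale):
    """Return M + scale * u v^T."""
    return [[M[i][j] + scale * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]

def broyden_good_update(B, dx, df):
    """Update the Jacobian approximation B so that B_new @ dx == df."""
    residual = [f - b for f, b in zip(df, matvec(B, dx))]
    return rank_one_add(B, residual, dx, 1.0 / dot(dx, dx))

def broyden_bad_update(H, dx, df):
    """Update the inverse-Jacobian approximation H so that H_new @ df == dx."""
    residual = [d - h for d, h in zip(dx, matvec(H, df))]
    return rank_one_add(H, residual, df, 1.0 / dot(df, df))

# Hypothetical step dx and change in function value df between two iterates.
B = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0]]
dx = [0.5, -0.25]
df = [1.0, 0.5]
B_new = broyden_good_update(B, dx, df)
H_new = broyden_bad_update(H, dx, df)
```

Both updates satisfy a secant equation after one step, but they are not inverses of each other in general, which is one reason the two schemes can behave differently in practice.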

preprint, 2021, arXiv

DeEPCA: Decentralized Exact PCA with Linear Convergence Rate

Due to the rapid growth of smart agents such as weakly connected computational nodes and sensors, developing decentralized algorithms that can perform computations on local agents has become a major research direction. This paper considers the problem of decentralized principal component analysis (PCA), a statistical method widely used for data analysis. We introduce a technique called subspace tracking to reduce the communication cost, and apply it to power iterations. This leads to a decentralized PCA algorithm called DeEPCA, which has a convergence rate similar to that of centralized PCA, while achieving the best communication complexity among existing decentralized PCA algorithms. DeEPCA is the first decentralized PCA algorithm in which the number of communication rounds for each power iteration is independent of the target precision. Compared to existing algorithms, the proposed method is easier to tune in practice, with an improved overall communication cost. Our experiments validate the advantages of DeEPCA empirically.
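The power iteration that DeEPCA builds on can be sketched in its centralized, single-machine form as follows. This is only the classical baseline, with no subspace tracking or communication between agents. The function name, the starting vector, and the 2x2 example matrix are assumptions for illustration.

```python
import math

def power_iteration(M, iters=50):
    """Estimate the top eigenpair of a symmetric matrix by repeated multiplication."""
    # Assumed start: the first basis vector, which must not be orthogonal
    # to the top eigenvector for the iteration to converge to it.
    v = [1.0] + [0.0] * (len(M) - 1)
    for _ in range(iters):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    Mv = [sum(m * x for m, x in zip(row, v)) for row in M]
    eigenvalue = sum(a * b for a, b in zip(v, Mv))
    return eigenvalue, v

# Hypothetical covariance matrix with eigenvalues 3 and 1.
cov = [[2.0, 1.0], [1.0, 2.0]]
top_eigenvalue, top_vector = power_iteration(cov)
```

The error shrinks by the ratio of the second to the first eigenvalue at each iteration, which is the linear convergence rate a decentralized variant would aim to match.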

preprint, 2020, arXiv

Approximate Newton Methods

Many machine learning models involve solving optimization problems. Thus, it is important to be able to solve large-scale optimization problems in big data applications. Recently, subsampled Newton methods have attracted much attention due to their efficiency at each iteration, rectifying a weakness of the ordinary Newton method, which suffers a high cost in each iteration while commanding a high convergence rate. Other efficient stochastic second-order methods have also been proposed. However, the convergence properties of these methods are still not well understood. There are also several important gaps between the current convergence theory and the performance in real applications. In this paper, we aim to fill these gaps. We propose a unifying framework to analyze both local and global convergence properties of second-order methods. Based on this framework, we present theoretical results that match the performance in real applications well.

preprint, 2020, arXiv

MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation

Many recently proposed methods for Neural Architecture Search (NAS) can be formulated as bilevel optimization. For efficient implementation, its solution requires approximations of second-order methods. In this paper, we demonstrate that gradient errors caused by such approximations lead to suboptimality, in the sense that the optimization procedure fails to converge to a (locally) optimal solution. To remedy this, this paper proposes MiLeNAS, a mixed-level reformulation for NAS that can be optimized efficiently and reliably. It is shown that even when using a simple first-order method on the mixed-level formulation, MiLeNAS can achieve a lower validation error for NAS problems. Consequently, architectures obtained by our method achieve consistently higher accuracies than those obtained from bilevel optimization. Moreover, MiLeNAS proposes a framework beyond DARTS. It is upgraded via model size-based search and early stopping strategies to complete the search process in around 5 hours. Extensive experiments within the convolutional architecture search space validate the effectiveness of our approach.

preprint, 2016, arXiv

Revisiting Sub-sampled Newton Methods

Many machine learning models depend on solving a large-scale optimization problem. Recently, sub-sampled Newton methods have attracted much attention for optimization due to their efficiency at each iteration, rectifying a weakness of the ordinary Newton method, which suffers a high cost at each iteration while commanding a high convergence rate. In this work we propose two new efficient Newton-type methods, Refined Sub-sampled Newton and Refined Sketch Newton. Our methods exhibit a great advantage over existing sub-sampled Newton methods, especially when Hessian-vector multiplication can be calculated efficiently. Specifically, the proposed methods are shown to converge superlinearly in the general case and quadratically under a slightly stronger assumption. The proposed methods can be generalized to a unifying framework for the convergence proof of several existing sub-sampled Newton methods, revealing new convergence properties. Finally, we empirically evaluate the performance of our methods on several standard datasets, and the results show consistent improvement in computational efficiency.

preprint, 2016, arXiv

Tighter bound of Sketched Generalized Matrix Approximation

Generalized matrix approximation plays a fundamental role in many machine learning problems, such as CUR decomposition, kernel approximation, and low-rank matrix approximation. Especially as today's applications involve larger and larger datasets, more efficient generalized matrix approximation algorithms have become a crucially important research issue. In this paper, we find new sketching techniques that reduce the size of the original data matrix in order to develop new matrix approximation algorithms. Our results derive a much tighter bound for the approximation than previous works: we obtain a $(1+\varepsilon)$ approximation ratio with small sketched dimensions, which implies a more efficient generalized matrix approximation.

preprint, 2015, arXiv

Accelerating Random Kaczmarz Algorithm Based on Clustering Information

The Kaczmarz algorithm is an efficient iterative algorithm for solving overdetermined consistent systems of linear equations. During each updating step, Kaczmarz chooses a hyperplane based on an individual equation and projects the current estimate of the exact solution onto that hyperplane to get a new estimate. Many variants of the Kaczmarz algorithm have been proposed that choose better hyperplanes. Using the property of randomly sampled data in high-dimensional space, we propose an accelerated algorithm based on clustering information to improve block Kaczmarz and Kaczmarz via the Johnson-Lindenstrauss lemma. Additionally, we theoretically demonstrate a convergence improvement for the block Kaczmarz algorithm.
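The projection step described above can be sketched as a basic randomized Kaczmarz solver. This is not the clustering-based acceleration proposed in the paper. The row-sampling rule (probability proportional to squared row norm), the function name, and the small test system are assumptions for illustration.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Solve a consistent system A x = b by repeated projections onto row hyperplanes."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    # Assumed sampling rule: pick row i with probability proportional to ||a_i||^2.
    sq_norms = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=sq_norms)[0]
        row = A[i]
        # Project x onto the hyperplane {z : <a_i, z> = b_i}.
        residual = b[i] - sum(a * xi for a, xi in zip(row, x))
        scale = residual / sq_norms[i]
        x = [xi + scale * a for xi, a in zip(x, row)]
    return x

# Hypothetical overdetermined but consistent system with known solution (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * v for a, v in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

Each iteration touches a single equation, so the cost per step is linear in the number of unknowns. Variants such as the one in the abstract differ mainly in how the next hyperplane is chosen.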

preprint, 2015, arXiv

Fast Spectral Low Rank Matrix Approximation

First, we extend the results of approximate matrix multiplication from the Frobenius norm to the spectral norm. Second, we develop a class of fast approximate generalized linear regression algorithms with respect to the spectral norm. Finally, we give a fast approximate SVD.
