Researcher profile

Dachao Lin

Dachao Lin contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods

Optimization is important in machine learning problems, and quasi-Newton methods have a reputation as the most efficient numerical schemes for smooth unconstrained optimization. In this paper, we consider the explicit superlinear convergence rates of quasi-Newton methods and address two open problems mentioned by Rodomanov and Nesterov. First, we extend Rodomanov and Nesterov's results to random quasi-Newton methods, which include common DFP, BFGS, SR1 methods. Such random methods adopt a random direction for updating the approximate Hessian matrix in each iteration. Second, we focus on the specific quasi-Newton methods: SR1 and BFGS methods. We provide improved versions of greedy and random methods with provable better explicit (local) superlinear convergence rates. Our analysis is closely related to the approximation of a given Hessian matrix, unconstrained quadratic objective, as well as the general strongly convex, smooth, and strongly self-concordant functions.

preprint2022arXiv

Explicit Superlinear Convergence Rates of Broyden's Methods in Nonlinear Equations

In this paper, we study the explicit superlinear convergence rates of quasi-Newton methods. We particularly focus on the classical Broyden's method for solving nonlinear equations. We establish its explicit (local) superlinear convergence rate when the initial point is close enough to a solution and the initial Jacobian approximation is also close enough to the exact Jacobian related to the solution. Our results present the explicit superlinear convergence rates of Broyden's "good" and "bad" update schemes. These explicit convergence rates in turn provide some important insights on the performance difference between the "good" and "bad" schemes, which are also validated empirically.

preprint2022arXiv

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

In this paper, we follow Eftekhari's work to give a non-local convergence analysis of deep linear networks. Specifically, we consider optimizing deep linear networks which have a layer with one neuron under quadratic loss. We describe the convergent point of trajectories with arbitrary starting point under gradient flow, including the paths which converge to one of the saddle points or the original point. We also show specific convergence rates of trajectories that converge to the global minimizer by stages. To achieve these results, this paper mainly extends the machinery in Eftekhari's work to provably identify the rank-stable set and the global minimizer convergent set. We also give specific examples to show the necessity of our definitions. Crucially, as far as we know, our results appear to be the first to give a non-local global analysis of linear neural networks from arbitrary initialized points, rather than the lazy training regime which has dominated the literature of neural networks, and restricted benign initialization in Eftekhari's work. We also note that extending our results to general linear networks without one hidden neuron assumption remains a challenging open problem.

preprint2022arXiv

On the Landscape of One-hidden-layer Sparse Networks and Beyond

Sparse neural networks have received increasing interest due to their small size compared to dense networks. Nevertheless, most existing works on neural network theory have focused on dense neural networks, and the understanding of sparse networks is very limited. In this paper, we study the loss landscape of one-hidden-layer sparse networks. First, we consider sparse networks with a dense final layer. We show that linear networks can have no spurious valleys under special sparse structures, and non-linear networks could also admit no spurious valleys under a wide final layer. Second, we discover that spurious valleys and spurious minima can exist for wide sparse networks with a sparse final layer. This is different from wide dense networks which do not have spurious valleys under mild assumptions.

preprint2020arXiv

Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond

Quantized Neural Networks (QNNs) use low bit-width fixed-point numbers for representing weight parameters and activations, and are often used in real-world applications due to their saving of computation resources and reproducibility of results. Batch Normalization (BN) poses a challenge for QNNs for requiring floating points in reciprocal operations, and previous QNNs either require computing BN at high precision or revise BN to some variants in heuristic ways. In this work, we propose a novel method to quantize BN by converting an affine transformation of two floating points to a fixed-point operation with shared quantized scale, which is friendly for hardware acceleration and model deployment. We confirm that our method maintains same outputs through rigorous theoretical analysis and numerical analysis. Accuracy and efficiency of our quantization method are verified by experiments at layer level on CIFAR and ImageNet datasets. We also believe that our method is potentially useful in other problems involving quantization.