Source author record

Haochuan Li

Haochuan Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Machine Learning, math.OC, astro-ph.GA, astro-ph.HE, Multiagent Systems

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

Preprint | 2022 | arXiv

Byzantine-Robust Federated Linear Bandits

In this paper, we study a linear bandit optimization problem in a federated setting where a large collection of distributed agents collaboratively learn a common linear bandit model. Standard federated learning algorithms applied to this setting are vulnerable to Byzantine attacks on even a small fraction of agents. We propose a novel algorithm with a robust aggregation oracle that utilizes the geometric median. We prove that our proposed algorithm is robust to Byzantine attacks on fewer than half of agents and achieves a sublinear $\tilde{\mathcal{O}}({T^{3/4}})$ regret with $\mathcal{O}(\sqrt{T})$ steps of communication in $T$ steps. Moreover, we make our algorithm differentially private via a tree-based mechanism. Finally, if the level of corruption is known to be small, we show that using the geometric median of mean oracle for robust aggregation further improves the regret bound.
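The abstract does not spell out the aggregation oracle, so the following is only a minimal sketch of the general idea of geometric-median aggregation, not the paper's algorithm. It uses Weiszfeld's iteration, a standard method swapped in here for illustration. The function name `geometric_median`, the agent reports, and all numeric values are hypothetical.

```python
import math

def geometric_median(points, iters=200, eps=1e-9):
    """Approximate the geometric median of equal-length vectors
    with Weiszfeld's iteration (illustrative, not from the paper)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    est = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            # Weight each point by the inverse of its distance to the estimate;
            # eps guards against division by zero.
            w = 1.0 / max(math.dist(p, est), eps)
            den += w
            for i in range(dim):
                num[i] += w * p[i]
        new_est = [v / den for v in num]
        if math.dist(new_est, est) < eps:
            return new_est
        est = new_est
    return est

# Hypothetical reports: 5 honest agents near (1, 1), 2 Byzantine agents far away.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.05, 1.0], [0.95, 1.0]]
byzantine = [[100.0, -100.0], [100.0, -100.0]]
reports = honest + byzantine

mean = [sum(p[i] for p in reports) / len(reports) for i in range(2)]
median = geometric_median(reports)
```

In this toy setup the plain mean is dragged far from the honest cluster, while the geometric median stays close to it because fewer than half of the reports are corrupted.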

Preprint | 2022 | arXiv

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks. Specifically, we provide numerical evidence that in large-scale neural network training (e.g., ImageNet + ResNet101, and WT103 + TransformerXL models), the neural network's weights do not converge to stationary points where the gradient of the loss is zero. Remarkably, however, we observe that even though the weights do not converge to stationary points, the progress in minimizing the loss function halts and training loss stabilizes. Inspired by this observation, we propose a new perspective based on ergodic theory of dynamical systems to explain it. Rather than studying the evolution of weights, we study the evolution of the distribution of weights. We prove convergence of the distribution of weights to an approximate invariant measure, thereby explaining how the training loss can stabilize without weights necessarily converging to stationary points. We further discuss how this perspective can better align optimization theory with empirical observations in machine learning practice.

Preprint | 2022 | arXiv

On Convergence of Gradient Descent Ascent: A Tight Local Analysis

Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x};\mathbf{y})$ where $f$ is strongly-concave in $\mathbf{y}$ and possibly nonconvex in $\mathbf{x}$, (Lin et al., 2020) proved the convergence of GDA with a stepsize ratio $\eta_{\mathbf{y}}/\eta_{\mathbf{x}}=\Theta(\kappa^2)$ where $\eta_{\mathbf{x}}$ and $\eta_{\mathbf{y}}$ are the stepsizes for $\mathbf{x}$ and $\mathbf{y}$ and $\kappa$ is the condition number for $\mathbf{y}$. While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the local convergence of general nonconvex-nonconcave minimax problems. We demonstrate that a stepsize ratio of $\Theta(\kappa)$ is necessary and sufficient for local convergence of GDA to a Stackelberg Equilibrium, where $\kappa$ is the local condition number for $\mathbf{y}$. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient methods (EG). Finally, we conduct several numerical experiments to support our theoretical findings.
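As a toy illustration of why the stepsize ratio matters for local convergence, the sketch below runs simultaneous GDA on a hand-picked quadratic, $f(x, y) = -\tfrac{1}{2}x^2 + 2xy - \tfrac{1}{2}y^2$, which is nonconvex in $x$, strongly concave in $y$, and has its Stackelberg equilibrium at the origin. The objective, the function name `run_gda`, and the stepsizes are assumptions made for this example and do not come from the paper. The sketch does not reproduce the paper's $\Theta(\kappa)$ analysis.

```python
import math

def run_gda(eta_x, eta_y, steps=2000, x=1.0, y=1.0):
    """Simultaneous GDA on f(x, y) = -0.5*x**2 + 2*x*y - 0.5*y**2
    (a hypothetical test function, not taken from the paper)."""
    for _ in range(steps):
        grad_x = -x + 2.0 * y   # partial derivative of f in x
        grad_y = 2.0 * x - y    # partial derivative of f in y
        # Descent step for the min player, ascent step for the max player.
        x, y = x - eta_x * grad_x, y + eta_y * grad_y
    return x, y

# Equal stepsizes: the iterates spiral away from the equilibrium at (0, 0).
equal = run_gda(eta_x=0.05, eta_y=0.05)

# Larger stepsize for the max player (ratio 4): the iterates contract to (0, 0).
ratio = run_gda(eta_x=0.05, eta_y=0.2)

print("equal stepsizes, distance from origin:", math.hypot(*equal))
print("ratio 4, distance from origin:", math.hypot(*ratio))
```

For this particular quadratic the linearized dynamics are stable only when the ratio $\eta_y/\eta_x$ exceeds 1, which is why the equal-stepsize run diverges and the ratio-4 run converges.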

Preprint | 2021 | arXiv

Lightcurve Evolution of the nearest Tidal Disruption Event: A late-time, radio-only flare

Tidal disruption events (TDEs) occur when a star passes close enough to a galaxy's supermassive black hole to be disrupted by tidal forces. We discuss new observations of IGRJ12580+0134, a TDE observed in NGC 4845 (d=17 Mpc) in November 2010, with the Karl G. Jansky Very Large Array (JVLA). We also discuss a reanalysis of 2010-2011 Swift and XMM-Newton observations, as well as new, late-time Swift observations. Our JVLA observations show a decay of the nuclear radio flux until 2015, when a plateau was seen, and then a significant (~factor 3) radio flare during 2016. The 2016 radio flare was also accompanied by radio spectral changes, but was not seen in the X-rays. We model the flare as resulting from the interaction of the nuclear jet with a cloud in the interstellar medium. This is distinct from late-time X-ray flares in a few other TDEs where changes in the accretion state and/or a fallback event were suggested, neither of which appears possible in this case. Our reanalysis of the Swift and XMM-Newton data from 2011 shows significant evidence for thermal emission from a disk, as well as a very soft power-law. This, in addition to the extreme X-ray flux increase seen in 2010 (a factor of $>100$), bolsters the identification of IGRJ12580+0134 as a TDE, not an unusual AGN variability event.
