Source author record

X. Y. Han

X. Y. Han appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Machine Learning, math.OC, Artificial Intelligence, Computational Geometry, Computer Vision, math.DG, math.NA, Numerical Analysis, physics.atom-ph, quant-ph

Catalog footprint

What is connected

6 works

10 topics

4 close collaborators


Preprint, 2022, arXiv

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish, through experiments on three canonical networks and five benchmark datasets, that NC emerges in such MSE-trained deep nets as well. We provide PyTorch code for reproducing MSE-NC and CE-NC in a Google Colab notebook at https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We develop three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms that are directly interpretable through the lens of NC and that assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We exhibit experiments on canonical datasets and networks demonstrating that term (B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.
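The least-squares classifier that term (A) assumes, and that defines the central path, can be sketched in a few lines of NumPy: given last-layer features H (one row per example) and one-hot targets Y, the MSE-optimal linear classifier (W, b) is an ordinary least-squares solution. This is a minimal illustration under stated assumptions; the helper names, the 1/2 loss scaling and the absence of weight decay are hypothetical choices, not the paper's exact setup.

```python
import numpy as np

def least_squares_classifier(H, Y):
    """Return (W, b) minimizing the mean of ||W h_i + b - y_i||^2 over rows of H, Y."""
    N = H.shape[0]
    H1 = np.hstack([H, np.ones((N, 1))])          # append a constant column for the bias
    sol, *_ = np.linalg.lstsq(H1, Y, rcond=None)  # sol has shape (p + 1, K)
    W, b = sol[:-1].T, sol[-1]                    # W: (K, p), b: (K,)
    return W, b

def mse_loss(H, Y, W, b):
    # assumed normalization: 1/2 * average squared error per example
    R = H @ W.T + b - Y
    return 0.5 * np.mean(np.sum(R ** 2, axis=1))

# Toy check with synthetic "features" and one-hot labels (not real activations)
rng = np.random.default_rng(0)
N, p, K = 200, 8, 4
H = rng.standard_normal((N, p))
Y = np.eye(K)[rng.integers(0, K, size=N)]
W, b = least_squares_classifier(H, Y)
loss_opt = mse_loss(H, Y, W, b)
loss_perturbed = mse_loss(H, Y, W + 0.01 * rng.standard_normal(W.shape), b)
```

Any perturbation of (W, b) can only increase the loss, which is the sense in which a classifier on the central path "stays MSE-optimal"; term (B) measures how far a trained network's actual classifier strays from this solution.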

Preprint, 2022, arXiv

Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization

For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable, and the number of cutting planes invoked by traditional remedies is difficult to bound, leading to convergence guarantees that are sublinear relative to the cumulative number of gradient evaluations. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While the method is designed with general objectives in mind, it is motivated by a "max-of-smooth" model that captures the subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.
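The difficulty that motivates this work can be seen on a toy max-of-smooth objective. The sketch below is not Survey Descent; it only shows plain fixed-step gradient descent on a hypothetical strongly convex function f(x) = max(f1, f2), which is smooth at every iterate yet nonsmooth at its minimizer, where both pieces are active. The function, starting point and step size are illustrative assumptions.

```python
import numpy as np

# Hypothetical max-of-smooth objective:
#   f1(x) = x0^2 + x1^2 + 2*x1,  f2(x) = x0^2 + x1^2 - 2*x1,
# so f(x) = x0^2 + x1^2 + 2|x1|: strongly convex, minimized at (0, 0),
# and nonsmooth along x1 = 0 where both pieces attain the max.

def f(x):
    return max(x[0]**2 + x[1]**2 + 2*x[1], x[0]**2 + x[1]**2 - 2*x[1])

def grad_active(x):
    # gradient of whichever smooth piece attains the max (ties go to f1)
    s = 1.0 if x[1] >= 0 else -1.0
    return np.array([2*x[0], 2*x[1] + 2*s])

x = np.array([1.0, 0.3])
step = 0.1
for _ in range(200):
    x = x - step * grad_active(x)
```

The smooth coordinate x0 contracts geometrically to zero, but x1 settles into a two-cycle near plus or minus 0.11, so f stays near 0.23 instead of approaching the optimal value 0. A single gradient carries no information about the other active piece, which is the kind of instability a multipoint method is meant to address.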

Preprint, 2020, arXiv

Disk matrices and the proximal mapping for the numerical radius

Optimal matrices for problems involving the matrix numerical radius often have fields of values that are disks, a phenomenon associated with partial smoothness. Such matrices are highly structured: we experiment in particular with the proximal mapping for the radius, which often maps n-by-n random matrix inputs into a particular manifold of disk matrices that has real codimension 2n. The outputs, computed via semidefinite programming, also satisfy an unusual rank property at optimality.
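The numerical radius r(A) can be approximated through the standard identity r(A) = max over θ of λ_max((e^{iθ}A + e^{-iθ}A*)/2), where λ_max of that Hermitian part is the support function of the field of values in direction θ. A field of values that is a disk centered at the origin has a constant support function. The grid-based sketch below is an illustrative assumption (it gives a lower bound that tightens as the grid is refined) and is not the semidefinite-programming approach used in the paper.

```python
import numpy as np

def support_values(A, num_angles=720):
    """lambda_max of the Hermitian part of e^{i theta} A on a uniform angle grid."""
    A = np.asarray(A, dtype=complex)
    thetas = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    vals = []
    for t in thetas:
        B = np.exp(1j * t) * A
        H = (B + B.conj().T) / 2.0               # Hermitian part of e^{i theta} A
        vals.append(np.linalg.eigvalsh(H)[-1])   # eigvalsh returns ascending eigenvalues
    return np.array(vals)

def numerical_radius(A, num_angles=720):
    return support_values(A, num_angles).max()

J = np.array([[0.0, 1.0], [0.0, 0.0]])  # 2x2 Jordan block: W(J) is the disk |z| <= 1/2
D = np.diag([1.0, -2.0])                # normal matrix: W(D) is the segment [-2, 1]
sJ, sD = support_values(J), support_values(D)
print(numerical_radius(J), sJ.max() - sJ.min())
print(numerical_radius(D), sD.max() - sD.min())
```

For the Jordan block, the support function is flat (a disk matrix), whereas for the diagonal matrix it varies strongly with direction.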

Preprint, 2020, arXiv

Error analysis in suppression of unwanted qubit interactions for a parametric gate in a tunable superconducting circuit

We experimentally demonstrate a parametric iSWAP gate in a superconducting circuit based on a tunable coupler, which provides continuous tunability to eliminate unwanted qubit interactions. We implement the two-qubit iSWAP gate by applying a fast-flux bias modulation pulse to the coupler to turn on a parametric exchange interaction between the computational qubits. The controllable interaction can provide an extra degree of freedom for verifying the optimal condition for constructing the parametric gate. To fully investigate the error sources of the two-qubit gate, we perform quantum process tomography measurements and numerical simulations while varying the static ZZ coupling strength. We quantitatively calculate the parasitic dynamic ZZ coupling during two-qubit gate operation and extract the contributions to the gate error from decoherence, dynamic ZZ coupling and high-order oscillation terms. Our results reveal that the main gate error comes from decoherence, while increases in the dynamic ZZ coupling and high-order oscillation error degrade the parametric gate performance. This approach, which has not been explored previously, provides a guiding principle for improving the fidelity of the parametric iSWAP gate by suppressing unwanted qubit interactions. This controllable interaction, together with the parametric modulation technique, is desirable for crosstalk-free multiqubit quantum circuits and quantum simulation applications.
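How a ZZ-type error degrades an iSWAP gate can be illustrated with a deliberately simplified model: assume, hypothetically, that the unwanted interaction only adds a conditional phase φ on |11⟩, and compare the resulting unitary V with the ideal iSWAP U using the standard average gate fidelity between unitaries, F = (|Tr(U†V)|² + d)/(d(d + 1)). This ignores decoherence and high-order oscillation terms and is not the authors' error model.

```python
import numpy as np

# Ideal iSWAP in the basis |00>, |01>, |10>, |11>
ISWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]], dtype=complex)

def avg_gate_fidelity(U, V):
    # average gate fidelity between two d-dimensional unitaries
    d = U.shape[0]
    overlap = abs(np.trace(U.conj().T @ V)) ** 2
    return (overlap + d) / (d * (d + 1))

def iswap_with_zz_phase(phi):
    # hypothetical error model: extra conditional phase phi on |11>
    return ISWAP @ np.diag([1, 1, 1, np.exp(1j * phi)])

for phi in (0.0, 0.1, 0.3, np.pi):
    print(phi, avg_gate_fidelity(ISWAP, iswap_with_zz_phase(phi)))
```

In this model Tr(U†V) = 3 + e^{iφ}, so F = (14 + 6 cos φ)/20, which falls from 1 at φ = 0 to 0.4 at φ = π.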

Preprint, 2020, arXiv

Prevalence of Neural Collapse during the terminal phase of deep learning training

Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero while training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call Neural Collapse, involving four deeply interconnected phenomena: (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means; (NC2) The class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) Up to rescaling, the last-layer classifiers collapse to the class-means, or in other words to the Simplex ETF, i.e., to a self-dual configuration; (NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest train class-mean, i.e., the Nearest Class Center (NCC) decision rule. The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
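NC1, NC2 and NC4 can be checked numerically from last-layer features. The sketch below measures within-class variability as Tr(Σ_W Σ_B⁺)/K, compares pairwise cosines of the centered class means with the simplex ETF value −1/(K − 1), and applies the NCC rule. It runs on synthetic features placed near simplex ETF vertices; the helper names and the synthetic data are illustrative assumptions, not measurements from the paper.

```python
import numpy as np

def class_means(H, y, K):
    # row c holds the mean last-layer feature of class c
    return np.stack([H[y == c].mean(axis=0) for c in range(K)])

def nc1(H, y, K, tol=1e-10):
    # Tr(Sigma_W Sigma_B^+) / K: within-class spread relative to between-class spread
    mu, mu_g = class_means(H, y, K), H.mean(axis=0)
    D = H - mu[y]                        # deviations from each example's class mean
    Sw = D.T @ D / len(H)                # within-class covariance Sigma_W
    M = mu - mu_g                        # globally centered class means
    Sb = M.T @ M / K                     # between-class covariance Sigma_B
    w, V = np.linalg.eigh(Sb)            # pseudo-inverse of the PSD matrix Sigma_B
    keep = w > tol * w.max()
    Sb_pinv = (V[:, keep] / w[keep]) @ V[:, keep].T
    return np.trace(Sw @ Sb_pinv) / K

def nc2_cosines(H, y, K):
    # cosines between centered class means; a simplex ETF gives -1/(K-1) off the diagonal
    M = class_means(H, y, K) - H.mean(axis=0)
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    return U @ U.T

def ncc_predict(H, mu):
    # NC4: assign each feature to the nearest class mean
    d2 = ((H[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Synthetic features clustered tightly around scaled simplex ETF vertices
rng = np.random.default_rng(0)
K, n = 4, 50
etf = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)  # rows: ETF vertices
y = np.repeat(np.arange(K), n)
H = 5.0 * etf[y] + 0.01 * rng.standard_normal((K * n, K))
nc1_value = nc1(H, y, K)
cos = nc2_cosines(H, y, K)
acc = (ncc_predict(H, class_means(H, y, K)) == y).mean()
```

With real networks, one would compute these quantities on training activations at successive epochs and watch them approach their collapsed values during TPT.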

Preprint, 2014, arXiv

Momentum Distribution of Near-Zero-Energy Photoelectrons in the Strong-Field Tunneling Ionization in the Long Wavelength Limit

We investigate the ionization dynamics of argon atoms irradiated by an ultrashort intense laser with a wavelength of up to 3100 nm, addressing the momentum distribution of photoelectrons with near-zero energy. We find a surprising accumulation in the momentum distribution corresponding to meV energies and a "V"-like structure at slightly larger transverse momenta. Semiclassical simulations indicate the crucial role of the Coulomb attraction between the escaping electron and the remaining ion at extremely large distances. Tracing back classical trajectories, we find that tunneling electrons born in a certain window of the field phase and transverse velocity are responsible for the striking accumulation. Our theoretical results are consistent with recent meV-resolved high-precision measurements.
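A quick unit conversion gives a sense of the scales involved: the photon energy at 3100 nm follows from E = hc/λ, and the momentum of a free electron with kinetic energy E is p = √(2E) in atomic units (m_e = 1, energy in hartree). The helper names below are illustrative; the constants are standard CODATA-rounded values.

```python
import math

HARTREE_EV = 27.211386   # atomic unit of energy, in eV
HC_EV_NM = 1239.84198    # h*c, in eV*nm

def photon_energy_ev(wavelength_nm):
    # E = h c / lambda
    return HC_EV_NM / wavelength_nm

def momentum_au(energy_ev):
    # nonrelativistic free electron: E = p^2 / 2 in atomic units
    return math.sqrt(2.0 * energy_ev / HARTREE_EV)

print(photon_energy_ev(3100.0))
print(momentum_au(1e-3))
```

A 3100 nm photon carries only about 0.40 eV, and a 1 meV photoelectron has a momentum of roughly 0.0086 a.u., which is why resolving the near-zero-energy region requires meV-level precision.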
