Researcher profile

Clarice Poon

Clarice Poon contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

A surprising phenomenon in the training of neural networks is the ability of gradient descent to find global minimizers of the training loss despite its non-convexity. Following earlier works, we investigate this behavior for wide shallow networks. Existing results essentially cover the case of ReLU activations and the case of sigmoid activations with scalar output weights. We study a large class of models that includes multi-head attention layers and two-layer sigmoid networks with vector output weights. Building upon [Chizat and Bach, 2018], we prove that all non-global minimizers of the training loss are unstable under gradient descent dynamics. Thus, when the initial distribution of the parameters has full support (which includes the popular Gaussian case), and in the many hidden neurons or attention heads limit, continuous-time gradient descent can only converge to global minimizers. Establishing the instability of non-global minimizers corresponds to the construction of an ``escaping active set'' -- we complete the proof of [Chizat and Bach, 2018] to construct this set for models with bounded nonlinearities and scalar output weights. We also extend this construction to new cases for models with vector output weights. Finally, we show the well-posedness and the stability with respect to discretization of the mean field training dynamic for sub-Gaussian initializations.

preprint2020arXiv

Geometry of First-Order Methods and Adaptive Acceleration

First-order operator splitting methods are ubiquitous among many fields through science and engineering, such as inverse problems, signal/image processing, statistics, data science and machine learning, to name a few. In this paper, we study a geometric property of first-order methods when applying to solve non-smooth optimization problems. With the tool of "partial smoothness", we design a framework to analyze the trajectory of the fixed-point sequence generated by first-order methods and show that locally, the fixed-point sequence settles onto a regular trajectory such as a straight line or a spiral. Based on this finding, we discuss the limitation of current widely used "inertial acceleration" technique, and propose a trajectory following adaptive acceleration algorithm. Global convergence is established for the proposed acceleration scheme based on the perturbation of fixed-point iteration. Locally, we first build connections between the acceleration scheme and the well-studied "vector extrapolation technique" in the field of numerical analysis, and then discuss local acceleration guarantees of the proposed acceleration scheme. Moreover, our result provides a geometric interpretation of these vector extrapolation techniques. Numerical experiments on various first-order methods are provided to demonstrate the advantage of the proposed adaptive acceleration scheme.

preprint2020arXiv

The geometry of off-the-grid compressed sensing

This paper presents a sharp geometric analysis of the recovery performance of sparse regularization. More specifically, we analyze the BLASSO method which estimates a sparse measure (sum of Dirac masses) from randomized sub-sampled measurements. This is a "continuous", often called off-the-grid, extension of the compressed sensing problem, where the $\ell^1$ norm is replaced by the total variation of measures. This extension is appealing from a numerical perspective because it avoids to discretize the the space by some grid. But more importantly, it makes explicit the geometry of the problem since the positions of the Diracs can now freely move over the parameter space. On a methodological level, our contribution is to propose the Fisher geodesic distance on this parameter space as the canonical metric to analyze super-resolution in a way which is invariant to reparameterization of this space. Switching to the Fisher metric allows us to take into account measurement operators which are not translation invariant, which is crucial for applications such as Laplace inversion in imaging, Gaussian mixtures estimation and training of multilayer perceptrons with one hidden layer. On a theoretical level, our main contribution shows that if the Fisher distance between spikes is larger than a Rayleigh separation constant, then the BLASSO recovers in a stable way a stream of Diracs, provided that the number of measurements is proportional (up to log factors) to the number of Diracs. We measure the stability using an optimal transport distance constructed on top of the Fisher geodesic distance. Our result is (up to log factor) sharp and does not require any randomness assumption on the amplitudes of the underlying measure. Our proof technique relies on an infinite-dimensional extension of the so-called "golfing scheme" which operates over the space of measures and is of general interest.

preprint2019arXiv

On instabilities of deep learning in image reconstruction - Does AI come at a cost?

Deep learning, due to its unprecedented success in tasks such as image classification, has emerged as a new tool in image reconstruction with potential to change the field. In this paper we demonstrate a crucial phenomenon: deep learning typically yields unstablemethods for image reconstruction. The instabilities usually occur in several forms: (1) tiny, almost undetectable perturbations, both in the image and sampling domain, may result in severe artefacts in the reconstruction, (2) a small structural change, for example a tumour, may not be captured in the reconstructed image and (3) (a counterintuitive type of instability) more samples may yield poorer performance. Our new stability test with algorithms and easy to use software detects the instability phenomena. The test is aimed at researchers to test their networks for instabilities and for government agencies, such as the Food and Drug Administration (FDA), to secure safe use of deep learning methods.