Researcher profile

Philip Thompson

Philip Thompson contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified | Verification L1 | Unclaimed author
3 works
0 followers
4 topics
2 close collaborators

Published work

3 published items

Preprint | 2024 | arXiv

Outlier-robust additive matrix decomposition

We study least-squares trace regression when the parameter is the sum of an $r$-low-rank matrix and an $s$-sparse matrix and a fraction $ε$ of the labels is corrupted. For subgaussian distributions and feature-dependent noise, we highlight three needed design properties, each one derived from a different process inequality: a "product process inequality", "Chevet's inequality" and a "multiplier process inequality". These properties handle, simultaneously, additive decomposition, label contamination and design-noise interaction. They imply the near-optimality of a tractable estimator with respect to the effective dimensions $d_{eff,r}$ and $d_{eff,s}$ of the low-rank and sparse components, $ε$ and the failure probability $δ$. The near-optimal rate is $\mathsf{r}(n,d_{eff,r}) + \mathsf{r}(n,d_{eff,s}) + \sqrt{(1+\log(1/δ))/n} + ε\log(1/ε)$, where $\mathsf{r}(n,d_{eff,r})+\mathsf{r}(n,d_{eff,s})$ is the optimal rate on average with no contamination. Our estimator is adaptive to $(s,r,ε,δ)$ and, for a fixed absolute constant $c>0$, it attains the mentioned rate with probability $1-δ$ uniformly over all $δ\ge\exp(-cn)$. Without matrix decomposition, our analysis also entails optimal bounds for a robust estimator adapted to the noise variance. Our estimators are based on "sorted" versions of Huber's loss. We present simulations matching the theory. In particular, they reveal the superiority of "sorted" Huber's losses over the classical Huber's loss.
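
The abstract compares against the classical Huber loss. As a point of reference, here is a minimal sketch of that classical loss only. It is not the "sorted" variant studied in the paper, whose definition is not given on this page; the function name `huber_loss` and the threshold parameter `tau` are illustrative assumptions.

```python
import numpy as np


def huber_loss(residuals, tau=1.0):
    """Classical Huber loss applied elementwise (illustrative sketch).

    Quadratic for |r| <= tau, linear beyond tau, so large residuals
    (for example, corrupted labels) are penalized less severely than
    under squared loss. `tau` is a hypothetical tuning parameter.
    """
    r = np.abs(np.asarray(residuals, dtype=float))
    quadratic = 0.5 * r**2              # used where |r| <= tau
    linear = tau * (r - 0.5 * tau)      # used where |r| > tau
    return np.where(r <= tau, quadratic, linear)


# Small residuals are squared; the outlier at 10.0 grows only linearly.
losses = huber_loss([0.5, -1.0, 10.0], tau=1.0)
```

With `tau=1.0`, a residual of 10.0 costs 9.5 instead of the 50.0 that half the squared residual would give, which is the robustness property the classical loss is known for.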

Preprint | 2022 | arXiv

A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

We revisit heavy-tailed corrupted least-squares linear regression, given a corrupted $n$-sized label-feature sample containing at most $εn$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such a sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+ξ$ with heavy-tailed $(x,ξ)$. We only assume $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $Σ$ with minimum eigenvalue $1/μ^2>0$ and bounded condition number $κ>0$. The noise $ξ$ can be arbitrarily dependent on $x$ and nonsymmetric as long as $ξx$ has finite covariance matrix $Ξ$. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(Σ,Ξ)$ or of the operator norm of $Ξ$. With probability at least $1-δ$, our proposed estimator attains the statistical rate $μ^2\Vert Ξ\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/δ)}{n}+ε)^{1/2}$ and breakdown-point $ε\lesssim\frac{1}{L^4κ^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4κ^2(p\log p + \log(1/δ))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm satisfying simultaneously all the mentioned properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langle Σ(\cdot),\cdot\rangle$. The second stage estimates the descent direction $Σ\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing or estimating $Σ$.

Preprint | 2022 | arXiv

Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints

We derive new and improved non-asymptotic deviation inequalities for the sample average approximation (SAA) of an optimization problem. Our results give strong error probability bounds that are "sub-Gaussian" even when the randomness of the problem is fairly heavy tailed. Additionally, we obtain good (often optimal) dependence on the sample size and geometrical parameters of the problem. Finally, we allow for random constraints on the SAA and unbounded feasible sets, which also do not seem to have been considered before in the non-asymptotic literature. Our proofs combine different ideas of potential independent interest: an adaptation of Talagrand's "generic chaining" bound for sub-Gaussian processes; "localization" ideas from the Statistical Learning literature; and the use of standard conditions in Optimization (metric regularity, Slater-type conditions) to control fluctuations of the feasible set.
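
As a toy illustration of what a sample average approximation is, the sketch below replaces an expectation by an empirical mean and minimizes the result. It does not reproduce the paper's setting, constraints, or bounds: the problem $\min_x E[(x-ξ)^2]$, the Student-t sampling distribution, and the names `saa_objective` and `saa_minimizer` are all assumptions chosen for illustration.

```python
import numpy as np


def saa_objective(x, samples):
    """Empirical objective (1/n) * sum_i (x - xi_i)^2 for a scalar x."""
    return float(np.mean((x - np.asarray(samples, dtype=float)) ** 2))


def saa_minimizer(samples):
    """Minimizer of the empirical objective above.

    For this quadratic toy problem the SAA solution has a closed form,
    the sample mean, so no numerical solver is needed.
    """
    return float(np.mean(samples))


# Heavy-tailed data (assumption): Student-t with 3 degrees of freedom,
# which has finite variance but infinite fourth moment. True mean is 0,
# so the true problem min_x E[(x - xi)^2] is solved by x = 0.
rng = np.random.default_rng(0)
samples = rng.standard_t(df=3, size=2000)

x_hat = saa_minimizer(samples)
empirical_value = saa_objective(x_hat, samples)
```

Here `x_hat` is the SAA solution, and its distance from the true minimizer 0 is the kind of deviation that non-asymptotic bounds of the sort described above aim to control.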