Source author record

Philip Thompson

Philip Thompson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.ST Statistics Theory Machine Learning math.OC

Catalog footprint

What is connected

3works

4topics

2close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Outlier-robust additive matrix decomposition

We study least-squares trace regression when the parameter is the sum of a $r$-low-rank matrix and a $s$-sparse matrix and a fraction $ε$ of the labels is corrupted. For subgaussian distributions and feature-dependent noise, we highlight three needed design properties, each one derived from a different process inequality: a "product process inequality", "Chevet's inequality" and a "multiplier process inequality". These properties handle, simultaneously, additive decomposition, label contamination and design-noise interaction. They imply the near-optimality of a tractable estimator with respect to the effective dimensions $d_{eff,r}$ and $d_{eff,s}$ of the low-rank and sparse components, $ε$ and the failure probability $δ$. The near-optimal rate is $\mathsf{r}(n,d_{eff,r}) + \mathsf{r}(n,d_{eff,s}) + \sqrt{(1+\log(1/δ))/n} + ε\log(1/ε)$, where $\mathsf{r}(n,d_{eff,r})+\mathsf{r}(n,d_{eff,s})$ is the optimal rate in average with no contamination. Our estimator is adaptive to $(s,r,ε,δ)$ and, for fixed absolute constant $c>0$, it attains the mentioned rate with probability $1-δ$ uniformly over all $δ\ge\exp(-cn)$. Without matrix decomposition, our analysis also entails optimal bounds for a robust estimator adapted to the noise variance. Our estimators are based on "sorted" versions of Huber's loss. We present simulations matching the theory. In particular, it reveals the superiority of "sorted" Huber's losses over the classical Huber's loss.

preprint2022arXiv

A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise

We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $εn$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+ξ$ with heavy-tailed $(x,ξ)$. We only assume $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $Σ$ with minimum eigenvalue $1/μ^2>0$ and bounded condition number $κ>0$. The noise $ξ$ can be arbitrarily dependent on $x$ and nonsymmetric as long as $ξx$ has finite covariance matrix $Ξ$. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge on $(Σ,Ξ)$ nor the operator norm of $Ξ$. With probability at least $1-δ$, our proposed estimator attains the statistical rate $μ^2\VertΞ\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/δ)}{n}+ε)^{1/2}$ and breakdown-point $ε\lesssim\frac{1}{L^4κ^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4κ^2(p\log p + \log(1/δ))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm satisfying simultaneously all the mentioned properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langleΣ(\cdot),\cdot\rangle$. The second stage estimate the descent direction $Σ\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing nor estimating $Σ$.

preprint2022arXiv

Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints

We derive new and improved non-asymptotic deviation inequalities for the sample average approximation (SAA) of an optimization problem. Our results give strong error probability bounds that are "sub-Gaussian"~even when the randomness of the problem is fairly heavy tailed. Additionally, we obtain good (often optimal) dependence on the sample size and geometrical parameters of the problem. Finally, we allow for random constraints on the SAA and unbounded feasible sets, which also do not seem to have been considered before in the non-asymptotic literature. Our proofs combine different ideas of potential independent interest: an adaptation of Talagrand's "generic chaining"~bound for sub-Gaussian processes; "localization"~ideas from the Statistical Learning literature; and the use of standard conditions in Optimization (metric regularity, Slater-type conditions) to control fluctuations of the feasible set.

Philip Thompson

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Outlier-robust additive matrix decomposition

A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise

Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints