A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise
We revisit heavy-tailed corrupted least-squares linear regression, assuming access to a corrupted label-feature sample of size $n$ containing at most $εn$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such a sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+ξ$ with heavy-tailed $(x,ξ)$. We only assume that $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $Σ$ with minimum eigenvalue $1/μ^2>0$ and bounded condition number $κ>0$. The noise $ξ$ can be arbitrarily dependent on $x$ and nonsymmetric, as long as $ξx$ has a finite covariance matrix $Ξ$. We propose a near-optimal, computationally tractable estimator based on the power method that requires no knowledge of $(Σ,Ξ)$ or of the operator norm of $Ξ$. With probability at least $1-δ$, our proposed estimator attains the statistical rate $μ^2\VertΞ\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/δ)}{n}+ε)^{1/2}$ and breakdown point $ε\lesssim\frac{1}{L^4κ^2}$, both optimal in the $\ell_2$-norm, assuming the minimum sample size $L^4κ^2(p\log p + \log(1/δ))\lesssim n$, which is near-optimal up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm that simultaneously satisfies all of these properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langleΣ(\cdot),\cdot\rangle$. The second stage estimates the descent direction $Σ\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing or estimating $Σ$.
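To make the setting concrete, the following is a minimal illustrative sketch, not the estimator proposed here. It simulates a contaminated sample from the model $y=\langle x,b^*\rangle+ξ$ and shows a generic power iteration of the kind the method builds on. The function names `make_corrupted_sample` and `power_method`, the Student-t distributions, the outlier values, and all default parameters are assumptions chosen purely for illustration.

```python
import numpy as np


def make_corrupted_sample(n=2000, p=5, eps=0.05, seed=0):
    """Draw (y, x) with y = <x, b*> + xi, then overwrite floor(eps*n) rows.

    All distributional choices below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    b_star = np.ones(p) / np.sqrt(p)
    # Heavy-tailed features: Student-t with 5 degrees of freedom
    # (finite fourth moment, so an L^4-L^2 type condition is plausible).
    x = rng.standard_t(df=5, size=(n, p))
    # Heterogeneous noise: its scale depends on x, so xi is not independent of x.
    xi = rng.standard_t(df=3, size=n) * (1.0 + 0.5 * np.abs(x[:, 0]))
    y = x @ b_star + xi
    # The adversary replaces at most eps*n label-feature pairs arbitrarily.
    m = int(np.floor(eps * n))
    idx = rng.choice(n, size=m, replace=False)
    x[idx] = 10.0
    y[idx] = -100.0
    return y, x, b_star, idx


def power_method(A, iters=200, seed=0):
    """Estimate the top eigenvalue and eigenvector of a symmetric PSD matrix A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break
        v = w / norm
    # Rayleigh quotient of the final unit vector.
    return float(v @ A @ v), v


if __name__ == "__main__":
    y, x, b_star, idx = make_corrupted_sample()
    # Ordinary least squares on the contaminated sample, as a non-robust baseline.
    b_ols = np.linalg.lstsq(x, y, rcond=None)[0]
    print("OLS l2 error under contamination:", np.linalg.norm(b_ols - b_star))
    # Operator norm of the empirical second-moment matrix via power iteration.
    top_eig, _ = power_method(x.T @ x / len(y))
    print("top eigenvalue of empirical second-moment matrix:", top_eig)
```

The ordinary least-squares fit is included only as a non-robust baseline on the contaminated sample; the two-stage Multiplicative Weight Update procedure described above is not reproduced in this sketch.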