The NP-hard problem of computing the maximal sample variance over interval data is solvable in almost linear time with high probability
We consider the algorithm by Ferson et al. (Reliable computing 11(3), p. 207-233, 2005) designed for solving the NP-hard problem of computing the maximal sample variance over interval data, motivated by robust statistics (in fact, the formulation can be written as a nonconvex quadratic program with a specific structure). First, we propose a new version of the algorithm improving its original time bound $O(n^2 2^ω)$ to $O(n \log n+n\cdot 2^ω)$, where $n$ is number of input data and $ω$ is the clique number in a certain intersection graph. Then we treat input data as random variables as it is usual in statistics) and introduce a natural probabilistic data generating model. We get $2^ω= O(n^{1/\log\log n})$ and $ω= O(\log n / \log\log n)$ on average. This results in average computing time $O(n^{1+ε})$ for $ε> 0$ arbitrarily small, which may be considered as "surprisingly good" average time complexity for solving an NP-hard problem. Moreover, we prove the following tail bound on the distribution of computation time: hard instances, forcing the algorithm to compute in time $2^{Ω(n)}$, occur rarely, with probability tending to zero at the rate $e^{-n\log\log n}$.