On Hardness of Jumbled Indexing
Jumbled indexing is the problem of indexing a text $T$ for queries that ask whether there is a substring of $T$ matching a pattern represented as a Parikh vector, i.e., the vector of frequency counts for each character. Jumbled indexing has garnered a lot of interest in the last four years. There is a naive algorithm that preprocesses all answers in $O(n^2|Σ|)$ time allowing quick queries afterwards, and there is another naive algorithm that requires no preprocessing but has $O(n\log|Σ|)$ query time. Despite a tremendous amount of effort there has been little improvement over these running times. In this paper we provide good reason for this. We show that, under a 3SUM-hardness assumption, jumbled indexing for alphabets of size $ω(1)$ requires $Ω(n^{2-ε})$ preprocessing time or $Ω(n^{1-δ})$ query time for any $ε,δ>0$. In fact, under a stronger 3SUM-hardness assumption, for any constant alphabet size $r\ge 3$ there exist describable fixed constant $ε_r$ and $δ_r$ such that jumbled indexing requires $Ω(n^{2-ε_r})$ preprocessing time or $Ω(n^{1-δ_r})$ query time.