Paper detail

On the Problem of $p_1^{-1}$ in Locality-Sensitive Hashing

A Locality-Sensitive Hash (LSH) function is called $(r,cr,p_1,p_2)$-sensitive, if two data-points with a distance less than $r$ collide with probability at least $p_1$ while data points with a distance greater than $cr$ collide with probability at most $p_2$. These functions form the basis of the successful Indyk-Motwani algorithm (STOC 1998) for nearest neighbour problems. In particular one may build a $c$-approximate nearest neighbour data structure with query time $\tilde O(n^ρ/p_1)$ where $ρ=\frac{\log1/p_1}{\log1/p_2}\in(0,1)$. That is, sub-linear time, as long as $p_1$ is not too small. This is significant since most high dimensional nearest neighbour problems suffer from the curse of dimensionality, and can't be solved exact, faster than a brute force linear-time scan of the database. Unfortunately, the best LSH functions tend to have very low collision probabilities, $p_1$ and $p_2$. Including the best functions for Cosine and Jaccard Similarity. This means that the $n^ρ/p_1$ query time of LSH is often not sub-linear after all, even for approximate nearest neighbours! In this paper, we improve the general Indyk-Motwani algorithm to reduce the query time of LSH to $\tilde O(n^ρ/p_1^{1-ρ})$ (and the space usage correspondingly.) Since $n^ρp_1^{ρ-1} < n \Leftrightarrow p_1 > n^{-1}$, our algorithm always obtains sublinear query time, for any collision probabilities at least $1/n$. For $p_1$ and $p_2$ small enough, our improvement over all previous methods can be \emph{up to a factor $n$} in both query time and space. The improvement comes from a simple change to the Indyk-Motwani algorithm, which can easily be implemented in existing software packages.

preprint2020arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.