Abstract
We consider the following problem, which arises in many database and web-based applications: Given a set $P$ of $n$ points in a high-dimensional space $\mathbb{R}^d$ and a distance $r$, we want to report all pairs of points of $P$ at Euclidean distance at most $r$. We present two randomized algorithms, one based on randomly shifted grids, and the other on randomly shifted and rotated grids. The running time of both algorithms is of the form $C(d)(n + k) \log n$, where $k$ is the output size and $C(d)$ is a constant that depends on the dimension $d$. The $\log n$ factor is needed to guarantee, with high probability, that all neighbor pairs are reported, and can be dropped if it suffices to report, in expectation, an arbitrarily large fraction of the pairs. When only translations are used, $C(d)$ is of the form $(a\sqrt{d})^d$, for some (small) absolute constant $a \approx 0.484$; this bound is worst-case tight, up to an exponential factor of about $2^d$. When both rotations and translations are used, $C(d)$ can be improved to roughly $6.74^d$, getting rid of the superexponential factor $\sqrt{d}^{\,d}$. When the input set (lies in a subset of $d$-space that) has low doubling dimension $\delta$, the performance of the first algorithm improves to $C(d, \delta)(n + k) \log n$ (or to $C(d, \delta)(n + k)$), where $C(d, \delta) = O((ed/\delta)^{\delta})$ for $\delta \le \sqrt{d}$. Otherwise, $C(d, \delta) = O(e^{\sqrt{d}} \sqrt{d}^{\,\delta})$. We also present experimental results on several large data sets, demonstrating that our algorithms run significantly faster than all the leading existing algorithms for reporting neighbors.

© 2014 Society for Industrial and Applied Mathematics.

Key words. computational geometry, nearest neighbors, near-neighbor searching, high-dimensional spaces, locality sensitive hashing, random grids

[...] author has also been supported by grant 822/10 from the Israel Science Fund and by grant 2006/204 from the U.S.-Israel Binational Science Foundation.
Work by Micha Sharir has also been supported by grant 338/09 from the Israel Science Fund, and by the Hermann Minkowski-MINERVA Center for Geometry at Tel Aviv University. The second and third authors have been supported by the Israeli Centers of Research Excellence (I-CORE) program (Center 4/11).
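
To make the shifted-grid idea from the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `report_neighbors`, the cell side `cell_scale * r`, and the number of rounds are assumptions chosen for readability, not the tuned values behind the bounds stated above. Each round draws a fresh random translation of the grid, buckets the points by cell, and checks the exact distance only for pairs that share a cell. A reported pair is therefore always a true neighbor pair, while a true pair may be missed in any single round; repeating with independent shifts raises the chance of catching it, which loosely mirrors the role of the $\log n$ factor.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def report_neighbors(points, r, cell_scale=2.0, rounds=20, seed=0):
    """Return a set of index pairs (i, j), i < j, at Euclidean distance <= r.

    Illustrative sketch: cell_scale and rounds are assumed parameters,
    not the values analyzed in the paper. Pairs may be missed with
    small probability; no false pairs are ever reported.
    """
    rng = random.Random(seed)
    d = len(points[0])
    side = cell_scale * r          # grid cell side length (assumption)
    r2 = r * r
    found = set()

    for _ in range(rounds):
        # One random translation of the grid, uniform in [0, side)^d.
        shift = [rng.uniform(0.0, side) for _ in range(d)]

        # Bucket point indices by the grid cell that contains them.
        cells = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(math.floor((x + s) / side) for x, s in zip(p, shift))
            cells[key].append(idx)

        # Only pairs sharing a cell are candidates; verify them exactly.
        for bucket in cells.values():
            for i, j in combinations(bucket, 2):
                if (i, j) in found:
                    continue
                dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                if dist2 <= r2:
                    found.add((i, j))
    return found


pts = [(0.0, 0.0), (0.3, 0.4), (5.0, 5.0)]
print(sorted(report_neighbors(pts, r=1.0)))
```

In this sketch the work per round is proportional to the number of points plus the number of same-cell pairs, so the cell side trades off missed pairs (cells too small) against wasted distance checks (cells too large). The rotated-grid variant mentioned in the abstract is not shown here.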
Original language | English |
---|---|
Pages (from-to) | 1363-1395 |
Number of pages | 33 |
Journal | SIAM Journal on Computing |
Volume | 43 |
Issue number | 4 |
State | Published - 1 Jan 2014 |
Externally published | Yes |
ASJC Scopus subject areas
- General Computer Science
- General Mathematics