I re-ran the "lat/lon points in rects around London, UK" perf test from luceneutil (IndexOSM*.java and SearchOSM*.java sources).
This test indexes 60.8 M lat/lon points derived from OpenStreetMap data and then runs varying, regularly spaced rectangle queries (225 in all) around London, UK.
I used SerialMergeScheduler (SMS) and LogDocMergePolicy (LogDocsMP) to get to a 5/5/5 segment structure for all three tests, so only a single thread is used throughout, which keeps the indexing times comparable:
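To make the query set concrete, here is a minimal sketch of one way to produce 225 regularly spaced rectangles. The bounding box, the 5-step grid, and the name `grid_rects` are all assumptions chosen so the count comes out to 225; the actual luceneutil sources may build the rectangles differently.

```python
from itertools import combinations


def grid_rects(min_lat, max_lat, min_lon, max_lon, steps):
    """Return every rectangle whose edges lie on a regular grid.

    The grid has steps + 1 evenly spaced lines per axis; each rectangle
    is (lat_lo, lat_hi, lon_lo, lon_hi) with lo < hi on both axes.
    """
    lats = [min_lat + i * (max_lat - min_lat) / steps for i in range(steps + 1)]
    lons = [min_lon + i * (max_lon - min_lon) / steps for i in range(steps + 1)]
    # combinations() keeps input order, so the first element is always the lower edge.
    return [
        (lat_lo, lat_hi, lon_lo, lon_hi)
        for lat_lo, lat_hi in combinations(lats, 2)
        for lon_lo, lon_hi in combinations(lons, 2)
    ]


# Hypothetical bounding box loosely around London, not taken from the benchmark.
rects = grid_rects(51.0, 52.0, -0.5, 0.5, steps=5)
```

With 6 grid lines per axis there are 15 (low, high) pairs per axis, so this scheme gives 15 * 15 = 225 rectangles ranging from a single grid cell to the whole box.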
Spatial module, using RecursivePrefixTreeStrategy with PackedQuadPrefixTree at 25 levels:
- 1,464 sec to index
- 7.8 GB index on disk
- 239 MB in-heap (ramBytesUsed summed across all segments)
- 3.98 sec to run 225 searches (best of 100 iters)
Geo point field:
- 497 sec to index
- 3.2 GB index on disk
- 86 MB heap (ramBytesUsed summed across all segments)
- 4.48 sec to run 225 searches (best of 100 iters)
Dimensional values (this patch), using the default codec's dimensional format:
- 744 sec to index
- 704 MB index on disk
- 2.3 MB heap (ramBytesUsed summed across all segments)
- 2.85 sec to run 225 searches (best of 100 iters)
The spatial module is purely postings, geo point field is postings + doc values, and dimensional values is the new BKD tree.
Net/net, indexing time for the dimensional values approach falls between geo point field and spatial, but the resulting index and the heap required at search time are both much smaller, and searching is faster.
The search time for dimensional values is a bit slower than the specialized (to lat/lon) doc-values based BKD from LUCENE-6477 / LUCENE-6645 (2.32 sec to run 225 searches), but I think we can optimize things later.
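As a quick check on that summary, here is a small sketch that turns the reported figures into ratios against dimensional values. It takes the second block of figures as the geo point field run and assumes 1 GB = 1000 MB when converting the 704 MB index size; the dictionary keys and the name `ratios_vs` are made up for this illustration.

```python
# Figures copied from the results above.
# Disk sizes are in GB, assuming 1 GB = 1000 MB for the 704 MB entry.
RESULTS = {
    "spatial": {"index_sec": 1464, "disk_gb": 7.8, "heap_mb": 239, "search_sec": 3.98},
    "geo_point": {"index_sec": 497, "disk_gb": 3.2, "heap_mb": 86, "search_sec": 4.48},
    "dimensional": {"index_sec": 744, "disk_gb": 0.704, "heap_mb": 2.3, "search_sec": 2.85},
}


def ratios_vs(baseline, results=RESULTS):
    """Return each other approach's metrics divided by the baseline's."""
    base = results[baseline]
    return {
        name: {metric: round(value / base[metric], 2) for metric, value in metrics.items()}
        for name, metrics in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, ratios in ratios_vs("dimensional").items():
        print(name, ratios)
```

A ratio above 1 means the other approach costs more than dimensional values on that metric. Only geo point field's indexing time comes out below 1. The spatial module's index is roughly 11x larger on disk and its heap usage roughly 100x larger.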
I haven't tested the 1D case, and I suspect there are important specializations we can make there, but I'll save that for a follow-on.