Description
Today Lucene uses postings to index a numeric value at multiple
precision levels for fast range searching. It's somewhat costly: each
numeric value is indexed with multiple terms (4 terms by default)
... I think a dedicated 1D BKD tree should be more compact and perform
better.
It should also easily generalize beyond 64 bits to arbitrary byte[],
e.g. for LUCENE-5596, but I haven't explored that here.
A 1D BKD tree just sorts all values, and then indexes adjacent leaf
blocks of size 512-1024 (by default) values per block, and their
docIDs, into a fully balanced binary tree. Building the range filter
is then just a recursive walk through this tree.
It's the same structure we use for 2D lat/lon BKD tree, just with 1D
instead. I implemented it as a DocValuesFormat that also writes the
numeric tree on the side.