-
Type:
Improvement
-
Status: Resolved
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: 6.0
-
Component/s: None
-
Labels:None
-
Lucene Fields:New
Dimensional values give a smaller index, and faster search times, for indexing ordered byte[] values across one or more dimensions, vs our existing approaches, but the indexing time is substantially slower.
Since the 1D case is so important/common (numeric fields, range query) I think it's worth optimizing its indexing time. It should also be possible to optimize the N > 1 dimensions case too, but it's more complex ... we can postpone that.
So for the 1D case, I changed the merge method to do a merge sort (like postings) of the already sorted segments dimensional values, instead of simply re-indexing all values from the incoming segments, and this was a big speedup.
I also changed from InPlaceMergeSorter to IntroSorter (this is what postings use, and it's faster but still safe) and this was another good speedup, which should also help the > 1D cases.
Finally, I added a BKDReader.verify method (currently it's dark: NOT called) that walks the index and then check that every value in each leaf block does in fact fall within what the index expected/claimed. This is useful for finding bugs! Maybe we can cleanly fold it into CheckIndex somehow later.