Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6901

Optimize 1D dimensional value indexing

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 6.0
    • None
    • None
    • New

    Description

      Dimensional values give a smaller index, and faster search times, for indexing ordered byte[] values across one or more dimensions, vs our existing approaches, but the indexing time is substantially slower.

      Since the 1D case is so important/common (numeric fields, range query) I think it's worth optimizing its indexing time. It should also be possible to optimize the N > 1 dimensions case too, but it's more complex ... we can postpone that.

      So for the 1D case, I changed the merge method to do a merge sort (like postings) of the already sorted segments dimensional values, instead of simply re-indexing all values from the incoming segments, and this was a big speedup.

      I also changed from InPlaceMergeSorter to IntroSorter (this is what postings use, and it's faster but still safe) and this was another good speedup, which should also help the > 1D cases.

      Finally, I added a BKDReader.verify method (currently it's dark: NOT called) that walks the index and then check that every value in each leaf block does in fact fall within what the index expected/claimed. This is useful for finding bugs! Maybe we can cleanly fold it into CheckIndex somehow later.

      Attachments

        1. LUCENE-6901.patch
          56 kB
          Michael McCandless
        2. LUCENE-6901.patch
          55 kB
          Michael McCandless
        3. LUCENE-6901-timsort.patch
          58 kB
          Michael McCandless

        Activity

          People

            mikemccand Michael McCandless
            mikemccand Michael McCandless
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: