Lucene - Core / LUCENE-7457

Default doc values format should optimize for iterator access


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      In LUCENE-7407 we switched doc values consumption from a random-access API to an iterator API, but nothing was done there to improve the codec. We should do that here.
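
      To make the difference between the two consumption styles concrete, here is a minimal, self-contained sketch. All names in it (ConsumptionSketch, ValuesIterator, sumRandomAccess, sumIterator, over) are hypothetical and written for illustration; they mirror the general shape of an iterator API with a NO_MORE_DOCS sentinel but are not Lucene's actual classes. The random-access variant assumes a lookup that returns 0 for documents without a value.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical sketch; these names are illustrative, not Lucene's actual API. */
final class ConsumptionSketch {
    /** Sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Iterator-style view: only documents that have a value are visited. */
    interface ValuesIterator {
        int nextDoc();    // next doc ID with a value, or NO_MORE_DOCS
        long longValue(); // value of the current document
    }

    /** Random-access style: probe every doc ID in [0, maxDoc). */
    static long sumRandomAccess(IntToLongFunction valueOf, int maxDoc) {
        long total = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            // Assumption: the lookup returns 0 when the doc has no value.
            total += valueOf.applyAsLong(doc);
        }
        return total;
    }

    /** Iterator style: work is proportional to the number of docs with a value. */
    static long sumIterator(ValuesIterator it) {
        long total = 0;
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            total += it.longValue();
        }
        return total;
    }

    /** Builds an iterator over parallel arrays; docs must be sorted ascending. */
    static ValuesIterator over(int[] docs, long[] values) {
        return new ValuesIterator() {
            private int index = -1;

            @Override
            public int nextDoc() {
                if (index < docs.length) {
                    index++;
                }
                return index < docs.length ? docs[index] : NO_MORE_DOCS;
            }

            @Override
            public long longValue() {
                return values[index];
            }
        };
    }
}
```

      With two values in a segment of six documents, sumRandomAccess performs six lookups while sumIterator performs two steps plus the terminating call; both return the same total.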

      At a bare minimum we should make the existing very-sparse case a true iterator, rather than one wrapped in the silly legacy wrappers.
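
      As a rough illustration of what a "true iterator" over a very sparse field could look like, the sketch below walks a sorted list of doc IDs directly and never probes documents that have no value. It is an assumption-laden toy, not the codec's on-disk format or the attached patch: the class name SparseNumericIteratorSketch and its in-memory arrays are hypothetical, and the binary search in advance is just one possible way to skip ahead.

```java
import java.util.Arrays;

/** Hypothetical in-memory sparse iterator; not the actual codec implementation. */
final class SparseNumericIteratorSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;    // sorted doc IDs that have a value
    private final long[] values; // values[i] belongs to docs[i]
    private int index = -1;      // position in docs; -1 means not yet positioned

    SparseNumericIteratorSketch(int[] docs, long[] values) {
        if (docs.length != values.length) {
            throw new IllegalArgumentException("docs and values must have the same length");
        }
        this.docs = docs;
        this.values = values;
    }

    /** Current doc ID: -1 before iteration, NO_MORE_DOCS once exhausted. */
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    /** Moves to the next document that has a value. */
    int nextDoc() {
        if (index < docs.length) {
            index++;
        }
        return docID();
    }

    /** Moves to the first document at or after target, searching only ahead of the current position. */
    int advance(int target) {
        int from = Math.min(index + 1, docs.length);
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        index = pos >= 0 ? pos : -pos - 1;
        return docID();
    }

    /** Value of the current document; only valid while positioned on a document. */
    long longValue() {
        return values[index];
    }
}
```

      The cost of a full pass here depends on the number of documents with a value, not on the number of documents in the segment.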

      I think we should also increase the threshold (currently 1%?) at which we switch from dense to sparse encoding. This should fix LUCENE-7253, making merging of sparse doc values efficient ("pay for what you use").
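
      The threshold idea can be sketched as a simple density check. Everything here is hypothetical: the class EncodingChoiceSketch, the Encoding enum and the sparseThreshold parameter are illustrative names, and the threshold values used in the usage note are examples only (the issue itself is unsure whether the current value is 1%).

```java
/** Hypothetical sketch of a density-based choice between encodings. */
final class EncodingChoiceSketch {
    enum Encoding { DENSE, SPARSE }

    /**
     * Picks SPARSE when the fraction of documents that have a value is
     * below sparseThreshold, and DENSE otherwise.
     */
    static Encoding choose(int docsWithValue, int maxDoc, double sparseThreshold) {
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive");
        }
        if (docsWithValue < 0 || docsWithValue > maxDoc) {
            throw new IllegalArgumentException("docsWithValue must be in [0, maxDoc]");
        }
        double density = (double) docsWithValue / maxDoc;
        return density < sparseThreshold ? Encoding.SPARSE : Encoding.DENSE;
    }
}
```

      For a field present in 50,000 of 1,000,000 documents (5% density), choose returns DENSE with a threshold of 0.01 and SPARSE with a threshold of 0.10, which is the effect raising the threshold is meant to have.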

      I'm sure there are many other things to explore to let codecs "take advantage" of the fact that they no longer need to offer random access to doc values.

      Attachments

        1. LUCENE-7457.patch
          17 kB
          Adrien Grand

            People

              Assignee: Adrien Grand (jpountz)
              Reporter: Michael McCandless (mikemccand)
              Votes: 0
              Watchers: 5
