Lucene - Core / LUCENE-7457

Default doc values format should optimize for iterator access

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      In LUCENE-7407 we switched doc values consumption from a random-access API to an iterator API, but nothing was done there to improve the codec. We should do that here.
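
      To make the random-access versus iterator distinction concrete, here is a minimal, self-contained Java sketch. The interface and class names are hypothetical. The iterator contract is modeled loosely on Lucene's DocIdSetIterator (docID(), nextDoc(), advance(target), and the NO_MORE_DOCS sentinel), but this is not Lucene's actual API. The adapter shows a dense random-access source being consumed strictly in document order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical random-access view: any docID may be looked up, in any order.
interface RandomAccessLongValues {
    long get(int docID);
}

// Hypothetical forward-only view, modeled loosely on Lucene's DocIdSetIterator:
// callers only move forward, so a codec may decode its data sequentially.
interface LongValuesIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    int docID();             // -1 before the first call, NO_MORE_DOCS when exhausted
    int nextDoc();           // next document that has a value
    int advance(int target); // first document >= target that has a value
    long longValue();        // value of the current document
}

// Adapter exposing a dense random-access source (every doc in [0, maxDoc)
// has a value) through the iterator interface.
final class DenseLongValuesIterator implements LongValuesIterator {
    private final RandomAccessLongValues values;
    private final int maxDoc;
    private int doc = -1;

    DenseLongValuesIterator(RandomAccessLongValues values, int maxDoc) {
        this.values = values;
        this.maxDoc = maxDoc;
    }

    @Override public int docID() { return doc; }

    @Override public int nextDoc() {
        // Guard against doc + 1 overflowing once the iterator is exhausted.
        return doc == NO_MORE_DOCS ? NO_MORE_DOCS : advance(doc + 1);
    }

    @Override public int advance(int target) {
        doc = target < maxDoc ? target : NO_MORE_DOCS;
        return doc;
    }

    @Override public long longValue() { return values.get(doc); }
}

final class IteratorDemo {
    // A consumer walks documents in order instead of calling get(docID) at random.
    static List<Long> collect(LongValuesIterator it) {
        List<Long> out = new ArrayList<>();
        for (int d = it.nextDoc(); d != LongValuesIterator.NO_MORE_DOCS; d = it.nextDoc()) {
            out.add(it.longValue());
        }
        return out;
    }
}
```
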

      At a bare minimum, we should fix the existing very-sparse case so that it is a true iterator rather than being wrapped in the silly legacy wrappers.
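
      One way to read "a true iterator" for the very-sparse case is sketched below. It assumes, hypothetically, that the codec stores the sorted docIDs that have a value alongside their values, and it walks those arrays directly. The class and its layout are illustrative only and are not taken from Lucene's source.

```java
import java.util.Arrays;

// Hypothetical sparse layout: only documents that have a value are stored.
// docIDs must be strictly increasing; values[i] belongs to docIDs[i].
final class SparseLongValuesIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIDs;
    private final long[] values;
    private int index = -1; // position in docIDs; -1 before the first call

    SparseLongValuesIterator(int[] docIDs, long[] values) {
        if (docIDs.length != values.length) {
            throw new IllegalArgumentException("docIDs and values must have the same length");
        }
        this.docIDs = docIDs;
        this.values = values;
    }

    int docID() {
        if (index < 0) return -1;
        return index < docIDs.length ? docIDs[index] : NO_MORE_DOCS;
    }

    // Steps to the next stored document in O(1); documents without a
    // value are never visited.
    int nextDoc() {
        if (index < docIDs.length) index++;
        return docID();
    }

    // Moves to the first stored document >= target, binary-searching
    // only the entries that have not been consumed yet.
    int advance(int target) {
        int from = index + 1;
        if (from >= docIDs.length) {
            index = docIDs.length;
            return NO_MORE_DOCS;
        }
        int pos = Arrays.binarySearch(docIDs, from, docIDs.length, target);
        index = pos >= 0 ? pos : -pos - 1; // insertion point when not found
        return docID();
    }

    long longValue() { return values[index]; }

    // Number of documents this iterator can visit.
    long cost() { return docIDs.length; }
}
```

      Because nextDoc() only steps through stored entries, a full pass costs time proportional to cost(), the number of documents that have a value. A wrapper that instead checks every document in [0, maxDoc) for a value does work proportional to maxDoc, even when almost no document has one.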

      I think we should also increase the threshold (currently 1%?) at which we switch from dense to sparse encoding. This should fix LUCENE-7253, making merging of sparse doc values efficient ("pay for what you use").
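
      The dense-versus-sparse decision can be sketched as a simple density check. The 1% figure is the one quoted above (with its question mark). The method names, the 10% alternative threshold, and the simplified cost model (a dense layout is walked over every document, a sparse one only over documents that have a value) are hypothetical and for illustration only.

```java
// Hypothetical encoding choice for one doc-values field in a segment.
enum Encoding { DENSE, SPARSE }

final class EncodingChooser {
    // Sparse encoding when the fraction of documents with a value is below
    // `threshold` (e.g. 0.01 for the ~1% mentioned in the issue).
    static Encoding choose(long docsWithValue, long maxDoc, double threshold) {
        if (maxDoc <= 0) throw new IllegalArgumentException("maxDoc must be positive");
        double density = (double) docsWithValue / maxDoc;
        return density < threshold ? Encoding.SPARSE : Encoding.DENSE;
    }

    // Simplified model of per-document steps a merge spends walking the field:
    // every document for DENSE, only documents with a value for SPARSE.
    static long iterationSteps(Encoding e, long docsWithValue, long maxDoc) {
        return e == Encoding.DENSE ? maxDoc : docsWithValue;
    }
}
```

      For example, with maxDoc = 1,000,000 and 50,000 documents having a value (5% density), a 1% threshold picks DENSE, and under this model a merge walks all 1,000,000 documents; a hypothetical 10% threshold picks SPARSE and walks only 50,000.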

      I'm sure there are many other things to explore to let codecs "take advantage" of the fact that they no longer need to offer random access to doc values.

              People

              • Assignee: Adrien Grand (jpountz)
              • Reporter: Michael McCandless (mikemccand)
              • Votes: 0
              • Watchers: 5
