  1. Lucene - Core
  2. LUCENE-7407

Explore switching doc values to an iterator API

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: None
    • Labels:
    • Lucene Fields: New

      Description

      I think it could be compelling if we restricted doc values to use an
      iterator API at read time, instead of the more general random access
      API we have today:

      • It would make doc values disk usage more of a "you pay for what
        you actually use" model, like postings, which is a compelling
        reduction for sparse usage.
      • I think codecs could compress better and maybe speed up decoding
        of doc values, even in the non-sparse case, since the read-time
        API is more restrictive "forward only" instead of random access.
      • We could remove getDocsWithField entirely, since that's
        implicit in the iteration, and the awkward "return 0 if the
        document didn't have this field" would go away.
      • We can remove the annoying thread locals we must create today in
        CodecReader, and close the trap of accidentally sharing a
        single XXXDocValues instance across threads, since an iterator is
        inherently "use once".
      • We could maybe leverage the numerous optimizations we've done for
        postings over time, since the two problems ("iterate over doc ids
        and store something interesting for each") are very similar.
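
      To make the contrast with random access concrete, here is a minimal,
      self-contained sketch of what a forward-only read API could look like.
      It is illustrative only and is not Lucene's actual API: the names
      NumericValuesIterator, SparseNumericValues, DocValuesIteratorSketch,
      and the NO_MORE_DOCS sentinel are all assumptions made for this
      example. The point is that only documents with a value are stored and
      visited, so no getDocsWithField-style lookup and no "0 means missing"
      convention is needed.

```java
/** Hypothetical forward-only view over per-document long values (not Lucene's API). */
interface NumericValuesIterator {
    /** Sentinel returned once the iterator is exhausted (assumed for this sketch). */
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Advances to the next document that has a value, or returns NO_MORE_DOCS. */
    int nextDoc();

    /** Returns the value of the document most recently returned by nextDoc(). */
    long longValue();
}

/** Sparse storage: only documents that actually have a value take up space. */
final class SparseNumericValues {
    private final int[] docIds;   // sorted ascending
    private final long[] values;  // values[i] belongs to docIds[i]

    SparseNumericValues(int[] docIds, long[] values) {
        if (docIds.length != values.length) {
            throw new IllegalArgumentException("docIds and values must have equal length");
        }
        this.docIds = docIds.clone();
        this.values = values.clone();
    }

    /** Each call returns a fresh, single-use iterator, so callers never share cursor state. */
    NumericValuesIterator iterator() {
        return new NumericValuesIterator() {
            private int index = -1;

            @Override
            public int nextDoc() {
                if (index + 1 >= docIds.length) {
                    index = docIds.length; // stay exhausted on repeated calls
                    return NO_MORE_DOCS;
                }
                index++;
                return docIds[index];
            }

            @Override
            public long longValue() {
                if (index < 0 || index >= docIds.length) {
                    throw new IllegalStateException("iterator is not positioned on a document");
                }
                return values[index];
            }
        };
    }
}

final class DocValuesIteratorSketch {
    /** Sums all values, touching only the documents that have the field. */
    static long sum(SparseNumericValues dv) {
        long total = 0;
        NumericValuesIterator it = dv.iterator();
        for (int doc = it.nextDoc(); doc != NumericValuesIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            total += it.longValue();
        }
        return total;
    }

    /** Counts documents with a value; presence is implicit in the iteration itself. */
    static int countDocsWithValue(SparseNumericValues dv) {
        int count = 0;
        NumericValuesIterator it = dv.iterator();
        while (it.nextDoc() != NumericValuesIterator.NO_MORE_DOCS) {
            count++;
        }
        return count;
    }
}
```

      In this sketch a document whose value really is 0 is still distinct
      from a document with no value at all, because the latter is simply
      never returned by nextDoc().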

      This idea has come up many times in the past, e.g. LUCENE-7253 is a
      recent example, and very early iterations of doc values started with
      exactly this.

      However, it's a truly enormous change, likely 7.0 only. Or maybe we
      could also port the new iterator APIs to 6.x, side by side with the
      existing random-access APIs, which would be deprecated.

        Attachments

        1. LUCENE-7407.patch
          1.20 MB
          Michael McCandless

              People

              • Assignee: Michael McCandless (mikemccand)
              • Reporter: Michael McCandless (mikemccand)
              • Votes: 1
              • Watchers: 12
