Lucene - Core / LUCENE-9850

Explore PFOR for Doc ID delta encoding (instead of FOR)

Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 9.0
    • Fix Version/s: 9.0
    • Component/s: core/codecs
    • Labels: None
    • Lucene Fields: New

    Description

      It'd be interesting to explore using PFOR instead of FOR for doc ID encoding. Right now PFOR is used for positions, frequencies and payloads, but FOR is used for doc ID deltas. From a recent conversation on the dev mailing list, it sounds like this decision was made based on the optimization possible when expanding the deltas.

      I'd be interested in measuring the index size reduction possible with switching to PFOR, compared to the performance reduction we might see from no longer being able to apply the deltas in as optimal a way.
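
      To make the trade-off concrete, here is a minimal sketch (not Lucene's actual implementation) that compares the packed size of one block of doc ID deltas under FOR and under a simplified PFOR cost model, and shows the prefix sum that turns deltas back into doc IDs. The class name PforSketch, the method names, the exception budget and the per-exception cost in bits are all assumptions made for illustration; the real encoder's limits and layout may differ.

```java
import java.util.Arrays;

// Illustrative sketch only; names and the cost model are hypothetical.
final class PforSketch {
  /** Bits needed to represent a non-negative value (0 for the value 0). */
  static int bitsRequired(int value) {
    return 32 - Integer.numberOfLeadingZeros(value);
  }

  /** FOR: every delta in the block is packed at the width of the largest delta. */
  static long forSizeBits(int[] deltas) {
    int max = 0;
    for (int d : deltas) {
      max = Math.max(max, d);
    }
    return (long) deltas.length * bitsRequired(max);
  }

  /**
   * PFOR, simplified: the e largest deltas are treated as exceptions, so the
   * block is packed at the width of the largest remaining delta, and each
   * exception costs exceptionCostBits extra (an assumed, fixed cost).
   * Returns the smallest size over e = 0..maxExceptions.
   */
  static long pforSizeBits(int[] deltas, int maxExceptions, int exceptionCostBits) {
    int[] sorted = deltas.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    long best = forSizeBits(deltas); // e = 0 is plain FOR
    for (int e = 1; e <= maxExceptions && e < n; e++) {
      int width = bitsRequired(sorted[n - 1 - e]);
      best = Math.min(best, (long) n * width + (long) e * exceptionCostBits);
    }
    return best;
  }

  /** Expands deltas into absolute doc IDs with a running (prefix) sum. */
  static int[] toDocIds(int base, int[] deltas) {
    int[] docIds = new int[deltas.length];
    int doc = base;
    for (int i = 0; i < deltas.length; i++) {
      doc += deltas[i];
      docIds[i] = doc;
    }
    return docIds;
  }

  public static void main(String[] args) {
    int[] deltas = {3, 1, 2, 4, 1, 1000, 2, 3};
    System.out.println("FOR bits:  " + forSizeBits(deltas));
    System.out.println("PFOR bits: " + pforSizeBits(deltas, 3, 16));
    System.out.println(Arrays.toString(toDocIds(100, deltas)));
  }
}
```

      For this block, a single large delta (1000) forces FOR to 10 bits per value, or 80 bits in total, while the PFOR model with one exception packs at 3 bits per value plus the assumed 16-bit exception cost, or 40 bits. The catch is on the read side: with PFOR the exceptions have to be patched back into the block before the prefix sum in toDocIds can run, which is the cost this issue proposes to measure.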

      Attachments

        1. for.png
          207 kB
          Greg Miller
        2. pfor.png
          250 kB
          Greg Miller
        3. apply_exceptions.png
          146 kB
          Greg Miller
        4. bulk_read_1.png
          234 kB
          Greg Miller
        5. bulk_read_2.png
          154 kB
          Greg Miller

            People

              Assignee: Unassigned
              Reporter: Greg Miller (gsmiller)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 6h 40m