  1. Lucene - Core
  2. LUCENE-10672

Re-evaluate different ways to encode postings

Details

    • Task
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • New

    Description

      In Lucene 4, we moved to FOR to encode postings because it would give better throughput than the VInts we had been using until then. This was a time when Lucene would often need to evaluate entire postings lists, and optimizations like BS1 were very important for good performance.

      Nowadays, Lucene performs more dynamic pruning, and it less often needs to evaluate every hit that matches a query. As a result, the performance of nextDoc() has become somewhat less relevant, while the performance of advance(target) has become more relevant.

      I wonder if we should re-evaluate other ways to encode postings that are theoretically better at skipping, such as Elias-Fano coding, since they support skipping directly on the encoded representation instead of requiring a full block of integers to be decoded when only a couple of them are relevant.
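
      As a rough illustration of what "skipping directly on the encoded representation" could look like, here is a toy Elias-Fano sketch in Java. It is not Lucene code: the class name EliasFanoSketch, its constructor, and the BitSet-backed layout are assumptions made for this example. A real implementation would presumably bit-pack into long[] and add a select index instead of scanning for zero bits.

```java
import java.util.BitSet;

/** Toy Elias-Fano encoding of a strictly increasing doc ID list (hypothetical, not Lucene code). */
final class EliasFanoSketch {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int size;                        // number of encoded doc IDs
    private final int lowBitCount;                 // low bits stored verbatim per doc ID
    private final BitSet lowBits = new BitSet();   // size * lowBitCount bits
    private final BitSet highBits = new BitSet();  // unary-coded high parts

    /** docs must be sorted ascending; maxDoc must be greater than every doc ID. */
    public EliasFanoSketch(int[] docs, int maxDoc) {
        size = docs.length;
        int ratio = size == 0 ? 1 : Math.max(1, maxDoc / size);
        lowBitCount = 31 - Integer.numberOfLeadingZeros(ratio); // floor(log2(maxDoc / size))
        int lowMask = (1 << lowBitCount) - 1;
        for (int i = 0; i < size; i++) {
            int low = docs[i] & lowMask;
            for (int b = 0; b < lowBitCount; b++) {
                if (((low >>> b) & 1) != 0) {
                    lowBits.set(i * lowBitCount + b);
                }
            }
            // The i-th set bit sits at position (high part of docs[i]) + i.
            highBits.set((docs[i] >>> lowBitCount) + i);
        }
    }

    /** Decodes the doc ID at index i, given the position of its set bit in highBits. */
    private int decode(int i, int highPos) {
        int low = 0;
        for (int b = 0; b < lowBitCount; b++) {
            if (lowBits.get(i * lowBitCount + b)) {
                low |= 1 << b;
            }
        }
        return ((highPos - i) << lowBitCount) | low;
    }

    /** Returns the first doc ID >= target, or NO_MORE_DOCS if there is none. */
    public int advance(int target) {
        int targetHigh = target >>> lowBitCount;
        // Skip targetHigh zero bits: every doc before that point has a smaller
        // high part, so it is skipped without ever being decoded.
        int pos = 0;
        for (int zeros = 0; zeros < targetHigh; zeros++) {
            pos = highBits.nextClearBit(pos) + 1;
            if (pos >= highBits.length()) {
                return NO_MORE_DOCS; // no set bit left, so no doc left
            }
        }
        // Exactly targetHigh zeros precede pos, so the rest are ones (docs).
        int index = pos - targetHigh;
        for (int p = highBits.nextSetBit(pos); p >= 0; p = highBits.nextSetBit(p + 1), index++) {
            int doc = decode(index, p);
            if (doc >= target) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }
}
```

      In this sketch, advance(target) only decodes the few candidates that share the target's high bits, whereas a FOR-style block would be decoded in full before the matching entry could be located.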

          People

            Assignee: Unassigned
            Reporter: Adrien Grand (jpountz)
            Votes: 0
            Watchers: 5
