Lucene - Core / LUCENE-7081

Docvalues terms dict should sometimes prefix-compress fixed-length data.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      For Sorted/SortedSet types, we encode ordinals and a term dictionary (similar to the old Lucene 3 term dictionary).

      Originally we had no prefix compression, so in the fixed-width case we "save space" by avoiding addressing altogether. Since every term has the same length, we can find a term's offset with a single multiplication: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/lucene54/Lucene54DocValuesConsumer.java#L423-L425
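The multiplication trick can be sketched roughly as follows. This is only an illustration and not the code in Lucene54DocValuesConsumer; the class and method names (`FixedWidthTerms`, `offsetOf`, `termAt`) are hypothetical. It assumes all terms are stored back to back in sorted order with the same length, so the term for ordinal `ord` starts at `ord * width` and no per-term address table is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: fixed-width terms stored back to back, addressed by multiplication. */
public class FixedWidthTerms {
    private final byte[] data; // all terms concatenated in sorted order
    private final int width;   // length in bytes of every term

    public FixedWidthTerms(byte[] data, int width) {
        if (width <= 0 || data.length % width != 0) {
            throw new IllegalArgumentException("data length must be a multiple of width");
        }
        this.data = data;
        this.width = width;
    }

    /** Start offset of the term with the given ordinal; no address table is consulted. */
    public long offsetOf(long ord) {
        return ord * width;
    }

    /** Copies out the uncompressed bytes of the term with the given ordinal. */
    public byte[] termAt(int ord) {
        int start = Math.toIntExact(offsetOf(ord));
        return Arrays.copyOfRange(data, start, start + width);
    }

    public static void main(String[] args) {
        // Three 4-byte terms: "aaab", "aaac", "aaba".
        byte[] data = "aaabaaacaaba".getBytes(StandardCharsets.US_ASCII);
        FixedWidthTerms terms = new FixedWidthTerms(data, 4);
        System.out.println(new String(terms.termAt(1), StandardCharsets.US_ASCII));
    }
}
```

Lookups are cheap in this layout, but every term is stored in full, including the bytes it shares with its neighbors.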

      But it means no compression whatsoever of the actual bytes, even if the values are enormous. I don't think it's necessarily a good tradeoff. The lack of prefix compression can become much more magnified now that we have fixed-width 128-bit point types in the sandbox...
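To make the tradeoff concrete, here is a rough size comparison for sorted fixed-length values. It is a sketch under stated assumptions and not the encoding used by the attached patch or by Lucene's terms dictionary: the class `PrefixCompressionEstimate` and its methods are hypothetical, and the prefix-compressed form is assumed to cost one byte for the shared-prefix length, one byte for the suffix length, and then the suffix bytes.

```java
/** Hypothetical sketch: raw size versus a simple prefix-compressed size for sorted terms. */
public class PrefixCompressionEstimate {

    /** Number of leading bytes shared by a and b. */
    public static int sharedPrefix(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Bytes needed if every term is written in full. */
    public static long rawSize(byte[][] sortedTerms) {
        long total = 0;
        for (byte[] term : sortedTerms) {
            total += term.length;
        }
        return total;
    }

    /**
     * Bytes needed if each term stores a one-byte shared-prefix length,
     * a one-byte suffix length, and only the suffix bytes.
     * Assumes terms shorter than 256 bytes so each length fits in one byte.
     */
    public static long prefixCompressedSize(byte[][] sortedTerms) {
        long total = 0;
        byte[] previous = new byte[0];
        for (byte[] term : sortedTerms) {
            int shared = sharedPrefix(previous, term);
            total += 2 + (term.length - shared);
            previous = term;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three 16-byte (128-bit) values that differ only in the last byte.
        byte[][] terms = new byte[3][16];
        for (int i = 0; i < terms.length; i++) {
            terms[i][15] = (byte) i;
        }
        System.out.println("raw: " + rawSize(terms));
        System.out.println("prefix-compressed: " + prefixCompressedSize(terms));
    }
}
```

For these three values the raw layout takes 48 bytes and the assumed prefix-compressed layout takes 24. Values with little shared prefix would instead pay the two extra length bytes per term, which is presumably why the title says "sometimes".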

      Attachments

        1. LUCENE-7081.patch
          2 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
            Votes: 0
            Watchers: 5
