  1. Lucene - Core
  2. LUCENE-7081

Docvalues terms dict should sometimes prefix-compress fixed-length data.


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      For Sorted/SortedSet types, we encode ordinals and a term dictionary (similar to the old Lucene 3 term dictionary).

      Originally we had no prefix compression, so in the fixed-width case we "save space" by avoiding addressing: we can find a term's offset with a simple multiplication: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/lucene54/Lucene54DocValuesConsumer.java#L423-L425

      But that means no compression whatsoever of the actual bytes, even if the values are enormous. I don't think it's necessarily a good tradeoff. The cost of skipping prefix compression becomes much larger now that we have fixed-width 128-bit point types in the sandbox...
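
      To make the tradeoff concrete, here is a minimal sketch in Java. It is not the Lucene54 on-disk format: the class name, the method names, and the two-byte-per-term header are all assumptions chosen for illustration. It compares storing sorted fixed-width terms in full (where a term's offset is just ord * width) against a simple front-coding scheme that stores only the bytes that differ from the previous term.

      ```java
      /** Hypothetical sketch only; names and layout are NOT taken from Lucene. */
      final class FixedWidthTermsSketch {

          /** Fixed-width addressing: term number ord starts at ord * width, so no address table is needed. */
          static long fixedWidthOffset(long ord, int width) {
              return ord * width;
          }

          /** Number of leading bytes that a and b have in common. */
          static int sharedPrefixLength(byte[] a, byte[] b) {
              int limit = Math.min(a.length, b.length);
              int i = 0;
              while (i < limit && a[i] == b[i]) {
                  i++;
              }
              return i;
          }

          /** Bytes used when every term is written in full. */
          static long rawSize(byte[][] sortedTerms) {
              long total = 0;
              for (byte[] term : sortedTerms) {
                  total += term.length;
              }
              return total;
          }

          /**
           * Bytes used by a simple front-coding scheme (an assumption of this sketch):
           * each term costs 1 byte for the shared-prefix length, 1 byte for the suffix
           * length, plus the suffix bytes. Assumes terms are shorter than 256 bytes.
           */
          static long prefixCompressedSize(byte[][] sortedTerms) {
              long total = 0;
              byte[] previous = new byte[0];
              for (byte[] term : sortedTerms) {
                  int shared = sharedPrefixLength(previous, term);
                  total += 2 + (term.length - shared);
                  previous = term;
              }
              return total;
          }

          public static void main(String[] args) {
              // Four sorted 128-bit (16-byte) values that differ only in the last byte.
              byte[][] terms = new byte[4][16];
              for (int i = 0; i < terms.length; i++) {
                  terms[i][15] = (byte) i;
              }
              System.out.println("offset of ord 3: " + fixedWidthOffset(3, 16));
              System.out.println("raw bytes:       " + rawSize(terms));
              System.out.println("front-coded:     " + prefixCompressedSize(terms));
          }
      }
      ```

      The sketch only measures size. A front-coded dictionary can no longer locate a term by multiplication, so a real format would need some form of addressing to seek by ordinal, which is exactly the cost the fixed-width path avoids.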

        Attachments

        1. LUCENE-7081.patch
          2 kB
          Robert Muir

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              rcmuir Robert Muir
            • Votes:
              0
              Watchers:
              5
