Lucene - Core / LUCENE-6668

Optimize SortedSet/SortedNumeric storage for the few unique sets use-case

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Robert suggested this idea: if there are few unique sets of values, we could build a lookup table and then map each doc to an ord in this table, as we already do with table compression for numerics.

      I think this is especially compelling given that SortedSet and SortedNumeric are our only two doc values types that use O(maxDoc) memory, because of the offsets map. When this new strategy is used, memory usage could be bounded by a constant.
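
      To make the idea concrete, here is a minimal sketch of the lookup-table strategy described above. It is not the code from the attached patch: the class name UniqueSetTable, the build and valuesFor methods, and the maxUniqueSets threshold are all hypothetical, and everything is kept in plain on-heap arrays for readability rather than in the packed on-disk structures a doc values format would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the "few unique sets" strategy; not Lucene API. */
final class UniqueSetTable {
    /** Distinct value sets, in order of first appearance. */
    final long[][] table;
    /** One table ordinal per document. */
    final int[] docToOrd;

    private UniqueSetTable(long[][] table, int[] docToOrd) {
        this.table = table;
        this.docToOrd = docToOrd;
    }

    /**
     * Builds the table, or returns null when more than maxUniqueSets distinct
     * sets are seen, in which case a caller would fall back to the regular
     * offsets-based encoding (assumed behavior for this sketch).
     */
    static UniqueSetTable build(long[][] valuesPerDoc, int maxUniqueSets) {
        Map<List<Long>, Integer> ords = new HashMap<>();
        List<long[]> table = new ArrayList<>();
        int[] docToOrd = new int[valuesPerDoc.length];
        for (int doc = 0; doc < valuesPerDoc.length; doc++) {
            long[] values = valuesPerDoc[doc];
            // Box the values so the set can be used as a hash map key.
            List<Long> key = new ArrayList<>(values.length);
            for (long v : values) {
                key.add(v);
            }
            Integer ord = ords.get(key);
            if (ord == null) {
                if (table.size() == maxUniqueSets) {
                    return null; // too many unique sets for a table
                }
                ord = table.size();
                ords.put(key, ord);
                table.add(values.clone());
            }
            docToOrd[doc] = ord;
        }
        return new UniqueSetTable(table.toArray(new long[0][]), docToOrd);
    }

    /** Looks up a document's values with no per-document offsets. */
    long[] valuesFor(int doc) {
        return table[docToOrd[doc]];
    }
}
```

      In this sketch the only structure whose size depends on the number of distinct sets is the table itself, which is capped by maxUniqueSets. The per-document ordinals need only a few bits each when the table is small, so they can be stored the same way as table-compressed numerics.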

      Attachments

        1. LUCENE-6668.patch
          32 kB
          Adrien Grand
        2. LUCENE-6668.patch
          24 kB
          Adrien Grand

          People

            Assignee: Adrien Grand (jpountz)
            Reporter: Adrien Grand (jpountz)