  1. Lucene - Core
  2. LUCENE-6030

Add norms patched compression which uses table for most common values

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We have added the PATCHED norms sub-format in Lucene 5.0, which uses a bitset to mark the documents that have the most common value (it applies when more than 97% of the documents have that value). This works well for fields with one predominant value length plus a small number of docs with arbitrary other values. But another common case is a field with a handful of very common value lengths, such as a title field.
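To make the bitset idea concrete, here is a minimal in-memory sketch. It is not the Lucene codec code: the class name `PatchedBitsetNorms`, its methods, and the use of a `HashMap` for the exceptions are assumptions made for illustration, and the 97% threshold check is left out.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of bitset-patched norms; not Lucene's on-disk format. */
final class PatchedBitsetNorms {
    private final long commonValue;            // the single most frequent norm
    private final BitSet hasCommon;            // bit set => doc has commonValue
    private final Map<Integer, Long> patches;  // docID -> exceptional value

    PatchedBitsetNorms(long[] norms) {
        // Count occurrences of each value to find the most common one.
        Map<Long, Integer> counts = new HashMap<>();
        for (long v : norms) {
            counts.merge(v, 1, Integer::sum);
        }
        long best = 0;
        int bestCount = -1;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        commonValue = best;

        // Mark docs holding the common value; store everything else as a patch.
        hasCommon = new BitSet(norms.length);
        patches = new HashMap<>();
        for (int doc = 0; doc < norms.length; doc++) {
            if (norms[doc] == commonValue) {
                hasCommon.set(doc);
            } else {
                patches.put(doc, norms[doc]);
            }
        }
    }

    /** Returns the norm for a doc: one bit test, then a patch lookup if needed. */
    long get(int doc) {
        return hasCommon.get(doc) ? commonValue : patches.get(doc);
    }

    int patchCount() {
        return patches.size();
    }
}
```

With this layout the cost is roughly one bit per document plus the exceptions, which is why it only pays off when a single value dominates.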

      We can use a table (see TABLE_COMPRESSION) to store the most common values, and save an ordinal for the "other" case, at which point we can look up the value in the secondary patch table.
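A rough sketch of the proposed table-plus-patch scheme follows. Everything in it is an assumption for illustration (the class name `PatchedTableNorms`, the `tableSize` parameter, one byte per ordinal, and a `HashMap` as the patch table); a real implementation would presumably bit-pack the ordinals and write the patch table to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of table compression with a patch table for "other" values. */
final class PatchedTableNorms {
    private final long[] table;                // the most common values
    private final byte[] ordinals;             // per doc: index into table, or table.length for "other"
    private final Map<Integer, Long> patches;  // docID -> value for docs with the "other" ordinal

    PatchedTableNorms(long[] norms, int tableSize) {
        if (tableSize < 1 || tableSize > 127) {
            throw new IllegalArgumentException("tableSize must be in [1, 127]");
        }
        // Rank distinct values by frequency, most frequent first.
        Map<Long, Integer> counts = new HashMap<>();
        for (long v : norms) {
            counts.merge(v, 1, Integer::sum);
        }
        List<Map.Entry<Long, Integer>> byFreq = new ArrayList<>(counts.entrySet());
        byFreq.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        // Keep the top values in the table and remember each one's ordinal.
        int n = Math.min(tableSize, byFreq.size());
        table = new long[n];
        Map<Long, Integer> ordOf = new HashMap<>();
        for (int i = 0; i < n; i++) {
            table[i] = byFreq.get(i).getKey();
            ordOf.put(table[i], i);
        }

        // Encode each doc as a table ordinal, or as "other" plus a patch entry.
        ordinals = new byte[norms.length];
        patches = new HashMap<>();
        for (int doc = 0; doc < norms.length; doc++) {
            Integer ord = ordOf.get(norms[doc]);
            if (ord != null) {
                ordinals[doc] = ord.byteValue();
            } else {
                ordinals[doc] = (byte) n;      // reserved "other" ordinal
                patches.put(doc, norms[doc]);
            }
        }
    }

    /** Table lookup for common values; secondary patch lookup for the rest. */
    long get(int doc) {
        int ord = ordinals[doc];
        return ord < table.length ? table[ord] : patches.get(doc);
    }

    int patchCount() {
        return patches.size();
    }
}
```

For a title-like field where three lengths cover almost every document, `new PatchedTableNorms(norms, 3)` would need only four distinct ordinals per document (two bits if packed), with the rare remaining lengths stored in the patch table.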

            People

            • Assignee:
              rjernst Ryan Ernst
              Reporter:
              rjernst Ryan Ernst
