  1. Lucene - Core
  2. LUCENE-6030

Add norms patched compression which uses table for most common values

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      We added the PATCHED norms sub-format in Lucene 5.0, which uses a bitset to mark documents that have the most common value (when >97% of the documents have that value). This works well for fields with one predominant value length plus a small number of docs with other, effectively random values. But another common case is a handful of very common value lengths, as with a title field.

      We can use a table (see TABLE_COMPRESSION) to store the most common values, and save an ordinal for the "other" case, at which point we can look the value up in the secondary patch table.
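
The idea can be sketched roughly as follows. This is an illustration only, not the code from the attached patch and not Lucene's actual API: the class name PatchedTableNorms, the encode and get methods, the maxTableSize parameter, and the use of an in-memory TreeMap as the patch table are all assumptions made for the example. Each document stores a small ordinal into a table of the most common values, and one extra ordinal means "other", in which case the real value comes from the patch table keyed by docID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of table compression with a patch table; not Lucene code. */
final class PatchedTableNorms {
    final long[] table;                 // most common values, indexed by ordinal
    final int[] ordinals;               // per-document ordinal; table.length means "other"
    final TreeMap<Integer, Long> patch; // docID -> value for the "other" documents

    PatchedTableNorms(long[] table, int[] ordinals, TreeMap<Integer, Long> patch) {
        this.table = table;
        this.ordinals = ordinals;
        this.patch = patch;
    }

    /** Builds the table from the maxTableSize most frequent values; the rest go to the patch table. */
    static PatchedTableNorms encode(long[] norms, int maxTableSize) {
        // Count how often each value occurs.
        Map<Long, Integer> freq = new HashMap<>();
        for (long v : norms) {
            freq.merge(v, 1, Integer::sum);
        }

        // Sort by descending frequency, breaking ties by value so the result is deterministic.
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : Long.compare(a.getKey(), b.getKey());
        });

        // Assign ordinals 0..size-1 to the most common values.
        int size = Math.min(maxTableSize, entries.size());
        long[] table = new long[size];
        Map<Long, Integer> ordinalOf = new HashMap<>();
        for (int i = 0; i < size; i++) {
            table[i] = entries.get(i).getKey();
            ordinalOf.put(table[i], i);
        }

        // Every other value gets the reserved ordinal "size" and an entry in the patch table.
        int[] ordinals = new int[norms.length];
        TreeMap<Integer, Long> patch = new TreeMap<>();
        for (int doc = 0; doc < norms.length; doc++) {
            Integer ord = ordinalOf.get(norms[doc]);
            if (ord != null) {
                ordinals[doc] = ord;
            } else {
                ordinals[doc] = size;
                patch.put(doc, norms[doc]);
            }
        }
        return new PatchedTableNorms(table, ordinals, patch);
    }

    /** Returns the value for a document: a table hit, or a patch-table lookup for "other". */
    long get(int doc) {
        int ord = ordinals[doc];
        return ord < table.length ? table[ord] : patch.get(doc);
    }
}
```

With three table entries, each document needs only two bits for its ordinal (three common values plus "other"), and only the rare exceptions pay for a full value in the patch table. A real on-disk format would pack the ordinals and store the patch table compactly; the sketch keeps both as plain Java collections for readability.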

      Attachments

        1. LUCENE-6030.patch
          33 kB
          Ryan Ernst

          People

            Assignee: Ryan Ernst (rjernst)
            Reporter: Ryan Ernst (rjernst)
            Votes: 0
            Watchers: 5
