  1. Lucene - Core
  2. LUCENE-9211

Adding compression to BinaryDocValues storage

    Details

    • Lucene Fields:
      New

      Description

      While SortedSetDocValues can be used today to store identical values in a compact form, it is not effective for data with many unique values.

      The proposal is to store BinaryDocValues in LZ4-compressed blocks, which can dramatically reduce disk storage costs in many cases. Each block holds the values for a number of documents as a single compressed blob, along with metadata recording the offsets at which each document's original value can be found in the uncompressed content.
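
      The block layout described above can be sketched roughly as follows. This is not the code from the PR: the class and method names are hypothetical, and java.util.zip.Deflater stands in for LZ4 so that the example runs on the JDK alone. It only illustrates the idea of one compressed blob per block plus an offsets table used to slice out a single document's value after decompression.

      ```java
      import java.io.ByteArrayOutputStream;
      import java.util.Arrays;
      import java.util.zip.DataFormatException;
      import java.util.zip.Deflater;
      import java.util.zip.Inflater;

      // Hypothetical sketch; Deflater is a stand-in for LZ4 (assumption, not the PR's codec).
      final class CompressedBlockSketch {

          /** One block: the compressed concatenation of all values plus offset metadata. */
          static final class Block {
              final byte[] compressed;
              // offsets[i] .. offsets[i + 1] delimit value i in the UNCOMPRESSED bytes.
              final int[] offsets;

              Block(byte[] compressed, int[] offsets) {
                  this.compressed = compressed;
                  this.offsets = offsets;
              }
          }

          /** Concatenates the values, records where each one ends, and compresses the whole run. */
          static Block encode(byte[][] values) {
              int[] offsets = new int[values.length + 1];
              ByteArrayOutputStream raw = new ByteArrayOutputStream();
              for (int i = 0; i < values.length; i++) {
                  raw.write(values[i], 0, values[i].length);
                  offsets[i + 1] = raw.size();
              }
              Deflater deflater = new Deflater();
              deflater.setInput(raw.toByteArray());
              deflater.finish();
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              byte[] buf = new byte[256];
              while (!deflater.finished()) {
                  int n = deflater.deflate(buf);
                  out.write(buf, 0, n);
              }
              deflater.end();
              return new Block(out.toByteArray(), offsets);
          }

          /** Decompresses the whole block, then slices out one value using the offsets table. */
          static byte[] decodeValue(Block block, int indexInBlock) {
              int total = block.offsets[block.offsets.length - 1];
              byte[] raw = new byte[total];
              Inflater inflater = new Inflater();
              inflater.setInput(block.compressed);
              try {
                  int read = 0;
                  while (read < total) {
                      int n = inflater.inflate(raw, read, total - read);
                      if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                          throw new IllegalStateException("truncated block");
                      }
                      read += n;
                  }
              } catch (DataFormatException e) {
                  throw new IllegalStateException("corrupt block", e);
              } finally {
                  inflater.end();
              }
              return Arrays.copyOfRange(raw, block.offsets[indexInBlock], block.offsets[indexInBlock + 1]);
          }
      }
      ```

      Note that reading a single value pays for decompressing its entire block, which is why the number of documents per block matters for read latency.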

      There's a trade-off here between efficient compression (more docs-per-block = better compression) and fast retrieval times (fewer docs-per-block = faster read access for single values). A fixed block size of 32 docs seems like it would be a reasonable compromise for most scenarios.
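
      With a fixed block size of 32, locating a value reduces to simple bit arithmetic. The sketch below assumes values are numbered densely by ordinal (0, 1, 2, ...) among the documents that have a value; the class, constant, and method names are hypothetical and not taken from the PR.

      ```java
      // Hypothetical addressing helper for fixed-size blocks of 32 values.
      final class BlockAddressing {
          static final int BLOCK_SHIFT = 5;                     // 2^5 = 32 values per block
          static final int DOCS_PER_BLOCK = 1 << BLOCK_SHIFT;
          static final int BLOCK_MASK = DOCS_PER_BLOCK - 1;

          /** Which block holds the value with this ordinal. */
          static int blockIndex(int valueOrd) {
              return valueOrd >>> BLOCK_SHIFT;
          }

          /** Position of the value inside its block, used to index the block's offsets table. */
          static int indexInBlock(int valueOrd) {
              return valueOrd & BLOCK_MASK;
          }

          /** Number of blocks needed for numValues values; the last block may be partially filled. */
          static int blockCount(int numValues) {
              return (int) (((long) numValues + BLOCK_MASK) >>> BLOCK_SHIFT);
          }
      }
      ```

      A power-of-two block size keeps both lookups to a shift and a mask; a larger shift would improve the compression ratio at the cost of decompressing more data per single-value read.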

      A PR is up for review here: https://github.com/apache/lucene-solr/pull/1234

              People

              • Assignee:
                mharwood Mark Harwood
                Reporter:
                mharwood Mark Harwood
              • Votes:
                0
                Watchers:
                8

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  0.5h