Lucene - Core / LUCENE-9529

Larger stored fields block sizes mean we're more likely to disable optimized bulk merging

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.7
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

Description

Whenever possible when merging stored fields, Lucene copies the compressed blocks directly from the source segment instead of decompressing them and re-compressing the data in the destination segment. A problem with this approach is that an incomplete block (typically the last block of a segment) remains incomplete in the destination segment too, and if this keeps happening across merges we end up with a bad compression ratio. So Lucene keeps track of these incomplete blocks, and only uses the optimized bulk merge while the ratio of incomplete blocks stays below 1%.

But as we increased the block size, a high ratio of incomplete blocks has become much more likely. E.g. in a segment with 1MB of stored fields and the previous 16kB blocks, you have 63 complete blocks and 1 incomplete block, i.e. about 1.6%. With the new ~512kB blocks, the same segment has 1 complete block and 1 incomplete block, i.e. 50%.

I'm not sure how to fix it, or even whether it should be fixed, but I wanted to open an issue to track this.


People

    Assignee: Unassigned
    Reporter: Adrien Grand (jpountz)
    Votes: 0
    Watchers: 3


Time Tracking

    Estimated: Not Specified
    Remaining: 0h
    Logged: 50m