  1. Lucene - Core
  2. LUCENE-2357

Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA, 6.0
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      When merging, we allocate an int[] to remap docIDs, since compacting away deleted docs shifts the IDs of the docs that remain.

      This uses a lot of RAM for large segment merges, and the allocation can fail due to heap fragmentation on 32-bit JREs.

      Now that we have packed ints, a simple fix would be to use a packed ints array. Maybe, instead of storing the absolute docID in the mapping, we could store the number of deleted docs seen so far, so the remap would do a lookup and then a subtraction. This may add some CPU cost to merging but should bring down transient RAM usage quite a bit.
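
      A minimal sketch of the idea above, for illustration only. The class `PackedDocMap` and its methods are hypothetical names, not Lucene's API, and the hand-rolled bit packing into a long[] merely stands in for Lucene's packed ints. Each entry holds the number of deleted docs seen before that docID, so an entry needs only enough bits to represent the total deletion count rather than a full 32-bit int.

```java
import java.util.BitSet;

/** Hypothetical sketch: remaps docIDs via a packed "deleted docs so far" array. */
final class PackedDocMap {
    private final BitSet deleted;    // set bit = deleted doc (assumed representation)
    private final int bitsPerValue;  // bits needed to store the largest deletion count
    private final long mask;
    private final long[] blocks;     // packed storage, bitsPerValue bits per doc

    PackedDocMap(BitSet deleted, int maxDoc) {
        this.deleted = deleted;
        int numDeleted = deleted.cardinality();
        // Stored values range over 0..numDeleted; always use at least 1 bit.
        this.bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(numDeleted));
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) maxDoc * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
        int deletedSoFar = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            set(doc, deletedSoFar);  // deletions strictly before this doc
            if (deleted.get(doc)) {
                deletedSoFar++;
            }
        }
    }

    /** Writes a value once into zero-initialized storage; it may straddle two longs. */
    private void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= value << shift;
        if (shift + bitsPerValue > 64) {
            blocks[block + 1] |= value >>> (64 - shift);
        }
    }

    private long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            value |= blocks[block + 1] << (64 - shift);
        }
        return value & mask;
    }

    /** Lookup, then subtract. Returns -1 for a deleted doc. */
    int newDocID(int oldDocID) {
        if (deleted.get(oldDocID)) {
            return -1;
        }
        return oldDocID - (int) get(oldDocID);
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    public static void main(String[] args) {
        BitSet dels = new BitSet();
        dels.set(1);
        dels.set(4);
        dels.set(5);
        PackedDocMap map = new PackedDocMap(dels, 8);
        for (int doc = 0; doc < 8; doc++) {
            System.out.println(doc + " -> " + map.newDocID(doc));
        }
    }
}
```

      In this example, 8 docs with 3 deletions need 2 bits per entry instead of 32. The saving shrinks as the number of deletions grows, and each remap pays for a few shifts and masks, which is the CPU cost mentioned above.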

      Attachments

        1. LUCENE-2357.patch
          17 kB
          Adrien Grand
        2. LUCENE-2357.patch
          17 kB
          Michael McCandless
        3. LUCENE-2357.patch
          16 kB
          Adrien Grand
        4. LUCENE-2357.patch
          16 kB
          Michael McCandless
        5. LUCENE-2357.patch
          15 kB
          Adrien Grand

            People

              jpountz Adrien Grand
              mikemccand Michael McCandless
              Votes: 2
              Watchers: 5

