Lucene - Core
  LUCENE-4022

Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA, 3.6.1
    • Component/s: modules/spellchecker
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      The Sorter we use for offline sorting seems to use MIN_BUFFER_SIZE as an upper bound even if more memory is available. See this snippet:

      long half = free/2;
      if (half >= ABSOLUTE_MIN_SORT_BUFFER_SIZE) { 
        return new BufferSize(Math.min(MIN_BUFFER_SIZE_MB * MB, half));
      }
            
      // by max mem (heap will grow)
      half = (max - total) / 2;
      return new BufferSize(Math.min(MIN_BUFFER_SIZE_MB * MB, half));
      

      We should use Math.max instead of min here.
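
      For illustration, here is a minimal self-contained sketch of the corrected selection logic (the constant values, class name, and method shape are assumptions for the example, not copied from the Lucene source):

      public class SortBufferSizeSketch {
        private static final long MB = 1024 * 1024;
        private static final long MIN_BUFFER_SIZE_MB = 32;                    // assumed value
        private static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE = 512 * 1024; // assumed value

        /** Half of the free heap, but never less than the configured minimum. */
        static long automaticBufferBytes(long free, long total, long max) {
          long half = free / 2;
          if (half >= ABSOLUTE_MIN_SORT_BUFFER_SIZE) {
            // Math.max: MIN_BUFFER_SIZE_MB is meant as a floor, not a ceiling
            return Math.max(MIN_BUFFER_SIZE_MB * MB, half);
          }
          // by max mem (heap will grow)
          half = (max - total) / 2;
          return Math.max(MIN_BUFFER_SIZE_MB * MB, half);
        }

        public static void main(String[] args) {
          Runtime rt = Runtime.getRuntime();
          System.out.println(automaticBufferBytes(rt.freeMemory(), rt.totalMemory(), rt.maxMemory()));
        }
      }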

      1. LUCENE-4022.patch
        3 kB
        Simon Willnauer

        Activity

        Uwe Schindler added a comment -

        Bulk close for 3.6.1

        Simon Willnauer added a comment -

        committed to 3.6 branch and trunk

        Simon Willnauer added a comment -

        "How did you come up with the 10x factor though? Is it something off the top of your head?"

        I wanted to detect a significantly bigger "unallocated" heap so we only force a grow when it makes sense, and a factor of 10 seemed like a good start. I mean, this automatic stuff should be a conservative default that gives you reasonable performance. First of all it should make sure your system is stable and doesn't run into OOM etc. It might seem somewhat arbitrary. I will add a changes entry and commit this stuff. Seems like Robert wants to roll a 3.6.1 soonish.

        Dawid Weiss added a comment -

        Looks good to me Simon. How did you come up with the 10x factor though? Is it something off the top of your head?

        Simon Willnauer added a comment -

        Here is a patch with a slightly changed algorithm. It still takes free/2 as the base buffer size, but it checks whether it is reasonable to grow the heap if the total available memory is 10x larger than the free memory or if the free memory is smaller than MIN_BUFFER_SIZE_MB. If we run into small heaps, like on mobile phones where you only have up to 3MB, this falls back to free/2 or ABSOLUTE_MIN_SORT_BUFFER_SIZE.

        The actual buffer size is bounded by Integer.MAX_VALUE.
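
        For readers following the thread, a rough sketch of the heuristic as described above (a plausible reconstruction with assumed constant values and names, not the attached LUCENE-4022.patch):

        public class GrowableSortBufferSketch {
          private static final long MB = 1024 * 1024;
          private static final long MIN_BUFFER_SIZE_MB = 32;                    // assumed value
          private static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE = 512 * 1024; // assumed value

          static long automaticBufferBytes() {
            Runtime rt = Runtime.getRuntime();
            long max = rt.maxMemory();      // configured -Xmx
            long total = rt.totalMemory();  // currently committed heap
            long free = rt.freeMemory();    // free within the committed heap
            long totalAvailable = max - total + free;

            long bufferSize = free / 2;     // base case: half of the free heap
            if (totalAvailable > 10 * free || free < MIN_BUFFER_SIZE_MB * MB) {
              if (totalAvailable / 2 > MIN_BUFFER_SIZE_MB * MB) {
                // plenty of headroom: let the heap grow and take half of what could become available
                bufferSize = totalAvailable / 2;
              } else {
                // tiny heap (e.g. only a few MB): stay conservative
                bufferSize = Math.max(ABSOLUTE_MIN_SORT_BUFFER_SIZE, bufferSize);
              }
            }
            // the buffer backs a single array, so bound it by Integer.MAX_VALUE
            return Math.min(Integer.MAX_VALUE, bufferSize);
          }
        }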

        Dawid Weiss added a comment -

        It's a bug. I don't know if it was a regression from when we talked about how to estimate "half the available heap", or if it was there even before then, but it should be Math.max().

        Should we check for max array size overflows (for folks with super-large heaps)?


          People

          • Assignee:
            Simon Willnauer
          • Reporter:
            Simon Willnauer
          • Votes:
            0
          • Watchers:
            1

            Dates

            • Created:
            • Updated:
            • Resolved:
