Lucene - Core
LUCENE-1741

Make MMapDirectory.MAX_BBUF user configurable to support chunking the index files in smaller parts

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This is a follow-up for the java-user thread: http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b

      It is easy to implement: just add a setter method for this parameter to MMapDir.
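
      A minimal usage sketch, assuming the patch exposes a setter named setMaxChunkSize that takes the chunk size in bytes; the method name, signature and constructor used below are assumptions, not confirmed by this issue:

          import java.io.File;
          import org.apache.lucene.store.MMapDirectory;

          public class ChunkSizeUsage {
            public static void main(String[] args) throws Exception {
              // Assumed setter from the patch: limit each mapped buffer to
              // 256 MiB instead of mapping whole files in one piece.
              MMapDirectory dir = new MMapDirectory(new File(args[0]));
              dir.setMaxChunkSize(256 * 1024 * 1024);
              // ... open an IndexReader/IndexSearcher on "dir" as usual ...
              dir.close();
            }
          }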

      1. LUCENE-1741.patch
        4 kB
        Uwe Schindler
      2. LUCENE-1741.patch
        4 kB
        Uwe Schindler
      3. LUCENE-1741.patch
        4 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment -

        Patch that allows configuration of chunk size. I will commit in the evening (CET).

        Michael McCandless added a comment -

        Should we default the chunk size to something smaller (128 MB?) on 32-bit JREs?

        Uwe Schindler added a comment -

        Good idea. Do we still have the 64-bit detection property in the utils? If so, this could easily be done.

        Uwe Schindler added a comment -

        Attached is a patch using the JRE_IS_64BIT constant from Constants. I set the default to 256 MiB (128 MiB seems too small for large indexes; if the index is e.g. about 1.5 GiB, you would get 6 chunks).

        I have no test data on which size is best; it is a matter of trying it out (and depends e.g. on how often you reboot Windows, as Eks said).
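
        A sketch of how such a default could be derived from the 64-bit detection constant; the field name Constants.JRE_IS_64BIT comes from this comment, while the surrounding class and the chosen values are assumptions, not the committed patch:

            import org.apache.lucene.util.Constants;

            final class ChunkSizeDefaults {
              // 256 MiB default on 32-bit JREs (scarce virtual address space),
              // effectively unlimited (Integer.MAX_VALUE) on 64-bit JREs.
              static final int DEFAULT_MAX_CHUNK_SIZE =
                  Constants.JRE_IS_64BIT ? Integer.MAX_VALUE : (256 * 1024 * 1024);

              private ChunkSizeDefaults() {}
            }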

        Michael McCandless added a comment -

        Patch looks good!

        Uwe Schindler added a comment -

        Eks Dev wrote in java-dev:

        I have no test data on which size is best; it is a matter of trying it out

        Sure, for this you need a bad OS and a large index; you are not as lucky as I am to have both.

        Anyhow, I would argue against a default value. The algorithm is quite simple: if you hit an OOM on map(), reduce this value until it fits;
        no need to touch it if it works...

        Uwe Schindler added a comment -

        OK, we have two patches; we can think about which one to use.

        In my opinion, there is no problem with limiting the chunk size on 32-bit systems. The overhead of choosing the right chunk is negligible, as it only affects seeking. Normal sequential reads only need to check whether the current chunk has enough data and, if not, move to the next one. The non-chunked stream does this check, too (to throw EOF). With a chunk size of 256 MB, the theoretical maximum number of chunks is 8 (which can never be reached...).

        Any other comments?

        Eks: What was the value that fixed your problem without rebooting? And how big was your biggest index file?
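
        To illustrate why chunking only costs anything on seeks, here is a stand-alone sketch (plain Java NIO, not Lucene code) that maps a file as fixed-size chunks and locates an absolute position by chunk index plus offset; with 256 MB chunks, a file that fits into a 32-bit address space can never need more than 8 of them:

            import java.io.IOException;
            import java.io.RandomAccessFile;
            import java.nio.MappedByteBuffer;
            import java.nio.channels.FileChannel;

            public class ChunkedMapSketch {
              public static void main(String[] args) throws IOException {
                final long chunkSize = 256L * 1024 * 1024; // 256 MiB per mapped buffer
                RandomAccessFile raf = new RandomAccessFile(args[0], "r"); // assumes a non-empty file
                try {
                  FileChannel ch = raf.getChannel();
                  long length = ch.size();
                  int nChunks = (int) ((length + chunkSize - 1) / chunkSize);
                  MappedByteBuffer[] chunks = new MappedByteBuffer[nChunks];
                  for (int i = 0; i < nChunks; i++) {
                    long start = (long) i * chunkSize;
                    chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY,
                                       start, Math.min(chunkSize, length - start));
                  }
                  // Seeking is just an index computation, no extra I/O:
                  long pos = length / 2;
                  byte b = chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
                  System.out.println("byte at " + pos + " = " + b);
                } finally {
                  raf.close();
                }
              }
            }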

        Paul Smith added a comment -

        An algorithm is nice when no specific settings have been given, but in an environment where large indexes are opened more frequently than in the common use cases, the memory layer would hit OOM conditions too often, forcing too much GC activity just to attempt the operation.

        I'd vote for checking whether settings have been requested and using them, and if none are set, relying on a self-tuning algorithm.

        In a really long-running application, the process address space may become more and more fragmented, and the malloc library may not be able to defragment it, so the auto-tuning is nice, but it may not be great for all people's needs.

        For example, our specific use case (crazy as this may be) is to have many different indexes open at any one time, closing and opening them frequently (the Realtime Search stuff we are following very closely indeed...). I'm just thinking that our VM (64-bit) may find it difficult to find the contiguous non-heap space for the MMap operation after many days/weeks in operation.

        Maybe I'm just paranoid. But for operational purposes, it'd be nice to know we could change the setting based on our observations.

        thanks!

        Eks Dev added a comment -

        Uwe, you convinced me. I looked at the code, and indeed there is no performance penalty for this.

        What helped me was 1.1 GB... (I tried to find the maximum); the max file size is 1.4 GB... but 1.1 is just an OS coincidence, no magic about it.

        I guess 512 MB makes a good value; if memory is so fragmented that you cannot allocate 0.5 GB, you definitely have some other problems. We are talking here about VM memory, and even on Windows having 512 MB in one block is not an issue (or better said, I have never seen problems with this value).

        @Paul: It is a misunderstanding; my "algorithm" was meant to be manual... no catching OOM and retrying (I've already burned my fingers on catching RuntimeException; do that only when absolutely desperate). Uwe made this value user-settable anyhow.

        Thanks Uwe!

        Michael McCandless added a comment -

        I'd be more comfortable with 256 MB (or smaller); I think fragmentation could easily cause 512 MB to give the false OOM. I don't think we'll see real perf costs from buffer switching unless the chunk size is very small (e.g. < 1 MB).

        In any event, Uwe, can you add to the javadocs a description of this false-OOM problem and what to do if you hit it?

        Uwe Schindler added a comment -

        Javadocs state (in FileChannel#map): "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory."

        So it should be as big as possible. A second problem with too many buffers is that the MMU/TLB cannot handle large numbers of them effectively.

        In my opinion, we could enhance MMapDirectory to work together with FileSwitchDirectory or something like that, so that mmap is only used for large files and all others are handled by NIO/Simple. E.g. mapping the segments.gen file into memory is really a waste of resources. So MMapDir would only return the MMapIndexInput if the underlying file is > X bytes (e.g. 8 megabytes by default) and fall back to SimpleFSIndexInput otherwise.

        In any event, Uwe, can you add to the javadocs a description of this false-OOM problem and what to do if you hit it?

        Will do this tomorrow, will go to bed now.

        Here are also some other numbers about this problem: http://groups.google.com/group/jsr203-interest/browse_thread/thread/66f6a5042f2b0c4a/12228bbd57d1956d
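
        A rough sketch of that size-based switch; the helper name and the 8 MiB threshold below are hypothetical and only illustrate the idea in this comment, not the code of any committed patch:

            import java.io.File;

            final class MMapThreshold {
              // Files at or above this size get memory-mapped; everything
              // smaller (segments.gen, *.del, ...) would use plain reads.
              static final long MIN_BYTES_FOR_MMAP = 8L * 1024 * 1024; // 8 MiB

              static boolean shouldMMap(File f) {
                return f.length() >= MIN_BYTES_FOR_MMAP;
              }

              private MMapThreshold() {}
            }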

        Uwe Schindler added a comment -

        Updated patch with Mike's suggestion.

        Michael McCandless added a comment -

        Patch looks good; thanks Uwe.

        Uwe Schindler added a comment -

        Committed revision: 793826

        Thanks Eks!

        About the automatic fallback to SimpleFSIndexInput for small files like segment*, *.del, I will open another issue targeted to 3.1. MMapping small files wastes system resources and may be slower than just reading a few bytes with SimpleFSIndexInput.


          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Uwe Schindler
          • Votes:
            0
            Watchers:
            0
