Lucene - Core
  1. Lucene - Core
  2. LUCENE-5673

MMapDirectory shouldn't pass along OOM wrapped as IOException

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.8
    • Fix Version/s: 4.9, 6.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The bug here is in java (not MMapDir), but i think we shoudl do something.

      Users get confused when they configure their JVM to trigger something on OOM, and then see "OutOfMemoryError: Map Failed": but their trigger doesnt fire.

      Thats because in the jdk, when it maps files it catches OutOfMemoryError, asks for a garbage collection, sleeps for 100 milliseconds, then tries to map again. if it fails a second time it wraps the OOM in a generic IOException.

      I think we should add a try/catch to our filechannel.map

      1. LUCENE-5673.patch
        4 kB
        Uwe Schindler
      2. LUCENE-5673.patch
        3 kB
        Uwe Schindler
      3. LUCENE-5673.patch
        3 kB
        Uwe Schindler
      4. LUCENE-5673.patch
        1 kB
        Uwe Schindler

        Activity

        Hide
        Uwe Schindler added a comment -

        Hi,
        this is indeed a problem. The existence of the OOM somewhere in the stack trace confuses users, because whenever they see OOM, they start to raise -Xmx and by that make the problem worse.
        The behaviour is correct from the API, because the javadocs of FileChannel.map specify that it throws IOException when mapping fails. But the implementation in OpenJDK is bullshit:
        We should do something like this:

        ...catch (IOException ioe) {
          if (ioe.getCause() instanceof OutOfMemoryError) {
            throw new IOException("Memory mapping failed. There might not be enough unfragmented address space available to mmap the index file: " + ioe.getCause().getMessage()); // without root cause!!!
          }
          throw ioe;
        }
        

        By that the user just gets a good IOException not referring to OOM.

        Indeed the OOM is the real bug, because its caused by OutOfAddressSpaceError and not OutOfMemoryError

        Show
        Uwe Schindler added a comment - Hi, this is indeed a problem. The existence of the OOM somewhere in the stack trace confuses users, because whenever they see OOM, they start to raise -Xmx and by that make the problem worse. The behaviour is correct from the API, because the javadocs of FileChannel.map specify that it throws IOException when mapping fails. But the implementation in OpenJDK is bullshit: We should do something like this: ... catch (IOException ioe) { if (ioe.getCause() instanceof OutOfMemoryError) { throw new IOException( "Memory mapping failed. There might not be enough unfragmented address space available to mmap the index file: " + ioe.getCause().getMessage()); // without root cause!!! } throw ioe; } By that the user just gets a good IOException not referring to OOM. Indeed the OOM is the real bug, because its caused by OutOfAddressSpaceError and not OutOfMemoryError
        Hide
        Uwe Schindler added a comment -

        Here is a patch, hiding the OOM (because it is definitley no real OOM). This patch also takes care of "other more correct" impls around (like J9). It only triggers by OOM as root cause of IOE.

        Show
        Uwe Schindler added a comment - Here is a patch, hiding the OOM (because it is definitley no real OOM). This patch also takes care of "other more correct" impls around (like J9). It only triggers by OOM as root cause of IOE.
        Hide
        Robert Muir added a comment -

        I think its a good start but we need to be careful here.

        First of all my problem is with the "OutOfMemoryError" text. I do still think the message should start with "Map failed" instead of "Memory mapping failed". We want users to be able to google the error and still find some assistance.

        If we are going to offer more explanation in addition to that, it woudl be good to try to add practical stuff: e.g. mention 'ulimit' and 'sysctl vm.max_map_count' and so on.

        Show
        Robert Muir added a comment - I think its a good start but we need to be careful here. First of all my problem is with the "OutOfMemoryError" text. I do still think the message should start with "Map failed" instead of "Memory mapping failed". We want users to be able to google the error and still find some assistance. If we are going to offer more explanation in addition to that, it woudl be good to try to add practical stuff: e.g. mention 'ulimit' and 'sysctl vm.max_map_count' and so on.
        Hide
        Uwe Schindler added a comment -

        Thanks Robert!
        I already improved the error message in my patch to also include the file name and the original message of the OOM. We can improve that.
        The message already contains some hints, but no direct refercces to sysctl, becaus ethis is too OS specific. Of course we could add a switch statement with constants giving some hints per OS.
        The logic in the patch is ready, we can now just improve the error message.

        Show
        Uwe Schindler added a comment - Thanks Robert! I already improved the error message in my patch to also include the file name and the original message of the OOM. We can improve that. The message already contains some hints, but no direct refercces to sysctl, becaus ethis is too OS specific. Of course we could add a switch statement with constants giving some hints per OS. The logic in the patch is ready, we can now just improve the error message.
        Hide
        Uwe Schindler added a comment -

        Slightly improved patch:

        • Preserves the original OOM message text at the beginning of the message
        • Followed by FilleChannel.toString()
        • Then add more explaining text, including the buffer size that failed to map
        Show
        Uwe Schindler added a comment - Slightly improved patch: Preserves the original OOM message text at the beginning of the message Followed by FilleChannel.toString() Then add more explaining text, including the buffer size that failed to map
        Hide
        Robert Muir added a comment -

        Wait, we absolutely don't want the original text. This is the whole thing that causes confusion, the whole reason I opened this issue.

        The whole reason its confusing is because its "OutOfMemoryError: Map failed". Why can't it just start with "Map failed"

        If you want the text to say "OutOfMemoryError", then please, unwrap the OutOfMemoryError from the IOE and throw that.

        Show
        Robert Muir added a comment - Wait, we absolutely don't want the original text. This is the whole thing that causes confusion, the whole reason I opened this issue. The whole reason its confusing is because its "OutOfMemoryError: Map failed". Why can't it just start with "Map failed" If you want the text to say "OutOfMemoryError", then please, unwrap the OutOfMemoryError from the IOE and throw that.
        Hide
        Uwe Schindler added a comment -

        The attached patch should bring only the "Map failed". But in any case we can also hardcode the text, so we can remove the getMessage(). I just wanted to preserve the original message. The "OutOfMemoryError" comes from the wrapped exception, but is not part of the message (see FileChannelImpl of Java 7): throw new IOException("Map failed", oome);. My code takes the message of the IOException ("Map failed"), ignores the cause and adds more information like resourceDescription and the hint "why it failed".

        I was thinking about the problem a bit more, we should always add the resource description, so have 2 exception reformats:

        • Change IOExceptions with OOM wrapped to have a hard-coded text "Map failed: resourceDescription (this may be caused...)"
        • All other IOExceptions maybe get the resourceDescription just appended? I am not sure about this, which is a more general issue of adding resourceDescription to all IOExceptions our DirectoryImpls throw.
        Show
        Uwe Schindler added a comment - The attached patch should bring only the "Map failed". But in any case we can also hardcode the text, so we can remove the getMessage(). I just wanted to preserve the original message. The "OutOfMemoryError" comes from the wrapped exception, but is not part of the message (see FileChannelImpl of Java 7): throw new IOException("Map failed", oome); . My code takes the message of the IOException ("Map failed"), ignores the cause and adds more information like resourceDescription and the hint "why it failed". I was thinking about the problem a bit more, we should always add the resource description, so have 2 exception reformats: Change IOExceptions with OOM wrapped to have a hard-coded text "Map failed: resourceDescription (this may be caused...)" All other IOExceptions maybe get the resourceDescription just appended? I am not sure about this, which is a more general issue of adding resourceDescription to all IOExceptions our DirectoryImpls throw.
        Hide
        Uwe Schindler added a comment -

        Attached is a patch with hardcoded "Map failed" message. The inner OOM is ignored completely, it is just used to detect this special case.

        I also used resourceDescription in the unmapper and removed the Java7-obsolete initCause() there.

        Show
        Uwe Schindler added a comment - Attached is a patch with hardcoded "Map failed" message. The inner OOM is ignored completely, it is just used to detect this special case. I also used resourceDescription in the unmapper and removed the Java7-obsolete initCause() there.
        Hide
        Robert Muir added a comment -

        ok now we just need the practical advice to the message...

        CONSTANTS.32BIT: "get a new computer"
        CONSTANTS.WINDOWS: "get a new operating system"
        CONSTANTS.LINUX: "please review 'ulimit -v' and 'sysctl vm.max_map_count'"

        Show
        Robert Muir added a comment - ok now we just need the practical advice to the message... CONSTANTS.32BIT: "get a new computer" CONSTANTS.WINDOWS: "get a new operating system" CONSTANTS.LINUX: "please review 'ulimit -v' and 'sysctl vm.max_map_count'"
        Hide
        Uwe Schindler added a comment -

        Here is a new patch:

        • The additional information and resourceDescription is now used on any IOException while mapping.
        • If the cause is a OOM, the new Exception does not get a cause anymore, just "Map failed" and the additional infos
        • In all other cases, the original message is preserved and annotated with our information. The original cause is preserved (initCause on new Ex).
        • the stack trace of original Exception is preserved
        Show
        Uwe Schindler added a comment - Here is a new patch: The additional information and resourceDescription is now used on any IOException while mapping. If the cause is a OOM, the new Exception does not get a cause anymore, just "Map failed" and the additional infos In all other cases, the original message is preserved and annotated with our information. The original cause is preserved (initCause on new Ex). the stack trace of original Exception is preserved
        Hide
        Uwe Schindler added a comment -

        I will now do some tests on Linux with very limited ulimit -v.

        Show
        Uwe Schindler added a comment - I will now do some tests on Linux with very limited ulimit -v .
        Hide
        Uwe Schindler added a comment -

        I tried the latest patch with Linux 32 bit and an ulimit -v 1_000_000:

           [junit4] ERROR    132s | Test4GBStoredFields.test <<<
           [junit4]    > Throwable #1: java.io.IOException: Map failed: MMapIndexInput(path="/media/sf_Host/Projects/lucene/trunk-lusolr1/lucene/build/core/test/J0/lucene.index.Test4GBStoredFields-C16129C282E2746E-001/4GBStoredFields-001/_0.fdt") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 268435456 bytes. MMapDirectory should only be used on 64bit platforms, because the address space on 32bit operating systems is too small. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([C16129C282E2746E:493516182C1E1996]:0)
           [junit4]    > 	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
           [junit4]    > 	at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:271)
           [junit4]    > 	at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:221)
           [junit4]    > 	at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
           [junit4]    > 	at org.apache.lucene.store.Directory.copy(Directory.java:187)
           [junit4]    > 	at org.apache.lucene.store.MockDirectoryWrapper.copy(MockDirectoryWrapper.java:947)
           [junit4]    > 	at org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50)
           [junit4]    > 	at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4504)
           [junit4]    > 	at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:485)
           [junit4]    > 	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:452)
           [junit4]    > 	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:518)
           [junit4]    > 	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:629)
           [junit4]    > 	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3042)
           [junit4]    > 	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3018)
           [junit4]    > 	at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1671)
           [junit4]    > 	at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1647)
           [junit4]    > 	at org.apache.lucene.index.Test4GBStoredFields.test(Test4GBStoredFields.java:83)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:744)
        
        Show
        Uwe Schindler added a comment - I tried the latest patch with Linux 32 bit and an ulimit -v 1_000_000 : [junit4] ERROR 132s | Test4GBStoredFields.test <<< [junit4] > Throwable #1: java.io.IOException: Map failed: MMapIndexInput(path="/media/sf_Host/Projects/lucene/trunk-lusolr1/lucene/build/core/test/J0/lucene.index.Test4GBStoredFields-C16129C282E2746E-001/4GBStoredFields-001/_0.fdt") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 268435456 bytes. MMapDirectory should only be used on 64bit platforms, because the address space on 32bit operating systems is too small. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html] [junit4] > at __randomizedtesting.SeedInfo.seed([C16129C282E2746E:493516182C1E1996]:0) [junit4] > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888) [junit4] > at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:271) [junit4] > at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:221) [junit4] > at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196) [junit4] > at org.apache.lucene.store.Directory.copy(Directory.java:187) [junit4] > at org.apache.lucene.store.MockDirectoryWrapper.copy(MockDirectoryWrapper.java:947) [junit4] > at org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50) [junit4] > at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4504) [junit4] > at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:485) [junit4] > at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:452) [junit4] > at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:518) [junit4] > at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:629) [junit4] > at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3042) [junit4] > at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3018) [junit4] > at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1671) [junit4] > at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1647) [junit4] > at org.apache.lucene.index.Test4GBStoredFields.test(Test4GBStoredFields.java:83) [junit4] > at java.lang.Thread.run(Thread.java:744)
        Hide
        Robert Muir added a comment -

        +1 !

        this is going to save a lot of headaches.

        Show
        Robert Muir added a comment - +1 ! this is going to save a lot of headaches.
        Hide
        Uwe Schindler added a comment - - edited

        Should I backport to 4.8.1? This does not brea logic, it just cleans up the exception, so no risk to break something.

        Show
        Uwe Schindler added a comment - - edited Should I backport to 4.8.1? This does not brea logic, it just cleans up the exception, so no risk to break something.
        Hide
        Uwe Schindler added a comment -

        Oh the vote was already called. If we respin we can add this, but I will for now only do 4.9 and 5.0

        Show
        Uwe Schindler added a comment - Oh the vote was already called. If we respin we can add this, but I will for now only do 4.9 and 5.0
        Hide
        ASF subversion and git services added a comment -

        Commit 1595213 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1595213 ]

        LUCENE-5673: MMapDirectory: Work around a "bug" in the JDK that throws a confusing OutOfMemoryError wrapped inside IOException if the FileChannel mapping failed because of lack of virtual address space. The IOException is rethrown with more useful information about the problem, omitting the incorrect OutOfMemoryError

        Show
        ASF subversion and git services added a comment - Commit 1595213 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1595213 ] LUCENE-5673 : MMapDirectory: Work around a "bug" in the JDK that throws a confusing OutOfMemoryError wrapped inside IOException if the FileChannel mapping failed because of lack of virtual address space. The IOException is rethrown with more useful information about the problem, omitting the incorrect OutOfMemoryError
        Hide
        ASF subversion and git services added a comment -

        Commit 1595214 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1595214 ]

        Merged revision(s) 1595213 from lucene/dev/trunk:
        LUCENE-5673: MMapDirectory: Work around a "bug" in the JDK that throws a confusing OutOfMemoryError wrapped inside IOException if the FileChannel mapping failed because of lack of virtual address space. The IOException is rethrown with more useful information about the problem, omitting the incorrect OutOfMemoryError

        Show
        ASF subversion and git services added a comment - Commit 1595214 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1595214 ] Merged revision(s) 1595213 from lucene/dev/trunk: LUCENE-5673 : MMapDirectory: Work around a "bug" in the JDK that throws a confusing OutOfMemoryError wrapped inside IOException if the FileChannel mapping failed because of lack of virtual address space. The IOException is rethrown with more useful information about the problem, omitting the incorrect OutOfMemoryError
        Hide
        Uwe Schindler added a comment -

        Reopen, if we backport on respin!

        Show
        Uwe Schindler added a comment - Reopen, if we backport on respin!

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development