Lucene - Core
  1. Lucene - Core
  2. LUCENE-1566

Large Lucene index can hit false OOM due to Sun JRE issue

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.1
    • Fix Version/s: 2.9
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This is not a Lucene issue, but I want to open this so future google
      diggers can more easily find it.

      There's this nasty bug in Sun's JRE:

      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546

      The gist seems to be, if you try to read a large (eg 200 MB) number of
      bytes during a single RandomAccessFile.read call, you can incorrectly
      hit OOM. Lucene does this, with norms, since we read in one byte per
      doc per field with norms, as a contiguous array of length maxDoc().

      The workaround was a custom patch to do large file reads as several
      smaller reads.

      Background here:

      http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html

      1. LUCENE-1566.patch
        18 kB
        Michael McCandless
      2. LUCENE_1566_IndexInput_Changes.patch
        21 kB
        Simon Willnauer
      3. LUCENE_1566_IndexInput_Changes.patch
        14 kB
        Simon Willnauer
      4. LUCENE_1566_IndexInput.patch
        17 kB
        Simon Willnauer
      5. LUCENE-1566.patch
        4 kB
        Simon Willnauer
      6. LUCENE-1566.patch
        4 kB
        Simon Willnauer

        Activity

        Hide
        Simon Willnauer added a comment -

        I will look into this issue closer next week, Once I can reproduce it I will write up a custom patch.

        simon

        Show
        Simon Willnauer added a comment - I will look into this issue closer next week, Once I can reproduce it I will write up a custom patch. simon
        Hide
        Michael McCandless added a comment -

        Excellent, thanks Simon!

        Show
        Michael McCandless added a comment - Excellent, thanks Simon!
        Hide
        Simon Willnauer added a comment -

        I was able to reproduce the bug on my machine using several JVMs. The attached patch is what I got ready by now - I though I get it out there as soon as possible for discussion.
        Test pass on my side!

        Show
        Simon Willnauer added a comment - I was able to reproduce the bug on my machine using several JVMs. The attached patch is what I got ready by now - I though I get it out there as soon as possible for discussion. Test pass on my side!
        Hide
        Simon Willnauer added a comment -

        missed to enable asserts in testcase - updated patch to correctly assert offset / length

        Show
        Simon Willnauer added a comment - missed to enable asserts in testcase - updated patch to correctly assert offset / length
        Hide
        Simon Willnauer added a comment -

        @Mike: could you have a look at this if you have time, thanks!

        Show
        Simon Willnauer added a comment - @Mike: could you have a look at this if you have time, thanks!
        Hide
        Michael McCandless added a comment -

        Yes, I'll take this. Thanks Simon!

        Show
        Michael McCandless added a comment - Yes, I'll take this. Thanks Simon!
        Hide
        Michael McCandless added a comment -

        Could we move the fix down into SimpleFSDir.FSIndexInput? We are only hitting it with norms today, but in the future if we read other large things from the index (eg with column-stride fields) we'd like the workaround to apply there as well.

        Can we turn it on by default, only on 32 bit JREs?

        I think you can read directly into the norms byte[], instead of allocating a temp buffer, as long as you read in smaller chunks. From the email thread it looked like smaller was up to ~100 MB, which I think is fine to simply do by default. Maybe offer a setter (instead of static property) to turn this workaround on/off in SimpleFSDir.

        Show
        Michael McCandless added a comment - Could we move the fix down into SimpleFSDir.FSIndexInput? We are only hitting it with norms today, but in the future if we read other large things from the index (eg with column-stride fields) we'd like the workaround to apply there as well. Can we turn it on by default, only on 32 bit JREs? I think you can read directly into the norms byte[], instead of allocating a temp buffer, as long as you read in smaller chunks. From the email thread it looked like smaller was up to ~100 MB, which I think is fine to simply do by default. Maybe offer a setter (instead of static property) to turn this workaround on/off in SimpleFSDir.
        Hide
        Simon Willnauer added a comment -

        Could we move the fix down into SimpleFSDir.FSIndexInput? We are only hitting it with norms today, but in the future if we read other large things from the index (eg with column-stride fields) we'd like the workaround to apply there as well.

        sure!

        Can we turn it on by default, only on 32 bit JREs?

        good point! - will try to do so.

        I think you can read directly into the norms byte[], instead of allocating a temp buffer, as long as you read in smaller chunks. From the email thread it looked like smaller was up to ~100 MB, which I think is fine to simply do by default. Maybe offer a setter (instead of static property) to turn this workaround on/off in SimpleFSDir.

        I did hit the error while I did that but I will verify again.

        Thanks for the comments thats why I hacked it together and did throw it out ASAP. Will look at it tomorrow.

        simon

        Show
        Simon Willnauer added a comment - Could we move the fix down into SimpleFSDir.FSIndexInput? We are only hitting it with norms today, but in the future if we read other large things from the index (eg with column-stride fields) we'd like the workaround to apply there as well. sure! Can we turn it on by default, only on 32 bit JREs? good point! - will try to do so. I think you can read directly into the norms byte[], instead of allocating a temp buffer, as long as you read in smaller chunks. From the email thread it looked like smaller was up to ~100 MB, which I think is fine to simply do by default. Maybe offer a setter (instead of static property) to turn this workaround on/off in SimpleFSDir. I did hit the error while I did that but I will verify again. Thanks for the comments thats why I hacked it together and did throw it out ASAP. Will look at it tomorrow. simon
        Hide
        Michael McCandless added a comment -

        I did hit the error while I did that but I will verify again.

        Ugh. If that's the case then maybe catch the OOM in the read, and fallback to the temp buffer read solution?

        Show
        Michael McCandless added a comment - I did hit the error while I did that but I will verify again. Ugh. If that's the case then maybe catch the OOM in the read, and fallback to the temp buffer read solution?
        Hide
        Michael McCandless added a comment -

        Assigning this one back to you Simon!

        Show
        Michael McCandless added a comment - Assigning this one back to you Simon!
        Hide
        Thulasi Ram Naidu P added a comment -

        any temporary solution for this problem?

        Show
        Thulasi Ram Naidu P added a comment - any temporary solution for this problem?
        Hide
        Michael McCandless added a comment -

        Alas no simple workaround here. You could try using the current patch, here (though this patch will only apply to 2.9, I think). Or, create your own subclass of FSDir that overrides the read method of FSIndexInput.

        Are you hitting the issue? Can you provide some details about the size of your index, etc? Maybe the output of running "java org.apache.lucene.index.CheckIndex /path/to/index"?

        Show
        Michael McCandless added a comment - Alas no simple workaround here. You could try using the current patch, here (though this patch will only apply to 2.9, I think). Or, create your own subclass of FSDir that overrides the read method of FSIndexInput. Are you hitting the issue? Can you provide some details about the size of your index, etc? Maybe the output of running "java org.apache.lucene.index.CheckIndex /path/to/index"?
        Hide
        Simon Willnauer added a comment -

        @Mike: Thanks for your comments.
        I did run my testcase to reproduce the OOM with some other directory implementation (SimpleFSDirectory and NIOFSDirectory) and both of them suffer from the JVM bug. My testcase is the following.
        1. Start JVM with -Xmx2500M (32bit) either 1.5 / 1.6 – I hit the error with all of my VMs
        2. index 250000000 simple documents and optimize the index once the last document is added.
        3. open IndexReader with either a SimpleFSDirectory or NIOFSDirectory
        4. Catch the error

        I added a workaround to FSDirectory / NIOFSDirectory / SimpleFSDirectory as well as a testcase to test the added code for correctness. The included testcase will not trigger the JVM bug as I need such a specific setup to trigger it.

        Any comments welcome.
        simon

        Show
        Simon Willnauer added a comment - @Mike: Thanks for your comments. I did run my testcase to reproduce the OOM with some other directory implementation (SimpleFSDirectory and NIOFSDirectory) and both of them suffer from the JVM bug. My testcase is the following. 1. Start JVM with -Xmx2500M (32bit) either 1.5 / 1.6 – I hit the error with all of my VMs 2. index 250000000 simple documents and optimize the index once the last document is added. 3. open IndexReader with either a SimpleFSDirectory or NIOFSDirectory 4. Catch the error I added a workaround to FSDirectory / NIOFSDirectory / SimpleFSDirectory as well as a testcase to test the added code for correctness. The included testcase will not trigger the JVM bug as I need such a specific setup to trigger it. Any comments welcome. simon
        Hide
        Michael McCandless added a comment -

        Looks good Simon!

        • From your post above, I thought the "read in 100MB chunks into the
          big array" failed to work around the issue? Ie that's why you
          allocated a tiny intermediate buffer? Did that turn out not to be
          the case?
        • How about, instead of ignoring the chunk size on 64bit jre, we
          conditionally default it to Long.MAX_VALUE?
        • Can you link also to the Sun VM bug in the javadocs for these
          set/getChunkSize?
        Show
        Michael McCandless added a comment - Looks good Simon! From your post above, I thought the "read in 100MB chunks into the big array" failed to work around the issue? Ie that's why you allocated a tiny intermediate buffer? Did that turn out not to be the case? How about, instead of ignoring the chunk size on 64bit jre, we conditionally default it to Long.MAX_VALUE? Can you link also to the Sun VM bug in the javadocs for these set/getChunkSize?
        Hide
        Simon Willnauer added a comment -

        From your post above, I thought the "read in 100MB chunks into the

        big array" failed to work around the issue? Ie that's why you
        allocated a tiny intermediate buffer? Did that turn out not to be
        the case?

        I verified that again with a 1.5.0_08 as well as the latest JVM on linux 32bit. Did not run the tests on windows yet but will do so tomorrow. Haven't had time to install the JVMs (or at least one) listed in the Sun Bug Report. Reading in chunks into a big array is fine.

        How about, instead of ignoring the chunk size on 64bit jre, we

        conditionally default it to Long.MAX_VALUE?

        Hmm, would remove "code duplication" - good point. The readBytes method seems to be critical but I guess there won't be any performance impact by assigning one or two extra variables.

        Can you link also to the Sun VM bug in the javadocs for these

        set/getChunkSize?

        Sure, I will also add CHANGES.TXT in the patch.
        will do as soon as I get into the office tomorrow.

        Show
        Simon Willnauer added a comment - From your post above, I thought the "read in 100MB chunks into the big array" failed to work around the issue? Ie that's why you allocated a tiny intermediate buffer? Did that turn out not to be the case? I verified that again with a 1.5.0_08 as well as the latest JVM on linux 32bit. Did not run the tests on windows yet but will do so tomorrow. Haven't had time to install the JVMs (or at least one) listed in the Sun Bug Report. Reading in chunks into a big array is fine. How about, instead of ignoring the chunk size on 64bit jre, we conditionally default it to Long.MAX_VALUE? Hmm, would remove "code duplication" - good point. The readBytes method seems to be critical but I guess there won't be any performance impact by assigning one or two extra variables. Can you link also to the Sun VM bug in the javadocs for these set/getChunkSize? Sure, I will also add CHANGES.TXT in the patch. will do as soon as I get into the office tomorrow.
        Hide
        Simon Willnauer added a comment -
        • Set chunkSize to Integer.MAX_VALUE on 64 Bit JVM
        • Removed 64bit JVM condition as chunkSize is set to maximum in 64bit case
        • Added CHANGES.TXT to patch

        @Mike: once you commit this change I will close this issue.

        Simon

        Show
        Simon Willnauer added a comment - Set chunkSize to Integer.MAX_VALUE on 64 Bit JVM Removed 64bit JVM condition as chunkSize is set to maximum in 64bit case Added CHANGES.TXT to patch @Mike: once you commit this change I will close this issue. Simon
        Hide
        Michael McCandless added a comment -

        SimpleFSDirectory is missing from the last patch?

        Show
        Michael McCandless added a comment - SimpleFSDirectory is missing from the last patch?
        Hide
        Simon Willnauer added a comment -

        added missing SimpleFSDirectory.

        Show
        Simon Willnauer added a comment - added missing SimpleFSDirectory.
        Hide
        Simon Willnauer added a comment -

        SimpleFSDirectory is missing from the last patch?

        ups!

        Show
        Simon Willnauer added a comment - SimpleFSDirectory is missing from the last patch? ups!
        Hide
        Michael McCandless added a comment -

        OK I reworked the patch some, tweaking javadocs, changes, etc., and
        simplifying the loops that read the bytes inside NIOFSDir &
        SimpleFSDir. I think it's ready to commit. Simon can you take a
        look? Thanks.

        Show
        Michael McCandless added a comment - OK I reworked the patch some, tweaking javadocs, changes, etc., and simplifying the loops that read the bytes inside NIOFSDir & SimpleFSDir. I think it's ready to commit. Simon can you take a look? Thanks.
        Hide
        Simon Willnauer added a comment -

        Simon can you take a look?

        I looked at it closely and did run testcases again. Looks good to me! Thanks for looking at it again!

        simon

        Show
        Simon Willnauer added a comment - Simon can you take a look? I looked at it closely and did run testcases again. Looks good to me! Thanks for looking at it again! simon
        Hide
        Michael McCandless added a comment -

        OK thanks Simon; I'll commit shortly.

        Show
        Michael McCandless added a comment - OK thanks Simon; I'll commit shortly.
        Hide
        Michael McCandless added a comment -

        Thanks Simon!

        Show
        Michael McCandless added a comment - Thanks Simon!

          People

          • Assignee:
            Simon Willnauer
            Reporter:
            Michael McCandless
          • Votes:
            3 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development