Lucene - Core / LUCENE-4650

Test framework should be more robust under OOM conditions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: general/test
    • Labels: None
    • Lucene Fields: New

      Description

      When tests hit an OOM (or run out of PermGen space), things go wild (hung runner, etc.).

          Activity

          Dawid Weiss added a comment -

          randomizedtesting 2.0.8 has some tweaks to make this work better. I tend to believe a "real" solution doesn't exist; this is the best I could come up with: there is a preallocated last-resort memory pool of 1MB that gets released when an OOM is thrown. A small delay is introduced to allow concurrent GC threads to do their job, then the process attempts to serialize the exception and write back some messages about the OOM.

          It still doesn't always work. I had some random OOMs even in System.out.println calls on J9. Whether the GC does or doesn't reclaim the released memory is unknown, which is an inherent race condition. The VM itself (HotSpot) also seems to require some memory when it executes Runtime.halt: under low-memory conditions it always returns with exit code 1, regardless of the status passed to halt.

          PermGen space errors are even more difficult. I tried preloading classes at startup, but this turned out to be such an unholy mess (and still without any guarantee that nothing is omitted) that I reverted the change. In my defense: I tried with Maven's surefire and it also craps out (hangs) under such conditions.
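
The mechanism described in the comment above (preallocate a reserve, release it on OOM, pause for the GC, then report) can be sketched roughly as follows. This is an illustrative sketch only, not the randomizedtesting implementation: the class name `OomReserveSketch`, the methods `runGuarded` and `reserveHeld`, the exit code `EXIT_OOM`, and the delay value are all assumptions made for the example. Only the 1MB reserve size and the order of the steps come from the comment.

```java
import java.io.PrintStream;

/** Hypothetical sketch of a last-resort memory reserve; all names are illustrative. */
final class OomReserveSketch {
    /** Reserve size mentioned in the comment: 1MB. */
    static final int RESERVE_BYTES = 1024 * 1024;

    /** Arbitrary exit code chosen for this sketch (an assumption, not from the source). */
    static final int EXIT_OOM = 2;

    /** Preallocated up front so that dropping it can free memory when an OOM occurs. */
    private static volatile byte[] reserve = new byte[RESERVE_BYTES];

    /** Returns true while the reserve is still allocated. */
    static boolean reserveHeld() {
        return reserve != null;
    }

    /**
     * Runs the task. On OutOfMemoryError: releases the reserve, waits briefly so
     * GC threads get a chance to reclaim it, then tries to report the error.
     * Returns 0 on success and EXIT_OOM after an OOM.
     */
    static int runGuarded(Runnable task, PrintStream out, long gcDelayMillis) {
        try {
            task.run();
            return 0;
        } catch (OutOfMemoryError oom) {
            reserve = null; // make the reserve unreachable; reclamation is not guaranteed
            try {
                Thread.sleep(gcDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Reporting may itself fail with another OOM if nothing was reclaimed.
            out.println("Out of memory: " + oom.getMessage());
            oom.printStackTrace(out);
            out.flush();
            return EXIT_OOM;
        }
    }

    public static void main(String[] args) {
        // Simulate the failure instead of actually exhausting the heap.
        int code = runGuarded(() -> {
            throw new OutOfMemoryError("simulated");
        }, System.err, 100L);
        // A real runner would likely pass the code to Runtime.getRuntime().halt(code);
        // the sketch only prints it so the demo exits normally.
        System.err.println("exit code would be " + code);
    }
}
```

As the comment notes, this can only ever be best-effort: whether the collector reclaims the released array before the reporting code allocates is a race, and the reporting calls themselves may need memory.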

          Commit Tag Bot added a comment -

          [trunk commit] Dawid Weiss
          http://svn.apache.org/viewvc?view=revision&revision=1427696

          LUCENE-4650: Upgrade randomized testing to version 2.0.8: make the test framework more robust under low memory conditions.

          Commit Tag Bot added a comment -

          [branch_4x commit] Dawid Weiss
          http://svn.apache.org/viewvc?view=revision&revision=1427697

          LUCENE-4650: Upgrade randomized testing to version 2.0.8: make the test framework more robust under low memory conditions.


            People

            • Assignee: Dawid Weiss
            • Reporter: Dawid Weiss
            • Votes: 0
            • Watchers: 1