Lucene - Core
  1. Lucene - Core
  2. LUCENE-4901

TestIndexWriterOnJRECrash should work on any JRE vendor via Runtime.halt()

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, 6.0
    • Component/s: general/test
    • Labels:
      None
    • Environment:

      Red Hat EL 6.3
      IBM Java 1.6.0
      ANT 1.9.0

    • Lucene Fields:
      New, Patch Available

      Description

      I successfully compiled Lucene 4.2 with IBM.
      Then ran unit tests with the nightly option set to "true"

      The test case TestIndexWriterOnJRECrash was skipped returning "IBM Corporation JRE not supported":

      [junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterOnJRECrash
      [junit4:junit4] IGNOR/A 0.28s | TestIndexWriterOnJRECrash.testNRTThreads
      [junit4:junit4] > Assumption #1: IBM Corporation JRE not supported.
      [junit4:junit4] Completed in 0.68s, 1 test, 1 skipped

      1. LUCENE-4901.patch
        5 kB
        Dawid Weiss
      2. LUCENE-4901.patch
        2 kB
        Dawid Weiss
      3. test-IBM-java-vendor.patch
        0.9 kB
        Rodrigo Trujillo

        Activity

        Hide
        Rodrigo Trujillo added a comment -

        Test if IBM JRE is being used.
        Avoid TestIndexWriterOnJRECrash be skipped

        Show
        Rodrigo Trujillo added a comment - Test if IBM JRE is being used. Avoid TestIndexWriterOnJRECrash be skipped
        Hide
        Dawid Weiss added a comment -

        I think I can actually handle this one – I have JRE crash code for all major vendors (never thought I'd say that...). Let me see.

        Show
        Dawid Weiss added a comment - I think I can actually handle this one – I have JRE crash code for all major vendors (never thought I'd say that...). Let me see.
        Hide
        Rodrigo Trujillo added a comment -

        Hi Dawid,
        just attached a patch.

        It worked fine in my tests.

        I wonder whether it could be applied to trunk too

        Show
        Rodrigo Trujillo added a comment - Hi Dawid, just attached a patch. It worked fine in my tests. I wonder whether it could be applied to trunk too
        Hide
        Robert Muir added a comment -

        I remember the current crash code worked for IBM, but there was some reason we disabled it.

        I am having trouble remembering the reason why: I think it was something crazy, like IBM being pretty verbose when it crashes and filling up your home directory with huge logs? I'm not sure about that though...

        Show
        Robert Muir added a comment - I remember the current crash code worked for IBM, but there was some reason we disabled it. I am having trouble remembering the reason why: I think it was something crazy, like IBM being pretty verbose when it crashes and filling up your home directory with huge logs? I'm not sure about that though...
        Hide
        Dawid Weiss added a comment -

        IBM definitely dumps a lot of stuff when it crashes. But isn't build/ cwd cleaned up on each build? I run such tests for randomizedtesting so if needed I can crash anything (J9, jrockit, hotspot). The way I do it is I link to a small snippet of native code (there are binaries for all platforms) which dereferences a null pointer. This core-dumps the JVM sort of by-design.

        Show
        Dawid Weiss added a comment - IBM definitely dumps a lot of stuff when it crashes. But isn't build/ cwd cleaned up on each build? I run such tests for randomizedtesting so if needed I can crash anything (J9, jrockit, hotspot). The way I do it is I link to a small snippet of native code (there are binaries for all platforms) which dereferences a null pointer. This core-dumps the JVM sort of by-design.
        Hide
        Robert Muir added a comment -

        According to SVN logs, when i disabled this it was because:

        Log:
        can't reliably crash recent IBM J9 jvms this way
        

        And i did this because occasionally, instead of crashing, it would cause the jvm to hang instead.

        so we need a better crash code for this JVM

        Show
        Robert Muir added a comment - According to SVN logs, when i disabled this it was because: Log: can't reliably crash recent IBM J9 jvms this way And i did this because occasionally, instead of crashing, it would cause the jvm to hang instead. so we need a better crash code for this JVM
        Hide
        Uwe Schindler added a comment -

        This remembers me to upgrade IBM J9 on Policeman Jenkins and try Jenkins tests again. Currently the Lucene tests does not successfully pass with IBM J9 for every random seed, because IBM J9 has too many bugs (miscompiled bytecode/loops/...).

        Show
        Uwe Schindler added a comment - This remembers me to upgrade IBM J9 on Policeman Jenkins and try Jenkins tests again. Currently the Lucene tests does not successfully pass with IBM J9 for every random seed, because IBM J9 has too many bugs (miscompiled bytecode/loops/...).
        Hide
        Dawid Weiss added a comment -

        It crashed for me on every build, always, on everything I've tried. Perhaps if it traps and there are some native threads elsewhere it can hang, don't know.

        Show
        Dawid Weiss added a comment - It crashed for me on every build, always, on everything I've tried. Perhaps if it traps and there are some native threads elsewhere it can hang, don't know.
        Hide
        Dawid Weiss added a comment -

        Wait... I'm not saying to dereference an NPE via Unsafe – this, indeed, doesn't always work. I'm talking about using a snippet of native code to do it (and this is bulletproof).

        Show
        Dawid Weiss added a comment - Wait... I'm not saying to dereference an NPE via Unsafe – this, indeed, doesn't always work. I'm talking about using a snippet of native code to do it (and this is bulletproof).
        Hide
        Robert Muir added a comment -

        Right, but i'm not sure its worth it to bring in native code just for this test (its cheating, and would complicate a lot of things in the build for this one guy).

        i'm also unhappy with the test currently, because it doesn't even detect the bugs it was written to find (like LUCENE-4738).

        Show
        Robert Muir added a comment - Right, but i'm not sure its worth it to bring in native code just for this test (its cheating, and would complicate a lot of things in the build for this one guy). i'm also unhappy with the test currently, because it doesn't even detect the bugs it was written to find (like LUCENE-4738 ).
        Hide
        Robert Muir added a comment -

        Maybe the 'alternative' crash should be just that... grab its own PID and fork off a process to kill -9 itself?

        Show
        Robert Muir added a comment - Maybe the 'alternative' crash should be just that... grab its own PID and fork off a process to kill -9 itself?
        Hide
        Dawid Weiss added a comment -

        The native bits are tiny and could be unpacked dynamically. But you're right – it's a pain. I don't know if there's a kill -9 equivalent on Windows, otherwise you're right – it'd be a nice solution.

        (Just remember it's impossible to get the PID of a forked process from Java below Java 8 – you need to use the workarounds we use, pass the ID from the child back to the parent etc. Uglier than native bits I think).

        Show
        Dawid Weiss added a comment - The native bits are tiny and could be unpacked dynamically. But you're right – it's a pain. I don't know if there's a kill -9 equivalent on Windows, otherwise you're right – it'd be a nice solution. (Just remember it's impossible to get the PID of a forked process from Java below Java 8 – you need to use the workarounds we use, pass the ID from the child back to the parent etc. Uglier than native bits I think).
        Hide
        Uwe Schindler added a comment - - edited

        I am not sure: Instead of crashing, maybe using Runtime.halt() from a parallel thread after some timeout? Runtime.halt() kills the JVM like kill -9 and does not run any shutdown hooks or let GC finish its work? Open files are closed in any case, because the kernel does this when the process exits (also on Unsafe's SIGSEGV).

        Show
        Uwe Schindler added a comment - - edited I am not sure: Instead of crashing, maybe using Runtime.halt() from a parallel thread after some timeout? Runtime.halt() kills the JVM like kill -9 and does not run any shutdown hooks or let GC finish its work? Open files are closed in any case, because the kernel does this when the process exits (also on Unsafe's SIGSEGV).
        Hide
        Robert Muir added a comment -

        If we change the test to do this, we need to give it a backdoor for the security manager i think.

        Show
        Robert Muir added a comment - If we change the test to do this, we need to give it a backdoor for the security manager i think.
        Hide
        Uwe Schindler added a comment -

        This is a child JVM of the test, it has no security manager (at least it needs none)?

        Show
        Uwe Schindler added a comment - This is a child JVM of the test, it has no security manager (at least it needs none)?
        Hide
        Dawid Weiss added a comment -

        Yep, system.halt will work here too. I just like it when it really breaks without the possibility to clean up properly You get to see what each jvm dumps etc.

        But seriously, a halt from a background thread should indeed do fine and is portable. I did encounter very rare cases of halt() not exiting under very low memory conditions but I don't think this is the case here.

        Show
        Dawid Weiss added a comment - Yep, system.halt will work here too. I just like it when it really breaks without the possibility to clean up properly You get to see what each jvm dumps etc. But seriously, a halt from a background thread should indeed do fine and is portable. I did encounter very rare cases of halt() not exiting under very low memory conditions but I don't think this is the case here.
        Hide
        Dawid Weiss added a comment -

        I've removed the vendor-check assumption entirely and call Runtime.halt() instead of messing with Unsafe/ zero pointer dereference.

        Take a look, if there are no objections I'll commit it in.

        Show
        Dawid Weiss added a comment - I've removed the vendor-check assumption entirely and call Runtime.halt() instead of messing with Unsafe/ zero pointer dereference. Take a look, if there are no objections I'll commit it in.
        Hide
        Michael McCandless added a comment -

        Is Runtime.halt too "nice"? Ie is there any chance it will do any cleanup at all? (The javadocs seem to indicate no but still....).

        I like the crashing version because it's most "accurate" to what we are trying to test here ... ie rather than asking the JVM to shoot itself, we do the shooting.

        Show
        Michael McCandless added a comment - Is Runtime.halt too "nice"? Ie is there any chance it will do any cleanup at all? (The javadocs seem to indicate no but still....). I like the crashing version because it's most "accurate" to what we are trying to test here ... ie rather than asking the JVM to shoot itself, we do the shooting.
        Hide
        Dawid Weiss added a comment -

        I think Runtime.halt is a very similar thing – it (should) stop the threads immediately without any finalization blocks/ propagation of ThreadDeathException or otherwise. Any open file handles will be closed by the operating system on a "real" crash so I don't think this makes any difference.

        I do have native code that crashes J9 and JRockit too but this has side effects – J9 dumps a lot of debugging state information, for example. Right now we're testing just a subset of JVMs, if the tradeoff is to increase the coverage on other JVMs I think it's worth it to use Runtime.halt().

        Show
        Dawid Weiss added a comment - I think Runtime.halt is a very similar thing – it (should) stop the threads immediately without any finalization blocks/ propagation of ThreadDeathException or otherwise. Any open file handles will be closed by the operating system on a "real" crash so I don't think this makes any difference. I do have native code that crashes J9 and JRockit too but this has side effects – J9 dumps a lot of debugging state information, for example. Right now we're testing just a subset of JVMs, if the tradeoff is to increase the coverage on other JVMs I think it's worth it to use Runtime.halt().
        Hide
        Michael McCandless added a comment -

        Maybe we could use our gun on the JVMs where it works / doesn't have heavy side effects, and Runtime.halt on all others?

        Show
        Michael McCandless added a comment - Maybe we could use our gun on the JVMs where it works / doesn't have heavy side effects, and Runtime.halt on all others?
        Hide
        Dawid Weiss added a comment -

        Sure.

        Show
        Dawid Weiss added a comment - Sure.
        Hide
        Dawid Weiss added a comment -

        A version with runtime.halt() fallback for JVMs on which unsafe npe doesn't crash.

        I also fixed the stream pumping from subprocess; previously it could have hung if something was printed to stderr (due to blocked pipe).

        Show
        Dawid Weiss added a comment - A version with runtime.halt() fallback for JVMs on which unsafe npe doesn't crash. I also fixed the stream pumping from subprocess; previously it could have hung if something was printed to stderr (due to blocked pipe).
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues

          People

          • Assignee:
            Dawid Weiss
            Reporter:
            Rodrigo Trujillo
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development