Lucene - Core
  1. Lucene - Core
  2. LUCENE-4987

Test framework may fail internally under J9 (some serious JVM exclusive-section issue).

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This was reported by Shai. The runner failed with an exception:

      [junit4:junit4] Caused by: java.util.NoSuchElementException
      [junit4:junit4]         at java.util.ArrayDeque.removeFirst(ArrayDeque.java:289)
      [junit4:junit4]         at java.util.ArrayDeque.pop(ArrayDeque.java:518)
      [junit4:junit4]         at com.carrotsearch.ant.tasks.junit4.JUnit4$1.onSlaveIdle(JUnit4.java:809)
      [junit4:junit4]         ... 17 more
      

      The problem is that this is impossible because the code around JUnit4.java:809 looks like this:

           final Deque<String> stealingQueue = new ArrayDeque<String>(...);
           aggregatedBus.register(new Object() {
              @Subscribe
              public void onSlaveIdle(SlaveIdle slave) {
                if (stealingQueue.isEmpty()) {
                  ...
                } else {
                  String suiteName = stealingQueue.pop();
                  ...
                }
              }
            });
      

      and the contract on Guava's EventBus states that:

       * <p>The EventBus guarantees that it will not call a handler method from
       * multiple threads simultaneously, unless the method explicitly allows it by
       * bearing the {@link AllowConcurrentEvents} annotation.  If this annotation is
       * not present, handler methods need not worry about being reentrant, unless
       * also called from outside the EventBus
      

      I wrote a simple snippet of code that does it in a loop and indeed, two threads can appear in the critical section at once. This is not reproducible on Hotspot and only appears to be the problem on J9/1.7/Windows (J9 1.6 works fine).

      I'll provide a workaround in the runner (an explicit monitor seems to be working) but this is some serious J9 issue.

      1. j9.zip
        2.22 MB
        Dawid Weiss

        Activity

        Dawid Weiss created issue -
        Hide
        Dawid Weiss added a comment -

        Compile and run under J9 (and hotspot):

        javac -cp guava-14.0.1.jar;junit-4.10.jar;randomizedtesting-runner-2.0.9.jar J9SanityCheck.java
        java -cp guava-14.0.1.jar;junit-4.10.jar;randomizedtesting-runner-2.0.9.jar;. J9SanityCheck
        
        Show
        Dawid Weiss added a comment - Compile and run under J9 (and hotspot): javac -cp guava-14.0.1.jar;junit-4.10.jar;randomizedtesting-runner-2.0.9.jar J9SanityCheck.java java -cp guava-14.0.1.jar;junit-4.10.jar;randomizedtesting-runner-2.0.9.jar;. J9SanityCheck
        Dawid Weiss made changes -
        Field Original Value New Value
        Attachment j9.zip [ 12582291 ]
        Hide
        Dawid Weiss added a comment -

        I think I found the reason. J9 seems to be optimizing away the following code:

          @Override public synchronized void handleEvent(Object event)
              throws InvocationTargetException {
            super.handleEvent(event);
          }
        

        this method is removed entirely as the stack trace from the 'wtf' exception shows:

        java.lang.reflect.InvocationTargetException
        	at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        	at java.lang.reflect.Method.invoke(Method.java:613)
        	at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
        	at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
        	at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
        	at com.google.common.eventbus.EventBus.post(EventBus.java:267)
        	at com.carrotsearch.ant.tasks.junit4.it.TestJ9SanityCheck$2.call(TestJ9SanityCheck.java:75)
        	at com.carrotsearch.ant.tasks.junit4.it.TestJ9SanityCheck$2.call(TestJ9SanityCheck.java:1)
        	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345)
        	at java.util.concurrent.FutureTask.run(FutureTask.java:177)
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
        	at java.lang.Thread.run(Thread.java:777)
        Caused by: java.lang.RuntimeException: Wtf? two threads in a handler: Thread[pool-100-thread-1,5,TGRP-TestJ9SanityCheck] and Thread[pool-100-thread-2,5,TGRP-TestJ9SanityCheck]
        	at com.carrotsearch.ant.tasks.junit4.it.TestJ9SanityCheck$1.onSlaveIdle(TestJ9SanityCheck.java:52)
        	... 14 more
        

        The eventhandler type is definitely SynchronizedEventHandler and on Hotspot you do get a synchronized indirection layer (a call under a synchronized monitor), whereas on J9 this becomes a full race condition.

        Show
        Dawid Weiss added a comment - I think I found the reason. J9 seems to be optimizing away the following code: @Override public synchronized void handleEvent( Object event) throws InvocationTargetException { super .handleEvent(event); } this method is removed entirely as the stack trace from the 'wtf' exception shows: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:613) at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74) at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314) at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296) at com.google.common.eventbus.EventBus.post(EventBus.java:267) at com.carrotsearch.ant.tasks.junit4.it.TestJ9SanityCheck$2.call(TestJ9SanityCheck.java:75) at com.carrotsearch.ant.tasks.junit4.it.TestJ9SanityCheck$2.call(TestJ9SanityCheck.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang. Thread .run( Thread .java:777) Caused by: java.lang.RuntimeException: Wtf? two threads in a handler: Thread [pool-100-thread-1,5,TGRP-TestJ9SanityCheck] and Thread [pool-100-thread-2,5,TGRP-TestJ9SanityCheck] at com.carrotsearch.ant.tasks.junit4.it.TestJ9SanityCheck$1.onSlaveIdle(TestJ9SanityCheck.java:52) ... 14 more The eventhandler type is definitely SynchronizedEventHandler and on Hotspot you do get a synchronized indirection layer (a call under a synchronized monitor), whereas on J9 this becomes a full race condition.
        Hide
        Shai Erera added a comment -

        Thanks Dawid. I filed a PMR with the J9 team. Will post back when they respond/resolve the issue.

        Show
        Shai Erera added a comment - Thanks Dawid. I filed a PMR with the J9 team. Will post back when they respond/resolve the issue.
        Hide
        Dawid Weiss added a comment -

        Yeah, I'm confident it's this. I've replaced the synchronized method flag with an explicit monitor and it works then.

          @Override public void handleEvent(Object event)
              throws InvocationTargetException {
              synchronized (this) {
                  super.handleEvent(event);
              }
          }
        

        So it's a JVM issue.

        Show
        Dawid Weiss added a comment - Yeah, I'm confident it's this. I've replaced the synchronized method flag with an explicit monitor and it works then. @Override public void handleEvent( Object event) throws InvocationTargetException { synchronized ( this ) { super .handleEvent(event); } } So it's a JVM issue.
        Hide
        Dawid Weiss added a comment - - edited

        I've released and pushed rr 2.0.10 to maven repositories with a workaround for J9's bug.

        Show
        Dawid Weiss added a comment - - edited I've released and pushed rr 2.0.10 to maven repositories with a workaround for J9's bug.
        Dawid Weiss made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 5.0 [ 12321663 ]
        Resolution Fixed [ 1 ]
        Dawid Weiss made changes -
        Fix Version/s 4.4 [ 12324323 ]
        Hide
        Uwe Schindler added a comment -

        I added this bug to the JVM bugs wiki page: http://wiki.apache.org/lucene-java/JavaBugs

        Show
        Uwe Schindler added a comment - I added this bug to the JVM bugs wiki page: http://wiki.apache.org/lucene-java/JavaBugs
        Hide
        Shai Erera added a comment -

        The JIT team has confirmed this to be a bug in the JIT generation code – it apparently wrongly optimized away handleEvent calling super(), neglecting the fact that it is synchronized. They are working on a fix. Once I'll know which J9 version will include it, I'll update the issue.

        In the meanwhile they suggested we try to run with -Xjit:disableNewMethodOverride as a workaround.

        Show
        Shai Erera added a comment - The JIT team has confirmed this to be a bug in the JIT generation code – it apparently wrongly optimized away handleEvent calling super(), neglecting the fact that it is synchronized. They are working on a fix. Once I'll know which J9 version will include it, I'll update the issue. In the meanwhile they suggested we try to run with -Xjit:disableNewMethodOverride as a workaround.
        Hide
        Dawid Weiss added a comment -

        Mhmm... took them a while, don't you think? I mean:

        I think I found the reason. J9 seems to be optimizing away the following code: [...]

        took me an hour or so to pinpoint this, even without access to the JIT's source code

        Show
        Dawid Weiss added a comment - Mhmm... took them a while, don't you think? I mean: I think I found the reason. J9 seems to be optimizing away the following code: [...] took me an hour or so to pinpoint this, even without access to the JIT's source code
        Hide
        Dawid Weiss added a comment -

        But thanks for the follow-up anyway, Shai.

        Show
        Dawid Weiss added a comment - But thanks for the follow-up anyway, Shai.
        Hide
        Shai Erera added a comment -

        Sigh ... what can I say?

        I think it took them a while to understand why it's optimizing away. From what I understand, their code failed to check the method flags, in this case 'synchronized' and therefore assumed this is just an empty override.

        Show
        Shai Erera added a comment - Sigh ... what can I say? I think it took them a while to understand why it's optimizing away. From what I understand, their code failed to check the method flags, in this case 'synchronized' and therefore assumed this is just an empty override.
        Hide
        Shai Erera added a comment -

        Word is that the fix will be included in the next J9 SR.

        Show
        Shai Erera added a comment - Word is that the fix will be included in the next J9 SR.
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues
        Steve Rowe made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        7h 37m 1 Dawid Weiss 08/May/13 21:52
        Resolved Resolved Closed Closed
        75d 21h 44m 1 Steve Rowe 23/Jul/13 19:37

          People

          • Assignee:
            Dawid Weiss
            Reporter:
            Dawid Weiss
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development