ACCUMULO-2764

Stopping MAC before its processes have fully started causes an indefinite hang

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.1, 1.6.0
    • Fix Version/s: 1.5.2, 1.6.1, 1.7.0
    • Component/s: mini
    • Labels:
      None
    • Environment:

      OpenJDK 1.6.0, CentOS 6.5, 2CPU, 6GB RAM (virtual hardware)

      Description

      I saw this testing 1.6.0-RC5.

      Calling process.destroy() and then process.waitFor(), as MiniAccumuloCluster does in its stop method, before the process has fully started appears to cause an indefinite hang.

      I saw this most recently in MiniAccumuloClusterGCTest.testAccurateProcessListReturned, which gets a ProcessReference and then immediately shuts down MAC, though it was also the root cause of ACCUMULO-2756. In this instance, the test got stuck in the MAC teardown.

      "main" prio=10 tid=0x00007f3cf4008800 nid=0x2b19 in Object.wait() [0x00007f3cf8f9c000]
         java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(Native Method)
              - waiting on <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
              at java.lang.Object.wait(Object.java:502)
              at java.lang.UNIXProcess.waitFor(UNIXProcess.java:181)
              - locked <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
              at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.stop(MiniAccumuloClusterImpl.java:607)
              at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterGCTest.tearDownMiniCluster(MiniAccumuloClusterGCTest.java:74)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:622)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
              at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
              at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
              at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
              at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
              at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
              at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
              at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
              at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      

      It appears that destroy() doesn't actually succeed in destroying a process that is still starting, so waitFor() waits indefinitely. I haven't debugged further. It may be a JVM bug, a limitation in the Java Process API, or some UNIX signal-handling quirk during process instantiation that destroy() cannot account for.

      One fix could be to make start() wait until the metadata table can be scanned before it returns, to ensure all processes are actually running and ready. Another fix would be to have the teardown code try another destroy if waitFor() doesn't return after a reasonable amount of time.
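The second suggestion, retrying destroy() when the process has not exited after a reasonable wait, might look roughly like the sketch below. It is only an illustration, not the code that was committed for this issue: the class RetryingDestroy, the method destroyWithRetry, and its parameters are hypothetical names. It polls Process.exitValue() instead of calling waitFor(), so the caller never blocks without a bound.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of MiniAccumuloCluster.
final class RetryingDestroy {

  /**
   * Calls destroy() up to maxAttempts times, waiting up to waitMillisPerAttempt
   * after each call for the process to exit.
   *
   * @return true if the process was observed to have exited, false otherwise
   */
  static boolean destroyWithRetry(Process process, int maxAttempts, long waitMillisPerAttempt)
      throws InterruptedException {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      process.destroy();
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillisPerAttempt);
      while (System.nanoTime() < deadline) {
        if (hasExited(process)) {
          return true;
        }
        Thread.sleep(50); // short poll interval, chosen arbitrarily
      }
    }
    return hasExited(process);
  }

  // exitValue() throws IllegalThreadStateException while the process is still running.
  private static boolean hasExited(Process process) {
    try {
      process.exitValue();
      return true;
    } catch (IllegalThreadStateException stillRunning) {
      return false;
    }
  }
}
```

A caller could treat a false return as a teardown failure and report it, rather than hanging the test run.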

          Activity

          Josh Elser added a comment -

          Wrapped the destroy and waitFor calls in a Callable with a 30s timeout. It still doesn't address whatever the underlying reason was that this was happening (because I still don't know what it was), but it should prevent indefinitely running tests.
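A minimal sketch of the approach described in this comment, assuming a single-thread ExecutorService and Future.get with a timeout. The actual change is in commit 57f2763; the class BoundedProcessStop and the method stopWithTimeout here are hypothetical names used only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not the committed MiniAccumuloCluster code.
final class BoundedProcessStop {

  /**
   * Runs destroy() followed by waitFor() on a worker thread and waits at most
   * the given timeout for the result.
   *
   * @return the exit code of the process
   * @throws TimeoutException if the process has not exited within the timeout
   */
  static int stopWithTimeout(final Process process, long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<Integer> future = executor.submit(new Callable<Integer>() {
        @Override
        public Integer call() throws InterruptedException {
          process.destroy();
          return process.waitFor();
        }
      });
      return future.get(timeout, unit);
    } finally {
      // Interrupts the worker if it is still blocked in waitFor().
      executor.shutdownNow();
    }
  }
}
```

With a 30-second limit, the call would be stopWithTimeout(process, 30, TimeUnit.SECONDS). A TimeoutException means the process may still be alive, so this bounds the wait without explaining why destroy() had no effect.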

          ASF subversion and git services added a comment -

          Commit 57f27635b0414ae3198995f932ccac2501eb73cd in accumulo's branch refs/heads/master from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57f2763 ]

          ACCUMULO-2764 Wrap the MAC process termination in a Callable to get timeout semantics

          ASF subversion and git services added a comment -

          Commit 57f27635b0414ae3198995f932ccac2501eb73cd in accumulo's branch refs/heads/1.6.1-SNAPSHOT from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57f2763 ]

          ACCUMULO-2764 Wrap the MAC process termination in a Callable to get timeout semantics

          ASF subversion and git services added a comment -

          Commit 57f27635b0414ae3198995f932ccac2501eb73cd in accumulo's branch refs/heads/1.5.2-SNAPSHOT from Josh Elser
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57f2763 ]

          ACCUMULO-2764 Wrap the MAC process termination in a Callable to get timeout semantics

          Josh Elser added a comment -

          We do the same destroy() && waitFor() in 1.5 too.


            People

            • Assignee:
              Josh Elser
            • Reporter:
              Christopher Tubbs
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 0.5h
