Hadoop Common
  1. Hadoop Common
  2. HADOOP-7888

TestFailoverProxy fails intermittently on trunk

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.24.0
    • Fix Version/s: 2.0.0-alpha
    • Component/s: test
    • Labels:
      None

      Description

      TestFailoverProxy can fail intermittently with the failures occurring in testConcurrentMethodFailures(). The test has a race condition where the two threads may be sequentially invoking the unreliable interface rather than concurrently. Currently the proxy provider's getProxy() method contains the thread synchronization to enforce a concurrent invocation, but examining the source to RetryInvocationHandler.invoke() shows that the call to getProxy() during failover is too late to enforce a truly concurrent invocation.

      For this particular test, one thread could race ahead and block on the CountDownLatch in getProxy() before the other thread even enters RetryInvocationHandler.invoke(). If that happens the second thread will cache the newly updated value for proxyProviderFailoverCount, since the failover has mostly been processed by the original thread. Therefore the second thread ends up assuming no other thread is present, performs a failover, and the test fails because two failovers occurred instead of one.

        Activity

        Jason Lowe created issue -
        Hide
        Jason Lowe added a comment -

        This patch addresses the race condition by moving the thread synchronization out of FlipFlopProxyProvider and into the method interface being invoked. All threads will block in the method invocation before throwing the exception that triggers the failover. Therefore the failovers will be concurrent wrt. RetryInvocationHandler.invoke() because both threads will always be in invokeMethod() at the same time.

        Speaking of RetryInvocationHandler.invoke(), I also moved the proxyProvider.getProxy() call to occur only when the failover is performed per the previous comment. It appears this was only moved out of that condition to avoid deadlock when the test thread synchronization was in FlipFlopProxyProvider.getProxy().

        Show
        Jason Lowe added a comment - This patch addresses the race condition by moving the thread synchronization out of FlipFlopProxyProvider and into the method interface being invoked. All threads will block in the method invocation before throwing the exception that triggers the failover. Therefore the failovers will be concurrent wrt. RetryInvocationHandler.invoke() because both threads will always be in invokeMethod() at the same time. Speaking of RetryInvocationHandler.invoke(), I also moved the proxyProvider.getProxy() call to occur only when the failover is performed per the previous comment. It appears this was only moved out of that condition to avoid deadlock when the test thread synchronization was in FlipFlopProxyProvider.getProxy().
        Jason Lowe made changes -
        Field Original Value New Value
        Attachment hadoop-7888.patch [ 12506379 ]
        Jason Lowe made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s 0.24.0 [ 12317652 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506379/hadoop-7888.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 9 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/441//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/441//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506379/hadoop-7888.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 9 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/441//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/441//console This message is automatically generated.
        Hide
        Jason Lowe added a comment -

        I believe the -1 javadoc is unrelated to this patch.

        Show
        Jason Lowe added a comment - I believe the -1 javadoc is unrelated to this patch.
        Eli Collins made changes -
        Assignee Jason Lowe [ jlowe ]
        Hide
        Aaron T. Myers added a comment -

        Thanks a lot for tracking down this issue and providing a patch, Jason.

        A question for you about testing - given that this test would only fail intermittently before, have you tried running this test in a loop with and without the patch applied to ensure that the patch addresses the issue? I believe it should fix it, but just want to make sure. Also, can you comment on the frequency with which you'd observe this spurious test failure without the patch?

        The patch looks good to me. I'll commit it once Jason comments on the above question.

        Show
        Aaron T. Myers added a comment - Thanks a lot for tracking down this issue and providing a patch, Jason. A question for you about testing - given that this test would only fail intermittently before, have you tried running this test in a loop with and without the patch applied to ensure that the patch addresses the issue? I believe it should fix it, but just want to make sure. Also, can you comment on the frequency with which you'd observe this spurious test failure without the patch? The patch looks good to me. I'll commit it once Jason comments on the above question.
        Hide
        Jason Lowe added a comment -

        Before I submitted the patch I stepped through the code with the debugger to make sure I was seeing the two threads synchronizing within the invokeMethod(), so I have high confidence it should address the issue. RE: failure rate, I was seeing it very intermittently when running the test directly via Eclipse, but then on my machine I can see the issue nearly 100% (e.g.: 34 out of 35 tries) with this build command:

        mvn test -Dtest=TestFailoverProxy

        With the patch I've never seen it fail from within Eclipse nor from the build command even when placed in a test loop.

        Show
        Jason Lowe added a comment - Before I submitted the patch I stepped through the code with the debugger to make sure I was seeing the two threads synchronizing within the invokeMethod(), so I have high confidence it should address the issue. RE: failure rate, I was seeing it very intermittently when running the test directly via Eclipse, but then on my machine I can see the issue nearly 100% (e.g.: 34 out of 35 tries) with this build command: mvn test -Dtest=TestFailoverProxy With the patch I've never seen it fail from within Eclipse nor from the build command even when placed in a test loop.
        Hide
        Aaron T. Myers added a comment -

        +1

        Thanks a lot for this info, Jason. I'll commit the patch momentarily.

        Show
        Aaron T. Myers added a comment - +1 Thanks a lot for this info, Jason. I'll commit the patch momentarily.
        Hide
        Aaron T. Myers added a comment -

        I've just committed this to trunk. Thanks a lot, Jason!

        Show
        Aaron T. Myers added a comment - I've just committed this to trunk. Thanks a lot, Jason!
        Aaron T. Myers made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.24.0 [ 12317652 ]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #1452 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1452/)
        HADOOP-7888. TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe.

        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #1452 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1452/ ) HADOOP-7888 . TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1378 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1378/)
        HADOOP-7888. TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe.

        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1378 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1378/ ) HADOOP-7888 . TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1402 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1402/)
        HADOOP-7888. TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe.

        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #1402 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1402/ ) HADOOP-7888 . TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #921 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/921/)
        HADOOP-7888. TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe.

        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #921 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/921/ ) HADOOP-7888 . TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #888 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/888/)
        HADOOP-7888. TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe.

        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #888 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/888/ ) HADOOP-7888 . TestFailoverProxy fails intermittently on trunk. Contributed by Jason Lowe. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211728 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestFailoverProxy.java
        Suresh Srinivas made changes -
        Fix Version/s 0.23.3 [ 12320059 ]
        Target Version/s 0.24.0 [ 12317652 ] 0.24.0, 0.23.3 [ 12317652, 12320059 ]
        Arun C Murthy made changes -
        Fix Version/s 2.0.0 [ 12320352 ]
        Fix Version/s 0.24.0 [ 12317652 ]
        Fix Version/s 0.23.3 [ 12320059 ]
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Jason Lowe
            Reporter:
            Jason Lowe
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development