Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-12611

TestZKSignerSecretProvider#testMultipleInit occasionally fail

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      https://builds.apache.org/job/Hadoop-Common-trunk/2053/testReport/junit/org.apache.hadoop.security.authentication.util/TestZKSignerSecretProvider/testMultipleInit/

      Error Message

      expected null, but was:<[B@142bad79>

      Stacktrace

      java.lang.AssertionError: expected null, but was:<[B@142bad79>
      at org.junit.Assert.fail(Assert.java:88)
      at org.junit.Assert.failNotNull(Assert.java:664)
      at org.junit.Assert.assertNull(Assert.java:646)
      at org.junit.Assert.assertNull(Assert.java:656)
      at org.apache.hadoop.security.authentication.util.TestZKSignerSecretProvider.testMultipleInit(TestZKSignerSecretProvider.java:149)

      I think the failure was introduced after HADOOP-12181

      This is likely where the root cause is:

      2015-11-29 00:24:33,325 ERROR ZKSignerSecretProvider - An unexpected exception occurred while pulling data fromZooKeeper
      java.lang.IllegalStateException: instance must be started before calling this method
      at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
      at org.apache.curator.framework.imps.CuratorFrameworkImpl.getData(CuratorFrameworkImpl.java:363)
      at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider.pullFromZK(ZKSignerSecretProvider.java:341)
      at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider.rollSecret(ZKSignerSecretProvider.java:264)
      at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$575f06d8.CGLIB$rollSecret$2(<generated>)
      at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$575f06d8$$FastClassByMockitoWithCGLIB$$6f94a716.invoke(<generated>)
      at org.mockito.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:216)
      at org.mockito.internal.creation.AbstractMockitoMethodProxy.invokeSuper(AbstractMockitoMethodProxy.java:10)
      at org.mockito.internal.invocation.realmethod.CGLIBProxyRealMethod.invoke(CGLIBProxyRealMethod.java:22)
      at org.mockito.internal.invocation.realmethod.FilteredCGLIBProxyRealMethod.invoke(FilteredCGLIBProxyRealMethod.java:27)
      at org.mockito.internal.invocation.Invocation.callRealMethod(Invocation.java:211)
      at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:36)
      at org.mockito.internal.MockHandler.handle(MockHandler.java:99)
      at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
      at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$575f06d8.rollSecret(<generated>)
      at org.apache.hadoop.security.authentication.util.RolloverSignerSecretProvider$1.run(RolloverSignerSecretProvider.java:97)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      1. HADOOP-12611.006.patch
        16 kB
        Eric Badger
      2. HADOOP-12611.005.patch
        9 kB
        Eric Badger
      3. HADOOP-12611.004.patch
        14 kB
        Eric Badger
      4. HADOOP-12611.003.patch
        11 kB
        Eric Badger
      5. HADOOP-12611.002.patch
        6 kB
        Wei-Chiu Chuang
      6. HADOOP-12611.001.patch
        10 kB
        Wei-Chiu Chuang

        Issue Links

          Activity

          Hide
          rkanter Robert Kanter added a comment -

          Sure. Also committed to branch-2.8!

          Show
          rkanter Robert Kanter added a comment - Sure. Also committed to branch-2.8!
          Hide
          ebadger Eric Badger added a comment -

          Robert Kanter, can we commit this to 2.8 as well? The cherry-pick is clean and this is where I originally saw the failure.

          Show
          ebadger Eric Badger added a comment - Robert Kanter , can we commit this to 2.8 as well? The cherry-pick is clean and this is where I originally saw the failure.
          Hide
          ebadger Eric Badger added a comment -

          Thanks, Robert Kanter!

          Show
          ebadger Eric Badger added a comment - Thanks, Robert Kanter !
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10565 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10565/)
          HADOOP-12611. TestZKSignerSecretProvider#testMultipleInit occasionally (rkanter: rev c183b9de8d072a35dcde96a20b1550981f886e86)

          • (edit) hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/TestZKSignerSecretProvider.java
          • (edit) hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RolloverSignerSecretProvider.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10565 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10565/ ) HADOOP-12611 . TestZKSignerSecretProvider#testMultipleInit occasionally (rkanter: rev c183b9de8d072a35dcde96a20b1550981f886e86) (edit) hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/TestZKSignerSecretProvider.java (edit) hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RolloverSignerSecretProvider.java
          Hide
          rkanter Robert Kanter added a comment -

          Thanks Eric Badger. Committed to trunk and branch-2!

          Show
          rkanter Robert Kanter added a comment - Thanks Eric Badger . Committed to trunk and branch-2!
          Hide
          rkanter Robert Kanter added a comment -

          +1

          Show
          rkanter Robert Kanter added a comment - +1
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 5s trunk passed
          +1 compile 7m 15s trunk passed
          +1 checkstyle 0m 12s trunk passed
          +1 mvnsite 0m 17s trunk passed
          +1 mvneclipse 0m 11s trunk passed
          +1 findbugs 0m 21s trunk passed
          +1 javadoc 0m 11s trunk passed
          +1 mvninstall 0m 13s the patch passed
          +1 compile 7m 8s the patch passed
          +1 javac 7m 8s the patch passed
          -0 checkstyle 0m 12s hadoop-common-project/hadoop-auth: The patch generated 27 new + 2 unchanged - 2 fixed = 29 total (was 4)
          +1 mvnsite 0m 17s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 30s the patch passed
          +1 javadoc 0m 11s the patch passed
          +1 unit 3m 6s hadoop-auth in the patch passed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          29m 22s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HADOOP-12611
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831957/HADOOP-12611.006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9ff44ff61ca7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 35b9d7d
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10691/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10691/testReport/
          modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10691/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 5s trunk passed +1 compile 7m 15s trunk passed +1 checkstyle 0m 12s trunk passed +1 mvnsite 0m 17s trunk passed +1 mvneclipse 0m 11s trunk passed +1 findbugs 0m 21s trunk passed +1 javadoc 0m 11s trunk passed +1 mvninstall 0m 13s the patch passed +1 compile 7m 8s the patch passed +1 javac 7m 8s the patch passed -0 checkstyle 0m 12s hadoop-common-project/hadoop-auth: The patch generated 27 new + 2 unchanged - 2 fixed = 29 total (was 4) +1 mvnsite 0m 17s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 30s the patch passed +1 javadoc 0m 11s the patch passed +1 unit 3m 6s hadoop-auth in the patch passed. +1 asflicense 0m 21s The patch does not generate ASF License warnings. 29m 22s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HADOOP-12611 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831957/HADOOP-12611.006.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 9ff44ff61ca7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 35b9d7d Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10691/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10691/testReport/ modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10691/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          ebadger Eric Badger added a comment -

          Looks like I accidentally uploaded the diff of my commits, not the diff against trunk. Attaching new patch against trunk.

          Show
          ebadger Eric Badger added a comment - Looks like I accidentally uploaded the diff of my commits, not the diff against trunk. Attaching new patch against trunk.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 5s HADOOP-12611 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue HADOOP-12611
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831821/HADOOP-12611.005.patch
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10680/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 5s HADOOP-12611 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue HADOOP-12611 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831821/HADOOP-12611.005.patch Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10680/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          ebadger Eric Badger added a comment -

          Robert Kanter, thanks for the comments! I'm attaching a patch that I believe addresses each of them.

          1. Done
          2. Done
          3. This is a little bit trickier, actually. Since allSecrets is an array of 2 secrets, we need to roll the secret twice before it will actually reflect that the "race" was won by A or B. This is what I gathered by studying the code and by testing it out through unit testing. So I've added in 2 rolls of the secret which are both "won" by either A or B depending on the order.
          4. Done

          I also removed testMultipleInit.

          Show
          ebadger Eric Badger added a comment - Robert Kanter , thanks for the comments! I'm attaching a patch that I believe addresses each of them. 1. Done 2. Done 3. This is a little bit trickier, actually. Since allSecrets is an array of 2 secrets, we need to roll the secret twice before it will actually reflect that the "race" was won by A or B. This is what I gathered by studying the code and by testing it out through unit testing. So I've added in 2 rolls of the secret which are both "won" by either A or B depending on the order . 4. Done I also removed testMultipleInit .
          Hide
          rkanter Robert Kanter added a comment - - edited

          Ya, looking at it now with the changes, testMultipleInit is a duplicate of case 1 in testMultipleUnsynchronized; so I think removing it makes sense.

          A few comments on testMultipleUnsynchronized:

          1. The javadoc shouldn't have int in the @param
          2. We should do more asserting of the initial state (e.g. there's no checking on secretProviderB). You can probably just copy stuff from testMultipleInit for this.
          3. At the end, when we're doing this:
                  if (Arrays.equals(secretA3, currentSecretA)) {
                    Assert.assertArrayEquals(secretA3, allSecretsA[0]);
                  } else if (Arrays.equals(secretB3, currentSecretB)) {
                    Assert.assertArrayEquals(secretB3, allSecretsA[0]);
                  } else {
                    Assert.fail("It appears that they all agreed on the same secret, but "
                            + "not one of the secrets they were supposed to");
                  }
            

            we now know the correct order that should have happened (via order), so we can make this another switch statement instead. That way, we'll assert the expected order based on the roll order, instead of just one of the two orders.

          4. I think we can just rename this to testMultiple now.
          Show
          rkanter Robert Kanter added a comment - - edited Ya, looking at it now with the changes, testMultipleInit is a duplicate of case 1 in testMultipleUnsynchronized ; so I think removing it makes sense. A few comments on testMultipleUnsynchronized : The javadoc shouldn't have int in the @param We should do more asserting of the initial state (e.g. there's no checking on secretProviderB ). You can probably just copy stuff from testMultipleInit for this. At the end, when we're doing this: if (Arrays.equals(secretA3, currentSecretA)) { Assert.assertArrayEquals(secretA3, allSecretsA[0]); } else if (Arrays.equals(secretB3, currentSecretB)) { Assert.assertArrayEquals(secretB3, allSecretsA[0]); } else { Assert.fail( "It appears that they all agreed on the same secret, but " + "not one of the secrets they were supposed to" ); } we now know the correct order that should have happened (via order ), so we can make this another switch statement instead. That way, we'll assert the expected order based on the roll order, instead of just one of the two orders. I think we can just rename this to testMultiple now.
          Hide
          ebadger Eric Badger added a comment -

          Does that make sense?

          Yes. It does. However, it means that testMultipleInit only has 1 possible outcome, since there is only 1 rollSecret call. It makes me wonder what it gives us that testMultipleUnsynchronized doesn't already give us. testMultipleUnsynchronized has 3 secretProviders and rolls the secrets twice, while testMultipleInit has 2 secretProviders and rolls the secrets once. So it seems like testMultipleInit is just a subset of testMultipleUnsynchronized. Robert Kanter, what do you think?

          I'm uploading a patch that makes it so that the tests only test the ordering between rollSecret calls (meaning that testMultipleInit only has 1 outcome, while testMultipleUnsynchronized has 2). If you agree that testMultipleInit is a redundant subset of testMultipleUnsynchronized then I can upload a patch removing it completely. And of course if you think that there is functionality in testMultipleInit that I've stripped out, please do let me know.

          Show
          ebadger Eric Badger added a comment - Does that make sense? Yes. It does. However, it means that testMultipleInit only has 1 possible outcome, since there is only 1 rollSecret call. It makes me wonder what it gives us that testMultipleUnsynchronized doesn't already give us. testMultipleUnsynchronized has 3 secretProviders and rolls the secrets twice, while testMultipleInit has 2 secretProviders and rolls the secrets once. So it seems like testMultipleInit is just a subset of testMultipleUnsynchronized . Robert Kanter , what do you think? I'm uploading a patch that makes it so that the tests only test the ordering between rollSecret calls (meaning that testMultipleInit only has 1 outcome, while testMultipleUnsynchronized has 2). If you agree that testMultipleInit is a redundant subset of testMultipleUnsynchronized then I can upload a patch removing it completely. And of course if you think that there is functionality in testMultipleInit that I've stripped out, please do let me know.
          Hide
          rkanter Robert Kanter added a comment - - edited

          I looked through the test in more detail and I think there's actually only two cases here:

          1. providerX wins the first rollover, providerX wins the second rollover
          2. providerX wins the first rollover, providerY wins the second rollover

          Whether X and Y are A, B, or C doesn't really matter, right? They're all identical code, so in a vacuum, whether A wins, B wins, or C wins is the same case. Given that, what we really only have the two above cases where either the original winner wins the second time or not.

          So I think we only need to run two orderings, e.g:

          1. providerA wins both rollovers
          2. providerA wins the first rollover, providerB wins the second rollover

          Does that make sense?

          Show
          rkanter Robert Kanter added a comment - - edited I looked through the test in more detail and I think there's actually only two cases here: providerX wins the first rollover, providerX wins the second rollover providerX wins the first rollover, providerY wins the second rollover Whether X and Y are A, B, or C doesn't really matter, right? They're all identical code, so in a vacuum, whether A wins, B wins, or C wins is the same case. Given that, what we really only have the two above cases where either the original winner wins the second time or not. So I think we only need to run two orderings, e.g: providerA wins both rollovers providerA wins the first rollover, providerB wins the second rollover Does that make sense?
          Hide
          ebadger Eric Badger added a comment -

          Robert Kanter, it seems like redundant work to iterate through all 6 of the orderings for 3 clients. Would it be acceptable to run through 3 (1 where each secretProvider "wins" the race)? It only takes a few seconds to run, but those seconds add up with tests.

          Show
          ebadger Eric Badger added a comment - Robert Kanter , it seems like redundant work to iterate through all 6 of the orderings for 3 clients. Would it be acceptable to run through 3 (1 where each secretProvider "wins" the race)? It only takes a few seconds to run, but those seconds add up with tests.
          Hide
          rkanter Robert Kanter added a comment -

          Iterating through all of the permutations sounds like the best solution. It's been too long for me to remember why I didn't do that before

          Show
          rkanter Robert Kanter added a comment - Iterating through all of the permutations sounds like the best solution. It's been too long for me to remember why I didn't do that before
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 2s trunk passed
          +1 compile 7m 11s trunk passed
          +1 checkstyle 0m 11s trunk passed
          +1 mvnsite 0m 16s trunk passed
          +1 mvneclipse 0m 11s trunk passed
          +1 findbugs 0m 20s trunk passed
          +1 javadoc 0m 12s trunk passed
          +1 mvninstall 0m 12s the patch passed
          +1 compile 7m 9s the patch passed
          +1 javac 7m 9s the patch passed
          -0 checkstyle 0m 13s hadoop-common-project/hadoop-auth: The patch generated 26 new + 2 unchanged - 2 fixed = 28 total (was 4)
          +1 mvnsite 0m 17s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 34s the patch passed
          +1 javadoc 0m 12s the patch passed
          +1 unit 3m 8s hadoop-auth in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          29m 21s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HADOOP-12611
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12830582/HADOOP-12611.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 920793daf2e8 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2acfb1e
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10617/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10617/testReport/
          modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10617/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 14s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 2s trunk passed +1 compile 7m 11s trunk passed +1 checkstyle 0m 11s trunk passed +1 mvnsite 0m 16s trunk passed +1 mvneclipse 0m 11s trunk passed +1 findbugs 0m 20s trunk passed +1 javadoc 0m 12s trunk passed +1 mvninstall 0m 12s the patch passed +1 compile 7m 9s the patch passed +1 javac 7m 9s the patch passed -0 checkstyle 0m 13s hadoop-common-project/hadoop-auth: The patch generated 26 new + 2 unchanged - 2 fixed = 28 total (was 4) +1 mvnsite 0m 17s the patch passed +1 mvneclipse 0m 12s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 34s the patch passed +1 javadoc 0m 12s the patch passed +1 unit 3m 8s hadoop-auth in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 29m 21s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HADOOP-12611 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12830582/HADOOP-12611.003.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 920793daf2e8 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 2acfb1e Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10617/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10617/testReport/ modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10617/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          ebadger Eric Badger added a comment -

          Uploading a new patch that is built off of Wei-Chiu Chuang's 002 patch. I made 3 main changes.

          1. I explicitly call realRollSecret() so that we got rid of the race condition between rollSecret() being called and checking the secrets of each secretProvider.
          2. I changed the verify() functions to use atLeast() instead of times(), because it is possible for rollSecret() to be called many times if the main code is slow and the scheduler thread is not.
          3. I randomized the order in which realRollSecret() is called for each secretProvider.

          In regards to change #3, I had an offline conversation with Jason Lowe that makes me wonder why we need this in the first place (though I was the one who suggested randomly selecting the order). Why are we relying on the test to give us randomness? If there are a finite amount of states like there are here, why don't we iterate over all of them instead of trying to randomly get all of the states over multiple runs? If we care that every permutation of the order of realRollSecret() works, then we should test every permutation. Randomly selecting the order means that it would take 6+ runs to go through every permutation. So if a change was made in the source that screwed up one of those permutations, we might not see it until a long ways down (and then it would be really hard to debug since we wouldn't know what order). Robert Kanter, I think the better way to do this is to explicitly iterate through all of the permutations (or at least 2 permutations, 1 with each secretProvider "winning" the race).

          Show
          ebadger Eric Badger added a comment - Uploading a new patch that is built off of Wei-Chiu Chuang 's 002 patch. I made 3 main changes. 1. I explicitly call realRollSecret() so that we got rid of the race condition between rollSecret() being called and checking the secrets of each secretProvider. 2. I changed the verify() functions to use atLeast() instead of times(), because it is possible for rollSecret() to be called many times if the main code is slow and the scheduler thread is not. 3. I randomized the order in which realRollSecret() is called for each secretProvider. In regards to change #3, I had an offline conversation with Jason Lowe that makes me wonder why we need this in the first place (though I was the one who suggested randomly selecting the order). Why are we relying on the test to give us randomness? If there are a finite amount of states like there are here, why don't we iterate over all of them instead of trying to randomly get all of the states over multiple runs? If we care that every permutation of the order of realRollSecret() works, then we should test every permutation. Randomly selecting the order means that it would take 6+ runs to go through every permutation. So if a change was made in the source that screwed up one of those permutations, we might not see it until a long ways down (and then it would be really hard to debug since we wouldn't know what order). Robert Kanter , I think the better way to do this is to explicitly iterate through all of the permutations (or at least 2 permutations, 1 with each secretProvider "winning" the race).
          Hide
          rkanter Robert Kanter added a comment -

          That sounds right to me.

          You said above to check secrets based on size, but the secrets list is only ever 2 elements. So I could check for it to change, but I don't know how I would check for each iteration based on the size of the list.

          Wei-Chiu Chuang's 001 patch adds a subclass that remembers all secrets in an arraylist. That should let you determine based on size.

          Calling rollSecret() in a random order is an interesting idea. I think that would cover testing the randomness in the ordering.

          Show
          rkanter Robert Kanter added a comment - That sounds right to me. You said above to check secrets based on size, but the secrets list is only ever 2 elements. So I could check for it to change, but I don't know how I would check for each iteration based on the size of the list. Wei-Chiu Chuang 's 001 patch adds a subclass that remembers all secrets in an arraylist. That should let you determine based on size. Calling rollSecret() in a random order is an interesting idea. I think that would cover testing the randomness in the ordering.
          Hide
          ebadger Eric Badger added a comment - - edited

          Robert Kanter, thanks for the detailed response! What you said makes sense with regards to the race between the servers calling rollSecret() to push their secrets to ZK. Let me make sure that I understand the approach that you proposed above.

          1. Seed deterministically as we are currently doing and let rollSecret() happen twice.

          • You said above to check secrets based on size, but the secrets list is only ever 2 elements. So I could check for it to change, but I don't know how I would check for each iteration based on the size of the list.

          2. Keep track of the secrets list for both A and B at each iteration.
          3. Check to make sure that A and B are correct at each iteration

          • A: [A1, null], [A2, A1], [A3 or B3, A2]
          • B: [A2, A1], [A3 or B3, A2]

          I do see a potential problem with this setup though. Right after we call secretProviderB.init(), we check to make sure that it's secrets are equal to [A2, A1]. But if there is a slow code path for whatever reason in the main code, then rollSecret() could be called to update the secrets via either secretProviderA or secretProviderB. This would make the secrets [A3 or B3, A2] (or something else if rollSecret() was called multiple times) instead of [A2, A1]. I'm not sure how to remove this race condition without changing the source code.

          A little hokey, but would it be acceptable to explicitly call rollSecret() instead of using verify(), but calling them in a random order? This way we guarantee the number of times that rollSecret() is called, we guarantee the contents of secrets for both secretProviders, and we still provide the randomness of each secretProvider being able to talk to ZK first.

          Show
          ebadger Eric Badger added a comment - - edited Robert Kanter , thanks for the detailed response! What you said makes sense with regards to the race between the servers calling rollSecret() to push their secrets to ZK. Let me make sure that I understand the approach that you proposed above. 1. Seed deterministically as we are currently doing and let rollSecret() happen twice. You said above to check secrets based on size, but the secrets list is only ever 2 elements. So I could check for it to change, but I don't know how I would check for each iteration based on the size of the list. 2. Keep track of the secrets list for both A and B at each iteration. 3. Check to make sure that A and B are correct at each iteration A: [A1, null] , [A2, A1] , [A3 or B3, A2] B: [A2, A1] , [A3 or B3, A2] I do see a potential problem with this setup though. Right after we call secretProviderB.init(), we check to make sure that it's secrets are equal to [A2, A1] . But if there is a slow code path for whatever reason in the main code, then rollSecret() could be called to update the secrets via either secretProviderA or secretProviderB. This would make the secrets [A3 or B3, A2] (or something else if rollSecret() was called multiple times) instead of [A2, A1] . I'm not sure how to remove this race condition without changing the source code. A little hokey, but would it be acceptable to explicitly call rollSecret() instead of using verify(), but calling them in a random order? This way we guarantee the number of times that rollSecret() is called, we guarantee the contents of secrets for both secretProviders, and we still provide the randomness of each secretProvider being able to talk to ZK first.
          Hide
          rkanter Robert Kanter added a comment -

          It's been quite a while since I looked at this, so hopefully I'm remembering things correctly. The trick to the ZKSignerSecretProvider is that it doesn't rely on external locking by taking advantage of the versioning property in ZNodes. Specifically, each server has it's own seed for the RNG. Instead of reconciling that, because we don't really care about the actual content of the secret, all servers try to push their next generated secret to the ZNode based on the current version. Only one will succeed because the version will increment. All servers then read back the next generated secret from the ZNode. So while each server (or instance of ZKSignerSecretProvider) is deterministic because we give them a deterministic seed, we can't guarantee the ordering of the servers when they talk to ZooKeeper; whichever one happens to get there first, will "win". It would be best if the test reflects the fact that any of the servers can win at updating the secret, as that's more like what will happen in a real environment. Does that make sense?

          Show
          rkanter Robert Kanter added a comment - It's been quite a while since I looked at this, so hopefully I'm remembering things correctly. The trick to the ZKSignerSecretProvider is that it doesn't rely on external locking by taking advantage of the versioning property in ZNodes. Specifically, each server has it's own seed for the RNG. Instead of reconciling that, because we don't really care about the actual content of the secret, all servers try to push their next generated secret to the ZNode based on the current version. Only one will succeed because the version will increment. All servers then read back the next generated secret from the ZNode. So while each server (or instance of ZKSignerSecretProvider ) is deterministic because we give them a deterministic seed, we can't guarantee the ordering of the servers when they talk to ZooKeeper; whichever one happens to get there first, will "win". It would be best if the test reflects the fact that any of the servers can win at updating the secret, as that's more like what will happen in a real environment. Does that make sense?
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 48s trunk passed
          +1 compile 7m 45s trunk passed
          +1 checkstyle 0m 13s trunk passed
          +1 mvnsite 0m 17s trunk passed
          +1 mvneclipse 0m 11s trunk passed
          +1 findbugs 0m 23s trunk passed
          +1 javadoc 0m 12s trunk passed
          +1 mvninstall 0m 13s the patch passed
          +1 compile 7m 58s the patch passed
          +1 javac 7m 58s the patch passed
          -0 checkstyle 0m 12s hadoop-common-project/hadoop-auth: The patch generated 5 new + 3 unchanged - 1 fixed = 8 total (was 4)
          +1 mvnsite 0m 18s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 31s the patch passed
          +1 javadoc 0m 13s the patch passed
          +1 unit 3m 19s hadoop-auth in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          31m 48s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HADOOP-12611
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775426/HADOOP-12611.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 18983f7587b0 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8e06d86
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10599/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10599/testReport/
          modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10599/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 48s trunk passed +1 compile 7m 45s trunk passed +1 checkstyle 0m 13s trunk passed +1 mvnsite 0m 17s trunk passed +1 mvneclipse 0m 11s trunk passed +1 findbugs 0m 23s trunk passed +1 javadoc 0m 12s trunk passed +1 mvninstall 0m 13s the patch passed +1 compile 7m 58s the patch passed +1 javac 7m 58s the patch passed -0 checkstyle 0m 12s hadoop-common-project/hadoop-auth: The patch generated 5 new + 3 unchanged - 1 fixed = 8 total (was 4) +1 mvnsite 0m 18s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 31s the patch passed +1 javadoc 0m 13s the patch passed +1 unit 3m 19s hadoop-auth in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 31m 48s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HADOOP-12611 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775426/HADOOP-12611.002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 18983f7587b0 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 8e06d86 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10599/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10599/testReport/ modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10599/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Eric Badger It's a shame I've not worked on this patch for a long time. If you like feel free to grab and submit a patch!

          Show
          jojochuang Wei-Chiu Chuang added a comment - Eric Badger It's a shame I've not worked on this patch for a long time. If you like feel free to grab and submit a patch!
          Hide
          ebadger Eric Badger added a comment -

          Robert Kanter, Wei-Chiu Chuang, I read up on all of your comments on here and I think I understand what's going on. Robert, could you expand on what randomness we would be losing if we called rollSecret() explicitly as Wei-Chiu has in his v2 patch? The secret generation RNG is deterministic since we have given a deterministic seed, so we know what values are going to come out of generating the secret each time. The timing of exactly when rollSecret() would fire using the rolloverFrequency is random, but we're waiting on it anyway with the verify() functionality. Is there something else that I've missed here?

          Show
          ebadger Eric Badger added a comment - Robert Kanter , Wei-Chiu Chuang , I read up on all of your comments on here and I think I understand what's going on. Robert, could you expand on what randomness we would be losing if we called rollSecret() explicitly as Wei-Chiu has in his v2 patch? The secret generation RNG is deterministic since we have given a deterministic seed, so we know what values are going to come out of generating the secret each time. The timing of exactly when rollSecret() would fire using the rolloverFrequency is random, but we're waiting on it anyway with the verify() functionality. Is there something else that I've missed here?
          Hide
          ebadger Eric Badger added a comment -

          I just saw this test fail in our internal 2.8 build with the following stack trace

          org.junit.internal.ArrayComparisonFailure: arrays first differed at element [0]; expected:<49> but was:<56>
          	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:50)
          	at org.junit.Assert.internalArrayEquals(Assert.java:473)
          	at org.junit.Assert.assertArrayEquals(Assert.java:294)
          	at org.junit.Assert.assertArrayEquals(Assert.java:305)
          	at org.apache.hadoop.security.authentication.util.TestZKSignerSecretProvider.testMultipleUnsychnronized(TestZKSignerSecretProvider.java:237)
          

          Here are the relevant log lines from stdout:

          016-09-22 20:16:16,988 INFO  ZKSignerSecretProvider - Creating secret znode
          2016-09-22 20:16:18,001 INFO  SessionTrackerImpl - SessionTrackerImpl exited loop!
          2016-09-22 20:16:18,992 INFO  PrepRequestProcessor - Processed session termination for sessionid: 0x157538a9e5e0000
          2016-09-22 20:16:18,993 INFO  ZooKeeper - Session: 0x157538a9e5e0000 closed
          2016-09-22 20:16:18,993 INFO  ClientCnxn - EventThread shut down
          2016-09-22 20:16:18,993 INFO  NIOServerCnxn - Closed socket connection for client /127.0.0.1:47644 which had sessionid 0x157538a9e5e0000
          2016-09-22 20:16:18,994 ERROR ZKSignerSecretProvider - An unexpected exception occurred while pulling data fromZooKeeper
          java.lang.IllegalStateException: instance must be started before calling this method
          	at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
          	at org.apache.curator.framework.imps.CuratorFrameworkImpl.getData(CuratorFrameworkImpl.java:363)
          	at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider.pullFromZK(ZKSignerSecretProvider.java:314)
          	at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider.rollSecret(ZKSignerSecretProvider.java:237)
          	at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$91cc7da3.CGLIB$rollSecret$0(<generated>)
          	at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$91cc7da3$$FastClassByMockitoWithCGLIB$$200c7800.invoke(<generated>)
          	at org.mockito.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:216)
          	at org.mockito.internal.creation.AbstractMockitoMethodProxy.invokeSuper(AbstractMockitoMethodProxy.java:10)
          	at org.mockito.internal.invocation.realmethod.CGLIBProxyRealMethod.invoke(CGLIBProxyRealMethod.java:22)
          	at org.mockito.internal.invocation.realmethod.FilteredCGLIBProxyRealMethod.invoke(FilteredCGLIBProxyRealMethod.java:27)
          	at org.mockito.internal.invocation.Invocation.callRealMethod(Invocation.java:211)
          	at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:36)
          	at org.mockito.internal.MockHandler.handle(MockHandler.java:99)
          	at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
          	at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$91cc7da3.rollSecret(<generated>)
          	at org.apache.hadoop.security.authentication.util.RolloverSignerSecretProvider$1.run(RolloverSignerSecretProvider.java:97)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          	at java.lang.Thread.run(Thread.java:745)
          
          Show
          ebadger Eric Badger added a comment - I just saw this test fail in our internal 2.8 build with the following stack trace org.junit.internal.ArrayComparisonFailure: arrays first differed at element [0]; expected:<49> but was:<56> at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:50) at org.junit.Assert.internalArrayEquals(Assert.java:473) at org.junit.Assert.assertArrayEquals(Assert.java:294) at org.junit.Assert.assertArrayEquals(Assert.java:305) at org.apache.hadoop.security.authentication.util.TestZKSignerSecretProvider.testMultipleUnsychnronized(TestZKSignerSecretProvider.java:237) Here are the relevant log lines from stdout: 016-09-22 20:16:16,988 INFO ZKSignerSecretProvider - Creating secret znode 2016-09-22 20:16:18,001 INFO SessionTrackerImpl - SessionTrackerImpl exited loop! 2016-09-22 20:16:18,992 INFO PrepRequestProcessor - Processed session termination for sessionid: 0x157538a9e5e0000 2016-09-22 20:16:18,993 INFO ZooKeeper - Session: 0x157538a9e5e0000 closed 2016-09-22 20:16:18,993 INFO ClientCnxn - EventThread shut down 2016-09-22 20:16:18,993 INFO NIOServerCnxn - Closed socket connection for client /127.0.0.1:47644 which had sessionid 0x157538a9e5e0000 2016-09-22 20:16:18,994 ERROR ZKSignerSecretProvider - An unexpected exception occurred while pulling data fromZooKeeper java.lang.IllegalStateException: instance must be started before calling this method at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.curator.framework.imps.CuratorFrameworkImpl.getData(CuratorFrameworkImpl.java:363) at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider.pullFromZK(ZKSignerSecretProvider.java:314) at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider.rollSecret(ZKSignerSecretProvider.java:237) at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$91cc7da3.CGLIB$rollSecret$0(<generated>) at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$91cc7da3$$FastClassByMockitoWithCGLIB$$200c7800.invoke(<generated>) at org.mockito.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:216) at org.mockito.internal.creation.AbstractMockitoMethodProxy.invokeSuper(AbstractMockitoMethodProxy.java:10) at org.mockito.internal.invocation.realmethod.CGLIBProxyRealMethod.invoke(CGLIBProxyRealMethod.java:22) at org.mockito.internal.invocation.realmethod.FilteredCGLIBProxyRealMethod.invoke(FilteredCGLIBProxyRealMethod.java:27) at org.mockito.internal.invocation.Invocation.callRealMethod(Invocation.java:211) at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:36) at org.mockito.internal.MockHandler.handle(MockHandler.java:99) at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47) at org.apache.hadoop.security.authentication.util.ZKSignerSecretProvider$$EnhancerByMockitoWithCGLIB$$91cc7da3.rollSecret(<generated>) at org.apache.hadoop.security.authentication.util.RolloverSignerSecretProvider$1.run(RolloverSignerSecretProvider.java:97) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
          Hide
          rkanter Robert Kanter added a comment -

          it can inspect the initial state of secret (currentSecret[1] == null) right after the three ZKSignerSecretProvider are initialized.

          Ok, I think the problem here is that HADOOP-12181 changed the rollover frequency to only 2 seconds (it was originally 15). So, it's very likely that one or more of the ZKSignerSecretProviders has already rolled by the time we check that currentSecret[1] == null, so it's no longer null. With a rollover time of 15 seconds, this was much less likely to occur, though I suppose it was still possible.

          Trying to keep the timing/randomness that we'd like to test but still making the test not flakey, I think the best approach would be to do something similar to the 001 patch where you'd have a subclass of ZKSignerSecretProvider that keeps track of secrets, let it run for two iterations (which you can check based on the size of the list keeping the secrets so no timing is involved), and then verify that each iteration of secrets is correct. For example:

          1. [s1][null]
          2. [s2][s1]
          3. [s3][s2]

          And the test itself doesn't have any timing components because you'd be waiting for the list of secrets from each ZKSignerSecretProvider to contain 3 snapshots. You can then just check that all sets of snapshots agree.

          Does that sound good?

          Show
          rkanter Robert Kanter added a comment - it can inspect the initial state of secret (currentSecret[1] == null) right after the three ZKSignerSecretProvider are initialized. Ok, I think the problem here is that HADOOP-12181 changed the rollover frequency to only 2 seconds (it was originally 15). So, it's very likely that one or more of the ZKSignerSecretProviders has already rolled by the time we check that currentSecret[1] == null , so it's no longer null . With a rollover time of 15 seconds, this was much less likely to occur, though I suppose it was still possible. Trying to keep the timing/randomness that we'd like to test but still making the test not flakey, I think the best approach would be to do something similar to the 001 patch where you'd have a subclass of ZKSignerSecretProvider that keeps track of secrets , let it run for two iterations (which you can check based on the size of the list keeping the secrets so no timing is involved), and then verify that each iteration of secrets is correct. For example: [s1][null] [s2][s1] [s3][s2] And the test itself doesn't have any timing components because you'd be waiting for the list of secrets from each ZKSignerSecretProvider to contain 3 snapshots. You can then just check that all sets of snapshots agree. Does that sound good?
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Thanks Robert for the explanation!

          There does exist the issue with talking to ZK before connection is created, but that does not seem to cause the test failure. I have added blockUntilConnected previously but which did not resolve the issue. Although it seems to be an issue in the real world.

          The root cause of this issue is, in the test, three ZKSignerSecretProvider are initialized and get secret. By the time the third one finishes initialization, rollSecret() has been called to update the state of secret, both locally and remotely on the ZK server. However, the test code assumed the initialization is instantaneous, and that it can inspect the initial state of secret (currentSecret[1] == null) right after the three ZKSignerSecretProvider are initialized.

          The 001 patch did not pass the test, which is why I proposed the second patch, which passed on my local machine. But I'd agree 002 patch lost the randomness.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Thanks Robert for the explanation! There does exist the issue with talking to ZK before connection is created, but that does not seem to cause the test failure. I have added blockUntilConnected previously but which did not resolve the issue. Although it seems to be an issue in the real world. The root cause of this issue is, in the test, three ZKSignerSecretProvider are initialized and get secret. By the time the third one finishes initialization, rollSecret() has been called to update the state of secret, both locally and remotely on the ZK server. However, the test code assumed the initialization is instantaneous, and that it can inspect the initial state of secret (currentSecret [1] == null) right after the three ZKSignerSecretProvider are initialized. The 001 patch did not pass the test, which is why I proposed the second patch, which passed on my local machine. But I'd agree 002 patch lost the randomness.
          Hide
          rkanter Robert Kanter added a comment -

          I actually think the approach in the 001 patch is better. One of the things that the test is checking is the ZKSignerSecretProvider's logic of talking to ZK to have all instances agree on the same secret. So there should be an element of randomness and timing in this test, because that's what will normally happen. Though I agree that makes the test more difficult to manage.

          If my understanding is correct, it looks like the issue we see in the test is that ZKSignerSecretProvider can start trying to talk to ZK before the client is connected, right? In that case, I think we can easily fix that by having the test create the Curator clients and wait for them (using blockUntilConnected) to connect to ZK before even creating the ZKSignerSecretProviders. You can make ZKSignerSecretProvider use an existing Curator client by passing it from the ServletContext (see getDummyServletContext()). That said, perhaps it would be a best to put blockUntilConnected in ZKSignerSecretProvider itself, like your 001 patch does, or we might run into this problem in the real world?

          Show
          rkanter Robert Kanter added a comment - I actually think the approach in the 001 patch is better. One of the things that the test is checking is the ZKSignerSecretProvider 's logic of talking to ZK to have all instances agree on the same secret. So there should be an element of randomness and timing in this test, because that's what will normally happen. Though I agree that makes the test more difficult to manage. If my understanding is correct, it looks like the issue we see in the test is that ZKSignerSecretProvider can start trying to talk to ZK before the client is connected, right? In that case, I think we can easily fix that by having the test create the Curator clients and wait for them (using blockUntilConnected ) to connect to ZK before even creating the ZKSignerSecretProviders . You can make ZKSignerSecretProvider use an existing Curator client by passing it from the ServletContext (see getDummyServletContext() ). That said, perhaps it would be a best to put blockUntilConnected in ZKSignerSecretProvider itself, like your 001 patch does, or we might run into this problem in the real world?
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 9m 18s trunk passed
          +1 compile 11m 0s trunk passed with JDK v1.8.0_66
          +1 compile 10m 40s trunk passed with JDK v1.7.0_85
          +1 checkstyle 0m 9s trunk passed
          +1 mvnsite 0m 25s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 0m 31s trunk passed
          +1 javadoc 0m 17s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 16s trunk passed with JDK v1.7.0_85
          +1 mvninstall 0m 22s the patch passed
          +1 compile 10m 52s the patch passed with JDK v1.8.0_66
          +1 javac 10m 52s the patch passed
          +1 compile 10m 37s the patch passed with JDK v1.7.0_85
          +1 javac 10m 37s the patch passed
          -1 checkstyle 0m 13s Patch generated 1 new checkstyle issues in hadoop-common-project/hadoop-auth (total was 2, now 3).
          +1 mvnsite 0m 24s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 0m 42s the patch passed
          +1 javadoc 0m 15s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 16s the patch passed with JDK v1.7.0_85
          +1 unit 3m 59s hadoop-auth in the patch passed with JDK v1.8.0_66.
          +1 unit 4m 14s hadoop-auth in the patch passed with JDK v1.7.0_85.
          +1 asflicense 0m 28s Patch does not generate ASF License warnings.
          66m 45s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775426/HADOOP-12611.002.patch
          JIRA Issue HADOOP-12611
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 24a5217a203a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6b9a5be
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/8176/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt
          JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8176/testReport/
          modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth
          Max memory used 76MB
          Powered by Apache Yetus http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8176/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 9m 18s trunk passed +1 compile 11m 0s trunk passed with JDK v1.8.0_66 +1 compile 10m 40s trunk passed with JDK v1.7.0_85 +1 checkstyle 0m 9s trunk passed +1 mvnsite 0m 25s trunk passed +1 mvneclipse 0m 15s trunk passed +1 findbugs 0m 31s trunk passed +1 javadoc 0m 17s trunk passed with JDK v1.8.0_66 +1 javadoc 0m 16s trunk passed with JDK v1.7.0_85 +1 mvninstall 0m 22s the patch passed +1 compile 10m 52s the patch passed with JDK v1.8.0_66 +1 javac 10m 52s the patch passed +1 compile 10m 37s the patch passed with JDK v1.7.0_85 +1 javac 10m 37s the patch passed -1 checkstyle 0m 13s Patch generated 1 new checkstyle issues in hadoop-common-project/hadoop-auth (total was 2, now 3). +1 mvnsite 0m 24s the patch passed +1 mvneclipse 0m 14s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 0m 42s the patch passed +1 javadoc 0m 15s the patch passed with JDK v1.8.0_66 +1 javadoc 0m 16s the patch passed with JDK v1.7.0_85 +1 unit 3m 59s hadoop-auth in the patch passed with JDK v1.8.0_66. +1 unit 4m 14s hadoop-auth in the patch passed with JDK v1.7.0_85. +1 asflicense 0m 28s Patch does not generate ASF License warnings. 66m 45s Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775426/HADOOP-12611.002.patch JIRA Issue HADOOP-12611 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 24a5217a203a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 6b9a5be findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/8176/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-auth.txt JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8176/testReport/ modules C: hadoop-common-project/hadoop-auth U: hadoop-common-project/hadoop-auth Max memory used 76MB Powered by Apache Yetus http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8176/console This message was automatically generated.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Rev02 is based on Masatake's suggestion with my hack.

          Sub-class ZKSignerSecretProvider, make rollSecret() of the subclass an no-op. In this way, we can avoid the race-condition issue; on the other hand, Mockito can still verify that rollSecret() is called periodically at the expected frequency.

          Finally, because rollSecret() of the subclass is a no-op, the test manually calls ZKSignerSecretProvider.rollSecret() to update the secret. Best of all, src/ code is virtually not modified (except for some logging).

          I tested this patch locally and no test failures happened in more than 100 runs.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Rev02 is based on Masatake's suggestion with my hack. Sub-class ZKSignerSecretProvider, make rollSecret() of the subclass an no-op. In this way, we can avoid the race-condition issue; on the other hand, Mockito can still verify that rollSecret() is called periodically at the expected frequency. Finally, because rollSecret() of the subclass is a no-op, the test manually calls ZKSignerSecretProvider.rollSecret() to update the secret. Best of all, src/ code is virtually not modified (except for some logging). I tested this patch locally and no test failures happened in more than 100 runs.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          My understanding of the code is probably wrong. Maybe the correct solution is to patch ZKSignerSecretProvider so that it does not roll secret until the initialization is done?

          Show
          jojochuang Wei-Chiu Chuang added a comment - My understanding of the code is probably wrong. Maybe the correct solution is to patch ZKSignerSecretProvider so that it does not roll secret until the initialization is done?
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Masatake Iwasaki Thanks for your suggestions. I think your proposed solution will also fix the issue.

          Because the issue is mostly related to test logic, it would be great if we can change just the test code, and minimize the change to the src code. My patch here only has only one line of change in ZKSignerSecretProvider (changing modifier from private to protected) and no change to RolloverSignerSecretProvider. It also preserves the period rolling of secret, which is another property the test tries to verify.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Masatake Iwasaki Thanks for your suggestions. I think your proposed solution will also fix the issue. Because the issue is mostly related to test logic, it would be great if we can change just the test code, and minimize the change to the src code. My patch here only has only one line of change in ZKSignerSecretProvider (changing modifier from private to protected) and no change to RolloverSignerSecretProvider. It also preserves the period rolling of secret, which is another property the test tries to verify.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Rev01: proof of concept.

          The main point in the patch is to avoid the synchronization problem. By subclassing ZKSignerSecretProvider, and overriding pullFromZK(), I can store the secret each time it is updated. In this way, I can store and compare multiple versions of secrets without the need to synchronize three secret providers.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Rev01: proof of concept. The main point in the patch is to avoid the synchronization problem. By subclassing ZKSignerSecretProvider, and overriding pullFromZK(), I can store the secret each time it is updated. In this way, I can store and compare multiple versions of secrets without the need to synchronize three secret providers.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          Hi Masatake Iwasaki thanks for comments.
          So yes, I believe HADOOP-12181 reduced timeout which makes the test more likely to fail.
          I think the original test logic had the race condition/is not thread-safe. I have to increase the timeout by twice (timeout=8000) in order not to see the failures (in 1000 runs). On my machine, some ZKSignerProvider.init() calls takes long time (~1000 ms) to complete, and therefore rollSecret() has been called already.

          In general I don't like the idea of relying on timing. I think a better hack is to subclass ZKSignerSecretProvider, overriding rollSecret() to collect secrets after each invocation. Let it run for a few periods, and then compare the secret of each secret provider.

          Show
          jojochuang Wei-Chiu Chuang added a comment - Hi Masatake Iwasaki thanks for comments. So yes, I believe HADOOP-12181 reduced timeout which makes the test more likely to fail. I think the original test logic had the race condition/is not thread-safe. I have to increase the timeout by twice (timeout=8000) in order not to see the failures (in 1000 runs). On my machine, some ZKSignerProvider.init() calls takes long time (~1000 ms) to complete, and therefore rollSecret() has been called already. In general I don't like the idea of relying on timing. I think a better hack is to subclass ZKSignerSecretProvider, overriding rollSecret() to collect secrets after each invocation. Let it run for a few periods, and then compare the secret of each secret provider.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          The exception "An unexpected exception occurred while pulling data fromZooKeeper" is unrelated to this failure. I can reproduce the failure even without that exception.

          Yeah. java.lang.IllegalStateException: instance must be started before calling this method means curator client is used before started. There may be another bug on the test or ZKSignerSecretProvider.

          Show
          iwasakims Masatake Iwasaki added a comment - The exception "An unexpected exception occurred while pulling data fromZooKeeper" is unrelated to this failure. I can reproduce the failure even without that exception. Yeah. java.lang.IllegalStateException: instance must be started before calling this method means curator client is used before started. There may be another bug on the test or ZKSignerSecretProvider.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          Thanks for working on this, Wei-Chiu Chuang. I changed rolloverFrequency from 15s to 2s in order to shorten test execution time. It is possible to be too short on heavily loaded build server. Setting rolloverFrequency under 50ms reproduced the error on my VM. It is hard to say what value is appropriate.

          I think mocking out scheduler and calling ZKSignerSecretProvider#rollSecret explicitly could be an option to make test deterministic. If it is difficult to mock scheduler, we can effectively deactivate scheduler by setting very long rolloverFrequency instead.

          Show
          iwasakims Masatake Iwasaki added a comment - Thanks for working on this, Wei-Chiu Chuang . I changed rolloverFrequency from 15s to 2s in order to shorten test execution time. It is possible to be too short on heavily loaded build server. Setting rolloverFrequency under 50ms reproduced the error on my VM. It is hard to say what value is appropriate. I think mocking out scheduler and calling ZKSignerSecretProvider#rollSecret explicitly could be an option to make test deterministic. If it is difficult to mock scheduler, we can effectively deactivate scheduler by setting very long rolloverFrequency instead.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          The test logic assumes that rollSecret() is not called when allSecretsA[1] is checked ( allSecretsA[1] is initially null, and will be updated by calling rollSecret() ). But some times, the test code takes more time to run, and rollSecret() will be called. This is a race condition.

          Show
          jojochuang Wei-Chiu Chuang added a comment - The test logic assumes that rollSecret() is not called when allSecretsA [1] is checked ( allSecretsA [1] is initially null, and will be updated by calling rollSecret() ). But some times, the test code takes more time to run, and rollSecret() will be called. This is a race condition.
          Hide
          jojochuang Wei-Chiu Chuang added a comment -

          The exception "An unexpected exception occurred while pulling data fromZooKeeper" is unrelated to this failure. I can reproduce the failure even without that exception.

          Show
          jojochuang Wei-Chiu Chuang added a comment - The exception "An unexpected exception occurred while pulling data fromZooKeeper" is unrelated to this failure. I can reproduce the failure even without that exception.

            People

            • Assignee:
              ebadger Eric Badger
              Reporter:
              jojochuang Wei-Chiu Chuang
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development