Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.9.0, 2.8.3, 3.0.0
    • Component/s: None
    • Labels:
      None

      Description

      The kms client appears to have no retry logic – at all. It's completely decoupled from the ipc retry logic. This has major impacts if the KMS is unreachable for any reason, including but not limited to network connection issues, timeouts, the restart during an upgrade.

      This has some major ramifications:

      1. Jobs may fail to submit, although oozie resubmit logic should mask it
      2. Non-oozie launchers may experience higher rates if they do not already have retry logic.
      3. Tasks reading EZ files will fail, probably be masked by framework reattempts
      4. EZ file creation fails after creating a 0-length file – client receives EDEK in the create response, then fails when decrypting the EDEK
      5. Bulk hadoop fs copies, and maybe distcp, will prematurely fail
      1. HADOOP-14521.09.patch
        31 kB
        Xiao Chen
      2. HADOOP-14521.11.patch
        31 kB
        Xiao Chen
      3. HADOOP-14521-branch-2.8.002.patch
        31 kB
        Rushabh S Shah
      4. HADOOP-14521-branch-2.8.2.patch
        31 kB
        Rushabh S Shah
      5. HADOOP-14521-trunk-10.patch
        31 kB
        Rushabh S Shah
      6. HDFS-11804-branch-2.8.patch
        30 kB
        Rushabh S Shah
      7. HDFS-11804-trunk.patch
        22 kB
        Rushabh S Shah
      8. HDFS-11804-trunk-1.patch
        22 kB
        Rushabh S Shah
      9. HDFS-11804-trunk-2.patch
        24 kB
        Rushabh S Shah
      10. HDFS-11804-trunk-3.patch
        24 kB
        Rushabh S Shah
      11. HDFS-11804-trunk-4.patch
        24 kB
        Rushabh S Shah
      12. HDFS-11804-trunk-5.patch
        27 kB
        Rushabh S Shah
      13. HDFS-11804-trunk-6.patch
        30 kB
        Rushabh S Shah
      14. HDFS-11804-trunk-7.patch
        30 kB
        Rushabh S Shah
      15. HDFS-11804-trunk-8.patch
        30 kB
        Rushabh S Shah

        Activity

        Hide
        shahrs87 Rushabh S Shah added a comment - - edited

        This is by no means a committable patch.
        It needs some rework.
        Changes done in this patch:

        1. Only LoadBalancingKMSClientProvider will be instantiated when createProvider is being called.
        2. Added three new config keys similar to dfs client.
          • kms.client.failover.attempts.multiplier: This is the multiplier factor.
            For example: if kms.client.failover.attempts.multiplier is 2 and there are 2 backend kms servers, then client will retry for maximum 4 times.
            Default is 1 to maintain original behavior.
          • kms.client.failover.sleep.base.millis: To calculate failover sleep time.
            Default is 500 (same as dfs client configuration)
          • kms.client.failover.sleep.max.millis: To calculate maximum amount of time to sleep.
            Default is 15000 (same as dfs client configuration)
        3. The client is configured to use FailoverOnNetworkExceptionRetry with fallback policy as TryOnceThenFail.
          On encountering AccessControlException or AuthorizationException, the client will stop retrying. This assumes all the servers are configured with identical permissions and key acls.
        4. Since the RetryPolicy#shouldRetry function expects whether the option is idempotent or not, We need to decide what all operations are idempotent.
          Below is my understanding.
          • IdempotentOperations: renewDelegationToken, cancelDelegationToken, decryptEncryptedKey, getKeyVersion, getKeys, getKeysMetadata, getKeyVersions, getCurrentKey, getMetadata
          • AtMostOnce: createKey, deleteKey, rollNewVersion
          • Not sure: addDelegationTokens, generateEncryptedKey, reencryptEncryptedKey
            Would like to know your views.

        There are few TODOs in the patch. Would like to know your opinions on that also.
        Thanks in advance for reviewing my patch.

        Show
        shahrs87 Rushabh S Shah added a comment - - edited This is by no means a committable patch. It needs some rework. Changes done in this patch: Only LoadBalancingKMSClientProvider will be instantiated when createProvider is being called. Added three new config keys similar to dfs client. kms.client.failover.attempts.multiplier : This is the multiplier factor. For example: if kms.client.failover.attempts.multiplier is 2 and there are 2 backend kms servers, then client will retry for maximum 4 times. Default is 1 to maintain original behavior. kms.client.failover.sleep.base.millis : To calculate failover sleep time. Default is 500 (same as dfs client configuration) kms.client.failover.sleep.max.millis : To calculate maximum amount of time to sleep. Default is 15000 (same as dfs client configuration) The client is configured to use FailoverOnNetworkExceptionRetry with fallback policy as TryOnceThenFail . On encountering AccessControlException or AuthorizationException, the client will stop retrying. This assumes all the servers are configured with identical permissions and key acls. Since the RetryPolicy#shouldRetry function expects whether the option is idempotent or not, We need to decide what all operations are idempotent. Below is my understanding. IdempotentOperations: renewDelegationToken, cancelDelegationToken, decryptEncryptedKey, getKeyVersion, getKeys, getKeysMetadata, getKeyVersions, getCurrentKey, getMetadata AtMostOnce: createKey, deleteKey, rollNewVersion Not sure: addDelegationTokens, generateEncryptedKey, reencryptEncryptedKey Would like to know your views. There are few TODOs in the patch. Would like to know your opinions on that also. Thanks in advance for reviewing my patch.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 15s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 19s Maven dependency ordering for branch
        +1 mvninstall 13m 43s trunk passed
        +1 compile 14m 42s trunk passed
        +1 checkstyle 1m 55s trunk passed
        +1 mvnsite 2m 12s trunk passed
        +1 mvneclipse 0m 42s trunk passed
        -1 findbugs 1m 23s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 36s trunk passed
        0 mvndep 0m 14s Maven dependency ordering for patch
        -1 mvninstall 0m 46s hadoop-hdfs in the patch failed.
        -1 compile 2m 3s root in the patch failed.
        -1 javac 2m 3s root in the patch failed.
        -0 checkstyle 1m 47s root: The patch generated 21 new + 28 unchanged - 0 fixed = 49 total (was 28)
        -1 mvnsite 0m 46s hadoop-hdfs in the patch failed.
        +1 mvneclipse 0m 25s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        -1 findbugs 0m 24s hadoop-hdfs in the patch failed.
        +1 javadoc 1m 22s the patch passed
        +1 unit 7m 37s hadoop-common in the patch passed.
        -1 unit 0m 47s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        59m 11s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11804
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12869178/HDFS-11804-trunk.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 30f88a9203f3 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 46d9c4c
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        mvninstall https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt
        compile https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-compile-root.txt
        javac https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-compile-root.txt
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/diff-checkstyle-root.txt
        mvnsite https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19534/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19534/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 19s Maven dependency ordering for branch +1 mvninstall 13m 43s trunk passed +1 compile 14m 42s trunk passed +1 checkstyle 1m 55s trunk passed +1 mvnsite 2m 12s trunk passed +1 mvneclipse 0m 42s trunk passed -1 findbugs 1m 23s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 36s trunk passed 0 mvndep 0m 14s Maven dependency ordering for patch -1 mvninstall 0m 46s hadoop-hdfs in the patch failed. -1 compile 2m 3s root in the patch failed. -1 javac 2m 3s root in the patch failed. -0 checkstyle 1m 47s root: The patch generated 21 new + 28 unchanged - 0 fixed = 49 total (was 28) -1 mvnsite 0m 46s hadoop-hdfs in the patch failed. +1 mvneclipse 0m 25s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 findbugs 0m 24s hadoop-hdfs in the patch failed. +1 javadoc 1m 22s the patch passed +1 unit 7m 37s hadoop-common in the patch passed. -1 unit 0m 47s hadoop-hdfs in the patch failed. +1 asflicense 0m 19s The patch does not generate ASF License warnings. 59m 11s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11804 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12869178/HDFS-11804-trunk.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 30f88a9203f3 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 46d9c4c Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html mvninstall https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt compile https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-compile-root.txt javac https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-compile-root.txt checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/diff-checkstyle-root.txt mvnsite https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19534/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19534/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19534/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Just quick fixing the compile errors in this patch.
        Will fix checkstyle and find bugs warning in next patch

        Show
        shahrs87 Rushabh S Shah added a comment - Just quick fixing the compile errors in this patch. Will fix checkstyle and find bugs warning in next patch
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 14s Maven dependency ordering for branch
        +1 mvninstall 12m 59s trunk passed
        +1 compile 13m 15s trunk passed
        +1 checkstyle 1m 51s trunk passed
        +1 mvnsite 2m 2s trunk passed
        +1 mvneclipse 0m 42s trunk passed
        -1 findbugs 1m 21s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 35s trunk passed
        0 mvndep 0m 14s Maven dependency ordering for patch
        +1 mvninstall 1m 25s the patch passed
        +1 compile 12m 34s the patch passed
        +1 javac 12m 34s the patch passed
        -0 checkstyle 1m 53s root: The patch generated 21 new + 28 unchanged - 0 fixed = 49 total (was 28)
        +1 mvnsite 2m 3s the patch passed
        +1 mvneclipse 0m 42s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 3m 23s the patch passed
        +1 javadoc 1m 34s the patch passed
        +1 unit 7m 26s hadoop-common in the patch passed.
        -1 unit 64m 0s hadoop-hdfs in the patch failed.
        -1 asflicense 0m 37s The patch generated 1 ASF License warnings.
        133m 9s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
          hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery
          hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11804
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12869181/HDFS-11804-trunk-1.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux ff7685554308 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 46d9c4c
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19535/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19535/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 14s Maven dependency ordering for branch +1 mvninstall 12m 59s trunk passed +1 compile 13m 15s trunk passed +1 checkstyle 1m 51s trunk passed +1 mvnsite 2m 2s trunk passed +1 mvneclipse 0m 42s trunk passed -1 findbugs 1m 21s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 35s trunk passed 0 mvndep 0m 14s Maven dependency ordering for patch +1 mvninstall 1m 25s the patch passed +1 compile 12m 34s the patch passed +1 javac 12m 34s the patch passed -0 checkstyle 1m 53s root: The patch generated 21 new + 28 unchanged - 0 fixed = 49 total (was 28) +1 mvnsite 2m 3s the patch passed +1 mvneclipse 0m 42s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 23s the patch passed +1 javadoc 1m 34s the patch passed +1 unit 7m 26s hadoop-common in the patch passed. -1 unit 64m 0s hadoop-hdfs in the patch failed. -1 asflicense 0m 37s The patch generated 1 ASF License warnings. 133m 9s Reason Tests Failed junit tests hadoop.hdfs.TestAclsEndToEnd   hadoop.hdfs.server.namenode.ha.TestPipelinesFailover   hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery   hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11804 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12869181/HDFS-11804-trunk-1.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux ff7685554308 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 46d9c4c Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19535/testReport/ asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/19535/artifact/patchprocess/patch-asflicense-problems.txt modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19535/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Andrew Wang, John Zhuge, Yongjun Zhang Wei-Chiu Chuang Daryn Sharp: Can I get some review cycles from you guys please ?
        Apologies for spamming all of you.

        Show
        shahrs87 Rushabh S Shah added a comment - Andrew Wang , John Zhuge , Yongjun Zhang Wei-Chiu Chuang Daryn Sharp : Can I get some review cycles from you guys please ? Apologies for spamming all of you.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks Rushabh S Shah for proposing the idea and a demonstrative patch. +1 on adding client side retries.

        Some comments:

        • Should we also treat AuthenticationException as non-retry?
        • Do we want retry policy's maxRetries also be configurable? Any reason hard-code that to 0?
        • generateEncryptedKey, reencryptEncryptedKey I think are idempotent. addDelegationTokens should be Idempotent (because it's really getDelegationTokens, so similar to hdfs's ClientProtocol IMHO.
        • retry: m2c is we retry immediately to give each server a chance. If multiplier > 1 (say, numFailovers > providers.length), then we retry with delays. (Related, I think KMS's exceptions are less complicated than NN exception, mostly because KMSCP's validateResponse throws just IOExceptions in many cases.)
        Show
        xiaochen Xiao Chen added a comment - Thanks Rushabh S Shah for proposing the idea and a demonstrative patch. +1 on adding client side retries. Some comments: Should we also treat AuthenticationException as non-retry? Do we want retry policy's maxRetries also be configurable? Any reason hard-code that to 0? generateEncryptedKey, reencryptEncryptedKey I think are idempotent. addDelegationTokens should be Idempotent (because it's really getDelegationTokens, so similar to hdfs's ClientProtocol IMHO. retry: m2c is we retry immediately to give each server a chance. If multiplier > 1 (say, numFailovers > providers.length), then we retry with delays. (Related, I think KMS's exceptions are less complicated than NN exception, mostly because KMSCP's validateResponse throws just IOExceptions in many cases.)
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Thanks Xiao Chen for reviewing.

        Should we also treat AuthenticationException as non-retry?

        This means user doesn't have access to key. Even if we retry its going to fails anyways unless some servers are misconfigured.
        I am ok with any approach.

        Do we want retry policy's maxRetries also be configurable? Any reason hard-code that to 0?

        IIUC we should increment the retry counter only if we encounter Retriable Exception. Since kms server doesn't throw Retribale exception yet, it didn't made sense to have retry count.

        generateEncryptedKey, reencryptEncryptedKey I think are idempotent.

        Thanks.. Will incorporate the change in next patch.

        retry: m2c is we retry immediately to give each server a chance. If multiplier > 1 (say, numFailovers > providers.length), then we retry with delays.

        I didn't understand this comment.
        Can you elaborate the context ?

        Show
        shahrs87 Rushabh S Shah added a comment - Thanks Xiao Chen for reviewing. Should we also treat AuthenticationException as non-retry? This means user doesn't have access to key. Even if we retry its going to fails anyways unless some servers are misconfigured. I am ok with any approach. Do we want retry policy's maxRetries also be configurable? Any reason hard-code that to 0? IIUC we should increment the retry counter only if we encounter Retriable Exception. Since kms server doesn't throw Retribale exception yet, it didn't made sense to have retry count. generateEncryptedKey, reencryptEncryptedKey I think are idempotent. Thanks.. Will incorporate the change in next patch. retry: m2c is we retry immediately to give each server a chance. If multiplier > 1 (say, numFailovers > providers.length), then we retry with delays. I didn't understand this comment. Can you elaborate the context ?
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks Rushabh.

        This means user doesn't have access to key. Even if we retry its going to fails anyways unless some servers are misconfigured.

        Correct. So we should treat AuthenticationException similar as AccessControlException and AuthorizationException, right?

        Can you elaborate the context ?

        Say for example we have 3 KMS's. I think we should retry immediately for all 3 at first. (Same as current LBKMSCP behavior).
        If multiplier > 1, we could retry with delays on the 2nd pass.

        Show
        xiaochen Xiao Chen added a comment - Thanks Rushabh. This means user doesn't have access to key. Even if we retry its going to fails anyways unless some servers are misconfigured. Correct. So we should treat AuthenticationException similar as AccessControlException and AuthorizationException , right? Can you elaborate the context ? Say for example we have 3 KMS's. I think we should retry immediately for all 3 at first. (Same as current LBKMSCP behavior). If multiplier > 1, we could retry with delays on the 2nd pass.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Correct. So we should treat AuthenticationException similar as AccessControlException and AuthorizationException, right?

        Somehow I mixed AuthenticationException with AuthorizationException. my bad.
        Yes will incorporate the change in next patch.

        Say for example we have 3 KMS's. I think we should retry immediately for all 3 at first.

        Makes sense..The only issue I can see is the first time we calculate FailoverOnNetworkExceptionRetry.getFailoverOrRetrySleepTime(int times) will have times == #kms servers instead of gradually increasing the sleep time. But maybe its ok.
        Will attach a new patch shortly incorporating all the comments.
        Thanks again Xiao Chen !

        Show
        shahrs87 Rushabh S Shah added a comment - Correct. So we should treat AuthenticationException similar as AccessControlException and AuthorizationException, right? Somehow I mixed AuthenticationException with AuthorizationException. my bad. Yes will incorporate the change in next patch. Say for example we have 3 KMS's. I think we should retry immediately for all 3 at first. Makes sense..The only issue I can see is the first time we calculate FailoverOnNetworkExceptionRetry.getFailoverOrRetrySleepTime(int times) will have times == #kms servers instead of gradually increasing the sleep time. But maybe its ok. Will attach a new patch shortly incorporating all the comments. Thanks again Xiao Chen !
        Hide
        daryn Daryn Sharp added a comment -

        Minor comments:

        1. I'm not sure about a multiplier. Let's say I want to retry for a certain amount of time. I do the math and decide on 10 retries. I have 2 kms servers so multiplier is 5. Then the bank is expanded to 8, causing 40 retries. Or even the reverse, shrink the bank and now client retries less than expected. Should just specify a max retries that defaults to the number of load balanced hosts (today's behavior) is probably better.
        2. The max sleep time seems rather high since it can block server threads for a long time. I'd suggest perhaps 1-2s.
        3. Should probably only sleep after trying all clients instead of between each client. If there's a refused connection because a kms server is down, there's little use in waiting to try the next.
        4. Separate try blocks for shouldRetry and sleep to better control exception rethrows.
        5. Change the sleep catch to rethrow InteruptedIOException.
        6. Trivial but there's a lot of double punctuation, ex. periods, exclamations.
        Show
        daryn Daryn Sharp added a comment - Minor comments: I'm not sure about a multiplier. Let's say I want to retry for a certain amount of time. I do the math and decide on 10 retries. I have 2 kms servers so multiplier is 5. Then the bank is expanded to 8, causing 40 retries. Or even the reverse, shrink the bank and now client retries less than expected. Should just specify a max retries that defaults to the number of load balanced hosts (today's behavior) is probably better. The max sleep time seems rather high since it can block server threads for a long time. I'd suggest perhaps 1-2s. Should probably only sleep after trying all clients instead of between each client. If there's a refused connection because a kms server is down, there's little use in waiting to try the next. Separate try blocks for shouldRetry and sleep to better control exception rethrows. Change the sleep catch to rethrow InteruptedIOException . Trivial but there's a lot of double punctuation, ex. periods, exclamations.
        Hide
        shahrs87 Rushabh S Shah added a comment - - edited

        Attaching a new patching addressing all the comments by Daryn Sharp and Xiao Chen

        Should we also treat AuthenticationException as non-retry?

        Since AuthenticationException is not thrown by LoadBalancingKMSClientProvider#doOp, we can't catch it explicitly.
        But it will be thrown as WrapperException and we don't retry.

        I think we should retry immediately for all 3 at first. (Same as current LBKMSCP behavior).

        Addressed in the latest patch.

        Should just specify a max retries that defaults to the number of load balanced hosts (today's behavior) is probably better.

        Addressed in the latest patch.

        The max sleep time seems rather high since it can block server threads for a long time. I'd suggest perhaps 1-2s.

        Made it 2 seconds.

        Should probably only sleep after trying all clients instead of between each client.

        Addressed in the latest patch.

        Separate try blocks for shouldRetry and sleep to better control exception rethrows.

        Addressed in the latest patch.

        Change the sleep catch to rethrow InteruptedIOException.

        Addressed in the latest patch.

        Trivial but there's a lot of double punctuation, ex. periods, exclamations.

        Addressed in the latest patch.

        One more major change done in the latest patch.
        Treat all the operations as non idempotent.
        Since we catch all the Socket related exceptions in FailoverOnNetworkExceptionRetry#shouldRetry in the first catch block and Failover, it should be safe to assume that all the exception that passed the first catch block shouldn't be retried.
        Consulted with Daryn on this approach.

        Show
        shahrs87 Rushabh S Shah added a comment - - edited Attaching a new patching addressing all the comments by Daryn Sharp and Xiao Chen Should we also treat AuthenticationException as non-retry? Since AuthenticationException is not thrown by LoadBalancingKMSClientProvider#doOp , we can't catch it explicitly. But it will be thrown as WrapperException and we don't retry. I think we should retry immediately for all 3 at first. (Same as current LBKMSCP behavior). Addressed in the latest patch. Should just specify a max retries that defaults to the number of load balanced hosts (today's behavior) is probably better. Addressed in the latest patch. The max sleep time seems rather high since it can block server threads for a long time. I'd suggest perhaps 1-2s. Made it 2 seconds. Should probably only sleep after trying all clients instead of between each client. Addressed in the latest patch. Separate try blocks for shouldRetry and sleep to better control exception rethrows. Addressed in the latest patch. Change the sleep catch to rethrow InteruptedIOException. Addressed in the latest patch. Trivial but there's a lot of double punctuation, ex. periods, exclamations. Addressed in the latest patch. One more major change done in the latest patch. Treat all the operations as non idempotent. Since we catch all the Socket related exceptions in FailoverOnNetworkExceptionRetry#shouldRetry in the first catch block and Failover, it should be safe to assume that all the exception that passed the first catch block shouldn't be retried. Consulted with Daryn on this approach.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 28s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 29s Maven dependency ordering for branch
        +1 mvninstall 15m 49s trunk passed
        +1 compile 17m 21s trunk passed
        +1 checkstyle 2m 13s trunk passed
        +1 mvnsite 2m 39s trunk passed
        -1 findbugs 1m 49s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 49s trunk passed
        0 mvndep 0m 17s Maven dependency ordering for patch
        +1 mvninstall 1m 50s the patch passed
        +1 compile 12m 46s the patch passed
        +1 javac 12m 46s the patch passed
        -0 checkstyle 2m 13s root: The patch generated 21 new + 28 unchanged - 0 fixed = 49 total (was 28)
        +1 mvnsite 2m 29s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 3m 54s the patch passed
        +1 javadoc 1m 36s the patch passed
        -1 unit 7m 45s hadoop-common in the patch failed.
        -1 unit 67m 32s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 42s The patch does not generate ASF License warnings.
        147m 18s



        Reason Tests
        Failed junit tests hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem
          hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.server.namenode.TestDecommissioningStatus
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11804
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12871688/HDFS-11804-trunk-2.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c4472a286bfa 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 867903d
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19806/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19806/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 28s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 29s Maven dependency ordering for branch +1 mvninstall 15m 49s trunk passed +1 compile 17m 21s trunk passed +1 checkstyle 2m 13s trunk passed +1 mvnsite 2m 39s trunk passed -1 findbugs 1m 49s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 49s trunk passed 0 mvndep 0m 17s Maven dependency ordering for patch +1 mvninstall 1m 50s the patch passed +1 compile 12m 46s the patch passed +1 javac 12m 46s the patch passed -0 checkstyle 2m 13s root: The patch generated 21 new + 28 unchanged - 0 fixed = 49 total (was 28) +1 mvnsite 2m 29s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 54s the patch passed +1 javadoc 1m 36s the patch passed -1 unit 7m 45s hadoop-common in the patch failed. -1 unit 67m 32s hadoop-hdfs in the patch failed. +1 asflicense 0m 42s The patch does not generate ASF License warnings. 147m 18s Reason Tests Failed junit tests hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem   hadoop.hdfs.TestAclsEndToEnd   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure   hadoop.hdfs.server.namenode.TestDecommissioningStatus   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150   hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11804 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12871688/HDFS-11804-trunk-2.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux c4472a286bfa 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 867903d Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19806/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19806/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19806/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment - - edited

        Added new patch to address checkstyle warnings.
        The test failures from the previous build are unrelated :

        Failed test Comment
        hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem Passing locally
        hadoop.hdfs.TestAclsEndToEnd Created HDFS-11944
        hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure Passing locally
        hadoop.hdfs.server.namenode.TestDecommissioningStatus Passing locally
        hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 Passing locally
        hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy Passing locally
        hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy Passing locally

        Also none of the findbugs warnings are related to my patch.

        Xiao Chen Daryn Sharp please review the latest patch.

        Show
        shahrs87 Rushabh S Shah added a comment - - edited Added new patch to address checkstyle warnings. The test failures from the previous build are unrelated : Failed test Comment hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem Passing locally hadoop.hdfs.TestAclsEndToEnd Created HDFS-11944 hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure Passing locally hadoop.hdfs.server.namenode.TestDecommissioningStatus Passing locally hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 Passing locally hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy Passing locally hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy Passing locally Also none of the findbugs warnings are related to my patch. Xiao Chen Daryn Sharp please review the latest patch.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 21s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 22s Maven dependency ordering for branch
        +1 mvninstall 14m 13s trunk passed
        +1 compile 21m 2s trunk passed
        +1 checkstyle 2m 20s trunk passed
        +1 mvnsite 2m 36s trunk passed
        -1 findbugs 1m 45s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 2m 8s trunk passed
        0 mvndep 0m 19s Maven dependency ordering for patch
        +1 mvninstall 1m 58s the patch passed
        +1 compile 16m 22s the patch passed
        +1 javac 16m 22s the patch passed
        -0 checkstyle 2m 21s root: The patch generated 7 new + 27 unchanged - 0 fixed = 34 total (was 27)
        +1 mvnsite 2m 38s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 4m 36s the patch passed
        +1 javadoc 2m 2s the patch passed
        +1 unit 9m 16s hadoop-common in the patch passed.
        -1 unit 98m 41s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 36s The patch does not generate ASF License warnings.
        187m 46s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.TestAclsEndToEnd



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11804
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12871836/HDFS-11804-trunk-3.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 87ee4abaacb5 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 5ec7163
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19823/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19823/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19823/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19823/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19823/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 21s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 22s Maven dependency ordering for branch +1 mvninstall 14m 13s trunk passed +1 compile 21m 2s trunk passed +1 checkstyle 2m 20s trunk passed +1 mvnsite 2m 36s trunk passed -1 findbugs 1m 45s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 2m 8s trunk passed 0 mvndep 0m 19s Maven dependency ordering for patch +1 mvninstall 1m 58s the patch passed +1 compile 16m 22s the patch passed +1 javac 16m 22s the patch passed -0 checkstyle 2m 21s root: The patch generated 7 new + 27 unchanged - 0 fixed = 34 total (was 27) +1 mvnsite 2m 38s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 4m 36s the patch passed +1 javadoc 2m 2s the patch passed +1 unit 9m 16s hadoop-common in the patch passed. -1 unit 98m 41s hadoop-hdfs in the patch failed. +1 asflicense 0m 36s The patch does not generate ASF License warnings. 187m 46s Reason Tests Failed junit tests hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting   hadoop.hdfs.TestAclsEndToEnd Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11804 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12871836/HDFS-11804-trunk-3.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 87ee4abaacb5 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 5ec7163 Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19823/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19823/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19823/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19823/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19823/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        daryn Daryn Sharp added a comment -

        General issues:

        • Attempts are not actually attempts, but retries (attempts + 1). See below, only noticed due to test.
        • Suggest incrementing numFailovers in the for loop statement, instead of a conditional in the middle of the loop, to be a little more clear about what's happening.
        • The catch ACE should rethrow instead of "break"-ing since the latter causes it to log a misleading line that it tried all the providers when it actually didn't.
        • If shouldRetry were to throw an IOE, it will erroneously be re-wrapped in another IOE.
        • The sleep condition is numFailovers >= providers.length. I think it makes more sense as (numFailovers % providers.length) == 0 to sleep only between sweeps of the kms cluster.

        Test issues:

        • testClientRetriesWithAccessControlException doesn't appear to test what it claims to do. The 1st provider throws IOE, 2nd throws ACE. The test doesn't verify the ACE stopped the retries. It tests that the IOE did.
        • LoadBalancingKMSClientProvider.KMS_FAILOVER_MAX_ATTEMPTS_KEY is really acting like max retries. testClientRetriesSpecifiedNumberOfTimes shows that 10 attempts != 10.
        Show
        daryn Daryn Sharp added a comment - General issues: Attempts are not actually attempts, but retries (attempts + 1). See below, only noticed due to test. Suggest incrementing numFailovers in the for loop statement, instead of a conditional in the middle of the loop, to be a little more clear about what's happening. The catch ACE should rethrow instead of "break"-ing since the latter causes it to log a misleading line that it tried all the providers when it actually didn't. If shouldRetry were to throw an IOE, it will erroneously be re-wrapped in another IOE. The sleep condition is numFailovers >= providers.length . I think it makes more sense as (numFailovers % providers.length) == 0 to sleep only between sweeps of the kms cluster. Test issues: testClientRetriesWithAccessControlException doesn't appear to test what it claims to do. The 1st provider throws IOE, 2nd throws ACE. The test doesn't verify the ACE stopped the retries. It tests that the IOE did. LoadBalancingKMSClientProvider.KMS_FAILOVER_MAX_ATTEMPTS_KEY is really acting like max retries. testClientRetriesSpecifiedNumberOfTimes shows that 10 attempts != 10.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Cancelling the patch to address Daryn Sharp's comments.

        Show
        shahrs87 Rushabh S Shah added a comment - Cancelling the patch to address Daryn Sharp 's comments.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Uploading a new patch addressing all the comments and correcting some assert messages.
        Daryn Sharp please review.

        Show
        shahrs87 Rushabh S Shah added a comment - Uploading a new patch addressing all the comments and correcting some assert messages. Daryn Sharp please review.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks Rushabh S Shah for revving.

        I have some comments below:

        • Not introduced by this, but the first parameter URI providerUri is not used in KMSCP#createProvider.
        • Config names should follow existing pattern: s/kms.client/hadoop.security.kms.client/g
        • core-default.xml needs to be updated with the new configs
        • LBKMSCP, can we bring throw new IOException("No providers configured !"); forward, to add a check at the beginning?
              if (providers.length == 0) {
                throw new IOException("No providers configured !");
               }
          
        • The original exception message above is double-exclamation marked.
        • Regarding AuthenticatedException, I think below is 1 possible way to have LBKMSCP#doOp end up catching one:
          KMSCP#createKey -> KMSCP#createConnection -> DelegationTokenAuthenticatedURL#openConnection
        Show
        xiaochen Xiao Chen added a comment - Thanks Rushabh S Shah for revving. I have some comments below: Not introduced by this, but the first parameter URI providerUri is not used in KMSCP#createProvider. Config names should follow existing pattern: s/kms.client/hadoop.security.kms.client/g core-default.xml needs to be updated with the new configs LBKMSCP, can we bring throw new IOException("No providers configured !"); forward, to add a check at the beginning? if (providers.length == 0) { throw new IOException( "No providers configured !" ); } The original exception message above is double-exclamation marked. Regarding AuthenticatedException, I think below is 1 possible way to have LBKMSCP#doOp end up catching one: KMSCP#createKey -> KMSCP#createConnection -> DelegationTokenAuthenticatedURL#openConnection
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 17s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 15s Maven dependency ordering for branch
        +1 mvninstall 13m 3s trunk passed
        +1 compile 13m 39s trunk passed
        +1 checkstyle 1m 54s trunk passed
        +1 mvnsite 2m 4s trunk passed
        -1 findbugs 1m 26s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 36s trunk passed
        0 mvndep 0m 13s Maven dependency ordering for patch
        +1 mvninstall 1m 26s the patch passed
        +1 compile 10m 4s the patch passed
        +1 javac 10m 4s the patch passed
        -0 checkstyle 1m 54s root: The patch generated 6 new + 27 unchanged - 0 fixed = 33 total (was 27)
        +1 mvnsite 2m 2s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 3m 26s the patch passed
        +1 javadoc 1m 39s the patch passed
        -1 unit 8m 3s hadoop-common in the patch failed.
        -1 unit 71m 59s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 38s The patch does not generate ASF License warnings.
        138m 51s



        Reason Tests
        Failed junit tests hadoop.net.TestDNS
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
          hadoop.hdfs.TestAclsEndToEnd



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11804
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872138/HDFS-11804-trunk-4.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 24a1a27d8b7f 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / a062374
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19840/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19840/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 15s Maven dependency ordering for branch +1 mvninstall 13m 3s trunk passed +1 compile 13m 39s trunk passed +1 checkstyle 1m 54s trunk passed +1 mvnsite 2m 4s trunk passed -1 findbugs 1m 26s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 36s trunk passed 0 mvndep 0m 13s Maven dependency ordering for patch +1 mvninstall 1m 26s the patch passed +1 compile 10m 4s the patch passed +1 javac 10m 4s the patch passed -0 checkstyle 1m 54s root: The patch generated 6 new + 27 unchanged - 0 fixed = 33 total (was 27) +1 mvnsite 2m 2s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 26s the patch passed +1 javadoc 1m 39s the patch passed -1 unit 8m 3s hadoop-common in the patch failed. -1 unit 71m 59s hadoop-hdfs in the patch failed. +1 asflicense 0m 38s The patch does not generate ASF License warnings. 138m 51s Reason Tests Failed junit tests hadoop.net.TestDNS   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting   hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation   hadoop.hdfs.TestAclsEndToEnd Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11804 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872138/HDFS-11804-trunk-4.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 24a1a27d8b7f 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / a062374 Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19840/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19840/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19840/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Thanks Xiao Chen for the review.

        Not introduced by this, but the first parameter URI providerUri is not used in KMSCP#createProvider.

        Fixed in the latest patch.

        Config names should follow existing pattern: s/kms.client/hadoop.security.kms.client/g

        Fixed in the latest patch.

        core-default.xml needs to be updated with the new configs

        Added in the latest patch.

        LBKMSCP, can we bring throw new IOException("No providers configured !"); forward, to add a check at the beginning?

        Added in the latest patch.

        Regarding AuthenticatedException, I think below is 1 possible way to have LBKMSCP#doOp end up catching one:
        KMSCP#createKey -> KMSCP#createConnection -> DelegationTokenAuthenticatedURL#openConnection

        For this particular case, FailoverOnNetworkExceptionRetry retry policy will do the right thing.

        FailoverOnNetworkExceptionRetry.java
        // Some comments here
          public RetryAction shouldRetry(Exception e, int retries,
                int failovers, boolean isIdempotentOrAtMostOnce) throws Exception {
        ...
        ...
        else if (e instanceof SocketException
                  || (e instanceof IOException && !(e instanceof RemoteException))) {
                if (isIdempotentOrAtMostOnce) {
                  return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
                      getFailoverOrRetrySleepTime(retries));
                } else {
                  return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
                      "the invoked method is not idempotent, and unable to determine "
                          + "whether it was invoked");
                }
        ...
        ...
        }
        

        Since we are always passing isIdempotentOrAtMostOnce flag as false, the retryPolicy will fail.

        There are many places in KMSClientProvider in which all the Exception are getting converted to IOException.
        I don't understand the reasoning behind that.
        If we want to fix it, we can open a new jira as that change is outside the scope of this jira.
        Please review the latest patch.

        Show
        shahrs87 Rushabh S Shah added a comment - Thanks Xiao Chen for the review. Not introduced by this, but the first parameter URI providerUri is not used in KMSCP#createProvider. Fixed in the latest patch. Config names should follow existing pattern: s/kms.client/hadoop.security.kms.client/g Fixed in the latest patch. core-default.xml needs to be updated with the new configs Added in the latest patch. LBKMSCP, can we bring throw new IOException("No providers configured !"); forward, to add a check at the beginning? Added in the latest patch. Regarding AuthenticatedException, I think below is 1 possible way to have LBKMSCP#doOp end up catching one: KMSCP#createKey -> KMSCP#createConnection -> DelegationTokenAuthenticatedURL#openConnection For this particular case, FailoverOnNetworkExceptionRetry retry policy will do the right thing. FailoverOnNetworkExceptionRetry.java // Some comments here public RetryAction shouldRetry(Exception e, int retries, int failovers, boolean isIdempotentOrAtMostOnce) throws Exception { ... ... else if (e instanceof SocketException || (e instanceof IOException && !(e instanceof RemoteException))) { if (isIdempotentOrAtMostOnce) { return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY, getFailoverOrRetrySleepTime(retries)); } else { return new RetryAction(RetryAction.RetryDecision.FAIL, 0, "the invoked method is not idempotent, and unable to determine " + "whether it was invoked" ); } ... ... } Since we are always passing isIdempotentOrAtMostOnce flag as false, the retryPolicy will fail. There are many places in KMSClientProvider in which all the Exception are getting converted to IOException. I don't understand the reasoning behind that. If we want to fix it, we can open a new jira as that change is outside the scope of this jira. Please review the latest patch.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 25s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 1m 29s Maven dependency ordering for branch
        +1 mvninstall 12m 46s trunk passed
        +1 compile 13m 30s trunk passed
        +1 checkstyle 1m 38s trunk passed
        +1 mvnsite 1m 49s trunk passed
        -1 findbugs 1m 18s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 26s trunk passed
        0 mvndep 0m 15s Maven dependency ordering for patch
        +1 mvninstall 1m 23s the patch passed
        +1 compile 9m 56s the patch passed
        +1 javac 9m 56s the patch passed
        -0 checkstyle 1m 47s root: The patch generated 6 new + 27 unchanged - 0 fixed = 33 total (was 27)
        +1 mvnsite 1m 57s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 3m 14s the patch passed
        +1 javadoc 1m 41s the patch passed
        -1 unit 7m 13s hadoop-common in the patch failed.
        -1 unit 68m 48s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 26s The patch does not generate ASF License warnings.
        133m 57s



        Reason Tests
        Failed junit tests hadoop.conf.TestCommonConfigurationFields
          hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
          hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
          hadoop.hdfs.web.TestWebHdfsTimeouts



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11804
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872309/HDFS-11804-trunk-5.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux 8956b53b7214 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 99634d1
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19853/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19853/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 25s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 1m 29s Maven dependency ordering for branch +1 mvninstall 12m 46s trunk passed +1 compile 13m 30s trunk passed +1 checkstyle 1m 38s trunk passed +1 mvnsite 1m 49s trunk passed -1 findbugs 1m 18s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 26s trunk passed 0 mvndep 0m 15s Maven dependency ordering for patch +1 mvninstall 1m 23s the patch passed +1 compile 9m 56s the patch passed +1 javac 9m 56s the patch passed -0 checkstyle 1m 47s root: The patch generated 6 new + 27 unchanged - 0 fixed = 33 total (was 27) +1 mvnsite 1m 57s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 3m 14s the patch passed +1 javadoc 1m 41s the patch passed -1 unit 7m 13s hadoop-common in the patch failed. -1 unit 68m 48s hadoop-hdfs in the patch failed. +1 asflicense 0m 26s The patch does not generate ASF License warnings. 133m 57s Reason Tests Failed junit tests hadoop.conf.TestCommonConfigurationFields   hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080   hadoop.hdfs.TestAclsEndToEnd   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy   hadoop.hdfs.web.TestWebHdfsTimeouts Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11804 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872309/HDFS-11804-trunk-5.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 8956b53b7214 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 99634d1 Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19853/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19853/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19853/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        jzhuge John Zhuge added a comment -

        Rushabh S Shah Should this JIRA be moved to project "Hadoop Common" since KMS belongs to common?

        Show
        jzhuge John Zhuge added a comment - Rushabh S Shah Should this JIRA be moved to project "Hadoop Common" since KMS belongs to common?
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Fixed 2 test failures introduced by patch 5. TestLoadBalancingKMSClientProvider and TestCommonConfigurationFields
        Remove hadoop.security.kms.client.failover.max.retries from core-default.xml because the default value is supposed to be the num of providers to preserve original behavior.
        Adding the value in core-default.xml will change that behavior since it will always override that value.

        Show
        shahrs87 Rushabh S Shah added a comment - Fixed 2 test failures introduced by patch 5. TestLoadBalancingKMSClientProvider and TestCommonConfigurationFields Remove hadoop.security.kms.client.failover.max.retries from core-default.xml because the default value is supposed to be the num of providers to preserve original behavior. Adding the value in core-default.xml will change that behavior since it will always override that value.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Should this JIRA be moved to project "Hadoop Common" since KMS belongs to common?

        Moved to common. Thanks John Zhuge.

        Show
        shahrs87 Rushabh S Shah added a comment - Should this JIRA be moved to project "Hadoop Common" since KMS belongs to common? Moved to common. Thanks John Zhuge .
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 15s Maven dependency ordering for branch
        +1 mvninstall 13m 4s trunk passed
        -1 compile 6m 49s root in trunk failed.
        +1 checkstyle 1m 48s trunk passed
        +1 mvnsite 1m 53s trunk passed
        -1 findbugs 1m 18s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 29s trunk passed
        0 mvndep 0m 17s Maven dependency ordering for patch
        +1 mvninstall 1m 31s the patch passed
        +1 compile 13m 4s the patch passed
        -1 javac 13m 4s root generated 501 new + 286 unchanged - 0 fixed = 787 total (was 286)
        -0 checkstyle 1m 58s root: The patch generated 2 new + 137 unchanged - 0 fixed = 139 total (was 137)
        +1 mvnsite 2m 9s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 3m 32s the patch passed
        +1 javadoc 1m 46s the patch passed
        +1 unit 7m 59s hadoop-common in the patch passed.
        -1 unit 64m 10s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 41s The patch does not generate ASF License warnings.
        147m 18s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
          hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872745/HDFS-11804-trunk-6.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux b42f19aae496 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 86368cc
        Default Java 1.8.0_131
        compile https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/branch-compile-root.txt
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        javac https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/diff-compile-javac-root.txt
        checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 15s Maven dependency ordering for branch +1 mvninstall 13m 4s trunk passed -1 compile 6m 49s root in trunk failed. +1 checkstyle 1m 48s trunk passed +1 mvnsite 1m 53s trunk passed -1 findbugs 1m 18s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 29s trunk passed 0 mvndep 0m 17s Maven dependency ordering for patch +1 mvninstall 1m 31s the patch passed +1 compile 13m 4s the patch passed -1 javac 13m 4s root generated 501 new + 286 unchanged - 0 fixed = 787 total (was 286) -0 checkstyle 1m 58s root: The patch generated 2 new + 137 unchanged - 0 fixed = 139 total (was 137) +1 mvnsite 2m 9s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 3m 32s the patch passed +1 javadoc 1m 46s the patch passed +1 unit 7m 59s hadoop-common in the patch passed. -1 unit 64m 10s hadoop-hdfs in the patch failed. +1 asflicense 0m 41s The patch does not generate ASF License warnings. 147m 18s Reason Tests Failed junit tests hadoop.hdfs.TestAclsEndToEnd   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy   hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872745/HDFS-11804-trunk-6.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux b42f19aae496 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 86368cc Default Java 1.8.0_131 compile https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/branch-compile-root.txt findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html javac https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/diff-compile-javac-root.txt checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12521/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        daryn Daryn Sharp added a comment -

        I'd revert the change that breaks instead of returning. Otherwise this final condition: (ret == null && ex != null) can be faulty. If the method was successfully invoked and it returned null, and it was a retry, the condition will erroneously throw the pre-retry exception.

        Show
        daryn Daryn Sharp added a comment - I'd revert the change that breaks instead of returning. Otherwise this final condition: (ret == null && ex != null) can be faulty. If the method was successfully invoked and it returned null, and it was a retry, the condition will erroneously throw the pre-retry exception.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks for revving Rushabh. Looks pretty close to me.

        Thanks for the explanations about AuthenticatedException, makes sense. I don't know about why the IOExceptions are wrapped either (except for guessing they're tricks with the method signature). Agree it's not related to this jira.

        Patch 6 comments below:
        For the retries, do you think maybe it's more intuitive to specify num of failovers for the whole provider set? (i.e. maxFailovers = conf.getInt(KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY) * providers.length. (Then we can have the core-default.xml set to 1) Will leave the decision to you and Daryn.

        Nits:

        • KMSCP: the first @Link seems redundant in This will always create a @Link ...
        • Good to have some validations on the config params (e.g. numRetries > 0 etc.)
        Show
        xiaochen Xiao Chen added a comment - Thanks for revving Rushabh. Looks pretty close to me. Thanks for the explanations about AuthenticatedException , makes sense. I don't know about why the IOExceptions are wrapped either (except for guessing they're tricks with the method signature). Agree it's not related to this jira. Patch 6 comments below: For the retries, do you think maybe it's more intuitive to specify num of failovers for the whole provider set? (i.e. maxFailovers = conf.getInt(KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY) * providers.length . (Then we can have the core-default.xml set to 1) Will leave the decision to you and Daryn. Nits: KMSCP: the first @Link seems redundant in This will always create a @Link ... Good to have some validations on the config params (e.g. numRetries > 0 etc.)
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Thanks Daryn Sharp Xiao Chen for your valuable reviews.
        Canceling the patch to address your comments.

        Show
        shahrs87 Rushabh S Shah added a comment - Thanks Daryn Sharp Xiao Chen for your valuable reviews. Canceling the patch to address your comments.
        Hide
        shahrs87 Rushabh S Shah added a comment - - edited

        For the retries, do you think maybe it's more intuitive to specify num of failovers for the whole provider set?

        From this comment, point 1, I think Daryn Sharp had a good reason explaining why have fix number of retries is a good idea. I kind of agreed with him. So keeping the fix number of retries even in patch #7.

        KMSCP: the first @Link seems redundant

        Fixed in v7

        Good to have some validations on the config params (e.g. numRetries > 0 etc.)

        Added few pre conditions checks for all the config params.

        I'd revert the change that breaks instead of returning.

        Thanks a lot for catching that. Fixed that in v7 of the patch.

        Show
        shahrs87 Rushabh S Shah added a comment - - edited For the retries, do you think maybe it's more intuitive to specify num of failovers for the whole provider set? From this comment, point 1 , I think Daryn Sharp had a good reason explaining why have fix number of retries is a good idea. I kind of agreed with him. So keeping the fix number of retries even in patch #7. KMSCP: the first @Link seems redundant Fixed in v7 Good to have some validations on the config params (e.g. numRetries > 0 etc.) Added few pre conditions checks for all the config params. I'd revert the change that breaks instead of returning. Thanks a lot for catching that. Fixed that in v7 of the patch.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks for sticking with this Rushabh.

        From this comment, point 1,

        Right... sorry I missed that one.

        But what confuses me is that, what if maxNumRetries < providers.length ? If someone configures it to be 2, and we have 3 providers, the call will fail and print a message:

            LOG.warn("Aborting since the Request has failed with all KMS"
                      + " providers in the group or the exception is not recoverable.");
        

        which isn't entirely accurate, because here it did not fail with all providers.

        We can either update the message, or update the maxNumRetries to providers.length to make sure we try at least once on all providers. Either way feels okay to me. Slightly prefer the latter since by assumption all providers should work and we don't have retry delay for the whole sweep.

        Besides that, I'm +1 on patch 7.

        Show
        xiaochen Xiao Chen added a comment - Thanks for sticking with this Rushabh. From this comment, point 1, Right... sorry I missed that one. But what confuses me is that, what if maxNumRetries < providers.length ? If someone configures it to be 2, and we have 3 providers, the call will fail and print a message: LOG.warn( "Aborting since the Request has failed with all KMS" + " providers in the group or the exception is not recoverable." ); which isn't entirely accurate, because here it did not fail with all providers. We can either update the message, or update the maxNumRetries to providers.length to make sure we try at least once on all providers. Either way feels okay to me. Slightly prefer the latter since by assumption all providers should work and we don't have retry delay for the whole sweep. Besides that, I'm +1 on patch 7.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 18s Maven dependency ordering for branch
        +1 mvninstall 15m 42s trunk passed
        +1 compile 13m 21s trunk passed
        +1 checkstyle 1m 54s trunk passed
        +1 mvnsite 2m 3s trunk passed
        -1 findbugs 1m 24s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 39s trunk passed
        0 mvndep 0m 16s Maven dependency ordering for patch
        +1 mvninstall 1m 31s the patch passed
        +1 compile 10m 10s the patch passed
        +1 javac 10m 10s the patch passed
        +1 checkstyle 1m 54s the patch passed
        +1 mvnsite 2m 7s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 3m 32s the patch passed
        +1 javadoc 1m 46s the patch passed
        +1 unit 8m 4s hadoop-common in the patch passed.
        -1 unit 63m 29s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 42s The patch does not generate ASF License warnings.
        153m 47s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872794/HDFS-11804-trunk-7.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux da44da7a9e61 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / bec79ca
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 18s Maven dependency ordering for branch +1 mvninstall 15m 42s trunk passed +1 compile 13m 21s trunk passed +1 checkstyle 1m 54s trunk passed +1 mvnsite 2m 3s trunk passed -1 findbugs 1m 24s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 39s trunk passed 0 mvndep 0m 16s Maven dependency ordering for patch +1 mvninstall 1m 31s the patch passed +1 compile 10m 10s the patch passed +1 javac 10m 10s the patch passed +1 checkstyle 1m 54s the patch passed +1 mvnsite 2m 7s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 3m 32s the patch passed +1 javadoc 1m 46s the patch passed +1 unit 8m 4s hadoop-common in the patch passed. -1 unit 63m 29s hadoop-hdfs in the patch failed. +1 asflicense 0m 42s The patch does not generate ASF License warnings. 153m 47s Reason Tests Failed junit tests hadoop.hdfs.TestAclsEndToEnd   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872794/HDFS-11804-trunk-7.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux da44da7a9e61 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / bec79ca Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12525/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment - - edited

        We can either update the message, or update the maxNumRetries to providers.length to make sure we try at least once on all providers. Either way feels okay to me.

        I wouldn't worry much about the log message not being entirely accurate.
        If the user is making the number of retries less than the number of providers then he/she knows the problems associated with it.

        Slightly prefer the latter since by assumption all providers should work and we don't have retry delay for the whole sweep.

        Actually I don't prefer overriding the config value which user intentionally overrode it.
        If you strongly feel about updating the log message, then I can change the log message in the next revision and will add the numRetries and providers length. Let me know you thoughts.
        Xiao Chen: As always thanks a lot for your time reviewing all the updated patches.

        Show
        shahrs87 Rushabh S Shah added a comment - - edited We can either update the message, or update the maxNumRetries to providers.length to make sure we try at least once on all providers. Either way feels okay to me. I wouldn't worry much about the log message not being entirely accurate. If the user is making the number of retries less than the number of providers then he/she knows the problems associated with it. Slightly prefer the latter since by assumption all providers should work and we don't have retry delay for the whole sweep. Actually I don't prefer overriding the config value which user intentionally overrode it. If you strongly feel about updating the log message, then I can change the log message in the next revision and will add the numRetries and providers length. Let me know you thoughts. Xiao Chen : As always thanks a lot for your time reviewing all the updated patches.
        Hide
        xiaochen Xiao Chen added a comment -

        I feel printing the information in the log message will at least help people debug. So please update the log message in the next rev if you agree.

        +1 from me once that's addressed. Thanks for the work here Rushabh S Shah and Daryn.

        Show
        xiaochen Xiao Chen added a comment - I feel printing the information in the log message will at least help people debug. So please update the log message in the next rev if you agree. +1 from me once that's addressed. Thanks for the work here Rushabh S Shah and Daryn.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Xiao Chen: thanks for the review.
        Attaching a patch with more informative log message.
        Below is the diff between patch#7 and patch#8 for quick reference.

        diff ~/patches/jira/HDFS-11804-trunk-7.patch ~/patches/jira/HDFS-11804-trunk-8.patch
        103c103
        < @@ -79,24 +87,73 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers,
        ---
        > @@ -79,24 +87,79 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers,
        167c167,173
        < +              + " providers in the group or the exception is not recoverable.");
        ---
        > +              + " providers(depending on {}={} setting and numProviders={})"
        > +              + " in the group OR the exception is not recoverable",
        > +              CommonConfigurationKeysPublic.KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY
        > +              , getConf().getInt(
        > +                  CommonConfigurationKeysPublic.
        > +                  KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY, providers.length),
        > +              providers.length);
        
        Show
        shahrs87 Rushabh S Shah added a comment - Xiao Chen : thanks for the review. Attaching a patch with more informative log message. Below is the diff between patch#7 and patch#8 for quick reference. diff ~/patches/jira/HDFS-11804-trunk-7.patch ~/patches/jira/HDFS-11804-trunk-8.patch 103c103 < @@ -79,24 +87,73 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers, --- > @@ -79,24 +87,79 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers, 167c167,173 < + + " providers in the group or the exception is not recoverable."); --- > + + " providers(depending on {}={} setting and numProviders={})" > + + " in the group OR the exception is not recoverable", > + CommonConfigurationKeysPublic.KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY > + , getConf().getInt( > + CommonConfigurationKeysPublic. > + KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY, providers.length), > + providers.length);
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 15s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 1m 25s Maven dependency ordering for branch
        +1 mvninstall 13m 26s trunk passed
        +1 compile 13m 46s trunk passed
        +1 checkstyle 1m 57s trunk passed
        +1 mvnsite 2m 3s trunk passed
        -1 findbugs 1m 25s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 46s trunk passed
        0 mvndep 0m 18s Maven dependency ordering for patch
        +1 mvninstall 1m 40s the patch passed
        +1 compile 11m 32s the patch passed
        +1 javac 11m 32s the patch passed
        -0 checkstyle 2m 1s root: The patch generated 1 new + 137 unchanged - 0 fixed = 138 total (was 137)
        +1 mvnsite 2m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 3m 43s the patch passed
        +1 javadoc 1m 44s the patch passed
        -1 unit 7m 58s hadoop-common in the patch failed.
        -1 unit 65m 38s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 55s The patch does not generate ASF License warnings.
        157m 15s



        Reason Tests
        Failed junit tests hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem
          hadoop.security.TestRaceWhenRelogin
          hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872983/HDFS-11804-trunk-8.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux 84d0022b7cc6 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 999c8fc
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 1m 25s Maven dependency ordering for branch +1 mvninstall 13m 26s trunk passed +1 compile 13m 46s trunk passed +1 checkstyle 1m 57s trunk passed +1 mvnsite 2m 3s trunk passed -1 findbugs 1m 25s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 46s trunk passed 0 mvndep 0m 18s Maven dependency ordering for patch +1 mvninstall 1m 40s the patch passed +1 compile 11m 32s the patch passed +1 javac 11m 32s the patch passed -0 checkstyle 2m 1s root: The patch generated 1 new + 137 unchanged - 0 fixed = 138 total (was 137) +1 mvnsite 2m 14s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 3m 43s the patch passed +1 javadoc 1m 44s the patch passed -1 unit 7m 58s hadoop-common in the patch failed. -1 unit 65m 38s hadoop-hdfs in the patch failed. +1 asflicense 0m 55s The patch does not generate ASF License warnings. 157m 15s Reason Tests Failed junit tests hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem   hadoop.security.TestRaceWhenRelogin   hadoop.hdfs.TestAclsEndToEnd   hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting   hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872983/HDFS-11804-trunk-8.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 84d0022b7cc6 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 999c8fc Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12531/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Test failures are not relevant.

        Running org.apache.hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem
        Tests run: 70, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 9.571 sec - in org.apache.hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem
        Running org.apache.hadoop.security.TestRaceWhenRelogin
        Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.127 sec - in org.apache.hadoop.security.TestRaceWhenRelogin
        Results :
        Tests run: 71, Failures: 0, Errors: 0, Skipped: 1
        
        Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
        Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 370.577 sec - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
        Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
        Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 82.397 sec - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
        Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
        Tests run: 14, Failures: 0, Errors: 0, Skipped: 10, Time elapsed: 69.502 sec - in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
        
        Results :
        
        Tests run: 53, Failures: 0, Errors: 0, Skipped: 10
        

        Failure of TestAclsEndToEnd getting tracked via HDFS-11944
        Xiao Chen: Mind reviewing the latest patch.
        Hopefully the last patch.

        Show
        shahrs87 Rushabh S Shah added a comment - Test failures are not relevant. Running org.apache.hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem Tests run: 70, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 9.571 sec - in org.apache.hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem Running org.apache.hadoop.security.TestRaceWhenRelogin Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.127 sec - in org.apache.hadoop.security.TestRaceWhenRelogin Results : Tests run: 71, Failures: 0, Errors: 0, Skipped: 1 Running org.apache.hadoop.hdfs.server.balancer.TestBalancer Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 370.577 sec - in org.apache.hadoop.hdfs.server.balancer.TestBalancer Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 82.397 sec - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy Tests run: 14, Failures: 0, Errors: 0, Skipped: 10, Time elapsed: 69.502 sec - in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy Results : Tests run: 53, Failures: 0, Errors: 0, Skipped: 10 Failure of TestAclsEndToEnd getting tracked via HDFS-11944 Xiao Chen : Mind reviewing the latest patch. Hopefully the last patch.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks for patiently revving Rushabh S Shah.
        I already gave a +1 pending, patch 8 LGTM. Does Daryn Sharp have any further comments?

        The checkstyle looks relevant though... Next patch is hopefully the last.

        Show
        xiaochen Xiao Chen added a comment - Thanks for patiently revving Rushabh S Shah . I already gave a +1 pending, patch 8 LGTM. Does Daryn Sharp have any further comments? The checkstyle looks relevant though... Next patch is hopefully the last.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        The checkstyle looks relevant though...

        Following is the checkstyle warning.

        ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java:149:              , getConf().getInt(:14: ',' is preceded with whitespace. [NoWhitespaceBefore]
        

        The whitespace is there to maintain indentation level. So the whitespace is irrelevant.
        Will ask Daryn Sharp to comment.

        Show
        shahrs87 Rushabh S Shah added a comment - The checkstyle looks relevant though... Following is the checkstyle warning. ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java:149: , getConf().getInt(:14: ',' is preceded with whitespace. [NoWhitespaceBefore] The whitespace is there to maintain indentation level. So the whitespace is irrelevant. Will ask Daryn Sharp to comment.
        Hide
        xiaochen Xiao Chen added a comment -

        I know this probably isn't anyone's favorite part (at least not mine), but we're supposed to fix things caught by pre-commit.

        Replacing

                      CommonConfigurationKeysPublic.KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY
                      , getConf().getInt(
        

        with

                      CommonConfigurationKeysPublic.KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY,
                      getConf().getInt(CommonConfigurationKeysPublic.
        

        Will fix that.

        Show
        xiaochen Xiao Chen added a comment - I know this probably isn't anyone's favorite part (at least not mine), but we're supposed to fix things caught by pre-commit. Replacing CommonConfigurationKeysPublic.KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY , getConf().getInt( with CommonConfigurationKeysPublic.KMS_CLIENT_FAILOVER_MAX_RETRIES_KEY, getConf().getInt(CommonConfigurationKeysPublic. Will fix that.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Will fix that.

        By doing that, it will violate 80 characters per line constraint.

        Show
        shahrs87 Rushabh S Shah added a comment - Will fix that. By doing that, it will violate 80 characters per line constraint.
        Hide
        xiaochen Xiao Chen added a comment -

        By doing that, it will violate 80 characters per line constraint.

        Attached patch 9 just doing what I suggested. So I think we're just pending Daryn's final reviews.

        Show
        xiaochen Xiao Chen added a comment - By doing that, it will violate 80 characters per line constraint. Attached patch 9 just doing what I suggested. So I think we're just pending Daryn's final reviews.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 1m 24s Maven dependency ordering for branch
        +1 mvninstall 14m 46s trunk passed
        +1 compile 17m 54s trunk passed
        +1 checkstyle 2m 2s trunk passed
        +1 mvnsite 2m 13s trunk passed
        -1 findbugs 1m 26s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings.
        +1 javadoc 1m 44s trunk passed
        0 mvndep 0m 18s Maven dependency ordering for patch
        +1 mvninstall 1m 36s the patch passed
        +1 compile 12m 4s the patch passed
        +1 javac 12m 4s the patch passed
        +1 checkstyle 2m 17s the patch passed
        +1 mvnsite 2m 49s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 3m 50s the patch passed
        +1 javadoc 1m 54s the patch passed
        +1 unit 8m 9s hadoop-common in the patch passed.
        -1 unit 100m 27s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 49s The patch does not generate ASF License warnings.
        199m 54s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
          hadoop.hdfs.TestRollingUpgrade
          hadoop.hdfs.TestAclsEndToEnd
        Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873222/HADOOP-14521.09.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux 52ba56acf822 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / e855cc4
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 1m 24s Maven dependency ordering for branch +1 mvninstall 14m 46s trunk passed +1 compile 17m 54s trunk passed +1 checkstyle 2m 2s trunk passed +1 mvnsite 2m 13s trunk passed -1 findbugs 1m 26s hadoop-common-project/hadoop-common in trunk has 19 extant Findbugs warnings. +1 javadoc 1m 44s trunk passed 0 mvndep 0m 18s Maven dependency ordering for patch +1 mvninstall 1m 36s the patch passed +1 compile 12m 4s the patch passed +1 javac 12m 4s the patch passed +1 checkstyle 2m 17s the patch passed +1 mvnsite 2m 49s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 3m 50s the patch passed +1 javadoc 1m 54s the patch passed +1 unit 8m 9s hadoop-common in the patch passed. -1 unit 100m 27s hadoop-hdfs in the patch failed. +1 asflicense 0m 49s The patch does not generate ASF License warnings. 199m 54s Reason Tests Failed junit tests hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy   hadoop.hdfs.TestRollingUpgrade   hadoop.hdfs.TestAclsEndToEnd Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2 Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873222/HADOOP-14521.09.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 52ba56acf822 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / e855cc4 Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12540/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Xiao Chen: thanks for v9 patch.
        Trunk patch applies well for branch-2.
        Attaching a branch-2.8 patch just for jenkins to validate.

        Show
        shahrs87 Rushabh S Shah added a comment - Xiao Chen : thanks for v9 patch. Trunk patch applies well for branch-2. Attaching a branch-2.8 patch just for jenkins to validate.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 16m 32s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        0 mvndep 0m 19s Maven dependency ordering for branch
        +1 mvninstall 8m 44s branch-2.8 passed
        +1 compile 5m 55s branch-2.8 passed with JDK v1.8.0_131
        +1 compile 7m 6s branch-2.8 passed with JDK v1.7.0_131
        +1 checkstyle 1m 7s branch-2.8 passed
        +1 mvnsite 2m 0s branch-2.8 passed
        +1 findbugs 3m 52s branch-2.8 passed
        +1 javadoc 1m 28s branch-2.8 passed with JDK v1.8.0_131
        +1 javadoc 1m 52s branch-2.8 passed with JDK v1.7.0_131
        0 mvndep 0m 17s Maven dependency ordering for patch
        +1 mvninstall 1m 29s the patch passed
        +1 compile 6m 22s the patch passed with JDK v1.8.0_131
        +1 javac 6m 22s the patch passed
        +1 compile 7m 19s the patch passed with JDK v1.7.0_131
        +1 javac 7m 19s the patch passed
        +1 checkstyle 1m 9s the patch passed
        +1 mvnsite 2m 2s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 4m 58s the patch passed
        +1 javadoc 1m 53s the patch passed with JDK v1.8.0_131
        +1 javadoc 2m 18s the patch passed with JDK v1.7.0_131
        -1 unit 21m 4s hadoop-common in the patch failed with JDK v1.7.0_131.
        -1 unit 65m 32s hadoop-hdfs in the patch failed with JDK v1.7.0_131.
        +1 asflicense 0m 30s The patch does not generate ASF License warnings.
        273m 57s



        Reason Tests
        JDK v1.8.0_131 Failed junit tests hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
          hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.web.TestWebHDFS
        JDK v1.7.0_131 Failed junit tests hadoop.hdfs.TestAclsEndToEnd
        JDK v1.7.0_131 Timed out junit tests org.apache.hadoop.http.TestHttpServerLifecycle



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:d946387
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873350/HDFS-11804-branch-2.8.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux 7a1870570631 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2.8 / e47ec94
        Default Java 1.7.0_131
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_131.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_131.txt
        JDK v1.7.0_131 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 16m 32s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 19s Maven dependency ordering for branch +1 mvninstall 8m 44s branch-2.8 passed +1 compile 5m 55s branch-2.8 passed with JDK v1.8.0_131 +1 compile 7m 6s branch-2.8 passed with JDK v1.7.0_131 +1 checkstyle 1m 7s branch-2.8 passed +1 mvnsite 2m 0s branch-2.8 passed +1 findbugs 3m 52s branch-2.8 passed +1 javadoc 1m 28s branch-2.8 passed with JDK v1.8.0_131 +1 javadoc 1m 52s branch-2.8 passed with JDK v1.7.0_131 0 mvndep 0m 17s Maven dependency ordering for patch +1 mvninstall 1m 29s the patch passed +1 compile 6m 22s the patch passed with JDK v1.8.0_131 +1 javac 6m 22s the patch passed +1 compile 7m 19s the patch passed with JDK v1.7.0_131 +1 javac 7m 19s the patch passed +1 checkstyle 1m 9s the patch passed +1 mvnsite 2m 2s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 4m 58s the patch passed +1 javadoc 1m 53s the patch passed with JDK v1.8.0_131 +1 javadoc 2m 18s the patch passed with JDK v1.7.0_131 -1 unit 21m 4s hadoop-common in the patch failed with JDK v1.7.0_131. -1 unit 65m 32s hadoop-hdfs in the patch failed with JDK v1.7.0_131. +1 asflicense 0m 30s The patch does not generate ASF License warnings. 273m 57s Reason Tests JDK v1.8.0_131 Failed junit tests hadoop.hdfs.server.namenode.ha.TestBootstrapStandby   hadoop.hdfs.TestAclsEndToEnd   hadoop.hdfs.web.TestWebHDFS JDK v1.7.0_131 Failed junit tests hadoop.hdfs.TestAclsEndToEnd JDK v1.7.0_131 Timed out junit tests org.apache.hadoop.http.TestHttpServerLifecycle Subsystem Report/Notes Docker Image:yetus/hadoop:d946387 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873350/HDFS-11804-branch-2.8.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 7a1870570631 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision branch-2.8 / e47ec94 Default Java 1.7.0_131 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_131.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_131.txt JDK v1.7.0_131 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12552/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        daryn Daryn Sharp added a comment -

        +1 Assuming test failures are unrelated.

        Show
        daryn Daryn Sharp added a comment - +1 Assuming test failures are unrelated.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Xiao Chen: Thanks Daryn Sharp and Xiao for reviewing the patch.
        Xiao: Can you please commit the patch ?

        Show
        shahrs87 Rushabh S Shah added a comment - Xiao Chen : Thanks Daryn Sharp and Xiao for reviewing the patch. Xiao: Can you please commit the patch ?
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks for the review Daryn.

        Rushabh S Shah,
        In general it'd be nice to confirm the failed tests are unrelated.
        TestAclsEndToEnd failure seems to be caused by this change.

        As a side note, there's a tool dev-support/bin/smart-apply-patch that can conveniently apply the latest patch given the jira number, without the hassle to manually download. So please follow wiki's patch naming convention so the branch-2.8 patch can be applied automatically (current branch-2.8 patch is recognized by the tool, so it tried to apply on trunk). NBD but would make reviewer's job easier.

        Show
        xiaochen Xiao Chen added a comment - Thanks for the review Daryn. Rushabh S Shah , In general it'd be nice to confirm the failed tests are unrelated. TestAclsEndToEnd failure seems to be caused by this change. As a side note, there's a tool dev-support/bin/smart-apply-patch that can conveniently apply the latest patch given the jira number, without the hassle to manually download. So please follow wiki 's patch naming convention so the branch-2.8 patch can be applied automatically (current branch-2.8 patch is recognized by the tool, so it tried to apply on trunk). NBD but would make reviewer's job easier.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        TestAclsEndToEnd failure seems to be caused by this change.

        This test failed in HDFS-11885 Precommit also.
        I have also created jira HDFS-11944 and will work on that soon.
        Last 2 patches were just log messages change and addressing checkstyle issues that's why I didn't run all the failed tests again.

        So please follow wiki's patch naming convention so the branch-2.8 patch can be applied automatically

        I did create the branch-2.8 patch. The last precommit build that ran was for branch-2.8 git revision – branch-2.8 / e47ec94.
        I don't fully understand your comment.

        Show
        shahrs87 Rushabh S Shah added a comment - TestAclsEndToEnd failure seems to be caused by this change. This test failed in HDFS-11885 Precommit also. I have also created jira HDFS-11944 and will work on that soon. Last 2 patches were just log messages change and addressing checkstyle issues that's why I didn't run all the failed tests again. So please follow wiki's patch naming convention so the branch-2.8 patch can be applied automatically I did create the branch-2.8 patch. The last precommit build that ran was for branch-2.8 git revision – branch-2.8 / e47ec94 . I don't fully understand your comment.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        In general it'd be nice to confirm the failed tests are unrelated.

        From this comment , I did verify all the failed test were unrelated.

        Show
        shahrs87 Rushabh S Shah added a comment - In general it'd be nice to confirm the failed tests are unrelated. From this comment , I did verify all the failed test were unrelated.
        Hide
        xiaochen Xiao Chen added a comment - - edited

        pre-commit

        I tried to ran TestAclsEndToEnd before and after applying this patch on branch-2.8. It was passing before, failing after. Could you try the same? Perhaps something odd happening on my local. (I didn't look inside except for running the tests)

        I don't fully understand your comment.

        Your patch is cool, just the tool was confused due to the naming.
        smart-apply-patch uses the wiki's naming convention to choose patch for trunk, branch-2, branch-2.8 etc. I was saying running dev-support/bin/smart-apply-patch HADOOP-14521 was trying to apply the branch-2.8 patch when running on trunk:

        $ ./dev-support/bin/smart-apply-patch HADOOP-14521
        Processing: HADOOP-14521
        HADOOP-14521 patch is being downloaded at Tue Jun 20 08:31:28 PDT 2017 from
        https://issues.apache.org/jira/secure/attachment/12873350/HDFS-11804-branch-2.8.patch -> Downloaded
        ERROR: Aborting! HADOOP-14521 cannot be verified.

        Maybe we can improve yetus on that since as you said, pre-commit ran fine. This is NBD for this jira as I said, since I can download and apply the patch manually.

        Show
        xiaochen Xiao Chen added a comment - - edited pre-commit I tried to ran TestAclsEndToEnd before and after applying this patch on branch-2.8. It was passing before, failing after. Could you try the same? Perhaps something odd happening on my local. (I didn't look inside except for running the tests) I don't fully understand your comment. Your patch is cool, just the tool was confused due to the naming. smart-apply-patch uses the wiki's naming convention to choose patch for trunk, branch-2, branch-2.8 etc. I was saying running dev-support/bin/smart-apply-patch HADOOP-14521 was trying to apply the branch-2.8 patch when running on trunk: $ ./dev-support/bin/smart-apply-patch HADOOP-14521 Processing: HADOOP-14521 HADOOP-14521 patch is being downloaded at Tue Jun 20 08:31:28 PDT 2017 from https://issues.apache.org/jira/secure/attachment/12873350/HDFS-11804-branch-2.8.patch -> Downloaded ERROR: Aborting! HADOOP-14521 cannot be verified. Maybe we can improve yetus on that since as you said, pre-commit ran fine. This is NBD for this jira as I said, since I can download and apply the patch manually.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        I tried to ran TestAclsEndToEnd before and after applying this patch on branch-2.8. It was passing before, failing after.

        HADOOP-14563 fixed the failing test issue.
        We deployed this change internally in our cluster and found out the client is not retrying on addDelegationTokens call.
        The reason is following piece of code.

        KMSClientProvider.java
        @Override
          public Token<?>[] addDelegationTokens(final String renewer,
              Credentials credentials) throws IOException {
          ....
              } catch (Exception e) {
                throw new IOException(e);  ---> Catching any Exception and throwing IOException.
              }
            }
            return tokens;
          }
        

        All the other calls checks whether the exception is IOException then rethrow IOException otherwise wrap the actual exception in IOException and rethrow that.
        Will put up a new patch including this fix.

        Show
        shahrs87 Rushabh S Shah added a comment - I tried to ran TestAclsEndToEnd before and after applying this patch on branch-2.8. It was passing before, failing after. HADOOP-14563 fixed the failing test issue. We deployed this change internally in our cluster and found out the client is not retrying on addDelegationTokens call. The reason is following piece of code. KMSClientProvider.java @Override public Token<?>[] addDelegationTokens( final String renewer, Credentials credentials) throws IOException { .... } catch (Exception e) { throw new IOException(e); ---> Catching any Exception and throwing IOException. } } return tokens; } All the other calls checks whether the exception is IOException then rethrow IOException otherwise wrap the actual exception in IOException and rethrow that. Will put up a new patch including this fix.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
              trunk Compile Tests
        0 mvndep 0m 14s Maven dependency ordering for branch
        +1 mvninstall 13m 11s trunk passed
        +1 compile 13m 24s trunk passed
        +1 checkstyle 1m 56s trunk passed
        +1 mvnsite 2m 28s trunk passed
        -1 findbugs 1m 47s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
        +1 javadoc 1m 39s trunk passed
              Patch Compile Tests
        0 mvndep 0m 16s Maven dependency ordering for patch
        +1 mvninstall 1m 32s the patch passed
        +1 compile 10m 10s the patch passed
        +1 javac 10m 10s the patch passed
        +1 checkstyle 1m 57s the patch passed
        +1 mvnsite 2m 30s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 3m 34s the patch passed
        +1 javadoc 1m 47s the patch passed
              Other Tests
        -1 unit 8m 6s hadoop-common in the patch failed.
        -1 unit 73m 44s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 41s The patch does not generate ASF License warnings.
        162m 22s



        Reason Tests
        Failed junit tests hadoop.security.TestKDiag
          hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
          hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12876487/HADOOP-14521-trunk-10.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux eb7f682feb85 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 09653ea
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 12s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.       trunk Compile Tests 0 mvndep 0m 14s Maven dependency ordering for branch +1 mvninstall 13m 11s trunk passed +1 compile 13m 24s trunk passed +1 checkstyle 1m 56s trunk passed +1 mvnsite 2m 28s trunk passed -1 findbugs 1m 47s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. +1 javadoc 1m 39s trunk passed       Patch Compile Tests 0 mvndep 0m 16s Maven dependency ordering for patch +1 mvninstall 1m 32s the patch passed +1 compile 10m 10s the patch passed +1 javac 10m 10s the patch passed +1 checkstyle 1m 57s the patch passed +1 mvnsite 2m 30s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 3m 34s the patch passed +1 javadoc 1m 47s the patch passed       Other Tests -1 unit 8m 6s hadoop-common in the patch failed. -1 unit 73m 44s hadoop-hdfs in the patch failed. +1 asflicense 0m 41s The patch does not generate ASF License warnings. 162m 22s Reason Tests Failed junit tests hadoop.security.TestKDiag   hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation   hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12876487/HADOOP-14521-trunk-10.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux eb7f682feb85 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 09653ea Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12755/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment -
        Running org.apache.hadoop.security.TestKDiag
        Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.084 sec - in org.apache.hadoop.security.TestKDiag
        Running org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
        Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.922 sec - in org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
        Running org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
        Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 81.238 sec - in org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
        

        The failed tests are not related to this patch. Passing locally.

        Findbugs warnings are not related to this patch also.

        Xiao Chen: Mind giving a final pass ?

        Show
        shahrs87 Rushabh S Shah added a comment - Running org.apache.hadoop.security.TestKDiag Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.084 sec - in org.apache.hadoop.security.TestKDiag Running org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.922 sec - in org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation Running org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 81.238 sec - in org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits The failed tests are not related to this patch. Passing locally. Findbugs warnings are not related to this patch also. Xiao Chen : Mind giving a final pass ?
        Hide
        xiaochen Xiao Chen added a comment -

        Sure, will take a look today.

        Show
        xiaochen Xiao Chen added a comment - Sure, will take a look today.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks for the patch Rushabh S Shah, +1 on patch 10. Will commit on Friday if no comments from other watchers.

        Show
        xiaochen Xiao Chen added a comment - Thanks for the patch Rushabh S Shah , +1 on patch 10. Will commit on Friday if no comments from other watchers.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        +1 on patch 10. Will commit on Friday if no comments from other watchers.

        Thanks a lot Xiao Chen for the review.
        Will update the branch-2 and branch-2.8 patch by Friday so that you can commit all at once.

        Show
        shahrs87 Rushabh S Shah added a comment - +1 on patch 10. Will commit on Friday if no comments from other watchers. Thanks a lot Xiao Chen for the review. Will update the branch-2 and branch-2.8 patch by Friday so that you can commit all at once.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Attaching branch-2.8 patch.
        trunk patch still applies to trunk.

        Show
        shahrs87 Rushabh S Shah added a comment - Attaching branch-2.8 patch. trunk patch still applies to trunk.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 docker 4m 19s Docker failed to build yetus/hadoop:311d924.



        Subsystem Report/Notes
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877319/HADOOP-14521-branch-2.8.2.patch
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12784/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 docker 4m 19s Docker failed to build yetus/hadoop:311d924. Subsystem Report/Notes JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877319/HADOOP-14521-branch-2.8.2.patch Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12784/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 8m 54s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
              branch-2.8 Compile Tests
        0 mvndep 0m 19s Maven dependency ordering for branch
        +1 mvninstall 9m 13s branch-2.8 passed
        +1 compile 6m 40s branch-2.8 passed with JDK v1.8.0_131
        +1 compile 7m 25s branch-2.8 passed with JDK v1.7.0_131
        +1 checkstyle 1m 9s branch-2.8 passed
        +1 mvnsite 1m 58s branch-2.8 passed
        +1 findbugs 3m 45s branch-2.8 passed
        +1 javadoc 1m 29s branch-2.8 passed with JDK v1.8.0_131
        +1 javadoc 1m 57s branch-2.8 passed with JDK v1.7.0_131
              Patch Compile Tests
        0 mvndep 0m 19s Maven dependency ordering for patch
        +1 mvninstall 1m 33s the patch passed
        +1 compile 6m 22s the patch passed with JDK v1.8.0_131
        +1 javac 6m 22s the patch passed
        +1 compile 7m 24s the patch passed with JDK v1.7.0_131
        +1 javac 7m 24s the patch passed
        +1 checkstyle 1m 10s the patch passed
        +1 mvnsite 2m 3s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 0s The patch has no ill-formed XML file.
        +1 findbugs 4m 25s the patch passed
        +1 javadoc 1m 31s the patch passed with JDK v1.8.0_131
        +1 javadoc 2m 2s the patch passed with JDK v1.7.0_131
              Other Tests
        +1 unit 8m 53s hadoop-common in the patch passed with JDK v1.7.0_131.
        +1 unit 48m 56s hadoop-hdfs in the patch passed with JDK v1.7.0_131.
        +1 asflicense 0m 31s The patch does not generate ASF License warnings.
        213m 14s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:d946387
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877332/HADOOP-14521-branch-2.8.002.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux b4ff7fde02ab 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2.8 / d6228fb
        Default Java 1.7.0_131
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131
        findbugs v3.0.0
        JDK v1.7.0_131 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12785/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12785/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 8m 54s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.       branch-2.8 Compile Tests 0 mvndep 0m 19s Maven dependency ordering for branch +1 mvninstall 9m 13s branch-2.8 passed +1 compile 6m 40s branch-2.8 passed with JDK v1.8.0_131 +1 compile 7m 25s branch-2.8 passed with JDK v1.7.0_131 +1 checkstyle 1m 9s branch-2.8 passed +1 mvnsite 1m 58s branch-2.8 passed +1 findbugs 3m 45s branch-2.8 passed +1 javadoc 1m 29s branch-2.8 passed with JDK v1.8.0_131 +1 javadoc 1m 57s branch-2.8 passed with JDK v1.7.0_131       Patch Compile Tests 0 mvndep 0m 19s Maven dependency ordering for patch +1 mvninstall 1m 33s the patch passed +1 compile 6m 22s the patch passed with JDK v1.8.0_131 +1 javac 6m 22s the patch passed +1 compile 7m 24s the patch passed with JDK v1.7.0_131 +1 javac 7m 24s the patch passed +1 checkstyle 1m 10s the patch passed +1 mvnsite 2m 3s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 0s The patch has no ill-formed XML file. +1 findbugs 4m 25s the patch passed +1 javadoc 1m 31s the patch passed with JDK v1.8.0_131 +1 javadoc 2m 2s the patch passed with JDK v1.7.0_131       Other Tests +1 unit 8m 53s hadoop-common in the patch passed with JDK v1.7.0_131. +1 unit 48m 56s hadoop-hdfs in the patch passed with JDK v1.7.0_131. +1 asflicense 0m 31s The patch does not generate ASF License warnings. 213m 14s Subsystem Report/Notes Docker Image:yetus/hadoop:d946387 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877332/HADOOP-14521-branch-2.8.002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux b4ff7fde02ab 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision branch-2.8 / d6228fb Default Java 1.7.0_131 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131 findbugs v3.0.0 JDK v1.7.0_131 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12785/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12785/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Xiao Chen: Pre-commit ran on branch-2.8 patch also.
        Whenever you get time, please commit the patch.

        Show
        shahrs87 Rushabh S Shah added a comment - Xiao Chen : Pre-commit ran on branch-2.8 patch also. Whenever you get time, please commit the patch.
        Hide
        xiaochen Xiao Chen added a comment -

        Committed this to trunk, branch-2 and branch-2.8.

        Thanks Rushabh S Shah for the contribution, and Daryn for the reviews!

        Show
        xiaochen Xiao Chen added a comment - Committed this to trunk, branch-2 and branch-2.8. Thanks Rushabh S Shah for the contribution, and Daryn for the reviews!
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12013 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12013/)
        HADOOP-14521. KMS client needs retry logic. Contributed by Rushabh S (xiao: rev 0a6d5c0cf1d963da9131aa12326fc576f0e92d2c)

        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
        • (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
        • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12013 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12013/ ) HADOOP-14521 . KMS client needs retry logic. Contributed by Rushabh S (xiao: rev 0a6d5c0cf1d963da9131aa12326fc576f0e92d2c) (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java
        Hide
        shahrs87 Rushabh S Shah added a comment -

        Thanks Xiao Chen for the reviews and commits.

        Show
        shahrs87 Rushabh S Shah added a comment - Thanks Xiao Chen for the reviews and commits.
        Hide
        xiaochen Xiao Chen added a comment -

        Unfortunately, this turns out still brought some incompatible behaviors to the KMS client, for some corner cases.

        Reason is:

        • After this, client uses RetryPolicies.failoverOnNetworkException with TryOnceThenFail.
        • Before this, whatever the exception is, client always tries on all servers.
        • While our initial judgement of no-retry for certain types of exceptions (e.g. AccessControlException etc.) makes sense, a lot of the wrapped IOExceptions are no longer retried. This includes the EOFE like HADOOP-14841, and the IOException wrapped GSSE met in HADOOP-14445.

        As a result, this fix made HADOOP-14841 and HADOOP-14445 from 'wrong-but-works' kind of issues, become 'wrong-and-breaks' kind of issues.

        Given the KMS has been mysterious and untamed at times, I'm not confident the above list is exhaustive.
        Suggest we:

        • mark this incompatible
        • either provide an addendum or do a follow-on jira, to keep existing behavior.

        Rushabh S Shah, thoughts?

        Show
        xiaochen Xiao Chen added a comment - Unfortunately, this turns out still brought some incompatible behaviors to the KMS client, for some corner cases. Reason is: After this, client uses RetryPolicies.failoverOnNetworkException with TryOnceThenFail . Before this, whatever the exception is, client always tries on all servers. While our initial judgement of no-retry for certain types of exceptions (e.g. AccessControlException etc.) makes sense, a lot of the wrapped IOExceptions are no longer retried. This includes the EOFE like HADOOP-14841 , and the IOException wrapped GSSE met in HADOOP-14445 . As a result, this fix made HADOOP-14841 and HADOOP-14445 from 'wrong-but-works' kind of issues, become 'wrong-and-breaks' kind of issues. Given the KMS has been mysterious and untamed at times, I'm not confident the above list is exhaustive. Suggest we: mark this incompatible either provide an addendum or do a follow-on jira, to keep existing behavior. Rushabh S Shah , thoughts?
        Hide
        xiaochen Xiao Chen added a comment -

        Marking this as incompatible - due to client behavior changes. See above comment for detail.

        Show
        xiaochen Xiao Chen added a comment - Marking this as incompatible - due to client behavior changes. See above comment for detail.
        Hide
        shahrs87 Rushabh S Shah added a comment -

        As a result, this fix made HADOOP-14841 and HADOOP-14445 from 'wrong-but-works' kind of issues, become 'wrong-and-breaks' kind of issues.

        HADOOP-14445: I don't have enough cycles to work on this. I need to address backwards incompatible concerns on that one. I will have a patch in few days.
        HADOOP-14841: In my opinion, we should to try to add more instrumentation code to find out root cause instead of just blindly retrying and ignoring the bug.

        either provide an addendum or do a follow-on jira, to keep existing behavior.

        The previous behavior was just masking the bugs on the server side.
        I don't see any pressing need to go back to old behavior.

        Show
        shahrs87 Rushabh S Shah added a comment - As a result, this fix made HADOOP-14841 and HADOOP-14445 from 'wrong-but-works' kind of issues, become 'wrong-and-breaks' kind of issues. HADOOP-14445 : I don't have enough cycles to work on this. I need to address backwards incompatible concerns on that one. I will have a patch in few days. HADOOP-14841 : In my opinion, we should to try to add more instrumentation code to find out root cause instead of just blindly retrying and ignoring the bug. either provide an addendum or do a follow-on jira, to keep existing behavior. The previous behavior was just masking the bugs on the server side. I don't see any pressing need to go back to old behavior.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks for the prompt response Rushabh S Shah.

        The previous behavior was just masking the bugs on the server side.

        True, and I agree the the server-side bugs should be fixed.

        However, as noted in the last comment, in practice the existing behavior allows a client request to succeed after retry. With this patch, clients will straightly fail on the first failure. This incompatible behavior is painful for the client, and is the main reason I raise the above.

        HADOOP-14445 and HADOOP-14841 are just examples for this kind of failures. Although they are nasty bugs, my biggest concern now is not specific to any of them. Rather, it's the behavior change that made a previously working client, doesn't work anymore. This sounds pretty pressing to me.

        I can think of a few ways to keep existing behavior, but with 3.0beta and 2.8.2 coming, I'm inclined to revert this for now, and re-commit after improvement. Andrew Wang / Junping Du FYI.

        Show
        xiaochen Xiao Chen added a comment - Thanks for the prompt response Rushabh S Shah . The previous behavior was just masking the bugs on the server side. True, and I agree the the server-side bugs should be fixed. However, as noted in the last comment, in practice the existing behavior allows a client request to succeed after retry. With this patch, clients will straightly fail on the first failure. This incompatible behavior is painful for the client, and is the main reason I raise the above. HADOOP-14445 and HADOOP-14841 are just examples for this kind of failures. Although they are nasty bugs, my biggest concern now is not specific to any of them. Rather, it's the behavior change that made a previously working client, doesn't work anymore. This sounds pretty pressing to me. I can think of a few ways to keep existing behavior, but with 3.0beta and 2.8.2 coming, I'm inclined to revert this for now, and re-commit after improvement. Andrew Wang / Junping Du FYI.
        Hide
        djp Junping Du added a comment -

        Thanks Xiao Chen for notification here. I prefer to revert this from 2.8.2 but keep it in branch-2.8 and branch-2 for improvements later.

        Show
        djp Junping Du added a comment - Thanks Xiao Chen for notification here. I prefer to revert this from 2.8.2 but keep it in branch-2.8 and branch-2 for improvements later.
        Hide
        djp Junping Du added a comment -

        I have revert the patch from branch-2.8.2. Mark fix version to 2.8.3 instead.

        Show
        djp Junping Du added a comment - I have revert the patch from branch-2.8.2. Mark fix version to 2.8.3 instead.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks a lot Junping Du for the quick actions!

        Show
        xiaochen Xiao Chen added a comment - Thanks a lot Junping Du for the quick actions!
        Hide
        andrew.wang Andrew Wang added a comment -

        I reverted this from branch-3.0 as well, leaving it in trunk.

        We also need to file blocker JIRAs targeted for 2.8.3 and 3.1.0 to make sure we get any required follow-on work done. I'd also be fine with reverting this everywhere for simplicity, and committing all at once when it's ready for production.

        Show
        andrew.wang Andrew Wang added a comment - I reverted this from branch-3.0 as well, leaving it in trunk. We also need to file blocker JIRAs targeted for 2.8.3 and 3.1.0 to make sure we get any required follow-on work done. I'd also be fine with reverting this everywhere for simplicity, and committing all at once when it's ready for production.
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks Andrew for reverting.

        I'll take the simple route to revert this everywhere, and re-commit after improvement. Rushabh please feel free to take a shot, I can work on the improvement patch next week if you're busy. Thanks again for the contribution so far.

        Show
        xiaochen Xiao Chen added a comment - Thanks Andrew for reverting. I'll take the simple route to revert this everywhere, and re-commit after improvement. Rushabh please feel free to take a shot, I can work on the improvement patch next week if you're busy. Thanks again for the contribution so far.
        Hide
        xiaochen Xiao Chen added a comment -

        Reverted from trunk, branch-2 and branch-2.8.

        Show
        xiaochen Xiao Chen added a comment - Reverted from trunk, branch-2 and branch-2.8.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12858 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12858/)
        Revert "HADOOP-14521. KMS client needs retry logic. Contributed by (xiao: rev fa6cc43edd3f6e886a40b90b062c9f16189c50d1)

        • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java
        • (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12858 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12858/ ) Revert " HADOOP-14521 . KMS client needs retry logic. Contributed by (xiao: rev fa6cc43edd3f6e886a40b90b062c9f16189c50d1) (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
        Hide
        shahrs87 Rushabh S Shah added a comment -

        I'll take the simple route to revert this everywhere, and re-commit after improvement.

        Note that there is nothing to change in this patch.
        The 2 jiras which are responsible for reverting this patch are:
        1. HADOOP-14445: This is because the initial design was flawed and conf based and we added another hack (to retry on all servers until it succeeded).
        2. HADOOP-14841: We don't understand the actual bug and wanting to add the same hack (retrying until it succeeds).
        Personally after we fix HADOOP-14445, I will suggest to put this patch in even if HADOOP-14841 is fixed or not.
        In our internal repository, we are going to commit this jira and I don't want to maintain this diff forever.
        I am sure we will uncover other bugs (because of flaws in original design) since we were retrying on all exceptions until it succeeds.

        Show
        shahrs87 Rushabh S Shah added a comment - I'll take the simple route to revert this everywhere, and re-commit after improvement. Note that there is nothing to change in this patch. The 2 jiras which are responsible for reverting this patch are: 1. HADOOP-14445 : This is because the initial design was flawed and conf based and we added another hack (to retry on all servers until it succeeded). 2. HADOOP-14841 : We don't understand the actual bug and wanting to add the same hack (retrying until it succeeds). Personally after we fix HADOOP-14445 , I will suggest to put this patch in even if HADOOP-14841 is fixed or not. In our internal repository, we are going to commit this jira and I don't want to maintain this diff forever. I am sure we will uncover other bugs (because of flaws in original design) since we were retrying on all exceptions until it succeeds.
        Hide
        xiaochen Xiao Chen added a comment -

        Sorry for my delayed response here, was on leave last week.

        Perhaps my previous comment was lost among the revert messages - to restate the TL;DR:
        This change will make the clients not retry on certain IOEs. That means if a client upgrades, what's working' for them before will no longer 'work'.
        In other words, clients who did not have to be aware of HADOOP-14445/HADOOP-14841/any undiscovered bugs in that pattern, will see them and fail.

        Any server bugs aside, I don't see a reason why we cannot keep the existing behavior, and only add more retries for the failoverOnNetworkException types of exceptions, which seems to be the main reason to have this jira. In other words, the 'change' I'm proposing to add on top of the reverted patch is to, s/TryOnceThenFail/Try(providers.length)ThenFail/g

        Rushabh S Shah are we on the same page now?

        Show
        xiaochen Xiao Chen added a comment - Sorry for my delayed response here, was on leave last week. Perhaps my previous comment was lost among the revert messages - to restate the TL;DR: This change will make the clients not retry on certain IOEs. That means if a client upgrades, what's working' for them before will no longer 'work'. In other words, clients who did not have to be aware of HADOOP-14445 / HADOOP-14841 /any undiscovered bugs in that pattern, will see them and fail. Any server bugs aside, I don't see a reason why we cannot keep the existing behavior, and only add more retries for the failoverOnNetworkException types of exceptions, which seems to be the main reason to have this jira. In other words, the 'change' I'm proposing to add on top of the reverted patch is to, s/TryOnceThenFail/Try(providers.length)ThenFail/g Rushabh S Shah are we on the same page now?
        Hide
        xiaochen Xiao Chen added a comment -

        Patch 11 to implement the idea in the comments above.

        Existing tests already verifies the behavior, so v11 only needed to change back the test modification v10 did on existing tests, and update the expected times invoked for the new tests.

        Rushabh S Shah, would you mind take a look? IMO this still benefits from your retry improvement, yet kept existing behavior.

        Show
        xiaochen Xiao Chen added a comment - Patch 11 to implement the idea in the comments above. Existing tests already verifies the behavior, so v11 only needed to change back the test modification v10 did on existing tests, and update the expected times invoked for the new tests. Rushabh S Shah , would you mind take a look? IMO this still benefits from your retry improvement, yet kept existing behavior.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 28s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
              trunk Compile Tests
        0 mvndep 0m 21s Maven dependency ordering for branch
        +1 mvninstall 16m 56s trunk passed
        +1 compile 15m 7s trunk passed
        +1 checkstyle 2m 33s trunk passed
        +1 mvnsite 2m 8s trunk passed
        +1 findbugs 3m 17s trunk passed
        +1 javadoc 1m 45s trunk passed
              Patch Compile Tests
        0 mvndep 0m 17s Maven dependency ordering for patch
        +1 mvninstall 1m 39s the patch passed
        +1 compile 11m 39s the patch passed
        +1 javac 11m 39s the patch passed
        +1 checkstyle 2m 6s the patch passed
        +1 mvnsite 2m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 2s The patch has no ill-formed XML file.
        +1 findbugs 3m 39s the patch passed
        +1 javadoc 1m 50s the patch passed
              Other Tests
        -1 unit 8m 23s hadoop-common in the patch failed.
        -1 unit 125m 0s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 51s The patch does not generate ASF License warnings.
        226m 38s



        Reason Tests
        Failed junit tests hadoop.ha.TestZKFailoverController
          hadoop.net.TestClusterTopology
          hadoop.hdfs.TestReadStripedFileWithMissingBlocks
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:71bbb86
        JIRA Issue HADOOP-14521
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12888393/HADOOP-14521.11.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
        uname Linux 93fad5fd66ef 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / bfd1a72
        Default Java 1.8.0_144
        findbugs v3.1.0-RC1
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 28s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.       trunk Compile Tests 0 mvndep 0m 21s Maven dependency ordering for branch +1 mvninstall 16m 56s trunk passed +1 compile 15m 7s trunk passed +1 checkstyle 2m 33s trunk passed +1 mvnsite 2m 8s trunk passed +1 findbugs 3m 17s trunk passed +1 javadoc 1m 45s trunk passed       Patch Compile Tests 0 mvndep 0m 17s Maven dependency ordering for patch +1 mvninstall 1m 39s the patch passed +1 compile 11m 39s the patch passed +1 javac 11m 39s the patch passed +1 checkstyle 2m 6s the patch passed +1 mvnsite 2m 14s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 2s The patch has no ill-formed XML file. +1 findbugs 3m 39s the patch passed +1 javadoc 1m 50s the patch passed       Other Tests -1 unit 8m 23s hadoop-common in the patch failed. -1 unit 125m 0s hadoop-hdfs in the patch failed. +1 asflicense 0m 51s The patch does not generate ASF License warnings. 226m 38s Reason Tests Failed junit tests hadoop.ha.TestZKFailoverController   hadoop.net.TestClusterTopology   hadoop.hdfs.TestReadStripedFileWithMissingBlocks   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure Subsystem Report/Notes Docker Image:yetus/hadoop:71bbb86 JIRA Issue HADOOP-14521 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12888393/HADOOP-14521.11.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 93fad5fd66ef 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / bfd1a72 Default Java 1.8.0_144 findbugs v3.1.0-RC1 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13350/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        xiaochen Xiao Chen added a comment -

        Hi Rushabh S Shah,

        I understand this was missed in my initial reviews when it got committed. My apologies for overlooking it, and any frustration it might have brought when reverting this.

        But we should really fix the incompatible behavior I mentioned above, which patch 11 would address. Could you take a look?

        Show
        xiaochen Xiao Chen added a comment - Hi Rushabh S Shah , I understand this was missed in my initial reviews when it got committed. My apologies for overlooking it, and any frustration it might have brought when reverting this. But we should really fix the incompatible behavior I mentioned above, which patch 11 would address. Could you take a look?
        Hide
        asuresh Arun Suresh added a comment -

        Is this still on target for 2.9.0 ? If not, can we we push this out to the next major release ?

        Show
        asuresh Arun Suresh added a comment - Is this still on target for 2.9.0 ? If not, can we we push this out to the next major release ?
        Hide
        xiaochen Xiao Chen added a comment -

        Thanks for the ping Arun.

        I'm hoping to get this in soon. Does any watcher have cycles to review? For convenience, below is the diff of LBKMSCP between patch 11 and the previously committed patch 10.

        > --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
        > +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
        109c109
        < @@ -80,24 +87,79 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers,
        ---
        > @@ -80,24 +87,82 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers,
        171c171,174
        < +        if (action.action == RetryAction.RetryDecision.FAIL) {
        ---
        > +        // make sure each provider is tried at least once, to keep behavior
        > +        // compatible with earlier versions of LBKMSCP
        > +        if (action.action == RetryAction.RetryDecision.FAIL
        > +            && numFailovers >= providers.length - 1) {
        193c196
        
        Show
        xiaochen Xiao Chen added a comment - Thanks for the ping Arun. I'm hoping to get this in soon. Does any watcher have cycles to review? For convenience, below is the diff of LBKMSCP between patch 11 and the previously committed patch 10. > --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java > +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java 109c109 < @@ -80,24 +87,79 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers, --- > @@ -80,24 +87,82 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers, 171c171,174 < + if (action.action == RetryAction.RetryDecision.FAIL) { --- > + // make sure each provider is tried at least once, to keep behavior > + // compatible with earlier versions of LBKMSCP > + if (action.action == RetryAction.RetryDecision.FAIL > + && numFailovers >= providers.length - 1) { 193c196
        Hide
        shahrs87 Rushabh S Shah added a comment -

        thanks Xiao Chen for the latest patch.
        +1 non-binding.

        Show
        shahrs87 Rushabh S Shah added a comment - thanks Xiao Chen for the latest patch. +1 non-binding.
        Hide
        andrew.wang Andrew Wang added a comment -

        +1 based on the diff, thanks Xiao and Rushabh!

        Show
        andrew.wang Andrew Wang added a comment - +1 based on the diff, thanks Xiao and Rushabh!
        Hide
        xiaochen Xiao Chen added a comment -

        Committed to trunk, branch-3.0, branch-2, branch-2.8. There was a trivial conflict backporting to branch-2.8 (log message), compiled before committing.

        Thanks Rushabh and Andrew! Resolving this jira and removed the incompatible flag.

        Show
        xiaochen Xiao Chen added a comment - Committed to trunk, branch-3.0, branch-2, branch-2.8. There was a trivial conflict backporting to branch-2.8 (log message), compiled before committing. Thanks Rushabh and Andrew! Resolving this jira and removed the incompatible flag.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13039 (See https://builds.apache.org/job/Hadoop-trunk-Commit/13039/)
        HADOOP-14521. KMS client needs retry logic. Contributed by Rushabh S (xiao: rev 25f31d9fc47d21ac2f3afd7042e2ced1b849da39)

        • (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
        • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13039 (See https://builds.apache.org/job/Hadoop-trunk-Commit/13039/ ) HADOOP-14521 . KMS client needs retry logic. Contributed by Rushabh S (xiao: rev 25f31d9fc47d21ac2f3afd7042e2ced1b849da39) (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java

          People

          • Assignee:
            shahrs87 Rushabh S Shah
            Reporter:
            shahrs87 Rushabh S Shah
          • Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development