Hadoop YARN · YARN-3021

YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: security
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed
    • Release Note:
      ResourceManager renews delegation tokens for applications. This behavior has been changed to renew tokens only if the token's renewer is a non-empty string. MapReduce jobs can instruct the ResourceManager to skip renewal of tokens obtained from certain hosts by specifying those hosts in the configuration property mapreduce.job.hdfs-servers.token-renewal.exclude=<host1>,<host2>,...,<hostN>.
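As a concrete illustration, the exclusion list from the release note could be set in a job's configuration as follows; the hostnames below are placeholders, not values from this issue:

```xml
<!-- mapred-site.xml or per-job configuration; hostnames are examples -->
<property>
  <name>mapreduce.job.hdfs-servers.token-renewal.exclude</name>
  <!-- NameNode hosts whose delegation tokens the RM should not renew -->
  <value>nn1.realm-b.example.com,nn2.realm-b.example.com</value>
</property>
```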

      Description

      Consider this scenario of 3 realms: A, B, and COMMON, where A trusts COMMON and B trusts COMMON (both trusts are one-way), and both A and B run HDFS + YARN clusters.

      Now if one logs in with a COMMON credential and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before adding it to a scheduler for automatic renewal). The call inevitably fails because realm B will not trust A's credentials (here, the RM's principal is the renewer).

      In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately.

      We should change the logic so that we attempt the renewal but tolerate its failure, merely skipping the scheduling of further renewals rather than bubbling an error back to the client and failing the app submission. This way the old behaviour is retained.
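The proposed behaviour can be sketched as follows. This is a minimal, hypothetical illustration, not the actual DelegationTokenRenewer code; the Token interface and method names below are stand-ins:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed lenient-renewal behaviour; names
// are illustrative, not the real DelegationTokenRenewer API.
public class LenientRenewal {
  public interface Token {
    long renew() throws Exception; // returns the new expiration time
  }

  // Attempt one renewal per token. Tokens that renew successfully are
  // kept for scheduled automatic renewal; tokens whose renewal fails
  // (e.g. issued by an untrusted realm) are skipped instead of failing
  // the whole application submission.
  public static List<Token> scheduleRenewable(List<Token> tokens) {
    List<Token> scheduled = new ArrayList<>();
    for (Token t : tokens) {
      try {
        t.renew();          // attempt the renewal once
        scheduled.add(t);   // success: keep renewing it automatically
      } catch (Exception e) {
        // failure: log and skip scheduling; do not bubble the error
        // back to the client
      }
    }
    return scheduled;
  }

  public static void main(String[] args) {
    Token local = () -> 42L;                                      // renewable
    Token crossRealm = () -> { throw new Exception("no trust"); };
    List<Token> tokens = new ArrayList<>();
    tokens.add(local);
    tokens.add(crossRealm);
    // Only the renewable token gets scheduled; submission still succeeds.
    System.out.println(scheduleRenewable(tokens).size()); // prints 1
  }
}
```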

      1. YARN-3021.001.patch
        13 kB
        Yongjun Zhang
      2. YARN-3021.002.patch
        17 kB
        Yongjun Zhang
      3. YARN-3021.003.patch
        20 kB
        Yongjun Zhang
      4. YARN-3021.004.patch
        9 kB
        Yongjun Zhang
      5. YARN-3021.005.patch
        13 kB
        Yongjun Zhang
      6. YARN-3021.006.patch
        11 kB
        Yongjun Zhang
      7. YARN-3021.007.patch
        11 kB
        Yongjun Zhang
      8. YARN-3021.007.patch
        11 kB
        Yongjun Zhang
      9. YARN-3021.007.patch
        11 kB
        Yongjun Zhang
      10. YARN-3021.patch
        4 kB
        Harsh J

        Issue Links

          Activity

          yzhangal Yongjun Zhang added a comment -

          Thanks also to Karthik Kambatla for the earlier discussions, and we worked out a release note which I just updated.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2116 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2116/)
          YARN-3021. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev bb6dde68f19be1885a9e7f7949316a03825b6f3e)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
          yzhangal Yongjun Zhang added a comment -

          Thanks again Jian He for the reviews/suggestions and committing!

          Thanks Harsh J for diagnosing and reporting the issue, Harsh, Vinod Kumar Vavilapalli, Anubhav Dhoot for the reviews and discussions!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #167 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/167/)
          YARN-3021. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev bb6dde68f19be1885a9e7f7949316a03825b6f3e)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #157 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/157/)
          YARN-3021. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev bb6dde68f19be1885a9e7f7949316a03825b6f3e)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2098 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2098/)
          YARN-3021. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev bb6dde68f19be1885a9e7f7949316a03825b6f3e)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #900 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/900/)
          YARN-3021. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev bb6dde68f19be1885a9e7f7949316a03825b6f3e)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #166 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/166/)
          YARN-3021. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev bb6dde68f19be1885a9e7f7949316a03825b6f3e)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7602 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7602/)
          YARN-3021. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev bb6dde68f19be1885a9e7f7949316a03825b6f3e)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          jianhe Jian He added a comment -

          Committed to trunk and branch-2, thanks Yongjun Zhang!
          Thanks to all the other reviewers: Harsh, Anubhav, Vinod!

          yzhangal Yongjun Zhang added a comment -

          Thanks a lot Jian He!

          jianhe Jian He added a comment -

          Yongjun Zhang, I think the failure is not related.
          The patch looks good, +1.
          I'll commit this today if there are no comments from others.

          yzhangal Yongjun Zhang added a comment -

          I don't see

          java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED>
          	at org.junit.Assert.fail(Assert.java:88)
          

          reported in YARN-2483 here, so the failure here may be for a different reason.

          The same patch finished successfully in a previous Jenkins run, which indicates some flakiness in the failed test. Will trigger another Jenkins run.

          yzhangal Yongjun Zhang added a comment -

          The test failure is likely YARN-2483.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12725911/YARN-3021.007.patch
          against trunk revision 1fa8075.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7358//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7358//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          Thanks for looking at it again, and sorry for the late response; I was out for some time myself too.

          It turned out that the same patch 007 applies for me with today's trunk, and I uploaded it again.

          jianhe Jian He added a comment -

          Yongjun Zhang, the patch looks good to me.
          Sorry, the patch went stale after YARN-3055. Mind updating, please?

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12723642/YARN-3021.007.patch
          against trunk revision 75c5454.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7237//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7237//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          Upload same patch again for another test.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12723545/YARN-3021.007.patch
          against trunk revision 3fb5abf.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader
          org.apache.hadoop.mapred.TestLineRecordReader
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
          org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
          org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector

          The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apTests
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7233//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7233//console

          This message is automatically generated.

          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          I uploaded rev 007 to address your latest comment. I agree that the token renewer won't be empty in that case, and if we need to modify the definition of skipTokenRenewal in the future, we can add back the check at that time.

          Would you please take a look?

          Thanks.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          Thanks for taking a further look. No worries about the delay; I guessed you were out.

          Regarding your comment, here is the relevant code:

           private void collectDelegationTokens(final String renewer,
                                                 final Credentials credentials,
                                                 final List<Token<?>> tokens)
                                                     throws IOException {
              final String serviceName = getCanonicalServiceName();
              // Collect token of the this filesystem and then of its embedded children
              if (serviceName != null) { // fs has token, grab it
                final Text service = new Text(serviceName);
              Token<?> token = credentials.getToken(service); // <============
                if (token == null) {
                  token = getDelegationToken(renewer);
                  if (token != null) {
                    tokens.add(token);
                    credentials.addToken(service, token);
                  }
                }
              }
          

          The line highlighted with "<===" shows that a token may be retrieved from the token map. In that case, are we sure it always has a non-empty renewer? In addition, we might change the skipTokenRenewal method in the future to do some additional checking. It seems safer to have this check. Do you think we should just keep it?

          Thanks.

          jianhe Jian He added a comment -

          Yongjun Zhang, I was out the last couple of weeks; sorry for the late response.
          The patch looks good overall; one comment:
          the skipTokenRenewal(token) check in requestNewHdfsDelegationToken may not be needed, because it explicitly passes UserGroupInformation.getLoginUser().getUserName() as the renewer, so the token's renewer won't be empty.
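(For illustration, here is a self-contained sketch of the rule under discussion: the RM renews a token only when its renewer is a non-empty string. The class and method shapes below are hypothetical, not the actual DelegationTokenRenewer code.)

```java
// Hypothetical, self-contained model of the "skip renewal when the
// renewer is empty" rule; not the real Hadoop DelegationTokenRenewer.
public class SkipRenewalSketch {
    // Returns true when the RM should NOT schedule renewal for a token,
    // i.e. its renewer field is null or the empty string.
    static boolean skipTokenRenewal(String renewer) {
        return renewer == null || renewer.isEmpty();
    }

    public static void main(String[] args) {
        // A token issued with a real renewer principal is renewed.
        System.out.println(skipTokenRenewal("rm/host@REALM.A")); // false
        // A token from an excluded remote cluster carries an empty renewer.
        System.out.println(skipTokenRenewal(""));                // true
    }
}
```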

          yzhangal Yongjun Zhang added a comment -

          Hi Vinod Kumar Vavilapalli,

          It seems Jian He is not available. Would you please help with a review?

          Thanks a lot.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He, would you please take a look at the latest patch? thanks a lot.

          yzhangal Yongjun Zhang added a comment -

          I restarted my VM (the same one on which I reported the stack trace in my last update) and reran the failed test TestCapacitySchedulerNodeLabelUpdate, and it passed. There is some flakiness in this test, but it is not related to this jira.

          yzhangal Yongjun Zhang added a comment -

          I tried to run the failed test locally and ran into a different issue, which I believe is unrelated to this jira:

          -------------------------------------------------------
           T E S T S
          -------------------------------------------------------
          Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
          Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 31.099 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
          testNodeUpdate(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate)  Time elapsed: 30.992 sec  <<< ERROR!
          java.lang.Exception: test timed out after 30000 milliseconds
          	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
          	at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
          	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
          	at java.net.InetAddress.getAllByName0(InetAddress.java:1239)
          	at java.net.InetAddress.getAllByName(InetAddress.java:1155)
          	at java.net.InetAddress.getAllByName(InetAddress.java:1091)
          	at java.net.InetAddress.getByName(InetAddress.java:1041)
          	at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
          	at org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147)
          	at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367)
          	at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178)
          	at org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136)
          	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206)
          	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:134)
          

          While I take time to look into it, I will create a separate jira for it.

          Hi Jian He, aside from this test failure, the run seems clean. Would you please take a look at the latest patch? Thanks a lot.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706795/YARN-3021.006.patch
          against trunk revision 2c238ae.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7085//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7085//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          The test failure seems to be unrelated; uploading the same patch 06 to trigger another Jenkins run.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706735/YARN-3021.006.patch
          against trunk revision 972f1f1.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7081//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7081//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          Thanks a lot for the clarification. I made a new rev (06) to address your latest comment and also tested it against real clusters. Would you please take a further look? Thanks.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705841/YARN-3021.005.patch
          against trunk revision 4e886eb.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7039//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7039//console

          This message is automatically generated.

          jianhe Jian He added a comment -

          Thanks for updating!
          Sorry for being unclear. For the change in DelegationTokenRenewer, I think we only need to check whether the renewer is empty in the for loop at line 410: if the token's renewer is empty, do not add the token to the tokenList.
          I missed one thing: DistCp may also work with WebHDFS, in which case the token kind won't be "HDFS_DELEGATION_TOKEN", so the HDFS_DELEGATION_TOKEN check should not be there. Please disregard that comment.
          Basically, the for loop may look something like the code below.

              for (Token<?> token : tokens) {
                if (token.isManaged()) {
                  if (token.getKind().equals(HDFS_DELEGATION_KIND)) {
                    LOG.info(applicationId + " found existing hdfs token " + token);
                    hasHdfsToken = true;
                  }
          
                  Text renewer = ((Token<AbstractDelegationTokenIdentifier>) token).
                      decodeIdentifier().getRenewer();
                  if (renewer != null && renewer.toString().equals("")) {
                    continue;
                  }
          
                  DelegationTokenToRenew dttr = allTokens.get(token);
                  if (dttr != null) {
                    // If any of the jobs sharing the same token doesn't want to cancel
                    // the token, we should not cancel the token.
                    if (!evt.shouldCancelAtEnd) {
                      dttr.shouldCancelAtEnd = evt.shouldCancelAtEnd;
                      LOG.info("Set shouldCancelAtEnd=" + shouldCancelAtEnd
                          + " for token " + dttr.token);
                    }
                    continue;
                  }
          
                  tokenList.add(new DelegationTokenToRenew(applicationId, token,
                    getConfig(), now, shouldCancelAtEnd, evt.getUser()));
                }
              }
          
          yzhangal Yongjun Zhang added a comment -

          Hi Jian,

          Thanks a lot for your detailed review and comments! I'm attaching rev5 to address all of them.

          • Replaced new Text("HDFS_DELEGATION_TOKEN") with the predefined constant.
          • About "does conf.getStrings strip off the leading or ending empty strings? if not, we may strip those off.": I followed JobSubmitter#populateTokenCache. I think it is reasonable to expect the user not to put leading or trailing empty strings.
          • Removed NON_RENEWER, but I still use an empty renewer string instead of null.
          • I did test rev 4 earlier, and I also tested rev 5 against real clusters.
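(On the getStrings question above, a hedged, self-contained sketch of the kind of defensive parsing being discussed; this is not Hadoop's actual Configuration.getStrings implementation, and the names are illustrative.)

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: parse a comma-separated exclude list, dropping
// empty entries produced by leading, trailing, or doubled commas.
public class ExcludeListSketch {
    static List<String> parseExcludeHosts(String value) {
        List<String> hosts = new ArrayList<>();
        if (value == null) {
            return hosts;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                hosts.add(trimmed); // keep only non-empty host names
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(parseExcludeHosts(",host1, host2,")); // [host1, host2]
    }
}
```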

          Thanks for taking a look at the new rev.

          jianhe Jian He added a comment - - edited

          Thanks Yongjun, some comments on the patch:

          • DelegationTokenRenewer: the skipTokenRenewal check should be done under the existing code if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))), as below. I think this check alone is enough; we don't need checks in other places.
                  if (token.isManaged()) {
                    if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) {
                      LOG.info(applicationId + " found existing hdfs token " + token);
                      hasHdfsToken = true;
                      Text renewer = ((Token<AbstractDelegationTokenIdentifier>) token).
                          decodeIdentifier().getRenewer();
                      if ((renewer != null && renewer.toString()
                          .equals(Token.NON_RENEWER))) {
                        continue;
                      }
                    }
            
          • does conf.getStrings strip off the leading or ending empty strings? if not, we may strip those off.
              String[] nns =
                  conf.getStrings(MRJobConfig.JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE);
            
          • Given that this is a work-around fix, maybe don't add NON_RENEWER publicly in common? Just check for null?
          • Did you test the patch on a real cluster?
          yzhangal Yongjun Zhang added a comment -

          Running the failed test TestRM locally succeeds.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705619/YARN-3021.004.patch
          against trunk revision 1ccbc29.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRM

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7024//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7024//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          BTW, there is some odd problem with setting the renewer string to null, so I chose to set it to an empty string instead.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He, thanks for your help yesterday.

          Jian and all, I uploaded patch rev 004; would you please take a look? Thanks.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian, looking closer at what you suggested, I think I was wrong about setting the TokenRenewer object in the token to null; instead, we want to set the renewer string to null. Thanks.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          Thanks for your comment. I'm actually aligned with what you suggested.

          The problem I was trying to point out is that we will have to change the behavior of the code I pasted above to deal with a null renewer. E.g., the getRenewer() method returns a non-null renewer in the current implementation (if none is set or found, TRIVIAL_RENEWER is returned); after making the suggested change for this jira, the renewer can be null, so we should return null from getRenewer().

          My question was, I'm not sure about the impact of this behavior change. I expect some applications count on the current behavior.

          More comments?

          Thanks.

          jianhe Jian He added a comment -

          Yongjun, thanks for taking this up! I just assigned the jira to you.

          jianhe Jian He added a comment -

          Hi Yongjun Zhang, I think what we should do is, in TokenCache#obtainTokensForNamenodesInternal, change the delegTokenRenewer to be null for NameNodes listed in "mapreduce.job.hdfs-servers.token-renewal.exclude".
          And on the server side, decode the identifier field in the Token and check whether the renewer in AbstractDelegationTokenIdentifier is null or not. Does that make sense?
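
          The client-side half of this suggestion can be sketched as follows. This is a minimal illustration, not the real change (which would live in TokenCache#obtainTokensForNamenodesInternal and use Hadoop's Configuration and Text types); the class and method names here are hypothetical, and a plain host string stands in for the token service:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: choose a token renewer per NameNode, suppressing renewal
// (empty renewer) for hosts listed in the exclude config.
public class RenewerSelection {
    // Parse the comma-separated value of
    // "mapreduce.job.hdfs-servers.token-renewal.exclude".
    static Set<String> parseExcludes(String confValue) {
        Set<String> hosts = new HashSet<>();
        if (confValue != null && !confValue.trim().isEmpty()) {
            for (String h : confValue.split(",")) {
                hosts.add(h.trim());
            }
        }
        return hosts;
    }

    // Excluded hosts get an empty renewer so the RM skips renewal;
    // all other tokens keep the RM principal as renewer.
    static String renewerFor(String nnHost, Set<String> excludes, String rmPrincipal) {
        return excludes.contains(nnHost) ? "" : rmPrincipal;
    }

    public static void main(String[] args) {
        Set<String> ex = parseExcludes("nn-b.example.com, nn-c.example.com");
        System.out.println(renewerFor("nn-a.example.com", ex, "rm/host@REALM-A")); // renewed by RM
        System.out.println(renewerFor("nn-b.example.com", ex, "rm/host@REALM-A")); // renewal skipped
    }
}
```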

          yzhangal Yongjun Zhang added a comment -

          Could we possibly introduce a dummy renewer class and make its methods no-ops, instead of setting the renewer to null?

          I wonder whether this would be a compatible change ...

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He and all,

          I resumed working on this and found an obstacle here.

          See org.apache.hadoop.security.token.Token:

           private synchronized TokenRenewer getRenewer() throws IOException {
              if (renewer != null) {
                return renewer;
              }
              renewer = TRIVIAL_RENEWER;
              synchronized (renewers) {
                for (TokenRenewer canidate : renewers) {
                  if (canidate.handleKind(this.kind)) {
                    renewer = canidate;
                    return renewer;
                  }
                }
              }
              LOG.warn("No TokenRenewer defined for token kind " + this.kind);
              return renewer;
            }
          
           public boolean isManaged() throws IOException {
              return getRenewer().isManaged(this);
            }
          
            public long renew(Configuration conf
                              ) throws IOException, InterruptedException {
              return getRenewer().renew(this, conf);
            }
            
            public void cancel(Configuration conf
                               ) throws IOException, InterruptedException {
              getRenewer().cancel(this, conf);
            }
          
          

          We can see that getRenewer() does more work than simply returning the renewer, and a non-null renewer is currently guaranteed to be returned. The other methods (listed above, called on the server side) count on this behavior.
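
          The fallback behavior in the quoted snippet reduces to the following toy model (the real Hadoop Token and TokenRenewer classes are stubbed out here; this is only an illustration of why callers such as isManaged() and renew() never see a null renewer):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of Token#getRenewer()'s fallback: when no registered
// renewer handles the token's kind, TRIVIAL_RENEWER is returned instead
// of null, so null can never propagate to callers.
public class RenewerFallback {
    interface TokenRenewer {
        boolean handleKind(String kind);
    }

    static final TokenRenewer TRIVIAL_RENEWER = kind -> false;
    static final List<TokenRenewer> renewers = new ArrayList<>();

    static TokenRenewer getRenewer(String kind) {
        for (TokenRenewer candidate : renewers) {
            if (candidate.handleKind(kind)) {
                return candidate;
            }
        }
        return TRIVIAL_RENEWER; // never null: this is the obstacle discussed
    }

    public static void main(String[] args) {
        // No renewer registered for this kind, yet the result is non-null.
        System.out.println(getRenewer("UNKNOWN_KIND") == TRIVIAL_RENEWER);
    }
}
```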

          If we set the renewer to null on the client side and expect the server to pick it up, we need to do either:

          1. change the behaviour of getRenewer() to return whatever renewer is set by the client;
          2. or change the token's kind to make getRenewer() return null, which would be really hacky.

          Making this kind of change seems to have a wider impact than expected, and things will likely be broken by it.

          Any thoughts?

          Thanks a lot.

          yzhangal Yongjun Zhang added a comment -

          Hi Vinod Kumar Vavilapalli,

          Thanks for the comments. We do have consensus about the approach too; I have been caught up with other critical work. Will try to get to this ASAP. Thanks.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Hi Vinod Kumar Vavilapalli and Harsh J, comments on this approach that Jian described above?

          Caught up with the discussion. The latest proposal seems like a reasonable approach without adding too much throw-away functionality in YARN. +1 for the approach, let's get this done.

          yzhangal Yongjun Zhang added a comment -

          Gotcha, thanks Jian.

          jianhe Jian He added a comment -

          it seems that "mapreduce.job.hdfs-servers.token-renewal.exclude" is still going to be a user-facing API only used for short-term

          Yes, it is an MR-land config. The difference is that YARN won't need to expose an API used only by MR. After all, we are now providing a temporary solution for MR itself, right?

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          My only concern is not to add a user-facing API only used for short-term.

          I thought about it a bit more; it seems that "mapreduce.job.hdfs-servers.token-renewal.exclude" is still going to be a user-facing API used only for the short term, because when we introduce an external renewer, the tokens will need to be assigned to that renewer; after all, we want the tokens to be renewed. Right? Or are there use cases where we really don't want to renew?

          That said, I think the solution would solve our current problem.

          Thanks.

          yzhangal Yongjun Zhang added a comment -

          Thanks a lot Jian,

          Good suggestion on #1, and I agree that "in reality I don't foresee big breakage". I further discussed with Anubhav Dhoot, and he agrees with this approach too.

          Hi Vinod Kumar Vavilapalli and Harsh J, comments on this approach that Jian described above?

          If there is no objection, I will try to work out a revised patch asap.

          Thanks.

          jianhe Jian He added a comment -

          Thanks Yongjun!
          How about this:
          1. Introduce a new MR config "mapreduce.job.hdfs-servers.token-renewal.exclude" which contains a list of servers to be excluded from token renewal. This is similar to the existing config "mapreduce.job.hdfs-servers" used to get the tokens. MR sets the renewer to null for tokens retrieved from these servers.
          2. Change the RM to skip renewing a token if its renewer is null. The only catch is that a null renewer previously caused the application to fail at submission, but now it will pass and fail later. I know this is incompatible in some sense, but in reality I don't foresee big breakage. I would prefer not adding an extra config.

          yzhangal Yongjun Zhang added a comment -

          Many thanks Jian.

          Change MR client to set null renewer for the token coming from a different cluster

          In the special case that we are dealing with in this jira, cluster A and cluster B don't trust each other. However, in other scenarios, two clusters may trust each other, so we can't always set a null renewer based on which cluster the token is from.
          Maybe we can combine our approaches: set a null renewer for the external cluster only when
          -Dmapreduce.job.delegation.tokenrenewer.for.external.cluster=null is specified for a job?

          Actually, YARN can also provide a constant string say "SKIP_RENEW_TOKEN", MR uses this string as the renewer for tokens it doesn't want to renew. RM detects if the renewer equals the constant string and skip renew if it is.

          Maybe we can use the string "null" for SKIP_RENEW_TOKEN? We need to document whatever string we choose as a special string so applications don't use it for tokens that need to be renewed.

          There is still a chance of changing existing applications' behavior for those that happen to set the renewer to our special string. So what about still introducing yarn.resourcemanager.validate.tokenrenewer as described in my last comment (enable renewer validation only when the config is true)?

          Thanks.

          jianhe Jian He added a comment -

          RM should check if the renewer is null

          Actually, YARN can also provide a constant string, say "SKIP_RENEW_TOKEN"; MR uses this string as the renewer for tokens it doesn't want renewed. The RM checks whether the renewer equals the constant string and skips renewal if it does.

          jianhe Jian He added a comment -

          it'll skip all tokens the application provides, even though some tokens could continue being renewed.

          For example, the timeline delegation token should continue being renewed irrespective of the hdfs token.

          jianhe Jian He added a comment -

          have MR client specify the token renewer it needs to use (instead of your step 1), such as passing -Dmapreduce.job.delegation.tokenrenewer=null

          Are you suggesting a client option to override the renewer for all tokens? The renewer is on a per-token basis; we should not override the renewer for the regular token (the token issued by its own cluster). What I was suggesting is that we should override the renewer to be null only for tokens retrieved from a cluster different from the one where the client lives.

          #2 would let certain jobs continue to run even if they would have failed token renewal without this short term solution.

          Actually, instead of checking whether the renewer is the RM itself, the RM should check whether the renewer is null. If the renewer is null, the RM skips the renewal; otherwise the RM can continue renewing the token. This way a wrong token renewer will also fail the application. The only thing is that a null renewer would previously cause the application to fail earlier, but now it will pass, which I think is fine?

          Also, one problem with the current per-app-basis API is that it skips all tokens the application provides, even though some tokens could continue being renewed.

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          I agree with you about the longer-term solution. However, with the short-term solution you suggested, #2 would let certain jobs continue to run even though they would have failed token renewal without it.

          Another alternative is,

          1. have the MR client specify the token renewer it needs to use (instead of your step 1), e.g., by passing -Dmapreduce.job.delegation.tokenrenewer=null
          2. have the client code update the token renewer if specified
          3. have the RM renew only its own tokens; however, make this configurable via a new server-side config property, such as yarn.resourcemanager.validate.tokenrenewer. This property defaults to false to retain the old behavior (meaning the renewer is not validated). We may consider changing the default to true in the future.

          This alternative avoids the new temporary API, but it involves setting a server-side config property that impacts all clients.
          (My earlier proposed solution has the advantage of minimal impact, of course at the cost of introducing the temporary new API.)

          Thoughts?

          Thanks.

          jianhe Jian He added a comment -

          Consider it's also matching our future extension of introducing external renewer,

          If this API is needed in the future, we can definitely do as the current patch does. My only concern is not to add a user-facing API that is only used short-term. What I have in mind is that, in the long term, we can:
          1. Change the MR client to set the correct renewer for a given token, pointing either to a central renewal service or to the RM itself. Today JobClient blindly sets the renewer of all tokens from the local RM config, which is wrong in the first place.
          2. Have the RM check whether the token renewer is itself; renew if it is, skip renewing otherwise.

          Thinking more, how about this approach:
          1. Change the MR client to set a null renewer for tokens coming from a different cluster (meaning there is no renewer for the token, which is true in the real scenario). This is more or less equivalent to explicitly adding a flag to inform the RM whether to renew, as the current patch does.
          2. Have the RM renew only its own tokens.
          Thoughts?

          yzhangal Yongjun Zhang added a comment -

          Hi Jian He,

          Thanks a lot for your comments.

          I discussed with Anubhav Dhoot, below is what we thought:

          1.

          we may choose to add a server-side config to not let application fail if renewal fails

          The change is to let the RM ignore token renewal failures; this means all applications will be impacted, since it's a server-side config. What Harsh did in the initial patch was to ignore renewal failures in a hardcoded way. What my earlier patch does is skip renewing instead of ignoring token renewal failures.

          We think skipping renewal seems better than ignoring renewal failures, because we are also talking about adding an external renewer in the future, and the two changes will be compatible. Say there is a renewal failure with the external renewer; that is something we don't want to ignore.

          2. API change in the current patch. It's an optional parameter, so it's a compatible change. Considering it also matches our future extension of introducing an external renewer, it seems OK to have the API change.

          Comments?

          Many thanks.

          jianhe Jian He added a comment -

          Overall I think "automatic token renewal" has always been an "auxiliary service" provided by YARN's RM.

          I think this raises the point that the DelegationTokenRenewal is just an auxiliary service, not a fundamental service required by YARN. The RM today happens to be the renewer; in the long-term solution, we can point the renewer to a real centralized renewal service to support such cross-platform trust setups. Instead of explicitly adding a user-facing API and deprecating it in the future, we may choose to add a server-side config to not let the application fail if renewal fails. Thoughts?

          yzhangal Yongjun Zhang added a comment -

          Hi Vinod Kumar Vavilapalli,

          Thanks a lot for your comment!

          The question is whether we continue supporting this implicit aux feature or drop it. And given my earlier point that RM cannot know either ways, this implicit feature was always broken.

          Agree. What about using the patch of this jira to disable/enable this implicit feature (as it currently does), and creating a new jira to address the broken implicit feature when enabled?

          Thanks.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Seems regardless of this jira, we could do a renewer address match as a validation step. Right?

          +1. Not as a validation, but to see if this RM should attempt renewal or not.

          In this case, it actually looks like the renewer would be cluster A's YARN, based on TokenCache#obtainTokensForNamenodesInternal and Master.getMasterPrincipal.

          So it looks like even if we check, the renewer would match in this case. Please correct me if I'm wrong.

          To make it work, we will still have to change the applications (MR etc.). App changes are needed irrespective of the approach.

          I'd be willing to accept that approach, but for one small worry: Any app sending in a token with a bad renewer set could get through with such a change, whereas previously it'd be rejected outright. Not that it'd be harmful (as it is ignored), but it could still be seen as a behaviour change, no?

          This is what you originally wanted ["In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately."]
          I think the problem is that the RM doesn't have enough knowledge to know what is a valid third-party renewer (one that is not this RM itself) and what is an invalid renewer. Even the app cannot really be sure.

          Overall I think "automatic token renewal" has always been an "auxiliary service" provided by YARN's RM. If you want to make use of that service as an application, you need to get the token with the right token-service ('me') and pass it to 'me' to renew it correctly. If either of those conditions is not met, I'll not give you that service.

          Implicitly we also had "automatic token validation" as an auxiliary feature. But given the history I know, this was never our intention. The question is whether we continue supporting this implicit aux feature or drop it. And given my earlier point that the RM cannot know either way, this implicit feature was always broken. I'm wary of adding this new API (I know I started with that proposal).

          yzhangal Yongjun Zhang added a comment -

          Hi Vinod Kumar Vavilapalli and Jian He,

          Would you please comment on Harsh J's comment above?

          Thanks a lot.

          yzhangal Yongjun Zhang added a comment -

          Thanks Harsh J, I agree with your comment. Look forward to hearing other folks' viewpoints.

          qwertymaniac Harsh J added a comment -

          Thanks again Vinod Kumar Vavilapalli and Yongjun Zhang,

          bq. RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match its own address. This way, we don't need an explicit API in the submission context.

          I think this will work, and is a preferable solution to me. What do others think?

          I'd be willing to accept that approach, but for one small worry: Any app sending in a token with a bad renewer set could get through with such a change, whereas previously it'd be rejected outright. Not that it'd be harmful (as it is ignored), but it could still be seen as a behaviour change, no?

          The current patch, OTOH, is explicit in demanding a config/flag to be set for direct awareness of such a thing. That sounds cleaner to me.

          yzhangal Yongjun Zhang added a comment -

          Hi Vinod Kumar Vavilapalli,

          Thanks for taking a further look and confirming that the uploaded patch would work.

          About your non-code comments, I assume you meant to "inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match its own address". I have two thoughts:

          • Seems regardless of this jira, we could do a renewer address match as a validation step. Right?
          • In this case, actually looks like the renewer would be cluster A's yarn, based on TokenCache@obtainTokensForNamenodesInternal and Master.getMasterPrincipal.
          public static String getMasterUserName(Configuration conf) {
            String framework = conf.get(MRConfig.FRAMEWORK_NAME, MRConfig.YARN_FRAMEWORK_NAME);
            if (framework.equals(MRConfig.CLASSIC_FRAMEWORK_NAME)) {
              return conf.get(MRConfig.MASTER_USER_NAME);
            } else {
              return conf.get(YarnConfiguration.RM_PRINCIPAL);
            }
          }

          So it looks like even if we check, the renewer would match in this case. Please correct me if I'm wrong.

          Thanks.
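
[Editor's note] The renewer-match idea discussed back and forth in this thread can be sketched roughly as follows. This is an illustration only, not code from the patch; the class and method names, the exclusion-list parameter (modeled on the mapreduce.job.hdfs-servers.token-renewal.exclude property from the release note), and the example principals are all hypothetical:

```java
import java.util.Set;

public class RenewerCheckSketch {

  /**
   * Hypothetical helper: the RM would attempt renewal only when the
   * token names a non-empty renewer that matches the RM's own principal,
   * and the token's source host is not on the job's exclusion list.
   */
  public static boolean shouldAttemptRenewal(String tokenRenewer,
                                             String rmPrincipal,
                                             String tokenServiceHost,
                                             Set<String> excludedHosts) {
    if (tokenRenewer == null || tokenRenewer.isEmpty()) {
      return false; // empty renewer: skip renewal entirely
    }
    if (excludedHosts.contains(tokenServiceHost)) {
      return false; // host explicitly excluded by the job's configuration
    }
    // Only renew tokens that actually name this RM as the renewer.
    return tokenRenewer.equals(rmPrincipal);
  }
}
```

As the discussion below notes, in the cross-realm DistCp case the renewer recorded in the token may still match the local RM's principal, which is why the exclusion list ended up being part of the eventual fix.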

          vinodkv Vinod Kumar Vavilapalli added a comment -

          RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match its own address. This way, we don't need an explicit API in the submission context.

          Seems regardless of this jira, we could do the above change, right? Any catch?

          I think this will work, and is a preferable solution to me. What do others think? /cc Harsh J/Jian He

          In our simple tests the app did run through successfully with such an approach, but there were multiple factors we did not test for (app recovery, task failures, etc., which could be impacted). Would it be better if we added in a morphed DelegationTokenRenewer (which does a NOP as part of the actual renewal logic), instead of skipping adding in the renewer completely?

          Never mind. I looked at the patch again, it would work, barring my other non-code comments.

          qwertymaniac Harsh J added a comment -

          Vinod Kumar Vavilapalli,

          Many thanks for the response here!

          Though the patch unblocks the jobs in the short term, it seems like long term this is still bad.

          I agree that it does not resolve the problem. The goal we're seeking is also short-term: bringing back, in MR2, a behaviour that was allowed on MR1 - even though both end up facing the same issue.

          The longer-term approach sounds like the optimal thing to do for a proper resolution, but given that some users are blocked by this behaviour change, I'd like to know if there would be any objections to adding the current approach as an interim fix (the doc for the property does/will claim it disables several necessary features of the job), and filing subsequent JIRAs for implementing the standalone renewer?

          Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely stuck.

          In our simple tests the app did run through successfully with such an approach, but there were multiple factors we did not test for (app recovery, task failures, etc., which could be impacted). Would it be better if we added in a morphed DelegationTokenRenewer (which does a NOP as part of the actual renewal logic), instead of skipping adding in the renewer completely?

          yzhangal Yongjun Zhang added a comment -

          Hi Vinod Kumar Vavilapalli and Jian He,

          Thank you so much for review and commenting!

          I will try to respond to part of your comments here and keep looking into the rest.

          RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match its own address. This way, we don't need an explicit API in the submission context.

          Seems regardless of this jira, we could do the above change, right? Any catch?

          Apologies for going back and forth on this one.

          I appreciate the insight you provided, and we are trying to figure out the best solution together. All the points you provided are reasonable, so absolutely no need for apologies here.

          Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely stuck.

          I did test in a secure env and it worked. Would you please elaborate?

          I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right?

          I think that's the case. The problem is that there is no trust between A and B. So "common" should be the one to renew the token.

          Thanks.

          jianhe Jian He added a comment -

          the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config)

          If this is the case, the assumption here is problematic: why would I request a token from B but let untrusted third party A renew my token in the first place?

          jianhe Jian He added a comment -

          Explicitly have an external renewer system that has the right permissions to renew these tokens.

          I think this is the correct long-term solution. RM today happens to be the renewer. But we need a central renewer component so that we can do cross-cluster renewals.

          RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match its own address

          I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12696924/YARN-3021.003.patch
          against trunk revision 45ea53f.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.conf.TestJobConf

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6533//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6533//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6533//console

          This message is automatically generated.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Though the patch unblocks the jobs in the short term, it seems like long term this is still bad. Applications that want to run for longer than 7 days in such setups will just fail without any other way.

          Maybe the solution is the following:

          • Explicitly have an external renewer system that has the right permissions to renew these tokens. Working with such an external renewer system needs support in frameworks, e.g. in MapReduce, a renewal server list similar to mapreduce.job.hdfs-servers.
          • RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match its own address. This way, we don't need an explicit API in the submission context.

          Apologies for going back and forth on this one. Does that work? /cc Jian He, Karthik Kambatla.

          Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely stuck.

          yzhangal Yongjun Zhang added a comment -

          Uploaded rev 003 to address Zhihai's comments. Thanks Zhihai.

          yzhangal Yongjun Zhang added a comment -

          Thanks for the in-person discussion and comment zhihai xu. I will do the change in next rev.

          Hi Vinod Kumar Vavilapalli, your help is greatly appreciated here; would you please take a look at the patch? Thanks.

          zxu zhihai xu added a comment -

          There is another place, in RMAppImpl#transition, that calls DelegationTokenRenewer#addApplicationSync

          IMO, we should do the same change in RMAppImpl#RMAppRecoveredTransition to skip renewing the Token during recovery.

          yzhangal Yongjun Zhang added a comment -

          I submitted the same patch 002 twice (the only diff between them is a comment in an xml file); the two tests that failed at the 1st submission are the same as for rev 001, which I described at
          https://issues.apache.org/jira/browse/YARN-3021?focusedCommentId=14290493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14290493

          Rerunning the other failed tests of rev 002 locally is successful.

          Thanks.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12695793/YARN-3021.002.patch
          against trunk revision ffc75d6.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.conf.TestJobConf
          org.apache.hadoop.mapreduce.TestLargeSort
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
          org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication
          org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization

          The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6480//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6480//artifact/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6480//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6480//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12695788/YARN-3021.002.patch
          against trunk revision ffc75d6.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.mapreduce.TestLargeSort
          org.apache.hadoop.conf.TestJobConf

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6479//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6479//artifact/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6479//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6479//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          Hi Harsh J,

          Thanks a lot for your review and feedback! I just uploaded rev 002 to address them.

          Right now my fix is to change RMAppManager#submitApplication (which calls DelegationTokenRenewer#addApplicationAsync) to check the config property setting, because I saw that it's the method that gets called when distcp submits a job.

          There is another place, in RMAppImpl#transition, that calls DelegationTokenRenewer#addApplicationSync, which I'm not familiar with.

          I wonder if you could comment on that?

          Thanks.

          qwertymaniac Harsh J added a comment -

          Overall the patch looks fine to me, but please do hold up for Vinod Kumar Vavilapalli or another YARN active committer to take a look.

          Could you conceive a test case for this as well, to catch regressions in behaviour in future? For example it could be done by adding an invalid token with the app, but with this option turned on. With the option turned off, such a thing will always fail and the app gets rejected, but with the fix in, proper behaviour is that it will pass through the submit procedure at least. Check out the test case modified in the earlier patch for a reusable reference.

          Also, could you document the added MR config in mapred-default.xml, describing its use and also marking it as advanced, as it disables some features of a regular resilient application such as token reuse and renewals.
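
          For illustration only, a mapred-default.xml entry along the lines Harsh requests could look like the sketch below. The description wording here is hypothetical (not committed text), and the property name reflects the rev-002 patch under review at this point in the thread:

          ```xml
          <!-- Illustrative sketch only: the description wording is hypothetical,
               and the property name is the one from the rev-002 patch. -->
          <property>
            <name>mapreduce.job.skip.rm.token.renewal</name>
            <value>false</value>
            <description>
              Advanced setting. If true, the ResourceManager will not validate or
              renew this job's delegation tokens at submission time. This disables
              token reuse and automatic renewal, so tokens are usable only for
              their remaining lifetime. Intended for cross-realm trust setups
              where the RM cannot renew tokens issued by an untrusting realm.
            </description>
          </property>
          ```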

          yzhangal Yongjun Zhang added a comment -

          Hi Vinod Kumar Vavilapalli,

          Thanks for your earlier comments and suggestions. Would you please help take a look at the patch? Thanks a lot!

          yzhangal Yongjun Zhang added a comment -

          Hi Robert Kanter, the reason I moved context.setResource(resource) is because it seems to be misplaced: all other parameters appear to be set in the order they are passed to the method, except this one. Thanks.

          rkanter Robert Kanter added a comment -

          OK, then I think it should probably be fine, though I'd like to let some others take a look.

          Also, is there a reason why the patch moves context.setResource(resource)?

          yzhangal Yongjun Zhang added a comment -

          Hi Robert Kanter and Anubhav Dhoot,

          Thanks for your comments. Yes, we are talking about the HDFS delegation token; this patch provides an option to turn off the initial validation inside the RM, and the token verification will happen inside HDFS when the distcp job runs.

          adhoot Anubhav Dhoot added a comment -

          We are talking about turning off renewal of tokens and some initialization validation checks done inside the RM on behalf of the user. This should not turn off token verification inside HDFS; that should still happen.

          rkanter Robert Kanter added a comment -

          I don't see any security holes. This token is only for the application's own use. The validation and renewal that you are turning off via the new parameter should not impact security of YARN or other applications.

          I'm not sure that's entirely correct. We're talking about the HDFS delegation token, right? Might it be possible to circumvent token expiration times by telling YARN not to renew the token? I'm not sure when the expiration check is done, so I could be wrong here.

          yzhangal Yongjun Zhang added a comment -

          I reran the failed tests locally.

          The following tests

          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
          org.apache.hadoop.mapred.TestJobConf

          passed.

          The following test failed:

          org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem
          TestJobConf.testNegativeValueForTaskVmem:111 expected:<1024> but was:<-1>

          and it was reported as MAPREDUCE-6223.

          TestLargeSort failed in
          https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12694177/YARN-3021.001.patch
          against trunk revision 24aa462.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.mapreduce.TestLargeSort
          org.apache.hadoop.conf.TestJobConf
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
          org.apache.hadoop.yarn.server.resourcemanager.TestRM

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6398//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6398//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6398//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          Hi Harsh J, Vinod Kumar Vavilapalli, Anubhav Dhoot,

          Thanks for the earlier discussion and input. I uploaded patch rev 001, which introduces a new job configuration property, mapreduce.job.skip.rm.token.renewal; passing "-Dmapreduce.job.skip.rm.token.renewal=true" to distcp (to instruct the ResourceManager to skip token renewal) solves the problem.
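
          For illustration, such a DistCp invocation might look like the following sketch (hostnames and paths are made up, and the property name is the rev-001 one under review here):

          ```shell
          # Illustrative only: hosts/paths are hypothetical.
          # Run on realm A's YARN cluster, reading from realm B's HDFS;
          # the -D option tells the RM not to validate/renew the B token.
          hadoop distcp \
            -Dmapreduce.job.skip.rm.token.renewal=true \
            hdfs://nn-b.example.com:8020/data/src \
            hdfs://nn-a.example.com:8020/data/dst
          ```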

          I did test in the environment Harsh helped set up; thanks, Harsh.

          Would you please help take a look at the patch?

          Thanks.

          yzhangal Yongjun Zhang added a comment -

          Hi Anubhav Dhoot, Thanks for clarifying. That sounds good.

          adhoot Anubhav Dhoot added a comment -

          I don't see any security holes. This token is only for the application's own use. The validation and renewal that you are turning off via the new parameter should not impact security of YARN or other applications.

          yzhangal Yongjun Zhang added a comment -

          Thanks a lot Anubhav Dhoot and Vinod Kumar Vavilapalli!

          Having said that, if RM cannot validate the token as valid why would the job itself work? Would not the containers themselves face the same issue using the tokens?

          Based on the scenario Harsh J described in the jira description, the token is from realm B, which cannot be validated by realm A's YARN since A and B don't trust each other. However, the token can be used by the distcp job running in realm A to access B's files (B is the distcp source).

          For the scenario described in the jira, I think we are aligned that it would be better to add an additional parameter at the time of job submission, so the client can tell YARN explicitly that YARN should not try to renew the token.

          What I wanted to clarify with my earlier question was: if we support this scenario by having YARN not validate the token, do we open any security hole? Anyone could submit a job and ask YARN not to renew the token, right?

          Thanks.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          So my question is, should YARN support running a job without validating a token? (Though MR1 "supports" this because the token renewal is asynchronous, as Harsh pointed out.)

          As I proposed at the beginning of this JIRA, if we want to do this, it has to be explicit in the app-submission. There are several corner cases that leak through, though. Even if the RM successfully validates a token at app-submission, the remote service may be down at the time of renewal. So I think there are two APIs:

          1. Should RM validate the token by renewing at the time of app-submission?
          2. Should RM fail the app if renewal fails any time during the app-execution?

          Ostensibly, (1) and (2) are the same because, for now, check-token == renew-token successfully. But the user can always explicitly ask us not to renew tokens at app-submission for whatever reason.

          adhoot Anubhav Dhoot added a comment -

          Looking at the patch itself, we seem to suppress an error that would earlier be visible to the user. That's going to make it harder to detect genuine failures. MR1 seems to be worse than YARN in this aspect, and we don't need to make it match that behavior. If we really need to skip validation, as you said, adding a feature into YARN where an application could opt in would be better.

          Having said that, if RM cannot validate the token as valid why would the job itself work? Would not the containers themselves face the same issue using the tokens?

          yzhangal Yongjun Zhang added a comment -

          If we can have a way to tell YARN not to renew a token, and the client either just uses the lifetime of the token or renews it itself before the job finishes, that will solve the problem. One possible solution is to add a new parameter when submitting a job, or to add a field in the job context.

          However, even assuming we have an API to tell YARN not to renew the token, YARN might still want to check whether a token submitted by the client is valid. Currently the validity checking is done in the renewal code described in
          https://issues.apache.org/jira/browse/YARN-3021?focusedCommentId=14271642&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271642
          which Harsh's patch tries to fix. In other words, the renewal code serves both to validate and to renew the token. There seems to be no dedicated API to check validity.

          In the scenario Harsh described, even if we had a dedicated API to check token validity, the check would fail too, because realms A and B don't have trust, even though the token is valid from realm B's point of view.

          So my question is, should YARN support running a job without validating a token? (Though MR1 "supports" this because the token renewal is asynchronous, as Harsh pointed out.)

          Thanks.
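
          For context, the fix that was eventually committed (per the release note) sidesteps the need for a dedicated validity API by keying off the token's renewer field: the RM renews a delegation token only when its renewer is a non-empty string. A minimal self-contained sketch of that predicate — the class and method names below are illustrative, not the actual DelegationTokenRenewer code:

          ```java
          // Illustrative sketch only: shouldRenew is a made-up name, but the rule
          // mirrors the committed behavior described in the release note: the RM
          // renews a delegation token only when its renewer is a non-empty string.
          public class RenewerCheck {
              /** True when the RM should attempt renewal of a token with this renewer. */
              static boolean shouldRenew(String renewer) {
                  return renewer != null && !renewer.trim().isEmpty();
              }

              public static void main(String[] args) {
                  // A token whose renewer is set (e.g. to the RM principal) gets renewed...
                  System.out.println(shouldRenew("yarn/rm@A.EXAMPLE.COM")); // true
                  // ...while an empty renewer tells the RM to skip renewal entirely.
                  System.out.println(shouldRenew(""));                      // false
              }
          }
          ```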

          yzhangal Yongjun Zhang added a comment -

          Hi Harsh J,

          I looked at your patch. I think if we can be sure that renewing an invalid token will throw InvalidToken instead of IOException, then the fix would be good. However, it seems to me that there is a chance of getting an IOException. Comments?

          If we can have a way to tell YARN not to renew a token, and the client either just uses the lifetime of the token or renews it itself before the job finishes, that will solve the problem. One possible solution is to add a new parameter when submitting a job, or to add a field in the job context.

          Should we create a YARN jira to request this support? Seems to be a useful infrastructure to have...

          Thanks.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12691342/YARN-3021.patch
          against trunk revision ae91b13.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6293//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6293//console

          This message is automatically generated.

          yzhangal Yongjun Zhang added a comment -

          Thanks Harsh J for reporting the issue and providing patch.

          A bit more info:

          In DelegationTokenRenewer.java

             if (!tokenList.isEmpty()) {
                // Renewing token and adding it to timer calls are separated purposefully
                // If user provides incorrect token then it should not be added for
                // renewal.
                for (DelegationTokenToRenew dtr : tokenList) {
                  renewToken(dtr);
                }
                for (DelegationTokenToRenew dtr : tokenList) {
                  addTokenToList(dtr);
                  setTimerForTokenRenewal(dtr);
                  if (LOG.isDebugEnabled()) {
                    LOG.debug("Registering token for renewal for:" + " service = "
                        + dtr.token.getService() + " for appId = " + dtr.applicationId);
                  }
                }
              }
          

          It seems that YARN tries to fix an issue in MR1: instead of not checking the token when scheduling the job (as MR1 does), YARN validates it up front, per the comment in the code above.

          I posted a question to MAPREDUCE-2764 to see if there is a way to control YARN so that YARN doesn't renew the token and lets the client do the renewal. Not sure whether it's feasible yet.

          See

          https://issues.apache.org/jira/browse/MAPREDUCE-2764?focusedCommentId=14270283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14270283

          and the one after.

          Thanks.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          I'd rather make it an explicit signal from the user - whether or not to fail the app immediately - a (mysterious) app failure at the end of the day is equally bad behavior, IMO.

          qwertymaniac Harsh J added a comment -

          A patch that illustrates the change.


            People

            • Assignee:
              yzhangal Yongjun Zhang
              Reporter:
              qwertymaniac Harsh J
            • Votes:
              0 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development