Hadoop YARN / YARN-4041

Slow delegation token renewal can severely prolong RM recovery

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When the RM does a work-preserving restart, it synchronously tries to renew delegation tokens for every active application. If a token server is down or running slowly, and many of the active apps were using tokens from that server, the RM can take much longer to process the restart.
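
The stall described above comes from doing each renewal inline in the recovery loop. A minimal sketch of the alternative, handing renewals to a thread pool so that the loop itself does not wait on the token server, might look like the following. All names here (`AsyncRenewalSketch`, `serverResponds`, `renewerPool`) are hypothetical and the token server is simulated with a latch; this is not the code from the attached patches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; these are not YARN classes.
class AsyncRenewalSketch {

    /**
     * Returns {appsRecovered, renewalsDoneWhenRecoveryLoopFinished,
     * renewalsDoneAfterTheServerResponds}.
     */
    static int[] demo() {
        List<String> appIds = Arrays.asList("app_1", "app_2", "app_3");
        // Simulates a slow token server: renewals block until it "responds".
        CountDownLatch serverResponds = new CountDownLatch(1);
        Set<String> renewedApps = ConcurrentHashMap.newKeySet();
        ExecutorService renewerPool = Executors.newFixedThreadPool(2);
        try {
            int recovered = 0;
            for (String appId : appIds) {
                // Only enqueue the renewal; do not wait for it here.
                renewerPool.execute(() -> {
                    try {
                        serverResponds.await();
                        renewedApps.add(appId);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                recovered++;
            }
            // The loop is done although no renewal has completed yet.
            int renewedDuringRecovery = renewedApps.size();

            serverResponds.countDown();   // the server finally answers
            renewerPool.shutdown();       // queued renewals still run
            renewerPool.awaitTermination(10, TimeUnit.SECONDS);
            return new int[] {recovered, renewedDuringRecovery, renewedApps.size()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            renewerPool.shutdownNow();
        }
    }
}
```

In this sketch the loop finishes for all three apps while the simulated server is still unresponsive, and the renewals complete later on the pool threads.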

      1. 0001-YARN-4041.patch
        7 kB
        Sunil G
      2. 0002-YARN-4041.patch
        7 kB
        Sunil G
      3. 0003-YARN-4041.patch
        8 kB
        Sunil G
      4. 0004-YARN-4041.patch
        8 kB
        Sunil G
      5. 0005-YARN-4041.patch
        9 kB
        Sunil G

        Issue Links

          Activity

          sunilg Sunil G added a comment -

          Thank you Jason Lowe for the review and commit. Thank you Jian He for the review!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2467 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2467/)
          YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #531 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/531/)
          YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2521 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2521/)
          YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1312 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1312/)
          YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #589 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/589/)
          YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #576 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/576/)
          YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8697 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8697/)
          YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          jlowe Jason Lowe added a comment -

          Thanks to Sunil for the contribution and to Jian for additional review! I committed this to trunk, branch-2, and branch-2.7.

          jlowe Jason Lowe added a comment -

          +1 for the latest patch. I will commit this later today if there are no objections.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 21m 9s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 9m 56s There were no new javac warning messages.
          +1 javadoc 12m 11s There were no new javadoc warning messages.
          +1 release audit 0m 26s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 1m 3s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 42s mvn install still works.
          +1 eclipse:eclipse 0m 38s The patch built with eclipse:eclipse.
          +1 findbugs 1m 30s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 66m 37s Tests passed in hadoop-yarn-server-resourcemanager.
              115m 17s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12768222/0005-YARN-4041.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 124a412
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/9540/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9540/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9540/console

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Yes Jason Lowe, we can compare the tokens themselves and wait in smaller units. In the new patch, I kept a total wait time of 1 second but in 10 ms units. Locally, the test run seems faster. Uploading a new patch.

          jlowe Jason Lowe added a comment -

          The problem with checking the renewer event queue directly is that the queue can be empty but processing has not yet completed. Threads can still be executing the last events, having just pulled them from the queue to leave it empty. Therefore the test is still racy. A simpler approach would be to just keep checking if the tokens are equal. If they aren't then sleep for a bit then try again, up to some limit of time to keep checking.
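
The race described above can be reproduced in isolation. The sketch below uses a plain `ThreadPoolExecutor` as a stand-in for the renewer's thread pool (an assumption; the class name `EmptyQueueRaceSketch` and the latches are hypothetical) and samples the queue while the single worker is still in the middle of its event.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; this is not DelegationTokenRenewer code.
class EmptyQueueRaceSketch {

    /** Returns {queueWasEmpty, eventWasFinished}, sampled while the worker is mid-event. */
    static boolean[] sampleWhileProcessing() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        // Start the worker up front so the event really passes through the queue.
        pool.prestartAllCoreThreads();

        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch mayFinish = new CountDownLatch(1);
        AtomicBoolean finished = new AtomicBoolean(false);
        try {
            pool.execute(() -> {
                started.countDown();          // the event has left the queue
                try {
                    mayFinish.await();        // ...but its work is still in progress
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                finished.set(true);
            });

            started.await();
            boolean queueEmpty = pool.getQueue().isEmpty();
            boolean done = finished.get();
            mayFinish.countDown();
            return new boolean[] {queueEmpty, done};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Here the queue reports empty while the event has not finished, which is why an empty queue alone is not a reliable completion signal for a test.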

          By the way, we should not sleep an entire second between checks. All those seconds of waiting add up across all of our tests doing it, making it take significantly longer to run them overall. We should be sleeping for only 10ms or so. That's still a large amount of time for modern processors to get work done while we're waiting, and we still won't be spinning non-stop on the CPU.
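
The check-then-sleep approach suggested above can be sketched as a small polling helper. The helper name `waitFor`, its parameters, and the class `PollingWaitSketch` are hypothetical; the actual test in the patch may be structured differently.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the helper used in TestRMRestart.
class PollingWaitSketch {

    /**
     * Re-checks the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                // A short sleep keeps the test responsive without spinning on the CPU.
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could then call something like `waitFor(() -> expectedTokens.equals(actualTokens()), 10, 1000)`, where `expectedTokens` and `actualTokens()` are placeholders for whatever the test compares. The test returns as soon as the tokens match and only pays the full second when something is actually wrong.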

          sunilg Sunil G added a comment -

          The test case failures are not related. They pass locally.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 19m 20s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 9m 1s There were no new javac warning messages.
          +1 javadoc 11m 39s There were no new javadoc warning messages.
          +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 59s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 41s mvn install still works.
          +1 eclipse:eclipse 0m 38s The patch built with eclipse:eclipse.
          +1 findbugs 1m 39s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 yarn tests 58m 36s Tests failed in hadoop-yarn-server-resourcemanager.
              104m 5s  



          Reason Tests
          Failed unit tests hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12767832/0004-YARN-4041.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / e27c2ae
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/9511/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9511/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9511/console

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Hi Jason Lowe and Jian He,
          Please find an updated patch. I corrected the test case to wait for the renewerService thread pool executor to process the raised renew event. Kindly share your thoughts.

          sunilg Sunil G added a comment -

          Thank you Jason Lowe for the comments.
          Yes, I had also planned to move this logic inside the waitForTokensToBeRenewed method, but the test failure occurred in only one test case, so I placed it outside. I think it's better to place that logic inside waitForTokensToBeRenewed itself, as suggested. I also did not much like the sleep-based solution, but DelegationTokenRenewer does not explicitly expose a better checkpoint. I also thought of checking the event queue size there. Now I think we can verify whether any token renewal event has been raised, which could be a good checkpoint. I will attach a patch for this along with the fix for the other comment.

          jlowe Jason Lowe added a comment -

          Thanks for updating the patch, Sunil!

          When fixing the test, why wasn't the fix in waitForTokensToBeRenewed? Also I'm not thrilled with the idea of sleeping for 1 second per application and hoping it's enough time. And we're getting out early when there is at least one token in the token set, but there's a race where we may have taken a snapshot before all the tokens are there. Can't we key off the app start events coming out of the token renewal process to know when we're done? Would be nice if there were a more reliable way so we can avoid arbitrary sleeps (which tend to slow down unit tests overall) and racy tests.
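
One way to key off completion events instead of sleeping is to have the test count events down on a latch and block until all of them have arrived. The sketch below assumes a test can observe one event per application; the class `EventDrivenWaitSketch`, the method `awaitEvents`, and the use of a bare thread pool to fire the events are all hypothetical stand-ins, not the RM's dispatcher or the patch's test code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; the event source is simulated.
class EventDrivenWaitSketch {

    /**
     * Waits until expectedEvents events have been observed, or until timeoutMs
     * elapses. firedEvents controls how many events the simulated renewer emits.
     */
    static boolean awaitEvents(int expectedEvents, int firedEvents, long timeoutMs) {
        CountDownLatch allSeen = new CountDownLatch(expectedEvents);
        ExecutorService renewerPool = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < firedEvents; i++) {
                // Stand-in for "renewal finished, app start event delivered".
                renewerPool.execute(allSeen::countDown);
            }
            // Returns as soon as the last event arrives; no fixed sleep.
            return allSeen.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            renewerPool.shutdown();
        }
    }
}
```

Because the latch is sized to the number of applications, the wait cannot end early on a partial snapshot, and the timeout bounds the cost only when an event is actually missing.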

          Also noticed on a subsequent look that AbsrtactDelegationTokenRenewerAppEvent should be AbstractDelegationTokenRenewerAppEvent.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 17m 26s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 58s There were no new javac warning messages.
          +1 javadoc 10m 32s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 51s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 30s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 1m 28s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 57m 50s Tests passed in hadoop-yarn-server-resourcemanager.
              98m 36s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12767585/0003-YARN-4041.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 9cb5d35
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/9490/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9490/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9490/console

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Updating patch after test case fix.

          sunilg Sunil G added a comment -

          The test case failures look related; I will debug and check.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 28s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 8m 32s There were no new javac warning messages.
          +1 javadoc 10m 39s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 52s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 30s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 1m 30s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 yarn tests 58m 8s Tests failed in hadoop-yarn-server-resourcemanager.
              100m 40s  



          Reason Tests
          Failed unit tests hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12767527/0002-YARN-4041.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 7e2837f
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/9488/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9488/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9488/console

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Thank you Jian He and Jason Lowe.
          As per the latest Jenkins run, the patch needs a rebase. Attaching a rebased version. Tests are passing locally.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 17m 25s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 8m 2s There were no new javac warning messages.
          +1 javadoc 11m 7s There were no new javadoc warning messages.
          +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 52s The applied patch generated 1 new checkstyle issues (total was 150, now 149).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 37s mvn install still works.
          +1 eclipse:eclipse 0m 37s The patch built with eclipse:eclipse.
          +1 findbugs 1m 37s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 yarn tests 57m 51s Tests failed in hadoop-yarn-server-resourcemanager.
              99m 38s  



          Reason Tests
          Failed unit tests hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
          Timed out tests org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
            org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751320/0001-YARN-4041.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 6144e01
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/9485/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/9485/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9485/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9485/console

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          Sorry for the delay. Looks good to me as well, kicking Jenkins to comment on the patch.

          jianhe Jian He added a comment -

          Looks good to me overall. I will hold off on committing in case there are any comments from others.

          sunilg Sunil G added a comment -

          Hi Bob,
          I have shared a patch which performs delegation token renewal asynchronously during recovery. I will rebase it against trunk now. In the meantime, Jason Lowe, Rohith Sharma K S and Karthik Kambatla, could you please take a look at this patch?

          Jobo Bob.zhao added a comment -

          Hi Sunil G, any update or ideas on this issue?

          sunilg Sunil G added a comment -

          Uploading an initial, work-in-progress version of the patch in which token renewal is made asynchronous. It uses DelegationTokenRenewerRunnable to achieve this.

          jlowe Jason Lowe added a comment -

          IIRR, synchronous recovery was to fail-fast if recovery doesn't work. With the proposed change, what happens when the recovery fails?

          Arguably the same thing that happens when the RM goes to renew tokens on a live application and fails without a restart. IIRC this is not fatal to either the RM or the application when it occurs today. In general I think we should make restarting as orthogonal as possible to token renewals, and ideally an RM restart should not cause an out-of-band token renewal storm.
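
To illustrate the point that a failed renewal need not be fatal, here is a minimal Java sketch in which each renewal runs on a worker thread and a failure is only recorded. `NonFatalRenewalSketch`, `Renewer`, and `renewAll` are hypothetical names for illustration, not part of the RM code, and the final wait exists only so the result can be inspected; a real recovery path would not block on it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual DelegationTokenRenewer implementation.
class NonFatalRenewalSketch {
  // Stand-in for a call to a token server that may be slow or down.
  interface Renewer {
    void renew(String appId) throws Exception;
  }

  // Submits one renewal task per app and returns the apps whose renewal failed.
  static List<String> renewAll(List<String> appIds, Renewer renewer) {
    List<String> failed = new CopyOnWriteArrayList<>();
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (String appId : appIds) {
      pool.execute(() -> {
        try {
          renewer.renew(appId);
        } catch (Exception e) {
          // Record the failure and carry on; other apps are unaffected.
          failed.add(appId);
        }
      });
    }
    pool.shutdown();
    try {
      // Wait only so this sketch can report a result.
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return failed;
  }
}
```

With this shape, one unreachable token server affects only the applications that depend on it.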

          kasha Karthik Kambatla added a comment -

          IIRR, synchronous recovery was to fail-fast if recovery doesn't work. With the proposed change, what happens when the recovery fails?

          rohithsharma Rohith Sharma K S added a comment -

          One correction to my previous comment: it is NOT 8 minutes, it is 8-10 seconds. So 8 seconds * 60 apps = 480 seconds, i.e. 8 minutes.
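
The arithmetic can be checked with a small Java sketch, assuming renewals run strictly one after another and each takes the same time. `RecoveryTimeEstimate` and `sequentialSeconds` are hypothetical names used only for illustration.

```java
// Hypothetical helper: estimates how long synchronous renewal blocks recovery.
class RecoveryTimeEstimate {
  // Total blocking time when each app's renewal runs one after another.
  static long sequentialSeconds(long perAppSeconds, int activeApps) {
    return perAppSeconds * activeApps;
  }
}
```

For 60 apps at 8 seconds each, `RecoveryTimeEstimate.sequentialSeconds(8, 60)` gives 480 seconds (8 minutes); at 10 seconds each it gives 600 seconds (10 minutes).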

          rohithsharma Rohith Sharma K S added a comment -

          Recently we faced a similar issue in a test cluster where around 60 apps were running. On RM switch, each application took around 8 minutes to renew its delegation token, which is 8 min * 60 apps = 480 minutes for recovery. YARN-3639 is the issue raised for the same.

          sunilg Sunil G added a comment -

          I would like to take this. Jason Lowe, could I take this over?

          jianhe Jian He added a comment -

          YARN-2010 was done to actually ignore the failure on token renewal for recovery. Now I agree that we do not even need to do the renewal.

          jianhe Jian He added a comment -

          I don't think the RM should blindly renew all tokens for apps that are already active and running on the cluster when it restarts.

          I agree with this. We do not need to renew tokens for apps on recovery.

          jlowe Jason Lowe added a comment -

          Maybe. The synchronous recovery was added as part of YARN-2010, and I don't recall from that JIRA why it was crucial for the token renewal process to be performed synchronously during recovery. Jian He or Karthik Kambatla, do you see any issues with making the delegation token renewal asynchronous for active applications during RM recovery?

          sunilg Sunil G added a comment -

          Could we use an asynchronous approach here and use DelegationTokenRenewerRunnable to renew tokens if needed? A new event type could be added in this class, as below:

            enum DelegationTokenRenewerEventType {
              VERIFY_AND_START_APPLICATION,
          +    RECOVER_APPLICATION,
              FINISH_APPLICATION
            }
          

          We could then handle this recover event in DelegationTokenRenewer to decide whether to renew the token. Would that be fine?
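
As a rough illustration of how such an event could be handled off the recovery path, here is a minimal Java sketch. Only the three event names come from the snippet above; `SketchEventType`, `SketchRenewerDispatcher`, `handle`, and `process` are hypothetical, and the returned strings merely stand in for the real work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the actual DelegationTokenRenewer code.
enum SketchEventType {
  VERIFY_AND_START_APPLICATION,
  RECOVER_APPLICATION,
  FINISH_APPLICATION
}

class SketchRenewerDispatcher {
  // Events run on a worker thread so the caller never blocks on a token server.
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  // Queues the event and returns immediately with a handle to the result.
  Future<String> handle(SketchEventType type, String appId) {
    return pool.submit(() -> process(type, appId));
  }

  // Chooses the action for each event type; the string stands in for real work.
  static String process(SketchEventType type, String appId) {
    switch (type) {
      case VERIFY_AND_START_APPLICATION:
        return "renew tokens, then start " + appId;
      case RECOVER_APPLICATION:
        return "check and renew tokens for " + appId + " in the background";
      case FINISH_APPLICATION:
        return "cancel tokens for " + appId;
      default:
        throw new IllegalArgumentException("unknown event: " + type);
    }
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

The caller of `handle` gets a `Future` back at once, so recovery of the next application can proceed while the renewal is still pending.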

          jlowe Jason Lowe added a comment -

          The active apps already have the tokens and are running on the cluster, so I'm not sure why it's so pressing that we synchronously process token renewal upon recovery. This should be made asynchronous, or even better, we shouldn't do any renewals just because we restarted. Ideally the RM should be tracking when tokens need to be renewed and renew them at that point. If we restart and some tokens are due for a renewal then we should go ahead and renew those, but I don't think the RM should blindly renew all tokens for apps that are already active and running on the cluster when it restarts.
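
A minimal Java sketch of the "renew only what is due" idea, assuming the RM keeps a next-renewal timestamp per token. `DueTokenSelector`, `tokensDue`, and the map layout are hypothetical and are not taken from DelegationTokenRenewer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick out only the tokens that are due after a restart.
class DueTokenSelector {
  // Returns the ids of tokens whose scheduled renewal time has been reached.
  static List<String> tokensDue(Map<String, Long> nextRenewalMillis, long nowMillis) {
    List<String> due = new ArrayList<>();
    for (Map.Entry<String, Long> entry : nextRenewalMillis.entrySet()) {
      if (entry.getValue() <= nowMillis) {
        due.add(entry.getKey());
      }
    }
    return due;
  }
}
```

On restart, only the ids returned by `tokensDue` would be renewed; every other token keeps its existing schedule.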


            People

            • Assignee:
              sunilg Sunil G
              Reporter:
              jlowe Jason Lowe
            • Votes:
              0
              Watchers:
              20

                Development