Hadoop YARN
  1. Hadoop YARN
  2. YARN-128 RM Restart
  3. YARN-674

Slow or failing DelegationToken renewals on submission itself make RM unavailable

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: resourcemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This was caused by YARN-280. A slow or a down NameNode for will make it look like RM is unavailable as it may run out of RPC handlers due to blocked client submissions.

      1. YARN-674.1.patch
        31 kB
        Omkar Vinit Joshi
      2. YARN-674.10.patch
        49 kB
        Vinod Kumar Vavilapalli
      3. YARN-674.2.patch
        33 kB
        Omkar Vinit Joshi
      4. YARN-674.3.patch
        36 kB
        Omkar Vinit Joshi
      5. YARN-674.4.patch
        42 kB
        Omkar Vinit Joshi
      6. YARN-674.5.patch
        51 kB
        Omkar Vinit Joshi
      7. YARN-674.5.patch
        52 kB
        Omkar Vinit Joshi
      8. YARN-674.6.patch
        51 kB
        Omkar Vinit Joshi
      9. YARN-674.7.patch
        49 kB
        Omkar Vinit Joshi
      10. YARN-674.8.patch
        49 kB
        Omkar Vinit Joshi
      11. YARN-674.9.patch
        49 kB
        Omkar Vinit Joshi

        Activity

        Hide
        Bikas Saha added a comment -

        the only nit I have is that sending the recover/start event to the app is now obscured inside the delegation token renewer.

        Show
        Bikas Saha added a comment - the only nit I have is that sending the recover/start event to the app is now obscured inside the delegation token renewer.
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Hdfs-trunk #1587 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1587/)
        YARN-674. Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312)

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1587 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1587/ ) YARN-674 . Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #1613 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1613/)
        YARN-674. Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312)

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1613 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1613/ ) YARN-674 . Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Yarn-trunk #396 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/396/)
        YARN-674. Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312)

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #396 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/396/ ) YARN-674 . Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #4757 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4757/)
        YARN-674. Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312)

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #4757 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4757/ ) YARN-674 . Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1543312 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Committed this to trunk and branch-2. Thanks Omkar!

        Can you file a ticket for TestRMRestart race condition?

        Show
        Vinod Kumar Vavilapalli added a comment - Committed this to trunk and branch-2. Thanks Omkar! Can you file a ticket for TestRMRestart race condition?
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12614545/YARN-674.10.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2482//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2482//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614545/YARN-674.10.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2482//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2482//console This message is automatically generated.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        +1 for the latest patch, save for the findbugs issue.

        Trying to fix it myself.

        Show
        Vinod Kumar Vavilapalli added a comment - +1 for the latest patch, save for the findbugs issue. Trying to fix it myself.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        The findBugs warning is valid. Trying to fix.

        Show
        Vinod Kumar Vavilapalli added a comment - The findBugs warning is valid. Trying to fix.
        Hide
        Omkar Vinit Joshi added a comment -

        I think we should just ignore the find bug warning.. it is never going to occur...plus TestRMRestart is passing locally... there must be some race condition here not related to this patch.

        Show
        Omkar Vinit Joshi added a comment - I think we should just ignore the find bug warning.. it is never going to occur...plus TestRMRestart is passing locally... there must be some race condition here not related to this patch.
        Hide
        Omkar Vinit Joshi added a comment -

        Bikas Saha I completely missed your comment. What you are saying will not occur.

            pool.allowCoreThreadTimeOut(true);
        

        this should time out core threads if there are any lying around.

        Show
        Omkar Vinit Joshi added a comment - Bikas Saha I completely missed your comment. What you are saying will not occur. pool.allowCoreThreadTimeOut( true ); this should time out core threads if there are any lying around.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12614512/YARN-674.9.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2479//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2479//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2479//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614512/YARN-674.9.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2479//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2479//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2479//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        fixing find bug warning...

        Show
        Omkar Vinit Joshi added a comment - fixing find bug warning...
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12614481/YARN-674.8.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2476//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2476//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614481/YARN-674.8.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2476//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2476//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        Thanks Vinod Kumar Vavilapalli for pointing it out..didn't understand earlier. Adding synchronized block to service state change.

        Show
        Omkar Vinit Joshi added a comment - Thanks Vinod Kumar Vavilapalli for pointing it out..didn't understand earlier. Adding synchronized block to service state change.
        Hide
        Omkar Vinit Joshi added a comment -

        resubmitting the same patch.. no idea why it is failing.. It is passing locally.

        Show
        Omkar Vinit Joshi added a comment - resubmitting the same patch.. no idea why it is failing.. It is passing locally.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12613748/YARN-674.7.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2451//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2451//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12613748/YARN-674.7.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2451//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2451//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        Thanks Vinod Kumar Vavilapalli

        RMAppManager.submitApplication: Put a comment where you move apps to finish state saying we are doing this before token-renewal so that we don't renew tokens for finished apps.

        Added a comment.

        isServiceStarted needs to be volatile?

        No.. it is updated only once just when service starts.

        handleDTRenewerEvent -> handleDTRenewerAppSubmitEvent

        done..

        Add a comment in handleDTRenewerEvent to indicate why DTRenewer is starting the app as opposed to RMAppManager.

        added one..

        Instead of putting renewerCount in the main code path, you can access the thread count from ThreadPoolExecutor.getPoolSize() in the tests directly ?

        moved this to test code.

        DelegationTokenRenewerAppSubmitEvent can be nested class inside DelegationTokenRenewer? This is not an event from outside the renewer. Similarly DelegationTokenRenewerEventType. Either nest them in, or create a separate package.

        moved the events and eventType inside DTTokenRenewer.

        testInvalidDelegationTokenApplicationSubmit, testInvalidDTWithAddApplication: Seem similar but test different things. May be rename one or both?

        renamed both..

        The other point is the default number of threads in the renewer. 5 is too small, may be bump it up to existing number of RPC threads - 50 or something in that range?

        using thread pool with core pool size = 5 and max pool size = 50 (configurable).

        Show
        Omkar Vinit Joshi added a comment - Thanks Vinod Kumar Vavilapalli RMAppManager.submitApplication: Put a comment where you move apps to finish state saying we are doing this before token-renewal so that we don't renew tokens for finished apps. Added a comment. isServiceStarted needs to be volatile? No.. it is updated only once just when service starts. handleDTRenewerEvent -> handleDTRenewerAppSubmitEvent done.. Add a comment in handleDTRenewerEvent to indicate why DTRenewer is starting the app as opposed to RMAppManager. added one.. Instead of putting renewerCount in the main code path, you can access the thread count from ThreadPoolExecutor.getPoolSize() in the tests directly ? moved this to test code. DelegationTokenRenewerAppSubmitEvent can be nested class inside DelegationTokenRenewer? This is not an event from outside the renewer. Similarly DelegationTokenRenewerEventType. Either nest them in, or create a separate package. moved the events and eventType inside DTTokenRenewer. testInvalidDelegationTokenApplicationSubmit, testInvalidDTWithAddApplication: Seem similar but test different things. May be rename one or both? renamed both.. The other point is the default number of threads in the renewer. 5 is too small, may be bump it up to existing number of RPC threads - 50 or something in that range? using thread pool with core pool size = 5 and max pool size = 50 (configurable).
        Hide
        Bikas Saha added a comment -

        Increasing the threads without ascertaining if work is blocked due to lack of threads may not end up being helpful but will consume resources in the RM JVM (1 MB per thread for 64bit JVMs)

        Show
        Bikas Saha added a comment - Increasing the threads without ascertaining if work is blocked due to lack of threads may not end up being helpful but will consume resources in the RM JVM (1 MB per thread for 64bit JVMs)
        Hide
        Vinod Kumar Vavilapalli added a comment -

        The other point is the default number of threads in the renewer. 5 is too small, may be bump it up to existing number of RPC threads - 50 or something in that range?

        Show
        Vinod Kumar Vavilapalli added a comment - The other point is the default number of threads in the renewer. 5 is too small, may be bump it up to existing number of RPC threads - 50 or something in that range?
        Hide
        Vinod Kumar Vavilapalli added a comment -

        More comments, hopefully the final round.

        • RMAppManager.submitApplication: Put a comment where you move apps to finish state saying we are doing this before token-renewal so that we don't renew tokens for finished apps.
        • DelegationTokenRenewer
          • isServiceStarted needs to be volatile?
          • handleDTRenewerEvent -> handleDTRenewerAppSubmitEvent
          • Add a comment in handleDTRenewerEvent to indicate why DTRenewer is starting the app as opposed to RMAppManager.
          • applicationFinished(DelegationTokenRenewerEvent evt) -> handleAppFinishEvent and similarly addApplication(DelegationTokenRenewerAppSubmitEvent evt) -> handleAppsubmitEvent?
          • Instead of putting renewerCount in the main code path, you can access the thread count from ThreadPoolExecutor.getPoolSize() in the tests directly ?
        • DelegationTokenRenewerAppSubmitEvent can be nested class inside DelegationTokenRenewer? This is not an event from outside the renewer. Similarly DelegationTokenRenewerEventType. Either nest them in, or create a separate package.
        • testInvalidDelegationTokenApplicationSubmit, testInvalidDTWithAddApplication: Seem similar but test different things. May be rename one or both?
        Show
        Vinod Kumar Vavilapalli added a comment - More comments, hopefully the final round. RMAppManager.submitApplication: Put a comment where you move apps to finish state saying we are doing this before token-renewal so that we don't renew tokens for finished apps. DelegationTokenRenewer isServiceStarted needs to be volatile? handleDTRenewerEvent -> handleDTRenewerAppSubmitEvent Add a comment in handleDTRenewerEvent to indicate why DTRenewer is starting the app as opposed to RMAppManager. applicationFinished(DelegationTokenRenewerEvent evt) -> handleAppFinishEvent and similarly addApplication(DelegationTokenRenewerAppSubmitEvent evt) -> handleAppsubmitEvent? Instead of putting renewerCount in the main code path, you can access the thread count from ThreadPoolExecutor.getPoolSize() in the tests directly ? DelegationTokenRenewerAppSubmitEvent can be nested class inside DelegationTokenRenewer? This is not an event from outside the renewer. Similarly DelegationTokenRenewerEventType. Either nest them in, or create a separate package. testInvalidDelegationTokenApplicationSubmit, testInvalidDTWithAddApplication: Seem similar but test different things. May be rename one or both?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12612310/YARN-674.6.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2380//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2380//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612310/YARN-674.6.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2380//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2380//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        Thanks Jian He, Bikas Saha .

        Saw this is changed back to asynchronous submission on recovery, the original intention was to prevent client from seeing the application as a new application. If asynchronously, the client can query the application before recover event gets processed, meaning before the application is fully recovered as some recover logic happens when app is processing the recover event(app.FinalTransition).

        fixed to make sure that it gets updated synchronously.

        The assert doesnt make it to the production jar - so it wont catch anything on the cluster. Need to throw an exception here. If we dont want to crash the RM here then we can log and error. When the attempt state machine gets the event then it will crash on the async dispatcher thread if the event is not handled in the current state.

        discussed with bikas offline.. this is fine.

        Show
        Omkar Vinit Joshi added a comment - Thanks Jian He , Bikas Saha . Saw this is changed back to asynchronous submission on recovery, the original intention was to prevent client from seeing the application as a new application. If asynchronously, the client can query the application before recover event gets processed, meaning before the application is fully recovered as some recover logic happens when app is processing the recover event(app.FinalTransition). fixed to make sure that it gets updated synchronously. The assert doesnt make it to the production jar - so it wont catch anything on the cluster. Need to throw an exception here. If we dont want to crash the RM here then we can log and error. When the attempt state machine gets the event then it will crash on the async dispatcher thread if the event is not handled in the current state. discussed with bikas offline.. this is fine.
        Hide
        Jian He added a comment -

        Saw this is changed back to asynchronous submission on recovery, the original intention was to prevent client from seeing the application as a new application. If asynchronously, the client can query the application before recover event gets processed, meaning before the application is fully recovered as some recover logic happens when app is processing the recover event(app.FinalTransition).

             // Recover the app synchronously, as otherwise client is possible to see
              // the application not recovered before it is actually recovered because
              // ClientRMService is already started at this point of time.
              appImpl.handle(new RMAppEvent(appImpl.getApplicationId(),
                RMAppEventType.RECOVER));
        
        Show
        Jian He added a comment - Saw this is changed back to asynchronous submission on recovery, the original intention was to prevent client from seeing the application as a new application. If asynchronously, the client can query the application before recover event gets processed, meaning before the application is fully recovered as some recover logic happens when app is processing the recover event(app.FinalTransition). // Recover the app synchronously, as otherwise client is possible to see // the application not recovered before it is actually recovered because // ClientRMService is already started at this point of time. appImpl.handle( new RMAppEvent(appImpl.getApplicationId(), RMAppEventType.RECOVER));
        Hide
        Bikas Saha added a comment -

        The assert doesnt make it to the production jar - so it wont catch anything on the cluster. Need to throw an exception here. If we dont want to crash the RM here then we can log and error. When the attempt state machine gets the event then it will crash on the async dispatcher thread if the event is not handled in the current state.

        +        assert application.getState() == RMAppState.NEW;
        Show
        Bikas Saha added a comment - The assert doesnt make it to the production jar - so it wont catch anything on the cluster. Need to throw an exception here. If we dont want to crash the RM here then we can log and error. When the attempt state machine gets the event then it will crash on the async dispatcher thread if the event is not handled in the current state. + assert application.getState() == RMAppState.NEW;
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12612257/YARN-674.5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2378//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2378//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612257/YARN-674.5.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2378//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2378//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        Thanks Bikas Saha for the review

        We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible. The RM would submit a recovered application, in essence proxying a user submitting the application. Its a general pattern followed through the recovery logic - to be minimally invasive to the mainline code path so that we can avoid functional bugs as much as possible. Separating them into 2 methods has resulted in code duplication in both methods without any huge benefit that I can see. It also leave us susceptible to future code changes made in one code path and not the other.

        I agree with your suggestion... reverting the changes ..discussed with Vinod Kumar Vavilapalli offline.

        Why is isSecurityEnabled() being checked at this internal level. The code should not even reach this point if security is not enabled.

        you have a point ..fixing it..

        Also why is it calling rmContext.getDelegationTokenRenewer().addApplication(event) instead of DelegationTokenRenewer.this.addApplication(). Same for rmContext.getDelegationTokenRenewer().applicationFinished(evt);

        Makes sense...fixed it..

        Rename DelegationTokenRenewerThread to not have misleading Thread in the name ?

        fixed.

        Can DelegationTokenRenewerAppSubmitEvent event objects have an event type different from VERIFY_AND_START_APPLICATION? If not, we dont need this check and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to not expect an event type argument. It should set the VERIFY_AND_START_APPLICATION within the constructor.

        fixed..

        @Private + @VisibleForTesting???

        fixed.

        Show
        Omkar Vinit Joshi added a comment - Thanks Bikas Saha for the review We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible. The RM would submit a recovered application, in essence proxying a user submitting the application. Its a general pattern followed through the recovery logic - to be minimally invasive to the mainline code path so that we can avoid functional bugs as much as possible. Separating them into 2 methods has resulted in code duplication in both methods without any huge benefit that I can see. It also leave us susceptible to future code changes made in one code path and not the other. I agree with your suggestion... reverting the changes ..discussed with Vinod Kumar Vavilapalli offline. Why is isSecurityEnabled() being checked at this internal level. The code should not even reach this point if security is not enabled. you have a point ..fixing it.. Also why is it calling rmContext.getDelegationTokenRenewer().addApplication(event) instead of DelegationTokenRenewer.this.addApplication(). Same for rmContext.getDelegationTokenRenewer().applicationFinished(evt); Makes sense...fixed it.. Rename DelegationTokenRenewerThread to not have misleading Thread in the name ? fixed. Can DelegationTokenRenewerAppSubmitEvent event objects have an event type different from VERIFY_AND_START_APPLICATION? If not, we dont need this check and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to not expect an event type argument. It should set the VERIFY_AND_START_APPLICATION within the constructor. fixed.. @Private + @VisibleForTesting??? fixed.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible.

        I didn't mean to fork the code, but it seems like the patch is doing exactly that. My original intention was to make submitApplicationOnRecovery() call submitApplication().

        Show
        Vinod Kumar Vavilapalli added a comment - We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible. I didn't mean to fork the code, but it seems like the patch is doing exactly that. My original intention was to make submitApplicationOnRecovery() call submitApplication().
        Hide
        Bikas Saha added a comment -

        We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible. The RM would submit a recovered application, in essence proxying a user submitting the application. Its a general pattern followed through the recovery logic - to be minimally invasive to the mainline code path so that we can avoid functional bugs as much as possible. Separating them into 2 methods has resulted in code duplication in both methods without any huge benefit that I can see. It also leave us susceptible to future code changes made in one code path and not the other.

        Why is isSecurityEnabled() being checked at this internal level. The code should not even reach this point if security is not enabled. It should already be taken care of in the public apis, right?
        Also why is it calling rmContext.getDelegationTokenRenewer().addApplication(event) instead of DelegationTokenRenewer.this.addApplication(). Same for rmContext.getDelegationTokenRenewer().applicationFinished(evt);

        @SuppressWarnings("unchecked")
        +    private void handleDTRenewerEvent(
        +        DelegationTokenRenewerAppSubmitEvent event) {
        +      try {
        +        // Setup tokens for renewal
        +        if (UserGroupInformation.isSecurityEnabled()) {
        +          rmContext.getDelegationTokenRenewer().addApplication(event);
        +          rmContext.getDispatcher().getEventHandler()
        +              .handle(new RMAppEvent(event.getAppicationId(),
        +              event.isApplicationRecovered() ? RMAppEventType.RECOVER
        +              : RMAppEventType.START));
        +        }
        +      } catch (Throwable t) {
        

        These assumptions may make the code brittle to future changes. Also Typo in comments. We should probably assert that the application state is NEW over here so that the broken assumption is caught at the source instead of at the destination app causing a state machine crash.................

        +            "Unable to add the application to the delegation token renewer.",
        +            t);
        +        // Sending APP_REJECTED is fine, since we assume that the
        +        // RMApp is in NEW state and thus we havne't yet informed the
        +        // Scheduler about the existence of the application
        +        rmContext.getDispatcher().getEventHandler().handle(
        +            new RMAppRejectedEvent(event.getAppicationId(), t.getMessage()));
        +      }
        

        typo

         public ApplicationId getAppicationId() {
        

        @Private + @VisibleForTesting???

        +  //Only for Testing
        +  public int getInProcessDelegationTokenRenewerEventsCount() {
        +    return this.renewerCount.get();
        +  }
        

        Can DelegationTokenRenewerAppSubmitEvent event objects have an event type different from VERIFY_AND_START_APPLICATION? If not, we dont need this check and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to not expect an event type argument. It should set the VERIFY_AND_START_APPLICATION within the constructor.

        +      if (evt.getType().equals(
        +          DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION)
        +          && evt instanceof DelegationTokenRenewerAppSubmitEvent) {
        

        Rename DelegationTokenRenewerThread to not have misleading Thread in the name ?

        Why is this warning not happening for other services? Whats special in the code for DelegationTokenRenewer?

        +  <!-- Ignore Synchronization issues as they are never going to occur for
        +       methods like serviceInit(), serviceStart() and handle() -->
        +  <Match>
        +    <Class name="org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer" />
        +    <Bug pattern="IS2_INCONSISTENT_SYNC" />
        +  </Match>
        
        Show
        Bikas Saha added a comment - We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible. The RM would submit a recovered application, in essence proxying a user submitting the application. Its a general pattern followed through the recovery logic - to be minimally invasive to the mainline code path so that we can avoid functional bugs as much as possible. Separating them into 2 methods has resulted in code duplication in both methods without any huge benefit that I can see. It also leave us susceptible to future code changes made in one code path and not the other. Why is isSecurityEnabled() being checked at this internal level. The code should not even reach this point if security is not enabled. It should already be taken care of in the public apis, right? Also why is it calling rmContext.getDelegationTokenRenewer().addApplication(event) instead of DelegationTokenRenewer.this.addApplication(). Same for rmContext.getDelegationTokenRenewer().applicationFinished(evt); @SuppressWarnings( "unchecked" ) + private void handleDTRenewerEvent( + DelegationTokenRenewerAppSubmitEvent event) { + try { + // Setup tokens for renewal + if (UserGroupInformation.isSecurityEnabled()) { + rmContext.getDelegationTokenRenewer().addApplication(event); + rmContext.getDispatcher().getEventHandler() + .handle( new RMAppEvent(event.getAppicationId(), + event.isApplicationRecovered() ? RMAppEventType.RECOVER + : RMAppEventType.START)); + } + } catch (Throwable t) { These assumptions may make the code brittle to future changes. Also Typo in comments. We should probably assert that the application state is NEW over here so that the broken assumption is caught at the source instead of at the destination app causing a state machine crash................. + "Unable to add the application to the delegation token renewer." , + t); + // Sending APP_REJECTED is fine, since we assume that the + // RMApp is in NEW state and thus we havne't yet informed the + // Scheduler about the existence of the application + rmContext.getDispatcher().getEventHandler().handle( + new RMAppRejectedEvent(event.getAppicationId(), t.getMessage())); + } typo public ApplicationId getAppicationId() { @Private + @VisibleForTesting??? + //Only for Testing + public int getInProcessDelegationTokenRenewerEventsCount() { + return this .renewerCount.get(); + } Can DelegationTokenRenewerAppSubmitEvent event objects have an event type different from VERIFY_AND_START_APPLICATION? If not, we dont need this check and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to not expect an event type argument. It should set the VERIFY_AND_START_APPLICATION within the constructor. + if (evt.getType().equals( + DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION) + && evt instanceof DelegationTokenRenewerAppSubmitEvent) { Rename DelegationTokenRenewerThread to not have misleading Thread in the name ? Why is this warning not happening for other services? Whats special in the code for DelegationTokenRenewer? + <!-- Ignore Synchronization issues as they are never going to occur for + methods like serviceInit(), serviceStart() and handle() --> + <Match> + < Class name= "org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer" /> + <Bug pattern= "IS2_INCONSISTENT_SYNC" /> + </Match>
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12612089/YARN-674.5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2368//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612089/YARN-674.5.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2368//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        Thanks Vinod Kumar Vavilapalli for review...

        Does this patch also include YARN-1210? Seems like it, we should separate that code.

        No .. anything specific? YARN-1210 is more about waiting for older AM to finish before launching a new AM.

        Depending on the final patch, I think we should split RMAppManager.submitApp into two, one for regular submit and one for submit after recovery.

        Splitting the method into 2.

        • submitApplication - normal application submission
        • submitRecoveredApplication - submitting recovered application

        RMAppState.java change is unnecessary.

        fixed

        ForwardingEventHandler is a bottleneck for renewals now - especially during submission. We need to have a thread pool.

        Create fixed thread pool service with thread count controllable via configuration (Not adding this to yarn-default). Keeping default thread count to be 5. fair enough?

        Once we do the above, the old concurrency test should be added back.

        yeah..added that test back..

        We are undoing most of YARN-1107. Good that we laid the groundwork there. Let's make sure we remove all the dead code. One comment stands out

        Anything did I miss here? didn't understand. The comment I have not removed as it is still valid.

        The newly added test can have race conditions? We may be lucky in the test, but in real life scenario, client has to submit app and poll for app failure due to invalid tokens

        I think it will not. For clients yes after they submit the application they will have to keep polling to know the status of the application (got accepted or failed due to token renewal).

        Similarly we should add a test for successful submission after renewal.

        sure added one.. checking for RMAppEvent.START

        Show
        Omkar Vinit Joshi added a comment - Thanks Vinod Kumar Vavilapalli for review... Does this patch also include YARN-1210 ? Seems like it, we should separate that code. No .. anything specific? YARN-1210 is more about waiting for older AM to finish before launching a new AM. Depending on the final patch, I think we should split RMAppManager.submitApp into two, one for regular submit and one for submit after recovery. Splitting the method into 2. submitApplication - normal application submission submitRecoveredApplication - submitting recovered application RMAppState.java change is unnecessary. fixed ForwardingEventHandler is a bottleneck for renewals now - especially during submission. We need to have a thread pool. Create fixed thread pool service with thread count controllable via configuration (Not adding this to yarn-default). Keeping default thread count to be 5. fair enough? Once we do the above, the old concurrency test should be added back. yeah..added that test back.. We are undoing most of YARN-1107 . Good that we laid the groundwork there. Let's make sure we remove all the dead code. One comment stands out Anything did I miss here? didn't understand. The comment I have not removed as it is still valid. The newly added test can have race conditions? We may be lucky in the test, but in real life scenario, client has to submit app and poll for app failure due to invalid tokens I think it will not. For clients yes after they submit the application they will have to keep polling to know the status of the application (got accepted or failed due to token renewal). Similarly we should add a test for successful submission after renewal. sure added one.. checking for RMAppEvent.START
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Started looking at the patch:

        • Does this patch also include YARN-1210? Seems like it, we should separate that code.
        • Depending on the final patch, I think we should split RMAppManager.submitApp into two, one for regular submit and one for submit after recovery.
        • RMAppState.java change is unnecessary.
        • ForwardingEventHandler is a bottleneck for renewals now - especially during submission. We need to have a thread pool.
        • Once we do the above, the old concurrency test should be added back.
        • We are undoing most of YARN-1107. Good that we laid the groundwork there. Let's make sure we remove all the dead code. One comment stands out
               // At RM restart it is safe to assume that all the previously added tokens
               // are valid
          
        • The newly added test can have race conditions? We may be lucky in the test, but in real life scenario, client has to submit app and poll for app failure due to invalid tokens.
        • Similarly we should add a test for successful submission after renewal.
        Show
        Vinod Kumar Vavilapalli added a comment - Started looking at the patch: Does this patch also include YARN-1210 ? Seems like it, we should separate that code. Depending on the final patch, I think we should split RMAppManager.submitApp into two, one for regular submit and one for submit after recovery. RMAppState.java change is unnecessary. ForwardingEventHandler is a bottleneck for renewals now - especially during submission. We need to have a thread pool. Once we do the above, the old concurrency test should be added back. We are undoing most of YARN-1107 . Good that we laid the groundwork there. Let's make sure we remove all the dead code. One comment stands out // At RM restart it is safe to assume that all the previously added tokens // are valid The newly added test can have race conditions? We may be lucky in the test, but in real life scenario, client has to submit app and poll for app failure due to invalid tokens. Similarly we should add a test for successful submission after renewal.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12611535/YARN-674.4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2342//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2342//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611535/YARN-674.4.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2342//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2342//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        rebasing the patch after recent RM -restart changes.

        Show
        Omkar Vinit Joshi added a comment - rebasing the patch after recent RM -restart changes.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12610729/YARN-674.3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2303//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2303//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610729/YARN-674.3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2303//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2303//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        Thanks Zhijie Shen and Jian He for reviewing..

        DelegationTokenRenewer.applicationFinished() is anyways canceling the token on a separate thread, do we need to funnel through the dispatcher as well ?

        discussed with Jian He offline.. We need this because user may just submit an application and kill it immediately after it. The earlier submitted application may still be in flight (in dispatcher queue) and we may try to process application finish event as a part of kill before it. To avoid this I am enqueuing it.

        It would be to have an end-to-end test that application submitted with an Invalid token will be rejected and verify yarn client can get Failed application status using MockRM.

        Added the test

        Show
        Omkar Vinit Joshi added a comment - Thanks Zhijie Shen and Jian He for reviewing.. DelegationTokenRenewer.applicationFinished() is anyways canceling the token on a separate thread, do we need to funnel through the dispatcher as well ? discussed with Jian He offline.. We need this because user may just submit an application and kill it immediately after it. The earlier submitted application may still be in flight (in dispatcher queue) and we may try to process application finish event as a part of kill before it. To avoid this I am enqueuing it. It would be to have an end-to-end test that application submitted with an Invalid token will be rejected and verify yarn client can get Failed application status using MockRM. Added the test
        Hide
        Omkar Vinit Joshi added a comment -

        parseCredentials is not asynchronous, right? Therefore, is it better to fail the application submission immediately instead of forcing the client the come back to check the status? In fact, in submitApplication, there're already several points where the application submission can fail immediately, even though the application START is handled asynchronously.

        I am not sure about this. No problem but will add it.

        Maybe you can do
          if (event instanceof DelegationTokenRenewerAppSubmitEvent) {
            ...
          }
        to avoid the findbug warning?
        

        the whole point of adding ExceptionType is to avoid this. right Zhijie Shen? I am wondering why at other places we are not getting similar casting exception.

        Show
        Omkar Vinit Joshi added a comment - parseCredentials is not asynchronous, right? Therefore, is it better to fail the application submission immediately instead of forcing the client the come back to check the status? In fact, in submitApplication, there're already several points where the application submission can fail immediately, even though the application START is handled asynchronously. I am not sure about this. No problem but will add it. Maybe you can do if (event instanceof DelegationTokenRenewerAppSubmitEvent) { ... } to avoid the findbug warning? the whole point of adding ExceptionType is to avoid this. right Zhijie Shen ? I am wondering why at other places we are not getting similar casting exception.
        Hide
        Zhijie Shen added a comment -

        Yes I have removed the error purposefully..here are the thoughts.

        parseCredentials is not asynchronous, right? Therefore, is it better to fail the application submission immediately instead of forcing the client the come back to check the status? In fact, in submitApplication, there're already several points where the application submission can fail immediately, even though the application START is handled asynchronously.

        Not understanding how to fix that findbug warning .. should I add that too into exclude-findbug.xml? I tried this. Even eclipse doesn't complain

        Maybe you can do

          if (event instanceof DelegationTokenRenewerAppSubmitEvent) {
            ...
          }
        

        to avoid the findbug warning?

        Show
        Zhijie Shen added a comment - Yes I have removed the error purposefully..here are the thoughts. parseCredentials is not asynchronous, right? Therefore, is it better to fail the application submission immediately instead of forcing the client the come back to check the status? In fact, in submitApplication, there're already several points where the application submission can fail immediately, even though the application START is handled asynchronously. Not understanding how to fix that findbug warning .. should I add that too into exclude-findbug.xml? I tried this. Even eclipse doesn't complain Maybe you can do if (event instanceof DelegationTokenRenewerAppSubmitEvent) { ... } to avoid the findbug warning?
        Hide
        Jian He added a comment -

        DelegationTokenRenewer.applicationFinished() is anyways canceling the token on a separate thread, do we need to funnel through the dispatcher as well ?
        It would be to have an end-to-end test that application submitted with an Invalid token will be rejected and verify yarn client can get Failed application status using MockRM.

        Show
        Jian He added a comment - DelegationTokenRenewer.applicationFinished() is anyways canceling the token on a separate thread, do we need to funnel through the dispatcher as well ? It would be to have an end-to-end test that application submitted with an Invalid token will be rejected and verify yarn client can get Failed application status using MockRM.
        Hide
        Omkar Vinit Joshi added a comment -
        • The recent test failure doesn't seem to be related to the code. The test passes locally. Should I open one ticket for this?
        • Not understanding how to fix that findbug warning .. should I add that too into exclude-findbug.xml? I tried this. Even eclipse doesn't complain
              @Override
              @SuppressWarnings("unchecked")
              public void handle(DelegationTokenRenewerEvent event) {
                if (event.getType().equals(
                    DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION)) {
                  DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
                      (DelegationTokenRenewerAppSubmitEvent) event;
                  handleDTRenewerEvent(appSubmitEvt);
                } else if (event.getType().equals(
                    DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
                  rmContext.getDelegationTokenRenewer().applicationFinished(event);
                }
          
              }
          
        Show
        Omkar Vinit Joshi added a comment - The recent test failure doesn't seem to be related to the code. The test passes locally. Should I open one ticket for this? Not understanding how to fix that findbug warning .. should I add that too into exclude-findbug.xml? I tried this. Even eclipse doesn't complain @Override @SuppressWarnings( "unchecked" ) public void handle(DelegationTokenRenewerEvent event) { if (event.getType().equals( DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION)) { DelegationTokenRenewerAppSubmitEvent appSubmitEvt = (DelegationTokenRenewerAppSubmitEvent) event; handleDTRenewerEvent(appSubmitEvt); } else if (event.getType().equals( DelegationTokenRenewerEventType.FINISH_APPLICATION)) { rmContext.getDelegationTokenRenewer().applicationFinished(event); } }
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12610392/YARN-674.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2291//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2291//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2291//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610392/YARN-674.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2291//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2291//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2291//console This message is automatically generated.
        Hide
        Omkar Vinit Joshi added a comment -

        Thanks Zhijie Shen for reviewing my patch

        I think the exception needs to be thrown, which is missing in your patch. The exception will notice the client that the app submission fails; otherwise, the client will think the submission succeeds?

        Yes I have removed the error purposefully..here are the thoughts.

        • For client once he submits the application should check the app status and will come to know about the failing app from it.
          • Either when parsing credentials fails.
          • OR when initial token renewal fails.

        Since DelegationTokenRenewer#addApplication becomes asynchronous, what will the impact of that the application is already accepted and starts its life cycle, while DelegationTokenRenewer is so slow to DelegationTokenRenewerAppSubmitEvent. Will the application fail somewhere else due to the fresh token unavailable?

        The logic here is modified a bit. If token renewal succeeds then only app is submitted to scheduler not before that. Today too it is the same case. Only problem is that we are holding client request while doing this. With the change this will become async.

        I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission?

        No. Now there is no problem w.r.t. concurrent app submission as we are anyway funneling it through event handler. This test is no longer required so removed it completely.

        • Fixing findbug warnings...
        • fixing failed test case...
        Show
        Omkar Vinit Joshi added a comment - Thanks Zhijie Shen for reviewing my patch I think the exception needs to be thrown, which is missing in your patch. The exception will notice the client that the app submission fails; otherwise, the client will think the submission succeeds? Yes I have removed the error purposefully..here are the thoughts. For client once he submits the application should check the app status and will come to know about the failing app from it. Either when parsing credentials fails. OR when initial token renewal fails. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will the impact of that the application is already accepted and starts its life cycle, while DelegationTokenRenewer is so slow to DelegationTokenRenewerAppSubmitEvent. Will the application fail somewhere else due to the fresh token unavailable? The logic here is modified a bit. If token renewal succeeds then only app is submitted to scheduler not before that. Today too it is the same case. Only problem is that we are holding client request while doing this. With the change this will become async. I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission? No. Now there is no problem w.r.t. concurrent app submission as we are anyway funneling it through event handler. This test is no longer required so removed it completely. Fixing findbug warnings... fixing failed test case...
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12610181/YARN-674.1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2282//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2282//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2282//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610181/YARN-674.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2282//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2282//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2282//console This message is automatically generated.
        Hide
        Zhijie Shen added a comment -

        I've a quick look at the patch. Here're my comments:

        1. It seem that the change in RMAppManager is not necessary, because the current logic is to reject the app in the secure case when parsing the credentials and adding the apps to DelegationTokenRenewer have something wrong; otherwise, the app will be accepted. Though there's no obvious "if... else..." structure, it achieves the same logic control via:

              throw RPCUtil.getRemoteException(ie);
        

        I think the exception needs to be thrown, which is missing in your patch. The exception will notice the client that the app submission fails; otherwise, the client will think the submission succeeds?

        If I miss some ideas here, please let me know.

        2. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will the impact of that the application is already accepted and starts its life cycle, while DelegationTokenRenewer is so slow to DelegationTokenRenewerAppSubmitEvent. Will the application fail somewhere else due to the fresh token unavailable?

        3. I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission?

        Show
        Zhijie Shen added a comment - I've a quick look at the patch. Here're my comments: 1. It seem that the change in RMAppManager is not necessary, because the current logic is to reject the app in the secure case when parsing the credentials and adding the apps to DelegationTokenRenewer have something wrong; otherwise, the app will be accepted. Though there's no obvious "if... else..." structure, it achieves the same logic control via: throw RPCUtil.getRemoteException(ie); I think the exception needs to be thrown, which is missing in your patch. The exception will notice the client that the app submission fails; otherwise, the client will think the submission succeeds? If I miss some ideas here, please let me know. 2. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will the impact of that the application is already accepted and starts its life cycle, while DelegationTokenRenewer is so slow to DelegationTokenRenewerAppSubmitEvent. Will the application fail somewhere else due to the fresh token unavailable? 3. I noticed testConncurrentAddApplication has been removed. Does the change affect the current app submission?
        Hide
        Omkar Vinit Joshi added a comment -

        updating the patch
        Modifying the event flow

        • In unsecured case nothing will change
        • In secured case
          • For Recovery
            • ApplicationEvents will be enqueued when we are recovering. When service starts they will get processed.
          • For normal app submission
            • Event will be enqueued into the separate DelegationTokenRenewer dispatcher queue and client's request will be returned immediately.
            • If the token renewal is successful then renewer will send the START/RECOVER event else it will fail the app.
        • Testing
          • Updated unit test to test the updated behavior
          • Manually tested it on secured cluster
            • tested app submission with by default HDFS TOKEN and it works.
            • tested same by restarting RM in between
        Show
        Omkar Vinit Joshi added a comment - updating the patch Modifying the event flow In unsecured case nothing will change In secured case For Recovery ApplicationEvents will be enqueued when we are recovering. When service starts they will get processed. For normal app submission Event will be enqueued into the separate DelegationTokenRenewer dispatcher queue and client's request will be returned immediately. If the token renewal is successful then renewer will send the START/RECOVER event else it will fail the app. Testing Updated unit test to test the updated behavior Manually tested it on secured cluster tested app submission with by default HDFS TOKEN and it works. tested same by restarting RM in between
        Hide
        Omkar Vinit Joshi added a comment -

        Taking it over..

        Show
        Omkar Vinit Joshi added a comment - Taking it over..
        Hide
        Jian He added a comment -

        Is this related to ClientRMService.renewDelegationToken this method?

        Show
        Jian He added a comment - Is this related to ClientRMService.renewDelegationToken this method?
        Hide
        Bikas Saha added a comment -

        Looks like we might have to resurrect the remaining changes proposed in the document in YARN-549, namely sending an event to RMAppManager instead of calling it RMAppManager.submitApplication() directly since that method is no longer cheap. Any other alternatives?

        Show
        Bikas Saha added a comment - Looks like we might have to resurrect the remaining changes proposed in the document in YARN-549 , namely sending an event to RMAppManager instead of calling it RMAppManager.submitApplication() directly since that method is no longer cheap. Any other alternatives?

          People

          • Assignee:
            Omkar Vinit Joshi
            Reporter:
            Vinod Kumar Vavilapalli
          • Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development