Hadoop YARN: YARN-5353

ResourceManager can leak delegation tokens when they are shared across apps

    Details

    • Hadoop Flags: Reviewed

      Description

      I recently saw a ResourceManager go into heavy GC. A heap dump showed millions of delegation tokens on the heap, and most of them appear to be referenced from the appTokens map in DelegationTokenRenewer. When an app completes and its tokens are removed, the app's appTokens entry is not cleaned up if any of its tokens are shared with other active apps.
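To make the failure mode concrete, here is a minimal sketch of one plausible shape of this bug. It assumes a heavily simplified, hypothetical model and is not the actual DelegationTokenRenewer code. Tokens and application IDs are plain strings, and the class and method names (LeakyTokenRenewer, addApp, removeApp) are illustrative only. appTokens maps each app to its tokens. allTokens maps each token to the apps still referring to it. During removal, a token that is still shared is skipped. As a result, the app's entry is dropped only when its token set drains to empty, and that never happens while a token is shared.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the leak; NOT Hadoop's DelegationTokenRenewer.
// Tokens and application IDs are plain strings for illustration.
class LeakyTokenRenewer {
  // Per-app view: the tokens each application registered.
  final Map<String, Set<String>> appTokens = new HashMap<>();
  // Global view: each token and the IDs of the apps still referring to it.
  final Map<String, Set<String>> allTokens = new HashMap<>();

  void addApp(String appId, String... tokens) {
    for (String token : tokens) {
      appTokens.computeIfAbsent(appId, k -> new HashSet<>()).add(token);
      allTokens.computeIfAbsent(token, k -> new HashSet<>()).add(appId);
    }
  }

  // Handler for the app-finished event (buggy version).
  void removeApp(String appId) {
    Set<String> tokens = appTokens.get(appId);
    if (tokens == null) {
      return;
    }
    Iterator<String> it = tokens.iterator();
    while (it.hasNext()) {
      String token = it.next();
      Set<String> refs = allTokens.get(token);
      if (refs != null) {
        refs.remove(appId);
        if (!refs.isEmpty()) {
          continue; // still used by another app, so it also stays in this app's set
        }
        allTokens.remove(token);
      }
      it.remove();
    }
    // Never true if any of the app's tokens was shared, so the entry leaks.
    if (tokens.isEmpty()) {
      appTokens.remove(appId);
    }
  }
}
```

In this model, suppose app1 and app2 share one token and app1 finishes. app1's entry stays in appTokens. It is still there after app2 finishes, even though allTokens is empty by then. The stale entry alone keeps the token reachable, and it is never revisited.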

        Activity

        Hudson added a comment:

        SUCCESS: Integrated in Hadoop-trunk-Commit #10085 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10085/)
        YARN-5353. ResourceManager can leak delegation tokens when they are (varunsaxena: rev 06c56ff79b4cdf82f733498d3edfa0b6e531a2db)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
        Varun Saxena added a comment:

        Committed this to trunk, branch-2, branch-2.8, branch-2.7, and branch-2.6.
        Thanks, Jason Lowe, for fixing the issue.

        Varun Saxena added a comment:

        LGTM.
        Will commit it later today unless there are further comments.

        Jason Lowe added a comment:

        The test failures are unrelated, and the tests pass for me locally with the patch applied. The TestRMWebServicesDelegationTokenAuthentication failure is tracked by YARN-4813, and the TestAMRestart failure is tracked by YARN-5317.

        Hadoop QA added a comment:
        -1 overall



        Vote | Subsystem  | Runtime | Comment
        0    | reexec     | 0m 17s  | Docker mode activated.
        +1   | @author    | 0m 0s   | The patch does not contain any @author tags.
        +1   | test4tests | 0m 0s   | The patch appears to include 1 new or modified test files.
        +1   | mvninstall | 9m 28s  | trunk passed
        +1   | compile    | 0m 40s  | trunk passed
        +1   | checkstyle | 0m 23s  | trunk passed
        +1   | mvnsite    | 0m 46s  | trunk passed
        +1   | mvneclipse | 0m 21s  | trunk passed
        +1   | findbugs   | 1m 9s   | trunk passed
        +1   | javadoc    | 0m 24s  | trunk passed
        +1   | mvninstall | 0m 39s  | the patch passed
        +1   | compile    | 0m 37s  | the patch passed
        +1   | javac      | 0m 37s  | the patch passed
        +1   | checkstyle | 0m 21s  | the patch passed
        +1   | mvnsite    | 0m 44s  | the patch passed
        +1   | mvneclipse | 0m 18s  | the patch passed
        +1   | whitespace | 0m 0s   | The patch has no whitespace issues.
        +1   | findbugs   | 1m 16s  | the patch passed
        +1   | javadoc    | 0m 21s  | the patch passed
        -1   | unit       | 35m 42s | hadoop-yarn-server-resourcemanager in the patch failed.
        +1   | asflicense | 0m 17s  | The patch does not generate ASF License warnings.
             |            | 54m 23s |



        Reason             | Tests
        Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
                           | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart



        Subsystem      | Report/Notes
        Docker Image   | yetus/hadoop:9560f25
        JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817239/YARN-5353.001.patch
        JIRA Issue     | YARN-5353
        Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname          | Linux 8c770bee11a9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool     | maven
        Personality    | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision   | trunk / 0fd3980
        Default Java   | 1.8.0_91
        findbugs       | v3.0.0
        unit           | https://builds.apache.org/job/PreCommit-YARN-Build/12276/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12276/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results   | https://builds.apache.org/job/PreCommit-YARN-Build/12276/testReport/
        modules        | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12276/console
        Powered by     | Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        Jason Lowe added a comment:

        It seems to me that we need to ensure the application's entry is always removed from the appTokens map when the application is marked as finished. That event is our one chance to clean up the entry, and the current code can conditionally decide to leave it in the map.

        Attaching a patch that always removes the app's appTokens entry when the app-finished event is received. Any tokens shared with other apps will continue to exist in the allTokens map, so I think token sharing is still handled correctly.
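The approach described in this comment can be sketched as follows. This is a hypothetical, simplified model, not the committed patch. Tokens and application IDs are plain strings, and FixedTokenRenewer, addApp and removeApp are illustrative names, not Hadoop APIs. The app-finished handler removes the app's appTokens entry unconditionally. It then only drops the app from each token's reference set in allTokens, so a token shared with another active app survives there.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the fix; NOT the actual YARN-5353 patch.
// Tokens and application IDs are plain strings for illustration.
class FixedTokenRenewer {
  // Per-app view: the tokens each application registered.
  final Map<String, Set<String>> appTokens = new HashMap<>();
  // Global view: each token and the IDs of the apps still referring to it.
  final Map<String, Set<String>> allTokens = new HashMap<>();

  void addApp(String appId, String... tokens) {
    for (String token : tokens) {
      appTokens.computeIfAbsent(appId, k -> new HashSet<>()).add(token);
      allTokens.computeIfAbsent(token, k -> new HashSet<>()).add(appId);
    }
  }

  // Handler for the app-finished event: the only chance to clean up the app's entry.
  void removeApp(String appId) {
    // Always drop the per-app entry, whether or not its tokens are shared.
    Set<String> tokens = appTokens.remove(appId);
    if (tokens == null) {
      return;
    }
    for (String token : tokens) {
      Set<String> refs = allTokens.get(token);
      if (refs == null) {
        continue;
      }
      refs.remove(appId);
      // A token still referenced by another active app stays in allTokens.
      if (refs.isEmpty()) {
        allTokens.remove(token);
      }
    }
  }
}
```

With this shape, after app1 finishes while sharing a token with app2, appTokens no longer contains app1. The token remains in allTokens until app2 finishes as well, and then both maps are empty.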


          People

          • Assignee: Jason Lowe
            Reporter: Jason Lowe