
Malicious AM can kill containers of other apps running in any node its containers are running

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: nodemanager
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When an AM calls the NM via ContainerManagementProtocol, the NMToken is supplied for authentication. The RPC server verifies the password of the NMToken (originally generated by the RM), so we know the content of the NMTokenIdentifier is genuine.
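
As a rough illustration of why a verified password implies a genuine identifier, here is a minimal sketch that assumes the password is an HMAC of the identifier bytes under a secret known only to the RM and NM. The class name, the `HmacSHA256` choice, and the string encoding of the identifier are assumptions made for this example, not the actual Hadoop implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical model of token password checking; not Hadoop code. */
final class TokenPasswordSketch {
  // Assumed algorithm for the sketch only.
  private static final String ALGORITHM = "HmacSHA256";

  /** Computes the password as an HMAC over the identifier bytes. */
  static byte[] createPassword(byte[] identifier, byte[] masterKey) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(new SecretKeySpec(masterKey, ALGORITHM));
      return mac.doFinal(identifier);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HMAC unavailable", e);
    }
  }

  /** Recomputes the password and compares it in constant time. */
  static boolean verify(byte[] identifier, byte[] password, byte[] masterKey) {
    return MessageDigest.isEqual(createPassword(identifier, masterKey), password);
  }

  public static void main(String[] args) {
    byte[] key = "shared-master-key".getBytes(StandardCharsets.UTF_8);
    byte[] id = "appId=application_1_0001".getBytes(StandardCharsets.UTF_8);
    byte[] password = createPassword(id, key);
    // An AM that edits the identifier cannot produce a matching password.
    byte[] forged = "appId=application_1_0002".getBytes(StandardCharsets.UTF_8);
    System.out.println(verify(id, password, key));
    System.out.println(verify(forged, password, key));
  }
}
```

Under these assumptions, an AM cannot change the appId inside its token without invalidating the password, so the appId in a verified identifier can be trusted by later authorization checks.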

      Next, for stopContainers() and getContainerStatus(), the method authorizeGetAndStopContainerRequest() is used to verify that the requested containers belong to the AM by comparing them against the AppId in the NMTokenIdentifier. However, when the appId doesn't match, authorizeGetAndStopContainerRequest() currently only logs a warning and continues to kill the container. As a result, a malicious AM can kill containers of other apps running on any node where its own containers are running.
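
The intended behaviour can be sketched as follows. This is a simplified, hypothetical model: the class, the exception type, and the string application ids are stand-ins invented for the example, and the real ContainerManagerImpl code differs. The point is only that a mismatch must abort the request rather than log and continue.

```java
/** Hypothetical, simplified model of the check; not the actual NM code. */
final class StopContainerAuthSketch {

  /** Stand-in for the error the NM would return to the caller. */
  static final class UnauthorizedRequestException extends RuntimeException {
    UnauthorizedRequestException(String message) {
      super(message);
    }
  }

  /**
   * Rejects the request when the appId from the (already verified) token
   * does not match the appId that owns the target container.
   */
  static void authorize(String tokenAppId, String containerAppId,
      String containerId) {
    if (tokenAppId == null || !tokenAppId.equals(containerAppId)) {
      // The buggy behaviour was equivalent to logging here and returning.
      throw new UnauthorizedRequestException("Container " + containerId
          + " belongs to " + containerAppId + ", not to " + tokenAppId);
    }
  }

  public static void main(String[] args) {
    // Same application: the request is allowed to proceed.
    authorize("application_1_0001", "application_1_0001", "container_A");
    try {
      // Different application: the request must be rejected.
      authorize("application_1_0001", "application_1_0002", "container_B");
    } catch (UnauthorizedRequestException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Throwing keeps the caller from ever reaching the kill path, which is the property the warning-only version lacked.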

      1. YARN-5836.v1.patch
        10 kB
        Botong Huang
      2. YARN-5836.v2.patch
        11 kB
        Botong Huang

        Activity

        asuresh Arun Suresh added a comment -

        Thanks for raising this Botong Huang.

        Makes sense. I guess the stopContainer call could send a new ApplicationEvent.KILL_CONTAINER event that is routed through the application, to ensure the container in question actually belongs to the application before forwarding KILL_CONTAINER to the container.

        Jian He, Varun Vasudev, Karthik Kambatla: thoughts?

        jlowe Jason Lowe added a comment -

        As I understand it, the NM token should be getting verified by the SASL server as part of the RPC connection, since ContainerManagerImpl sets up the RPC server with the NM token secret manager. That's why we wouldn't see any explicit checking of the NM token, as it should be implicitly done as part of connecting to the NM. The container token needs to be verified separately since that's not associated directly with an RPC server like the NM token.

        "even for plain text checking, when the appId doesn't match, all it does is log it as a warning and continues to kill the container"

        That sounds like a bug to me, authorizeGetAndStopContainerRequest isn't throwing an exception like it should.

        botong Botong Huang added a comment -

        Good point, thanks Jason for the info.

        jlowe Jason Lowe added a comment -

        So have you verified that a faked NM token "works" or was this theoretical? If there's a case on a secure cluster where a faked NM token allowed an application master (or other agent) to connect to the NM then that's serious and needs to be fixed. Otherwise we should update the JIRA headline to reflect this is tracking the missing exception for the invalid stop container request.

        botong Botong Huang added a comment -

        It is theoretical so far. I am in the process of verifying it. I will update here when I get some results, thanks!

        jlowe Jason Lowe added a comment -

        Simplifying the summary to describe the symptom rather than detail the fix.

        Thanks for the patch! Looks good to me pending a Jenkins result.

        botong Botong Huang added a comment -

        Jason Lowe, you are right: it turns out that the RPC server is indeed verifying the token password when token authentication is used. I updated the description and title, and uploaded the v1 patch. Thanks!

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 8m 48s trunk passed
        +1 compile 0m 32s trunk passed
        +1 checkstyle 0m 21s trunk passed
        +1 mvnsite 0m 32s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 0m 52s trunk passed
        +1 javadoc 0m 19s trunk passed
        +1 mvninstall 0m 23s the patch passed
        +1 compile 0m 24s the patch passed
        +1 javac 0m 24s the patch passed
        +1 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 191 unchanged - 6 fixed = 191 total (was 197)
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 46s the patch passed
        +1 javadoc 0m 13s the patch passed
        -1 unit 12m 32s hadoop-yarn-server-nodemanager in the patch failed.
        +1 asflicense 0m 18s The patch does not generate ASF License warnings.
        28m 43s



        Reason Tests
        Failed junit tests hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-5836
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839034/YARN-5836.v1.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 5eb7488cebe7 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 5af572b
        Default Java 1.8.0_101
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-YARN-Build/13928/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13928/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13928/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 6s YARN-5836 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Issue YARN-5836
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839074/YARN-5836.v2.patch
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13930/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 26s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 8m 38s trunk passed
        +1 compile 0m 28s trunk passed
        +1 checkstyle 0m 18s trunk passed
        +1 mvnsite 0m 28s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 0m 42s trunk passed
        +1 javadoc 0m 16s trunk passed
        +1 mvninstall 0m 22s the patch passed
        +1 compile 0m 24s the patch passed
        +1 javac 0m 24s the patch passed
        +1 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 196 unchanged - 6 fixed = 196 total (was 202)
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 47s the patch passed
        +1 javadoc 0m 15s the patch passed
        +1 unit 13m 40s hadoop-yarn-server-nodemanager in the patch passed.
        +1 asflicense 0m 17s The patch does not generate ASF License warnings.
        29m 22s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-5836
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839079/YARN-5836.v2.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 9c57046ec320 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / f121d0b
        Default Java 1.8.0_101
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13932/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13932/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        +1 lgtm. Committing this.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10849 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10849/)
        YARN-5836. Malicious AM can kill containers of other apps running in any (jlowe: rev 59bfcbf3579e45ddf96db3aafccf669c8e03648f)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java
        jlowe Jason Lowe added a comment -

        Thanks to Botong Huang for the contribution and to Arun Suresh for additional review! I committed this to trunk, branch-2, and branch-2.8.

        botong Botong Huang added a comment -

        Great, thanks Jason Lowe, Arun Suresh and Subru Krishnan!

        subru Subru Krishnan added a comment - - edited

        Congrats Botong Huang on your first patch and thanks Jason Lowe for the review/guidance.


          People

          • Assignee:
            botong Botong Huang
            Reporter:
            botong Botong Huang
          • Votes:
            0
            Watchers:
            10

              Time Tracking

              Estimated:
              5h
              Remaining:
              5h
              Logged:
              Not Specified