Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6230

MR AM does not survive RM restart if RM activated a new AMRM secret key

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      A MapReduce AM will fail to reconnect to an RM that performed restart in the following scenario:

      1. MapReduce job launched with AMRM token generated from AMRM secret X
      2. RM rolls new AMRM secret Y and activates the new key
      3. RM performs a work-preserving restart
      4. MapReduce job AM now unable to connect to RM with "Invalid AMRMToken" exception

        Issue Links

          Activity

          Hide
          jlowe Jason Lowe added a comment -

          There are a couple of reasons this doesn't work:

          1) The token handed to the AM from the RM does not have a service set, since the RM does not know the service name being used by the client to connect to the RM. Unfortunately the RMContainerAllocator never updates the service name of the token, so it ends up not being selected by the RPC layer when reconnecting to the RM.

          2) The RMContainerAllocator for some reason tries to update the login user rather than the current user when security is enabled. This ends up updating a token in the UGI that is not being used by the IPC layer when reconnecting to the RM on a secure cluster.

          Also note that the AMRM token is mapped with an empty service name in the credentials, so if the token service is updated before adding it to the credentials we can end up with two AMRM tokens (one with an empty service key and one with a non-empty service key). So we need to be careful when adding the new token to the credentials that we will indeed be clobbering the old token, as we need to use a consistent service/alias that we cannot obtain directly from the credentials.

          Show
          jlowe Jason Lowe added a comment - There are a couple of reasons this doesn't work: 1) The token handed to the AM from the RM does not have a service set, since the RM does not know the service name being used by the client to connect to the RM. Unfortunately the RMContainerAllocator never updates the service name of the token, so it ends up not being selected by the RPC layer when reconnecting to the RM. 2) The RMContainerAllocator for some reason tries to update the login user rather than the current user when security is enabled. This ends up updating a token in the UGI that is not being used by the IPC layer when reconnecting to the RM on a secure cluster. Also note that the AMRM token is mapped with an empty service name in the credentials, so if the token service is updated before adding it to the credentials we can end up with two AMRM tokens (one with an empty service key and one with a non-empty service key). So we need to be careful when adding the new token to the credentials that we will indeed be clobbering the old token, as we need to use a consistent service/alias that we cannot obtain directly from the credentials.
          Hide
          jlowe Jason Lowe added a comment -

          Based on my comment from YARN-3103 here's a patch to store the token into the current UGI using whatever service name the RM set as the key/alias to clobber the existing token and then updates the service name after the token has been stored in the credentials.

          Added a unit test and also manually tested the patch on a secure cluster where the RM rolled the AMRM token master key and then restarted the RM after it activated while the app was still running. Verified that before this fix the AM failed to connect to the RM in that scenario but was able to succeed with this patch.

          Show
          jlowe Jason Lowe added a comment - Based on my comment from YARN-3103 here's a patch to store the token into the current UGI using whatever service name the RM set as the key/alias to clobber the existing token and then updates the service name after the token has been stored in the credentials. Added a unit test and also manually tested the patch on a secure cluster where the RM rolled the AMRM token master key and then restarted the RM after it activated while the app was still running. Verified that before this fix the AM failed to connect to the RM in that scenario but was able to succeed with this patch.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12695013/MAPREDUCE-6230.001.patch
          against trunk revision 9850e15.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5126//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5126//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695013/MAPREDUCE-6230.001.patch against trunk revision 9850e15. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5126//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5126//console This message is automatically generated.
          Hide
          jianhe Jian He added a comment -

          +1

          Show
          jianhe Jian He added a comment - +1
          Hide
          jianhe Jian He added a comment -

          Committed to trunk and branch-2, thanks Jason !

          Show
          jianhe Jian He added a comment - Committed to trunk and branch-2, thanks Jason !
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6954 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6954/)
          MAPREDUCE-6230. Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #6954 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6954/ ) MAPREDUCE-6230 . Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #88 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/88/)
          MAPREDUCE-6230. Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #88 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/88/ ) MAPREDUCE-6230 . Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #822 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/822/)
          MAPREDUCE-6230. Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #822 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/822/ ) MAPREDUCE-6230 . Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #89 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/89/)
          MAPREDUCE-6230. Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #89 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/89/ ) MAPREDUCE-6230 . Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf) hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #85 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/85/)
          MAPREDUCE-6230. Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #85 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/85/ ) MAPREDUCE-6230 . Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf) hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2020 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2020/)
          MAPREDUCE-6230. Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #2020 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2020/ ) MAPREDUCE-6230 . Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf) hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2039 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2039/)
          MAPREDUCE-6230. Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2039 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2039/ ) MAPREDUCE-6230 . Fixed RMContainerAllocator to update the new AMRMToken service name properly. Contributed by Jason Lowe (jianhe: rev cff05bff1fe24628677d41a0d537f2c383b44faf) hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Pulled this into 2.6.1. Ran compilation and TestRMContainerAllocator before the push. Patch applied cleanly.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Pulled this into 2.6.1. Ran compilation and TestRMContainerAllocator before the push. Patch applied cleanly.

            People

            • Assignee:
              jlowe Jason Lowe
              Reporter:
              jlowe Jason Lowe
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development