  1. Hadoop YARN
  2. YARN-1489 [Umbrella] Work-preserving ApplicationMaster restart
  3. YARN-2433

Stale token used by restarted AM (with previous containers retained) to request new container

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0, 2.4.1
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels:
      None

      Description

      With Hadoop 2.4, container retention is supported across AM crash-and-restart. However, after an AM is restarted with containers retained, it appears to use a stale token when starting a new container, which leads to the error below. To truly support container retention, the AM should be able to communicate with the previous container(s) using the old token and request new containers using the new token.

      This could be similar to YARN-1321 which was reported and fixed earlier.

      ERROR:
      Unauthorized request to start container. \nNMToken for application attempt : appattempt_1408130608672_0065_000001 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_000002
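
      The mismatch reported in this error can be illustrated with a small toy model. This is a sketch under stated assumptions, not Hadoop code: the class NmTokenSketch, its NmToken type, and the authorizeStart method are hypothetical names, and the equality check is only inferred from the wording of the error message above. The attempt IDs are the ones from this report.

```java
public class NmTokenSketch {

    /** Hypothetical stand-in for an NMToken: it only records the attempt it was issued to. */
    static final class NmToken {
        final String attemptId;

        NmToken(String attemptId) {
            this.attemptId = attemptId;
        }
    }

    /**
     * Assumed NodeManager-side check, inferred from the error text: the attempt
     * recorded in the NMToken must equal the attempt recorded in the container token.
     * Returns null when the request is authorized, otherwise the rejection message.
     */
    static String authorizeStart(NmToken nmToken, String containerTokenAttemptId) {
        if (nmToken.attemptId.equals(containerTokenAttemptId)) {
            return null;
        }
        return "Unauthorized request to start container. NMToken for application attempt : "
                + nmToken.attemptId
                + " was used for starting container with container token issued for"
                + " application attempt : " + containerTokenAttemptId;
    }

    public static void main(String[] args) {
        String attempt1 = "appattempt_1408130608672_0065_000001";
        String attempt2 = "appattempt_1408130608672_0065_000002";

        // Restarted AM (attempt 2) presents a token that still belongs to attempt 1.
        System.out.println(authorizeStart(new NmToken(attempt1), attempt2));

        // The same request with a token issued to attempt 2 passes the check.
        String result = authorizeStart(new NmToken(attempt2), attempt2);
        System.out.println(result == null ? "authorized" : result);
    }
}
```

      In this model the request succeeds only when the token presented to the NodeManager was issued to the same attempt as the container token, which is why a restarted AM needs a fresh token for new containers.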

      STACK trace:

      hadoop.ipc.ProtobufRpcEngine$Invoker.invoke org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: Response <- YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: startContainers {services_meta_data { key: "mapreduce_shuffle" value: "\000\0004\372" } failed_requests { container_id { app_attempt_id { application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 2 } exception { message: "Unauthorized request to start container. \nNMToken for application attempt : appattempt_1408130608672_0065_000001 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_000002" trace: "org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. \nNMToken for application attempt : appattempt_1408130608672_0065_000001 was used for starting container with container token issued for application attempt : appattempt_1408130608672_0065_000002\r\n\tat org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat java.security.AccessController.doPrivileged(Native Method)\r\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n" class_name: "org.apache.hadoop.yarn.exceptions.YarnException" } }}
      

              People

              • Assignee:
                Jian He (jianhe)
              • Reporter:
                Yingda Chen (yingdachen)
