Hadoop YARN / YARN-5918

Handle Opportunistic scheduling allocate request failure when NM is lost

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: None
    • Labels: None

      Description

      The allocate request fails with a NullPointerException during Opportunistic container allocation when a NodeManager is lost. ResourceManager log:

      2016-11-20 10:38:49,011 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root     OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS  APPID=application_1479637990302_0002    CONTAINERID=container_e12_1479637990302_0002_01_000006  RESOURCE=<memory:1024, vCores:1>
      2016-11-20 10:38:49,011 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Removed node docker2:38297 clusterResource: <memory:4096, vCores:8>
      2016-11-20 10:38:49,434 WARN org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8030, call Call#35 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.17.0.2:51584
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.convertToRemoteNode(OpportunisticContainerAllocatorAMService.java:420)
              at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.convertToRemoteNodes(OpportunisticContainerAllocatorAMService.java:412)
              at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.getLeastLoadedNodes(OpportunisticContainerAllocatorAMService.java:402)
              at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.allocate(OpportunisticContainerAllocatorAMService.java:236)
              at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
              at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:467)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:990)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
              at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2539)
      2016-11-20 10:38:50,824 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e12_1479637990302_0002_01_000002 Container Transitioned from RUNNING to COMPLETED
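
The trace above shows the NullPointerException in convertToRemoteNode immediately after the scheduler removed node docker2:38297. One plausible reading is that a node ID was still in the least-loaded list after the node had been dropped, so a lookup returned null. The following is a minimal sketch of a defensive conversion under that assumption. It is not the code from the attached patches: the class RemoteNodeSketch, the NodeInfo type, the liveNodes map and both method signatures are hypothetical stand-ins rather than Hadoop APIs, and only the method names mirror the trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are real Hadoop classes.
class RemoteNodeSketch {

    // Stand-in for the scheduler's record of a node that is still registered.
    static final class NodeInfo {
        final String host;
        final int port;

        NodeInfo(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Returns "host:port" for a live node, or null if the node has been removed
    // between the time its ID was collected and the time it is converted.
    static String convertToRemoteNode(Map<String, NodeInfo> liveNodes, String nodeId) {
        NodeInfo node = liveNodes.get(nodeId);
        if (node == null) {
            return null; // node was lost; the caller is expected to skip it
        }
        return node.host + ":" + node.port;
    }

    // Converts a list of node IDs, silently dropping IDs of nodes that are gone
    // instead of dereferencing a null lookup result.
    static List<String> convertToRemoteNodes(Map<String, NodeInfo> liveNodes,
                                             List<String> nodeIds) {
        List<String> remoteNodes = new ArrayList<>();
        for (String nodeId : nodeIds) {
            String remote = convertToRemoteNode(liveNodes, nodeId);
            if (remote != null) {
                remoteNodes.add(remote);
            }
        }
        return remoteNodes;
    }

    public static void main(String[] args) {
        Map<String, NodeInfo> liveNodes = new HashMap<>();
        liveNodes.put("docker1:38297", new NodeInfo("docker1", 38297));
        // docker2:38297 is deliberately absent, as if the scheduler had removed it.
        List<String> candidates = Arrays.asList("docker1:38297", "docker2:38297");
        System.out.println(convertToRemoteNodes(liveNodes, candidates)); // prints [docker1:38297]
    }
}
```

Skipping the missing node keeps the allocate call alive and lets the remaining candidates be offered; whether the real fix skips, refreshes the node list, or handles the race elsewhere is not stated on this page.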
      
      

        Attachments

        1. YARN-5918.0001.patch
          2 kB
          Bibin A Chundatt
        2. YARN-5918.0002.patch
          9 kB
          Bibin A Chundatt
        3. YARN-5918.0003.patch
          11 kB
          Bibin A Chundatt
        4. YARN-5918.0004.patch
          10 kB
          Bibin A Chundatt

            People

            • Assignee: Bibin A Chundatt (bibinchundatt)
            • Reporter: Bibin A Chundatt (bibinchundatt)
            • Votes: 0
            • Watchers: 4