  1. Hadoop YARN
  2. YARN-9238

Avoid allocating opportunistic containers to previous/removed/non-existent application attempts


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      See org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate

           // Allocate OPPORTUNISTIC containers.
      171.  SchedulerApplicationAttempt appAttempt =
      172.    ((AbstractYarnScheduler)rmContext.getScheduler())
      173.      .getApplicationAttempt(appAttemptId);
      174.
      175.  OpportunisticContainerContext oppCtx =
      176.  appAttempt.getOpportunisticContainerContext();
      177.  oppCtx.updateNodeList(getLeastLoadedNodes());
      

      If the MRAppMaster crashes before allocate reaches line 171, the ResourceManager starts a new appAttempt and calls:

      org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt){
          this.currentAttempt = currentAttempt;
      }

      Hence line 171 of allocate returns the new appAttempt, whose OpportunisticContainerContext field has not been initialized yet.

      So oppCtx is null, and a NullPointerException is thrown at line 177:

      java.lang.NullPointerException
      at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
      at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
      at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
      at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830) 
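
      The race described above can be reproduced and guarded against in a small standalone model. The sketch below is only an illustration and is not necessarily what the attached patches do: the classes OppContext, Attempt and SchedulerApp, the integer attempt IDs, and the register() method are hypothetical stand-ins for the real YARN types, and the model assumes (as the description implies) that the scheduler lookup returns whatever attempt is currently set. The guard rejects the call when the returned attempt is missing, is not the caller's attempt, or has no context yet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AttemptGuardSketch {

  /** Hypothetical stand-in for OpportunisticContainerContext. */
  static class OppContext {
    final List<String> nodes = new ArrayList<>();

    void updateNodeList(List<String> leastLoaded) {
      nodes.clear();
      nodes.addAll(leastLoaded);
    }
  }

  /** Hypothetical stand-in for SchedulerApplicationAttempt. */
  static class Attempt {
    final int attemptId;
    private OppContext oppCtx; // stays null until register() is called

    Attempt(int attemptId) {
      this.attemptId = attemptId;
    }

    void register() {
      oppCtx = new OppContext();
    }

    OppContext getOpportunisticContainerContext() {
      return oppCtx;
    }
  }

  /** Hypothetical stand-in for SchedulerApplication: it only knows the current attempt. */
  static class SchedulerApp {
    private Attempt currentAttempt;

    void setCurrentAppAttempt(Attempt attempt) {
      this.currentAttempt = attempt;
    }

    // Assumption: the lookup ignores the requested ID and returns the current attempt.
    Attempt getApplicationAttempt(int requestedAttemptId) {
      return currentAttempt;
    }
  }

  /** Returns true if the node list was updated, false if the call was rejected. */
  static boolean allocate(SchedulerApp app, int callerAttemptId, List<String> leastLoaded) {
    Attempt attempt = app.getApplicationAttempt(callerAttemptId);
    // Reject calls from a previous, removed, or non-existent attempt.
    if (attempt == null || attempt.attemptId != callerAttemptId) {
      return false;
    }
    OppContext ctx = attempt.getOpportunisticContainerContext();
    // Reject calls that arrive before the context has been initialized.
    if (ctx == null) {
      return false;
    }
    ctx.updateNodeList(leastLoaded);
    return true;
  }

  public static void main(String[] args) {
    SchedulerApp app = new SchedulerApp();
    Attempt first = new Attempt(1);
    first.register();
    app.setCurrentAppAttempt(first);
    System.out.println(allocate(app, 1, Arrays.asList("node-a"))); // prints "true"

    // The first attempt crashes; a new, not yet initialized attempt becomes current.
    app.setCurrentAppAttempt(new Attempt(2));
    System.out.println(allocate(app, 1, Arrays.asList("node-a"))); // prints "false"
  }
}
```

      In this model a stale allocate call from attempt 1 is rejected instead of dereferencing the uninitialized context of attempt 2.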

        Attachments

        1. hadoop-test-resourcemanager-hadoop11.log
          94 kB
          lujie
        2. YARN-9238_1.patch
          5 kB
          lujie
        3. YARN-9238_2.patch
          5 kB
          lujie
        4. YARN-9238_3.patch
          5 kB
          lujie

              People

              • Assignee: xiaoheipangzi lujie
              • Reporter: xiaoheipangzi lujie
              • Votes: 0
              • Watchers: 8
