Hadoop YARN / YARN-9238

Avoid allocating opportunistic containers to previous/removed/non-exist application attempt

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      See org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate

           // Allocate OPPORTUNISTIC containers.
      171.  SchedulerApplicationAttempt appAttempt =
      172.    ((AbstractYarnScheduler)rmContext.getScheduler())
      173.      .getApplicationAttempt(appAttemptId);
      174.
      175.  OpportunisticContainerContext oppCtx =
      176.  appAttempt.getOpportunisticContainerContext();
      177.  oppCtx.updateNodeList(getLeastLoadedNodes());
      

      If the MRAppMaster crashes before allocate#171 runs, the ResourceManager starts a new appAttempt and calls:

      org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt){
          this.currentAttempt = currentAttempt;
      }

      Hence allocate#171 gets the new appAttempt, whose OpportunisticContainerContext field has not been initialized yet.

      So oppCtx == null, and the NullPointerException is thrown at line 177:

      java.lang.NullPointerException
      at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
      at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
      at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
      at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
      at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830) 
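
The race described above can be sketched with a few stand-in classes. This is a minimal illustration, not the committed patch: `OppContext`, `Attempt`, `Scheduler`, and the guarded `allocate` below are hypothetical simplifications, and the sketch assumes the scheduler lookup returns whichever attempt is currently set, as the description implies. The guard rejects a request whose attempt id no longer matches the current attempt, or whose opportunistic context has not been initialized, instead of dereferencing null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OppAllocateGuardSketch {

    /** Hypothetical stand-in for OpportunisticContainerContext; only tracks the node list. */
    static class OppContext {
        private final List<String> nodes = new ArrayList<>();

        void updateNodeList(List<String> leastLoaded) {
            nodes.clear();
            nodes.addAll(leastLoaded);
        }

        List<String> getNodes() {
            return nodes;
        }
    }

    /** Hypothetical stand-in for SchedulerApplicationAttempt. */
    static class Attempt {
        private final int attemptId;
        private OppContext oppCtx; // stays null until the attempt is initialized

        Attempt(int attemptId) {
            this.attemptId = attemptId;
        }

        int getAttemptId() {
            return attemptId;
        }

        OppContext getOpportunisticContainerContext() {
            return oppCtx;
        }

        void initOpportunisticContext() {
            oppCtx = new OppContext();
        }
    }

    /** Hypothetical scheduler: the lookup ignores the requested id and returns the current attempt. */
    static class Scheduler {
        private Attempt current;

        void setCurrentAppAttempt(Attempt attempt) {
            this.current = attempt;
        }

        Attempt getApplicationAttempt(int requestedAttemptId) {
            return current;
        }
    }

    /** Returns true if the node list was updated, false if the request was rejected by the guard. */
    static boolean allocate(Scheduler scheduler, int requestAttemptId, List<String> leastLoaded) {
        Attempt appAttempt = scheduler.getApplicationAttempt(requestAttemptId);
        // Guard 1: the attempt was removed, or a newer attempt replaced the requester's.
        if (appAttempt == null || appAttempt.getAttemptId() != requestAttemptId) {
            return false;
        }
        // Guard 2: the attempt exists but its opportunistic context is not set up yet.
        OppContext oppCtx = appAttempt.getOpportunisticContainerContext();
        if (oppCtx == null) {
            return false;
        }
        oppCtx.updateNodeList(leastLoaded);
        return true;
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        Attempt first = new Attempt(1);
        first.initOpportunisticContext();
        scheduler.setCurrentAppAttempt(first);
        System.out.println("attempt 1, initialized: " + allocate(scheduler, 1, Arrays.asList("n1")));

        // The AM of attempt 1 crashes; the RM installs attempt 2 before it is initialized.
        scheduler.setCurrentAppAttempt(new Attempt(2));
        System.out.println("stale request from attempt 1: " + allocate(scheduler, 1, Arrays.asList("n1")));
        System.out.println("attempt 2, not initialized: " + allocate(scheduler, 2, Arrays.asList("n1")));
    }
}
```

Without the two guards, the stale request from attempt 1 would reach `updateNodeList` on a null context, which is the failure shown in the stack trace. How a rejected request should be reported to the ApplicationMaster is left open here.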

      Attachments

        1. hadoop-test-resourcemanager-hadoop11.log
          94 kB
          lujie
        2. YARN-9238_1.patch
          5 kB
          lujie
        3. YARN-9238_2.patch
          5 kB
          lujie
        4. YARN-9238_3.patch
          5 kB
          lujie

            People

              Assignee: xiaoheipangzi lujie
              Reporter: xiaoheipangzi lujie
              Votes: 0
              Watchers: 8
