Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-1689

RMAppAttempt is not killed when RMApp is at ACCEPTED

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: resourcemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When running some Hive on Tez jobs, the RM after a while gets into an unusable state where no jobs run. In the RM log I see the following exception:

      2014-02-04 20:28:08,553 WARN  ipc.Server (Server.java:run(1978)) - IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster from 172.18.145.156:40474 Call#0 Retry#0: error: java.lang.NullPointerException
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:48)
              at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:278)
              at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
              at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:396)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
      ......
      2014-02-04 20:28:08,544 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(626)) - Can't handle this event at current state
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_REGISTERED at KILLED
              at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
              at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
              at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
              at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:624)
              at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:81)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:656)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:640)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
              at java.lang.Thread.run(Thread.java:662)
      2014-02-04 20:28:08,549 INFO  resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(140)) - USER=hrt_qa  IP=172.18.145.156       OPERATION=Kill Application Request      TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1391543307203_0001
      2014-02-04 20:28:08,553 WARN  ipc.Server (Server.java:run(1978)) - IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster from 172.18.145.156:40474 Call#0 Retry#0: error: java.lang.NullPointerException
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:48)
              at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:278)
              at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
              at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:396)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
      

        Attachments

        1. YARN-1689-20140205.txt
          15 kB
          Vinod Kumar Vavilapalli
        2. RM_UI.png
          319 kB
          Deepesh Khandelwal

          Issue Links

            Activity

              People

              • Assignee:
                vinodkv Vinod Kumar Vavilapalli
                Reporter:
                deepesh Deepesh Khandelwal
              • Votes:
                0 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: