Hadoop YARN / YARN-5476

Not existed application reported as ACCEPTED state by YarnClientImpl

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: yarn
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Steps to reproduce:

      • Create a cluster with RM HA enabled.
      • Start a YARN application.
      • While the YARN application is in the NEW state, trigger an RM failover.

      In this case, the application gets an "ApplicationNotFound" exception from YARN,
      goes to the ACCEPTED state, and gets stuck there.

      At this point, running yarn application -status <appId> reports the application as being in the ACCEPTED state.
      This state is misleading, because the RM has no record of the application.

      hrt_qa@xxx:/root> yarn application -status application_1470379565464_0001
      16/08/05 17:24:29 INFO impl.TimelineClientImpl: Timeline service address: https://xxx:8190/ws/v1/timeline/
      16/08/05 17:24:30 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
      16/08/05 17:24:31 WARN retry.RetryInvocationHandler: Exception while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over rm1. Not retrying because try once and fail.
      org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1470379565464_0001' doesn't exist in RM.
      	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:331)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
      	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
      
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
      	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:194)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
      	at com.sun.proxy.$Proxy18.getApplicationReport(Unknown Source)
      	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:436)
      	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:481)
      	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:160)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
      	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:83)
      Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1470379565464_0001' doesn't exist in RM.
      	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:331)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
      	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
      
      	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1496)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
      	at com.sun.proxy.$Proxy17.getApplicationReport(Unknown Source)
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:191)
      	... 14 more
      Application Report : 
      	Application-Id : application_1470379565464_0001
      	Application-Name : null
      	Application-Type : null
      	User : null
      	Queue : null
      	Application Priority : null
      	Start-Time : 0
      	Finish-Time : 0
      	Progress : 0%
      	State : ACCEPTED
      	Final-State : UNDEFINED
      	Tracking-URL : N/A
      	RPC Port : -1
      	AM Host : N/A
      	Aggregate Resource Allocation : N/A
      	Log Aggregation Status : N/A
      	Diagnostics : 
      	Unmanaged Application : false
      	Application Node Label Expression : null
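
The report above pairs State : ACCEPTED with an unset user, queue, and start time. As a rough illustration only, a caller that scrapes this CLI output could flag such a report as a likely placeholder. The sketch below is not the fix attached to this issue and does not use any Hadoop API; the class name ReportSanityCheck, its methods, and the heuristic itself (ACCEPTED state, Start-Time of 0, User of null) are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: inspects the text printed by
// "yarn application -status <appId>" and flags reports that look empty.
final class ReportSanityCheck {

    // Parses "Key : value" lines into a map; lines without " :" are ignored.
    static Map<String, String> parse(String reportText) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : reportText.split("\\R")) {
            int sep = line.indexOf(" :");
            if (sep < 0) {
                continue; // not a report field (for example a stack trace line)
            }
            String key = line.substring(0, sep).trim();
            String value = line.substring(sep + 2).trim();
            fields.put(key, value);
        }
        return fields;
    }

    // Assumed heuristic: ACCEPTED with no start time and no user suggests
    // the report carries no real application data.
    static boolean looksLikePlaceholder(Map<String, String> fields) {
        return "ACCEPTED".equals(fields.get("State"))
                && "0".equals(fields.get("Start-Time"))
                && "null".equals(fields.get("User"));
    }

    public static void main(String[] args) {
        // A shortened copy of the report shown in this issue.
        String report = String.join("\n",
                "Application Report : ",
                "\tApplication-Id : application_1470379565464_0001",
                "\tUser : null",
                "\tStart-Time : 0",
                "\tState : ACCEPTED");
        System.out.println(looksLikePlaceholder(parse(report)));
    }
}
```

A check like this only works around the symptom on the caller's side; the misleading state itself still has to be corrected where the report is produced.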

        Attachments

        1. YARN-5476-branch-2.patch
          6 kB
          Junping Du
        2. YARN-5476.patch
          6 kB
          Junping Du

            People

            • Assignee: Junping Du (djp)
            • Reporter: Yesha Vora (yeshavora)