  1. Flink
  2. FLINK-19022

AkkaRpcActor failed to start but no exception information


    Details

      Description

      In my job, the JobManager (JM) could not start normally, and the JM container was eventually killed by the RM.

      After debugging, I found that AkkaRpcActor failed to start because the YARN version bundled with my job was incompatible with the version on the cluster.
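
      One way to confirm this kind of version conflict is to check which jar a class was actually loaded from. The following is only a minimal sketch using standard JDK reflection; the ClassOrigin class and its locate method are hypothetical names introduced here for illustration, not part of Flink or Hadoop.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Returns the location (jar or directory) a class was loaded from,
     * or "<bootstrap>" when the JVM reports no code source for it.
     */
    public static String locate(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument, for example
        // org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB when Hadoop is on the classpath.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> clazz = Class.forName(className);
        System.out.println(className + " loaded from " + locate(clazz));
    }
}
```

      Running this inside the JM's classpath with the Hadoop class named in the stack trace should show whether it comes from the job's jar or from the cluster's Hadoop installation.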

      AkkaRpcActor exception handling

      I added logging here, and then found the specific problem:

      2020-08-21 21:31:16,985 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState [flink-akka.actor.default-dispatcher-4]  - Could not start RpcEndpoint resourcemanager.
      java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto;
      	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      	at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source)
      	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222)
      	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
      	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
      	at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229)
      	at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262)
      	at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204)
      	at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192)
      	at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185)
      	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544)
      	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169)
      	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
      	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
      	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
      	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
      	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
      	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
      	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
      	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
      	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
      	at akka.actor.ActorCell.invoke(ActorCell.scala:561)
      	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
      	at akka.dispatch.Mailbox.run(Mailbox.scala:225)
      	at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
      	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

      Should we add logging here to make such problems easier to diagnose?
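
      As a rough illustration of the proposal, the sketch below catches any Throwable raised by a start hook (including an Error such as NoSuchMethodError) and logs it before reporting the failure. It is not the actual AkkaRpcActor code: StartFailureLogging, Endpoint and startAndLog are hypothetical stand-ins, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class StartFailureLogging {

    private static final Logger LOG = Logger.getLogger(StartFailureLogging.class.getName());

    /** Hypothetical stand-in for an RPC endpoint with a start hook. */
    public interface Endpoint {
        String name();

        void onStart() throws Exception;
    }

    /**
     * Runs the start hook. Any Throwable is logged with its stack trace
     * and returned to the caller; null means the endpoint started cleanly.
     */
    public static Throwable startAndLog(Endpoint endpoint) {
        try {
            endpoint.onStart();
            return null;
        } catch (Throwable t) {
            // Catch Throwable rather than Exception so that linkage errors are logged too.
            LOG.log(Level.SEVERE, "Could not start RpcEndpoint " + endpoint.name() + ".", t);
            return t;
        }
    }

    public static void main(String[] args) {
        Endpoint broken = new Endpoint() {
            @Override
            public String name() {
                return "resourcemanager";
            }

            @Override
            public void onStart() {
                // Simulates the binary incompatibility from the report.
                throw new NoSuchMethodError("registerApplicationMaster");
            }
        };
        Throwable failure = startAndLog(broken);
        System.out.println("start failed: " + (failure != null));
    }
}
```

      The key design point is catching Throwable: a NoSuchMethodError is an Error, so a handler that only catches Exception would never log it.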

       
       

            People

            • Assignee:
              tartarus tartarus
              Reporter:
              tartarus tartarus
