Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-6187

Cancel job failed with option -m on yarn session

    XMLWordPrintableJSON

Details

    Description

      1. start yarn session: ./bin/yarn-session.sh -n 3 -jm 2048 -tm 3096
      2. submit a job: ./bin/flink run ...
      3. cancel the job with option -m:
      ./bin/flink cancel -m ip:port jobid

      org.apache.flink.configuration.IllegalConfigurationException: Couldn't retrieve client for cluster
              at org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:912)
              at org.apache.flink.client.CliFrontend.getJobManagerGateway(CliFrontend.java:926)
              at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:602)
              at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1079)
              at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
              at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
              at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
              at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
              at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
      Caused by: java.lang.RuntimeException: Failed to retrieve JobManager address
              at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248)
              at org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:908)
              ... 11 more
      Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
              at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175)
              at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242)
              ... 12 more
      Caused by: java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
              at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
              at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
              at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
              at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
              at scala.concurrent.Await$.result(package.scala:190)
              at scala.concurrent.Await.result(package.scala)
              at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173)
              ... 13 more
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            Yuhong_kyo hongyuhong
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: