FLINK-9082

Submission with higher parallelism than task slots fails with TimeoutException


Details

    Description

      Submitting a job (for example, the one from FLINK-8972) with a higher parallelism than the number of available task slots in standalone FLIP-6 mode fails after 5 minutes with:

      org.apache.flink.client.program.ProgramInvocationException: java.util.concurrent.TimeoutException
      	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:452)
      	at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
      	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:815)
      	at org.apache.flink.batch.tests.DataSetAllroundTestProgram.main(DataSetAllroundTestProgram.java:179)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
      	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
      	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
      	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:781)
      	at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:275)
      	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
      	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1020)
      	at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1096)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
      	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
      	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1096)
      Caused by: java.util.concurrent.TimeoutException
      	at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:812)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      Instead, the submission should fail with a NotEnoughResourceAvailableException and, if possible, a more meaningful exception message.
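
      To illustrate the kind of error being asked for, here is a minimal sketch of a fail-fast check. It is not taken from the Flink code base: the class SlotCheckSketch, the method checkSlots, and the message wording are all hypothetical, and the exception is a stand-in for the type named above. It also assumes the job needs exactly as many slots as its parallelism, which may not hold once slot sharing is involved.

```java
// Hypothetical sketch; none of these names are taken from the Flink code base.
final class SlotCheckSketch {

    /** Stand-in for the exception type requested in this issue. */
    static final class NotEnoughResourceAvailableException extends Exception {
        NotEnoughResourceAvailableException(String message) {
            super(message);
        }
    }

    /**
     * Fails immediately when the requested parallelism exceeds the available slots.
     * Assumption: one slot per parallel instance (slot sharing is ignored).
     */
    static void checkSlots(int requestedParallelism, int availableSlots)
            throws NotEnoughResourceAvailableException {
        if (requestedParallelism > availableSlots) {
            throw new NotEnoughResourceAvailableException(
                "Job requires " + requestedParallelism + " task slots, but only "
                    + availableSlots + " are available.");
        }
    }

    public static void main(String[] args) {
        try {
            // Example values only: parallelism 8 on a cluster with 4 slots.
            checkSlots(8, 4);
        } catch (NotEnoughResourceAvailableException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      With a message of this kind, the user would see the requested and available slot counts right away instead of a bare TimeoutException after 5 minutes.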

            People

              Assignee: Unassigned
              Reporter: Timo Walther (twalthr)