  1. Flink
  2. FLINK-4540

Detached job execution may prevent cluster shutdown

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.2.0, 1.1.2
    • Fix Version/s: None
    • Component/s: YARN
    • Labels:
      None

      Description

      There is a problem with the detached execution of jobs that can prevent cluster shutdown in two cases: 1) when an eager job is executed, i.e. the job calls `collect()` or `count()`, and 2) when the user jar doesn't contain a job.

      1) For example, `./flink run -d -m yarn-cluster -yn 1 ../examples/batch/WordCount.jar` throws an exception and only afterwards disconnects the YarnClusterClient. In detached mode, the code assumes the cluster is shut down through the `shutdownAfterJob` method, which ensures that the YarnJobManager shuts down after the job completes. Because executing an eager job throws the exception on the client, the JobManager never receives a job and therefore never shuts down the cluster.

      2) The same problem occurs in detached execution when the user jar doesn't contain a job: no job is ever submitted, so the cluster is never shut down.

      A good solution would be to defer cluster startup until the job has been fully assembled.
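
      To make the ordering problem concrete, here is a minimal, self-contained sketch in plain Java. It does not use any Flink classes: `ToyCluster`, `startClusterFirst`, `assembleFirst` and the three sample programs are hypothetical names invented for illustration, and `runDetachedJob` only loosely models what `shutdownAfterJob` is described as doing above. It contrasts the current order (start the cluster, then run the user program) with the proposed one (assemble the job first, start the cluster only if a job exists).

```java
import java.util.Optional;
import java.util.concurrent.Callable;

public class DetachedStartupSketch {

    /** Toy stand-in for a YARN session cluster (hypothetical, not a Flink class). */
    public static final class ToyCluster {
        private boolean running = true;

        /** Loosely models shutdownAfterJob: the cluster stops once its job has finished. */
        public void runDetachedJob(String plan) {
            running = false;
        }

        public boolean isRunning() {
            return running;
        }
    }

    // Toy user programs: each either yields an assembled plan, yields nothing, or fails.
    public static final Callable<Optional<String>> LAZY_JOB =
            () -> Optional.of("WordCount plan");
    public static final Callable<Optional<String>> NO_JOB =
            () -> Optional.empty();
    public static final Callable<Optional<String>> EAGER_JOB = () -> {
        // Stands in for an eager collect()/count() call failing in detached mode.
        throw new UnsupportedOperationException("eager execution in detached mode");
    };

    /** Current order: the cluster is already up before the user program runs. */
    public static ToyCluster startClusterFirst(Callable<Optional<String>> program) {
        ToyCluster cluster = new ToyCluster();
        try {
            program.call().ifPresent(cluster::runDetachedJob);
        } catch (Exception e) {
            // The client only disconnects here; nothing tells the cluster to stop.
        }
        return cluster;
    }

    /** Proposed order: assemble the job first, start a cluster only if there is one. */
    public static Optional<ToyCluster> assembleFirst(Callable<Optional<String>> program) {
        Optional<String> plan;
        try {
            plan = program.call();
        } catch (Exception e) {
            return Optional.empty(); // no cluster was started, so none can be left behind
        }
        return plan.map(p -> {
            ToyCluster cluster = new ToyCluster();
            cluster.runDetachedJob(p);
            return cluster;
        });
    }

    public static void main(String[] args) {
        System.out.println("start-first, eager job, still running: "
                + startClusterFirst(EAGER_JOB).isRunning());
        System.out.println("start-first, no job, still running:    "
                + startClusterFirst(NO_JOB).isRunning());
        System.out.println("assemble-first, eager job, cluster started: "
                + assembleFirst(EAGER_JOB).isPresent());
        System.out.println("assemble-first, no job, cluster started:    "
                + assembleFirst(NO_JOB).isPresent());
    }
}
```

      In this toy model, the start-first order leaves a running cluster behind for both the eager job and the empty jar, while the assemble-first order never starts a cluster in those two cases. How job assembly would actually be separated from cluster startup in the YARN client is not covered by this sketch.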

              People

              • Assignee: Roman Maier (roman_maier)
              • Reporter: Maximilian Michels (mxm)
              • Votes: 0
              • Watchers: 5
