FLINK-4540: Detached job execution may prevent cluster shutdown

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • 1.1.2, 1.2.0
    • None
    • Deployment / YARN
    • None

    Description

      Detached job execution can prevent the cluster from shutting down in two cases: 1) when the job is eager, i.e. it calls `collect()` or `count()`, and 2) when the user jar doesn't contain a job.

      1) For example, `./flink -d -m yarn-cluster -yn 1 ../examples/batch/WordCount.jar` throws an exception and only afterwards disconnects the `YarnClusterClient`. In detached mode, the code assumes that the cluster is shut down through the `shutdownAfterJob` method, which ensures that the `YarnJobManager` shuts down after the job completes. Because of the exception thrown when executing an eager job, the JobManager never receives a job and thus never shuts down the cluster.

      2) The same problem occurs in detached execution when the user jar doesn't contain a job: no job is ever submitted, so the shutdown after job completion is never triggered.
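
      The ordering problem in both cases can be illustrated with a toy model. This is only a sketch, not Flink code: `ToyCluster`, `EagerExecutionError`, `run_detached_cluster_first` and the three example programs are hypothetical names, and the `shutdown_after_job` flag merely imitates the role the description gives to `shutdownAfterJob`.

```python
class ToyCluster:
    """Hypothetical stand-in for a YARN session; not a Flink class."""

    def __init__(self):
        self.running = True
        self.shutdown_after_job = False

    def submit(self, job):
        # Imitates shutdownAfterJob: the cluster stops once a job has run.
        if self.shutdown_after_job:
            self.running = False


class EagerExecutionError(Exception):
    """Hypothetical error for collect()/count() in detached mode."""


def eager_program(submit):
    # An eager job needs its result on the client, so it fails before submitting.
    raise EagerExecutionError("eager execution in detached mode")


def empty_program(submit):
    # A user jar that never builds or submits a job.
    pass


def lazy_program(submit):
    # A regular job that is actually handed to the cluster.
    submit("wordcount")


def run_detached_cluster_first(program):
    cluster = ToyCluster()             # the cluster is started first
    cluster.shutdown_after_job = True  # shutdown depends on a job arriving
    try:
        program(cluster.submit)
    except EagerExecutionError:
        pass                           # the client only disconnects
    return cluster
```

      In this model, only `lazy_program` leaves the cluster stopped. After `eager_program` or `empty_program`, no job ever reaches `submit`, so `running` stays `True`.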

      A good solution would be to defer cluster startup until the job has been fully assembled.
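
      A minimal sketch of the deferred startup idea, again as a toy model rather than the actual fix: the user program first runs against a recorder, and a cluster is started only if at least one job was assembled. All names here (`DetachedModeError`, `assemble`, `run_detached`, `start_cluster`) are hypothetical, and the "cluster" is just a list of received jobs.

```python
class DetachedModeError(Exception):
    """Hypothetical error for programs that need results on the client."""


def assemble(program):
    """Run the user program against a recorder instead of a live cluster."""
    jobs = []
    program(jobs.append)  # the program "submits" by appending to the list
    return jobs


def run_detached(program, start_cluster):
    """Start a cluster only after at least one job has been assembled."""
    try:
        jobs = assemble(program)
    except DetachedModeError:
        return None       # eager program: fails before any cluster exists
    if not jobs:
        return None       # jar without a job: nothing to start
    cluster = start_cluster()
    cluster.extend(jobs)  # hand the assembled jobs to the new cluster
    return cluster


def eager_program(submit):
    raise DetachedModeError("collect() needs the client to stay attached")


def empty_program(submit):
    pass


def lazy_program(submit):
    submit("wordcount")


started = []  # records every cluster start, for inspection


def start_cluster():
    started.append("cluster")
    return []
```

      With this ordering, the eager and empty programs never cause a cluster to start, so there is nothing left running that would need to be shut down.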

            People

              roman_maier Roman Maier
              mxm Maximilian Michels
              Votes: 0
              Watchers: 6