Flink / FLINK-20033

Job fails when stopping JobMaster


Details

    Description

      When a JobMaster is stopped, we first disconnect all TaskExecutors. This disconnection can cause running Executions to fail. That in turn can cause a restart of the job or, in the worst case, a transition into the FAILED state if the restart attempts are exhausted. A transition into FAILED can then trigger the cleanup of HA data.

      Instead of failing, the job should be suspended when the JobMaster is stopped, because the JobMaster gets stopped when the Dispatcher loses its leadership. The problem has been fixed unintentionally by FLINK-19237 in the master branch.
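
The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Flink's actual code: `JobMasterStopSketch`, `ToyJob`, `JobState`, and both `stopBy...` methods are hypothetical names invented for illustration, and the rule that each disconnection consumes one restart attempt is a simplification.

```java
public class JobMasterStopSketch {

    /** Simplified job states (hypothetical; loosely follows the states named in the description). */
    enum JobState {
        RUNNING(false), RESTARTING(false), SUSPENDED(false), FAILED(true);

        private final boolean globallyTerminal;

        JobState(boolean globallyTerminal) {
            this.globallyTerminal = globallyTerminal;
        }

        boolean isGloballyTerminal() {
            return globallyTerminal;
        }
    }

    /** Toy job that tracks its state, remaining restart attempts, and whether HA data was removed. */
    static final class ToyJob {
        private JobState state = JobState.RUNNING;
        private int remainingRestarts;
        private boolean haDataCleanedUp = false;

        ToyJob(int remainingRestarts) {
            this.remainingRestarts = remainingRestarts;
        }

        /** Assumption: a disconnect fails the running executions and costs one restart attempt. */
        void onTaskExecutorDisconnected() {
            if (state == JobState.SUSPENDED || state == JobState.FAILED) {
                return; // a suspended or already failed job does not react to task failures
            }
            if (remainingRestarts > 0) {
                remainingRestarts--;
                state = JobState.RESTARTING;
            } else {
                state = JobState.FAILED;  // globally terminal
                haDataCleanedUp = true;   // models the HA data cleanup on FAILED
            }
        }

        /** Suspending is only possible while the job is not in a globally terminal state. */
        void suspend() {
            if (!state.isGloballyTerminal()) {
                state = JobState.SUSPENDED;
            }
        }

        JobState state() {
            return state;
        }

        boolean haDataCleanedUp() {
            return haDataCleanedUp;
        }
    }

    /** Problematic order from the description: disconnect the TaskExecutors first. */
    static ToyJob stopByDisconnectingFirst(int restarts, int taskExecutors) {
        ToyJob job = new ToyJob(restarts);
        for (int i = 0; i < taskExecutors; i++) {
            job.onTaskExecutorDisconnected();
        }
        job.suspend(); // too late if the job already reached FAILED
        return job;
    }

    /** Desired order: suspend the job first, then disconnect the TaskExecutors. */
    static ToyJob stopBySuspendingFirst(int restarts, int taskExecutors) {
        ToyJob job = new ToyJob(restarts);
        job.suspend();
        for (int i = 0; i < taskExecutors; i++) {
            job.onTaskExecutorDisconnected();
        }
        return job;
    }

    public static void main(String[] args) {
        ToyJob disconnectFirst = stopByDisconnectingFirst(0, 2);
        ToyJob suspendFirst = stopBySuspendingFirst(0, 2);
        System.out.println(disconnectFirst.state() + " haCleaned=" + disconnectFirst.haDataCleanedUp());
        System.out.println(suspendFirst.state() + " haCleaned=" + suspendFirst.haDataCleanedUp());
    }
}
```

In this toy model, with no restart attempts left, the disconnect-first order ends in FAILED with the HA data cleaned up, while the suspend-first order ends in SUSPENDED with the HA data kept, so a new leader could still recover the job.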


          People

            trohrmann Till Rohrmann
            Votes: 0
            Watchers: 5
