[FLINK-9908] Inconsistent state of SlotPool after ExecutionGraph cancellation


Details

    Description

      If the ExecutionGraph is scheduled and cancelled concurrently, requested slots may not be properly returned to the SlotPool. This leaves the SlotPool in an inconsistent state: it believes that some of its slots are still occupied even though the respective Execution has already been cancelled.

      The problem seems to be caused by propagating the cancellation of the overall scheduling future to the individual scheduling futures. If an individual scheduling future is cancelled, the callback that produces its value and also handles the failure case is never called.
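The skipped callback can be reproduced in isolation. The following is a minimal sketch, assuming the scheduling futures behave like plain java.util.concurrent.CompletableFuture instances; it is not Flink's scheduling code, and the names slotRequest, scheduling and CancelledCallbackDemo are hypothetical. It cancels the dependent future before the slot request completes and then checks whether the handle callback (the place where a slot would be used or handed back) ever ran.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the scheduling logic; not taken from Flink.
class CancelledCallbackDemo {

    // Returns true if the handle callback ran even though the dependent
    // future was cancelled before the slot request completed.
    static boolean callbackRanAfterCancel() {
        // Stands in for the pending slot request sent to the SlotPool.
        CompletableFuture<String> slotRequest = new CompletableFuture<>();
        AtomicBoolean callbackRan = new AtomicBoolean(false);

        // Stands in for the individual scheduling future. The callback both
        // produces the value and handles the failure case.
        CompletableFuture<String> scheduling = slotRequest.handle((slot, failure) -> {
            callbackRan.set(true);
            return slot;
        });

        // Cancellation propagated from the overall scheduling future.
        scheduling.cancel(false);

        // The slot request is fulfilled afterwards, but the dependent future
        // is already completed (cancelled), so its callback is skipped.
        slotRequest.complete("slot-1");

        return callbackRan.get();
    }

    public static void main(String[] args) {
        System.out.println("callback ran: " + callbackRanAfterCancel());
        // prints "callback ran: false"
    }
}
```

In this sketch the slot request itself still completes with a slot; only the code that would have used or returned it is skipped, which is consistent with the SlotPool believing the slot is still occupied.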


          People

            trohrmann Till Rohrmann