  1. Flink
  2. FLINK-25817 FLIP-201: Persist local state in working directory
  3. FLINK-25849

Differentiate TaskManager sessions when registering at the JobMaster


Details

    Description

      With the introduction of configurable ResourceIDs for TaskManager processes, a restarted TaskManager process can come back with the same ResourceID as before. When it then tries to register at the JobMaster, the JobMaster does not recognize it as a new instance because it only compares the ResourceID. As a consequence, the JobMaster thinks that this is a duplicate registration and ignores it.

      It would be better if the TaskManager sent a session id with the registration. The JobMaster could then use it to decide whether a new instance is trying to register, in which case the old one needs to be disconnected, or whether the registration attempt is a duplicate. This would speed up cluster reconciliation because we would not have to wait for the heartbeat timeout to occur.
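The proposed decision could be sketched roughly as follows. This is only an illustration of the idea, not Flink's actual implementation: the class `SessionAwareRegistry`, the `Outcome` enum, and the use of a plain `String` for the ResourceID and a `UUID` for the session id are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: tracks one session id per TaskManager ResourceID. */
public class SessionAwareRegistry {

    /** Possible results of a registration attempt (names are illustrative). */
    public enum Outcome { NEW_REGISTRATION, DUPLICATE, REPLACED_OLD_SESSION }

    // ResourceID -> session id of the currently registered TaskManager instance.
    private final Map<String, UUID> sessionsByResourceId = new HashMap<>();

    public Outcome register(String resourceId, UUID sessionId) {
        UUID knownSession = sessionsByResourceId.get(resourceId);

        if (knownSession == null) {
            // No TaskManager is known under this ResourceID yet.
            sessionsByResourceId.put(resourceId, sessionId);
            return Outcome.NEW_REGISTRATION;
        }

        if (knownSession.equals(sessionId)) {
            // Same ResourceID and same session: a repeated registration attempt.
            return Outcome.DUPLICATE;
        }

        // Same ResourceID but a different session: the process was restarted.
        // A real JobMaster would disconnect the old instance here instead of
        // waiting for its heartbeat to time out.
        sessionsByResourceId.put(resourceId, sessionId);
        return Outcome.REPLACED_OLD_SESSION;
    }

    public static void main(String[] args) {
        SessionAwareRegistry registry = new SessionAwareRegistry();
        UUID firstSession = UUID.randomUUID();
        UUID secondSession = UUID.randomUUID();

        System.out.println(registry.register("tm-1", firstSession));
        System.out.println(registry.register("tm-1", firstSession));
        System.out.println(registry.register("tm-1", secondSession));
    }
}
```

Comparing only the ResourceID would collapse the second and third case into one. The session id is what lets the JobMaster tell a restarted process from a retried registration.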



          People

            trohrmann Till Rohrmann
            trohrmann Till Rohrmann

