Details
- Sub-task
- Status: Closed
- Major
- Resolution: Fixed
- 1.15.0
Description
With the introduction of a configurable ResourceID for TaskManager processes, a restarted TaskManager process can come back with the same ResourceID. When it then tries to register at the JobMaster, the JobMaster does not recognize it as a new instance because it only compares the ResourceID. As a consequence, the JobMaster thinks that this is a duplicate registration and ignores it.
It would be better if the TaskManager sent a session id with the registration. The JobMaster could then use it to decide whether a new instance is trying to register, in which case the old one needs to be disconnected, or whether the registration attempt is a duplicate. This would speed up cluster reconciliation because we would not have to wait for the heartbeat timeout to occur.
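The proposed decision logic could be sketched roughly as follows. This is only an illustration of the idea, not Flink's actual JobMaster code: the class `RegistrationSketch`, the `TaskManagerRegistry` helper, the `Decision` enum, and the use of a `UUID` as the session id are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of session-aware registration; not Flink's real implementation. */
public class RegistrationSketch {

    /** Possible outcomes of a registration attempt (illustrative names). */
    enum Decision { NEW_REGISTRATION, DUPLICATE, REPLACED_OLD_INSTANCE }

    /** Remembers the session id last accepted for each ResourceID. */
    static final class TaskManagerRegistry {
        private final Map<String, UUID> sessionByResourceId = new HashMap<>();

        Decision register(String resourceId, UUID sessionId) {
            UUID known = sessionByResourceId.get(resourceId);
            if (known == null) {
                // First time this ResourceID is seen.
                sessionByResourceId.put(resourceId, sessionId);
                return Decision.NEW_REGISTRATION;
            }
            if (known.equals(sessionId)) {
                // Same ResourceID and same session: a repeated attempt, safe to ignore.
                return Decision.DUPLICATE;
            }
            // Same ResourceID but a different session: the process was restarted.
            // A real implementation would disconnect the old instance here
            // instead of waiting for its heartbeat to time out.
            sessionByResourceId.put(resourceId, sessionId);
            return Decision.REPLACED_OLD_INSTANCE;
        }
    }

    public static void main(String[] args) {
        TaskManagerRegistry registry = new TaskManagerRegistry();
        UUID firstSession = UUID.randomUUID();
        UUID restartedSession = UUID.randomUUID();

        System.out.println(registry.register("tm-1", firstSession));     // NEW_REGISTRATION
        System.out.println(registry.register("tm-1", firstSession));     // DUPLICATE
        System.out.println(registry.register("tm-1", restartedSession)); // REPLACED_OLD_INSTANCE
    }
}
```

Comparing only the ResourceID would return the duplicate outcome for the third call as well, which is the behaviour described in this issue.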