Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- 1.15.0
Description
To let recovered TMs eagerly re-offer their slots, we allowed slots to be registered without a matching requirement while the job is restarting.
Every slot that the pool accepts is mapped to a requirement so that the pool can determine whether it has received enough slots. If a slot is reserved for a requirement that differs from the mapping the pool came up with, the mapping and the requirements are adjusted accordingly to ensure that we still request sufficient slots.
This leads to issues with slots that were accepted without a matching requirement. Such slots were mapped to their actual resource profile (to fit into the bookkeeping). With the adjustment logic above, reserving such a slot could add a requirement for that specific resource profile, which the remaining JM components are not aware of (and thus will never remove).
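The failure mode can be reproduced in a toy model. The following is only a minimal sketch under stated assumptions: the class `SlotBookkeepingSketch`, its methods, and the string-based profiles `"UNKNOWN"` and `"cpu=1"` are hypothetical and are not Flink's actual `DefaultDeclarativeSlotPool` API. It assumes the adjustment works by decrementing the requirement the slot is reserved for and incrementing the one it was previously mapped to.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the slot/requirement bookkeeping (hypothetical, not Flink code). */
class SlotBookkeepingSketch {
    // Requirement profile -> number of slots the pool believes it must request.
    final Map<String, Integer> requirements = new HashMap<>();
    // Slot id -> requirement profile the pool mapped the slot to on acceptance.
    final Map<String, String> slotToProfile = new HashMap<>();

    void declareRequirement(String profile) {
        requirements.merge(profile, 1, Integer::sum);
    }

    // During a restart a slot is accepted although no requirement matches it.
    // Assumption: it is mapped to its own, actual resource profile.
    void acceptWithoutRequirement(String slotId, String actualProfile) {
        slotToProfile.put(slotId, actualProfile);
    }

    // Reserving a slot for a requirement other than the mapped one updates
    // both the mapping and the requirement counts.
    void reserve(String slotId, String requiredProfile) {
        String mapped = slotToProfile.get(slotId);
        if (!mapped.equals(requiredProfile)) {
            slotToProfile.put(slotId, requiredProfile);
            requirements.merge(requiredProfile, -1, Integer::sum);
            // Problem: 'mapped' was never declared by anyone, yet it now
            // shows up as a requirement that nothing will ever remove.
            requirements.merge(mapped, 1, Integer::sum);
        }
    }

    static Map<String, Integer> demo() {
        SlotBookkeepingSketch pool = new SlotBookkeepingSketch();
        pool.acceptWithoutRequirement("slot-1", "cpu=1"); // TM re-offers its slot
        pool.declareRequirement("UNKNOWN");               // job declares its need
        pool.reserve("slot-1", "UNKNOWN");                // scheduler reserves it
        return pool.requirements;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the requirement for `"UNKNOWN"` drops to 0 while an entry for `"cpu=1"` appears with count 1, even though no component ever declared it.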
Issue Links
- is caused by
  - FLINK-25855 DefaultDeclarativeSlotPool rejects offered slots when the job is restarting (Closed)
- relates to
  - FLINK-26274 Test local recovery works across TaskManager process restarts (Closed)