Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When the job's container count changes, the default task name grouper (GroupByContainerCount) reassigns all tasks across the new container list in round-robin fashion. This causes many of the tasks to shift to different containers. A shifted task may be unable to restore its state from local disk, because its new container may not be assigned to the same host as the task's original container.
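To make the scale of the problem concrete, here is a minimal sketch of round-robin grouping. It is a simplified model, not Samza's actual GroupByContainerCount code; the class `RoundRobinDemo` and its methods are hypothetical names, and it assumes tasks are assigned in list order to container ids 0..N-1.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Samza.
class RoundRobinDemo {

    // Assigns the i-th task to container (i % containerCount).
    static Map<String, Integer> roundRobin(List<String> tasks, int containerCount) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int i = 0; i < tasks.size(); i++) {
            mapping.put(tasks.get(i), i % containerCount);
        }
        return mapping;
    }

    // Counts the tasks whose container id differs between the two mappings.
    static int countMoved(Map<String, Integer> before, Map<String, Integer> after) {
        int moved = 0;
        for (Map.Entry<String, Integer> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                moved++;
            }
        }
        return moved;
    }
}
```

Under this model, growing a job with six tasks from two containers to three moves four of the six tasks, although rebalancing only requires moving two (one from each existing container onto the new one).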
This ticket is to implement task-to-container affinity, which complements the container-to-host affinity in the current implementation. The implementation will include a task-to-container mapping that is persisted to the coordinator stream and used as the basis for the new task-to-container mapping (ContainerModel).
If the container count doesn't change, the old task mapping will be used. (Note that this will allow tools to inject custom mappings by writing to the coordinator stream.)
If the container count changes, a minimal number of tasks will be reassigned from the persisted mapping in order to "balance" the containers.
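The two cases above can be sketched as follows. This is one possible approach under stated assumptions, not the implementation from this ticket: the class `BalancedRegroup` and the method `regroup` are hypothetical, container ids are assumed to be 0..N-1, and the previous mapping stands in for whatever would be read back from the coordinator stream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Samza.
class BalancedRegroup {

    // previous maps task name -> container id in [0, oldCount).
    static Map<String, Integer> regroup(Map<String, Integer> previous, int oldCount, int newCount) {
        if (newCount == oldCount) {
            return new TreeMap<>(previous); // count unchanged: reuse the old mapping
        }
        // Bucket tasks by previous container; tasks of removed containers go to the pool.
        List<Deque<String>> buckets = new ArrayList<>();
        for (int c = 0; c < newCount; c++) {
            buckets.add(new ArrayDeque<>());
        }
        Deque<String> pool = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(previous).entrySet()) {
            if (e.getValue() < newCount) {
                buckets.get(e.getValue()).add(e.getKey());
            } else {
                pool.add(e.getKey());
            }
        }
        // Balanced target sizes; the fullest containers get any leftover slots,
        // so as many tasks as possible stay where they are.
        int base = previous.size() / newCount;
        int extra = previous.size() % newCount;
        Integer[] order = new Integer[newCount];
        for (int c = 0; c < newCount; c++) {
            order[c] = c;
        }
        Arrays.sort(order, (a, b) -> buckets.get(b).size() - buckets.get(a).size());
        int[] target = new int[newCount];
        for (int i = 0; i < newCount; i++) {
            target[order[i]] = base + (i < extra ? 1 : 0);
        }
        // Trim containers above target, then fill containers below target from the pool.
        for (int c = 0; c < newCount; c++) {
            while (buckets.get(c).size() > target[c]) {
                pool.add(buckets.get(c).removeLast());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (int c = 0; c < newCount; c++) {
            while (buckets.get(c).size() < target[c]) {
                buckets.get(c).add(pool.removeFirst());
            }
            for (String task : buckets.get(c)) {
                result.put(task, c);
            }
        }
        return result;
    }
}
```

With six tasks spread evenly over two containers, calling `regroup(previous, 2, 3)` in this sketch moves only two tasks to the new container and leaves the other four in place.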