Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
None
Description
When a block fetch exception occurs, an executor currently just cancels the running task. The master is notified of the cancellation, and the task is retried cleanly from the beginning.
As an optimization, BlockFetchFailureProperty can provide options to retry fetching the block instead of canceling the running task. This is useful, for example, when tasks running on reserved resources (A) depend on tasks running on transient resources (B). With the new options, the (A) tasks can keep running on reserved resources despite frequent evictions of the (B) tasks, which cause many block fetch failures.
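The difference between the two behaviors can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the `FetchFailurePolicy` enum, the `BlockFetchException` class, the `fetchWithPolicy` helper, and the fixed retry delay are all hypothetical names and choices introduced here, and the actual options exposed by BlockFetchFailureProperty may differ.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

final class BlockFetchRetrySketch {
  /** Hypothetical policy values; not the actual options of BlockFetchFailureProperty. */
  enum FetchFailurePolicy { CANCEL_TASK, RETRY_FETCH }

  /** Hypothetical stand-in for the exception raised when a block cannot be fetched. */
  static final class BlockFetchException extends Exception {
    BlockFetchException(final String message) {
      super(message);
    }
  }

  /**
   * Runs the fetcher once under CANCEL_TASK, or repeatedly under RETRY_FETCH.
   * Under RETRY_FETCH this sketch retries indefinitely with a fixed delay (an assumption).
   */
  static <T> T fetchWithPolicy(final Callable<T> fetcher,
                               final FetchFailurePolicy policy,
                               final long retryDelayMillis) throws Exception {
    while (true) {
      try {
        return fetcher.call();
      } catch (final BlockFetchException e) {
        if (policy == FetchFailurePolicy.CANCEL_TASK) {
          // Existing behavior: propagate so the caller can cancel the task
          // and let the master retry it from the beginning.
          throw e;
        }
        // New behavior: keep the task alive and try the same fetch again.
        Thread.sleep(retryDelayMillis);
      }
    }
  }

  public static void main(final String[] args) throws Exception {
    final AtomicInteger attempts = new AtomicInteger();
    // Simulates a block whose producer is evicted twice before the fetch succeeds.
    final Callable<String> flakyFetch = () -> {
      if (attempts.incrementAndGet() < 3) {
        throw new BlockFetchException("producer evicted");
      }
      return "block-data";
    };
    final String block = fetchWithPolicy(flakyFetch, FetchFailurePolicy.RETRY_FETCH, 10L);
    System.out.println(block + " after " + attempts.get() + " attempts");
  }
}
```

In this sketch the (A) task never leaves its executor under `RETRY_FETCH`; it simply waits for the (B) task's block to become available again. A real implementation would likely also need a way to stop retrying when the job itself is aborted.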
Note that NEMO-137 talks about retrying parents of the tasks with fetch failures. This issue is about retrying the original tasks with fetch failures.
Issue Links
- relates to: NEMO-137 Retry parent task(s) upon task INPUT_READ_FAILURE (Open)