Apache Nemo / NEMO-418

BlockFetchFailureProperty

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2

      Description

      Currently, when a block fetch exception occurs, an executor simply cancels the running task. The master is notified of the cancellation, and the task is retried cleanly from the beginning.

      As an optimization, BlockFetchFailureProperty can provide options to not cancel the running task, but rather retry fetching the block. This can be useful, for example, when tasks running on reserved resources (A) depend on tasks running on transient resources (B). With the new options, the (A) tasks can keep running on reserved resources despite frequent evictions of the (B) tasks, which cause many block fetch failures.
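
      The difference between the two behaviors can be sketched as follows. This is only an illustration under stated assumptions: the class name, the enum values CANCEL_TASK and RETRY_FETCH, the exception type, and the retry delay parameter are all hypothetical and are not taken from the Nemo codebase or from this issue.

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are taken from the Nemo codebase.
final class BlockFetchSketch {
  /** Assumed policy values standing in for the options of BlockFetchFailureProperty. */
  enum FailurePolicy { CANCEL_TASK, RETRY_FETCH }

  /** Signals that the task gave up, so the master can retry it from the beginning. */
  static final class TaskCancelledException extends RuntimeException {
    TaskCancelledException(final Throwable cause) {
      super("task cancelled after block fetch failure", cause);
    }
  }

  /**
   * Fetches a block with the given fetcher.
   * CANCEL_TASK: the first fetch failure cancels the task.
   * RETRY_FETCH: the task stays alive and retries the fetch after a delay until it succeeds.
   */
  static <T> T fetchBlock(final Supplier<T> fetcher,
                          final FailurePolicy policy,
                          final long retryDelayMillis) {
    while (true) {
      try {
        return fetcher.get();
      } catch (final RuntimeException fetchFailure) {
        if (policy == FailurePolicy.CANCEL_TASK) {
          throw new TaskCancelledException(fetchFailure);
        }
        try {
          // Give the evicted parent task time to be rescheduled and reproduce the block.
          Thread.sleep(retryDelayMillis);
        } catch (final InterruptedException interrupted) {
          Thread.currentThread().interrupt();
          throw new TaskCancelledException(interrupted);
        }
      }
    }
  }
}
```

      Under RETRY_FETCH, an (A) task keeps the work it has already done while it waits for the evicted (B) task to reproduce the missing block. Under CANCEL_TASK, that work is discarded and the task restarts from the beginning.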

      Note that NEMO-137 talks about retrying parents of the tasks with fetch failures. This issue is about retrying the original tasks with fetch failures.

              People

              • Assignee: Unassigned
              • Reporter: John Yang (johnyangk)
              • Votes: 0
              • Watchers: 1

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 20m