  1. Apache Nemo
  2. NEMO-418

BlockFetchFailureProperty

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2

    Description

      Currently, when a block fetch exception occurs, the executor simply cancels the running task. The master is notified of the cancellation, and the task is retried cleanly from the beginning.

      As an optimization, BlockFetchFailureProperty can provide options to retry fetching the block instead of cancelling the running task. This can be useful, for example, when tasks running on reserved resources (A) depend on tasks running on transient resources (B). With the new options, the (A) tasks can keep running on reserved resources despite frequent evictions of the (B) tasks, which cause many block fetch failures.
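
      The difference between the two behaviors can be sketched as follows. This is a minimal illustration only, not Nemo's actual API: the class BlockFetchSketch, the FailurePolicy enum and its values, the BlockFetcher interface, both exception types, and the fixed retry delay are all hypothetical names and choices made for this example.

```java
// Hypothetical sketch: none of these names are taken from the Nemo codebase.
final class BlockFetchSketch {
  /** What the executor does when a block fetch fails (assumed options). */
  enum FailurePolicy { CANCEL_TASK, RETRY_FETCH }

  /** Stand-in for the exception raised by a failed block fetch. */
  static final class BlockFetchException extends Exception {
    BlockFetchException(final String message) {
      super(message);
    }
  }

  /** Signals that the task was cancelled and must be rerun from the beginning. */
  static final class TaskCancelledException extends RuntimeException {
    TaskCancelledException(final Throwable cause) {
      super(cause);
    }
  }

  /** Stand-in for the code that actually pulls a block from a remote executor. */
  interface BlockFetcher<T> {
    T fetch() throws BlockFetchException;
  }

  /**
   * Fetches a block under the given policy.
   * CANCEL_TASK: the first failure cancels the task (the current behavior).
   * RETRY_FETCH: the task stays alive and the fetch is retried after a delay.
   */
  static <T> T fetchBlock(final BlockFetcher<T> fetcher,
                          final FailurePolicy policy,
                          final long retryDelayMillis) {
    while (true) {
      try {
        return fetcher.fetch();
      } catch (final BlockFetchException e) {
        if (policy == FailurePolicy.CANCEL_TASK) {
          throw new TaskCancelledException(e);
        }
        try {
          // Wait before retrying, e.g. until the evicted (B) task has rerun.
          Thread.sleep(retryDelayMillis);
        } catch (final InterruptedException interrupted) {
          // Restore the interrupt flag and give up on the task.
          Thread.currentThread().interrupt();
          throw new TaskCancelledException(interrupted);
        }
      }
    }
  }
}
```

      In this sketch, RETRY_FETCH loops until the fetch succeeds, so an (A) task keeps its place on the reserved resource while the (B) task that produces the block is rescheduled. A real implementation would likely also need a retry limit or a timeout.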

      Note that NEMO-137 is about retrying the parents of tasks with fetch failures. This issue is about retrying the tasks that hit the fetch failures themselves.

            People

              Assignee: Unassigned
              Reporter: John Yang (johnyangk)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m