  1. REEF (Retired)
  2. REEF-364 A REEF application for utilizing volatile resources
  3. REEF-501

Distinguish different types of FailedEvaluator in Vortex

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Vortex
    • None

    Description

      Currently, Vortex treats all evaluator failures the same way, through a single FailedEvaluatorHandler. We should distinguish the types of failure and handle each one differently.

      Type 1: Resource preemption
      We react according to a configured policy (e.g. re-request indefinitely). If needed, we can even add a new event handler to the REEF Driver, named PreemptedEvaluatorHandler, just for this type (as a separate JIRA issue outside of the Vortex umbrella JIRA).

      Type 2: Internal Vortex code failure
      This can happen nondeterministically and can even result in an endless cycle of resource releases and requests. In such a case, we should probably shut down the Driver immediately, both for ease of debugging and to prevent it from interfering with other jobs in the cluster.

      Type 3: Other types of failures
      If the failure is caused by an issue such as OOM (out of memory), we should also treat that case differently from the two above.
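
The three types above could be separated by a small classification step before any reaction is chosen. The following is only a minimal sketch under stated assumptions, not Vortex or REEF code: `FailureType`, `Reaction`, `VortexInternalException`, and `FailurePolicy` are hypothetical names, the `preempted` flag is assumed to be supplied by whatever mechanism eventually reports preemption (for example the proposed PreemptedEvaluatorHandler), and the reaction for Type 3 is left as a placeholder because this issue does not specify it.

```java
/** Hypothetical failure categories mirroring Types 1-3 in this issue. */
enum FailureType { PREEMPTION, VORTEX_INTERNAL, OTHER }

/** Hypothetical driver reactions; not part of the REEF or Vortex API. */
enum Reaction { REREQUEST_EVALUATOR, SHUTDOWN_DRIVER, HANDLE_SEPARATELY }

/** Hypothetical marker for failures raised by Vortex's own code. */
class VortexInternalException extends RuntimeException {
  VortexInternalException(final String message) {
    super(message);
  }
}

final class FailurePolicy {
  private FailurePolicy() {
  }

  /**
   * Classifies a failure.
   * Assumption: the caller already knows whether the resource was preempted.
   */
  static FailureType classify(final boolean preempted, final Throwable cause) {
    if (preempted) {
      return FailureType.PREEMPTION;
    }
    // Walk the cause chain looking for the hypothetical Vortex marker exception.
    for (Throwable t = cause; t != null; t = t.getCause()) {
      if (t instanceof VortexInternalException) {
        return FailureType.VORTEX_INTERNAL;
      }
    }
    // Everything else, including OutOfMemoryError, falls into Type 3.
    return FailureType.OTHER;
  }

  /** Maps each failure type to the reaction suggested in the description. */
  static Reaction react(final FailureType type) {
    switch (type) {
      case PREEMPTION:
        return Reaction.REREQUEST_EVALUATOR; // e.g. a "re-request indefinitely" policy
      case VORTEX_INTERNAL:
        return Reaction.SHUTDOWN_DRIVER;     // fail fast to ease debugging
      default:
        return Reaction.HANDLE_SEPARATELY;   // placeholder; policy not yet defined
    }
  }
}
```

A failure handler could then call `FailurePolicy.react(FailurePolicy.classify(preempted, cause))` and dispatch on the result, which keeps the policy decision separate from the event-handling code.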

            People

              johnyangk John Yang