[REEF-501] Distinguish different types of FailedEvaluator in Vortex - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Vortex
Labels: None

Description

Currently, Vortex treats all Evaluator failures the same way, via FailedEvaluatorHandler. We should handle different types of failures differently.

Type 1: Resource preemption
We react according to a configured policy (e.g., re-request indefinitely). If needed, we can even add a new event handler to the REEF Driver, named PreemptedEvaluatorHandler, just for this type (a separate JIRA issue outside of the Vortex umbrella JIRA).

Type 2: Internal Vortex code failure
This can happen nondeterministically and may even result in an infinite loop of resource release and re-request. In such a case, we should probably shut down the Driver immediately, both for ease of debugging and to prevent it from interfering with other jobs in the cluster.

Type 3: Other types of failures
If the failure is caused by an issue such as OOM, we should treat that case differently as well.
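
The three types above could be separated along the following lines. This is only a minimal sketch: FailureClassifier, FailureType, Action, and VortexInternalException are hypothetical names that exist in neither REEF nor Vortex, and the `preempted` flag stands in for whatever signal a future preemption API (see REEF-836) would provide. The mapping from type to action mirrors the reactions suggested in this description and is not a settled design.

```java
/** Hypothetical sketch: none of these names exist in REEF or Vortex. */
final class FailureClassifier {

  /** The three failure categories proposed in this issue. */
  enum FailureType { PREEMPTION, VORTEX_INTERNAL, OTHER }

  /** Possible Driver reactions (illustrative names only). */
  enum Action { REREQUEST_EVALUATOR, SHUT_DOWN_DRIVER, HANDLE_CASE_BY_CASE }

  /** Hypothetical marker exception that Vortex-internal code would throw. */
  static final class VortexInternalException extends RuntimeException {
    VortexInternalException(final String message) {
      super(message);
    }
  }

  private FailureClassifier() {
  }

  /**
   * Classifies a failure from its cause.
   * Assumption: preemption is reported out of band via the preempted flag.
   */
  static FailureType classify(final Throwable cause, final boolean preempted) {
    if (preempted) {
      return FailureType.PREEMPTION;
    }
    // Walk the cause chain looking for the Vortex-internal marker exception.
    for (Throwable t = cause; t != null; t = t.getCause()) {
      if (t instanceof VortexInternalException) {
        return FailureType.VORTEX_INTERNAL;
      }
    }
    // Everything else (e.g. OutOfMemoryError) falls into Type 3.
    return FailureType.OTHER;
  }

  /** Maps each failure type to the reaction suggested in this issue. */
  static Action react(final FailureType type) {
    switch (type) {
      case PREEMPTION:
        return Action.REREQUEST_EVALUATOR;   // e.g. a "re-request" policy
      case VORTEX_INTERNAL:
        return Action.SHUT_DOWN_DRIVER;      // fail fast for debugging
      default:
        return Action.HANDLE_CASE_BY_CASE;   // e.g. OOM
    }
  }
}
```

A failed-evaluator handler could then call `FailureClassifier.react(FailureClassifier.classify(cause, preempted))` and dispatch on the result. How the cause and the preemption signal are actually obtained from a failed Evaluator is left open here.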

Issue Links

is related to: REEF-836 Add Preemption API (Open)

People

Assignee: John Yang

Reporter: John Yang

Votes: 0

Watchers: 1

Dates

Created: 24/Jul/15 13:06

Updated: 24/Nov/15 08:34