Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
- If node “a” has a higher rate of task failures than the rest of the cluster, then node “a” should be blacklisted
- Only a specific set of failures should count against a node; query-specific failures, such as reading corrupted files or exceeding the memory limit, should not count
- This is similar to how Spark Executor Blacklisting works
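
The policy described above could be sketched roughly as follows. This is only an illustration, not the actual design: the failure categories, the `FailureTracker` class, and the `min_tasks`, `ratio`, and `min_rate` thresholds are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict
from enum import Enum, auto


class Failure(Enum):
    # Hypothetical failure categories, for illustration only.
    RPC_TIMEOUT = auto()         # plausibly caused by the node
    DISK_IO_ERROR = auto()       # plausibly caused by the node
    CORRUPT_FILE = auto()        # query-specific, must not count
    MEM_LIMIT_EXCEEDED = auto()  # query-specific, must not count


# Assumed set of failures that count against a node.
NODE_FAILURES = {Failure.RPC_TIMEOUT, Failure.DISK_IO_ERROR}


class FailureTracker:
    """Tracks per-node task outcomes and compares each node to the rest."""

    def __init__(self, min_tasks=10, ratio=3.0, min_rate=0.2):
        self.min_tasks = min_tasks  # ignore nodes with too few samples
        self.ratio = ratio          # how much worse than the rest is "higher"
        self.min_rate = min_rate    # absolute floor on the failure rate
        self.tasks = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, node, failure=None):
        """Record one finished task; failure=None means it succeeded."""
        self.tasks[node] += 1
        if failure in NODE_FAILURES:
            self.failures[node] += 1

    def should_blacklist(self, node):
        node_tasks = self.tasks.get(node, 0)
        if node_tasks < self.min_tasks:
            return False
        rest_tasks = sum(n for k, n in self.tasks.items() if k != node)
        if rest_tasks == 0:
            return False  # nothing to compare against
        rest_failures = sum(n for k, n in self.failures.items() if k != node)
        node_rate = self.failures.get(node, 0) / node_tasks
        rest_rate = rest_failures / rest_tasks
        # Blacklist only when the node is clearly worse than its peers.
        return node_rate > max(self.ratio * rest_rate, self.min_rate)
```

With these assumed thresholds, a node whose tasks fail half the time with `RPC_TIMEOUT` would be blacklisted when its peers rarely fail, while a node that only hits `MEM_LIMIT_EXCEEDED` would not, since those failures are attributed to the query rather than the node.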