Details
Bug
Status: In Progress
Major
Resolution: Unresolved
2.3.0
None
None
Description
DNS failures on the executor nodes cause shuffle nodes to be added to the exclude list. We should handle this failure so that it does not result in a FetchFailedException in such cases.
This helps in two ways:
1. The stage won't be resubmitted due to a FetchFailedException.
2. Spark's exclude listing won't exclude the shuffle service node when the problem is
actually with the current executor host.
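
The handling proposed above could be sketched roughly as follows. This is only an illustration, not Spark's actual code: it assumes the DNS failure surfaces as a java.net.UnknownHostException somewhere in the cause chain of the fetch error, and the class FetchFailureClassifier, the Outcome enum and the classify method are hypothetical names made up for this example.

```java
import java.net.UnknownHostException;

/** Hypothetical helper, not part of Spark: separates DNS failures from other fetch failures. */
final class FetchFailureClassifier {

    /** Hypothetical outcome categories used only in this sketch. */
    enum Outcome { LOCAL_DNS_FAILURE, REMOTE_FETCH_FAILURE }

    private FetchFailureClassifier() {}

    /**
     * Walks the cause chain of the error. If an UnknownHostException is found,
     * the failure is attributed to name resolution on the current host (an
     * assumption of this sketch) rather than to the remote shuffle service.
     */
    static Outcome classify(Throwable error) {
        int depth = 0;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof UnknownHostException) {
                return Outcome.LOCAL_DNS_FAILURE;
            }
        }
        return Outcome.REMOTE_FETCH_FAILURE;
    }
}
```

With a classification like this, a caller could retry or fail the task locally on LOCAL_DNS_FAILURE and raise a FetchFailedException only on REMOTE_FETCH_FAILURE, so the shuffle service node is not counted toward exclusion for a problem on the current executor host.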