Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Duplicate
- Affects Version/s: 2.1.0, 2.1.1
Description
After upgrading from Spark 2.0.2 to 2.1.0, we have observed that jobs often fail when speculative execution is enabled.
In 2.0.2, a speculative task was simply skipped if its result was not used (i.e. another attempt of the same task finished earlier), and the UI clearly showed that such tasks were not counted as failures.
In 2.1.0, many tasks are marked as failed/killed once speculative tasks start to run (that is, at the end of a stage, when there are spare executors to use), which also leads to failures of the entire stage or job.
Disabling spark.speculation stops the failures, but speculative execution is very useful, especially when executors run on machines with varying load (for example on YARN).
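For anyone who needs the workaround in the meantime, a minimal sketch of how the property could be passed on the command line follows. It only renders `--conf` flags for `spark-submit`; the helper name `to_submit_flags` is hypothetical and not part of Spark, and the application arguments are left out.

```python
def to_submit_flags(props):
    """Render Spark properties as spark-submit --conf flags (hypothetical helper)."""
    flags = []
    for key, value in sorted(props.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


# Workaround from this report: turn speculative execution off entirely.
workaround = {"spark.speculation": "false"}

# Setting that triggers the failures described above on 2.1.0.
affected = {"spark.speculation": "true"}

# Print the command prefix; the application jar/script would follow it.
print(" ".join(["spark-submit"] + to_submit_flags(workaround)))
```

The same property can also be set in `spark-defaults.conf` or on a `SparkConf` object. Note that this trades the failures for the loss of straggler mitigation.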
Issue Links
- Is contained by: SPARK-20358 Executors failing stage on interrupted exception thrown by cancelled tasks (Resolved)