Details

- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.0.0
- Labels: None
Description
Because creating the TaskSetManager and registering executors are asynchronous, running a job without enough registered executors leads to several issues:
- tasks in early stages run without their preferred locality;
- the default parallelism on YARN is based on the number of executors;
- the number of intermediate shuffle files per node grows (this can even bring a node down);
- the amount of memory consumed on a node for MEMORY-persisted RDD data grows, making the job fail if no disk fallback is specified (as in some of the MLlib algorithms?);
- and so on.
(Thanks to mridulm80 for the comments.)
A simple workaround is to sleep for a few seconds in the application so that executors have enough time to register. A better approach is to have the DAGScheduler submit a stage only after enough executors have registered, controlled by configuration properties:
# Submit the stage only once the ratio of successfully registered executors
# has been reached; default 0 in Standalone mode and 0.9 in YARN mode.
spark.scheduler.minRegisteredRatio = 0.8
# Regardless of how many executors have registered, submit the stage once
# maxRegisteredWaitingTime (in milliseconds) has elapsed; default 10000.
spark.scheduler.maxRegisteredWaitingTime = 5000
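The proposed two-condition check (submit once the registered ratio is reached, or unconditionally once the waiting time is exhausted) can be sketched as a small Python function. The name `is_ready` and its parameters are illustrative only, not Spark's actual API:

```python
def is_ready(registered, expected, min_ratio, waited_ms, max_wait_ms):
    """Decide whether a stage may be submitted (illustrative sketch).

    Ready when the ratio of registered to expected executors reaches
    min_ratio, or unconditionally once max_wait_ms milliseconds have
    elapsed, so a job is never blocked forever.
    """
    if expected > 0 and registered / expected >= min_ratio:
        return True
    return waited_ms >= max_wait_ms


# With minRegisteredRatio = 0.8 and maxRegisteredWaitingTime = 5000 ms:
# 8 of 10 executors is enough to start immediately, 7 of 10 is not,
# but even 0 of 10 starts once 5000 ms have passed.
```

The timeout branch is what keeps the fix safe: even if executors never reach the ratio (e.g. the cluster is smaller than requested), the stage is still submitted after the configured waiting time.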
Attachments
Issue Links

- is duplicated by
  - SPARK-1453 Improve the way Spark on Yarn waits for executors before starting (Resolved)
- relates to
  - SPARK-2635 Fix race condition at SchedulerBackend.isReady in standalone mode (Resolved)
  - SPARK-1453 Improve the way Spark on Yarn waits for executors before starting (Resolved)
  - SPARK-2555 Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. (Resolved)