Details
Description
DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted events has to be processed as DAGSchedulerEventProcessLoop is single threaded and it will block other tasks in queue like TaskCompletion.
The JobSubmitted event is time consuming depending on the nature of the job (Example: calculating parent stage dependencies, shuffle dependencies, partitions) and thus it blocks all the events to be processed.
I see multiple JIRA referring to this behavior
https://issues.apache.org/jira/browse/SPARK-2647
https://issues.apache.org/jira/browse/SPARK-4961
Similarly in my cluster some jobs partition calculation is time consuming (Similar to stack at SPARK-2647) hence it slows down the spark DAGSchedulerEventProcessLoop which results in user jobs to slowdown, even if its tasks are finished within seconds, as TaskCompletion Events are processed at a slower rate due to blockage.
Attachments
Issue Links
- relates to
-
SPARK-2647 DAGScheduler plugs others when processing one JobSubmitted event
- Resolved
-
SPARK-4961 Put HadoopRDD.getPartitions forward to reduce DAGScheduler.JobSubmitted processing time
- Resolved
- links to