Details
Bug
Status: Resolved
Major
Resolution: Duplicate
2.0.1, 2.1.0
None
None
RHEL 7.2
Description
On a large cluster with 45TB RAM and 1,000 cores, we used 1008 executors in order to use all RAM and cores for a 100TB Spark SQL workload. Long-running queries tend to report the following ERRORs:
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(136,WrappedArray())
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(853,WrappedArray())
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(395,WrappedArray())
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(736,WrappedArray())
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(439,WrappedArray())
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(16,WrappedArray())
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(307,WrappedArray())
16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(51,WrappedArray())
16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(535,WrappedArray())
16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(63,WrappedArray())
16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(333,WrappedArray())
16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(484,WrappedArray())
... (omitted)
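To get a feel for the volume, here is a back-of-the-envelope sketch. It assumes, as the log suggests, that every executor keeps sending heartbeats (each surfacing as a SparkListenerExecutorMetricsUpdate) while the bus is stopped, and that the heartbeat interval is the default spark.executor.heartbeatInterval of 10s. Neither assumption is confirmed in this report, and the helper dropped_events is hypothetical.

```python
# Rough estimate of how many SparkListenerExecutorMetricsUpdate events a
# stopped listener bus would drop (and log) under the stated assumptions.

EXECUTORS = 1008                 # executor count from the report
HEARTBEAT_INTERVAL_S = 10.0      # assumption: default spark.executor.heartbeatInterval (10s)


def dropped_events(duration_s: float,
                   executors: int = EXECUTORS,
                   interval_s: float = HEARTBEAT_INTERVAL_S) -> int:
    """Events posted while the bus is stopped, if each executor heartbeats once per interval."""
    return int(executors * duration_s / interval_s)


if __name__ == "__main__":
    for minutes in (1, 10, 60):
        print(f"{minutes:>3} min -> ~{dropped_events(minutes * 60):,} ERROR lines")
```

Under these assumptions, 1008 executors produce roughly 100 dropped-event ERROR lines per second, or about 362,880 per hour.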
The message itself may be a reasonable response to an already stopped SparkListenerBus (subsequent events are thrown away with that ERROR message). The problem is that SparkContext does NOT exit until all of these events/ERRORs have been reported, which in our setup is a huge number, and in some cases this can take hours!
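The failure mode described above can be illustrated with a toy model. ToyListenerBus and run are hypothetical names, and the class is a simplification rather than Spark's LiveListenerBus: once stopped, every posted event is dropped, and with per-event logging each drop costs one ERROR line, so shutdown output grows linearly with the number of events. The alternative mode (log the first drop, count the rest) is only one illustrative way to bound that output, not what Spark does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyListenerBus:
    """Hypothetical, simplified stand-in for a listener bus (not Spark's LiveListenerBus).

    After stop(), every post() is dropped. With log_every_drop=True each drop
    records one ERROR line, as in the report; with False only the first drop
    is logged and the rest are just counted.
    """
    log_every_drop: bool = True
    stopped: bool = False
    dropped: int = 0
    log_lines: list = field(default_factory=list)

    def stop(self) -> None:
        self.stopped = True

    def post(self, event: str) -> None:
        if not self.stopped:
            return  # delivery to listeners is omitted in this sketch
        self.dropped += 1
        if self.log_every_drop or self.dropped == 1:
            self.log_lines.append(
                f"ERROR bus has already stopped! Dropping event {event}")


def run(log_every_drop: bool, events: int) -> ToyListenerBus:
    """Stop a toy bus, then post `events` metrics-update-like events to it."""
    bus = ToyListenerBus(log_every_drop=log_every_drop)
    bus.stop()
    for i in range(events):
        bus.post(f"SparkListenerExecutorMetricsUpdate({i % 1008},WrappedArray())")
    return bus


if __name__ == "__main__":
    noisy = run(log_every_drop=True, events=10_000)
    quiet = run(log_every_drop=False, events=10_000)
    print(len(noisy.log_lines), len(quiet.log_lines), quiet.dropped)
```

With 10,000 events, the per-event mode records 10,000 log lines and the log-once mode records a single line, while both count the same number of dropped events.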
We tried increasing the listener bus event queue size from 10K:
Adding default property: spark.scheduler.listenerbus.eventqueue.size=130000
but this still occurs.
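Why a bigger queue does not help can be sketched as a decision function. The ordering below (the stopped check before the capacity check) and the names post_outcome and Outcome are assumptions for illustration, not Spark code. The point is that if events are dropped because the bus has already stopped, the value of spark.scheduler.listenerbus.eventqueue.size never comes into play.

```python
from enum import Enum


class Outcome(Enum):
    ENQUEUED = "enqueued"
    DROPPED_QUEUE_FULL = "dropped: queue full"
    DROPPED_BUS_STOPPED = "dropped: bus stopped"


def post_outcome(stopped: bool, queued: int, capacity: int) -> Outcome:
    """Hypothetical decision order for posting one event to a bounded bus.

    The stopped check comes first, so once the bus is stopped the event is
    dropped no matter how large the queue capacity is.
    """
    if stopped:
        return Outcome.DROPPED_BUS_STOPPED
    if queued >= capacity:
        return Outcome.DROPPED_QUEUE_FULL
    return Outcome.ENQUEUED


if __name__ == "__main__":
    # 10K as mentioned in the report vs. the 130000 that was tried.
    for capacity in (10_000, 130_000):
        print(capacity,
              post_outcome(stopped=False, queued=20_000, capacity=capacity).value,
              post_outcome(stopped=True, queued=0, capacity=capacity).value)
```

With either capacity, an event posted to a stopped bus is dropped; the capacity only matters on the queue-full path while the bus is still running.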
Attachments
Issue Links
duplicates:
- SPARK-19146 Drop more elements when stageData.taskData.size > retainedTasks to reduce the number of times on call drop (Resolved)