  Spark / SPARK-19068

Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,WrappedArray())


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.1, 2.1.0
    • Fix Version/s: 2.2.0
    • None
    • None
    • Environment: RHEL 7.2

    Description

      On a large cluster with 45TB RAM and 1,000 cores, we used 1008 executors in order to use all RAM and cores for a 100TB Spark SQL workload. Long-running queries tend to report the following errors:

      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(136,WrappedArray())
      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(853,WrappedArray())
      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(395,WrappedArray())
      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(736,WrappedArray())
      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(439,WrappedArray())
      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(16,WrappedArray())
      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(307,WrappedArray())
      16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(51,WrappedArray())
      16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(535,WrappedArray())
      16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(63,WrappedArray())
      16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(333,WrappedArray())
      16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(484,WrappedArray())
      ... (omitted)
      

      The message itself may be a reasonable response to an already-stopped SparkListenerBus: events posted after the bus has stopped are thrown away, each with that ERROR message. The problem is that SparkContext does NOT exit until all of these dropped events have been reported, and in our setup there is a huge number of them, so shutdown can take hours in some cases.
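To make the cost concrete, here is a minimal Java sketch of a listener bus after stop(). It is a hypothetical, simplified model, not Spark's LiveListenerBus: the class names, the "first drop only" mitigation, and the figure of 10 metrics updates per executor are all assumptions for illustration. ERROR lines are counted rather than printed. With per-event logging, the number of ERROR lines equals the number of dropped events, which grows with executors times heartbeats; logging only the first drop keeps it at one.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a listener bus after stop(); NOT Spark's
// LiveListenerBus. All names here are illustrative only.
public class StoppedBusLogging {

    static final class ModelBus {
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        private final AtomicLong dropped = new AtomicLong();
        // ERROR lines the bus would emit; counted, not printed, to keep the demo quiet.
        private final AtomicLong errorLines = new AtomicLong();
        private final boolean logEveryDrop;

        ModelBus(boolean logEveryDrop) {
            this.logEveryDrop = logEveryDrop;
        }

        void stop() {
            stopped.set(true);
        }

        void post(String event) {
            if (stopped.get()) {
                long n = dropped.incrementAndGet();
                // logEveryDrop = true mirrors the behavior reported in this issue:
                // one ERROR line per dropped event. Otherwise only the first drop
                // produces an ERROR line and later drops are just counted.
                if (logEveryDrop || n == 1) {
                    errorLines.incrementAndGet();
                }
                return;
            }
            // A running bus would enqueue the event for its listeners here (omitted).
        }

        long droppedEvents() { return dropped.get(); }

        long errorLinesEmitted() { return errorLines.get(); }
    }

    // Stops the bus, then posts heartbeatsPerExecutor metrics updates from each executor.
    static ModelBus simulate(boolean logEveryDrop, int executors, int heartbeatsPerExecutor) {
        ModelBus bus = new ModelBus(logEveryDrop);
        bus.stop();
        for (int h = 0; h < heartbeatsPerExecutor; h++) {
            for (int id = 0; id < executors; id++) {
                bus.post("SparkListenerExecutorMetricsUpdate(" + id + ",WrappedArray())");
            }
        }
        return bus;
    }

    public static void main(String[] args) {
        ModelBus perEvent = simulate(true, 1008, 10);
        ModelBus firstOnly = simulate(false, 1008, 10);
        System.out.println("per-event logging: " + perEvent.errorLinesEmitted()
                + " ERROR lines for " + perEvent.droppedEvents() + " dropped events");
        System.out.println("first-drop-only logging: " + firstOnly.errorLinesEmitted()
                + " ERROR line(s) for " + firstOnly.droppedEvents() + " dropped events");
    }
}
```

In this model, 1008 executors with 10 updates each produce 10080 ERROR lines under per-event logging and a single line under first-drop-only logging; both variants drop the same 10080 events.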

      We tried increasing spark.scheduler.listenerbus.eventqueue.size from 10K to 130000:
      Adding default property: spark.scheduler.listenerbus.eventqueue.size=130000
      The errors still occur.
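One plausible reason the larger queue does not help: the queue size limits drops while the bus is running and its queue is full, but events arriving after the bus has stopped are rejected regardless of capacity. The Java sketch below is a hypothetical model, not Spark's code; it assumes the stopped check runs before the queue is consulted (which the "has already stopped! Dropping event" wording suggests but does not confirm), uses a `capacity` parameter to stand in for spark.scheduler.listenerbus.eventqueue.size, and has no consumer thread, so the queue only fills up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model, not Spark's code: `capacity` stands in for
// spark.scheduler.listenerbus.eventqueue.size.
public class QueueCapacityAfterStop {

    static final class BoundedModelBus {
        private final ArrayBlockingQueue<String> queue;
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        long droppedBecauseStopped = 0;
        long droppedBecauseFull = 0;

        BoundedModelBus(int capacity) {
            queue = new ArrayBlockingQueue<>(capacity);
        }

        void stop() {
            stopped.set(true);
        }

        void post(String event) {
            // Assumption of this model: the stopped check runs before the
            // queue is consulted, so capacity is irrelevant once stopped.
            if (stopped.get()) {
                droppedBecauseStopped++;
                return;
            }
            // No consumer thread in this model, so the queue only fills up.
            if (!queue.offer(event)) {
                droppedBecauseFull++;
            }
        }
    }

    private static void postMany(BoundedModelBus bus, int events) {
        for (int i = 0; i < events; i++) {
            bus.post("SparkListenerExecutorMetricsUpdate(" + (i % 1008) + ",WrappedArray())");
        }
    }

    // Events dropped because the queue was full while the bus was running.
    static long dropsWhileRunning(int capacity, int events) {
        BoundedModelBus bus = new BoundedModelBus(capacity);
        postMany(bus, events);
        return bus.droppedBecauseFull;
    }

    // Events dropped because they arrived after stop().
    static long dropsAfterStop(int capacity, int events) {
        BoundedModelBus bus = new BoundedModelBus(capacity);
        bus.stop();
        postMany(bus, events);
        return bus.droppedBecauseStopped;
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {10_000, 130_000}) {
            System.out.println("capacity " + capacity
                    + ": full-queue drops while running = " + dropsWhileRunning(capacity, 50_000)
                    + ", drops after stop = " + dropsAfterStop(capacity, 50_000));
        }
    }
}
```

With 50000 events, raising the capacity from 10000 to 130000 removes the 40000 full-queue drops while running, but all 50000 events posted after stop() are still dropped in both cases.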

      Attachments

        1. sparklog.tar.gz
          4.65 MB
          JESSE CHEN


            People

              Assignee: Unassigned
              Reporter: JESSE CHEN (jfchen@us.ibm.com)
              Votes: 0
              Watchers: 6
