SPARK-18881

Spark never finishes jobs and stages, JobProgressListener fails

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      yarn, deploy-mode = client

      Description

      We have a Spark application that continuously processes a large number of incoming jobs. Several jobs are processed in parallel, on multiple threads.

      During intensive workloads, at some point we start to see hundreds of warnings like these:

      16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
      16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
      16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
      16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
      16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
      

      From that point on, the performance of the app plummets, and most stages and jobs never finish. In the Spark UI, I can see figures like 13000 pending jobs.

      I can't clearly identify any related exception occurring beforehand. Maybe this one, but it concerns another listener:

      16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
      16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 01:00:00 CET 1970
      

      This is very problematic for us, since it is hard to detect and requires an application restart.
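
      Since the condition is hard to spot by eye, a small log scanner could flag it. The following is only a sketch: it assumes the driver log is available as plain text lines in the format quoted above, and the names `scan`, `parse_ts`, `DROP_RE` and `UNKNOWN_RE` are hypothetical helpers, not part of Spark. It reports the first dropped-event error and counts the "unknown stage/job" warnings that appear after it.

```python
import re
from datetime import datetime

# Patterns copied from the log lines quoted in this report.
DROP_RE = re.compile(r"ERROR LiveListenerBus: Dropping SparkListenerEvent")
UNKNOWN_RE = re.compile(
    r"WARN JobProgressListener: (?:Task start|Task end|Job completed) "
    r"for unknown (?:stage|job) \d+"
)
TS_FORMAT = "%y/%m/%d %H:%M:%S"  # e.g. "16/12/14 21:03:54"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    try:
        return datetime.strptime(line[:17], TS_FORMAT)
    except ValueError:
        return None


def scan(lines):
    """Summarise dropped-event errors and the 'unknown' warnings after them."""
    first_drop = None
    unknown_after_drop = 0
    for raw in lines:
        line = raw.strip()
        if first_drop is None:
            if DROP_RE.search(line):
                # Remember when the bus first reported a dropped event.
                first_drop = parse_ts(line)
        elif UNKNOWN_RE.search(line):
            # Only count warnings that follow the first drop in the log.
            unknown_after_drop += 1
    return {
        "first_drop": first_drop,
        "unknown_after_drop": unknown_after_drop,
        "degraded": first_drop is not None and unknown_after_drop > 0,
    }
```

      For example, `scan(open("driver.log"))` (the file name is an assumption) returns a dictionary whose `degraded` flag could be used to raise an alert before a restart becomes necessary.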

      EDIT:

      I can confirm the sequence:
      1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue
      then
      2- JobProgressListener losing track of jobs and stages.
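
      One plausible reading of this sequence can be illustrated with a toy model. The sketch below is not Spark's implementation: `BoundedEventQueue`, `ToyJobListener` and `simulate` are hypothetical names, and the model only assumes what the log messages state, namely that the bus drops events when its queue is full. It shows that a dropped start event later produces an "unknown job" warning, while a dropped end event leaves a job tracked as active indefinitely.

```python
from collections import deque


class BoundedEventQueue:
    """Toy stand-in for a bounded listener queue: drops events when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def post(self, event):
        if len(self.events) >= self.capacity:
            self.dropped += 1  # analogous to "Dropping SparkListenerEvent"
            return False
        self.events.append(event)
        return True

    def drain(self):
        while self.events:
            yield self.events.popleft()


class ToyJobListener:
    """Tracks active jobs; warns when an end event has no matching start."""

    def __init__(self):
        self.active = set()
        self.warnings = []

    def on_event(self, event):
        kind, job_id = event
        if kind == "job_start":
            self.active.add(job_id)
        elif job_id in self.active:
            self.active.remove(job_id)
        else:
            self.warnings.append("Job completed for unknown job %d" % job_id)


def simulate(capacity=2):
    queue = BoundedEventQueue(capacity)
    listener = ToyJobListener()

    def deliver():
        for event in queue.drain():
            listener.on_event(event)

    # Burst of starts: the listener drains nothing until all are posted.
    for job_id in (1, 2, 3):
        queue.post(("job_start", job_id))
    deliver()
    # The matching ends arrive one at a time, with room in the queue.
    for job_id in (1, 2, 3):
        queue.post(("job_end", job_id))
        deliver()
    # Starts delivered normally, followed by a burst of ends.
    for job_id in (4, 5, 6):
        queue.post(("job_start", job_id))
        deliver()
    for job_id in (4, 5, 6):
        queue.post(("job_end", job_id))
    deliver()
    return {
        "dropped": queue.dropped,
        "warnings": listener.warnings,
        "still_active": set(listener.active),
    }
```

      With `capacity=2`, the start of job 3 and the end of job 6 are dropped, so the listener warns about job 3 and keeps job 6 as active forever. With a capacity large enough for the bursts, nothing is dropped and the listener stays consistent.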

    People

    • Assignee: Unassigned
    • Reporter: Mathieu DESPRIEE (mathieude)