Spark / SPARK-30831

Executors UI shows more active tasks than possible

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Web UI
    • Labels: None

    Description

      I regularly see the executors web UI showing more active tasks than the executor has cores. Looking at the code, it seems that free cores and active tasks are tracked separately, and that the message sent for task end is asynchronous, so it reaches the UI much later than the start event.

      On statusUpdate, CoarseGrainedSchedulerBackend increases freeCores, which then allows the scheduler to assign another task, but the taskEndEvent is delivered asynchronously.
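
      The ordering described above can be illustrated with a toy model. This is a minimal sketch and not Spark code: ExecutorModel, tryLaunch, statusUpdateFinished and deliverTaskEnds are hypothetical names, and the assumption that start events reach the UI promptly while end events are queued is taken from the description in this issue.

```scala
object ActiveTaskSkewSketch {
  // Hypothetical stand-in for one executor as seen by the scheduler and by the UI.
  final class ExecutorModel(val cores: Int) {
    private var freeCores = cores     // scheduler-side count, updated synchronously
    private var pendingTaskEnds = 0   // task-end events not yet delivered to the UI
    private var uiActive = 0          // active-task count the UI would display

    def uiActiveTasks: Int = uiActive

    // The scheduler launches a task only if a core is free.
    // Assumption: the start event reaches the UI promptly.
    def tryLaunch(): Boolean =
      if (freeCores == 0) false
      else { freeCores -= 1; uiActive += 1; true }

    // The executor reports completion: the core is freed right away,
    // but the UI-facing end event is only queued.
    def statusUpdateFinished(): Unit = { freeCores += 1; pendingTaskEnds += 1 }

    // The delayed task-end events finally reach the UI.
    def deliverTaskEnds(): Unit = { uiActive -= pendingTaskEnds; pendingTaskEnds = 0 }
  }

  // Returns (UI count while the end event is still queued, UI count after delivery).
  def demo(): (Int, Int) = {
    val exec = new ExecutorModel(cores = 1)
    exec.tryLaunch()             // task 0 starts
    exec.statusUpdateFinished()  // task 0 finishes; its core is freed immediately
    exec.tryLaunch()             // task 1 starts before task 0's end event arrives
    val during = exec.uiActiveTasks
    exec.deliverTaskEnds()
    val after = exec.uiActiveTasks
    (during, after)
  }
}
```

      With 1 core, the model briefly reports 2 active tasks and drops back to 1 once the queued end event is delivered, which matches the symptom seen on the executors page.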

      We definitely don't want to slow down the scheduling path, so I'm not sure how easy this will be to improve.

       

      To reproduce I just ran:

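      // Run in spark-shell; toDF on an RDD[Int] yields a single column named "value".
      val df = sc.makeRDD(1 to 10000000, 6).toDF
      val df2 = sc.makeRDD(1 to 10000000, 6).toDF

      // Join the two frames on equal values and time the CSV write.
      spark.time(df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").write.mode("overwrite").csv("somefile"))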

       

      Then view the executors UI page. I started spark-shell with just 1 core per executor, and the page shows 2 active tasks.

          People

            Assignee: Unassigned
            Reporter: Thomas Graves (tgraves)