Spark / SPARK-20205

DAGScheduler posts SparkListenerStageSubmitted before updating stage


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Spark Core
    • Labels: None

    Description

      This probably affects other versions as well; I haven't checked.

      The code that posts the event to the listener bus is in DAGScheduler.submitMissingTasks, around line 991:

          stage.makeNewStageAttempt(partitionsToCompute.size, taskIdToLocations.values.toSeq)
          listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))
      

      Later in the same method, the stage's submission time is set (around line 1057):

          if (tasks.size > 0) {
            logInfo(s"Submitting ${tasks.size} missing tasks from $stage (${stage.rdd}) (first 15 " +
              s"tasks are for partitions ${tasks.take(15).map(_.partitionId)})")
            taskScheduler.submitTasks(new TaskSet(
              tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
            stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
            // ... (rest of the branch elided)
          }
      

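      That means an event handler might receive a stage-submitted event whose submission time has not been set yet. Because the event carries a reference to the same mutable stage.latestInfo object, and the listener bus delivers events asynchronously, whether a listener sees the submission time set depends on timing.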
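      The ordering problem can be illustrated with a minimal Scala sketch. It uses hypothetical, simplified classes (SimpleStageInfo, StageSubmitted, SyncBus) rather than Spark's real StageInfo, SparkListenerStageSubmitted and listener bus. The sketch bus delivers events synchronously so the outcome is deterministic; with an asynchronous bus, the listener could see either value depending on timing.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's StageInfo: a mutable object that the
// event carries by reference, so later mutations are visible to listeners.
final class SimpleStageInfo(val stageId: Int) {
  var submissionTime: Option[Long] = None
}

// Hypothetical stand-in for SparkListenerStageSubmitted.
final case class StageSubmitted(info: SimpleStageInfo)

// Hypothetical bus that delivers each event synchronously inside post(),
// so a listener observes the stage info exactly as it is at post() time.
final class SyncBus {
  private val listeners = ArrayBuffer.empty[StageSubmitted => Unit]
  def addListener(l: StageSubmitted => Unit): Unit = listeners += l
  def post(e: StageSubmitted): Unit = listeners.foreach(_(e))
}

object OrderingSketch {
  // Simulates one stage submission and returns the submissionTime
  // that the listener saw when it handled the event.
  def observedTime(setTimeBeforePost: Boolean, now: Long): Option[Long] = {
    val bus = new SyncBus
    var seen: Option[Long] = None
    bus.addListener(e => seen = e.info.submissionTime)

    val info = new SimpleStageInfo(stageId = 0)
    if (setTimeBeforePost) {
      // Update the stage first, then announce it.
      info.submissionTime = Some(now)
      bus.post(StageSubmitted(info))
    } else {
      // Ordering described in this report: announce first, update later.
      bus.post(StageSubmitted(info))
      info.submissionTime = Some(now)
    }
    seen
  }

  def main(args: Array[String]): Unit = {
    println(observedTime(setTimeBeforePost = false, now = 1000L)) // None
    println(observedTime(setTimeBeforePost = true, now = 1000L))  // Some(1000)
  }
}
```

      Setting the submission time before posting makes what the listener sees independent of delivery timing. This sketch only illustrates the ordering issue; it is not the change that resolved this issue.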


            People

              Assignee: Marcelo Masiero Vanzin (vanzin)
              Reporter: Marcelo Masiero Vanzin (vanzin)