Spark / SPARK-20205

DAGScheduler posts SparkListenerStageSubmitted before updating stage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Spark Core
    • Labels: None

      Description

      This probably affects other versions as well, but I haven't checked.

      The code in DAGScheduler that posts the event to the listener bus is around line 991:

          stage.makeNewStageAttempt(partitionsToCompute.size, taskIdToLocations.values.toSeq)
          listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))
      

      Later in the same method (around line 1057), the stage's submission time is set:

          if (tasks.size > 0) {
            logInfo(s"Submitting ${tasks.size} missing tasks from $stage (${stage.rdd}) (first 15 " +
              s"tasks are for partitions ${tasks.take(15).map(_.partitionId)})")
            taskScheduler.submitTasks(new TaskSet(
              tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
            stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
      

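      That means a listener might receive a SparkListenerStageSubmitted event whose stage info still has an unset submission time (submissionTime is None), because the event is posted before the field is assigned.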
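
      The ordering problem can be illustrated with a minimal sketch that does not depend on Spark. Every name here (FakeStageInfo, FakeStageSubmitted, RecordingBus, SubmissionOrderSketch) is a hypothetical stand-in, not a Spark class, and the bus delivers events synchronously so the result is deterministic. Spark's real listener bus is asynchronous, so a real listener may or may not observe the unset value depending on timing.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
final class FakeStageInfo(val stageId: Int) {
  // Mirrors the idea of an optional submission time that starts out unset.
  var submissionTime: Option[Long] = None
}

final case class FakeStageSubmitted(info: FakeStageInfo)

final class RecordingBus {
  // Records what a listener would see at the moment the event is delivered.
  val observed = scala.collection.mutable.ArrayBuffer.empty[Option[Long]]

  def post(event: FakeStageSubmitted): Unit =
    observed += event.info.submissionTime
}

object SubmissionOrderSketch {
  // Order described in this report: post the event, then set the time.
  def postThenUpdate(bus: RecordingBus, now: Long): Unit = {
    val info = new FakeStageInfo(0)
    bus.post(FakeStageSubmitted(info))
    info.submissionTime = Some(now)
  }

  // Alternative order: set the time first, then post the event.
  def updateThenPost(bus: RecordingBus, now: Long): Unit = {
    val info = new FakeStageInfo(1)
    info.submissionTime = Some(now)
    bus.post(FakeStageSubmitted(info))
  }

  def main(args: Array[String]): Unit = {
    val bus = new RecordingBus
    postThenUpdate(bus, 1000L)
    updateThenPost(bus, 2000L)
    println(bus.observed.toList) // prints List(None, Some(2000))
  }
}
```

      In this sketch the first ordering delivers an event with no submission time, while the second delivers one with the time already set. It only demonstrates the effect of the ordering and is not meant to represent the change that was actually committed for this issue.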

              People

              • Assignee: Marcelo Masiero Vanzin (vanzin)
              • Reporter: Marcelo Masiero Vanzin (vanzin)
              • Votes: 0
              • Watchers: 7
