SPARK-20342: DAGScheduler sends SparkListenerTaskEnd before updating task's accumulators


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Hit this on 2.2, but it has probably been there forever. This is similar in spirit to SPARK-20205.

      The event is posted here, around line 1154 of DAGScheduler.scala:

          listenerBus.post(SparkListenerTaskEnd(
             stageId, task.stageAttemptId, taskType, event.reason, event.taskInfo, taskMetrics))
      

      The accumulators are only updated later, around line 1173:

          val stage = stageIdToStage(task.stageId)
          event.reason match {
            case Success =>
              task match {
                case rt: ResultTask[_, _] =>
                  // Cast to ResultStage here because it's part of the ResultTask
                  // TODO Refactor this out to a function that accepts a ResultStage
                  val resultStage = stage.asInstanceOf[ResultStage]
                  resultStage.activeJob match {
                    case Some(job) =>
                      if (!job.finished(rt.outputId)) {
                        updateAccumulators(event)
      

      The same thing as in SPARK-20205 applies here: the UI shows the correct information because it reads the mutable TaskInfo structure directly, but the event log, for example, may record the wrong (stale) accumulator values.
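      For illustration, here is a hypothetical listener (not part of Spark; the class name is invented) that consumes the task-end event the way an event-log-style consumer does, reading the accumulable values carried with the event rather than the live UI state:

          import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

          // Hypothetical listener: prints the accumulator values attached to each task-end event.
          // Because TaskInfo is mutable and shared with the scheduler, whether these values
          // include the task's final accumulator updates depends on whether updateAccumulators()
          // has already run when the bus delivers the event -- exactly the ordering described above.
          class AccumSnapshotListener extends SparkListener {
            override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
              taskEnd.taskInfo.accumulables.foreach { info =>
                println(s"stage ${taskEnd.stageId}, task ${taskEnd.taskInfo.taskId}: " +
                  s"${info.name.getOrElse("unnamed")} = ${info.value.getOrElse("<not set>")}")
              }
            }
          }

      Such a listener can be registered with SparkContext.addSparkListener or through the spark.extraListeners configuration. The event log's listener works the same way, except that it serializes each event to JSON as soon as it receives it, so a stale snapshot can end up on disk.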

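      To make the ordering concrete, here is a small self-contained sketch (plain Scala with invented stand-in types, not Spark code) of why posting the event before merging the updates loses information, and why swapping the two steps does not:

          import scala.collection.mutable

          // Hypothetical stand-ins for the pieces involved: a driver-side accumulator map,
          // a task-end "event" that snapshots accumulator values when it is built, and an
          // "event log" that keeps whatever the event carried at post time.
          final case class TaskEndEvent(taskId: Long, accumulables: Map[String, Long])

          object AccumulatorOrderingSketch {
            private val accumulators = mutable.Map("records.read" -> 0L)
            private val eventLog = mutable.Buffer.empty[TaskEndEvent]

            private def updateAccumulators(updates: Map[String, Long]): Unit =
              updates.foreach { case (name, delta) =>
                accumulators(name) = accumulators.getOrElse(name, 0L) + delta
              }

            private def postTaskEnd(taskId: Long): Unit =
              eventLog += TaskEndEvent(taskId, accumulators.toMap) // snapshot taken at post time

            def main(args: Array[String]): Unit = {
              val taskUpdates = Map("records.read" -> 100L)

              // Current ordering (mirrors the snippets above): post first, update after.
              postTaskEnd(taskId = 1L)
              updateAccumulators(taskUpdates)
              println(s"logged for task 1: ${eventLog(0).accumulables}") // records.read -> 0 (stale)

              // Reordered: update first, then post; the logged event carries the merged value.
              accumulators("records.read") = 0L // reset for the second scenario
              updateAccumulators(taskUpdates)
              postTaskEnd(taskId = 2L)
              println(s"logged for task 2: ${eventLog(1).accumulables}") // records.read -> 100
            }
          }

      In DAGScheduler terms, the corresponding change would be to call updateAccumulators(event) before the listenerBus.post(...) shown above, so that the TaskInfo and metrics attached to SparkListenerTaskEnd already reflect the task's final accumulator values.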

People

    Assignee: Marcelo Masiero Vanzin (vanzin)
    Reporter: Marcelo Masiero Vanzin (vanzin)
