Apache Celeborn / CELEBORN-1496

Unable to differentiate map tasks that have only different stage attempts

Details

    Description

      In Spark, when a stage is recomputed, the DAGScheduler explicitly calls unregisterAllMapAndMergeOutput and then resubmits the stage. At this point, Celeborn can receive two map task results that differ only in their stage attempt ID.
      For example: "stage 0 stageAttempt 0 map 10 taskAttempt 0" and "stage 0 stageAttempt 1 map 10 taskAttempt 0"

      https://github.com/apache/spark/blob/branch-3.2/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2043

      In Celeborn, map results are distinguished only by shuffleId, mapId, and taskAttemptId, so Celeborn cannot tell apart two results that differ only in stageAttemptId.
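
The collision described above can be illustrated with a minimal Java sketch. It is not Celeborn's actual code: the class `MapResultKeyDemo`, its helper methods, and the use of a `List<Integer>` as the key are hypothetical, and shuffleId 0 is assumed for the stage in the example. It only shows that a key built from shuffleId, mapId, and taskAttemptId maps both results from the example to the same entry, while a key that also carries stageAttemptId (one possible way to separate them) does not.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MapResultKeyDemo {
    // Hypothetical key with the three fields named in the description.
    static List<Integer> keyWithoutStageAttempt(int shuffleId, int mapId, int taskAttemptId) {
        return Arrays.asList(shuffleId, mapId, taskAttemptId);
    }

    // Hypothetical key that additionally carries the stage attempt ID.
    static List<Integer> keyWithStageAttempt(int shuffleId, int stageAttemptId,
                                             int mapId, int taskAttemptId) {
        return Arrays.asList(shuffleId, stageAttemptId, mapId, taskAttemptId);
    }

    // Counts distinct keys for the two results from the example
    // when the stage attempt ID is NOT part of the key.
    static int distinctWithoutStageAttempt() {
        Set<List<Integer>> keys = new HashSet<>();
        keys.add(keyWithoutStageAttempt(0, 10, 0)); // stageAttempt 0, map 10, taskAttempt 0
        keys.add(keyWithoutStageAttempt(0, 10, 0)); // stageAttempt 1, map 10, taskAttempt 0
        return keys.size();
    }

    // Counts distinct keys for the same two results
    // when the stage attempt ID IS part of the key.
    static int distinctWithStageAttempt() {
        Set<List<Integer>> keys = new HashSet<>();
        keys.add(keyWithStageAttempt(0, 0, 10, 0)); // stageAttempt 0
        keys.add(keyWithStageAttempt(0, 1, 10, 0)); // stageAttempt 1
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctWithoutStageAttempt()); // prints 1: the two results collide
        System.out.println(distinctWithStageAttempt());    // prints 2: the two results stay distinct
    }
}
```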

          People

            jiang13021

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 15h 40m