SPARK-33747

Avoid calling unregisterMapOutput when the map stage is being rerun.

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.5, 3.0.1
    • Fix Version/s: None
    • Component/s: Block Manager
    • Labels:
      None

      Description

      When a fetch failure occurs, DAGScheduler tries to unregister the corresponding map output. The current logic has a race condition: a new map stage attempt can be running while the current reduce stage attempt returns another fetch failure. The sequence is as follows. The reduce stage first reports a fetch failure, which causes the map stage to be rerun. The rerunning map stage may then register a new MapStatus for the failed MapId before the same reduce stage attempt reports a second fetch failure for that MapId. The reduce stage attempt is still the latest attempt at that point, because the new map stage attempt has not yet completed. If the map output is always unregistered in this case, the scheduler may actually unregister the map output produced by the new map stage attempt.
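
      The ordering described above can be replayed with a small toy model. This is only a sketch and not Spark code: `ToyScheduler`, `map_stage_running`, `replay`, and the `guarded` flag are hypothetical names introduced for illustration, and the guard shown (skip unregistering while the map stage is being rerun) is one reading of the issue title, not a confirmed fix.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Toy model of the race; hypothetical, not Spark's DAGScheduler."""

    # map_id -> map stage attempt that produced the registered output
    map_outputs: dict = field(default_factory=dict)
    # True once a fetch failure has triggered a rerun of the map stage
    map_stage_running: bool = False

    def register_map_output(self, map_id, attempt):
        self.map_outputs[map_id] = attempt

    def on_fetch_failure(self, map_id, guarded):
        if guarded and self.map_stage_running:
            # The registered output may already belong to the new
            # map stage attempt, so leave it alone.
            return
        # Unconditional path: drop whatever is registered for map_id
        # and (re)start the map stage.
        self.map_outputs.pop(map_id, None)
        self.map_stage_running = True


def replay(guarded):
    """Replay the event order from the description."""
    s = ToyScheduler()
    s.register_map_output(7, attempt=0)
    s.on_fetch_failure(7, guarded)       # first failure: map stage reruns
    s.register_map_output(7, attempt=1)  # rerun registers new output
    s.on_fetch_failure(7, guarded)       # second failure, same reduce attempt
    return s.map_outputs


print(replay(guarded=False))
print(replay(guarded=True))
```

      In this model the unguarded replay ends with no output registered for map 7, even though the new attempt had already produced one, while the guarded replay keeps the output from attempt 1.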

            People

            • Assignee: Unassigned
            • Reporter: weixiuli
            • Votes: 0
            • Watchers: 2
