Spark / SPARK-837

ResultTask's serialization forgets to handle the "generation" field, while ShuffleMapTask's does

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.7.3, 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: Spark Core
    • Labels: None

    Description

      ResultTask's serialization methods, writeExternal and readExternal, do nothing with the generation field.

      ShuffleMapTask's writeExternal and readExternal, by contrast, do handle their fields, with statements such as "partition = in.readInt()" and "out.writeLong(generation)".

      A ResultTask is used after a ShuffleMapTask. If the work fails for some reason right after the ShuffleMapTask finishes, it is recomputed with a generation greater than -1. The ResultTask, still carrying the default generation, cannot fetch the right data, so it asks the DAGScheduler to recompute the ShuffleMapTask again. This repeats until the whole job crashes.
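
      The symmetric handling that the description expects can be sketched as follows. This is a minimal illustration, not Spark's actual ResultTask or the committed fix: the class SketchTask and the helper roundTrip are hypothetical names, and only the partition and generation fields, the -1 default, and the readInt/writeLong calls come from the report.

```scala
import java.io._

// Hypothetical stand-in for a task; NOT Spark's actual ResultTask.
class SketchTask(var partition: Int) extends Externalizable {
  // Assumed default, mirroring the "-1" mentioned in the description.
  var generation: Long = -1L

  // Externalizable requires a public no-argument constructor.
  def this() = this(0)

  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeInt(partition)
    out.writeLong(generation) // the step the report says ResultTask omits
  }

  override def readExternal(in: ObjectInput): Unit = {
    // Reads must mirror the order of the writes above.
    partition = in.readInt()
    generation = in.readLong()
  }
}

object SketchTask {
  // Serialize a task to an in-memory buffer and read it back.
  def roundTrip(task: SketchTask): SketchTask = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(task)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[SketchTask]
    finally in.close()
  }
}
```

      If the two generation lines were removed, a task sent with a generation above -1 would arrive on the receiving side with the default value again, which is the mismatch the description blames for the repeated recomputation.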

            People

              Assignee: Unassigned
              Reporter: Andy Huang (andyyehoo)
              Votes: 0
              Watchers: 4
