Hadoop Map/Reduce: MAPREDUCE-4819

AM can rerun job after reporting final job status to the client

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.1-alpha
    • Fix Version/s: 2.0.3-alpha, 0.23.6
    • Component/s: mr-am
    • Labels: None

      Description

      If the AM (ApplicationMaster) reports the final job status to the client but then crashes before unregistering with the RM (ResourceManager), the RM can launch another AM attempt. Currently, AM re-attempts assume that no previous attempt reached a final job state, so the job is rerun (from scratch, if the output format doesn't support recovery).

      Rerunning the job after the client has already been told its final status is harmful for several reasons. If the job failed, the rerun is confusing at best: the client was already told the job failed, yet the subsequent attempt could succeed. If the job succeeded, data could be lost: a follow-on job launched by the client may try to consume this job's output as input just as the re-attempt starts removing output files in preparation for its output commit.
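      The failure sequence described above can be modeled in a few lines. This is a minimal, hypothetical sketch, not the fix that was committed for this issue and not Hadoop code: every name in it (Cluster, run_attempt, simulate, durable_store) is invented for illustration, and "durable_store" simply stands in for some storage that survives an AM crash. It contrasts a naive re-attempt, which assumes no final state was reached, with a guarded one that records the final state durably before telling the client and checks that record on restart.

```python
from dataclasses import dataclass, field

# Hypothetical model of the scenario in MAPREDUCE-4819; none of these names
# are Hadoop APIs. durable_store stands in for storage that survives an AM
# crash (an assumption made only for this illustration).

@dataclass
class Cluster:
    durable_store: dict = field(default_factory=dict)   # survives AM crashes
    client_reports: list = field(default_factory=list)  # statuses the client saw
    runs: int = 0                                        # times the job body ran

def run_attempt(cluster, job_id, guarded, crash_before_unregister):
    """Simulate one AM attempt; return True if it unregistered with the RM."""
    if guarded:
        final = cluster.durable_store.get(job_id)
        if final is not None:
            # A previous attempt already reached a final state: re-report it
            # instead of rerunning the job.
            cluster.client_reports.append(final)
            return True
    cluster.runs += 1          # the job executes (naively, from scratch)
    final = "SUCCEEDED"
    if guarded:
        # Persist the final state *before* telling the client.
        cluster.durable_store[job_id] = final
    cluster.client_reports.append(final)
    if crash_before_unregister:
        return False           # AM dies here; the RM launches another attempt
    return True

def simulate(guarded, max_attempts=2):
    cluster = Cluster()
    for attempt in range(1, max_attempts + 1):
        # In this scenario only the first attempt crashes before unregistering.
        if run_attempt(cluster, "job_1", guarded,
                       crash_before_unregister=(attempt == 1)):
            break
    return cluster
```

      With the naive logic the job body runs twice even though the client already saw a final status (`simulate(guarded=False).runs` is 2); with the guard it runs once (`simulate(guarded=True).runs` is 1), and the second attempt only re-reports the recorded status. Writing the record before reporting matters: if the AM crashes after running the job but before persisting the record, the client has not yet been told anything, so a rerun does not contradict an earlier report.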

        Attachments

        1. MR-4819-4832.txt
          100 kB
          Robert Joseph Evans
        2. MR-4819-bobby-trunk.txt
          98 kB
          Robert Joseph Evans
        3. MR-4819-bobby-trunk.txt
          95 kB
          Robert Joseph Evans
        4. MR-4819-bobby-trunk.txt
          95 kB
          Robert Joseph Evans
        5. MR-4819-bobby-trunk.txt
          92 kB
          Robert Joseph Evans
        6. MR-4819-bobby-trunk.txt
          86 kB
          Robert Joseph Evans
        7. MR-4819-bobby-trunk.txt
          47 kB
          Robert Joseph Evans
        8. MAPREDUCE-4819.3.patch
          44 kB
          Bikas Saha
        9. MAPREDUCE-4819.2.patch
          35 kB
          Bikas Saha
        10. MAPREDUCE-4819.1.patch
          11 kB
          Bikas Saha

              People

              • Assignee: Bikas Saha (bikassaha)
              • Reporter: Jason Lowe (jlowe)
              • Votes: 0
              • Watchers: 15