  1. Hadoop Map/Reduce
  2. MAPREDUCE-7183

Make app master recover history from latest history file that exists

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: applicationmaster
    • Labels:
      None
    • Target Version/s:

      Description

      When the original app master of a running MapReduce job is killed, the new app master normally attempts to recover by reading the jhist file written by the app master of the previous app attempt (i.e. current app attempt - 1).

      This is usually fine, but is a problem in the following situation:

      1. App master 1 writes history to jobid_1.jhist, then is killed
      2. App master 2 starts up but is killed before it has the chance to write any history to jobid_2.jhist
      3. App master 3 attempts to recover, but it can't find jobid_2.jhist, so all job progress is lost.

      This problem manifests as "Unable to parse prior job history, aborting recovery" and "Could not parse the old history file. Will not have old AMinfos" errors. All job progress is lost, and previous app attempts do not show up in the job history UI.

      To fix this problem, if jobid_2.jhist is missing, app master 3 should just recover using the history in jobid_1.jhist.
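
      The fallback described above can be sketched as a search that walks backwards from the previous attempt until it finds a history file that exists. This is only an illustration, not the code in the attached patch: the class and method names are hypothetical, the file naming follows the simplified jobid_N.jhist form used in this description, attempts are assumed to be numbered from 1, and plain java.nio.file is used in place of the Hadoop FileSystem API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Hadoop.
class HistoryRecoverySketch {

    /**
     * Returns the history file of the most recent earlier attempt that
     * actually exists, checking (currentAttempt - 1) first and then
     * falling back to older attempts down to attempt 1.
     */
    static Optional<Path> findLatestHistoryFile(Path historyDir, String jobId, int currentAttempt) {
        for (int attempt = currentAttempt - 1; attempt >= 1; attempt--) {
            // Assumed naming: <jobId>_<attempt>.jhist, as in the description.
            Path candidate = historyDir.resolve(jobId + "_" + attempt + ".jhist");
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        // No earlier attempt wrote any history, so there is nothing to recover from.
        return Optional.empty();
    }
}
```

      In the scenario above, calling findLatestHistoryFile(dir, "jobid", 3) when only jobid_1.jhist exists would return jobid_1.jhist rather than giving up on the missing jobid_2.jhist.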

      Related JIRAs that mention this same problem:

      https://issues.apache.org/jira/browse/MAPREDUCE-4729

      https://issues.apache.org/jira/browse/MAPREDUCE-4767 

        Attachments

        1. MAPREDUCE-7183.patch
          6 kB
          Mikayla Konst

            People

            • Assignee:
              mkonst Mikayla Konst
              Reporter:
              mkonst Mikayla Konst
