  1. Hadoop Map/Reduce
  2. MAPREDUCE-5567 [Umbrella] Stabilize MR framework w.r.t ResourceManager restart
  3. MAPREDUCE-5607

Backport MAPREDUCE-5086 - MR app master deletes staging dir when sent a reboot command from the RM

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.23.9
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If the RM is restarted while an MR job is running, it sends a reboot command to the job's app master. The app master then deletes the staging dir, which causes the next attempt to fail.
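
The failure mode above comes down to one decision: whether the app master should clean up the staging dir when it shuts down. The sketch below illustrates one way to make that decision. It is not the code from MAPREDUCE-5086 or from the attached patch; the class, enum, and method names are hypothetical and exist only for illustration.

```java
public class StagingCleanupSketch {

    /** Why the app master is shutting down (hypothetical enum, not a Hadoop type). */
    public enum ShutdownReason { JOB_FINISHED, RM_REBOOT_COMMAND }

    /**
     * Decide whether the staging dir may be deleted on shutdown.
     *
     * @param reason      why the app master is shutting down
     * @param attempt     1-based number of the current app master attempt
     * @param maxAttempts maximum number of app master attempts allowed
     */
    public static boolean shouldDeleteStagingDir(ShutdownReason reason,
                                                 int attempt,
                                                 int maxAttempts) {
        if (reason == ShutdownReason.JOB_FINISHED) {
            // The job is done, so no later attempt needs the staging files.
            return true;
        }
        // On a reboot command, keep the staging dir unless no further
        // attempt can run; a later attempt would need the files in it.
        return attempt >= maxAttempts;
    }

    public static void main(String[] args) {
        // Reboot during attempt 1 of 2: keep the staging dir for attempt 2.
        System.out.println(shouldDeleteStagingDir(
                ShutdownReason.RM_REBOOT_COMMAND, 1, 2)); // false
        // Reboot during the last attempt: nothing will reuse the dir.
        System.out.println(shouldDeleteStagingDir(
                ShutdownReason.RM_REBOOT_COMMAND, 2, 2)); // true
        // Normal completion: clean up.
        System.out.println(shouldDeleteStagingDir(
                ShutdownReason.JOB_FINISHED, 1, 2)); // true
    }
}
```

Under these assumptions, a reboot during a non-final attempt leaves the staging dir in place, so the next attempt can still find the job files.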

          Activity

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12612244/MAPREDUCE-5607-branch-0.23.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4179//console

          This message is automatically generated.

          Jonathan Eagles added a comment -

          This patch only applies to branch-0.23 so failure is expected.

          Mit Desai added a comment -

          I tested the patch on my machine. The test passes both before and after applying the patch. Looks good to me.

          Mit Desai added a comment -

          +1 (non binding)

          Jason Lowe added a comment -

          Thanks for the patch, Jon. Comments:

          • This patch adds a new JOB_UPDATED_NODES event which is unrelated to the change in MAPREDUCE-5086. Nothing generates that event.
          • In branch-0.23, the number of AM attempts is set cluster-wide and not per-app as is the case in 2.x. Therefore it's probably not appropriate to add MRJobConfig.DEFAULT_MR_AM_MAX_ATTEMPTS. Instead we should use YarnConfiguration.DEFAULT_RM_AM_MAX_RETRIES to match what the rest of the code is doing in branch-0.23.
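
The second point above amounts to reading the maximum number of AM attempts from a cluster-wide setting and falling back to a cluster-wide default, rather than to a per-job default. A minimal, self-contained sketch of that lookup follows. The constants are local stand-ins, not the real YarnConfiguration fields: the key string and the default value of 1 are assumptions to verify against branch-0.23, and a plain Map stands in for Hadoop's Configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class AmMaxAttemptsSketch {

    // Stand-ins for constants on YarnConfiguration in branch-0.23.
    // Both the key string and the default value are assumptions.
    public static final String RM_AM_MAX_RETRIES =
            "yarn.resourcemanager.am.max-retries";
    public static final int DEFAULT_RM_AM_MAX_RETRIES = 1;

    /**
     * Resolve the cluster-wide maximum number of AM attempts.
     * Falls back to the cluster-wide default when the key is unset.
     */
    public static int maxAmAttempts(Map<String, String> conf) {
        String value = conf.get(RM_AM_MAX_RETRIES);
        if (value == null) {
            return DEFAULT_RM_AM_MAX_RETRIES;
        }
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key unset: the cluster-wide default applies.
        System.out.println(maxAmAttempts(conf)); // 1

        // Key set by the cluster administrator.
        conf.put(RM_AM_MAX_RETRIES, "4");
        System.out.println(maxAmAttempts(conf)); // 4
    }
}
```

In real code the lookup would presumably go through the configuration object's integer getter with the YarnConfiguration constants, so the backport stays consistent with the rest of branch-0.23.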
          Jonathan Eagles added a comment -

          This feature change introduces too much risk so close to the end of 0.23.x development and the beginning of maintenance for this line.


            People

            • Assignee:
              Jonathan Eagles
              Reporter:
              Jonathan Eagles
            • Votes:
              0
              Watchers:
              5
