  1. Hadoop Map/Reduce
  2. MAPREDUCE-6776

yarn.app.mapreduce.client.job.max-retries should have a more useful default

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: client
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      The default value of yarn.app.mapreduce.client.job.max-retries has been changed from 0 to 3. This will help protect clients from failures that are transient. True failures may take slightly longer now due to the retries.

      Description

      With the current default of 0, any communication failure results in a client failure. Oozie handles this poorly: if the RM is failing over and Oozie gets a communication failure, it assumes the target job has failed. I propose raising the default to something modest like 3 or 5. The default retry interval is 2s.
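      To make the effect of the proposed default concrete, here is a minimal, self-contained sketch of a retry loop. It is not Hadoop's client code: `RetrySketch`, `callWithRetries`, `survives`, and `worstCaseDelayMs` are hypothetical names, and the sketch assumes that "max retries" counts additional attempts after the first call and that the client waits one fixed interval between attempts.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative only: a hypothetical retry helper, not the Hadoop implementation.
class RetrySketch {

    // Runs the call, retrying up to maxRetries times after the first failure.
    // Assumption: a fixed pause of intervalMs between attempts.
    static <T> T callWithRetries(Callable<T> call, int maxRetries, long intervalMs)
            throws Exception {
        int attempts = 0;
        while (true) {
            try {
                attempts++;
                return call.call();
            } catch (Exception e) {
                if (attempts > maxRetries) {
                    throw e; // retries exhausted: surface the failure to the client
                }
                Thread.sleep(intervalMs);
            }
        }
    }

    // Simulates a service that fails transientFailures times and then answers.
    // Returns true if the client ends up with a result, false if it gives up.
    static boolean survives(int transientFailures, int maxRetries) {
        int[] remaining = {transientFailures};
        try {
            callWithRetries(() -> {
                if (remaining[0] > 0) {
                    remaining[0]--;
                    throw new IOException("simulated communication failure");
                }
                return "job status";
            }, maxRetries, 0L); // zero interval keeps the simulation fast
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Extra time spent waiting before a persistent failure is reported.
    static long worstCaseDelayMs(int maxRetries, long intervalMs) {
        return (long) maxRetries * intervalMs;
    }

    public static void main(String[] args) {
        System.out.println("max-retries=0, one transient failure: " + survives(1, 0));
        System.out.println("max-retries=3, one transient failure: " + survives(1, 3));
        System.out.println("added delay on a true failure (ms): "
                + worstCaseDelayMs(3, 2000L));
    }
}
```

      Under these assumptions, a single dropped call is fatal with a value of 0 but is absorbed with a value of 3, while a persistent failure is reported roughly 3 × 2s = 6s later than before.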

        Attachments

        1. MAPREDUCE-6776.003.patch
          6 kB
          Miklos Szegedi
        2. MAPREDUCE-6776.002.patch
          5 kB
          Miklos Szegedi
        3. MAPREDUCE-6776.001.patch
          4 kB
          Miklos Szegedi

            People

            • Assignee:
              miklos.szegedi@cloudera.com Miklos Szegedi
              Reporter:
              templedf Daniel Templeton
            • Votes:
              0
              Watchers:
              6
