Hadoop Map/Reduce
MAPREDUCE-3347

ResourceManager does not respawn the MRAppMaster process if it goes down in the middle of job execution, and the job fails.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels:
      None

      Description

      The ApplicationMaster service should recover the job if the MRAppMaster process goes down in the middle of job execution. Otherwise, the MRAppMaster process becomes a single point of failure for the job, and the job loses the advantage of the MRv1 framework.

        Activity

        Ramgopal N created issue -
        Vinod Kumar Vavilapalli added a comment -

        Ramgopal, do you have retries enabled? Bump yarn.resourcemanager.am.max-retries to, say, 3 or 4 and retry with a fresh cluster. The default value is 1, so retries are off by default.

        Ramgopal N added a comment -

        Hi Vinod,
        After setting yarn.resourcemanager.am.max-retries in yarn-site.xml, the RM retries the specified number of times before failing the job. Thanks.
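        For reference, a minimal yarn-site.xml fragment for the retry setting discussed in this thread might look like the sketch below. The property name is taken from the comments above (0.23-era YARN); the value 4 simply follows the "3 or 4" suggestion and is otherwise arbitrary. Other Hadoop releases may use a different key, so it is worth checking the yarn-default.xml shipped with your version before relying on this.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed example: let the ResourceManager relaunch the ApplicationMaster
       (MRAppMaster) instead of failing the job on the first AM crash.
       Per this thread, the default of 1 means retries are effectively off. -->
  <property>
    <name>yarn.resourcemanager.am.max-retries</name>
    <value>4</value>
  </property>
</configuration>
```

        Since the ResourceManager reads this file at startup, the change presumably takes effect only after the RM is restarted, which matches the advice above to retry with a fresh cluster.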

        Ramgopal N made changes -
        Field         Original Value    New Value
        Status        Open [ 1 ]        Resolved [ 5 ]
        Resolution                      Invalid [ 6 ]
        Transition          Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open → Resolved     17h 51m                 1                 Ramgopal N      05/Nov/11 04:58

          People

          • Assignee: Unassigned
          • Reporter: Ramgopal N
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
              Resolved:
