Hadoop Map/Reduce / MAPREDUCE-5396

Application is "FAILED" when multiple appmaster attempts are spawned

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.5-alpha
    • Fix Version/s: None
    • Component/s: mr-am
    • Labels:
      None

      Description

      1. Run a job with 142 maps.
      2. After some map tasks have executed, kill the NM where the appmaster is running (using kill -9).
      3. Observe that the appmaster keeps running until the NM expiry interval elapses; after that, the appmaster is killed and a new appmaster is launched.

      Observations:
      -------------------
      1. While going down, the first appmaster deletes the job's staging dir.
      2. While the new appmaster is running, it kills all the tasks running in it and fails the application, saying that files in the staging dir are not present.

          Activity

          Jason Lowe added a comment -

          Moved this to MAPREDUCE since this is an issue with the MRAppMaster. Unless the app attempt is the last one, the MR AM should not delete the staging directory when told to shut down by a heartbeat response from the RM.
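
The rule stated in the comment above can be illustrated with a minimal sketch. This is not the MRAppMaster code: the class name, the method name, and the `attemptNumber` / `maxAttempts` parameters are hypothetical and exist only to show the intended decision, assuming attempts are numbered from 1.

```java
// Illustrative sketch only; names are hypothetical, not Hadoop APIs.
class StagingCleanupPolicy {

    /**
     * Decides whether an AM that the RM has told to shut down may delete
     * the job's staging directory.
     *
     * Assumption: attempts are numbered from 1, and maxAttempts is the
     * configured limit on AM attempts for the application.
     */
    public static boolean shouldDeleteStagingDirOnRmShutdown(int attemptNumber, int maxAttempts) {
        // Only the last attempt may clean up; an earlier attempt must leave
        // the staging files in place for the attempt that follows it.
        return attemptNumber >= maxAttempts;
    }

    public static void main(String[] args) {
        // First of two attempts: a second AM will still need the staging dir.
        System.out.println(shouldDeleteStagingDirOnRmShutdown(1, 2));
        // Second of two attempts: no further AM will be launched.
        System.out.println(shouldDeleteStagingDirOnRmShutdown(2, 2));
    }
}
```

In the scenario reported here, the first appmaster would correspond to the first call: it is not the last attempt, so under this rule it would keep the staging dir and the second appmaster would find the files it needs.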


            People

            • Assignee: Devaraj K
            • Reporter: Nishan Shetty, Huawei
            • Votes: 0
            • Watchers: 6
