Hadoop Map/Reduce
MAPREDUCE-4833

Task can get stuck in FAIL_CONTAINER_CLEANUP

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      If an NM goes down and the AM still tries to launch a container on it, the ContainerLauncherImpl can get stuck in an RPC timeout. At the same time, the RM may notice that the NM has gone away and inform the AM, which triggers a TA_FAILMSG. If the TA_FAILMSG arrives at the TaskAttemptImpl before the TA_CONTAINER_LAUNCH_FAILED message, the task attempt will try to kill the container, but the ContainerLauncherImpl will not send back a TA_CONTAINER_CLEANED event, leaving the attempt stuck.
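
      The ordering described above can be illustrated with a small, self-contained toy model. This is only a sketch, not the Hadoop code: CleanupRaceSketch, ToyContainer, the two-state ContainerState, the CONTAINER_CLEANED attempt state, and the alwaysReportCleanup switch are all hypothetical names introduced for illustration. It also assumes that an attempt in FAIL_CONTAINER_CLEANUP ignores every event except TA_CONTAINER_CLEANED. Only the event names and the "no event on kill if DONE" behavior come from this issue and its release note.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the race in this issue; none of these types are the real Hadoop classes. */
final class CleanupRaceSketch {

    /** Hypothetical, heavily simplified attempt states. */
    enum AttemptState { ASSIGNED, FAIL_CONTAINER_CLEANUP, CONTAINER_CLEANED }

    /** Hypothetical, heavily simplified launcher-side container states. */
    enum ContainerState { PREP, DONE }

    /** Stand-in for the launcher's per-container bookkeeping. */
    static final class ToyContainer {
        private final boolean alwaysReportCleanup;
        private final List<String> eventsToAttempt = new ArrayList<>();
        private ContainerState state = ContainerState.PREP;

        ToyContainer(boolean alwaysReportCleanup) {
            this.alwaysReportCleanup = alwaysReportCleanup;
        }

        /** The launch RPC to the dead NM times out: mark DONE and report the failure. */
        void launchTimedOut() {
            state = ContainerState.DONE;
            eventsToAttempt.add("TA_CONTAINER_LAUNCH_FAILED");
        }

        /** Handles the attempt's kill request. */
        void kill() {
            if (state == ContainerState.DONE && !alwaysReportCleanup) {
                return; // the "essentially a no-op" path: nothing is reported back
            }
            state = ContainerState.DONE;
            eventsToAttempt.add("TA_CONTAINER_CLEANED");
        }

        List<String> events() {
            return eventsToAttempt;
        }
    }

    /** Assumption of this sketch: only TA_CONTAINER_CLEANED moves the attempt on. */
    static AttemptState transition(AttemptState current, String event) {
        if (current == AttemptState.FAIL_CONTAINER_CLEANUP
                && event.equals("TA_CONTAINER_CLEANED")) {
            return AttemptState.CONTAINER_CLEANED;
        }
        return current;
    }

    /** Replays the ordering from the description and returns the final attempt state. */
    static AttemptState simulate(boolean alwaysReportCleanup) {
        ToyContainer container = new ToyContainer(alwaysReportCleanup);
        AttemptState attempt = AttemptState.ASSIGNED;

        // 1. TA_FAILMSG (NM lost) reaches the attempt first; it starts container cleanup.
        attempt = AttemptState.FAIL_CONTAINER_CLEANUP;
        // 2. The stuck launch RPC finally times out in the launcher.
        container.launchTimedOut();
        // 3. The attempt's kill request is processed by the launcher.
        container.kill();
        // 4. Whatever the launcher reported is delivered to the attempt.
        for (String event : container.events()) {
            attempt = transition(attempt, event);
        }
        return attempt;
    }

    public static void main(String[] args) {
        System.out.println("no event when DONE: " + simulate(false));
        System.out.println("always send event:  " + simulate(true));
    }
}
```

      In this model, simulate(false) leaves the attempt in FAIL_CONTAINER_CLEANUP, while simulate(true), which mirrors the release note's "send a TA_CONTAINER_CLEANED event in all cases", lets it move on.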

      1. MAPREDUCE4833-2.patch
        9 kB
        Robert Parker
      2. MAPREDUCE4833-1.patch
        9 kB
        Robert Parker
      3. MAPREDUCE4833.patch
        9 kB
        Robert Parker

        Activity

        Robert Joseph Evans created issue -
        Robert Parker made changes -
        Field Original Value New Value
        Assignee Robert Parker [ robsparker ]
        Robert Parker made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Release Note Previously the Container did not send an event on kill if it was DONE, and returned (essentially a no-op). This patch will send a TA_CONTAINER_CLEANED event in all cases.
        Robert Parker made changes -
        Attachment MAPREDUCE4833-23.patch [ 12561562 ]
        Robert Parker made changes -
        Release Note Previously the Container did not send an event on kill if it was DONE, and returned (essentially a no-op). This patch will send a TA_CONTAINER_CLEANED event in all cases.
        Robert Parker made changes -
        Attachment MAPREDUCE4833.patch [ 12562116 ]
        Robert Parker made changes -
        Attachment MAPREDUCE4833-1.patch [ 12562133 ]
        Robert Parker made changes -
        Attachment MAPREDUCE4833-23.patch [ 12561562 ]
        Robert Parker made changes -
        Attachment MAPREDUCE4833-2.patch [ 12562139 ]
        Jason Lowe made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 2.0.3-alpha [ 12323275 ]
        Fix Version/s 0.23.6 [ 12323502 ]
        Resolution Fixed [ 1 ]
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Robert Parker
            Reporter:
            Robert Joseph Evans
          • Votes:
            0
            Watchers:
            7

            Dates

            • Created:
              Updated:
              Resolved:
