Hadoop Map/Reduce / MAPREDUCE-4833

Task can get stuck in FAIL_CONTAINER_CLEANUP

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      If an NM goes down and the AM still tries to launch a container on it, the ContainerLauncherImpl can get stuck in an RPC timeout. At the same time, the RM may notice that the NM has gone away and inform the AM, which triggers a TA_FAILMSG. If the TA_FAILMSG arrives at the TaskAttemptImpl before the TA_CONTAINER_LAUNCH_FAILED message, the task attempt will try to kill the container, but the ContainerLauncherImpl will not send back a TA_CONTAINER_CLEANED event, leaving the attempt stuck in FAIL_CONTAINER_CLEANUP.
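
      The ordering problem described above can be illustrated with a toy state machine. This is only a sketch under stated assumptions, not the actual TaskAttemptImpl code: the class `StuckAttemptSketch`, the `step` and `run` helpers, and the simplified `ASSIGNED` and `FAILED` states are hypothetical and exist only for illustration. Only the event names and FAIL_CONTAINER_CLEANUP come from the description.

```java
import java.util.List;

// Toy model of the race; NOT the real Hadoop TaskAttemptImpl state machine.
class StuckAttemptSketch {
    // ASSIGNED and FAILED are simplified, assumed states for illustration.
    enum State { ASSIGNED, FAIL_CONTAINER_CLEANUP, FAILED }

    enum Event { TA_FAILMSG, TA_CONTAINER_LAUNCH_FAILED, TA_CONTAINER_CLEANED }

    // Applies a single event to the current state and returns the next state.
    static State step(State state, Event event) {
        switch (state) {
            case ASSIGNED:
                // A failure message makes the attempt ask for container cleanup.
                if (event == Event.TA_FAILMSG) return State.FAIL_CONTAINER_CLEANUP;
                // A launch failure seen first ends the attempt directly (assumed).
                if (event == Event.TA_CONTAINER_LAUNCH_FAILED) return State.FAILED;
                return state;
            case FAIL_CONTAINER_CLEANUP:
                // Only TA_CONTAINER_CLEANED moves the attempt on; anything else,
                // including a late TA_CONTAINER_LAUNCH_FAILED, is ignored here.
                return event == Event.TA_CONTAINER_CLEANED ? State.FAILED : state;
            default:
                return state; // FAILED is terminal in this sketch
        }
    }

    // Replays a sequence of events starting from ASSIGNED.
    static State run(List<Event> events) {
        State state = State.ASSIGNED;
        for (Event event : events) {
            state = step(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        // Problem ordering: TA_FAILMSG first, launch failure second, no cleaned event.
        System.out.println(run(List.of(Event.TA_FAILMSG, Event.TA_CONTAINER_LAUNCH_FAILED)));
        // Benign ordering: the launch failure arrives before the failure message.
        System.out.println(run(List.of(Event.TA_CONTAINER_LAUNCH_FAILED, Event.TA_FAILMSG)));
    }
}
```

      In this sketch the first ordering never leaves FAIL_CONTAINER_CLEANUP, because the only event that would advance it (TA_CONTAINER_CLEANED) is never delivered, while the second ordering reaches a terminal state.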

        Attachments

        1. MAPREDUCE4833.patch
          9 kB
          Robert Parker
        2. MAPREDUCE4833-1.patch
          9 kB
          Robert Parker
        3. MAPREDUCE4833-2.patch
          9 kB
          Robert Parker

            People

            • Assignee:
              robsparker Robert Parker
              Reporter:
              revans2 Robert Joseph Evans
            • Votes:
              0
              Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved: