Hadoop YARN / YARN-74

nodemanager should cleanup running containers when shutdown


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.23.3
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

      Description

      Currently the nodemanager doesn't clean up running containers when it is restarted, so containers can get lost and stick around forever. We've seen this happen multiple times when the RM is restarted: when the RM comes back up, it doesn't know what was running on the cluster, so it tells the NMs to reboot, and when an NM reboots it loses track of what it had running. Any containers that are behaving badly are then left with no one who knows about them and can kill them.

      We should try to kill any running containers when the nodemanager is shutting down. We should also check for leftover containers when the nodemanager is brought back up, but that will be a separate JIRA.

      This might change somewhat once RM restart is implemented, if tasks can actually survive the RM/NM being rebooted, but that can be addressed at that point.
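      A minimal sketch of the shutdown cleanup proposed above, assuming each container is a local child process that the nodemanager launched in its own process group and tracks by its top-level process. On shutdown it sends SIGTERM to every container that is still running, waits a short grace period, then sends SIGKILL to whatever is left. `ContainerTracker`, `launch`, `cleanup_on_shutdown` and `grace_seconds` are hypothetical names for illustration, not the nodemanager's actual classes or configuration, and the code is POSIX-only.

```python
import os
import signal
import subprocess
import sys
import time


class ContainerTracker:
    """Hypothetical record of the containers a node manager has launched.

    Each container is modelled as a local child process started in its own
    process group. None of these names come from the Hadoop YARN codebase.
    """

    def __init__(self):
        self.running = {}  # container id -> subprocess.Popen

    def launch(self, container_id, argv):
        # start_new_session=True calls setsid() in the child, so the child
        # leads a new process group and its whole tree can be signalled.
        proc = subprocess.Popen(argv, start_new_session=True)
        self.running[container_id] = proc
        return proc

    @staticmethod
    def _signal_group(proc, sig):
        try:
            os.killpg(proc.pid, sig)
        except ProcessLookupError:
            pass  # the process group has already gone away

    def cleanup_on_shutdown(self, grace_seconds=5.0):
        """Kill every container still running; return ids that needed SIGKILL."""
        # 1. Ask politely: SIGTERM to each live container's process group.
        for proc in self.running.values():
            if proc.poll() is None:
                self._signal_group(proc, signal.SIGTERM)

        # 2. Give all containers one shared grace period to exit on their own.
        deadline = time.monotonic() + grace_seconds
        for proc in self.running.values():
            try:
                proc.wait(timeout=max(deadline - time.monotonic(), 0))
            except subprocess.TimeoutExpired:
                pass

        # 3. Force-kill any container whose main process is still alive,
        #    so it cannot outlive the node manager.
        force_killed = []
        for container_id, proc in self.running.items():
            if proc.poll() is None:
                self._signal_group(proc, signal.SIGKILL)
                proc.wait()
                force_killed.append(container_id)

        self.running.clear()
        return force_killed


if __name__ == "__main__":
    tracker = ContainerTracker()
    tracker.launch("container_1", [sys.executable, "-c", "import time; time.sleep(60)"])
    time.sleep(0.5)
    print(tracker.cleanup_on_shutdown(grace_seconds=1.0))
```

      Sending SIGTERM first gives well-behaved tasks a chance to exit cleanly; the SIGKILL pass is what stops a misbehaving container that ignores SIGTERM from outliving the nodemanager, which is the failure described above.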


              People

              • Assignee: Unassigned
              • Reporter: Thomas Graves (tgraves)
              • Votes: 0
              • Watchers: 9
