Hadoop YARN / YARN-4852

Resource Manager Ran Out of Memory

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      The Resource Manager ran out of memory (max heap size: 8 GB, CMS GC) and shut itself down.

      Heap dump analysis shows that 1,200 instances of the RMNodeImpl class hold 86% of the heap. Digging deeper, they hold around 0.5 million UpdatedContainerInfo objects (in the nodeUpdateQueue inside RMNodeImpl). These in turn reference around 1.7 million objects of YarnProtos$ContainerIdProto, ContainerStatusProto, ApplicationAttemptIdProto, and ApplicationIdProto, with each class retaining around 1 GB of heap.
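
      To put these figures in proportion, the sketch below works out the average backlog per node and models how a per-node queue grows when nothing drains it. It is a simplified illustration, not YARN source code: the class NodeUpdateBacklog and its methods are hypothetical, the 1.7 million figure is read as a per-class count (an assumption), and a stalled consumer is only one possible explanation that this report does not confirm.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical model of a per-node update queue; not taken from the YARN code base. */
public class NodeUpdateBacklog {

    /** Average number of queued updates per node (integer division). */
    static long updatesPerNode(long queuedUpdates, long nodes) {
        return queuedUpdates / nodes;
    }

    /** Average number of container status objects carried by each queued update. */
    static double statusesPerUpdate(long statusObjects, long queuedUpdates) {
        return (double) statusObjects / queuedUpdates;
    }

    /**
     * Simulates `heartbeats` node heartbeats, each enqueuing one update.
     * The consumer empties the queue every `drainEvery` heartbeats;
     * drainEvery == 0 models a consumer that never runs.
     * Returns the number of updates still queued at the end.
     */
    static int backlogAfter(int heartbeats, int drainEvery) {
        Queue<Integer> nodeUpdateQueue = new ArrayDeque<>();
        for (int t = 1; t <= heartbeats; t++) {
            nodeUpdateQueue.add(t);
            if (drainEvery > 0 && t % drainEvery == 0) {
                nodeUpdateQueue.clear();
            }
        }
        return nodeUpdateQueue.size();
    }

    public static void main(String[] args) {
        // Figures from the heap dump: ~0.5 million updates across 1,200 nodes.
        System.out.println("updates per node:    " + updatesPerNode(500_000, 1_200));
        System.out.println("statuses per update: " + statusesPerUpdate(1_700_000, 500_000));
        // A queue drained on every heartbeat stays empty; an undrained one grows without bound.
        System.out.println("backlog, drained:    " + backlogAfter(400, 1));
        System.out.println("backlog, undrained:  " + backlogAfter(400, 0));
    }
}
```

      Under these assumptions each node holds roughly 400 pending updates on average, which is far more than a queue that is emptied regularly would be expected to hold.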

      Back-to-back full GCs kept occurring, but the GC could not reclaim any heap and the RM went OOM. The JVM dumped the heap before exiting, and that dump is what we analyzed.
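
      The condition described above (collections that free almost nothing) can be observed through the standard java.lang.management API. The following is a minimal sketch that inspects the JVM it runs in; checking a live Resource Manager would require attaching over JMX, which is not shown. The class name PostGcOccupancy, the helper gcNotRecovering, and the 0.95 threshold are illustrative assumptions, not values from this report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reports heap pools that stay nearly full after a collection. */
public class PostGcOccupancy {

    /** True when post-GC usage is at or above `threshold` (a fraction) of the pool maximum. */
    static boolean gcNotRecovering(long usedAfterGc, long max, double threshold) {
        // A max of -1 means the pool has no defined maximum, so no ratio can be computed.
        return max > 0 && (double) usedAfterGc / max >= threshold;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            // Usage measured right after the most recent collection of this pool;
            // null when the pool does not support this measurement.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                continue;
            }
            boolean starved = gcNotRecovering(afterGc.getUsed(), afterGc.getMax(), 0.95);
            System.out.printf("%s: used after last GC = %d bytes, max = %d bytes, not recovering = %b%n",
                    pool.getName(), afterGc.getUsed(), afterGc.getMax(), starved);
        }
    }
}
```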

      The RM's usual heap usage is around 4 GB, but it spiked to 8 GB within 20 minutes and the RM went OOM.

      There was no spike in job submissions or container counts at the time the issue occurred.

      Attachments

        1. threadDump.log
          199 kB
          Gokul

        Activity

          People

            Assignee: Unassigned
            Reporter: Gokul (slukog)
            Votes: 0
            Watchers: 16

            Dates

              Created:
              Updated: