Hadoop YARN / YARN-8444

NodeResourceMonitor crashes on bad swapFree value


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.3, 3.0.2
    • Fix Version/s: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 2.8.5, 3.0.4
    • None
    • None

    Description

      Saw this on a node that was running out of memory. We can't have NodeResourceMonitor exiting. The system was above 99% memory used at the time, so this is not a common occurrence, but we should fix it, since this monitor is critical to the health of the node.

       

      2018-06-04 14:28:08,539 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 110564 for container-id container_e24_1526662705797_129647_01_004791: 2.1 GB of 3.5 GB physical memory used; 5.0 GB of 7.3 GB virtual memory used
      2018-06-04 14:28:10,622 [Node Resource Monitor] ERROR yarn.YarnUncaughtExceptionHandler: Thread Thread[Node Resource Monitor,5,main] threw an Exception.
      java.lang.NumberFormatException: For input string: "18446744073709551596"
       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
       at java.lang.Long.parseLong(Long.java:592)
       at java.lang.Long.parseLong(Long.java:631)
       at org.apache.hadoop.util.SysInfoLinux.readProcMemInfoFile(SysInfoLinux.java:257)
       at org.apache.hadoop.util.SysInfoLinux.getAvailablePhysicalMemorySize(SysInfoLinux.java:591)
       at org.apache.hadoop.util.SysInfoLinux.getAvailableVirtualMemorySize(SysInfoLinux.java:601)
       at org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getAvailableVirtualMemorySize(ResourceCalculatorPlugin.java:74)
       at org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl$MonitoringThread.run(NodeResourceMonitorImpl.java:193)
      2018-06-04 14:28:30,747 [org.apache.hadoop.util.JvmPauseMonitor$Monitor@226eba67] INFO util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 9330ms
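
The input string in the trace above, 18446744073709551596, equals 2^64 - 20. It is larger than Long.MAX_VALUE, so Long.parseLong rejects it, and read as an unsigned 64-bit value it has the same bit pattern as -20, which suggests the counter had gone slightly negative when it was reported. The following is a minimal sketch of one way to guard the parse, not necessarily what the attached patch does. The class name SwapFreeParseSketch and the helper safeParseLong are hypothetical names used only for illustration, and falling back to 0 is an assumption.

```java
import java.math.BigInteger;

// Illustrative sketch only; not the actual SysInfoLinux code.
class SwapFreeParseSketch {

    // Hypothetical helper: parse a /proc/meminfo value, falling back to 0
    // (an assumed default) when the text is not a valid signed long.
    static long safeParseLong(String value) {
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    public static void main(String[] args) {
        String swapFree = "18446744073709551596"; // value from the log above

        // The value exceeds Long.MAX_VALUE, which is why Long.parseLong throws.
        BigInteger v = new BigInteger(swapFree);
        System.out.println(v.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0);

        // Interpreted as unsigned 64-bit, the bit pattern is that of -20.
        System.out.println(Long.parseUnsignedLong(swapFree));

        // The guarded parse returns the fallback instead of throwing.
        System.out.println(safeParseLong(swapFree));
        System.out.println(safeParseLong("1024"));
    }
}
```

With a guard like this, a bad value in one field no longer propagates as an uncaught exception out of the monitoring thread. The cost is that the affected metric is temporarily reported as the fallback value.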
      

      Attachments

        1. YARN-8444.001.patch
          5 kB
          Jim Brennan


          People

            Assignee: Jim Brennan (jbrennan)
            Reporter: Jim Brennan (jbrennan)