Hadoop HDFS / HDFS-16726

There is a memory-related problem with the HDFS NameNode


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: hdfs, namenode
    • Labels: None

    Description

      In our cluster, the memory usage of the NameNode exceeds the Xmx setting (Xmx = 280 GB): the actual memory usage of the NameNode process is 479 GB.

      Output via pmap:

             Address Perm   Offset Device    Inode      Size       Rss       Pss Referenced Anonymous Swap Locked Mapping
        2b42f0000000 rw-p 00000000  00:00        0 294174720 293756960 293756960  293756960 293756960    0      0 
            01e21000 rw-p 00000000  00:00        0 195245456 195240848 195240848  195240848 195240848    0      0 [heap]
        2b897c000000 rw-p 00000000  00:00        0   9246724   9246724   9246724    9246724   9246724    0      0 
        2b8bb0905000 rw-p 00000000  00:00        0   1781124   1754572   1754572    1754572   1754572    0      0 
        2b8936000000 rw-p 00000000  00:00        0   1146880   1002084   1002084    1002084   1002084    0      0 
        2b42db652000 rwxp 00000000  00:00        0     57792     55252     55252      55252     55252    0      0 
        2b42ec12a000 rw-p 00000000  00:00        0     25696     24700     24700      24700     24700    0      0 
        2b42ef25b000 rw-p 00000000  00:00        0      9988      8972      8972       8972      8972    0      0 
        2b8c1d467000 rw-p 00000000  00:00        0      9216      8204      8204       8204      8204    0      0 
        2b8d6f8db000 rw-p 00000000  00:00        0      7160      6228      6228       6228      6228    0      0 

      The first mapping (~280 GB) corresponds to the memory reserved for the Java heap (Xmx), while the [heap] segment (~186 GB resident) is the native malloc heap and is unusually large, so a native memory leak is suspected.

       

      • [heap] is associated with malloc
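
      Native allocations made through malloc do not count against Xmx, which is why the process RSS can exceed the configured heap. As a minimal sketch (not taken from the issue; Linux-only, since it reads /proc/self/status), the gap can be confirmed from inside the JVM by comparing Runtime.maxMemory() with the process VmRSS:

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;
        import java.util.List;

        public class RssVsXmx {
            public static void main(String[] args) throws IOException {
                // The JVM's own view of the heap limit (-Xmx).
                long xmxBytes = Runtime.getRuntime().maxMemory();

                // The OS view of the whole process: resident set size in kB.
                long rssKb = -1;
                List<String> status = Files.readAllLines(Paths.get("/proc/self/status"));
                for (String line : status) {
                    if (line.startsWith("VmRSS:")) {
                        rssKb = Long.parseLong(line.replaceAll("[^0-9]", ""));
                    }
                }

                // A large gap between the two numbers is native (off-heap) memory:
                // malloc arenas, thread stacks, GC structures, zlib buffers, etc.
                System.out.printf("Xmx (maxMemory) = %d MB, VmRSS = %d MB%n",
                        xmxBytes >> 20, rssKb >> 10);
            }
        }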

      After enabling Native Memory Tracking in the test environment and inspecting it with jcmd, we found that the malloc portion of the Internal section grew significantly while the client was writing a gz file (Xmx = 40 GB in the test environment; the Internal section was about 900 MB before the client started writing):

      Total: reserved=47276MB, committed=47070MB

      •                 Java Heap (reserved=40960MB, committed=40960MB)
                                    (mmap: reserved=40960MB, committed=40960MB) 
         
      •                     Class (reserved=53MB, committed=52MB)
                                    (classes #7423)
                                    (malloc=1MB #17053) 
                                    (mmap: reserved=52MB, committed=52MB) 
         
      •                    Thread (reserved=2145MB, committed=2145MB)
                                    (thread #2129)
                                    (stack: reserved=2136MB, committed=2136MB)
                                    (malloc=7MB #10673) 
                                    (arena=2MB #4256)
         
      •                      Code (reserved=251MB, committed=45MB)
                                    (malloc=7MB #10661) 
                                    (mmap: reserved=244MB, committed=38MB) 
         
      •                        GC (reserved=2307MB, committed=2307MB)
                                    (malloc=755MB #525664) 
                                    (mmap: reserved=1552MB, committed=1552MB) 
         
      •                  Compiler (reserved=8MB, committed=8MB)
                                    (malloc=8MB #8852) 
         
      •                  Internal (reserved=1524MB, committed=1524MB)
                                    (malloc=1524MB #323482) 
         
      •                    Symbol (reserved=12MB, committed=12MB)
                                    (malloc=10MB #91715) 
                                    (arena=2MB #1)
         
      •    Native Memory Tracking (reserved=16MB, committed=16MB)
                                    (tracking overhead=15MB)

      The Internal malloc clearly increases significantly while the client is writing, and it does not decrease after the client stops writing.

       

      Through perf, I also found hits on the following libzip symbols while the client was writing:

      Children      Self  Comm  Shared Ob  Symbol                                                                                                                                                                     
           0.05%     0.00%  java  libzip.so  [.] Java_java_util_zip_ZipFile_getEntry
           0.02%     0.00%  java  libzip.so  [.] Java_java_util_zip_Inflater_inflateBytes

      Therefore, it is suspected that the client's compressed (zlib/gzip) write path has a native memory leak.
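
      For illustration only (this sketch is not taken from the NameNode or client code), java.util.zip.Inflater holds native zlib state allocated with malloc outside the Java heap; if end() is not called, that memory is only reclaimed when the Inflater is eventually finalized, so under light GC pressure the process RSS keeps growing while the Java heap stays flat:

        import java.util.zip.DataFormatException;
        import java.util.zip.Deflater;
        import java.util.zip.Inflater;

        public class InflaterNativeMemoryDemo {
            public static void main(String[] args) throws DataFormatException {
                // Compress a small payload once so there is something to inflate.
                byte[] input = new byte[64 * 1024];
                Deflater deflater = new Deflater();
                deflater.setInput(input);
                deflater.finish();
                byte[] compressed = new byte[input.length];
                int compressedLen = deflater.deflate(compressed);
                deflater.end();

                // Each Inflater allocates native zlib state (malloc, not Java heap).
                for (int i = 0; i < 100_000; i++) {
                    Inflater inflater = new Inflater();
                    inflater.setInput(compressed, 0, compressedLen);
                    byte[] out = new byte[input.length];
                    inflater.inflate(out);
                    // Releasing the native buffers eagerly is the fix; dropping this
                    // call makes the process RSS climb until a full GC finalizes the
                    // unreachable Inflater objects.
                    inflater.end();
                }
            }
        }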

       

      Using jcmd to take a thread dump and locate the call path that reaches Java_java_util_zip_Inflater_inflateBytes:

      "ExtensionRefresher" #59 daemon prio=5 os_prio=0 tid=0x000000002419d000 nid=0x69df runnable [0x00002b319d7a0000]
         java.lang.Thread.State: RUNNABLE
              at java.util.zip.Inflater.inflateBytes(Native Method)
              at java.util.zip.Inflater.inflate(Inflater.java:259)
              - locked <0x00002b278f7b9da8> (a java.util.zip.ZStreamRef)
              at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
              at java.io.FilterInputStream.read(FilterInputStream.java:133)
              at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
              at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
              at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
              at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
              at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
              at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
              at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
              at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
              at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
              at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
              at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2594)
              at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2582)
              at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2656)
              at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2606)
              at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2519)
              - locked <0x00002b3114eb4a98> (a org.apache.hadoop.conf.Configuration)
              at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
              at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
              at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1546)
              at org.apache.hadoop.util.WhiteListFileManager.refresh(WhiteListFileManager.java:176)
              - locked <0x00002b2d6fe06a28> (a java.lang.Class for org.apache.hadoop.util.WhiteListFileManager)
              at org.apache.hadoop.util.ExtensionManager$2.run(ExtensionManager.java:70)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
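
      The stack shows what appears to be a periodic refresh task repeatedly reading Configuration, and Configuration.loadResource is parsing the configuration XML through an InflaterInputStream, i.e., the resource is decompressed via native zlib while it is parsed. As a hedged illustration (the path and entry name below are hypothetical, not from the issue), the safe pattern when reading such a resource out of a jar/zip is to close both the entry stream and the ZipFile, so the underlying native Inflater is released promptly instead of waiting for finalization:

        import java.io.ByteArrayOutputStream;
        import java.io.InputStream;
        import java.util.zip.ZipEntry;
        import java.util.zip.ZipFile;

        public class ZipResourceRead {
            // Reads one entry from a zip/jar. The InputStream returned by
            // ZipFile.getInputStream is backed by a native Inflater; closing the
            // stream and the ZipFile frees that native memory eagerly, which matters
            // on a large-heap NameNode where full GCs (and finalization) are rare.
            static byte[] readEntry(String jarPath, String entryName) throws Exception {
                try (ZipFile zip = new ZipFile(jarPath)) {
                    ZipEntry entry = zip.getEntry(entryName);
                    try (InputStream in = zip.getInputStream(entry)) {
                        ByteArrayOutputStream buf = new ByteArrayOutputStream();
                        byte[] chunk = new byte[8192];
                        int n;
                        while ((n = in.read(chunk)) != -1) {
                            buf.write(chunk, 0, n);
                        }
                        return buf.toByteArray();
                    }
                }
            }
        }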

People

    Assignee: Unassigned
    Reporter: Yanlei Yu (yuyanlei)
    Votes: 0
    Watchers: 2
