Hadoop Common / HADOOP-17209

Erasure Coding: Native library memory leak


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0, 3.2.1, 3.1.3
    • Fix Version/s: 3.2.2, 3.3.1, 3.4.0
    • Component/s: native
    • Labels: None

    Description

      We run both apache-hadoop-3.1.3 and CDH-6.1.1-1.cdh6.1.1.p0.875250 HDFS in production, and on both of them the memory usage grows beyond the -Xmx value.

      We use an erasure coding (EC) policy to save storage costs.

      These are the JVM options:

      -Dproc_datanode -Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true -Xms8589934592 -Xmx8589934592 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError ...

      The maximum JVM heap size is 8 GB, but the DataNode RSS is 48 GB. All the other DataNodes in this HDFS cluster have the same issue.

      PID    USER PR NI VIRT  RES SHR  S %CPU %MEM TIME+    COMMAND
      226044 hdfs 20 0  50.6g 48g 4780 S 90.5 77.0 14728:27 /usr/java/jdk1.8.0_162/bin/java -Dproc_datanode
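
      To make the gap concrete, here is a minimal sketch that compares the -Xmx value from the JVM options with the RES column from top. The function names are hypothetical and only for illustration, and it assumes top's default scaling (a bare number is KiB, and k/m/g suffixes are binary multiples).

```python
import re

# Binary multipliers for the size suffixes used by JVM flags and top(1).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_xmx(jvm_options: str) -> int:
    """Return the -Xmx value in bytes (a bare number is already bytes)."""
    match = re.search(r"-Xmx(\d+)([kKmMgGtT]?)(?=\s|$)", jvm_options)
    if match is None:
        raise ValueError("no -Xmx option found")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def parse_top_size(field: str) -> int:
    """Return a top(1) memory field in bytes (assumed: bare number is KiB)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kKmMgGtT]?)", field)
    if match is None:
        raise ValueError(f"unrecognised size: {field!r}")
    unit = match.group(2).lower() or "k"
    return int(float(match.group(1)) * _UNITS[unit])


def resident_beyond_heap(jvm_options: str, top_res: str) -> int:
    """Return how many bytes of resident memory exceed the heap limit."""
    return parse_top_size(top_res) - parse_xmx(jvm_options)


if __name__ == "__main__":
    options = "-Xms8589934592 -Xmx8589934592 -XX:+UseParNewGC"
    gap = resident_beyond_heap(options, "48g")
    print(f"resident memory beyond -Xmx: {gap / 1024**3:.1f} GiB")
```

      With the values from this report, about 40 GiB of resident memory sits outside the Java heap. Some non-heap usage is normal for any JVM (metaspace, thread stacks, direct buffers), so the figure is an upper bound on what a native leak could account for, not a measurement of it.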

      This excessive memory use either makes the machine unresponsive (if swap is enabled) or triggers the oom-killer.

      Attachments

        1. datanode.202137.detail_diff.5.txt (18 kB, Sean Chow)
        2. HADOOP-17209.001.patch (2 kB, Sean Chow)
        3. image-2020-08-15-18-26-44-744.png (14 kB, Sean Chow)
        4. image-2020-08-20-12-35-39-906.png (29 kB, Sean Chow)

          People

            Assignee: Sean Chow (seanlook)
            Reporter: Sean Chow (seanlook)
            Votes: 0
            Watchers: 11

            Dates

              Created:
              Updated:
              Resolved:
