Flume / FLUME-473

Master by default uses excessive virtual memory despite only having a small amount of memory resident.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: v0.9.4
    • Fix Version/s: v0.9.5
    • Component/s: None
    • Labels: None
    • Environment: Linux

      Description

      By default the JVM allows a process to take a percentage of physical memory. In one particular case, 6GB of virtual memory is allocated although <200MB of memory is actually used by the watchdog and master Java processes. For processes with low memory requirements we should set -Xmx to a reasonable value (256m?).

        Activity

        Jonathan Hsieh created issue -
        Disabled imported user added a comment -

        Good call, I agree.

        How do you measure the working set size of the master? (i.e. if 6GB is being allocated, how do we know that only 256MB is actually required and the rest is garbage?)

        Jonathan Hsieh added a comment -

        top/ps reports that 6GB of virtual memory is allocated (page tables, not physical memory). My guess is that over time these Java processes will write to the 6GB worth of addresses, which then get GC'ed. Once this happens, only a little stays resident while the rest of the pages remain allocated.

        The suspicion is that if there are many long-running Java processes (say a watchdog, a Flume master, a task tracker, a data node, a ZK node, etc.), Linux will reach a virtual memory over-subscription limit and prevent other JVMs or applications from allocating memory.

        Jonathan Hsieh added a comment -

        Workaround/solution:

        You can limit the amount of memory these processes use by using the standard -Xmx setting. For the master 250MB should be more than sufficient (there is substantial padding in this number, you could probably go lower).

        If you are using the bin/flume script to start things, you could set the UOPT environment variable to have a setting like "-Xmx256m".

        If you are using the daemon script from the rpm packages, you just need to make sure the UOPT env variable is set before kicking off the service.
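
        For example, a minimal sketch of the above (assuming bin/flume and the rpm daemon scripts pass UOPT through to the JVM as described; the service name below is hypothetical):

          # one-off start via the bin/flume script
          UOPT="-Xmx256m" bin/flume master

          # for the rpm daemon scripts, export UOPT before starting the service
          export UOPT="-Xmx256m"
          service flume-master start   # hypothetical service name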

        Jonathan Hsieh made changes -
        Fix Version/s: (none) → v0.9.4 [ 10050 ]
        flume_mewmewball added a comment -

        Setting the UOPTS doesn't work because the parent FlumeWatchdog doesn't take UOPTS in bin/flume's MASTER_WATCHDOG var.

        Jonathan Hsieh added a comment -

        Even after setting a lower -Xmx256m value for the watchdog, about 2GB of virtual memory space is reported (despite only 40MB resident). Using pmap shows 34 "anon" ~64MB chunks, which is about 2GB. Trying to figure out what this corresponds to.
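
        For reference, the kind of inspection used here (pid is a placeholder; exact column layout varies by pmap version):

          pmap -x <pid>                        # per-mapping detail; look for repeated ~64MB "anon" blocks
          pmap -x <pid> | sort -k2 -n | tail   # same data, largest mappings last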

        Jonathan Hsieh added a comment -

        http://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used

        Jonathan Hsieh added a comment -

        I have a feeling this has to do with thread stacks being larger on larger machines.

        On a large system, each Java Flume process claims 6GB of virtual memory space (about 2/5 of physical memory).
        On a "wimpier" machine, each Java Flume process claims 1.1GB of virtual memory space (about 2/5 of physical memory plus the 256MB specified by -Xmx).

        Jonathan Hsieh added a comment (edited) -

        Using pmap on the large machine, the process allocates large 64MB virtual memory blocks. The number of these roughly corresponds to the number of Java threads (as seen in a thread dump via ctrl-\). When I force the stack size to 512k per thread (-Xss512k), memory usage is significantly smaller.

        This size should be sufficient for a small, limited process like the watchdog. Doing more experimentation for other processes.

        http://stackoverflow.com/questions/1492711/java-what-is-the-rough-cost-of-a-thread-cpu-cycles-memory
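
        As a concrete illustration of the flags being experimented with here, combined with the earlier UOPT workaround (values are the ones under test, not a recommendation):

          # capped heap plus smaller per-thread stacks for the watchdog/master
          export UOPT="-Xmx256m -Xss512k"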

        Disabled imported user added a comment -

        64MB chunks sounds suspiciously like the new per-thread arenas in glibc's malloc. What are the operating systems and glibc versions for the large and small systems you tested on?
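
        If the per-thread malloc arenas are indeed the culprit, newer glibc versions expose a MALLOC_ARENA_MAX environment variable that caps the number of arenas. A minimal sketch, assuming glibc >= 2.10:

          # cap glibc malloc arenas before launching the JVM
          export MALLOC_ARENA_MAX=4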

        Jonathan Hsieh added a comment -

        Sun/Oracle says that 1MB is a reasonable default for stack size. Will be adding this to the various processes.
        http://www.oracle.com/technetwork/java/hotspotfaq-138619.html

        Jonathan Hsieh added a comment -

        Will post a code review; debate is likely to ensue.

        Jonathan Hsieh added a comment -

        The main trigger here was that on a testing cluster, one machine ran a task tracker alongside a Flume master with its watchdog. Occasionally the task tracker would be unable to launch tasks. The root cause seems to have been excessive virtual memory subscription: the Flume master and watchdog would each use 6GB of virtual memory while having only 50-150MB resident and in the working set. This appears to have exhausted the amount of virtual memory the Linux machine was willing to parcel out.

        There are two causes: the default max heap size (likely around 4GB) and the default max stack size per thread (seems to be 64MB/thread). One solution is to add more environment variables that can be overridden to allow different settings for masters/nodes/watchdog, as sketched below.
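
        A minimal sketch of what such overridable settings could look like in a bin/flume-style wrapper (the variable names are hypothetical, not the ones ultimately committed):

          # defaults keep heap and per-thread stacks small; callers can override via the environment
          MASTER_JVM_OPTS=${MASTER_JVM_OPTS:-"-Xmx256m -Xss1m"}
          NODE_JVM_OPTS=${NODE_JVM_OPTS:-"-Xmx256m -Xss1m"}
          WATCHDOG_JVM_OPTS=${WATCHDOG_JVM_OPTS:-"-Xmx64m -Xss512k"}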

        Jonathan Hsieh made changes -
        Status: Open [ 1 ] → Patch Available [ 10000 ]
        Jonathan Hsieh made changes -
        Assignee: Jonathan Hsieh [ jmhsieh ]
        Jonathan Hsieh made changes -
        Status: Patch Available [ 10000 ] → Open [ 1 ]
        Jonathan Hsieh made changes -
        Status: Open [ 1 ] → Patch Available [ 10000 ]
        Jonathan Hsieh added a comment -

        Review (and debate) here: https://review.cloudera.org/r/1601/

        Jonathan Hsieh made changes -
        Attachment 0001-FLUME-473-Master-by-default-uses-excessive-virtual-m.patch [ 10537 ]
        Jonathan Hsieh added a comment -

        Henry,

        Interesting. I think that may be it. The large machines are RHEL 6.0 with /lib/libc-2.12.so (glibc 2.12 docs present on the system).

        The wimpy machine is Ubuntu Maverick – it definitely had smaller memory footprints.

        Will revisit in a few days.

        Jon.

        Jonathan Hsieh made changes -
        Fix Version/s: v0.9.4 [ 10050 ] → v0.9.5 [ 10090 ]
        Mark Thomas made changes -
        Project Import Tue Aug 02 16:57:12 UTC 2011 [ 1312304232406 ]
        Eric Hauser added a comment -

        Hi,

        We were given a box by our operations team that has way too much RAM since "it was available". I can confirm that there is excessive virtual memory use by the watchdog process on Ubuntu 10.02. We are launching the process with the following:

        java -Dflume.log.dir=/var/log/flume -Dflume.log.file=flume-flume-node-xtinclc01.log -Dflume.root.logger=INFO,DRFA,ZENOSS -Dzookeeper.root.logger=INFO,zookeeper,ZENOSS -Dwatchdog.root.logger=INFO,watchdog,ZENOSS -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 -Dpid=4973 -Dpidfile=/tmp/flumenode-4973.pid com.cloudera.flume.watchdog.FlumeWatchdog java -Dflume.log.dir=/var/log/flume -Dflume.log.file=flume-flume-node-xtinclc01.log -Dflume.root.logger=INFO,DRFA,ZENOSS -Dzookeeper.root.logger=INFO,zookeeper,ZENOSS -Dwatchdog.root.logger=INFO,watchdog,ZENOSS -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 -Xmx2g -Xms1g -XX:NewSize=64m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=11112 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false com.cloudera.flume.agent.FlumeNode

        The process is using 30.3GB of virtual memory. I'm attaching the output of pmap and a thread dump against the process. With the GC settings we have given it, it appears the JVM is allocating 17 threads for GC. Seems like we may want to turn off parallel GC.
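
        If the parallel collectors are the source of the extra threads, standard HotSpot flags can cap the worker count or swap the collector entirely; a sketch added to the launch command above, not a tested recommendation:

          -XX:ParallelGCThreads=4   # cap parallel GC worker threads
          -XX:+UseSerialGC          # or drop the parallel collectors altogether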

        Eric Hauser made changes -
        Attachment watchdog_pmap.txt [ 12497733 ]
        Eric Hauser made changes -
        Attachment watchdog_threaddump.txt [ 12497734 ]
        Arvind Prabhakar made changes -
        Affects Version/s v0.9.4 [ 12317557 ]
        Jonathan Hsieh made changes -
        Assignee Jonathan Hsieh [ jmhsieh ]
        Ashish Paliwal added a comment -

        Won't fix; the 0.x branch is not maintained anymore.

        Ashish Paliwal made changes -
        Status: Patch Available [ 10002 ] → Resolved [ 5 ]
        Resolution: (none) → Won't Fix [ 2 ]
        Transition                    Time In Source Status   Execution Times   Last Executer     Last Execution Date
        Blocked → Open                6s                      1                 Jonathan Hsieh    23/Feb/11 19:55
        Open → Blocked                41d 5m                  2                 Jonathan Hsieh    23/Feb/11 19:55
        Patch Available → Resolved    1350d 13h 29m           1                 Ashish Paliwal    05/Nov/14 09:25

          People

          • Assignee: Unassigned
          • Reporter: Jonathan Hsieh
          • Votes: 3
          • Watchers: 6

            Dates

            • Created:
              Updated:
              Resolved:
