Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Critical Critical
    • Resolution: Unresolved
    • Affects Version/s: v1.2.0, v1.3.0, v1.4.0
    • Fix Version/s: None
    • Component/s: None
    • Release Note:
      HDFS Sink memory usage best practices.

      Description

      We run a 3 node/1 collector test cluster pushing about 350events/sec per node... Not really high stress, but just something to evaluate flume with.

      Consistently our collector has been dying because of an OOMError killing the SinkRunner after running for about 30-40 hours(seems pretty consistent as we've had it 3 times now).

      Suspected cause would be a memory leak somewhere in HdfsSink. The feeder nodes which run AvroSink instead of HdfsSink have been up and running for about a week without restarts.

      flume-load/act-wap02/2012-06-26-17.1340697637324.tmp, packetSize=65557, chunksPerPacket=127, bytesCurBlock=29731328
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      2012-06-26 17:12:56,080 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:411)] process failed
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      at java.util.Arrays.copyOfRange(Arrays.java:3209)
      at java.lang.String.<init>(String.java:215)
      at java.lang.StringBuilder.toString(StringBuilder.java:430)
      at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:306)
      at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:367)
      at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
      at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
      at java.lang.Thread.run(Thread.java:619)
      Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.OutOfMemoryError: GC overhead limit exceeded
      at java.util.Arrays.copyOfRange(Arrays.java:3209)
      at java.lang.String.<init>(String.java:215)
      at java.lang.StringBuilder.toString(StringBuilder.java:430)
      at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:306)
      at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:367)
      at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
      at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
      at java.lang.Thread.run(Thread.java:619)

      1. FLUME-1326.patch
        1 kB
        dave sinclair

        Activity

        Juhani Connolly created issue -
        thinker0 made changes -
        Field Original Value New Value
        Comment [ {panel}
        JAVA_OPTS="-server -Xms100m -Xmx200m"
        JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
        JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
        {panel}

        {code}

        2012-07-06T22:27:00.413+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 198009K(198016K), 0.0350390 secs] [Times: user=0.03 sys=0.00, real=0.04 secs]
        2012-07-06T22:27:00.449+0900: [Full GC [CMS2012-07-06T22:27:00.608+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.16 secs]
         (concurrent mode failure): 136575K->136575K(136576K), 0.5808260 secs] 198014K->197895K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.5809580 secs] [Times: user=0.58 sys=0.00, real=0.58 secs]
        2012-07-06T22:27:01.030+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 197897K(198016K), 0.0354160 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]
        Heap
         par new generation total 61440K, used 61435K [0x00000000ee600000, 0x00000000f28a0000, 0x00000000f28a0000)
          eden space 54656K, 100% used [0x00000000ee600000, 0x00000000f1b60000, 0x00000000f1b60000)
          from space 6784K, 99% used [0x00000000f2200000, 0x00000000f289efa8, 0x00000000f28a0000)
          to space 6784K, 0% used [0x00000000f1b60000, 0x00000000f1b60000, 0x00000000f2200000)
         concurrent mark-sweep generation total 136576K, used 136575K [0x00000000f28a0000, 0x00000000fae00000, 0x00000000fae00000)
         concurrent-mark-sweep perm gen total 34192K, used 20375K [0x00000000fae00000, 0x00000000fcf64000, 0x0000000100000000)

        2012-07-06T22:27:01.069+0900: [Full GC [CMS2012-07-06T22:27:01.226+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.17 secs]
         (concurrent mode failure): 136575K->136575K(136576K), 0.5796240 secs] 198013K->197886K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.5797590 secs] [Times: user=0.58 sys=0.00, real=0.58 secs]
        2012-07-06T22:27:01.650+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 197967K(198016K), 0.0351150 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]
        2012-07-06T22:27:01.685+0900: [Full GC [CMS2012-07-06T22:27:01.845+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.16 secs]
         (concurrent mode failure): 136575K->136575K(136576K), 0.6167080 secs] 198008K->197894K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.6170540 secs] [Times: user=0.62 sys=0.00, real=0.62 secs]
        2012-07-06T22:27:02.303+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 197918K(198016K), 0.0352080 secs] [Times: user=0.03 sys=0.00, real=0.04 secs]
        2012-07-06T22:27:02.339+0900: [Full GC [CMS2012-07-06T22:27:02.498+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.16 secs]
         (concurrent mode failure): 136575K->136575K(136576K), 0.5811630 secs] 198014K->197913K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.5812980 secs] [Times: user=0.58 sys=0.00, real=0.58 secs]

        ....



        2012-07-06T22:45:21.899+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 198883K(201920K), 0.0150690 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
        Heap
         par new generation total 26368K, used 25104K [0x00000000ee600000, 0x00000000f0290000, 0x00000000f0290000)
          eden space 23488K, 100% used [0x00000000ee600000, 0x00000000efcf0000, 0x00000000efcf0000)
          from space 2880K, 56% used [0x00000000effc0000, 0x00000000f01543d0, 0x00000000f0290000)
          to space 2880K, 0% used [0x00000000efcf0000, 0x00000000efcf0000, 0x00000000effc0000)
         concurrent mark-sweep generation total 175552K, used 175551K [0x00000000f0290000, 0x00000000fae00000, 0x00000000fae00000)
         concurrent-mark-sweep perm gen total 34104K, used 20352K [0x00000000fae00000, 0x00000000fcf4e000, 0x0000000100000000)

        2012-07-06T22:45:21.923+0900: [Full GC [CMS2012-07-06T22:45:22.120+0900: [CMS-concurrent-mark: 0.205/0.206 secs] [Times: user=0.22 sys=0.01, real=0.20 secs]
         (concurrent mode failure): 175551K->175551K(175552K), 0.7612190 secs] 201913K->198933K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7613540 secs] [Times: user=0.76 sys=0.00, real=0.77 secs]
        2012-07-06T22:45:22.685+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199035K(201920K), 0.0151000 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
        2012-07-06T22:45:22.708+0900: [Full GC [CMS2012-07-06T22:45:22.906+0900: [CMS-concurrent-mark: 0.204/0.205 secs] [Times: user=0.23 sys=0.00, real=0.21 secs]
         (concurrent mode failure): 175551K->175551K(175552K), 0.7694170 secs] 201919K->199083K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7695470 secs] [Times: user=0.77 sys=0.00, real=0.77 secs]
        2012-07-06T22:45:23.483+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 200691K(201920K), 0.0157280 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
        2012-07-06T22:45:23.502+0900: [Full GC [CMS2012-07-06T22:45:23.716+0900: [CMS-concurrent-mark: 0.216/0.217 secs] [Times: user=0.22 sys=0.00, real=0.22 secs]
         (concurrent mode failure): 175552K->175551K(175552K), 0.7632440 secs] 201920K->199203K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7633760 secs] [Times: user=0.76 sys=0.00, real=0.77 secs]
        2012-07-06T22:45:24.268+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199625K(201920K), 0.0156730 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
        2012-07-06T22:45:24.295+0900: [Full GC [CMS2012-07-06T22:45:24.490+0900: [CMS-concurrent-mark: 0.205/0.206 secs] [Times: user=0.22 sys=0.00, real=0.21 secs]
         (concurrent mode failure): 175551K->175551K(175552K), 0.7598450 secs] 201919K->199192K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7599740 secs] [Times: user=0.76 sys=0.00, real=0.76 secs]
        2012-07-06T22:45:25.058+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199986K(201920K), 0.0155300 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
        2012-07-06T22:45:25.080+0900: [Full GC [CMS2012-07-06T22:45:25.279+0900: [CMS-concurrent-mark: 0.205/0.205 secs] [Times: user=0.22 sys=0.00, real=0.21 secs]
         (concurrent mode failure): 175551K->175551K(175552K), 0.7627070 secs] 201913K->199371K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7628390 secs] [Times: user=0.77 sys=0.00, real=0.76 secs]
        2012-07-06T22:45:25.844+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199571K(201920K), 0.0154090 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
        2012-07-06T22:45:25.868+0900: [Full GC [CMS2012-07-06T22:45:26.065+0900: [CMS-concurrent-mark: 0.205/0.205 secs] [Times: user=0.22 sys=0.01, real=0.21 secs]
         (concurrent mode failure): 175551K->175551K(175552K), 0.7624890 secs] 201912K->199463K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7626150 secs] [Times: user=0.77 sys=0.00, real=0.76 secs]
        {code} ]
        dave sinclair made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Release Note HDFS Sink memory usage best practices.
        Affects Version/s v1.4.0 [ 12323372 ]
        Affects Version/s v1.3.0 [ 12322140 ]
        Labels hdfssink memory_leak
        dave sinclair made changes -
        Attachment FLUME-1326.patch [ 12627364 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Juhani Connolly
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:

              Development