FLUME-2731: Flume agent throws OutOfMemoryError during load tests

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Node
    • Labels: None

    Description

      The Flume agent throws an OutOfMemoryError during load tests:

      2015-06-29 15:30:24,590 (New I/O worker #4) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.
      java.lang.OutOfMemoryError: Java heap space
              at java.util.HashMap.<init>(HashMap.java:187)
              at java.util.HashMap.<init>(HashMap.java:199)
              at org.apache.avro.generic.GenericDatumReader.newMap(GenericDatumReader.java:330)
              at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:239)
              at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
              at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
              at org.apache.avro.ipc.Responder.respond(Responder.java:124)
              at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
              at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
              at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
              at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
              at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
              at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
              at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
              at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
              at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
              at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
              at java.lang.Thread.run(Thread.java:695)
      

      The test:

      Each test worker consists of one NettyAvroRpcClient shared by a thread pool of 12 threads. The RPC client instance is recreated whenever isActive returns false. Flume events with a timestamp header and a body of 250 random bytes are submitted continuously. Test workers are started in groups of 20; five groups are started in total, with a 5-second delay between starts.

      The OOM error usually appears in the agent after the first group of 20 has started.
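
A minimal sketch of the worker layout described above, for anyone trying to reproduce it. It uses only the JDK (Java 8 or later): `EventSender` is a hypothetical stand-in for the RPC client, not the Flume API, and the class and method names (`LoadTestSketch`, `Worker`, `rampUp`) are made up for illustration. The real test harness may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class LoadTestSketch {

    /** Hypothetical stand-in for the RPC client; this is not the Flume API. */
    public interface EventSender {
        boolean isActive();
        void send(Map<String, String> headers, byte[] body) throws Exception;
        void close();
    }

    /** One test worker: a single client shared by a fixed pool of sender threads. */
    public static final class Worker {
        private final Supplier<EventSender> factory;
        private final ExecutorService pool;
        private final int threads;
        private EventSender client;               // guarded by "this"
        private volatile boolean running = true;

        Worker(Supplier<EventSender> factory, int threads) {
            this.factory = factory;
            this.threads = threads;
            this.pool = Executors.newFixedThreadPool(threads);
            this.client = factory.get();
        }

        void start() {
            for (int i = 0; i < threads; i++) {
                pool.execute(this::sendLoop);
            }
        }

        /** Returns the shared client, replacing it once it reports inactive. */
        private synchronized EventSender activeClient() {
            if (!client.isActive()) {
                client.close();
                client = factory.get();
            }
            return client;
        }

        /** Submits events continuously: a timestamp header plus 250 random bytes. */
        private void sendLoop() {
            Random random = new Random();
            while (running) {
                Map<String, String> headers = new HashMap<>();
                headers.put("timestamp", Long.toString(System.currentTimeMillis()));
                byte[] body = new byte[250];
                random.nextBytes(body);
                try {
                    activeClient().send(headers, body);
                } catch (Exception e) {
                    // Failed sends are ignored here; the loop simply tries again.
                }
            }
        }

        /** Stops the sender threads and closes the current client. */
        public void stop() {
            running = false;
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            synchronized (this) {
                client.close();
            }
        }
    }

    /** Starts the workers group by group, pausing delayMillis between groups. */
    public static List<Worker> rampUp(Supplier<EventSender> factory, int groups,
                                      int workersPerGroup, int threads, long delayMillis) {
        List<Worker> workers = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            for (int w = 0; w < workersPerGroup; w++) {
                Worker worker = new Worker(factory, threads);
                worker.start();
                workers.add(worker);
            }
            if (g < groups - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return workers;
    }
}
```

With the numbers from the description, the call would be `rampUp(factory, 5, 20, 12, 5000)`, which ends with 100 clients and 1200 sender threads against the agent.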

      I took the avro-1.8.0-SNAPSHOT source and added debug logging to the newMap method to record the size of each allocation:

      https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L405-L411

      In most cases the size was 1, but once the OOM errors start, the size is always 640371331.
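
To put that number in perspective, here is a rough estimate of what a HashMap with a requested capacity of 640371331 costs. It assumes the HashMap rounds the requested capacity up to the next power of two (capped at 2^30) and allocates its table in the constructor, which is what the `HashMap.<init>` frames in the trace suggest. The bytes-per-reference figure depends on the JVM (4 with compressed references, 8 without), and array headers and map entries are ignored; `HashMapSizing` is a name made up for this sketch.

```java
public class HashMapSizing {

    /** Largest table size java.util.HashMap uses (MAXIMUM_CAPACITY in the JDK source). */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Smallest power of two that is >= requested, capped at MAXIMUM_CAPACITY. */
    public static int tableCapacity(int requested) {
        if (requested >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    /** Approximate size of the empty table: one reference per slot. */
    public static long tableBytes(int requested, int bytesPerReference) {
        return (long) tableCapacity(requested) * bytesPerReference;
    }

    public static void main(String[] args) {
        int requested = 640371331;  // size logged in newMap when the OOM occurs
        System.out.println("table slots:      " + tableCapacity(requested));  // 1073741824
        System.out.println("bytes at 4 B/ref: " + tableBytes(requested, 4));  // 4294967296
        System.out.println("bytes at 8 B/ref: " + tableBytes(requested, 8));  // 8589934592
    }
}
```

Under these assumptions a single such request needs at least 4 GiB for the empty table alone, so one bad size value is enough to exhaust a typical agent heap.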

      The OOM error occurs more frequently when the connect-timeout and request-timeout are both shorter than 20 seconds.
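
For reference, a sketch of how these two client-side timeouts can be set. It assumes the property keys documented for the Flume client SDK (`client.type`, `hosts`, `hosts.<alias>`, `connect-timeout`, `request-timeout`, with timeouts in milliseconds). The host, port, and 5000 ms values are placeholders, not the values used in the test, and `ClientTimeouts` is a name made up for this sketch.

```java
import java.util.Properties;

public class ClientTimeouts {

    /** Builds client properties with explicit connect and request timeouts. */
    public static Properties clientProperties(String hostAndPort,
                                              long connectTimeoutMs,
                                              long requestTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("client.type", "default");   // default (Avro) RPC client
        props.setProperty("hosts", "h1");              // alias for the single agent
        props.setProperty("hosts.h1", hostAndPort);
        props.setProperty("connect-timeout", Long.toString(connectTimeoutMs));
        props.setProperty("request-timeout", Long.toString(requestTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder endpoint and timeouts, both below the 20-second mark.
        Properties props = clientProperties("agent.example.org:41414", 5000L, 5000L);
        props.list(System.out);
    }
}
```

In a real client, the resulting Properties object would typically be handed to `RpcClientFactory.getInstance(Properties)` from the Flume SDK.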

      This seems to be related to:

      AVRO-1111
      FLUME-1259
      FLUME-1641

            People

              Assignee: Unassigned
              Reporter: Masanobu Horiyama (wildsheep)
              Votes: 1
              Watchers: 3