KAFKA-682: java.lang.OutOfMemoryError: Java heap space

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None
    • Environment:

      Description

      git pull (commit 32dae955d5e2e2dd45bddb628cb07c874241d856)

      ...build...

      ./sbt update
      ./sbt package

      ...run...

      bin/zookeeper-server-start.sh config/zookeeper.properties
      bin/kafka-server-start.sh config/server.properties

      ...then configured fluentd with kafka plugin...

      gem install fluentd --no-ri --no-rdoc
      gem install fluent-plugin-kafka
      fluentd -c ./fluent/fluent.conf -vv

      ...then flood fluentd with messages coming in from syslog and going out to Kafka.
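
      For reference, a crude way to generate a comparable flood from a shell (illustrative only; it assumes the local syslog daemon feeds fluentd's syslog input, whereas the original traffic came from a real syslog source):

      # send ~10000 messages of ~1 KB each to the local syslog daemon
      payload=$(printf 'x%.0s' {1..1024})
      for i in $(seq 1 10000); do
        logger -t flood "$payload"
      done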

      This results in the following error (after about 10,000 messages of ~1 KB each within 3 seconds):

      [2013-01-05 02:00:52,087] ERROR Closing socket for /127.0.0.1 because of error (kafka.network.Processor)
      java.lang.OutOfMemoryError: Java heap space
      at kafka.api.ProducerRequest$$anonfun$1$$anonfun$apply$1.apply(ProducerRequest.scala:45)
      at kafka.api.ProducerRequest$$anonfun$1$$anonfun$apply$1.apply(ProducerRequest.scala:42)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.immutable.Range$ByOne$class.foreach(Range.scala:282)
      at scala.collection.immutable.Range$$anon$1.foreach(Range.scala:274)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
      at scala.collection.immutable.Range.map(Range.scala:39)
      at kafka.api.ProducerRequest$$anonfun$1.apply(ProducerRequest.scala:42)
      at kafka.api.ProducerRequest$$anonfun$1.apply(ProducerRequest.scala:38)
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:227)
      at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:227)
      at scala.collection.immutable.Range$ByOne$class.foreach(Range.scala:282)
      at scala.collection.immutable.Range$$anon$1.foreach(Range.scala:274)
      at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:227)
      at scala.collection.immutable.Range.flatMap(Range.scala:39)
      at kafka.api.ProducerRequest$.readFrom(ProducerRequest.scala:38)
      at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:32)
      at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:32)
      at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:47)
      at kafka.network.Processor.read(SocketServer.scala:298)
      at kafka.network.Processor.run(SocketServer.scala:209)
      at java.lang.Thread.run(Thread.java:722)

      Attachments

      1. java_pid22281.hprof.gz (2.93 MB, attached by Ricky Ng-Adam)
      2. java_pid22281_Leak_Suspects.zip (65 kB, attached by Ricky Ng-Adam)

        Activity

        Joel Koshy added a comment -

        You might need to increase your heap size. What do you have it set to right now? Would you be able to run the broker with -XX:+HeapDumpOnOutOfMemoryError to get a heap-dump?
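
        For reference, one way to do that with this build is to add the flag to the JVM options that bin/kafka-run-class.sh already sets (a sketch; the exact placement depends on local edits to that script):

        # in bin/kafka-run-class.sh, after the existing KAFKA_OPTS assignment
        KAFKA_OPTS="$KAFKA_OPTS -XX:+HeapDumpOnOutOfMemoryError"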

        In case you are overriding defaults - what's the replication factor for the topic, num-required-acks for the producer requests, and producer request timeout? Are any requests going through or are the produce requests expiring?

        Jun Rao added a comment -

        That commit is in trunk. Could you try the current head in 0.8 (which fixed one OOME issue KAFKA-664)?
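
        For reference, switching the checkout used in the description to the head of the 0.8 branch would look roughly like this (assuming the same git clone and sbt build steps as above):

        git checkout 0.8
        git pull
        ./sbt update
        ./sbt package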

        Joel Koshy added a comment -

        I think that fix was merged into trunk (before 32da) so it should be there in trunk as well.

        Ricky Ng-Adam added a comment -

        After filing the bug initially, I switched to these settings (and then added the HeapDump directive):

        bin/kafka-run-class.sh

        KAFKA_OPTS="-server -Xms1024m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log -Djava.awt.headless=true -Dlog4j.configuration=file:$base_dir/config/log4j.properties -XX:+HeapDumpOnOutOfMemoryError"

        Shouldn't these be set more aggressively as per the operational suggestions? It's probably better to have the user lower them than to have to make them higher.

        I downloaded MAT for Eclipse and ran it on the hprof. It points out two issues, of which this is the more noticeable:

        One instance of "java.nio.HeapByteBuffer" loaded by "<system class loader>" occupies 8,404,016 (58.22%) bytes. The instance is referenced by kafka.network.BoundedByteBufferReceive @ 0x7ad6a038 , loaded by "sun.misc.Launcher$AppClassLoader @ 0x7ad00d40". The memory is accumulated in one instance of "byte[]" loaded by "<system class loader>"

        Ricky Ng-Adam added a comment -

        the hprof dump

        Jun Rao added a comment -

        BoundedByteBufferReceive is used for receiving client requests. Most of the space is likely taken by ProducerRequest. If you are sending many large ProducerRequests, the result in the heap dump makes sense. Do you still see OOME with the new JVM settings? Your heap size seems small. I would try 3-4 GB.
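
        For example, in the KAFKA_OPTS line quoted above only the heap sizes would change (an illustrative value in the suggested range; all other flags as before):

        KAFKA_OPTS="-server -Xms3072m -Xmx3072m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log -Djava.awt.headless=true -Dlog4j.configuration=file:$base_dir/config/log4j.properties -XX:+HeapDumpOnOutOfMemoryError"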

        Neha Narkhede added a comment -

        I think this is the cause - https://issues.apache.org/jira/browse/KAFKA-671
        Joel Koshy added a comment -

        That's why I asked for the configured "num-required-acks for the producer requests". If it is the default (0), then the request shouldn't be added to the request purgatory, which would rule out KAFKA-671, no?
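
        For reference, with the 0.8 Scala/Java producer this setting is request.required.acks in the producer properties; the ruby client behind fluent-plugin-kafka may expose it under a different name:

        # producer.properties (0.8 Java/Scala producer)
        # 0 = don't wait for an ack (the default), 1 = leader ack, -1 = all in-sync replicas
        request.required.acks=0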

        Jay Kreps added a comment -

        Yes, Joel, that makes sense. Ricky, do you know the "acks" setting being used in the requests the ruby client is sending?

        Jay Kreps added a comment -

        Marking resolved as we fixed a 0.8 bug that impacted memory and improved the default GC settings.


          People

          • Assignee: Unassigned
          • Reporter: Ricky Ng-Adam
          • Votes: 0
          • Watchers: 5
