  Kafka / KAFKA-664

Kafka server threads die due to OOME during long running test


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      I set up a Kafka cluster with 5 brokers (JVM memory 512M) and a long-running producer process that continuously sends data to hundreds of partitions for ~15 hours. After ~4 hours of operation, a few server threads (acceptor and processor) exited due to OOME:

      [2012-12-07 08:24:44,355] ERROR OOME with size 1700161893 (kafka.network.BoundedByteBufferReceive)
      java.lang.OutOfMemoryError: Java heap space
      [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-acceptor': (kafka.utils.Utils$)
      java.lang.OutOfMemoryError: Java heap space
      [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-processor-9092-1': (kafka.utils.Utils$)
      java.lang.OutOfMemoryError: Java heap space
      [2012-12-07 08:24:46,344] INFO Unable to reconnect to ZooKeeper service, session 0x13afd0753870103 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
      [2012-12-07 08:24:46,344] INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)
      [2012-12-07 08:24:46,344] INFO Initiating client connection, connectString=eat1-app309.corp:12913,eat1-app310.corp:12913,eat1-app311.corp:12913,eat1-app312.corp:12913,eat1-app313.corp:12913 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@19202d69 (org.apache.zookeeper.ZooKeeper)
      [2012-12-07 08:24:55,702] ERROR OOME with size 2001040997 (kafka.network.BoundedByteBufferReceive)
      java.lang.OutOfMemoryError: Java heap space
      [2012-12-07 08:25:01,192] ERROR Uncaught exception in thread 'kafka-request-handler-0': (kafka.utils.Utils$)
      java.lang.OutOfMemoryError: Java heap space
      [2012-12-07 08:25:08,739] INFO Opening socket connection to server eat1-app311.corp/172.20.72.75:12913 (org.apache.zookeeper.ClientCnxn)
      [2012-12-07 08:25:14,221] INFO Socket connection established to eat1-app311.corp/172.20.72.75:12913, initiating session (org.apache.zookeeper.ClientCnxn)
      [2012-12-07 08:25:17,943] INFO Client session timed out, have not heard from server in 3722ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
      [2012-12-07 08:25:19,805] ERROR error in loggedRunnable (kafka.utils.Utils$)
      java.lang.OutOfMemoryError: Java heap space
      [2012-12-07 08:25:23,528] ERROR OOME with size 1853095936 (kafka.network.BoundedByteBufferReceive)
      java.lang.OutOfMemoryError: Java heap space

      It seems to run out of memory while reading the producer request, but the cause is unclear so far.
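      Each OOME line reports the size the broker tried to allocate in kafka.network.BoundedByteBufferReceive. The minimal sketch below illustrates how a receiver that trusts a 4-byte size prefix can be asked to allocate far more than the heap, and how checking that size against an upper bound before allocating turns it into a rejected request rather than an OOME. It assumes a 4-byte big-endian signed length prefix. The names read_size_prefix, check_receive_size and MAX_REQUEST_BYTES, and the 100 MB limit, are hypothetical and are not Kafka's actual API or configuration.

```python
import struct

# The brokers in this report ran with 512M of JVM memory.
HEAP_BYTES = 512 * 1024 * 1024
# Hypothetical per-request upper bound, chosen only for illustration.
MAX_REQUEST_BYTES = 100 * 1024 * 1024


def read_size_prefix(header: bytes) -> int:
    """Decode a 4-byte big-endian signed size prefix (assumed framing)."""
    if len(header) != 4:
        raise ValueError("size prefix must be exactly 4 bytes")
    (size,) = struct.unpack(">i", header)
    return size


def check_receive_size(size: int, max_size: int) -> int:
    """Reject non-positive or oversized requests before any buffer is allocated."""
    if size <= 0:
        raise ValueError(f"invalid request size {size}")
    if size > max_size:
        raise ValueError(f"request size {size} exceeds limit {max_size}")
    return size


# Sizes taken from the "OOME with size ..." log lines in this report.
reported = [1700161893, 2001040997, 1853095936]

for size in reported:
    decoded = read_size_prefix(struct.pack(">i", size))
    print(f"{decoded} bytes is {decoded / HEAP_BYTES:.2f}x a 512M heap")
    try:
        check_receive_size(decoded, MAX_REQUEST_BYTES)
    except ValueError as err:
        # With a bound check, the oversized request is refused instead of
        # triggering a huge allocation that kills the thread.
        print("rejected:", err)
```

      Each of the three reported sizes is more than three times the 512M of JVM memory, so an allocation of that size could not succeed no matter how much of the heap happened to be free at the time.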

        Attachments

        1. kafka-664-draft.patch
          2 kB
          Neha Narkhede
        2. kafka-664-draft-2.patch
          2 kB
          Joel Jacob Koshy
        3. KAFKA-664-v3.patch
          7 kB
          Joel Jacob Koshy
        4. KAFKA-664-v4.patch
          12 kB
          Joel Jacob Koshy
        5. Screen Shot 2012-12-09 at 11.22.50 AM.png
          43 kB
          Neha Narkhede
        6. Screen Shot 2012-12-09 at 11.23.09 AM.png
          36 kB
          Neha Narkhede
        7. Screen Shot 2012-12-09 at 11.31.29 AM.png
          26 kB
          Neha Narkhede
        8. thread-dump.log
          20 kB
          Neha Narkhede
        9. watchersForKey.png
          9 kB
          Neha Narkhede


            People

            • Assignee:
              Jay Kreps (jkreps)
              Reporter:
              Neha Narkhede (nehanarkhede)
            • Votes:
              0
              Watchers:
              6
