KAFKA-6165

Kafka brokers go down with OutOfMemoryError.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11.0.0
    • Fix Version/s: None
    • Component/s: logging
    • Labels: None
    • Environment: DCOS cluster with 4 agent nodes and 3 masters.

      Agent machine config:
      RAM: 384 GB
      Disk: 4 TB

    Description

      Performance testing Kafka with the following end-to-end pipelines:
      Kafka Data Producer -> kafka -> spark streaming -> hdfs – stream1
      Kafka Data Producer -> kafka -> flume -> hdfs – stream2

      stream1 kafka configs :
      No of topics : 10
      No of partitions : 20 for all the topics

      stream2 kafka configs :
      No of topics : 10
      No of partitions : 20 for all the topics

      Some important Kafka configuration:
      "BROKER_MEM": "32768" (32 GB)
      "BROKER_JAVA_HEAP": "16384" (16 GB)
      "BROKER_COUNT": "3"
      "KAFKA_MESSAGE_MAX_BYTES": "1000012" (1 MB)
      "KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576" (1 MB)
      "KAFKA_NUM_PARTITIONS": "20"
      "BROKER_DISK_SIZE": "5000" (5 GB)
      "KAFKA_LOG_SEGMENT_BYTES": "50000000" (50 MB)
      "KAFKA_LOG_RETENTION_BYTES": "5000000000" (5 GB)

      Data producer to Kafka throughput:

      message rate : approximately 500,000 (5 lakh) messages/sec across all 3 brokers and all topics/partitions.
      message size : approximately 300 to 400 bytes.
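
To put the throughput above next to the 50 MB segment size, the following rough sketch estimates the aggregate ingest rate and how long one partition takes to fill a segment. It assumes the stated rate covers both streams (20 topics x 20 partitions = 400 partitions), that load is spread evenly, and it ignores compression, batch overhead and replication traffic. The function name is hypothetical.

```python
def segment_fill_seconds(messages_per_sec, message_bytes, total_partitions, segment_bytes):
    """Seconds for one partition to fill one log segment, assuming even load."""
    bytes_per_sec_per_partition = messages_per_sec * message_bytes / total_partitions
    return segment_bytes / bytes_per_sec_per_partition


# Values taken from the report; 400 partitions is an assumption (both streams combined).
MESSAGES_PER_SEC = 500_000
TOTAL_PARTITIONS = 20 * 20
SEGMENT_BYTES = 50_000_000

for size in (300, 400):
    ingest_mb_per_sec = MESSAGES_PER_SEC * size / 1_000_000
    secs = segment_fill_seconds(MESSAGES_PER_SEC, size, TOTAL_PARTITIONS, SEGMENT_BYTES)
    print(f"{size} B messages: {ingest_mb_per_sec:.0f} MB/s ingest, "
          f"segment fills in ~{secs:.0f} s per partition")
```

Under these assumptions each partition rolls a new segment roughly every 100 to 133 seconds, so segment rolls (the code path in the stack traces) would be frequent across the cluster.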

      Issues observed with this configuration:

      Issue 1:

      stack trace:

      [2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
      kafka.common.KafkaStorageException: I/O exception in append to log 'store_sales-16'
      at kafka.log.Log.append(Log.scala:349)
      at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
      at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
      at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
      at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
      at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
      at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
      at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
      at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
      at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
      at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: Map failed
      at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
      at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
      at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
      at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
      at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
      at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
      at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
      at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
      at kafka.log.Log.roll(Log.scala:771)
      at kafka.log.Log.maybeRoll(Log.scala:742)
      at kafka.log.Log.append(Log.scala:405)
      ... 22 more
      Caused by: java.lang.OutOfMemoryError: Map failed
      at sun.nio.ch.FileChannelImpl.map0(Native Method)
      at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
      ... 34 more
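
The trace above fails in `FileChannelImpl.map0`, called from `AbstractIndex.resize` during a log roll, so it may help to estimate how many memory-mapped index files a broker could hold. The sketch below is a rough upper bound under assumptions this report does not state: the replication factor (a parameter here), an even spread of replicas over brokers, two memory-mapped index files per segment (offset index and time index), and retention actually reaching the configured 5 GB per partition. The function name is hypothetical, and whether the mapping limit is the actual cause here is not confirmed.

```python
def estimate_index_mmaps_per_broker(topics, partitions_per_topic, replication_factor,
                                    brokers, retention_bytes, segment_bytes,
                                    mmapped_files_per_segment=2):
    """Rough upper bound on memory-mapped index files held by one broker."""
    # Ceiling division: segments retained per partition before deletion kicks in.
    segments_per_partition = -(-retention_bytes // segment_bytes)
    # Partition replicas hosted per broker, assuming an even spread.
    total_replicas = topics * partitions_per_topic * replication_factor
    replicas_per_broker = -(-total_replicas // brokers)
    return replicas_per_broker * segments_per_partition * mmapped_files_per_segment


# Linux default for vm.max_map_count; the actual value on the agents is not given.
DEFAULT_MAX_MAP_COUNT = 65530

for rf in (1, 3):  # replication factor is an assumption, not stated in the report
    maps = estimate_index_mmaps_per_broker(
        topics=20, partitions_per_topic=20, replication_factor=rf, brokers=3,
        retention_bytes=5_000_000_000, segment_bytes=50_000_000)
    print(f"replication factor {rf}: ~{maps} index mappings per broker "
          f"(default limit {DEFAULT_MAX_MAP_COUNT})")
```

With a replication factor of 3 this bound is 80,000 mappings per broker, above the Linux default of 65,530; with a replication factor of 1 it is 26,800. The configured broker disk size may cap the real segment count well below this bound.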

      Issue 2 :

      stack trace :

      [2017-11-02 23:55:49,602] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data for catalog_sales-3 (kafka.server.ReplicaFetcherThread)
      kafka.common.KafkaStorageException: I/O exception in append to log 'catalog_sales-3'
      at kafka.log.Log.append(Log.scala:349)
      at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:130)
      at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:159)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:141)
      at scala.Option.foreach(Option.scala:257)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:141)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:138)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:138)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
      at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
      at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:136)
      at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
      at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
      Caused by: java.io.IOException: Map failed
      at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
      at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
      at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
      at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
      at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
      at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
      at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
      at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
      at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
      at kafka.log.Log.roll(Log.scala:771)
      at kafka.log.Log.maybeRoll(Log.scala:742)
      at kafka.log.Log.append(Log.scala:405)
      ... 16 more
      Caused by: java.lang.OutOfMemoryError: Map failed
      at sun.nio.ch.FileChannelImpl.map0(Native Method)
      at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
      ... 28 more

      These two exceptions occur continuously across all 3 brokers with the same Kafka configuration.
      The brokers die when these exceptions occur.
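
An `OutOfMemoryError: Map failed` raised from `FileChannelImpl.map0` is often related to the per-process memory-mapping limit rather than the Java heap, although that is not confirmed for this case. As a minimal sketch for checking it on a Linux agent, the helpers below count the entries in a process's `/proc/<pid>/maps` and compare them with `vm.max_map_count`. The function names are hypothetical and the broker PID has to be supplied by the caller.

```python
def count_mappings(maps_text):
    """Count memory mappings: one non-empty line per mapping in /proc/<pid>/maps."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(maps_text, max_map_count):
    """Return (mappings in use, mappings left before the limit is reached)."""
    used = count_mappings(maps_text)
    return used, max_map_count - used


def read_headroom_for_pid(pid):
    """Read live values from /proc (Linux only) for the given process ID."""
    with open(f"/proc/{pid}/maps") as f:
        maps_text = f.read()
    with open("/proc/sys/vm/max_map_count") as f:
        limit = int(f.read().strip())
    return mapping_headroom(maps_text, limit)
```

Calling `read_headroom_for_pid` with the broker's PID shortly before a crash would show whether the mapping count is approaching the limit; a small or negative headroom would point toward the mapping limit, while ample headroom would suggest looking elsewhere.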

      Attached are the log files of two brokers for both issues.
      The Kafka configuration JSON in use is also attached.

      Attachments

        1. config.json
          4 kB
          kaushik srinivas
        2. kafka_config.txt
          11 kB
          kaushik srinivas
        3. kafkaServer-gc_agent03.log
          1.42 MB
          kaushik srinivas
        4. kafkaServer-gc_agent04.log
          18.93 MB
          kaushik srinivas
        5. kafkaServer-gc.log
          1.11 MB
          kaushik srinivas
        6. kafkaServer-gc-agent06.7z
          6.66 MB
          kaushik srinivas
        7. map_counts_agent06
          26 kB
          kaushik srinivas
        8. stderr_broker1.txt
          19 kB
          kaushik srinivas
        9. stderr_broker2.txt
          19 kB
          kaushik srinivas
        10. stdout_broker1.txt
          32 kB
          kaushik srinivas
        11. stdout_broker2.txt
          32 kB
          kaushik srinivas

        People

            Assignee: Unassigned
            Reporter: kaushik srinivas
            Votes: 0
            Watchers: 4