Kafka / KAFKA-6343

OOM as the result of creation of 5k topics

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1.1, 0.10.2.0, 0.10.2.1, 0.11.0.1, 0.11.0.2, 1.0.0
    • Fix Version/s: 2.1.0
    • Component/s: core
    • Labels:
      None
    • Environment:
      RHEL 7, RAM 755GB per host

      Description

Reproducing: create 5k topics from code, quickly and without any delay between requests, then wait for the brokers to finish loading them. In practice they never do: every broker goes down, one by one, after roughly 10-15 minutes, depending on the hardware.

Heap sizes tried (-Xms/-Xmx): 5G, 10G, 50G, 256G, 512G; the OOM occurs at every size.

Topology: 3 brokers, 3 ZooKeeper nodes.
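      A back-of-envelope estimate (my arithmetic, not stated in the report) suggests why this topology dies regardless of heap size: each active log segment mmaps its offset index and, since 0.10.1, a time index, and mmap(2) calls are capped by the kernel, not the JVM heap.

      ```scala
      // Rough estimate (an assumption, not part of the report) of how many
      // mmap'ed index files each broker holds once all 5k topics exist.
      object MapCountEstimate extends App {
        val topics            = 5000
        val partitionsPer     = 10
        val replicationFactor = 2
        val brokers           = 3

        // Replicas are spread roughly evenly across the brokers.
        val replicasPerBroker = topics * partitionsPer * replicationFactor / brokers // 33333

        // Each active segment mmaps an offset index plus (since 0.10.1) a time index.
        val mmapsPerBroker = replicasPerBroker * 2 // 66666

        // The Linux default vm.max_map_count is 65530, so the mapping limit is
        // exceeded no matter how large -Xmx is.
        println(s"replicas/broker = $replicasPerBroker, mmaps/broker = $mmapsPerBroker")
      }
      ```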

      Code for 5k topic creation:

      package kafka

      import kafka.admin.AdminUtils
      import kafka.utils.{Logging, ZkUtils}

      object TestCreateTopics extends App with Logging {

        val zkConnect = "grid978:2185"
        // val, not var: the ZkUtils handle is never reassigned
        val zkUtils = ZkUtils(zkConnect, 6000, 6000, isZkSecurityEnabled = false)

        // Create topics "1" .. "5000", each with 10 partitions and RF 2,
        // as fast as possible and with no delay between requests.
        for (topic <- 1 to 5000) {
          AdminUtils.createTopic(
            topic             = topic.toString,
            partitions        = 10,
            replicationFactor = 2,
            zkUtils           = zkUtils
          )
          logger.info(s"Created topic $topic")
        }

        zkUtils.close()
      }
      

      Cause of death:

          java.io.IOException: Map failed
              at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:920)
              at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
              at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
              at kafka.log.LogSegment.<init>(LogSegment.scala:67)
              at kafka.log.Log.loadSegments(Log.scala:255)
              at kafka.log.Log.<init>(Log.scala:108)
              at kafka.log.LogManager.createLog(LogManager.scala:362)
              at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
              at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
              at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
              at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
              at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
              at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
              at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
              at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
              at kafka.cluster.Partition.makeLeader(Partition.scala:168)
              at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
              at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
              at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
              at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
              at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
              at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
              at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
              at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757)
              at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:703)
              at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
              at kafka.server.KafkaApis.handle(KafkaApis.scala:82)
              at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.OutOfMemoryError: Map failed
              at sun.nio.ch.FileChannelImpl.map0(Native Method)
              at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:917)
              ... 28 more
      

      Restarting a broker results in the same OOM; none of the brokers are able to start again.
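      The "Map failed" OutOfMemoryError above is raised by mmap(2), not by heap allocation, which is consistent with the failure reproducing at every tested heap size. A minimal sketch (my assumption, not from the report) of checking whether a required number of memory maps exceeds the kernel's per-process limit; the Linux default for vm.max_map_count is 65530:

      ```scala
      import scala.io.Source
      import scala.util.Try

      object MapLimitCheck extends App {
        // Pure comparison of required mmaps against the kernel limit.
        def limitExceeded(requiredMaps: Int, maxMapCount: Int): Boolean =
          requiredMaps > maxMapCount

        // /proc/sys/vm/max_map_count is standard on Linux (RHEL 7 here);
        // fall back to the usual default if it cannot be read.
        val current = Try(Source.fromFile("/proc/sys/vm/max_map_count").mkString.trim.toInt)
                        .getOrElse(65530)

        println(s"vm.max_map_count = $current, exceeded by 66666 maps: ${limitExceeded(66666, current)}")
        // Possible workaround: raise the limit as root, e.g. `sysctl -w vm.max_map_count=262144`.
      }
      ```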

    People

    • Assignee: Alex Dunayevsky
    • Reporter: Alex Dunayevsky
    • Votes: 0
    • Watchers: 9
