Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-6438

NSEE while concurrently creating and deleting a topic

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: controller
    • Labels:
    • Environment:
      kafka_2.11-1.0.0.jar
      OpenJDK Runtime Environment (build 1.8.0_102-b14), OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
      CentOS Linux release 7.3.1611 (Core)

      Description

      It appears that deleting a topic and creating it at the same time can cause NSEE, what later results in a forced controller shutdown.

      Most probably topics are being created because consumers/producers are still active (yes, this means the deletion is happening blindly).

      The main problem here (for me) is the controller switch, the data loss and following unclean election is acceptable (as we admit to deleting blindly).

      Environment description:
      20 kafka brokers
      80k partitions (20k topics 4partitions each)
      3 node ZK

      Incident:

      [2018-01-09 11:19:05,912] INFO [Topic Deletion Manager 6], Partition deletion callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:06,237] INFO [Controller id=6] New leader and ISR for partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} (kafka.controller.KafkaController)
      [2018-01-09 11:19:06,412] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:07,218] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:07,304] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:07,383] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:07,510] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:07,661] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:07,728] INFO [Topic Deletion Manager 6], Deletion for replicas 9,10,11 for partition mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] Invoking state change to OfflinePartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
      [2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] Invoking state change to NonExistentPartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
      [2018-01-09 11:19:08,592] INFO [Controller id=6] New topics: [Set(mytopic, other, other2)], deleted topics: [Set()], new partition replica assignment [Map(other-0 -> Vector(8), mytopic-2 -> Vector(6), mytopic-0 -> Vector(4), other-2 -> Vector(10), mytopic-1 -> Vector(5), mytopic-3 -> Vector(7), other-1 -> Vector(9), other-3 -> Vector(11))] (kafka.controller.KafkaController)
      [2018-01-09 11:19:08,593] INFO [Controller id=6] New topic creation callback for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.KafkaController)
      [2018-01-09 11:19:08,596] INFO [Controller id=6] New partition creation callback for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.KafkaController)
      [2018-01-09 11:19:08,596] INFO [PartitionStateMachine controllerId=6] Invoking state change to NewPartition for partitions other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.PartitionStateMachine)
      [2018-01-09 11:19:08,642] INFO [PartitionStateMachine controllerId=6] Invoking state change to OnlinePartition for partitions other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.PartitionStateMachine)
      [2018-01-09 11:19:08,828] INFO [Topic Deletion Manager 6], Partition deletion callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.TopicDeletionManager)
      [2018-01-09 11:19:09,127] INFO [Controller id=6] New leader and ISR for partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} (kafka.controller.KafkaController)
      [2018-01-09 11:19:09,607] ERROR [controller-event-thread]: Error processing event TopicDeletion(Set(mytopic, other)) (kafka.controller.Contr
      ollerEventManager$ControllerEventThread)
      java.util.NoSuchElementException: key not found: mytopic-0
              at scala.collection.MapLike$class.default(MapLike.scala:228)
              at scala.collection.AbstractMap.default(Map.scala:59)
              at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
              at kafka.controller.ControllerBrokerRequestBatch.kafka$controller$ControllerBrokerRequestBatch$$updateMetadataRequestPartitionInfo$1(ControllerChannelManager.scala:358)
              at kafka.controller.ControllerBrokerRequestBatch$$anonfun$addUpdateMetadataRequestForBrokers$3.apply(ControllerChannelManager.scala:394)
              at kafka.controller.ControllerBrokerRequestBatch$$anonfun$addUpdateMetadataRequestForBrokers$3.apply(ControllerChannelManager.scala:394)
              at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
              at kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers(ControllerChannelManager.scala:394)
              at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1039)
              at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:266)
              at kafka.controller.TopicDeletionManager$$anonfun$resumeDeletions$2.apply(TopicDeletionManager.scala:361)
              at kafka.controller.TopicDeletionManager$$anonfun$resumeDeletions$2.apply(TopicDeletionManager.scala:333)
              at scala.collection.immutable.Set$Set3.foreach(Set.scala:163)
              at kafka.controller.TopicDeletionManager.resumeDeletions(TopicDeletionManager.scala:333)
              at kafka.controller.TopicDeletionManager.enqueueTopicsForDeletion(TopicDeletionManager.scala:113)
              at kafka.controller.KafkaController$TopicDeletion.process(KafkaController.scala:1299)
              at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:53)
              at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
              at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
              at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
              at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:52)
              at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
      [2018-01-09 11:19:09,657] INFO [Controller id=6] New topics: [Set(mytopic)], deleted topics: [Set()], new partition replica assignment [Map(mytopic-2 -> Vector(6), mytopic-0 -> Vector(4), mytopic-1 -> Vector(5), mytopic-3 -> Vector(7))] (kafka.controller.KafkaController)
      [2018-01-09 11:19:09,658] INFO [Controller id=6] New topic creation callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.KafkaController)
      [2018-01-09 11:19:09,658] INFO [Controller id=6] New partition creation callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.KafkaController)
      [2018-01-09 11:19:09,658] INFO [PartitionStateMachine controllerId=6] Invoking state change to NewPartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
      [2018-01-09 11:19:09,659] INFO [PartitionStateMachine controllerId=6] Invoking state change to OnlinePartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
      [2018-01-09 11:19:09,664] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for mytopic-0. Elect leader 4 from live brokers 4. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
      [2018-01-09 11:19:09,664] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":4,"leader_epoch":2,"isr":[4]} for offline partition mytopic-0 (kafka.controller.OfflinePartitionLeaderSelector)
      [2018-01-09 11:19:09,769] ERROR [Controller id=6] Forcing the controller to resign (kafka.controller.KafkaController)
      

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              adam.kotwasinski Adam Kotwasinski
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: