  1. Kafka
  2. KAFKA-6613

The controller shouldn't stop partition reassignment after an exception is thrown


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0.2
    • Fix Version/s: 1.0.0
    • Component/s: admin, config, controller, core
    • Labels: None

    Description

      I issued a partition reassignment command. It created the entry shown below in ZooKeeper.

      The entry never gets deleted, because the partition reassignment hangs after exceptions appear in the Kafka logs. After that, no matter how many hours pass, the partitions are never moved to the other brokers.

       

      Path in ZooKeeper

      get /admin/reassign_partitions
      {"version":1,"partitions":[{"topic":"__consumer_offsets","partition":44,"replicas":[1003,1001,1004,1002]},{"topic":"683ad5e0-3775-4adc-ab55-82fda0761ba9_newTopic9","partition":0,"replicas":[1003,1004,1001,1002]},{"topic":"683ad5e0-3775-4adc-ab55-82fda0761ba9_newTopic1","partition":0,"replicas":[1003,1004,1001,1002]},{"topic":"__CruiseControlMetrics","partition":0,"replicas":[1002,1001,1004,1003]},{"topic":"b1c39c85-aee5-4ea0-90a1-9fc7eedc635b_topic","partition":0,"replicas":[1003,1004,1001,1002]},{"topic":"88ec4bd5-e149-4c98-8e8e-952e86ba5fae_topic","partition":4,"replicas":[1002,1004,1003,1001]},{"topic":"c8c56723-73a5-4a37-93bf-b8ecaf766429_topic","partition":4,"replicas":[1002,1003,1004,1001]},{"topic":"683ad5e0-3775-4adc-ab55-82fda0761ba9_newTopic9","partition":4,"replicas":[1002,1004,1003,1001]},{"topic":"b1c39c85-aee5-4ea0-90a1-9fc7eedc635b_topic","partition":4,"replicas":[1003,1001,1004,1002]},{"topic":"9db0cad2-69f8-4e85-b663-cd3987bd90fe_topic","partition":0,"replicas":[1003,1001,1004]},{"topic":"683ad5e0-3775-4adc-ab55-82fda0761ba9_topic","partition":1,"replicas":[1003,1004,1001,1002]}]}
      cZxid = 0x5000052f8
      ctime = Tue Mar 06 01:27:54 UTC 2018
      mZxid = 0x500005359
      mtime = Tue Mar 06 01:28:06 UTC 2018
      pZxid = 0x5000052f8
      cversion = 0
      dataVersion = 13
      aclVersion = 0
      ephemeralOwner = 0x0
      dataLength = 1114
      numChildren = 0
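
      For anyone triaging a similar stuck reassignment, here is a minimal sketch in Python (not part of the Kafka tooling) that summarizes the znode payload: which partitions are still listed, and how many replicas each broker is targeted for. The function name `summarize_reassignment` is hypothetical, and the sample holds only two of the entries from the znode content above. Only the JSON shape is taken from this report.

```python
import json
from collections import Counter


def summarize_reassignment(payload):
    """Return (pending, broker_load) for a /admin/reassign_partitions payload.

    pending     -- list of (topic, partition, target_replicas) still in the znode
    broker_load -- Counter of how many listed partitions target each broker id
    """
    data = json.loads(payload)
    pending = [(p["topic"], p["partition"], p["replicas"]) for p in data["partitions"]]
    broker_load = Counter(b for _, _, replicas in pending for b in replicas)
    return pending, broker_load


# Two entries taken from the znode content in this report (sample only).
sample = (
    '{"version":1,"partitions":['
    '{"topic":"__consumer_offsets","partition":44,"replicas":[1003,1001,1004,1002]},'
    '{"topic":"9db0cad2-69f8-4e85-b663-cd3987bd90fe_topic","partition":0,"replicas":[1003,1001,1004]}'
    ']}'
)

pending, broker_load = summarize_reassignment(sample)
for topic, partition, replicas in pending:
    print(f"{topic}-{partition} -> {replicas}")
```

      As long as the controller has not finished (or has abandoned) the reassignment, entries remain in this payload, so a list that stays non-empty for hours matches the hang described here.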

       

       

      Exception 

       

      ERROR [KafkaApi-1002] Error when handling request {replica_id=1005,max_wait_time=500,min_bytes=1,max_bytes=10485760,isolation_level=0,topics=[{topic=__consumer_offsets,partitions=[{partition=41,fetch_offset=0,log_start_offset=0,max_bytes=1048576}]}]} (kafka.server.KafkaApis)
      kafka.common.NotAssignedReplicaException: Leader 1002 failed to record follower 1005's position 0 since the replica is not recognized to be one of the assigned replicas 1001,1002,1004 for partition __consumer_offsets-41.
      at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:274)
      at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:1092)
      at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:1089)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:1089)
      at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:623)
      at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:606)
      at kafka.server.KafkaApis.handle(KafkaApis.scala:98)
      at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:66)
      at java.lang.Thread.run(Thread.java:745)
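
      To make the mismatch in the exception easier to spot when scanning many such log lines, here is a small Python sketch that extracts the leader, the follower, and the assigned replica set from the message text. The regular expression is written against the exact wording of the message above and is an assumption about its format, not a stable Kafka interface. `parse_not_assigned` is a hypothetical helper name.

```python
import re

# Pattern derived from the wording of the log message quoted in this report.
NOT_ASSIGNED = re.compile(
    r"Leader (?P<leader>\d+) failed to record follower (?P<follower>\d+)'s "
    r"position (?P<position>\d+) since the replica is not recognized to be one "
    r"of the assigned replicas (?P<assigned>[\d,]+) for partition "
    r"(?P<partition>\S+?)\.?$"
)


def parse_not_assigned(line):
    """Return the fields of a NotAssignedReplicaException message, or None."""
    match = NOT_ASSIGNED.search(line.strip())
    if match is None:
        return None
    return {
        "leader": int(match.group("leader")),
        "follower": int(match.group("follower")),
        "position": int(match.group("position")),
        "assigned": [int(b) for b in match.group("assigned").split(",")],
        "partition": match.group("partition"),
    }


line = (
    "kafka.common.NotAssignedReplicaException: Leader 1002 failed to record "
    "follower 1005's position 0 since the replica is not recognized to be one "
    "of the assigned replicas 1001,1002,1004 for partition __consumer_offsets-41."
)
info = parse_not_assigned(line)
print(info)
```

      In this report the fetching follower is broker 1005, which is neither in the assigned replicas named by the exception nor in any replica list in the /admin/reassign_partitions content.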

       

       

       

      I expected the controller to recover from that exception, move the partitions to the other nodes, and finally remove the entry in /admin/reassign_partitions once the move had completed.

       

       

            People

              Assignee: Unassigned
              Reporter: chandra kasiraju