  1. Kafka
  2. KAFKA-9478

Controller may stop reacting to partition reassignment commands in ZooKeeper



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0, 2.4.1
    • Fix Version/s: None
    • Component/s: controller, core


      Seemingly after bdf2446ccce592f3c000290f11de88520327aa19, the controller may stop watching the /admin/reassign_partitions node in ZooKeeper and consequently stop accepting partition reassignment commands via ZooKeeper.

      I'm not 100% sure that bdf2446ccce592f3c000290f11de88520327aa19 causes this, but the problem doesn't reproduce on 3fe6b5e951db8f7184a4098f8ad8a1afb2b2c1a0, the commit right before it.

      It also reproduces on the trunk HEAD, a87decb9e4df5bfa092c26ae4346f65c426f1321.

      How to reproduce

      1. Run ZooKeeper and two Kafka brokers.

      2. Create a topic with 100 partitions and place them on Broker 0:

      distro/bin/kafka-topics.sh --bootstrap-server localhost:9092,localhost:9093 --create \
          --topic foo \
          --replica-assignment $(for i in {0..99}; do echo -n "0,"; done | sed 's/.$//')

      3. Add some data:

      seq 1 1000000 | bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic foo

      4. Create the partition reassignment node /admin/reassign_partitions in ZooKeeper and, shortly after that, update the data in the node (even writing the same value will do). I made a simple Python script for this:

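      import time
      import json
      from kazoo.client import KazooClient

      # The ZooKeeper connect string is blank in this report; set it to your own host:port.
      zk = KazooClient(hosts='')
      zk.start()  # kazoo requires start() before any request is sent

      reassign = {
          "version": 1,
          "partitions": [],
      }
      for p in range(100):
          reassign["partitions"].append({"topic": "foo", "partition": p, "replicas": [1]})

      zk.create("/admin/reassign_partitions", json.dumps(reassign).encode())
      time.sleep(0.05)  # assumed short delay ("shortly after"); the exact value is not given here
      zk.set("/admin/reassign_partitions", json.dumps(reassign).encode())
      zk.stop()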

      5. Observe that the controller doesn't react to further updates to /admin/reassign_partitions and doesn't delete the node.
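      The observation in the last step can also be checked by polling for the node instead of watching it by hand. This is only a minimal sketch: it assumes an already started kazoo client (only its `exists` method is used), and the helper name `node_survives` and the timeout values are illustrative, not part of the original report.

```python
import time


def node_survives(zk, path="/admin/reassign_partitions", timeout=30.0, interval=1.0):
    """Return True if `path` still exists after polling for `timeout` seconds.

    `zk` is assumed to be a started kazoo KazooClient; only zk.exists() is used.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if zk.exists(path) is None:
            # The node is gone, so the controller processed the reassignment.
            return False
        time.sleep(interval)
    # One last check once the deadline has passed.
    return zk.exists(path) is not None
```

      A `True` result alone does not prove the bug, because a reassignment that is still in progress also keeps the node around; it is a hint to go on and check the watch in ZooKeeper.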

      Also, it can be confirmed with

      echo wchc | nc 2181

      that there is no watch on the node in ZooKeeper (for this, you should run ZooKeeper with 4lw.commands.whitelist=*).
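      The same `wchc` check can be scripted. The sketch below is an illustration under assumptions, not a tool from the report: `four_letter_word` and `has_watch` are hypothetical helper names, the host and port are supplied by the caller, and the parsing relies only on each watched path appearing on its own line of the `wchc` reply.

```python
import socket


def four_letter_word(host, port, cmd="wchc", timeout=5.0):
    """Send a ZooKeeper four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def has_watch(wchc_output, path="/admin/reassign_partitions"):
    """Return True if `path` is listed as a watched path in `wchc` output."""
    return any(line.strip() == path for line in wchc_output.splitlines())
```

      With the ZooKeeper address filled in, `has_watch(four_letter_word(host, 2181))` returning `False` would correspond to the missing watch described above.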

      Since this is a timing issue, it might not reproduce on the first attempt, so you might need to repeat step 4 a couple of times. However, the reproducibility rate is pretty high.

      The data in the topic and the large number of partitions are not needed per se; they only make the timing more favourable.
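      Because the partition count is only a knob for the timing, it can help to generate both the `--replica-assignment` value and the reassignment JSON from a single number. A small sketch, assuming the same layout as in the steps above (every partition starts on broker 0 and is moved to broker 1); the function names are illustrative:

```python
import json


def replica_assignment(num_partitions, broker=0):
    """Build the --replica-assignment value: one single-replica entry per partition."""
    return ",".join(str(broker) for _ in range(num_partitions))


def reassignment_payload(topic, num_partitions, target_broker=1):
    """Build the JSON bytes for /admin/reassign_partitions, moving every partition to one broker."""
    partitions = [
        {"topic": topic, "partition": p, "replicas": [target_broker]}
        for p in range(num_partitions)
    ]
    return json.dumps({"version": 1, "partitions": partitions}).encode()
```

      For the setup in this report, that would be `replica_assignment(100)` for the topic creation and `reassignment_payload("foo", 100)` for the node data.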

      Controller re-election will solve the issue, but a new controller can be put in this state the same way.

      Proposed solution

      TBD, suggestions are welcome.






              • Assignee:
                ivanyu Ivan Yurchenko