Kafka / KAFKA-1019

kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:

      Description

      From Libo Yu:

      I tried to run kafka-preferred-replica-election.sh on our kafka cluster.
      But I got this exception:
      Failed to start preferred replica election
      org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/uattoqaaa.default/partitions

      I checked zookeeper and there is no /brokers/topics/uattoqaaa.default/partitions. All I found is
      /brokers/topics/uattoqaaa.default.

        Activity

        Mickael Hemri added a comment -

        Hi,

        We faced the same issue with the latest Kafka version 0.8.1.
        Is there any known workaround?

        Thanks

        Jun Rao added a comment -

        In general, /brokers/topics/[topic]/partitions should be available immediately after a topic is created, so I'm not sure how you got into this state. One possibility is that the controller somehow lost its registered ZK watcher and therefore didn't act on newly created topics. Could you verify that the topic watcher is still registered by the controller? If it is not, could you make sure that you are using ZK 3.3.4?

        Mickael Hemri added a comment -

        Hi Jun,

        We use ZK 3.4.6. Can it cause this issue?
        How can we check that the topic watcher is registered by the controller?

        Thanks

        Jun Rao added a comment -

        I am not sure how reliable ZK 3.4.6 is. The ZK admin page shows the command for listing all registered watchers. You want to check whether there is a child watcher on /brokers/topics.

        Another thing is to avoid ZK session expiration, since it may expose some corner case bugs.
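
        The watcher check suggested above can be sketched as follows. This is a minimal sketch, not part of the original thread: it assumes the ZooKeeper four-letter command wchp (watches by path) is enabled on the server, and it assumes the reply lists each watched path on its own line followed by indented session ids. The helper names are hypothetical. The output does not say whether a watch is a child watch or a data watch, and whether child watches are listed at all may depend on the ZooKeeper version. With a chroot such as /kafka, the server-side path would be /kafka/brokers/topics.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def watched_paths(wchp_output):
    """Map each watched path to the session ids listed under it.

    Assumed reply layout: a path line, then one indented line per session.
    """
    watches = {}
    current = None
    for line in wchp_output.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            if current is not None:
                watches[current].append(line.strip())
        else:
            current = line.strip()
            watches[current] = []
    return watches


def has_watch(wchp_output, path):
    """True if at least one session holds a watch on the given path."""
    return bool(watched_paths(wchp_output).get(path))


# Example call against a live server (host and port are assumptions):
# reply = four_letter_word("127.0.0.1", 2181, "wchp")
# print(has_watch(reply, "/kafka/brokers/topics"))
```

        Keeping the parsing separate from the socket call means the check can also be run on a reply captured by other means.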

        Neha Narkhede added a comment -

        Mickael Hemri, were you able to confirm the issue with ZooKeeper 3.4.6? In any case, we can at least fix the error message shown when the path doesn't exist.
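
        The kind of pre-check that would produce a clearer error message might look like the sketch below. It is an illustration only, not the actual fix and not Kafka code: the function names are hypothetical, and `exists` stands in for whatever ZooKeeper client call reports whether a path exists.

```python
def missing_partition_paths(topics, exists):
    """Return the partitions path of every topic for which it is absent.

    `exists` is any callable that takes a ZooKeeper path and returns a bool
    (an assumption; a real tool would use its ZooKeeper client here).
    """
    missing = []
    for topic in topics:
        path = "/brokers/topics/%s/partitions" % topic
        if not exists(path):
            missing.append(path)
    return missing


def check_before_election(topics, exists):
    """Raise a descriptive error instead of surfacing a raw NoNode exception."""
    missing = missing_partition_paths(topics, exists)
    if missing:
        raise RuntimeError(
            "Cannot start preferred replica election: "
            "these ZooKeeper paths do not exist: "
            + ", ".join(missing)
            + ". The topic may not exist or may not have been fully created."
        )
```

        Validating every topic before starting the election lets the tool name all missing paths in one message rather than stopping at the first one.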

        Mickael Hemri added a comment -

        I tried with zookeeper 3.3.6 and we have the same issue.
        To reproduce:

        Create a topic named testid

        bin/kafka-topics.sh --topic testid --replication-factor 3 --partition 3 --zookeeper 127.0.0.1:2181/kafka --create
        Created topic "testid".
        ./bin/kafka-topics.sh --topic testid --zookeeper 127.0.0.1:2181/kafka --describe
        Topic:testid	PartitionCount:3	ReplicationFactor:3	Configs:
        	Topic: testid	Partition: 0	Leader: 31985	Replicas: 31985,9920,4580	Isr: 31985,9920,4580
        	Topic: testid	Partition: 1	Leader: 4580	Replicas: 4580,31985,9920	Isr: 4580,31985,9920
        	Topic: testid	Partition: 2	Leader: 9920	Replicas: 9920,4580,31985	Isr: 9920,4580,31985
        

        OK, great: we have leaders, and /brokers/topics/testid/partitions exists in ZooKeeper.

        Delete the testid topic

        bin/kafka-run-class.sh kafka.admin.DeleteTopicCommand --topic testid --zookeeper 127.0.0.1:2181/kafka
        deletion succeeded!
        

        Create a topic named testid again

        bin/kafka-topics.sh --topic testid --replication-factor 3 --partition 3 --zookeeper 127.0.0.1:2181/kafka --create
        Created topic "testid".

        Now check:

        ./bin/kafka-topics.sh --topic testid --zookeeper 127.0.0.1:2181/kafka --describe
        Topic:testid	PartitionCount:3	ReplicationFactor:3	Configs:
        	Topic: testid	Partition: 0	Leader: none	Replicas: 31985,4580,9920	Isr: 
        	Topic: testid	Partition: 1	Leader: none	Replicas: 4580,9920,31985	Isr: 
        	Topic: testid	Partition: 2	Leader: none	Replicas: 9920,31985,4580	Isr:

        As you can see, there is no leader when we create the topic again after a deletion, and there is no /brokers/topics/testid/partitions in ZooKeeper.
        It works again with a different topic name, so it seems that something is not properly deleted by DeleteTopicCommand.

        We reproduced it on three different ZooKeeper chroots: 127.0.0.1:2181/kafka, 127.0.0.1:2181/kafka2 and 127.0.0.1:2181/kafka3

        Thanks
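
        To spot this state without reading the output by eye, the --describe output can be scanned for partitions whose leader is reported as "none". The following is a minimal sketch with a hypothetical helper name; it assumes the per-partition line layout shown in the output above.

```python
import re

# Matches per-partition lines such as:
#   Topic: testid  Partition: 0  Leader: none  Replicas: ...  Isr: ...
# The summary line ("Topic:testid PartitionCount:3 ...") does not match.
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)"
)


def leaderless_partitions(describe_output):
    """Return (topic, partition) pairs whose leader is printed as 'none'."""
    result = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match and match.group(3) == "none":
            result.append((match.group(1), int(match.group(2))))
    return result
```

        An empty result for a freshly created topic would indicate that every partition got a leader. A non-empty result matches the symptom described in this comment.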

        Guozhang Wang added a comment -

        Moving to 0.9 now.

        hongyu bi added a comment -

        @Mickael Hemri we face the same issue on zookeeper 3.4.5/kafka 0.8.1.1

        Guozhang Wang added a comment -

        Hongyu, did you follow the same steps as Mickael to reproduce this issue? From Mickael's steps, it seems to be related to the delete-topic tool (KAFKA-1558).

        hongyu bi added a comment - edited

        Thanks @Guozhang.
        After diving into the source code, I got it.

        Sriharsha Chintalapani added a comment -

        Guozhang Wang, Neha Narkhede: I don't think this issue exists in trunk.
        I ran the steps specified by Mickael Hemri above with ZooKeeper 3.4.6:

        bin/kafka-topics.sh --describe --topic testid --zookeeper zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
        Topic:testid PartitionCount:3 ReplicationFactor:3 Configs:
        Topic: testid Partition: 0 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
        Topic: testid Partition: 1 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
        Topic: testid Partition: 2 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
        [kafka@zookeeper1 kafka]$ bin/kafka-topics.sh --delete --topic testid --zookeeper zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
        Topic testid is marked for deletion.
        Note: This will have no impact if delete.topic.enable is not set to true.
        [kafka@zookeeper1 kafka]$ bin/kafka-topics.sh --describe --topic testid --zookeeper zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
        [kafka@zookeeper1 kafka]$ bin/kafka-topics.sh --create --topic testid --replication-factor 3 --partition 3 --zookeeper zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
        Created topic "testid".
        [kafka@zookeeper1 kafka]$ bin/kafka-topics.sh --describe --topic testid --zookeeper zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
        Topic:testid PartitionCount:3 ReplicationFactor:3 Configs:
        Topic: testid Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
        Topic: testid Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
        Topic: testid Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1

        Neha Narkhede added a comment -

        Thanks for checking, Sriharsha Chintalapani. I wonder whether the original issue reported in this JIRA also no longer exists, namely the need for a clear error message when preferred replica election is attempted on a topic that doesn't exist.


          People

          • Assignee: Sriharsha Chintalapani
          • Reporter: Guozhang Wang
          • Reviewer: Neha Narkhede
          • Votes: 1
          • Watchers: 6