Kafka
  1. Kafka
  2. KAFKA-341

Create a new single host system test to validate all replicas on 0.8 branch

    Details

    • Type: Task Task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    1. kafka-341-v1.patch
      41 kB
      John Fung
    2. kafka-341-v2.patch
      38 kB
      John Fung
    3. kafka-341-v3.patch
      39 kB
      John Fung
    4. kafka-341-v4.patch
      39 kB
      John Fung
    5. kafka-341-v5.patch
      40 kB
      John Fung

      Activity

      Hide
      John Fung added a comment - - edited

      This patch has been tested based on the following pre-requisites:
      1. Kafka 0.8 branch rev. 1342841
      2. KAFKA-46 patch applied

      This test will do the following:
      1. 3 brokers are started up and have replicas enabled by default (no. of replicas should be less than or equal to no. of brokers)
      2. The logs will be parsed to find out which broker is the leader
      3. The leader will be stopped (currently commented out until KAFKA-350 is fixed)
      4. Producer will send out some messages
      5. Start the broker which is stopped in step 3 (currently commented out until KAFKA-350 is fixed)
      6. Repeat Step 2 for all 3 brokers.
      7. At the end of the test, it will compare the logs and validate the data file checksum

      Show
      John Fung added a comment - - edited This patch has been tested based on the following pre-requisites: 1. Kafka 0.8 branch rev. 1342841 2. KAFKA-46 patch applied This test will do the following: 1. 3 brokers are started up and have replicas enabled by default (no. of replicas should be less than or equal to no. of brokers) 2. The logs will be parsed to find out which broker is the leader 3. The leader will be stopped (currently commented out until KAFKA-350 is fixed) 4. Producer will send out some messages 5. Start the broker which is stopped in step 3 (currently commented out until KAFKA-350 is fixed) 6. Repeat Step 2 for all 3 brokers. 7. At the end of the test, it will compare the logs and validate the data file checksum
      Hide
      Jun Rao added a comment -

      Thanks for the patch. Some comments:

      1. Can we reuse kafka-run-class.sh in root/bin instead of making a copy in system_test/single_host_multi_brokers/bin?

      2. This test can just focus of the replication of a single cluster. We probably don't need to bundle the mirroring test here. So, no need to set up a target cluster.

      3. Can we print out either PASS or FAIL at the end of the test based on the validation that the test does?

      4. Could you write a README for the test so that people know how to run it and how to interpret the output?

      Show
      Jun Rao added a comment - Thanks for the patch. Some comments: 1. Can we reuse kafka-run-class.sh in root/bin instead of making a copy in system_test/single_host_multi_brokers/bin? 2. This test can just focus of the replication of a single cluster. We probably don't need to bundle the mirroring test here. So, no need to set up a target cluster. 3. Can we print out either PASS or FAIL at the end of the test based on the validation that the test does? 4. Could you write a README for the test so that people know how to run it and how to interpret the output?
      Hide
      Joel Koshy added a comment -

      I remembered I also had some review comments lying around.
      John - I think it's great that we now have this system test to help
      with 0.8 development. Brief comments - may have overlap with
      Jun's:

      • kafka-run-class should be removed.
      • Add readme.
      • run-test.sh:
      • typo on line 56 in var name, but doesn't hurt
      • typo on line 65 in var name, but doesn't hurt
      • line 94-97: cut is brittle for this purpose. Better to use sed/awk.
      • There are a lot of util functions such as info, kill_child_processes
        that we use in all our scripts. It would be good to factor these out
        into a common script and source it from the system tests.
      • No need any references to any target cluster. E.g., zk_target,
        kafka_target, etc. Likewise, should rename *_source_cluster to simply
        *_cluster.
      Show
      Joel Koshy added a comment - I remembered I also had some review comments lying around. John - I think it's great that we now have this system test to help with 0.8 development. Brief comments - may have overlap with Jun's: kafka-run-class should be removed. Add readme. run-test.sh: typo on line 56 in var name, but doesn't hurt typo on line 65 in var name, but doesn't hurt line 94-97: cut is brittle for this purpose. Better to use sed/awk. There are a lot of util functions such as info, kill_child_processes that we use in all our scripts. It would be good to factor these out into a common script and source it from the system tests. No need any references to any target cluster. E.g., zk_target, kafka_target, etc. Likewise, should rename *_source_cluster to simply *_cluster.
      Hide
      John Fung added a comment - - edited

      Hi Jun, Joel,

      Thanks for reviewing this patch. A v2 patch is attached to this JIRA. The following changes are made according to your suggestions.

        • Changes not made from Review Suggestions

      1. An extra copy of kafka-run-class.sh is added to the bin directory of this specific test folder because a customized copy of log4j.properties is required to enable some debugging code in ConsoleConsumer and ProducerPerformance for checksum validation

        • Changes made from Review Suggestions

      1. Move common functions to a separate script at system_test\common\util.sh

      2. Fixed typos (line 56 & 65 in previous version)

      3. Rename *_source_cluster to *_cluster

      4. README added

      5. PASS or FAIL validation is added at the end of the test

      6. Target cluster related code removed

      7. All the "cut" are replaced with "awk"

        • Other changes made

      1. The script will check the no. of brokers required and generate the corresponding no. of server.properties files

      Show
      John Fung added a comment - - edited Hi Jun, Joel, Thanks for reviewing this patch. A v2 patch is attached to this JIRA. The following changes are made according to your suggestions. Changes not made from Review Suggestions 1. An extra copy of kafka-run-class.sh is added to the bin directory of this specific test folder because a customized copy of log4j.properties is required to enable some debugging code in ConsoleConsumer and ProducerPerformance for checksum validation Changes made from Review Suggestions 1. Move common functions to a separate script at system_test\common\util.sh 2. Fixed typos (line 56 & 65 in previous version) 3. Rename *_source_cluster to *_cluster 4. README added 5. PASS or FAIL validation is added at the end of the test 6. Target cluster related code removed 7. All the "cut" are replaced with "awk" Other changes made 1. The script will check the no. of brokers required and generate the corresponding no. of server.properties files
      Hide
      Jun Rao added a comment -

      Thanks for patch v2. I kept getting the following exception. Do you know why?

      2012-06-12 09:24:23 creating topic [mytest] on [localhost:2181]
      creation failed because of org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
      org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
      at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
      at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
      at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
      at kafka.utils.ZkUtils$.getChildren(ZkUtils.scala:354)
      at kafka.utils.ZkUtils$.getSortedBrokerList(ZkUtils.scala:69)
      at kafka.admin.CreateTopicCommand$.createTopic(CreateTopicCommand.scala:87)
      at kafka.admin.CreateTopicCommand$.main(CreateTopicCommand.scala:72)
      at kafka.admin.CreateTopicCommand.main(CreateTopicCommand.scala)
      Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1277)
      at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:99)
      at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
      at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
      ... 7 more

      Also, the test seems to be a bit long since there are a few places where we sleep for 10 or 30 seconds. Can those sleep be reduced?

      Show
      Jun Rao added a comment - Thanks for patch v2. I kept getting the following exception. Do you know why? 2012-06-12 09:24:23 creating topic [mytest] on [localhost:2181] creation failed because of org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409) at kafka.utils.ZkUtils$.getChildren(ZkUtils.scala:354) at kafka.utils.ZkUtils$.getSortedBrokerList(ZkUtils.scala:69) at kafka.admin.CreateTopicCommand$.createTopic(CreateTopicCommand.scala:87) at kafka.admin.CreateTopicCommand$.main(CreateTopicCommand.scala:72) at kafka.admin.CreateTopicCommand.main(CreateTopicCommand.scala) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1277) at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:99) at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416) at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ... 7 more Also, the test seems to be a bit long since there are a few places where we sleep for 10 or 30 seconds. Can those sleep be reduced?
      Hide
      John Fung added a comment -

      Uploaded kafka-341-v3.patch in which more comments are added to common/util.sh

      Show
      John Fung added a comment - Uploaded kafka-341-v3.patch in which more comments are added to common/util.sh
      Hide
      John Fung added a comment -

      Uploaded kafka-341-v4.patch to fix the followings:
      1. reduced sleep time
      2. fixed the way to count checksum in producer log

      Show
      John Fung added a comment - Uploaded kafka-341-v4.patch to fix the followings: 1. reduced sleep time 2. fixed the way to count checksum in producer log
      Hide
      John Fung added a comment -

      Hi Jun,

      I just uploaded a patch "kafka-341-v4.patch" which reduces sleep time. I also tried this patch with the latest 0.8 branch and I couldn't reproduce the exception you saw in ZK. Could you please try again with this latest patch?

      Thanks,
      John

      Show
      John Fung added a comment - Hi Jun, I just uploaded a patch "kafka-341-v4.patch" which reduces sleep time. I also tried this patch with the latest 0.8 branch and I couldn't reproduce the exception you saw in ZK. Could you please try again with this latest patch? Thanks, John
      Hide
      Jun Rao added a comment -

      For sleep time, could we make it more adaptive? Basically, we can periodically check whether certain condition (so that we can do the verification) is met until some max time is reached. That way, we don't have to sleep for the max time in the normal case.

      Otherwise, the patch looks fine. The ZK exception was due to some old brokers running in the background on my machine.

      Show
      Jun Rao added a comment - For sleep time, could we make it more adaptive? Basically, we can periodically check whether certain condition (so that we can do the verification) is met until some max time is reached. That way, we don't have to sleep for the max time in the normal case. Otherwise, the patch looks fine. The ZK exception was due to some old brokers running in the background on my machine.
      Hide
      John Fung added a comment -

      Uploaded kafka-341-v5.patch based on Jun's suggestion to reduce the sleep time.

      The change in this patch is in the validation stage, the script will compare the replicas' data sizes after a sleep of 5 sec repeatedly up to 30 sec max.

      Show
      John Fung added a comment - Uploaded kafka-341-v5.patch based on Jun's suggestion to reduce the sleep time. The change in this patch is in the validation stage, the script will compare the replicas' data sizes after a sleep of 5 sec repeatedly up to 30 sec max.
      Hide
      Jun Rao added a comment -

      John, thanks for the patch. Just committed to 0.8.

      Show
      Jun Rao added a comment - John, thanks for the patch. Just committed to 0.8.

        People

        • Assignee:
          John Fung
          Reporter:
          John Fung
        • Votes:
          0 Vote for this issue
          Watchers:
          3 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development