Kafka / KAFKA-440

Create a regression test framework for distributed environment testing

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None

      Description

      Initial requirements:

      1. The whole test framework is preferably coded in Python (a widely used scripting language with well-supported features)

      2. Test framework driver should be generic (distributed environment can be local host)

      3. Test framework related configurations are defined in JSON format

      4. Test environment, suite, case definitions may be defined in the following levels:
      4-a entity_id is used as a key for looking up related config from different levels

      4-b Cluster level defines: entity_id, hostname, kafka_home, java_home, ...

      4-c Test suite / case level defines:
      4-c-1 zookeeper: entity_id, clientPort, dataDir, log_filename, config_filename
      4-c-2 broker: entity_id, port, log.file.size, log.dir, log_filename, config_filename
      4-c-3 producer: entity_id, topic, threads, compression-codec, message-size, log_filename, config_filename
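
      The multi-level lookup keyed on entity_id (item 4-a) can be illustrated with a small sketch. The field names below are taken from the lists above, but the exact JSON schema is an assumption, not the shipped config:

```python
import json

# Cluster-level config: one entry per entity (illustrative values).
cluster_config = json.loads("""
{
  "cluster_config": [
    {"entity_id": "0", "hostname": "localhost", "role": "zookeeper",
     "kafka_home": "/opt/kafka", "java_home": "/usr/lib/jvm/default"},
    {"entity_id": "1", "hostname": "localhost", "role": "broker",
     "kafka_home": "/opt/kafka", "java_home": "/usr/lib/jvm/default"}
  ]
}
""")

# Testcase-level config: extra properties for the same entity_ids.
testcase_config = json.loads("""
{
  "entities": [
    {"entity_id": "0", "clientPort": "2181", "dataDir": "/tmp/zk",
     "log_filename": "zookeeper.log", "config_filename": "zookeeper.properties"},
    {"entity_id": "1", "port": "9092", "log.dir": "/tmp/kafka-logs",
     "log_filename": "broker.log", "config_filename": "server.properties"}
  ]
}
""")

def lookup_entity(entity_id, *config_lists):
    """Merge the properties for one entity_id across config levels."""
    merged = {}
    for configs in config_lists:
        for entry in configs:
            if entry.get("entity_id") == entity_id:
                merged.update(entry)
    return merged

broker = lookup_entity("1", cluster_config["cluster_config"],
                       testcase_config["entities"])
print(broker["hostname"], broker["port"])
```

      A test script can thus resolve everything it needs about "broker 1" from one key, regardless of which level a property was defined at.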

      1. kafka-440-v7.patch
        96 kB
        John Fung
      2. kafka-440-v6.patch
        93 kB
        John Fung
      3. kafka-440-v5.patch
        89 kB
        John Fung
      4. kafka-440-v3.patch
        84 kB
        John Fung
      5. kafka-440-v2.patch
        72 kB
        John Fung
      6. kafka-440-v1.patch
        64 kB
        John Fung

        Issue Links

        1.
        Simple message replication test Sub-task Closed John Fung

        0%

        Original Estimate - 48h
        Remaining Estimate - 48h
         
        2.
        Simple message replication test with varied producer acks Sub-task Closed John Fung

        0%

        Original Estimate - 24h
        Remaining Estimate - 24h
         
        3.
        Simple message replication test with multiple partitions Sub-task Closed John Fung

        0%

        Original Estimate - 24h
        Remaining Estimate - 24h
         
        4.
        Simple message replication test with multiple partitions and log segments Sub-task Closed John Fung

        0%

        Original Estimate - 24h
        Remaining Estimate - 24h
         
        5.
        Message replication with failures Sub-task Closed John Fung

        0%

        Original Estimate - 48h
        Remaining Estimate - 48h
         
        6.
        Message replication with failures and varied producer acks Sub-task Closed John Fung

        0%

        Original Estimate - 24h
        Remaining Estimate - 24h
         
        7.
        Message replication with failures and multiple partitions and log segments Sub-task Closed John Fung

        0%

        Original Estimate - 24h
        Remaining Estimate - 24h
         
        8.
        Message replication with failures and varied replication factor Sub-task Closed John Fung

        0%

        Original Estimate - 48h
        Remaining Estimate - 48h
         
        9.
        Leader election test Sub-task Closed John Fung

        0%

        Original Estimate - 24h
        Remaining Estimate - 24h
         
        10.
        Test in sync replica changes due to slow or stuck followers Sub-task Closed John Fung

        0%

        Original Estimate - 72h
        Remaining Estimate - 72h
         
        11. Refactor and optimize system tests Sub-task Open Unassigned  
         
        12.
        broker-list in testcase_1_properties.json should be picked up by the framework automatically Sub-task Resolved John Fung  
         
        13. Move start_consumer & start_producer inside "start_entity_in_background" Sub-task Open John Fung  
         
        14.
        Make producer to run for the entire duration of the System Test Sub-task Resolved John Fung  
         
        15.
        The system test is preferred to run out of the box without the need to update cluster_config.json Sub-task Resolved John Fung  
         
        16.
        Move log settings in the beginning of system_test_runner.py to an external log properties file Sub-task Resolved John Fung  
         
        17. Support MacOS for this test framework Sub-task Open John Fung  
         
        18.
        Rethrow exceptions to top level as much as possible Sub-task Resolved John Fung  
         
        19.
        Make sure all background running processes are terminated properly Sub-task Resolved John Fung  
         
        20.
        Port Mirroring System Test to this python system test framework Sub-task Resolved John Fung  
         
        21.
        Sometimes the python system test framework doesn't terminate all running processes Sub-task Resolved John Fung  
         
        22.
        Relative paths should be used for svg URLs in dashboards html Sub-task Resolved Unassigned  
         
        23.
        Simplify setup / initialization in replication_basic_test.py Sub-task Resolved John Fung  
         
        24.
        Support "testcase_to_run" or "testcase_to_skip" Sub-task Resolved John Fung  
         
        25.
        Shut down ZK last to avoid hanging brokers running processes Sub-task Resolved John Fung  
         
        26.
        Add more test cases to System Test Sub-task Closed John Fung  
         
        27. Add documentation for system tests Sub-task Open Unassigned  
         

          Activity

          John Fung added a comment -

          Thank you Neha. kafka-440-v8.patch is removed from here and uploaded to KAFKA-483 as kafka-483-v1.patch.

          Neha Narkhede added a comment -

          Thanks for the patch! Could you please upload it to one of those sub-tasks instead? This JIRA was tracking the basic test framework and we have already checked in the patch for this.

          John Fung added a comment -

          Uploaded kafka-440-v8.patch with the following fixed:

          KAFKA-487 (Sub-Task 19) Make sure all background running processes are terminated properly

          KAFKA-486 (Sub-Task 18) Rethrow exceptions to top level as much as possible

          KAFKA-483 (Sub-Task 15) The system test is preferred to run out of the box without the need to update cluster_config.json

          KAFKA-477 (Sub-Task 12) broker-list in testcase_1_properties.json should be picked up by the framework automatically

          Neha Narkhede added a comment - - edited

          +1

          John, thanks for patch v7. It works well now; the only thing that needs to change is the kafka_home in cluster_config.json. What I suggest is to default kafka_home to the current kafka directory. The advantage is that the test will just run out-of-the-box without any changes on a fresh checkout of the codebase. You can probably fix this in a separate JIRA.

          Also, please can you file another JIRA to convert the existing mirroring system test to use this new framework?

          John Fung added a comment - - edited

          Uploaded kafka-440-v7.patch which includes the following changes:

          1. cluster_config.json will be checked to determine whether the test is being run on localhost. Otherwise, the whole <kafka_home> directory will be copied (by rsync) to each remote host at the destination "<hostname>:<kafka_home>" specified in cluster_config.json

          2. Updated the string pattern used to find the leader election log message. Now the specific testcase fails and is recorded if a leader cannot be found, and the framework moves on to the next testcase.

          ===========
          Known Issues:
          ===========
          1. In system_test/replication_testsuite/testcase_1/testcase_1_properties.json, "broker-list" has to be updated manually if the test is to be run on remote hosts. Otherwise, the test will not run properly.

          2. Sometimes, the running processes may not be terminated properly by the script. This will be fixed soon.

          3. If this test script is run in a distributed environment, the "kafka_home" setting for each entity in cluster_config.json must be the same as the kafka_home of the host that kicks off the test. For example, suppose there are 4 hosts: host1, host2, host3, host4. The original copy of the Kafka source is located on host1 at /export/home/user1/kafka, and the test is kicked off from host1. The test script expects the kafka_home of host2, host3 and host4 to be /export/home/user1/kafka, and it copies the local copy of Kafka on host1 to the other remote hosts. This kafka_home setting has to be updated in cluster_config.json. This is a known issue and will be fixed soon.
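
          The localhost check and rsync copy described in item 1 could look roughly like this (a minimal sketch; the rsync flags and the localhost test are assumptions, not the patch's actual code):

```python
import subprocess

def copy_source_to_remote(hostname, remote_kafka_home, local_kafka_home):
    """Skip the copy on a local run; otherwise rsync the whole kafka_home
    directory to <hostname>:<kafka_home> as described in the comment."""
    if hostname in ("localhost", "127.0.0.1"):
        return None  # local run: nothing to copy
    cmd = ["rsync", "-a", "--delete",
           local_kafka_home + "/",          # trailing slash: copy contents
           f"{hostname}:{remote_kafka_home}"]
    return subprocess.run(cmd, check=True)
```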

          John Fung added a comment -

          Uploaded kafka-440-v6.patch with the following changes:

          1. Updated README.txt

          2. The class KafkaTestcaseEnv is eliminated in favor of a generic class TestcaseEnv. The idea is that all product-specific environment variables can be added to TestcaseEnv.userDefinedEnvVarDict, which is recycled on each testcase. Therefore, this class is product independent.

          3. Each entity's log now goes to a corresponding log directory such as logs/broker-1 or logs/producer_performance-4, etc.

          4. At the start of the test, the framework validates the kafka_home and java_home specified for each host in cluster_config.json to make sure the user has already updated their environment accordingly. Otherwise, it prints an error message and aborts the test.

          5. At the end of the test, "kafka_system_test_utils.stop_remote_entity" will be called to terminate the running processes. This patch also covers the following situations:

          • when user presses "Ctrl-C"
          • when runtime exception occurs.
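
          The cleanup-on-exit behavior in item 5 can be sketched with a try/except/finally wrapper (a sketch under assumed names; run_fn and cleanup_fn are hypothetical stand-ins for the patch's functions):

```python
def run_with_cleanup(run_fn, cleanup_fn):
    """Run the testcases and guarantee cleanup on Ctrl-C or a runtime
    error; every exit path goes through the finally block."""
    try:
        run_fn()
    except KeyboardInterrupt:
        print("interrupted by user (Ctrl-C)")
    except Exception as exc:
        print(f"runtime exception: {exc}")
    finally:
        cleanup_fn()  # e.g. kafka_system_test_utils.stop_remote_entity(...)
```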
          Neha Narkhede added a comment -

          Also, when I run the test, the data validation fails -

          2012-08-20 12:18:50,193 - INFO - ======================================================
          2012-08-20 12:18:50,194 - INFO - validating data matched
          2012-08-20 12:18:50,194 - INFO - ======================================================
          2012-08-20 12:18:50,196 - INFO - See /home/nnarkhed/Projects/kafka-440/system_test/replication_testsuite/testcase_1/msg_id_missing_in_consumer.log for missing MessageID (kafka_system_test_utils)
          2012-08-20 12:18:50,196 - INFO - ================================================
          2012-08-20 12:18:50,196 - INFO - collecting logs from remote machines
          2012-08-20 12:18:50,196 - INFO - ================================================

          2012-08-20 12:18:52,415 - INFO - =================================================
          2012-08-20 12:18:52,416 - INFO - TEST REPORTS
          2012-08-20 12:18:52,416 - INFO - =================================================
          2012-08-20 12:18:52,416 - INFO - test_case_name : testcase_1
          2012-08-20 12:18:52,416 - INFO - test_class_name : ReplicaBasicTest
          2012-08-20 12:18:52,416 - INFO - validation_status :
          2012-08-20 12:18:52,416 - INFO - Validate for data matched : FAILED
          2012-08-20 12:18:52,416 - INFO - Validate leader election successful : PASSED

          But when I look at /home/nnarkhed/Projects/kafka-440/system_test/replication_testsuite/testcase_1/msg_id_missing_in_consumer.log, it is empty.

          Another observation -

          2012-08-20 12:17:14,227 - INFO - ======================================================
          2012-08-20 12:17:14,227 - INFO - starting brokers
          2012-08-20 12:17:14,228 - INFO - ======================================================
          2012-08-20 12:17:14,228 - INFO - field not found: clientPort (system_test_utils)
          2012-08-20 12:17:14,228 - INFO - starting broker in host [localhost] on client port [] (kafka_system_test_utils)

          It complains that it cannot find clientPort while starting kafka broker. But clientPort is applicable only to zookeeper, not kafka. I think that is the reason it doesn't print out the Kafka server port above.

          Neha Narkhede added a comment -

          My bad, I forgot to replace kafka_home in cluster_config.json. However, I'm sure people will forget to do the same as well. How about -

          1. Defaulting kafka_home to the current project's home directory in the cluster_config.json using localhost. That means, the cluster_config.json that ships with kafka will not override kafka_home and will run on localhost by default.
          2. On test startup, verify that the kafka_home directory exists. If not, exit the test with a meaningful error message.
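
          The two suggestions above could be sketched as follows (hypothetical helper; resolve_kafka_home is not a name from the patch, and the default shown is an assumption):

```python
import os
import sys

def resolve_kafka_home(configured_path=None):
    """Default kafka_home to the current project directory when the
    shipped cluster_config.json does not override it, and fail fast
    with a meaningful message if the directory does not exist."""
    kafka_home = configured_path or os.getcwd()
    if not os.path.isdir(kafka_home):
        sys.exit(f"ERROR: kafka_home '{kafka_home}' does not exist; "
                 "please fix cluster_config.json")
    return kafka_home
```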

          Neha Narkhede added a comment -

          This time it fails with the following error -

          2012-08-20 11:18:55,963 - INFO - ======================================================
          2012-08-20 11:18:55,963 - INFO - bounce_leader flag : true
          2012-08-20 11:18:55,963 - INFO - ======================================================
          Traceback (most recent call last):
          File "system_test_runner.py", line 143, in <module>
          main()
          File "system_test_runner.py", line 123, in main
          instance.runTest()
          File "/home/nnarkhed/Projects/kafka-440/system_test/replication_testsuite/replica_basic_test.py", line 193, in runTest
          if self.testcaseEnv.validationStatusDict["Validate leader election successful"] == "FAILED":
          KeyError: 'Validate leader election successful'

          Please could you check out 0.8, apply your patch, build the code and run the test before submitting a new patch?

          John Fung added a comment -

          Hi Neha,

          Thanks for reviewing the patch. I have fixed the issue and uploaded the fix in kafka-440-v5.patch.

          Please do the following after downloading the patch:

          1. Please add the following log4j settings to <kafka_home>/config/log4j.properties to allow ProducerPerformance to print out MessageID

          log4j.logger.kafka.perf=DEBUG
          log4j.logger.kafka.perf.ProducerPerformance$ProducerThread=DEBUG

          2. Please update <kafka_home>/system_test/cluster_config.json with the following attributes specific to your local machine:

          hostname
          kafka_home
          java_home

          3. To turn on debug logging, update <kafka_home>/system_test/system_test_runner.py:

          #namedLogger.setLevel(logging.INFO)
          namedLogger.setLevel(logging.DEBUG)

          Neha Narkhede added a comment -

          Thanks for the patch! I tried to run the test, but got the following error -

          2012-08-17 18:18:23,852 - INFO - ======================================================
          2012-08-17 18:18:23,853 - INFO - validating leader election
          2012-08-17 18:18:23,853 - INFO - ======================================================
          Traceback (most recent call last):
          File "system_test_runner.py", line 143, in <module>
          main()
          File "system_test_runner.py", line 123, in main
          instance.runTest()
          File "/home/nnarkhed/Projects/kafka-440/system_test/replication_testsuite/replica_basic_test.py", line 186, in runTest
          kafkaTestcaseEnv, leaderDict, self.testcaseEnv.validationStatusDict)
          File "/home/nnarkhed/Projects/kafka-440/system_test/utils/kafka_system_test_utils.py", line 558, in validate_leader_election_successful
          leaderBrokerId = leaderDict["brokerid"]
          KeyError: 'brokerid'

          John Fung added a comment - - edited

          Uploaded kafka-440-v4.patch for the following changes:

          8. The test framework will collect logs from remote machines and copy them to the local machine according to the directory structure.

          9. This API is available to construct the log dir path with the input arguments:
          kafka_system_test_utils.construct_logdir_pathname(testcaseEnv, role, entityId, type)

            • This patch has been tested in a distributed environment
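
          A minimal sketch of what such a path-construction helper might look like (the argument names follow the comment above, but the directory layout is an assumption, not the patch's code):

```python
import os

def construct_logdir_pathname(base_log_dir, role, entity_id, log_type):
    """Build a per-entity log directory path such as logs/broker-1/metrics
    or logs/zookeeper-0/dashboards (layout assumed for illustration)."""
    return os.path.join(base_log_dir, f"{role}-{entity_id}", log_type)
```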
          John Fung added a comment -

          Hi Neha,

          Thank you for your review. Your suggested changes have been made in kafka-440-v3.patch, which is uploaded. Please see my comments inline:

          1. Standardize utils into system_test/utils. Remove lib. The libraries that go in here should be available to all other scripts through an 'import library' statement.

          • done

          1.1. Rename li_reg_test_helper to KafkaSystemTestUtils. This has Kafka specific helper APIs. Other util APIs can go in system_test/utils/SystemTestUtils.

          • done

          1.2. Can we include the logger in there as well ? Every script should be able to import that.

          • done

          1.3. Rename RtLogging to Logger.

          • done

          1.4. We shouldn't have to pass around the rtLogger object everywhere. I hope this can be available as a module level static variable. Once I import Logger, I should just be able to use a variable named logger to log statements

          • Two loggers are defined and created in the main script "system_test_runner.py":
          • namedLogger – to log a message with the class name appended at the end for easy debugging such as:
            2012-08-16 23:39:02,701 - INFO - field not found: clientPort (system_test_utils)
          • anonymousLogger – to log a message with generic info such as:
            2012-08-16 23:39:00,698 - INFO - sleeping for 2s

          1.5. Rename RegTestEnv to SystemTestEnv. Let's add all useful environment variables here like – kafkaBaseDir, systemTestBaseDir, testSuiteBaseDir, testCaseBaseDir, testCaseLogDir etc

          • These environment variables are now available in class TestcaseEnv and the sample usage to retrieve them is documented in kafka_system_test_utils.py

          1.6. replica_basic_test.py currently doesn't have access to the above environment variables which makes it awkward to use.

          • It's now documented in replica_basic_test.py on how to use them

          2. Rename reg_test* to SystemTestUtils or something like that

          • done

          3. Rename suite_replication to replication_test_suite. You might want to name every test suite as *_test_suite so that the scripts can detect that as a test suite.

          • done

          4. Rename reg_test_driver to test_driver.

          • It is now renamed to system_test_runner.py which is more appropriate for its purpose.

          5. It will be good to follow some Python style guide like this one - http://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles
          5.1. Package names should be all lower case and preferably without underscores, although sometimes this is difficult to avoid. So something like reg_test_helper could just be utils/testutils.
          5.2. Class names should be CamelCase, with first letter capitalized.
          5.3. Function names should be all lower case with words separated by underscores.
          5.4. Constants should be defined at module level with all letters capitalized and separated with underscores.

          • The changes are now complying with the style guidance

          6. A lot of commands need to prepend 'ssh host' to them. How about having a helper API run_at_host_command(host, command) that will do this and return the command string ?

          • The lengthier, heavily escaped command strings are now stored in lists (arrays) and are easier to read
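
          The suggested helper could be as simple as this (hypothetical sketch; run_at_host_command is the name proposed in the review, not necessarily the final API):

```python
def run_at_host_command(host, command):
    """Wrap a command so it runs on a remote host via ssh. Returning the
    command as an argument list (rather than one shell string) avoids
    most quoting/escaping problems when passed to subprocess."""
    return ["ssh", host, command]

# e.g. subprocess.call(run_at_host_command("host2",
#          "bin/kafka-server-start.sh config/server.properties"))
```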

          7. Also, all commands need to specify the host to run on, the path to the script, some arguments to that script and an output file. It will be nice to add a helper API to do that.

          • For functions that require a hostname but have fewer than 3 input arguments, the hostname is specified in the function arguments. However, for functions with many more input arguments, the SystemTestEnv and KafkaTestcaseEnv are passed into the function, which figures out the hostname, entity_id, etc. via the helper function system_test_utils.get_data_by_lookup_keyval().

          8. It will be good to structure the testcase/logs directory by role-entityid. The reason is that we want to collect metrics and plot graphs for every entity. These will involve generating several csv/svg/png files per entity. Instead of putting them all in one testcase/logs directory, how about having the following structure -
          testcase/logs
          |__ zookeeper-0
          |   |__ metrics
          |   |__ dashboards
          |__ kafka-1
              |__ metrics
              |__ dashboards

          • The framework now supports this directory structure. But I will create a separate JIRA to collect all logs from remote hosts to local machine.
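
          Creating the per-entity metrics/dashboards directories described in item 8 could be sketched as follows (illustrative names, not the patch's code):

```python
import os

def create_entity_log_dirs(base_dir, role, entity_id):
    """Create testcase/logs/<role>-<entity_id>/{metrics,dashboards} and
    return the created paths, so metric-collection and graph-plotting
    code can write csv/svg/png files per entity."""
    entity_dir = os.path.join(base_dir, f"{role}-{entity_id}")
    paths = {}
    for sub in ("metrics", "dashboards"):
        p = os.path.join(entity_dir, sub)
        os.makedirs(p, exist_ok=True)
        paths[sub] = p
    return paths
```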

          9. To make it easier to use the directory structure above, I think we should have access to helper APIs that given the entity id can return the path to the metrics/dashboards directories. This will be used by the APIs that collect metrics and plot graphs.

          • This will be supported in the new JIRA created in Item 8 above

          10. cluster_config.json describes each entity with a set of properties. It might be easier to define a class called TestEntity that has all these properties. On startup, the test will read cluster_config.json and create a map/list of TestEntity objects. The test scripts should have access to these.

          • This will be supported in the new JIRA created in Item 8 above

          11. How about having lifecycle management for all testcases in a test suite similar to Junit ? For example, all test cases in one test suite can have a setup and teardown method, where common tasks can be performed. However, you can probably do this as part of another JIRA.

          • This will be supported in the new JIRA created in Item 8 above
          Neha Narkhede added a comment -

          Thanks for patch v2! Overall, it looks like a good start. Here are some review comments -

          1. Standardize utils into system_test/utils. Remove lib. The libraries that go in here should be available to all other scripts through an 'import library' statement.
          1.1. Rename li_reg_test_helper to KafkaSystemTestUtils. This has Kafka specific helper APIs. Other util APIs can go in system_test/utils/SystemTestUtils.
          1.2. Can we include the logger in there as well ? Every script should be able to import that.
          1.3. Rename RtLogging to Logger.
          1.4. We shouldn't have to pass around the rtLogger object everywhere. I hope this can be available as a module level static variable. Once I import Logger, I should just be able to use a variable named logger to log statements
          1.5. Rename RegTestEnv to SystemTestEnv. Let's add all useful environment variables here like – kafkaBaseDir, systemTestBaseDir, testSuiteBaseDir, testCaseBaseDir, testCaseLogDir etc
          1.6. replica_basic_test.py currently doesn't have access to the above environment variables which makes it awkward to use.
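          The module-level logger asked for in 1.4 might look like this minimal sketch (the logger name and format string are assumptions; the format merely mirrors typical system-test log lines):

          ```python
          # Hypothetical sketch: configure a logger once at module level so any
          # script can `from logger import logger` and log without passing objects.
          import logging

          logger = logging.getLogger("system_test")
          if not logger.handlers:  # configure only once, even if re-imported
              handler = logging.StreamHandler()
              handler.setFormatter(logging.Formatter(
                  "%(asctime)s - %(levelname)s - %(message)s"))
              logger.addHandler(handler)
              logger.setLevel(logging.INFO)

          logger.info("sleeping for 2s")
          ```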

          2. Rename reg_test* to SystemTestUtils or something like that

          3. Rename suite_replication to replication_test_suite. You might want to name every test suite as *_test_suite so that the scripts can detect that as a test suite.

          4. Rename reg_test_driver to test_driver.

          5. It will be good to follow some Python style guide like this one - http://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles
          5.1. Package names should be all lower case and preferably without underscores, although sometimes this is difficult to avoid. So something like reg_test_helper could just be utils/testutils.
          5.2. Class names should be CamelCase, with first letter capitalized.
          5.3. Function names should be all lower case with words separated by underscores.
          5.4. Constants should be defined at module level with all letters capitalized and separated with underscores.

          6. A lot of commands need to prepend 'ssh host' to them. How about having a helper API run_at_host_command(host, command) that will do this and return the command string?

          7. Also, all commands need to specify the host to run on, the path to the script, some arguments to that script and an output file. It will be nice to add a helper API to do that.
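          The two helpers suggested in items 6 and 7 could be sketched as follows — the quoting is deliberately simplified and the argument layout is an assumption, not the API that landed in the patch:

          ```python
          # Hypothetical sketch of command-building helpers for remote execution.
          def run_at_host_command(host, command):
              """Wrap a command string so it runs on `host` over ssh."""
              return "ssh %s '%s'" % (host, command)

          def run_script_command(host, script_path, args, output_file):
              """Run a script with args on `host`, redirecting output to a file."""
              remote = "%s %s > %s 2>&1" % (script_path, " ".join(args), output_file)
              return run_at_host_command(host, remote)

          cmd = run_script_command("localhost", "bin/kafka-run-class.sh",
                                   ["kafka.Kafka", "config/server.properties"],
                                   "/tmp/broker-1.log")
          ```

          A real implementation would need proper shell quoting (e.g. pipes.quote) for arguments containing spaces or quotes.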

          8. It will be good to structure the testcase/logs directory by role-entityid. The reason is that we want to collect metrics and plot graphs for every entity. These will involve generating several csv/svg/png files per entity. Instead of putting them all in one testcase/logs directory, how about having the following structure -
          testcase/logs
          |__ zookeeper-0
          |     |__ metrics
          |     |__ dashboards
          |__ kafka-1
          |     |__ metrics
          |     |__ dashboards
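          Materializing that layout is a few lines of Python — a sketch only, with illustrative role/entity pairs and a temp directory standing in for the real testcase/logs path:

          ```python
          # Hypothetical sketch: create <role>-<entity_id>/{metrics,dashboards}
          # directories for each entity under a logs base directory.
          import os
          import tempfile

          def create_entity_log_dirs(logs_base, entities):
              """Create the per-entity metrics/dashboards subdirectories."""
              for role, entity_id in entities:
                  for subdir in ("metrics", "dashboards"):
                      os.makedirs(os.path.join(
                          logs_base, "%s-%s" % (role, entity_id), subdir))

          base = tempfile.mkdtemp()
          create_entity_log_dirs(base, [("zookeeper", "0"), ("kafka", "1")])
          ```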

          9. To make it easier to use the directory structure above, I think we should have access to helper APIs that given the entity id can return the path to the metrics/dashboards directories. This will be used by the APIs that collect metrics and plot graphs.

          10. cluster_config.json describes each entity with a set of properties. It might be easier to define a class called TestEntity that has all these properties. On startup, the test will read cluster_config.json and create a map/list of TestEntity objects. The test scripts should have access to these.

          11. How about having lifecycle management for all testcases in a test suite, similar to JUnit? For example, all test cases in one test suite can have setup and teardown methods, where common tasks can be performed. However, you can probably do this as part of another JIRA.

          John Fung added a comment - edited

          Hi Neha,

          Thank you for reviewing. I have uploaded kafka-440-v2.patch which has the changes suggested in your review.

            • Please note this regression test has 2 testcases and both are passing in rev. 1367821

          Thanks,
          John

          Neha Narkhede added a comment - edited

          Thanks for the patch, John! Here are few review comments -

          1. Let's move the framework under system_test instead of reg_test.
          2. Use svn propset to set executable properties on the scripts. http://lexfridman.com/blogs/research/2011/02/21/executable-files-in-svn/

          3. Testcase 2
          3.1 The testcase config file has /tmp/jfung as the log directory path. Let's change it to remove jfung from the path.
          3.2 Add some more details to the description in testcase_2. (Similar to the description you have for testcase_1). So, to understand one testcase I shouldn't have to understand another testcase.
          3.3 In "server_to_bounce", I would just expect to say "leader". Right now it says source, target and mirror_maker which are unrelated to replication.
          3.4. What does "partition" stand for? Do you mean "num_partitions"?

          4. Add README to reg_test/system_test that describes the new framework and also set of instructions that people can follow to add new testcases.

          5. It also probably makes sense to add a jmx_port config option to reg_test/cluster_config.json. Right now, the JMX_PORT is disabled, so there is no way to collect monitoring data from the various entities.

          John Fung added a comment - edited

          Uploaded kafka-440-v1.patch for review. After applying the patch, please do the following:

          1. $ chmod u+x reg_test/suite_replication/bin/kafka-run-class.sh
          2. Update reg_test/cluster_config.json for "kafka_home" & "java_home" specific to your environment
          3. To run the test, go to <kafka_home>/reg_test and run the following command:
          $ python -B bin/reg_test_driver.py


            People

            • Assignee: John Fung
            • Reporter: John Fung
            • Votes: 0
            • Watchers: 3

              Dates

              • Created:
              • Updated:
              • Resolved:

                Time Tracking

                • Estimated: Original Estimate - 360h
                • Remaining: Remaining Estimate - 360h
                • Logged: Time Spent - Not Specified
