Kafka / KAFKA-791

Fix validation bugs in System Test

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None

      Description

      The following issues were found in the data / log checksum matching in System Test:

      1. kafka_system_test_utils.validate_simple_consumer_data_matched
      It reports PASSED even when some log segments don't match.

      2. kafka_system_test_utils.validate_data_matched (this has been fixed and patched in the local Hudson for some time)
      It reports PASSED in the Ack=1 cases even when data loss is greater than the tolerance (1%).

      3. kafka_system_test_utils.validate_simple_consumer_data_matched
      It validates a unique set of MessageIDs. It should leave all MessageIDs as is (no dedup needed), and the test case should fail if the sorted MessageIDs don't match across the replicas.

      4. There is a data loss tolerance of 1% in the Ack=1 test cases. Currently 1% is too strict, and some random failures are seen due to 2 ~ 3% data loss. It will be increased to 5% so that the System Test gets a more consistent passing rate in those test cases. The following will be updated to a 5% tolerance in kafka_system_test_utils:
      validate_data_matched
      validate_simple_consumer_data_matched
      validate_data_matched_in_multi_topics_from_single_consumer_producer

      1. kafka-791-v1.patch
        13 kB
        John Fung
      2. kafka-791-v2.patch
        26 kB
        John Fung
      3. kafka-791-v3.patch
        32 kB
        John Fung
      4. kafka-791-v4.patch
        33 kB
        John Fung

        Activity

        John Fung created issue -
        John Fung made changes -
        Issue Type: Bug [ 1 ] → Task [ 3 ]
        John Fung added a comment -

        Attached kafka-791-v1.patch

        John Fung made changes -
        Attachment kafka-791-v1.patch [ 12572781 ]
        John Fung made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        John Fung made changes -
        Attachment kafka-791-v2.patch [ 12573575 ]
        John Fung added a comment -

        Attached kafka-791-v2.patch with additional changes:

        kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas supports data validation in these scenarios :

        • multiple consumer entities
        • multiple topics in a single topic string separated by comma
        John Fung made changes -
        Attachment kafka-791-v2.patch [ 12573575 ]
        John Fung made changes -
        Attachment kafka-791-v2.patch [ 12573578 ]
        John Fung made changes -
        Attachment kafka-791-v3.patch [ 12574166 ]
        John Fung added a comment -

        Uploaded kafka-791-v3.patch with additional changes.

        The following validation functions :

        • validate_data_matched
        • validate_simple_consumer_data_matched
        • validate_simple_consumer_data_matched_across_replicas
        • validate_data_matched_in_multi_topics_from_single_consumer_producer

        are modified with these common behaviors :

        • producer MessageID list is always converted to a set (deduped)
        • consumer MessageID list is left as is
        • data loss / mismatch are compared by removing consumer MessageID from producer MessageID set
        • any duplicates in consumer MessageID will be treated as failure
        • Ack=1 test case data loss failure threshold is set to 5%
        • ordering of MessageID is not validated
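
The common behaviors listed above can be sketched roughly as follows. This is not the code from the patch: the function name `validate_message_ids`, its parameters, and the returned dictionary are hypothetical, and two details are assumptions because the comment does not state them (a loss of exactly 5% passes, and any loss fails when Ack is not 1).

```python
from collections import Counter


def validate_message_ids(producer_ids, consumer_ids, ack, loss_threshold_pct=5.0):
    """Hypothetical sketch: return (passed, details) for one producer/consumer pair."""
    # Producer MessageIDs are always converted to a set (deduped).
    produced = set(producer_ids)

    # Consumer MessageIDs are left as is, so duplicates remain visible.
    counts = Counter(consumer_ids)
    duplicates = sorted(mid for mid, n in counts.items() if n > 1)

    # Data loss: producer IDs left over after removing every consumed ID.
    missing = produced - set(consumer_ids)
    loss_pct = 100.0 * len(missing) / len(produced) if produced else 0.0

    details = {
        "duplicates": duplicates,
        "missing": sorted(missing),
        "loss_pct": loss_pct,
    }

    # Any duplicate on the consumer side is treated as a failure.
    if duplicates:
        return False, details

    # Ack=1 cases tolerate loss up to the threshold (assumed inclusive).
    if ack == 1:
        return loss_pct <= loss_threshold_pct, details

    # Assumption: other Ack settings tolerate no loss at all.
    return not missing, details
```

Ordering is deliberately ignored here, matching the last bullet: only membership and duplicate counts are checked.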

        For the function validate_simple_consumer_data_matched_across_replicas :

        • compare each list (no dedupe) of consumer MessageID associated with its topic-partition in each replica
        • any MessageID mismatch in a certain topic-partition between replicas is reported as failure
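
A rough sketch of the per-topic-partition comparison across replicas, under stated assumptions: the function name `find_replica_mismatches` and the nested-dictionary input shape are hypothetical and not taken from the patch. It uses plain list equality, which is also sensitive to order; if order should be ignored, sorted copies could be compared instead.

```python
def find_replica_mismatches(ids_by_replica):
    """Hypothetical sketch.

    ids_by_replica maps replica name -> {topic_partition: [MessageID, ...]}.
    Returns the sorted topic-partitions whose MessageID lists differ
    between replicas.
    """
    # Collect every topic-partition seen on any replica.
    all_tps = set()
    for per_tp in ids_by_replica.values():
        all_tps.update(per_tp)

    mismatched = []
    for tp in sorted(all_tps):
        # No dedup: the raw lists are compared. A topic-partition that is
        # absent on a replica is treated as an empty list (an assumption).
        lists = [per_tp.get(tp, []) for per_tp in ids_by_replica.values()]
        if any(other != lists[0] for other in lists[1:]):
            mismatched.append(tp)
    return mismatched
```

A test case would then be reported as failed whenever the returned list is non-empty.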
        John Fung added a comment -

        Uploaded kafka-791-v4.patch with the following changes:

        1. Added system_test_utils.diff_list to compare if 2 lists are identical
        2. validate_simple_consumer_data_matched_across_replicas will use diff_list to make sure all messages received are in the same order and identical
        3. Removed function "validate_simple_consumer_data_matched" from kafka_system_test_utils.py for clean up.
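
To illustrate what an ordered, element-by-element list comparison looks like, here is a minimal sketch. It is not the `system_test_utils.diff_list` added by the patch, whose signature and return value are not shown in this issue; the helper name `first_list_difference` and its return shape are hypothetical.

```python
from itertools import zip_longest

# Sentinel marking "no element at this index" in the shorter list.
_MISSING = object()


def first_list_difference(a, b):
    """Hypothetical sketch.

    Return None when both lists hold the same items in the same order,
    otherwise (index, item_from_a, item_from_b) for the first difference.
    A side that has run out of items is reported as None.
    """
    for i, (x, y) in enumerate(zip_longest(a, b, fillvalue=_MISSING)):
        if x != y:
            return (
                i,
                None if x is _MISSING else x,
                None if y is _MISSING else y,
            )
    return None
```

Because the comparison walks both lists position by position, a reordering is caught as well as a missing or extra MessageID.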

        John Fung made changes -
        Attachment kafka-791-v4.patch [ 12575357 ]
        Jun Rao added a comment -

        Thanks for patch v4. Committed to 0.8.

        Jun Rao made changes -
        Status: Patch Available [ 10002 ] → Resolved [ 5 ]
        Fix Version/s: 0.8 [ 12317244 ]
        Resolution: Fixed [ 1 ]

          People

          • Assignee:
            John Fung
            Reporter:
            John Fung
          • Votes:
            0
            Watchers:
            2
