Kafka
  1. Kafka
  2. KAFKA-603

System Test Data Validation Failure - Replication Factor less than No. of Broker

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Activity

      Hide
      John Fung added a comment - - edited

      This test case (testcase_4011) is set up as followings:
      1. Set up a 3-broker cluster with 2 topics, 2 partitions, Replication Factor 2
      2. Producer to produce messages in Sync mode, Acks = -1, Compression Off
      3. The expected results would be to see 3 Simple Consumer message count: one of them is zero, two of them are equal.
      4. The results shown below is that all ConsoleConsumer messages are matching. However, simple consumer are showing unexpected messages count

      To reproduce the issue:
      1. Download the latest 0.8 branch
      2. Apply the attached patch and build
      3. In <kafka_home>/system_test, execute: python –B system_test_runner.py

      validation_status :
      Unique messages from consumer on [test_1] : 1000
      Unique messages from consumer on [test_1] at simple_consumer_1.log : 1000
      Unique messages from consumer on [test_1] at simple_consumer_2.log : 500
      Unique messages from consumer on [test_1] at simple_consumer_3.log : 500
      Unique messages from consumer on [test_2] : 1000
      Unique messages from consumer on [test_2] at simple_consumer_1.log : 1000
      Unique messages from consumer on [test_2] at simple_consumer_2.log : 500
      Unique messages from consumer on [test_2] at simple_consumer_3.log : 505
      Unique messages from producer on [test_1] : 1000
      Unique messages from producer on [test_2] : 1000
      Validate for data matched on topic [test_1] : PASSED
      Validate for data matched on topic [test_1] across replicas : FAILED
      Validate for data matched on topic [test_2] : PASSED
      Validate for data matched on topic [test_2] across replicas : FAILED

      Broker Log Segments:

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_1-0/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_1-0/00000000000000000197.log
      rw-rr- 1 jfung eng 55332 Nov 7 16:40 kafka_server_1_logs/test_1-0/00000000000000000394.log

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_1-1/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_1-1/00000000000000000197.log
      rw-rr- 1 jfung eng 55332 Nov 7 16:40 kafka_server_1_logs/test_1-1/00000000000000000394.log

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_2-0/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_2-0/00000000000000000197.log
      rw-rr- 1 jfung eng 57942 Nov 7 16:40 kafka_server_1_logs/test_2-0/00000000000000000394.log

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_2-1/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_2-1/00000000000000000197.log
      rw-rr- 1 jfung eng 55332 Nov 7 16:40 kafka_server_1_logs/test_2-1/00000000000000000394.log

      ======================================================

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_2_logs/test_1-1/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_2_logs/test_1-1/00000000000000000197.log
      rw-rr- 1 jfung eng 55332 Nov 7 16:40 kafka_server_2_logs/test_1-1/00000000000000000394.log

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_2_logs/test_2-1/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_2_logs/test_2-1/00000000000000000197.log
      rw-rr- 1 jfung eng 55332 Nov 7 16:40 kafka_server_2_logs/test_2-1/00000000000000000394.log

      ======================================================

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_3_logs/test_1-0/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_3_logs/test_1-0/00000000000000000197.log
      rw-rr- 1 jfung eng 55332 Nov 7 16:40 kafka_server_3_logs/test_1-0/00000000000000000394.log

      rw-rr- 1 jfung eng 102834 Nov 7 16:39 kafka_server_3_logs/test_2-0/00000000000000000000.log
      rw-rr- 1 jfung eng 102834 Nov 7 16:40 kafka_server_3_logs/test_2-0/00000000000000000197.log
      rw-rr- 1 jfung eng 57942 Nov 7 16:40 kafka_server_3_logs/test_2-0/00000000000000000394.log

      Show
      John Fung added a comment - - edited This test case (testcase_4011) is set up as followings: 1. Set up a 3-broker cluster with 2 topics, 2 partitions, Replication Factor 2 2. Producer to produce messages in Sync mode, Acks = -1, Compression Off 3. The expected results would be to see 3 Simple Consumer message count: one of them is zero, two of them are equal. 4. The results shown below is that all ConsoleConsumer messages are matching. However, simple consumer are showing unexpected messages count To reproduce the issue: 1. Download the latest 0.8 branch 2. Apply the attached patch and build 3. In <kafka_home>/system_test, execute: python –B system_test_runner.py validation_status : Unique messages from consumer on [test_1] : 1000 Unique messages from consumer on [test_1] at simple_consumer_1.log : 1000 Unique messages from consumer on [test_1] at simple_consumer_2.log : 500 Unique messages from consumer on [test_1] at simple_consumer_3.log : 500 Unique messages from consumer on [test_2] : 1000 Unique messages from consumer on [test_2] at simple_consumer_1.log : 1000 Unique messages from consumer on [test_2] at simple_consumer_2.log : 500 Unique messages from consumer on [test_2] at simple_consumer_3.log : 505 Unique messages from producer on [test_1] : 1000 Unique messages from producer on [test_2] : 1000 Validate for data matched on topic [test_1] : PASSED Validate for data matched on topic [test_1] across replicas : FAILED Validate for data matched on topic [test_2] : PASSED Validate for data matched on topic [test_2] across replicas : FAILED Broker Log Segments: rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_1-0/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_1-0/00000000000000000197.log rw-r r - 1 jfung eng 55332 Nov 7 16:40 kafka_server_1_logs/test_1-0/00000000000000000394.log rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_1-1/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_1-1/00000000000000000197.log rw-r r - 1 jfung eng 55332 Nov 7 16:40 kafka_server_1_logs/test_1-1/00000000000000000394.log rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_2-0/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_2-0/00000000000000000197.log rw-r r - 1 jfung eng 57942 Nov 7 16:40 kafka_server_1_logs/test_2-0/00000000000000000394.log rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_1_logs/test_2-1/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_1_logs/test_2-1/00000000000000000197.log rw-r r - 1 jfung eng 55332 Nov 7 16:40 kafka_server_1_logs/test_2-1/00000000000000000394.log ====================================================== rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_2_logs/test_1-1/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_2_logs/test_1-1/00000000000000000197.log rw-r r - 1 jfung eng 55332 Nov 7 16:40 kafka_server_2_logs/test_1-1/00000000000000000394.log rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_2_logs/test_2-1/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_2_logs/test_2-1/00000000000000000197.log rw-r r - 1 jfung eng 55332 Nov 7 16:40 kafka_server_2_logs/test_2-1/00000000000000000394.log ====================================================== rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_3_logs/test_1-0/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_3_logs/test_1-0/00000000000000000197.log rw-r r - 1 jfung eng 55332 Nov 7 16:40 kafka_server_3_logs/test_1-0/00000000000000000394.log rw-r r - 1 jfung eng 102834 Nov 7 16:39 kafka_server_3_logs/test_2-0/00000000000000000000.log rw-r r - 1 jfung eng 102834 Nov 7 16:40 kafka_server_3_logs/test_2-0/00000000000000000197.log rw-r r - 1 jfung eng 57942 Nov 7 16:40 kafka_server_3_logs/test_2-0/00000000000000000394.log
      Hide
      John Fung added a comment -

      ConsoleConsumer command:

      2012-11-07 20:02:00,410 - DEBUG - executing command: [ssh localhost 'JAVA_HOME=/export/apps/jdk/JDK-1_6_0_27 JMX_PORT=9999 /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/bin/kafka-run-class.sh kafka.consumer.ConsoleConsumer --zookeeper localhost:2188 --topic test_1 --consumer-timeout-ms 60000 --csv-reporter-enabled --metrics-dir /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/metrics --from-beginning >> /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/console_consumer.log & echo pid:$! > /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/entity_6_pid'] (kafka_system_test_utils)

      SimpleConsumerShell command:

      2012-11-07 20:04:17,444 - DEBUG - executing command: [ssh localhost 'JAVA_HOME=/export/apps/jdk/JDK-1_6_0_27 /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/bin/kafka-run-class.sh kafka.tools.SimpleConsumerShell --broker-list localhost:9091,localhost:9092,localhost:9093 --topic test_1 --partition 0 --replica 1 --offset 0 --no-wait-at-logend >> /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/simple_consumer_1.log & echo pid:$! > /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/entity_6_pid'] (kafka_system_test_utils)

      Show
      John Fung added a comment - ConsoleConsumer command: 2012-11-07 20:02:00,410 - DEBUG - executing command: [ssh localhost 'JAVA_HOME=/export/apps/jdk/JDK-1_6_0_27 JMX_PORT=9999 /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/bin/kafka-run-class.sh kafka.consumer.ConsoleConsumer --zookeeper localhost:2188 --topic test_1 --consumer-timeout-ms 60000 --csv-reporter-enabled --metrics-dir /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/metrics --from-beginning >> /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/console_consumer.log & echo pid:$! > /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/entity_6_pid'] (kafka_system_test_utils) SimpleConsumerShell command: 2012-11-07 20:04:17,444 - DEBUG - executing command: [ssh localhost 'JAVA_HOME=/export/apps/jdk/JDK-1_6_0_27 /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/bin/kafka-run-class.sh kafka.tools.SimpleConsumerShell --broker-list localhost:9091,localhost:9092,localhost:9093 --topic test_1 --partition 0 --replica 1 --offset 0 --no-wait-at-logend >> /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/simple_consumer_1.log & echo pid:$! > /home/jfung/workspace_kafka/kafka_r1406911_603_reproduce_issue/system_test/log_retention_testsuite/testcase_4011/logs/console_consumer-6/entity_6_pid'] (kafka_system_test_utils)
      Hide
      Joel Koshy added a comment -

      I think this is a non-issue, since the per-partition breakdown of consumed messages revealed that each partition ends up on exactly two replicas.

      Show
      Joel Koshy added a comment - I think this is a non-issue, since the per-partition breakdown of consumed messages revealed that each partition ends up on exactly two replicas.
      Hide
      John Fung added a comment -

      Thanks Joel for looking into this. The validation approach is modified to break down the messages received per partition as shown below:

      _test_case_name : testcase_4011
      _test_class_name : ReplicaBasicTest
      arg : bounce_broker : true
      arg : broker_down_time_in_sec : 5
      arg : broker_type : leader
      arg : log_retention_test : true
      arg : message_producing_free_time_sec : 15
      arg : num_iteration : 1
      arg : num_partition : 2
      arg : replica_factor : 2
      arg : sleep_seconds_between_producer_calls : 1
      validation_status :
      Leader Election Latency - iter 1 brokerid 2 : 325.00 ms
      Leader Election Latency MAX : 325.00
      Leader Election Latency MIN : 325.00
      Unique messages from consumer on [test_1] : 4000
      Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 2000
      Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 2000
      Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 0
      Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 0
      Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 2000
      Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 2000
      Unique messages from consumer on [test_2] : 3500
      Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r1.log : 1755
      Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r2.log : 1755
      Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r3.log : 0
      Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r1.log : 0
      Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r2.log : 1745
      Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r3.log : 1745
      Unique messages from producer on [test_1] : 4000
      Unique messages from producer on [test_2] : 3500
      Validate for data matched on topic [test_1] : PASSED
      Validate for data matched on topic [test_1] across replicas : PASSED
      Validate for data matched on topic [test_2] : PASSED
      Validate for data matched on topic [test_2] across replicas : PASSED
      Validate leader election successful : PASSED

      This ticket can be closed now

      Show
      John Fung added a comment - Thanks Joel for looking into this. The validation approach is modified to break down the messages received per partition as shown below: _test_case_name : testcase_4011 _test_class_name : ReplicaBasicTest arg : bounce_broker : true arg : broker_down_time_in_sec : 5 arg : broker_type : leader arg : log_retention_test : true arg : message_producing_free_time_sec : 15 arg : num_iteration : 1 arg : num_partition : 2 arg : replica_factor : 2 arg : sleep_seconds_between_producer_calls : 1 validation_status : Leader Election Latency - iter 1 brokerid 2 : 325.00 ms Leader Election Latency MAX : 325.00 Leader Election Latency MIN : 325.00 Unique messages from consumer on [test_1] : 4000 Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 2000 Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 2000 Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 0 Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 0 Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 2000 Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 2000 Unique messages from consumer on [test_2] : 3500 Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r1.log : 1755 Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r2.log : 1755 Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r3.log : 0 Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r1.log : 0 Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r2.log : 1745 Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r3.log : 1745 Unique messages from producer on [test_1] : 4000 Unique messages from producer on [test_2] : 3500 Validate for data matched on topic [test_1] : PASSED Validate for data matched on topic [test_1] across replicas : PASSED Validate for data matched on topic [test_2] : PASSED Validate for data matched on topic [test_2] across replicas : PASSED Validate leader election successful : PASSED This ticket can be closed now

        People

        • Assignee:
          Unassigned
          Reporter:
          John Fung
        • Votes:
          0 Vote for this issue
          Watchers:
          2 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development