KAFKA-583

SimpleConsumerShell may receive less data inconsistently

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

        Activity

        John Fung added a comment -

        This happens intermittently when executing testcase_0108.

        Test description:
        1. Start ZooKeeper and a 3-broker cluster with 3 replicas.
        2. Run the producer until it finishes producing data.
        3. Keep ZooKeeper and the brokers running.
        4. Start SimpleConsumerShell to consume data as follows:
        bin/kafka-run-class.sh kafka.tools.SimpleConsumerShell --broker-list localhost:9091,localhost:9092,localhost:9093 --topic test_1 --partition 0 --replica 1 --no-wait-at-logend > replica_data_r1_p0.log
        bin/kafka-run-class.sh kafka.tools.SimpleConsumerShell --broker-list localhost:9091,localhost:9092,localhost:9093 --topic test_1 --partition 1 --replica 1 --no-wait-at-logend > replica_data_r1_p1.log
        . . .

        5. Look for the MessageID in each replica:
        grep MessageID replica_data_r1_p0.log | sed 's/.*MessageID://' | sed 's/:.*//' | sort -u | wc -l
        grep MessageID replica_data_r1_p1.log | sed 's/.*MessageID://' | sed 's/:.*//' | sort -u | wc -l
        . . .

        (a shell script is attached for the above shell commands)

        6. The following is the output:
        ./run-simple-consumer.sh

        735
        735
        630

        735
        735
        200

        735
        735
        630

        7. The numbers above show that only 200 messages were consumed from replica 2, partition 2 (the 3rd partition), while the other replicas returned 630 for that partition.
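
Steps 4 and 5 can be sketched as one loop over replicas and partitions. This is a minimal sketch, not the attached script: the function names `count_unique_ids` and `consume_all` are hypothetical, the 3 replicas by 3 partitions layout is inferred from the output above, and each consumed line is assumed to contain a `MessageID:<id>:` field. The sed patterns use `.*` so that everything before and after the ID is stripped.

```shell
#!/bin/sh
# Hypothetical helper: count distinct MessageID values in one consumer output file.
# Assumes each message line contains "MessageID:<id>:" (an assumption about the payload).
count_unique_ids() {
    grep MessageID "$1" | sed 's/.*MessageID://' | sed 's/:.*//' | sort -u | wc -l | tr -d ' '
}

# Hypothetical driver: consume every (replica, partition) pair, then print its count.
# Must be run from <kafka_home> while ZooKeeper and the brokers are up.
consume_all() {
    brokers="localhost:9091,localhost:9092,localhost:9093"
    for r in 1 2 3; do
        for p in 0 1 2; do
            out="replica_data_r${r}_p${p}.log"
            bin/kafka-run-class.sh kafka.tools.SimpleConsumerShell \
                --broker-list "$brokers" --topic test_1 \
                --partition "$p" --replica "$r" --no-wait-at-logend > "$out"
            count_unique_ids "$out"
        done
        echo    # blank line between replicas, matching the output layout in step 6
    done
}
```

Defining the count as a function keeps the grep/sed pipeline in one place, so a fix to the patterns applies to every replica file.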

        John Fung added a comment -

        This issue can be reproduced consistently using the attached broker log segment files and zookeeper data.

        To reproduce, do the following:

        1. Check out the latest 0.8 branch

        2. The attached data file "kafka_583_zk_kafka_data.tar.gz" contains the following directories:
        /tmp/zookeeper_0
        /tmp/kafka_server_1_logs
        /tmp/kafka_server_2_logs
        /tmp/kafka_server_3_logs

        If your local "/tmp" directory also contains the above directories, please rename them.

        3. Download "kafka_583_zk_kafka_data.tar.gz" and extract it to your local "/tmp" directory

        4. Download "kafka_583_reproduce_issue.patch" and apply under <kafka_home>:
        patch -p0 -i kafka_583_reproduce_issue.patch

        5. Build kafka as: <kafka_home> $ ./sbt update package

        6. In <kafka_home>, execute "chmod u+x validate_data_and_log_segment.sh"

        7. In <kafka_home>/system_test : execute "python -B system_test_runner.py"

        8. Wait about 1 minute until the following message appears on the console:

        =====================================================
        Sleeping for 30 min ...
        You may now run : <kafka_home>/run_simple_consumer.sh
        =====================================================

        Then execute this command under <kafka_home>: ./validate_data_and_log_segment.sh

        9. The following output is shown:

        $ ./validate_data_and_log_segment.sh

        Validated by SimpleConsumerShell :
        replica 1 message count:
        735
        735
        630

        replica 2 message count:
        735
        735
        630

        replica 3 message count:
        735
        735
        200

        Validated by DumpLogSegments :
        broker 1 partition 0 messages count : 735
        broker 1 partition 1 messages count : 735
        broker 1 partition 2 messages count : 630

        broker 2 partition 0 messages count : 735
        broker 2 partition 1 messages count : 735
        broker 2 partition 2 messages count : 630

        broker 3 partition 0 messages count : 735
        broker 3 partition 1 messages count : 735
        broker 3 partition 2 messages count : 630

        10. The message count for broker 3, partition 2 differs between SimpleConsumerShell (200) and DumpLogSegments (630)

        11. Note: if you get the following messages, press Ctrl-C and re-run the test:

        Error: replica 1 does not exist for partition (test_1, 0)
        Error: replica 1 does not exist for partition (test_1, 1)
        Error: replica 1 does not exist for partition (test_1, 2)
        Error: replica 2 does not exist for partition (test_1, 0)
        Error: replica 2 does not exist for partition (test_1, 1)
        Error: replica 2 does not exist for partition (test_1, 2)
        Error: replica 3 does not exist for partition (test_1, 0)
        Error: replica 3 does not exist for partition (test_1, 1)
        Error: replica 3 does not exist for partition (test_1, 2)

        Validated by SimpleConsumerShell :
        replica 1 message count:
        0
        0
        0

        replica 2 message count:
        0
        0
        0

        replica 3 message count:
        0
        0
        0
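
The check in step 10 amounts to comparing two lists of per-partition counts. A minimal sketch, assuming each tool's counts have been saved one per line in the same replica/partition order; `compare_counts` and its input files are hypothetical and not part of the attached script.

```shell
#!/bin/sh
# Hypothetical helper: compare two files of per-partition message counts.
# $1 = counts from SimpleConsumerShell, $2 = counts from DumpLogSegments,
# one count per line, in the same order.
# Prints one line per mismatch and returns non-zero if any counts differ.
compare_counts() {
    paste "$1" "$2" | awk '
        $1 != $2 {
            printf "line %d: consumer=%s log=%s (difference %d)\n", NR, $1, $2, $2 - $1
            bad = 1
        }
        END { exit bad }'
}
```

With the counts from step 9, this would flag only the ninth entry (replica 3, partition 2), where the consumer returned 430 fewer messages than the log segment holds.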


          People

          • Assignee: Unassigned
          • Reporter: John Fung
          • Votes: 0
          • Watchers: 1
