  1. Kafka
  2. KAFKA-590

System Test - 4 test cases failed due to an insufficient number of retries in ProducerPerformance


Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed

    Description

      1. Functional Test Area : Replication with Leader Hard Failure (1 Topic, 3 Partitions)

      2. Testcases failed :

      0151 (Sync Producer, Acks = -1, No Compression)
      0152 (Async Producer, Acks = -1, No Compression)
      0155 (Sync Producer, Acks = -1, Compressed)
      0156 (Async Producer, Acks = -1, Compressed)

      3. Sample test results :

      2012-10-25 18:22:20,206 - INFO - ======================================================
      2012-10-25 18:22:20,206 - INFO - validating data matched
      2012-10-25 18:22:20,206 - INFO - ======================================================
      2012-10-25 18:22:20,206 - DEBUG - request-num-acks [-1] (kafka_system_test_utils)
      2012-10-25 18:22:20,228 - INFO - no. of unique messages on topic [test_1] sent from publisher : 900 (kafka_system_test_utils)
      2012-10-25 18:22:20,235 - INFO - no. of unique messages on topic [test_1] at simple_consumer_1.log : 853 (kafka_system_test_utils)
      2012-10-25 18:22:20,242 - INFO - no. of unique messages on topic [test_1] at simple_consumer_2.log : 853 (kafka_system_test_utils)
      2012-10-25 18:22:20,247 - INFO - no. of unique messages on topic [test_1] at simple_consumer_3.log : 853 (kafka_system_test_utils)

      4. Investigations :

      a. Merge log segment files per partition:
      Under test_1351181987/testcase_0151/logs/broker-1/kafka_server_1_logs:
      cat test_1-0/00000000000000000000.log >> merged_test_1_0/00000000000000000000.log
      cat test_1-0/00000000000000000197.log >> merged_test_1_0/00000000000000000000.log
      . . .
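Step a can be scripted rather than typed one `cat` per segment. The following is only a sketch, assuming that all segment files of a partition sit in one directory and carry zero-padded offset names as in the listing above. The function name `merge_segments` is made up for illustration. Unlike the `>>` commands above, it empties the destination first, so a rerun does not append the data twice.

```shell
# Concatenate every segment file of one partition into a single file.
# Segment names are zero-padded base offsets, so the shell's sorted glob
# expansion already lists them in offset order.
# Usage: merge_segments SRC_DIR DEST_FILE
merge_segments() {
    src_dir=$1      # e.g. test_1-0
    dest_file=$2    # e.g. merged_test_1_0/00000000000000000000.log
    mkdir -p "$(dirname "$dest_file")"
    : > "$dest_file"                 # start from an empty destination
    for seg in "$src_dir"/*.log; do
        [ -f "$seg" ] || continue    # skip when the glob matches nothing
        cat "$seg" >> "$dest_file"
    done
}
```

For partition 0 this would be called as `merge_segments test_1-0 merged_test_1_0/00000000000000000000.log`.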

      b. Retrieve all CRC from merged data log segment:
      bin/kafka-run-class.sh kafka.tools.DumpLogSegments merged_test_1_0/00000000000000000000.log | grep crc | sed 's/.* crc: //' | sort -u > test_1_0_crc.log
      . . .

      c. Merge the CRC files together:
      cat test_1_0_crc.log >> all_crc.log
      cat test_1_1_crc.log >> all_crc.log
      cat test_1_2_crc.log >> all_crc.log

      d. Sort the merged CRC file:
      cat all_crc.log | sort -u > all_crc_sorted.log
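Steps c and d can be collapsed into one command, because `sort` accepts several input files and `-u` removes duplicates across all of them. A minimal sketch, with `merge_crc_files` as a hypothetical helper name:

```shell
# Merge per-partition CRC lists and de-duplicate them in one pass.
# Usage: merge_crc_files OUT_FILE CRC_FILE...
merge_crc_files() {
    out_file=$1
    shift                       # remaining arguments are the input files
    sort -u "$@" > "$out_file"
}
```

With the file names used above, the call would be `merge_crc_files all_crc_sorted.log test_1_0_crc.log test_1_1_crc.log test_1_2_crc.log`, which also avoids the intermediate all_crc.log.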

      e. Count the unique CRCs of 'failed to send' messages in producer_performance.log (70 in this case):
      grep 'failed to send' producer_performance.log | sed 's/.* crc = //' | sed 's/, key = null.*//' | sort -u | wc -l
      70

      f. Look up each 'failed to send' CRC from producer_performance.log in the sorted CRC file to see how many messages were eventually sent successfully on retry:

      $ for i in `grep 'failed to send' ../../producer_performance-4/producer_performance.log | sed 's/.* crc = //' | sed 's/, key = null.*//' | sort -u`; do echo -n "$i => "; grep $i all_crc_sorted.log || echo "n/a"; done;
      . . .
      1302684126 => n/a
      1456125554 => 1456125554
      15299643 => n/a
      1653550869 => 1653550869
      1741661084 => n/a
      1764395211 => 1764395211
      . . .
      (23 messages were sent successfully on retry)
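The lookup in step f can also produce the totals directly. The sketch below assumes the unique 'failed to send' CRCs have already been written to a file, one per line (the name failed_crc.log in the usage comment is hypothetical, as is the function name `classify_failed`). It uses `grep -F -x`, which matches whole lines only, so a short CRC such as 15299643 cannot be counted as found merely because it occurs inside a longer one.

```shell
# Count how many 'failed to send' CRCs later appear in the broker logs.
# Usage: classify_failed FAILED_CRC_FILE DELIVERED_CRC_FILE
#   e.g. classify_failed failed_crc.log all_crc_sorted.log
classify_failed() {
    failed_file=$1       # one CRC per line, unique
    delivered_file=$2    # one CRC per line, e.g. all_crc_sorted.log
    # $(( ... )) strips the padding some wc implementations print
    failed=$(($(wc -l < "$failed_file")))
    # -F fixed strings, -x whole-line match, -f read patterns from a file;
    # grep -c exits non-zero on a count of 0, hence the '|| true'
    recovered=$(grep -c -F -x -f "$delivered_file" "$failed_file" || true)
    lost=$((failed - recovered))
    echo "failed=$failed recovered=$recovered lost=$lost"
}
```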

      g. As a result, 70 messages marked 'failed to send' in producer_performance.log - 23 messages successfully sent on retry = 47 messages lost, which matches the data loss count in the test result.
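The two ways of arriving at the loss count can be checked against each other with shell arithmetic. All four input numbers are taken from the test result and from steps e and f above; the variable names are chosen here for illustration.

```shell
sent=900        # unique messages sent by the producer (sample test results)
consumed=853    # unique messages seen in each simple consumer log
failed=70       # unique 'failed to send' CRCs (step e)
recovered=23    # of those, CRCs found in the broker logs (step f)

lost_by_count=$((sent - consumed))
lost_by_crc=$((failed - recovered))
echo "lost by count: $lost_by_count, lost by CRC: $lost_by_crc"
# prints "lost by count: 47, lost by CRC: 47"
```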

      Therefore, if the number of retries is increased, all the messages could be sent successfully.

          People

            Assignee: Unassigned
            Reporter: John Fung (jfung)