  Kafka / KAFKA-8688

Upgrade system tests fail due to data loss with older message format


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: system tests
    • Labels: None

    Description

      System test failure for TestUpgrade/test_upgrade: from_kafka_version=0.9.0.1, to_message_format_version=0.9.0.1, compression_types=.lz4

      3 acked message did not make it to the Consumer. They are: [33906, 33900, 33903]. The first 3 missing messages were validated to ensure they are in Kafka's data files. 3 were missing. This suggests data loss. Here are some of the messages not found in the data files: [33906, 33900, 33903]
      
      Traceback (most recent call last):
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", line 132, in run
          data = self.run_test()
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", line 189, in run_test
          return self.test_context.function(self.test)
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py", line 428, in wrapper
          return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/core/upgrade_test.py", line 136, in test_upgrade
          self.run_produce_consume_validate(core_test_action=lambda: self.perform_upgrade(from_kafka_version,
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 112, in run_produce_consume_validate
          self.validate()
        File "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 135, in validate
          assert succeeded, error_msg
      AssertionError: 3 acked message did not make it to the Consumer. They are: [33906, 33900, 33903]. The first 3 missing messages were validated to ensure they are in Kafka's data files. 3 were missing. This suggests data loss. Here are some of the messages not found in the data files: [33906, 33900, 33903]
      

      Logs show:

      1. Broker 1 is leader of partition
      2. Broker 2 successfully fetches from offset 10947 and processes the response
      3. Broker 2 sends a fetch request to broker 1 for offset 10950
      4. Broker 1 sets its HW to 10950 and acknowledges produce requests up to the HW
      5. Broker 2 is elected leader
      6. Broker 2 truncates to its local HW of 10947, so 3 acknowledged messages are lost
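      The sequence above can be replayed with a toy model of a replica's log end offset (LEO) and high watermark (HW). This is a simplified sketch, not Kafka's replica code: the `Replica` class and `replay_scenario` function are hypothetical, and the model assumes broker 1 already holds offsets 10947 to 10949 and reports an HW of 10947 when it answers the fetch in step 2.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Replica:
    """Hypothetical toy replica: log end offset (LEO) plus locally known high watermark (HW)."""
    name: str
    leo: int  # next offset this replica would write
    hw: int   # high watermark as last learned by this replica

    def truncate_to_hw(self) -> List[int]:
        """HW-based truncation: drop every offset at or above the local HW, return them."""
        removed = list(range(self.hw, self.leo))
        self.leo = self.hw
        return removed


def replay_scenario() -> Tuple[Set[int], List[int]]:
    # Step 1: broker 1 leads and (by assumption) holds offsets up to 10949, HW 10947.
    leader = Replica("broker1", leo=10950, hw=10947)
    follower = Replica("broker2", leo=10947, hw=10947)

    # Step 2: broker 2 fetches from 10947 and appends 10947..10949.
    # The fetch response carries the leader's HW at that moment (10947).
    follower.leo = leader.leo
    follower.hw = leader.hw

    # Steps 3-4: the fetch for offset 10950 tells broker 1 that broker 2 has every
    # offset below 10950, so broker 1 advances its HW and acks offsets 10947..10949.
    leader.hw = follower.leo
    acked = set(range(10947, leader.hw))

    # Steps 5-6: broker 2 takes over but only knows HW 10947 and truncates to it.
    lost = follower.truncate_to_hw()
    return acked, lost


acked, lost = replay_scenario()
print(f"acked but lost: {lost}")  # acked but lost: [10947, 10948, 10949]
```

      The follower's HW lags the leader's by one fetch round trip, which is why three acknowledged offsets fall between the two values and disappear on truncation.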

      This data loss is a known issue that was fixed under KIP-101. However, it can still happen with older message formats, so we should update the upgrade tests to tolerate some data loss in those cases.
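      One way an upgrade test could cope with this is to tolerate missing acked records only when the message format predates leader-epoch truncation. The sketch below is hypothetical (the `validate` helper and its parameters are illustrative, not the kafkatest API), and it assumes the cutoff is message format 0.11.0, the release that shipped KIP-101.

```python
from typing import Iterable, List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.9.0.1' into (0, 9, 0, 1) for ordered comparison."""
    return tuple(int(part) for part in v.split("."))


def missing_acked(acked: Iterable[int], consumed: Iterable[int]) -> List[int]:
    """Acked values that never reached the consumer, sorted."""
    return sorted(set(acked) - set(consumed))


def validate(acked: Iterable[int], consumed: Iterable[int],
             message_format_version: str,
             epoch_format_version: str = "0.11.0") -> Tuple[bool, List[int]]:
    """Hypothetical check: pass if nothing is missing, or if the message format is older
    than the (assumed) first format with leader-epoch truncation."""
    missing = missing_acked(acked, consumed)
    may_truncate = parse_version(message_format_version) < parse_version(epoch_format_version)
    return (not missing) or may_truncate, missing


ok, missing = validate([1, 33900, 33903, 33906], [1], "0.9.0.1")
print(ok, missing)  # True [33900, 33903, 33906]
```

      Keeping the tolerance conditional means a run on a newer message format still fails on any acked-record loss, so a regression in the KIP-101 behaviour would not be masked.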
       


          People

            Assignee: Rajini Sivaram (rsivaram)
            Reporter: Rajini Sivaram (rsivaram)
            Ismael Juma
            Votes: 0
            Watchers: 1
